Science.gov

Sample records for acid sequence analysis

  1. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  2. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Foley, B.; Korber, B.; Mellors, J.W.; Jeang, K.T.; Wain-Hobson, S.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  3. An analysis of amino acid sequences surrounding archaeal glycoprotein sequons.

    PubMed

    Abu-Qarn, Mehtap; Eichler, Jerry

    2007-05-01

    Although Archaea provided the first example of a prokaryal glycoprotein, little is known of the rules governing archaeal N-glycosylation. As in Eukarya and Bacteria, archaeal N-glycosylation takes place at the Asn residues of Asn-X-Ser/Thr sequons. Since not all sequons are utilized, it is clear that other factors, including the context in which a sequon exists, affect glycosylation efficiency. As yet, the contribution to N-glycosylation made by sequon-bordering residues and other related factors in Archaea remains unaddressed. Here, the sequences surrounding Asn residues experimentally confirmed as modified were analyzed in an attempt to define sequence rules and requirements for archaeal N-glycosylation.
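The Asn-X-Ser/Thr sequon described in this abstract can be located with a short pattern scan. The following is a minimal sketch, not the authors' method: the function names and the example sequence are hypothetical, and excluding proline at the X position is a commonly cited convention that the abstract itself does not state.

```python
import re

# Assumption: X is any residue except proline (a common convention,
# not stated in the abstract). The lookahead lets overlapping sequons
# such as "NNTS" be reported separately.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of the Asn in each N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def sequon_context(protein, position, flank=2):
    """Return the sequon at a 1-based Asn position plus `flank` residues each side."""
    start = max(0, position - 1 - flank)
    return protein[start:position - 1 + 3 + flank]


if __name__ == "__main__":
    example = "MNASKNPTANNTS"  # hypothetical sequence for illustration
    for pos in find_sequons(example):
        print(pos, sequon_context(example, pos))
```

A scan like this only lists candidate sites; as the abstract notes, not all sequons are utilized, so the flanking residues returned by `sequon_context` are the kind of context one would then compare across experimentally confirmed sites.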

  4. Sequencing and computational analysis of complete genome sequences of Citrus yellow mosaic badna virus from acid lime and pummelo.

    PubMed

    Borah, Basanta K; Johnson, A M Anthony; Sai Gopal, D V R; Dasgupta, Indranil

    2009-08-01

    Citrus yellow mosaic badna virus (CMBV), a member of the Family Caulimoviridae, Genus Badnavirus, is the causative agent of Citrus mosaic disease in India. Although the virus has been detected in several citrus species, only two full-length genomes, one each from Sweet orange and Rangpur lime, are available in publicly accessible databases. In order to obtain a better understanding of the genetic variability of the virus in other citrus mosaic-affected citrus species, we performed the cloning and sequence analysis of complete genomes of CMBV from two additional citrus species, Acid lime and Pummelo. We show that CMBV genomes from the two hosts share high homology with previously reported CMBV sequences and hence conclude that the new isolates represent variants of the virus present in these species. Based on in silico sequence analysis, we predict the possible function of the protein encoded by one of the five ORFs.

  5. Human retroviruses and AIDS, 1992. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Korber, B.; Berzofsky, J.A.; Pavlakis, G.N.; Smith, R.F.

    1992-10-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice in each year, which accounts for the modes of binding and pagination in the compendium. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions below of the parts of the compendium, the user should read the individual introductions for each part.

  6. Analysis of cloned cDNA and genomic sequences for phytochrome: complete amino acid sequences for two gene products expressed in etiolated Avena.

    PubMed Central

    Hershey, H P; Barker, R F; Idler, K B; Lissemore, J L; Quail, P H

    1985-01-01

    Cloned cDNA and genomic sequences have been analyzed to deduce the amino acid sequence of phytochrome from etiolated Avena. Restriction endonuclease site polymorphism between clones indicates that at least four phytochrome genes are expressed in this tissue. Sequence analysis of two complete and one partial coding region shows approximately 98% homology at both the nucleotide and amino acid levels, with the majority of amino acid changes being conservative. High sequence homology is also found in the 5'-untranslated region but significant divergence occurs in the 3'-untranslated region. The phytochrome polypeptides are 1128 amino acid residues long corresponding to a molecular mass of 125 kdaltons. The known protein sequence at the chromophore attachment site occurs only once in the polypeptide, establishing that phytochrome has a single chromophore per monomer covalently linked to Cys-321. Computer analyses of the amino acid sequences have provided predictions regarding a number of structural features of the phytochrome molecule. PMID:3001642
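The reported figures (1128 residues, about 125 kdaltons) can be sanity-checked against the rule of thumb that an average amino acid residue contributes roughly 110 Da. The sketch below is illustrative only: the 110 Da average is an assumption not taken from the abstract, and the function names are hypothetical.

```python
# Assumption: ~110 Da per residue is a widely used average, not a value
# from the abstract. Real masses depend on the actual composition.
AVG_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues, avg_residue_da=AVG_RESIDUE_MASS_DA):
    """Rough polypeptide mass in kDa from the residue count alone."""
    return n_residues * avg_residue_da / 1000.0


def implied_residue_mass_da(n_residues, mass_da):
    """Average residue mass implied by a reported length and mass."""
    return mass_da / n_residues


if __name__ == "__main__":
    print(f"estimated: {estimate_mass_kda(1128):.1f} kDa")
    print(f"implied average: {implied_residue_mass_da(1128, 125_000):.1f} Da/residue")
```

The estimate from length alone lands within about 1% of the reported mass, which is as close as a composition-blind average can be expected to get.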

  7. Analysis of amino acid sequence variations and immunoglobulin E-binding epitopes of German cockroach tropomyosin.

    PubMed

    Jeong, Kyoung Yong; Lee, Jongweon; Lee, In-Yong; Ree, Han-Il; Hong, Chein-Soo; Yong, Tai-Soon

    2004-09-01

    The allergenicities of tropomyosins from different organisms have been reported to vary. The cDNA encoding German cockroach tropomyosin (Bla g 7) was isolated, expressed, and characterized previously. In the present study, the amino acid sequence variations in German cockroach tropomyosin were analyzed in order to investigate their influence on allergenicity. We also undertook the identification of immunodominant peptides containing immunoglobulin E (IgE) epitopes which may facilitate the development of diagnostic and immunotherapeutic strategies based on the recombinant proteins. Two-dimensional gel electrophoresis and immunoblot analysis with mouse anti-recombinant German cockroach tropomyosin serum were performed to investigate the isoforms at the protein level. Reverse transcriptase PCR (RT-PCR) was applied to examine the sequence diversity. Eleven different variants of the deduced amino acid sequences were identified by RT-PCR. German cockroach tropomyosin has only minor sequence variations that did not seem to affect its allergenicity significantly. These results support the molecular basis underlying the cross-reactivities of arthropod tropomyosins. Recombinant fragments were also generated by PCR, and IgE-binding epitopes were assessed by enzyme-linked immunosorbent assay. Sera from seven patients revealed heterogeneous IgE-binding responses. This study demonstrates multiple IgE-binding epitope regions in a single molecule, suggesting that full-length tropomyosin should be used for the development of diagnostic and therapeutic reagents.

  8. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Patel, Kamlesh D (Ken); SNL

    2016-07-12

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June 2012 in Santa Fe, NM.

  9. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Patel, Kamlesh D; SNL,

    2012-06-01

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June 2012 in Santa Fe, NM.

  10. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3

    PubMed Central

    Xiao, Jingfa; Hao, Lirui; Crowley, David E.; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals. PMID:26301592
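The genome figures quoted in this abstract can be combined into a total size and a rough gene density. This is a small arithmetic sketch using only the numbers reported above; the variable and function names are hypothetical, and the density counts protein-coding genes only.

```python
# Values as reported in the abstract.
CHROMOSOMES_BP = {"chr1": 3_539_530, "chr2": 2_039_213}
PROTEIN_CODING_GENES = 4_502


def genome_size_bp(chromosomes):
    """Total genome size as the sum of all replicon lengths."""
    return sum(chromosomes.values())


def genes_per_mb(n_genes, size_bp):
    """Gene density in genes per megabase."""
    return n_genes / (size_bp / 1_000_000)


if __name__ == "__main__":
    total = genome_size_bp(CHROMOSOMES_BP)
    print(f"total genome size: {total:,} bp")
    print(f"density: {genes_per_mb(PROTEIN_CODING_GENES, total):.0f} protein-coding genes per Mb")
```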

  11. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3.

    PubMed

    Wang, Xiaoyu; Chen, Meili; Xiao, Jingfa; Hao, Lirui; Crowley, David E; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals.

  12. Plant mitochondrial nucleic acid sequences as a tool for phylogenetic analysis.

    PubMed Central

    Hiesel, R; von Haeseler, A; Brennicke, A

    1994-01-01

    To evaluate the potential of mitochondrial nucleic acid sequences as a phylogenetic tool, we have analyzed cytochrome oxidase subunit III (coxIII) coding sequences in representatives of the major groups of land plants. The phylogenetic tree derived from these mitochondrial sequences confirms the monophyletic origin of land plant mitochondria with the general order and descent of land plants deduced by other molecular, physiological, and morphological traits. The mitochondrial sequences strongly suggest a close phylogenetic relationship between Bryophyta and Lycopodiatae, whereas Psilophytatae cluster with the other vascular plants. In addition to the high sequence similarity, both Hepaticophytina and Lycopodiatae contain a related intron in the coxIII gene that, to our knowledge, is not found in any other plant species. The slowly evolving mitochondrial sequences of plants are shown to provide a useful phylogenetic tool to evaluate distant evolutionary relationships within this kingdom. PMID:7507251

  13. Phylogenetic analysis of beta-papillomaviruses as inferred from nucleotide and amino acid sequence data.

    PubMed

    Gottschling, Marc; Köhler, Anja; Stockfleth, Eggert; Nindl, Ingo

    2007-01-01

    Human papillomaviruses (HPV) of the beta-group seem to be involved in the pathogenesis of non-melanoma skin cancer. Papillomaviruses are host specific and are considered to be closely co-evolving with their hosts. Evolutionary incongruence between early genes and late genes has been reported among oncogenic genital alpha-papillomaviruses and considerably challenges phylogenetic reconstructions. We investigated the relationships of 29 beta-HPV (25 types plus four putative new types, subtypes, or variants) as inferred from codon aligned and amino acid sequence data of the genes E1, E2, E6, E7, L1, and L2 using likelihood, distance, and parsimony approaches. An analysis of a L1 fragment included additional nucleotide and amino acid sequences from seven non-human beta-papillomaviruses. The evolution of early genes and late genes did not conflict significantly in beta-papillomaviruses based on partition homogeneity tests (p ≥ 0.001). As inferred from the complete genome analyses, beta-papillomaviruses were monophyletic and segregated into four highly supported monophyletic assemblages corresponding to the species 1, 2, 3, and fused 4/5. They basically split into the species 1 and the remainder of beta-papillomaviruses, whose species 3, 4, and 5 constituted the sistergroup of species 2. beta-Papillomaviruses have been isolated from humans, apes, and monkeys, and phylogenetic analyses of the L1 fragment showed the non-human papillomaviruses to be highly polyphyletic, nesting within the HPV species. Thus, host and virus phylogenies were not congruent in beta-papillomaviruses, and multiple invasions across species borders may contribute (additionally to host-linked evolution) to their diversification.

  14. In silico comparative analysis of DNA and amino acid sequences for prion protein gene.

    PubMed

    Kim, Y; Lee, J; Lee, C

    2008-01-01

    Genetic variability might contribute to species specificity of prion diseases in various organisms. In this study, structures of the prion protein gene (PRNP) and its amino acids were compared among species for which sequence data were available. Comparisons of PRNP DNA sequences among 12 species including human, chimpanzee, monkey, bovine, ovine, dog, mouse, rat, wallaby, opossum, chicken and zebrafish allowed us to identify candidate regulatory regions in intron 1 and 3'-untranslated region (UTR) in addition to the coding region. Highly conserved putative binding sites for transcription factors, such as heat shock factor 2 (HSF2) and myocyte enhancer factor 2 (MEF2), were discovered in the intron 1. In 3'-UTR, the functional sequence (ATTAAA) for nucleus-specific polyadenylation was found in all the analysed species. The functional sequence (TTTTTAT) for maturation-specific polyadenylation was found in identical form only in ovine, with one or two nucleotide mismatches in the other species. A comparison of the amino acid sequences in 53 species revealed a large sequence identity. In particular, the octapeptide repeat region was observed in all the species except frog and zebrafish. Functional changes and susceptibility to prion diseases with various isoforms of prion protein could be caused by numeric variability and conformational changes discovered in the repeat sequences.
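Searching a 3'-UTR for the two motifs named in this abstract, exactly or with a small number of mismatches, can be sketched as a sliding-window Hamming-distance scan. This is an illustration under stated assumptions, not the tool the authors used: the function names and the example sequence are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_motif(seq, motif, max_mismatches=0):
    """Return (1-based position, window, mismatches) for each hit of `motif`."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        d = hamming(window, motif)
        if d <= max_mismatches:
            hits.append((i + 1, window, d))
    return hits


if __name__ == "__main__":
    utr = "GGCATTAAAGCTTTTAATCC"  # hypothetical 3'-UTR fragment
    print(find_motif(utr, "ATTAAA"))                     # exact hexamer
    print(find_motif(utr, "TTTTTAT", max_mismatches=1))  # near matches
```

Allowing mismatches mirrors the abstract's observation that TTTTTAT appears with one or two nucleotide differences in most species; raising `max_mismatches` quickly increases chance hits in short motifs, so any real analysis would need a background expectation.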

  15. Microwave-assisted acid and base hydrolysis of intact proteins containing disulfide bonds for protein sequence analysis by mass spectrometry.

    PubMed

    Reiz, Bela; Li, Liang

    2010-09-01

    Controlled hydrolysis of proteins to generate peptide ladders combined with mass spectrometric analysis of the resultant peptides can be used for protein sequencing. In this paper, two methods of improving the microwave-assisted protein hydrolysis process are described to enable rapid sequencing of proteins containing disulfide bonds and increase sequence coverage, respectively. It was demonstrated that proteins containing disulfide bonds could be sequenced by MS analysis by first performing hydrolysis for less than 2 min, followed by 1 h of reduction to release the peptides originally linked by disulfide bonds. It was shown that a strong base could be used as a catalyst for microwave-assisted protein hydrolysis, producing complementary sequence information to that generated by microwave-assisted acid hydrolysis. However, using either acid or base hydrolysis, amide bond breakages in small regions of the polypeptide chains of the model proteins (e.g., cytochrome c and lysozyme) were not detected. Dynamic light scattering measurement of the proteins solubilized in an acid or base indicated that protein-protein interaction or aggregation was not the cause of the failure to hydrolyze certain amide bonds. It was speculated that there were some unknown local structures that might play a role in preventing an acid or base from reacting with the peptide bonds therein.
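The idea behind reading a sequence from a peptide ladder is that consecutive ladder members differ by the mass of one residue. The sketch below illustrates that principle on an idealized, noise-free ladder; it is not the authors' workflow. The residue table is a partial list of standard monoisotopic residue masses, and the function names, tolerance, and example peptide are assumptions made for illustration.

```python
# Partial table of monoisotopic residue masses (Da). Leu and Ile are
# isobaric, so a mass difference alone cannot tell them apart.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}
WATER = 18.01056


def ladder_masses(peptide):
    """Masses of successively longer N-terminal fragments (ideal ladder)."""
    masses, total = [], WATER
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def read_ladder(masses, tol=0.01):
    """Assign a residue to each mass step between adjacent ladder members."""
    ordered = sorted(masses)
    calls = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        calls.append("/".join(matches) if matches else "?")
    return calls


if __name__ == "__main__":
    print(read_ladder(ladder_masses("GASPKL")))
```

The first residue is not recovered because the scan starts from the difference between the two smallest fragments. A "?" marks a step whose mass matches no single residue, which is how a missing ladder member (an amide bond that failed to hydrolyze, as the abstract reports for some regions) would show up in this simplified model.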

  16. Transcriptomic Analysis of Octanoic Acid Response in Drosophila sechellia Using RNA-Sequencing.

    PubMed

    Lanno, Stephen M; Gregory, Sara M; Shimshak, Serena J; Alverson, Maximilian K; Chiu, Kenneth; Feil, Arden L; Findley, Morgan G; Forman, Taylor E; Gordon, Julia T; Ho, Josephine; Krupp, Joanna L; Lam, Ivy; Lane, Josh; Linde, Samuel C; Morse, Ashley E; Rusk, Serena; Ryan, Robie; Saniee, Avva; Sheth, Ruchi B; Siranosian, Jennifer J; Sirichantaropart, Lalitpatr; Sternlieb, Sonya R; Zaccardi, Christina M; Coolon, Joseph D

    2017-10-12

    The dietary specialist fruit fly Drosophila sechellia has evolved to specialize on the toxic fruit of its host plant Morinda citrifolia. Toxicity of Morinda fruit is primarily due to high levels of octanoic acid (OA). Using RNA interference (RNAi), prior work found that knockdown of Osiris family genes Osiris 6 (Osi6), Osi7, and Osi8 led to increased susceptibility to OA in adult D. melanogaster flies, likely representing genes underlying a Quantitative Trait Locus (QTL) for OA resistance in D. sechellia. While genes in this major effect locus are beginning to be revealed, prior work has shown at least five regions of the genome contribute to OA resistance. Here, we identify new candidate OA resistance genes by performing differential gene expression analysis using RNA sequencing (RNA-seq) on control and OA-exposed D. sechellia flies. We found 104 significantly differentially expressed genes with annotated orthologs in D. melanogaster, including six Osiris gene family members, consistent with previous functional studies and gene expression analyses. Gene ontology (GO) term enrichment showed significant enrichment for cuticle development in upregulated genes and significant enrichment of immune and defense responses in downregulated genes, suggesting important aspects of the physiology of D. sechellia that may play a role in OA resistance. In addition, we identified 5 candidate OA resistance genes that potentially underlie QTL peaks outside of the major effect region, representing promising new candidate genes for future functional studies. Copyright © 2017, G3: Genes, Genomes, Genetics.

  17. Analysis of protein function and its prediction from amino acid sequence.

    PubMed

    Clark, Wyatt T; Radivojac, Predrag

    2011-07-01

    Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in the context of human disease because many conditions arise as a consequence of alterations of protein function. The recent availability of relatively inexpensive sequencing technology has resulted in thousands of complete or partially sequenced genomes with millions of functionally uncharacterized proteins. Such a large volume of data, combined with the lack of high-throughput experimental assays to functionally annotate proteins, contributes to the growing importance of automated function prediction. Here, we study proteins annotated by Gene Ontology (GO) terms and estimate the accuracy of functional transfer from protein sequence only. We find that the transfer of GO terms by pairwise sequence alignments is only moderately accurate, showing a surprisingly small influence of sequence identity (SID) in a broad range (30-100%). We developed and evaluated a new predictor of protein function, functional annotator (FANN), from amino acid sequence. The predictor exploits a multioutput neural network framework which is well suited to simultaneously modeling dependencies between functional terms. Experiments provide evidence that FANN-GO (predictor of GO terms; available from http://www.informatics.indiana.edu/predrag) outperforms standard methods such as transfer by global or local SID as well as GOtcha, a method that incorporates the structure of GO.
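The baseline this abstract compares against, transferring GO terms from a sequence-similar annotated protein, can be sketched in a few lines. This is a simplified illustration and not FANN-GO or the authors' evaluation code: the sequences are assumed to be already aligned, identity is computed over columns where neither sequence has a gap (one of several possible definitions), and all names and GO identifiers in the example are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def transfer_go_terms(aligned_query, annotated_hits, min_identity=30.0):
    """Copy GO terms from the most identical hit at or above the threshold.

    `annotated_hits` is a list of (aligned_query_view, aligned_hit, go_terms)
    tuples, one per pairwise alignment (hypothetical input format).
    """
    best_terms, best_sid = set(), min_identity
    for query_view, hit, terms in annotated_hits:
        sid = percent_identity(query_view, hit)
        if sid >= best_sid:
            best_terms, best_sid = set(terms), sid
    return best_terms


if __name__ == "__main__":
    print(round(percent_identity("MKT-LLV", "MKSALLV"), 1))
```

The 30% default threshold echoes the lower end of the identity range discussed in the abstract; the abstract's finding is precisely that identity within that range has a surprisingly small influence on transfer accuracy, so a threshold like this should not be read as a guarantee of correct annotation.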

  18. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
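The principle stated in this abstract, that the template sequence is deduced from the temporal order in which complementary labelled analogs are incorporated, can be shown with a toy simulation. This is a conceptual sketch only: it ignores strand orientation, signal detection, and errors, and the function names and example template are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def incorporation_events(template):
    """Yield the analog incorporated at each step, in temporal order.

    The template is given in the order the polymerase encounters its bases.
    """
    for base in template:
        yield COMPLEMENT[base]


def deduce_template(events):
    """Recover the template from the ordered list of incorporated analogs."""
    return "".join(COMPLEMENT[base] for base in events)


if __name__ == "__main__":
    observed = list(incorporation_events("GATTACA"))
    print("".join(observed), "->", deduce_template(observed))
```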

  19. Processing and amino acid sequence analysis of the mouse mammary tumor virus env gene product.

    PubMed Central

    Arthur, L O; Copeland, T D; Oroszlan, S; Schochetman, G

    1982-01-01

    The envelope proteins of mouse mammary tumor virus (MMTV) are synthesized from a subgenomic 24S mRNA as a 75,000-dalton glycosylated precursor polyprotein which is eventually processed to the mature glycoproteins gp52 and gp36. In vivo synthesis of this env precursor in the presence of the core glycosylation inhibitor tunicamycin yielded a precursor of approximately 61,000 daltons (P61env). However, a 67,000-dalton protein (P67env) was obtained from cell-free translation with the MMTV 24S mRNA as the template. To determine whether the portion of the protein cleaved from P67env to give P61env was removed from the NH2-terminal end of P67env and as such would represent a leader sequence, the NH2-terminal amino acid sequence of the terminal peptide gp52 was determined. Glutamic acid, and not methionine, was found to be the amino-terminal residue of gp52, indicating that the cleaved portion was derived from the NH2-terminal end of P67env. The NH2-terminal amino acid sequences of gp52's from endogenous and exogenous C3H MMTVs were determined through 46 residues and found to be identical. However, amino acid composition and type-specific gp52 radioimmunoassays from MMTVs grown in heterologous cells indicated primary structure differences between gp52's of the two viruses. The nucleic acid sequence of cloned MMTV DNA fragments (J. Majors and H. E. Varmus, personal communication) in conjunction with the NH2-terminal sequence of gp52 allowed localization of the env gene in the MMTV genome. Nucleotides coding for the NH2 terminus of gp52 begin approximately 0.8 kilobase to the 3' side of the single EcoRI cleavage site. Localization of the env gene at that point agrees with the proposed gene order -gag-pol-env- and also allows sufficient coding potential for the glycoprotein precursor without extending into the long terminal repeat. PMID:6281457

  20. Complete amino acid sequence of branched-chain amino acid aminotransferase (transaminase B) of Salmonella typhimurium, identification of the coenzyme-binding site and sequence comparison analysis

    SciTech Connect

    Feild, M.J.

    1988-01-01

    The complete amino acid sequence of the subunit of branched-chain amino acid aminotransferase of Salmonella typhimurium was determined by automated Edman degradation of peptide fragments generated by chemical and enzymatic digestion of S-carboxymethylated and S-pyridylethylated transaminase B. Peptide fragments of transaminase B were generated by treatment of the enzyme with trypsin, Staphylococcus aureus V8 protease, endoproteinase Lys-C, and cyanogen bromide. Protocols were developed for separation of the peptide fragments by reverse-phase high performance liquid chromatography (HPLC), ion-exchange HPLC, and SDS-urea gel electrophoresis. The enzyme subunit contains 308 amino acid residues and has a molecular weight of 33,920 daltons. The coenzyme-binding site was determined by treatment of the enzyme, containing bound pyridoxal 5'-phosphate, with tritiated sodium borohydride prior to trypsin digestion. Monitoring radioactivity incorporation and comparing peptide maps with an apoenzyme tryptic digest allowed identification of the pyridoxylated peptide, which was isolated by reverse-phase HPLC and sequenced. The coenzyme-binding site is a lysyl residue at position 159. Some peptides were further characterized by fast atom bombardment mass spectrometry.

  1. Automated high throughput nucleic acid purification from formalin-fixed paraffin-embedded tissue samples for next generation sequence analysis

    PubMed Central

    Haile, Simon; Pandoh, Pawan; McDonald, Helen; Corbett, Richard D.; Tsao, Philip; Kirk, Heather; MacLeod, Tina; Jones, Martin; Bilobram, Steve; Brooks, Denise; Smailus, Duane; Steidl, Christian; Scott, David W.; Bala, Miruna; Hirst, Martin; Miller, Diane; Moore, Richard A.; Mungall, Andrew J.; Coope, Robin J.; Ma, Yussanne; Zhao, Yongjun; Holt, Rob A.; Jones, Steven J.

    2017-01-01

    Curation and storage of formalin-fixed, paraffin-embedded (FFPE) samples are standard procedures in hospital pathology laboratories around the world. Many thousands of such samples exist and could be used for next generation sequencing analysis. Retrospective analyses of such samples are important for identifying molecular correlates of carcinogenesis, treatment history and disease outcomes. Two major hurdles in using FFPE material for sequencing are the damaged nature of the nucleic acids and the labor-intensive nature of nucleic acid purification. These limitations and a number of other issues that span multiple steps from nucleic acid purification to library construction are addressed here. We optimized and automated a 96-well magnetic bead-based extraction protocol that can be scaled to large cohorts and is compatible with automation. Using sets of 32 and 91 individual FFPE samples respectively, we generated libraries from 100 ng of total RNA and DNA starting amounts with 95–100% success rate. The use of the resulting RNA in micro-RNA sequencing was also demonstrated. In addition to offering the potential of scalability and rapid throughput, the yield obtained with lower input requirements makes these methods applicable to clinical samples where tissue abundance is limiting. PMID:28570594

  2. Automated high throughput nucleic acid purification from formalin-fixed paraffin-embedded tissue samples for next generation sequence analysis.

    PubMed

    Haile, Simon; Pandoh, Pawan; McDonald, Helen; Corbett, Richard D; Tsao, Philip; Kirk, Heather; MacLeod, Tina; Jones, Martin; Bilobram, Steve; Brooks, Denise; Smailus, Duane; Steidl, Christian; Scott, David W; Bala, Miruna; Hirst, Martin; Miller, Diane; Moore, Richard A; Mungall, Andrew J; Coope, Robin J; Ma, Yussanne; Zhao, Yongjun; Holt, Rob A; Jones, Steven J; Marra, Marco A

    2017-01-01

    Curation and storage of formalin-fixed, paraffin-embedded (FFPE) samples are standard procedures in hospital pathology laboratories around the world. Many thousands of such samples exist and could be used for next generation sequencing analysis. Retrospective analyses of such samples are important for identifying molecular correlates of carcinogenesis, treatment history and disease outcomes. Two major hurdles in using FFPE material for sequencing are the damaged nature of the nucleic acids and the labor-intensive nature of nucleic acid purification. These limitations and a number of other issues that span multiple steps from nucleic acid purification to library construction are addressed here. We optimized and automated a 96-well magnetic bead-based extraction protocol that can be scaled to large cohorts and is compatible with automation. Using sets of 32 and 91 individual FFPE samples respectively, we generated libraries from 100 ng of total RNA and DNA starting amounts with 95-100% success rate. The use of the resulting RNA in micro-RNA sequencing was also demonstrated. In addition to offering the potential of scalability and rapid throughput, the yield obtained with lower input requirements makes these methods applicable to clinical samples where tissue abundance is limiting.

  3. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY]; Webb, Watt W [Ithaca, NY]; Levene, Michael [Ithaca, NY]; Turner, Stephen [Ithaca, NY]; Craighead, Harold G [Ithaca, NY]; Foquet, Mathieu [Ithaca, NY]

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  4. Bioinformatics analysis of the oxidosqualene cyclase gene and the amino acid sequence in mangrove plants

    NASA Astrophysics Data System (ADS)

    Basyuni, M.; Wati, R.

    2017-01-01

    This study describes bioinformatics methods used to analyze seven oxidosqualene cyclase (OSC) genes from mangrove plants deposited in DDBJ/EMBL/GenBank, and to predict their structure, composition, similarity, subcellular localization and phylogeny. The physical and chemical properties of the seven mangrove OSCs varied among the genes. The proportions of secondary structure elements in the seven mangrove OSCs followed the order α-helix > random coil > extended chain structure. The chloroplast transit peptide and signal peptide scores were too low to indicate the presence of a chloroplast transit peptide or a secretory pathway signal peptide in the mangrove OSCs. The mitochondrial target peptide score varied from 0.163 to 0.430, indicating that a mitochondrial target peptide may be present. These results suggest the importance of understanding the diversity and functional properties of the different amino acids in mangrove OSC genes. To clarify the relationships among the mangrove OSC genes, a phylogenetic tree was constructed. The tree shows three clusters: Kandelia KcMS joins with Bruguiera BgLUS, Rhizophora RsM1 is close to Bruguiera BgbAS, and Rhizophora RcCAS joins with Kandelia KcCAS. The present study therefore supports previous results showing that plant OSC genes form distinct clusters in the tree.

  5. Molecular Biocomputing Suite: a word processor add-in for the analysis and manipulation of nucleic acid and protein sequence data.

    PubMed

    Muller, P Y; Studer, E; Miserez, A R

    2001-12-01

    In all fields of molecular biology, researchers are increasingly challenged by experiments planned and evaluated on the basis of nucleic acid and protein sequence data generally retrieved from public databases. Despite the wide spectrum of available Web-based software tools for sequence analysis, the routine use of these tools has disadvantages, particularly because of the elaborate and heterogeneous ways of data input, output, and storage. Here we present a Visual Basic-encoded Microsoft Word Add-In, the Molecular BioComputing Suite (MBCS), available at the BioTechniques Software Library (www.BioTechniques.com). The MBCS software aims to manage and expedite a wide range of sequence analyses and manipulations using an integrated text editor environment including menu-guided commands. Its independence of sequence formats enables MBCS to be used as a pivotal application between other software tools for sequence analysis, manipulation, annotation, and editing.

  6. Advances in sequence analysis.

    PubMed

    Califano, A

    2001-06-01

    In its early days, the entire field of computational biology revolved almost entirely around biological sequence analysis. Over the past few years, however, a number of new non-sequence-based areas of investigation have become mainstream, from the analysis of gene expression data from microarrays, to whole-genome association discovery, and to the reverse engineering of gene regulatory pathways. Nonetheless, with the completion of private and public efforts to map the human genome, as well as those of other organisms, sequence data continue to be a veritable mother lode of valuable biological information that can be mined in a variety of contexts. Furthermore, the integration of sequence data with a variety of alternative information is providing valuable and fundamentally new insight into biological processes, as well as an array of new computational methodologies for the analysis of biological data.

  7. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  8. Lactic acid bacterial diversity in the traditional Mexican fermented dough pozol as determined by 16S rDNA sequence analysis.

    PubMed

    Escalante, A; Wacher, C; Farrés, A

    2001-02-28

    The lactic acid bacteria diversity of pozol, a Mexican fermented maize dough, was studied using a total DNA extraction and purification procedure and PCR amplification of 16S rDNA for gram-positive and related bacterial groups. Thirty-six clones were obtained and sequenced to 650 nucleotides. These partial sequences were identified by submission to the non-redundant nucleotide database of NCBI. The identified sequences were aligned with reference sequences of the closest related organisms. This analysis indicated that only 14 sequences were unique clones and these were identified as Lactococcus lactis, Streptococcus suis, Lactobacillus plantarum, Lact. casei, Lact. alimentarius, Lact. delbrueckii and Clostridium sp. Two non-ribosomal sequences were also detected. Unlike other environments analyzed with this molecular approach where many unidentified microorganisms are found, the identity of most sequences could be established as lactic acid bacteria, indicating that this is the main group among the gram-positive bacteria in pozol. Use of this molecular method permitted detection of lactic acid bacteria different from those previously isolated and identified by culture techniques.

  9. Image analysis for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Palaniappan, Kannappan; Huang, Thomas S.

    1991-07-01

    There is a great deal of interest in automating the process of DNA (deoxyribonucleic acid) sequencing to support the analysis of genomic DNA such as the Human and Mouse Genome projects. In one class of gel-based sequencing protocols autoradiograph images are generated in the final step and usually require manual interpretation to reconstruct the DNA sequence represented by the image. The need to handle a large volume of sequence information necessitates automation of the manual autoradiograph reading step through image analysis in order to reduce the length of time required to obtain sequence data and reduce transcription errors. Various adaptive image enhancement, segmentation and alignment methods were applied to autoradiograph images. The methods are adaptive to the local characteristics of the image such as noise, background signal, or presence of edges. Once the two-dimensional data is converted to a set of aligned one-dimensional profiles waveform analysis is used to determine the location of each band which represents one nucleotide in the sequence. Different classification strategies including a rule-based approach are investigated to map the profile signals, augmented with the original two-dimensional image data as necessary, to textual DNA sequence information.
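
The waveform-analysis step described above (locating each band in an aligned one-dimensional lane profile) can be illustrated with a very small Python sketch that picks local maxima above a height threshold. This is only an assumption-laden illustration: the function name `find_bands`, the `min_height` parameter and the sample profile are hypothetical, and the abstract does not specify the authors' actual, adaptive algorithm.

```python
def find_bands(profile, min_height):
    """Return indices of local maxima in a 1-D intensity profile.

    A point counts as a band if it is at least `min_height`, strictly
    higher than its left neighbour and not lower than its right
    neighbour (so a flat-topped peak is reported once, at its left edge).
    """
    bands = []
    for i in range(1, len(profile) - 1):
        value = profile[i]
        if value >= min_height and value > profile[i - 1] and value >= profile[i + 1]:
            bands.append(i)
    return bands


# Hypothetical lane profile: two bands, one of them flat-topped.
lane = [0, 1, 5, 1, 0, 2, 8, 8, 3, 0, 1]
print(find_bands(lane, min_height=4))
```

A threshold-plus-local-maximum rule is the simplest possible choice; real autoradiograph profiles would normally need smoothing and background correction first, as the abstract's mention of adaptive enhancement suggests.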

  10. Analysis of expression and amino acid sequence of the allergen Mag 3 in two species of house dust mites-Dermatophagoides farinae and D. pteronyssinus (Acari: Astigmata: Pyroglyphidae).

    PubMed

    Asman, Marek; Solarz, Krzysztof; Szilman, Ewa; Szilman, Piotr

    2010-01-01

    In the 1990s, two new and important allergens of house dust mites were cloned and sequenced: Mag 1 and Mag 3. However, the second allergen has been identified to date only in extracts of Dermatophagoides farinae [DF]. In this work, we aimed to detect expression of this important allergen and, for the first time, to analyze its amino acid sequence in another species of house dust mite, Dermatophagoides pteronyssinus [DP]. We were able to confirm the expression of allergen Mag 3 in DF and to exclude it in DP. By sequencing the products of DNA amplification, we revealed the nucleotide sequence encoding allergen Mag 3 in DF. This analysis enabled detection of 9 single base changes. An analysis of the amino acid sequence encoded by the triplets with substituted nucleotides revealed that 8 changes were polymorphic, and 1 was a mutation substituting GTG (valine) for ATG (methionine) at position 236. However, the presence of an amino acid sequence difference in this allergen might suggest that other isoforms exist, which could complicate both diagnosis and immunotherapy in persons who produce an allergic response to this allergen. The variants of allergen Mag 3 (group 14) are still not known, in contrast to the well-characterized allergen variants of the other main groups 1, 2, 4, 5 or 7. Thus, the identification and definition of allergic properties of allergen Mag 3 variants need to be further investigated.
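
The distinction drawn above between base changes that leave the encoded residue unchanged and the one that swaps valine (GTG) and methionine (ATG) can be checked with the standard genetic code. The sketch below is a minimal illustration, not the authors' pipeline; the helper name `classify_change` is hypothetical, and the direction of the substitution is taken as an assumption since the abstract's wording is ambiguous.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(ref_codon, alt_codon):
    """Return (ref_aa, alt_aa, kind) for a codon substitution.

    `kind` is "synonymous" when both codons encode the same residue
    and "nonsynonymous" otherwise.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return ref_aa, alt_aa, kind


print(classify_change("GTG", "ATG"))  # valine vs methionine
print(classify_change("GTG", "GTA"))  # both valine
```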

  11. Human Retroviruses and AIDS. A compilation and analysis of nucleic acid and amino acid sequences: I--II; III--V

    SciTech Connect

    Myers, G.; Korber, B.; Wain-Hobson, S.; Smith, R.F.; Pavlakis, G.N.

    1993-12-31

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice in each year, which accounts for the modes of binding and pagination in the compendium.

  12. Genome Sequence and Transcriptome Analysis of Meat-Spoilage-Associated Lactic Acid Bacterium Lactococcus piscium MKFS47

    PubMed Central

    Johansson, Per; Laine, Pia; Smolander, Olli-Pekka; Sonck, Matti; Rahkila, Riitta; Jääskeläinen, Elina; Paulin, Lars; Auvinen, Petri; Björkroth, Johanna

    2015-01-01

    Lactococcus piscium is a psychrotrophic lactic acid bacterium and is known to be one of the predominant species within spoilage microbial communities in cold-stored packaged foods, particularly in meat products. Its presence in such products has been associated with the formation of buttery and sour off-odors. Nevertheless, the spoilage potential of L. piscium varies dramatically depending on the strain and growth conditions. Additional knowledge about the genome is required to explain such variation, understand its phylogeny, and study gene functions. Here, we present the complete and annotated genomic sequence of L. piscium MKFS47, combined with a time course analysis of the glucose catabolism-based transcriptome. In addition, a comparative analysis of gene contents was done for L. piscium MKFS47 and 29 other lactococci, revealing three distinct clades within the genus. The genome of L. piscium MKFS47 consists of one chromosome, carrying 2,289 genes, and two plasmids. A wide range of carbohydrates was predicted to be fermented, and growth on glycerol was observed. Both carbohydrate and glycerol catabolic pathways were significantly upregulated in the course of time as a result of glucose exhaustion. At the same time, differential expression of the pyruvate utilization pathways, implicated in the formation of spoilage substances, switched the metabolism toward a heterofermentative mode. In agreement with data from previous inoculation studies, L. piscium MKFS47 was identified as an efficient producer of buttery-odor compounds under aerobic conditions. Finally, genes and pathways that may contribute to increased survival in meat environments were considered. PMID:25819958

  13. Differentiation of acetic acid bacteria based on sequence analysis of 16S-23S rRNA gene internal transcribed spacer sequences.

    PubMed

    González, Angel; Mas, Albert

    2011-06-30

    The 16S-23S gene internal transcribed spacer sequences of sixty-four strains belonging to different acetic acid bacteria genera were analyzed, and phylogenetic trees were generated for each genus. The topologies of the different trees were in accordance with the 16S rRNA gene trees, although the similarity percentages obtained between the species were shown to be much lower. These values suggest the usefulness of including the 16S-23S gene internal transcribed spacer region as a part of the polyphasic approach required for the further classification of acetic acid bacteria. Furthermore, the region could be a good target for primer and probe design. It has also been validated for use in the identification of unknown samples of this bacterial group from wine vinegar and fruit condiments.

  14. Coronavirus genome: prediction of putative functional domains in the non-structural polyprotein by comparative amino acid sequence analysis.

    PubMed Central

    Gorbalenya, A E; Koonin, E V; Donchenko, A P; Blinov, V M

    1989-01-01

    Amino acid sequences of 2 giant non-structural polyproteins (F1 and F2) of infectious bronchitis virus (IBV), a member of Coronaviridae, were compared, by computer-assisted methods, to sequences of a number of other positive strand RNA viral and cellular proteins. By this approach, juxtaposed putative RNA-dependent RNA polymerase, nucleic acid binding ("finger"-like) and RNA helicase domains were identified in F2. Together, these domains might constitute the core of the protein complex involved in the primer-dependent transcription, replication and recombination of coronaviruses. In F1, two cysteine protease-like domains and a growth factor-like one were revealed. One of the putative proteases of IBV is similar to 3C proteases of picornaviruses and related enzymes of como-, nepo- and potyviruses. Search of IBV F1 and F2 sequences for sites similar to those cleaved by the latter proteases and intercomparison of the surrounding sequence stretches revealed 13 dipeptides Q/S(G) which are probably cleaved by the coronavirus 3C-like protease. Based on these observations, a partial tentative scheme for the functional organization and expression strategy of the non-structural polyproteins of IBV was proposed. It implies that, despite the general similarity to other positive strand RNA viruses, and particularly to potyviruses, coronaviruses possess a number of unique structural and functional features. PMID:2526320
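
The first step of the site search described above, scanning a polyprotein for Q/S or Q/G dipeptides, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name `find_qsg_sites` and the example sequence are hypothetical, and the authors additionally compared the surrounding sequence stretches, which a bare dipeptide scan does not do.

```python
import re

# Q followed by S or G; the lookahead lets adjacent candidate sites be found.
CLEAVAGE_PATTERN = re.compile(r"Q(?=[SG])")


def find_qsg_sites(protein):
    """Return (position, dipeptide) pairs for every Q/S or Q/G site.

    `position` is the 1-based index of the glutamine, i.e. the residue
    immediately before the putative scissile bond.
    """
    protein = protein.upper()
    return [
        (m.start() + 1, protein[m.start():m.start() + 2])
        for m in CLEAVAGE_PATTERN.finditer(protein)
    ]


# Hypothetical peptide, not taken from the IBV polyproteins.
print(find_qsg_sites("MKLQSAVTQGPLQA"))
```

A scan like this over-predicts, since most Q/S and Q/G dipeptides in a long protein are not cleaved; that is presumably why the flanking sequence context was compared as well.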

  15. Distinguishing Proteins From Arbitrary Amino Acid Sequences

    PubMed Central

    Yau, Stephen S.-T.; Mao, Wei-Guang; Benson, Max; He, Rong Lucy

    2015-01-01

    What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe. PMID:25609314

  16. The complete amino acid sequence of prochymosin.

    PubMed Central

    Foltmann, B; Pedersen, V B; Jacobsen, H; Kauffman, D; Wybrandt, G

    1977-01-01

    The total sequence of 365 amino acid residues in bovine prochymosin is presented. Alignment with the amino acid sequence of porcine pepsinogen shows that 204 amino acid residues are common to the two zymogens. Further comparison and alignment with the amino acid sequence of penicillopepsin shows that 66 residues are located at identical positions in all three proteases. The three enzymes belong to a large group of proteases with two aspartate residues in the active center. This group forms a family derived from one common ancestor. PMID:329280

  17. Characterization of facultative oligotrophic bacteria from polar seas by analysis of their fatty acids and 16S rDNA sequences.

    PubMed

    Mergaert, J; Verhelst, A; Cnockaert, M C; Tan, T L; Swings, J

    2001-04-01

    One hundred and seventy three bacterial strains, isolated previously after enrichment under oligotrophic, psychrophilic conditions from Arctic (98 strains) and Antarctic seawater (75 strains), were characterized by gas-liquid chromatographic analysis of their fatty acid compositions. By numerical analysis, 8 clusters, containing 2 to 59 strains, could be delineated, and 8 strains formed separate branches. Five clusters contained strains from both poles, two minor clusters were confined to Arctic isolates, and one cluster consisted of Antarctic isolates only. The 16S rRNA genes from 23 strains, representing the different fatty acid profile clusters and including the unclustered strains, were sequenced. The sequences grouped with the alpha and gamma Proteobacteria, the high percent G+C gram positives, and the Cytophaga-Flavobacterium-Bacteroides branch. The sequences of strains from 4 clusters and of 7 unclustered strains were closely related (sequence similarities above 97%) to reference sequences of Sulfitobacter mediterraneus, Halomonas variabilis, Alteromonas macleodii, Pseudoalteromonas species, Shewanella frigidimarina, and Rhodococcus fascians. Strains from the other four clusters and an unclustered strain showed sequence similarities below 97% with nearest named neighbours, including Rhizobium, Glaciecola, Pseudomonas, Alteromonas macleodii and Cytophaga marinoflava, indicating that the clusters which they represent form as yet unnamed taxa.
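
The 97% similarity cut-off used above can be made concrete with a small percent-identity calculation over two pre-aligned sequences. The sketch below is an assumption, not the authors' method: the abstract does not say how similarity was computed, so the gap handling (columns with a gap in either sequence are ignored) and the names `percent_identity` and `SIMILARITY_CUTOFF` are hypothetical.

```python
SIMILARITY_CUTOFF = 97.0  # percent, as quoted in the abstract


def percent_identity(aligned_a, aligned_b):
    """Percent identity of two aligned sequences of equal length.

    Columns where either sequence has a gap ("-") are skipped, which is
    one common convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy aligned fragments, far shorter than a real 16S rRNA gene.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
print(identity, identity > SIMILARITY_CUTOFF)
```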

  18. Cloning, sequence analysis and expression of the F1F0-ATPase beta-subunit from wine lactic acid bacteria.

    PubMed

    Sievers, Martin; Uermösi, Christina; Fehlmann, Marc; Krieger, Sibylle

    2003-09-01

    The nucleotide sequences of the genes encoding the F1F0-ATPase beta-subunit from Oenococcus oeni, Leuconostoc mesenteroides subsp. mesenteroides, Pediococcus damnosus, Pediococcus parvulus, Lactobacillus brevis and Lactobacillus hilgardii were determined. Their deduced amino acid sequences showed homology values of 79-98%. Data from the alignment and ATPase tree indicated that O. oeni and L. mesenteroides subsp. mesenteroides formed a group well-separated from P. damnosus and P. parvulus and from the group comprising L. brevis and L. hilgardii. The N-terminus of the F1F0-ATPase beta-subunit of O. oeni contains an additional stretch of 38 amino acid residues. The catalytic site of the ATPase beta-subunit of the investigated strains is characterized by the two conserved motifs GGAGVGKT and GERTRE. The amplified atpD coding sequences were inserted into the pCRT7/CT-TOPO vector using a TA-cloning strategy and transformed into Escherichia coli. SDS-PAGE and Western blot analyses confirmed that O. oeni has an ATPase beta-subunit protein which is larger in size than the corresponding molecules from the investigated strains.

  19. The complete amino acid sequence of yeast phosphoglycerate kinase.

    PubMed Central

    Perkins, R E; Conroy, S C; Dunbar, B; Fothergill, L A; Tuite, M F; Dobson, M J; Kingsman, S M; Kingsman, A J

    1983-01-01

    The complete amino acid sequence of yeast phosphoglycerate kinase, comprising 415 residues, was determined. The sequence of residues 1-173 was deduced mainly from nucleotide sequence analysis of a series of overlapping fragments derived from the relevant portion of a 2.95-kilobase endonuclease-HindIII-digest fragment containing the yeast phosphoglycerate kinase gene. The sequence of residues 174-415 was deduced mainly from amino acid sequence analysis of three CNBr-cleavage fragments, and from peptides derived from these fragments after digestion by a number of proteolytic enzymes. Cleavage at the two tryptophan residues with o-iodosobenzoic acid was also used to isolate fragments suitable for amino acid sequence analysis. Determination of the complete sequence now allows a detailed interpretation of the existing high-resolution X-ray-crystallographic structure. The sequence -Ile-Ile-Gly-Gly-Gly- occurs twice in distant parts of the linear sequence (residues 232-236 and 367-371). Both these regions contribute to the nucleoside phosphate-binding site. A comparison of the sequence of yeast phosphoglycerate kinase reported here with the sequences of phosphoglycerate kinase from horse muscle and human erythrocytes shows that the yeast enzyme is 64% identical with the mammalian enzymes. The yeast has strikingly fewer methionine, cysteine and tryptophan residues. PMID:6347186
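
Locating a repeated motif such as -Ile-Ile-Gly-Gly-Gly- (IIGGG in one-letter code) and reporting its residue ranges, as the abstract above does, can be sketched in a few lines of Python. The function name `motif_ranges` and the toy sequence are hypothetical; the real phosphoglycerate kinase sequence is not reproduced here.

```python
def motif_ranges(sequence, motif):
    """Return 1-based inclusive (start, end) ranges of every motif hit.

    Overlapping occurrences are reported, because the search resumes one
    residue after each hit rather than after the end of the hit.
    """
    sequence, motif = sequence.upper(), motif.upper()
    ranges = []
    start = sequence.find(motif)
    while start != -1:
        ranges.append((start + 1, start + len(motif)))
        start = sequence.find(motif, start + 1)
    return ranges


# Toy sequence containing the motif twice.
print(motif_ranges("AAIIGGGKKIIGGG", "IIGGG"))
```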

  20. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  1. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  2. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  3. Distinct structural features of the α and β subunits of nitrogenase molybdenum-iron protein of Clostridium pasteurianum: an analysis of amino acid sequences

    SciTech Connect

    Wang, S.Z.; Chen, J.S.; Johnson, J.L.

    1988-04-19

    Nitrogenase is composed of two separately purified proteins, a molybdenum-iron (MoFe) protein and an iron (Fe) protein. Structural genes (nifD and nifK) encoding the α and β subunits of the MoFe protein of Clostridium pasteurianum (Cp) have been cloned and sequenced. The deduced amino acid sequences were analyzed for structures that could be related to the unique properties of the Cp protein, particularly its low capacity to form an active enzyme with a heterologous Fe protein. Cp nifK is located immediately downstream from Cp nifD, with the start codon of nifK overlapping by one base with the stop codon of nifD. An open reading frame following nifK was identified as nifE. The amino acid sequence deduced from nifK encompasses the partial amino acid sequences previously reported from the isolated β subunit. Cp nifK encodes a polypeptide of 458 amino acid residues (M_r 50,115) whose amino-terminal region is about 50 residues shorter than the otherwise conserved corresponding polypeptides from four other organisms. In contrast, the Cp α subunit (nifD product) contains an additional stretch of 50 amino acid residues in the 380-430 region, which is unique to the Cp protein. It therefore appears that the combined size of the α and β subunits could be important to nitrogenase function. An analysis of the predicted secondary structure from the amino acid sequence of each subunit from three species (C. pasteurianum, Azotobacter vinelandii, and Rhizobium japonicum) further revealed structural features, including regions adjacent to some of the conserved cysteine residues, differentiating the Cp MoFe protein from others. These different regions may be further tested for correlation with distinct properties of Cp nitrogenase.

  4. Dipeptide Sequence Determination: Analyzing Phenylthiohydantoin Amino Acids by HPLC

    NASA Astrophysics Data System (ADS)

    Barton, Janice S.; Tang, Chung-Fei; Reed, Steven S.

    2000-02-01

    Amino acid composition and sequence determination, important techniques for characterizing peptides and proteins, are essential for predicting conformation and studying sequence alignment. This experiment presents improved, fundamental methods of sequence analysis for an upper-division biochemistry laboratory. Working in pairs, students use the Edman reagent to prepare phenylthiohydantoin derivatives of amino acids for determination of the sequence of an unknown dipeptide. With a single HPLC technique, students identify both the N-terminal amino acid and the composition of the dipeptide. This method yields good precision of retention times and allows use of a broad range of amino acids as components of the dipeptide. Students learn fundamental principles and techniques of sequence analysis and HPLC.

  5. Phylogenetic analysis of dicyemid mesozoans (phylum Dicyemida) from innexin amino acid sequences: dicyemids are not related to Platyhelminthes.

    PubMed

    Suzuki, Takahito G; Ogino, Kazutoyo; Tsuneki, Kazuhiko; Furuya, Hidetaka

    2010-06-01

    Dicyemid mesozoans are endoparasites, or endosymbionts, found only in the renal sac of benthic cephalopod molluscs. The body organization of dicyemids is very simple, consisting of usually 10 to 40 cells, with neither body cavities nor differentiated organs. Dicyemids were considered as primitive animals, and the out-group of all metazoans, or as occupying a basal position of lophotrochozoans close to flatworms. We cloned cDNAs encoding for the gap junction component proteins, innexin, from the dicyemids. Its expression pattern was observed by whole-mount in situ hybridization. In adult individuals, the innexin was expressed in calottes, infusorigens, and infusoriform embryos. The unique temporal pattern was observed in the developing infusoriform embryos. Innexin amino acid sequences had taxon-specific indels which enabled identification of the 3 major protostome lineages, i.e., 2 ecdysozoans (arthropods and nematodes) and the lophotrochozoans. The dicyemids show typical, lophotrochozoan-type indels. In addition, the Bayesian and maximum likelihood trees based on the innexin amino acid sequences suggested dicyemids to be more closely related to the higher lophotrochozoans than to the flatworms. Flatworms were the sister group, or consistently basal, to the other lophotrochozoan clade that included dicyemids, annelids, molluscs, and brachiopods.

  6. Comparative RNA-Sequence Transcriptome Analysis of Phenolic Acid Metabolism in Salvia miltiorrhiza, a Traditional Chinese Medicine Model Plant

    PubMed Central

    Song, Zhenqiao; Guo, Linlin; Liu, Tian; Lin, Caicai; Wang, Jianhua

    2017-01-01

    Salvia miltiorrhiza Bunge is an important traditional Chinese medicine (TCM). In this study, two S. miltiorrhiza genotypes (BH18 and ZH23) with different phenolic acid concentrations were used for de novo RNA sequencing (RNA-seq). A total of 170,787 transcripts and 56,216 unigenes were obtained. There were 670 differentially expressed genes (DEGs) identified between BH18 and ZH23, 250 of which were upregulated in ZH23, with genes involved in the phenylpropanoid biosynthesis pathway being the most upregulated genes. Nine genes involved in the lignin biosynthesis pathway were upregulated in BH18 and thus resulted in higher lignin content in BH18. However, expression levels of most genes involved in the core common upstream phenylpropanoid biosynthesis pathway were higher in ZH23 than in BH18. These results indicated that genes involved in the core common upstream phenylpropanoid biosynthesis pathway might play an important role in downstream secondary metabolism and demonstrated that lignin biosynthesis was a putative partially competing pathway with phenolic acid biosynthesis. The results of this study expanded our understanding of the regulation of phenolic acid biosynthesis in S. miltiorrhiza. PMID:28194403

  7. Development of microwave-assisted acid hydrolysis of proteins using a commercial microwave reactor and its combination with LC-MS for protein full-sequence analysis.

    PubMed

    Chen, Lu; Wang, Nan; Li, Liang

    2014-11-01

    Microwave-assisted acid hydrolysis (MAAH) can be used to degrade a protein non-specifically into many peptides with overlapping sequences which can be identified by mass spectrometry (MS) to produce a sequence map that covers the full sequence of a protein. The success of this method for protein sequence analysis depends on the proper control of the MAAH process, which is currently done using a household microwave oven. However, to meet the regulatory or good laboratory practice (GLP) requirement in a clinical or pharmaceutical laboratory, using a commercial microwave device is often required. In this paper, we report a method of performing MAAH using a CEM Discover single-mode microwave reactor. It is shown that, using an optimized protocol for MAAH, reproducible results comparable to those obtained using a household microwave oven can be generated using the commercial reactor. To illustrate the potential applications of MAAH MS for characterizing clinically relevant proteins, this method was applied, for the first time, to map the amino acid sequences of normal and sickle-cell human hemoglobin as well as bovine hemoglobin. Full sequence coverage was readily achieved from 294 and 266 unique peptides matched to the alpha and beta subunits of normal hemoglobin, respectively, 334 and 265 unique peptides matched to the alpha and beta subunits of sickle-cell hemoglobin, and 377 and 224 unique peptides matched to the alpha and beta subunits of bovine hemoglobin. This method opens the possibility for any laboratory to use commercial laboratory equipment to perform MAAH MS for protein full-sequence analysis. Copyright © 2014 Elsevier B.V. All rights reserved.
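
    The idea of building a sequence map from overlapping peptides can be illustrated with a minimal sketch. It assumes the peptides have already been identified and match the protein exactly; the function name, the toy sequence, and the toy peptides are hypothetical and are not taken from the paper.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Return the fraction of residues in `protein` covered by at least one peptide.

    Assumption: each peptide is matched by exact substring search; real MS
    pipelines score spectra against candidates rather than matching strings.
    """
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein) if protein else 0.0


# Toy sequence and peptides, invented for illustration only.
toy_protein = "MVLSPADKTNVKAAW"
toy_peptides = ["MVLSPA", "SPADKTN", "TNVKAAW"]
print(f"coverage = {sequence_coverage(toy_protein, toy_peptides):.2f}")
```

    Because non-specific hydrolysis yields many overlapping peptides, a residue missed by one peptide is often covered by another, which is why the coverage is computed per residue rather than per peptide.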

  8. Limited proteolysis and sequence analysis of the 2-oxo acid dehydrogenase complexes from Escherichia coli. Cleavage sites and domains in the dihydrolipoamide acyltransferase components.

    PubMed Central

    Packman, L C; Perham, R N

    1987-01-01

    The structures of the dihydrolipoamide acyltransferase (E2) components of the 2-oxo acid dehydrogenase complexes from Escherichia coli were investigated by limited proteolysis. Trypsin and Staphylococcus aureus V8 proteinase were used to excise the three lipoyl domains from the E2p component of the pyruvate dehydrogenase complex and the single lipoyl domain from the E2o component of the 2-oxoglutarate dehydrogenase complex. The principal sites of action of these enzymes on each E2 chain were determined by sequence analysis of the isolated lipoyl fragments and of the truncated E2p and E2o chains. Each of the numerous cleavage sites (12 in E2p, six in E2o) fell within similar segments of the E2 chains, namely stretches of polypeptide rich in alanine, proline and/or charged amino acids. These regions are clearly accessible to proteinases of Mr 24,000-28,000 and, on the basis of n.m.r. spectroscopy, some of them have previously been implicated in facilitating domain movements by virtue of their conformational flexibility. The limited proteolysis data suggest that E2p and E2o possess closer architectural similarities than would be predicted from inspection of their amino acid sequences. As a result of this work, an error was detected in the sequence of E2o inferred from the previously published sequence of the encoding gene, sucB. The relevant peptides from E2o were purified and sequenced by direct means; an amended sequence is presented. PMID:3297046

  9. A global analysis of CNVs in swine using whole genome sequence data and association analysis with fatty acid composition and growth traits.

    PubMed

    Revilla, Manuel; Puig-Oliveras, Anna; Castelló, Anna; Crespo-Piazuelo, Daniel; Paludo, Ediane; Fernández, Ana I; Ballester, Maria; Folch, Josep M

    2017-01-01

    Copy number variations (CNVs) are important genetic variants complementary to SNPs, and can be considered as biomarkers for some economically important traits in domestic animals. In the present study, a genomic analysis of porcine CNVs based on next-generation sequencing data was carried out to identify CNVs segregating in an Iberian x Landrace backcross population and study their association with fatty acid composition and growth-related traits. A total of 1,279 CNVs, including duplications and deletions, were detected, ranging from 106 to 235 CNVs across samples, with an average of 183 CNVs per sample. Moreover, we detected 540 CNV regions (CNVRs) containing 245 genes. Functional annotation suggested that these genes possess a great variety of molecular functions and may play a role in production traits in commercial breeds. Some of the identified CNVRs contained relevant functional genes (e.g., CLCA4, CYP4X1, GPAT2, MOGAT2, PLA2G2A and PRKG1, among others). The variation in copy number of four of them (CLCA4, GPAT2, MOGAT2 and PRKG1) was validated in 150 BC1_LD (25% Iberian and 75% Landrace) animals by qPCR. Additionally, their contribution regarding backfat and intramuscular fatty acid composition and growth-related traits was analyzed. Statistically significant associations were obtained for CNVR112 (GPAT2) for the C18:2(n-6)/C18:3(n-3) ratio in backfat and carcass length, among others. Notably, GPATs are enzymes that catalyze the first step in the biosynthesis of both triglycerides and glycerophospholipids, suggesting that this CNVR may contribute to genetic variation in fatty acid composition and growth traits. These findings provide useful genomic information to facilitate the further identification of trait-related CNVRs affecting economically important traits in pigs.

  10. Cloning and Sequence Analysis of cDNAs Encoding Two Acidic PLA(2) from venom of Ophiophagus hannah (King Cobra), Guangxi Species.

    PubMed

    Wang, Qiu-Yan; Shu, Yu-Yan; Zhuang, Mao-Xing; Lin, Zheng-Jiong

    2001-01-01

    Total RNA was extracted from venom glands of Ophiophagus hannah, Guangxi species. The cDNAs encoding PLA(2) were amplified by RT-PCR and cloned into the PUCm-T vector. The positive clones encoding two acidic PLA(2) (APLA(2)-1 and APLA(2)-2) were selected and bidirectionally sequenced. Their complete amino acid sequences were deduced and found to be identical to the known amino acid sequences. Their calculated isoelectric points agreed with the values determined experimentally for the proteins. Homology analysis indicated that the mature peptide of APLA(2)-1 had high homology with PLA(2) from venoms of Ophiophagus hannah, Fujian and Taiwan species, but APLA(2)-2 had lower homology. The most striking difference between APLA(2)-2 and other PLA(2) from Ophiophagus hannah venoms is the absence of an extra "pancreatic loop" at residues 62-66 in APLA(2)-2, which may be related to their species evolution and biological activity.

  11. Identification of single amino acid substitutions (SAAS) in neuraminidase from influenza A virus (H1N1) via mass spectrometry analysis coupled with de novo peptide sequencing.

    PubMed

    Peng, Qisheng; Wang, Zijian; Wu, Donglin; Li, Xiaoou; Liu, Xiaofeng; Sun, Wanchun; Liu, Ning

    2016-08-01

    Amino acid substitutions in the neuraminidase of the influenza virus are the main cause of the emergence of resistance to zanamivir or oseltamivir during seasonal influenza treatment; they are the result of non-synonymous mutations in the viral genome that can be successfully detected by polymerase chain reaction (PCR)-based approaches. There is always an urgent need to detect variation in amino acid sequences directly at the protein level. Mass spectrometry coupled with de novo sequencing has been explored as an alternative and straightforward strategy for detecting amino acid substitutions as well; this approach is the primary focus of the present study. Influenza virus (A/Puerto Rico/8/1934 H1N1) propagated in embryonated chicken eggs was purified by ultracentrifugation, followed by PNGase F treatment. The deglycosylated virion was lysed and separated by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). The gel band corresponding to neuraminidase was excised and subjected to liquid chromatography tandem mass spectrometry (LC-MS/MS) analysis. LC-MS/MS analyses, coupled with manual de novo sequencing, allowed the determination of three amino acid substitutions: R346K, S349N, and S370I/L, in the neuraminidase from the influenza virus (A/Puerto Rico/8/1934 H1N1), which were located in three mutated peptides of the neuraminidase: YGNGVWIGK, TKNHSSR, and PNGWTETDI/LK, respectively. We found that the amino acid substitutions in the proteins of RNA viruses (including influenza A virus) resulting from non-synonymous gene mutations can indeed be directly analyzed via mass spectrometry, and that manual interpretation of the MS/MS data may be beneficial. Copyright © 2016 John Wiley & Sons, Ltd.

  12. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid sequence...

  13. Cystatin. Amino acid sequence and possible secondary structure.

    PubMed Central

    Schwabe, C; Anastasi, A; Crow, H; McDonald, J K; Barrett, A J

    1984-01-01

    The amino acid sequence of cystatin, the protein from chicken egg-white that is a tight-binding inhibitor of many cysteine proteinases, is reported. Cystatin is composed of 116 amino acid residues, and the Mr is calculated to be 13 143. No striking similarity to any other known sequence has been detected. The results of computer analysis of the sequence and c.d. spectrometry indicate that the secondary structure includes relatively little alpha-helix (about 20%) and that the remainder is mainly beta-structure. PMID:6712597

  14. Differences in acid tolerance between Bifidobacterium breve BB8 and its acid-resistant derivative B. breve BB8dpH, revealed by RNA-sequencing and physiological analysis.

    PubMed

    Yang, Xu; Hang, Xiaomin; Tan, Jing; Yang, Hong

    2015-06-01

    Bifidobacteria are common inhabitants of the human gastrointestinal tract, and their application has increased dramatically in recent years due to their health-promoting effects. The ability of bifidobacteria to tolerate acidic environments is particularly important for their function as probiotics because they encounter such environments in food products and during passage through the gastrointestinal tract. In this study, we generated a derivative, Bifidobacterium breve BB8dpH, which displayed a stable, acid-resistant phenotype. To investigate the possible reasons for the higher acid tolerance of B. breve BB8dpH, as compared with its parental strain B. breve BB8, a combined transcriptome and physiological approach was used to characterize differences between the two strains. An analysis of the transcriptome by RNA-sequencing indicated that the expression of 121 genes was increased by more than 2-fold, while the expression of 146 genes was reduced more than 2-fold, in B. breve BB8dpH. Validation of the RNA-sequencing data using real-time quantitative PCR analysis demonstrated that the RNA-sequencing results were highly reliable. The comparison analysis, based on differentially expressed genes, suggested that the acid tolerance of B. breve BB8dpH was enhanced by regulating the expression of genes involved in carbohydrate transport and metabolism, energy production, synthesis of cell envelope components (peptidoglycan and exopolysaccharide), synthesis and transport of glutamate and glutamine, and histidine synthesis. Furthermore, an analysis of physiological data showed that B. breve BB8dpH displayed higher production of exopolysaccharide and lower H(+)-ATPase activity than B. breve BB8. The results presented here will improve our understanding of acid tolerance in bifidobacteria, and they will lead to the development of new strategies to enhance the acid tolerance of bifidobacterial strains.

  15. KM+, a mannose-binding lectin from Artocarpus integrifolia: amino acid sequence, predicted tertiary structure, carbohydrate recognition, and analysis of the beta-prism fold.

    PubMed Central

    Rosa, J. C.; De Oliveira, P. S.; Garratt, R.; Beltramini, L.; Resing, K.; Roque-Barreira, M. C.; Greene, L. J.

    1999-01-01

    The complete amino acid sequence of the lectin KM+ from Artocarpus integrifolia (jackfruit), which contains 149 residues/mol, is reported and compared to those of other members of the Moraceae family, particularly that of jacalin, also from jackfruit, with which it shares 52% sequence identity. KM+ presents an acetyl-blocked N-terminus and is not posttranslationally modified by proteolytic cleavage as is the case for jacalin. Rather, it possesses a short, glycine-rich linker that unites the regions homologous to the alpha- and beta-chains of jacalin. The results of homology modeling implicate the linker sequence in sterically impeding rotation of the side chain of Asp141 within the binding site pocket. As a consequence, the aspartic acid is locked into a conformation adequate only for the recognition of equatorial hydroxyl groups on the C4 epimeric center (alpha-D-mannose, alpha-D-glucose, and their derivatives). In contrast, the internal cleavage of the jacalin chain permits free rotation of the homologous aspartic acid, rendering it capable of accepting hydrogen bonds from both possible hydroxyl configurations on C4. We suggest that, together with direct recognition of epimeric hydroxyls and the steric exclusion of disfavored ligands, conformational restriction of the lectin should be considered to be a new mechanism by which selectivity may be built into carbohydrate binding sites. Jacalin and KM+ adopt the beta-prism fold already observed in two unrelated protein families. Despite presenting little or no sequence similarity, an analysis of the beta-prism reveals a canonical feature repeatedly present in all such structures, which is based on six largely hydrophobic residues within a beta-hairpin containing two classic-type beta-bulges. We suggest the term beta-prism motif to describe this feature. PMID:10210179

  16. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences

    PubMed Central

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D.; Adir, Noam

    2016-01-01

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel. PMID:27307442
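
    A triplet-frequency comparison of this kind can be sketched as follows. This is only an illustration under a stated assumption: the expected count of a triplet is taken as the product of its three single-residue frequencies times the total number of triplets (an independence model). The abstract does not say which null model the authors used, and the function name and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product


def triplet_observed_expected(proteins):
    """Map each possible residue triplet to (observed count, expected count).

    Assumption: expected counts come from an independence model built on the
    single-residue frequencies of the same sequences.
    """
    residue_counts = Counter()
    triplet_counts = Counter()
    for seq in proteins:
        residue_counts.update(seq)
        # Slide a window of width 3 along the sequence.
        triplet_counts.update(seq[i:i + 3] for i in range(len(seq) - 2))

    total_residues = sum(residue_counts.values())
    total_triplets = sum(triplet_counts.values())
    freq = {aa: n / total_residues for aa, n in residue_counts.items()}

    result = {}
    for a, b, c in product(sorted(freq), repeat=3):
        tri = a + b + c
        expected = freq[a] * freq[b] * freq[c] * total_triplets
        result[tri] = (triplet_counts.get(tri, 0), expected)
    return result


# Toy sequences, invented for illustration only.
toy_proteome = ["MKKLLAAK", "MAKLKKAL"]
table = triplet_observed_expected(toy_proteome)
# List triplets that never occur although the model expects them at least once.
missing = [t for t, (obs, exp) in table.items() if obs == 0 and exp >= 1.0]
print(missing)
```

    On a real proteome the table would hold 8,000 triplets for the 20 standard residues, and a statistical test would be needed before calling any triplet underrepresented.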

  17. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences.

    PubMed

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D; Adir, Noam

    2016-06-28

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel.

  18. Analysis of oropouche virus L protein amino acid sequence showed the presence of an additional conserved region that could harbour an important role for the polymerase activity.

    PubMed

    Aquino, V H; Moreli, M L; Moraes Figueiredo, L T

    2003-01-01

    We describe here the complete nucleotide sequence of the L RNA segment of Oropouche virus (genus Orthobunyavirus, family Bunyaviridae). We found the L RNA segment is 6846 nucleotides long and encodes a putative RNA polymerase of 2250 amino acids. Phylogenetic analysis showed that ORO virus clusters within the Orthobunyavirus genus, confirming the serological classification. It also showed that Bunyamwera and California viruses, from the Orthobunyavirus genus, are more closely related to each other than to ORO virus. Sequence comparisons performed between the L proteins of 15 bunyaviruses and the PB1 proteins of 3 influenza viruses revealed that ORO L protein contains the 3 regions characteristic of arenaviruses and bunyaviruses. These comparisons also showed the existence of an additional fourth conserved region in the L protein of bunyaviruses that contains at least two active sites.

  19. Analysis of the complete sequences of two biologically distinct Zucchini yellow mosaic virus isolates further evidences the involvement of a single amino acid in the virus pathogenicity.

    PubMed

    Nováková, S; Svoboda, J; Glasa, M

    2014-01-01

    The complete genome sequences of two Slovak Zucchini yellow mosaic virus isolates (ZYMV-H and ZYMV-SE04T) were determined. These isolates differ significantly in their pathogenicity, producing either severe or very mild symptoms on susceptible cucurbit hosts. The viral genomes of both isolates were 9593 nucleotides in size and contained an open reading frame encoding a single polyprotein of 3080 amino acids. Despite their different biological properties, an extremely high nucleotide identity could be noted (99.8%), resulting in differences of only 5 aa, located in the HC-Pro, P3, and NIb, respectively. In silico analysis including 5 additional fully-sequenced and phylogenetically closely-related isolates known to induce different symptoms in cucurbits was performed. This suggested that the key single mutation responsible for virus pathogenicity is likely located in the N-terminal part of P3, adjacent to the PIPO.

  20. Amino acid sequence and carbohydrate-binding analysis of the N-acetyl-D-galactosamine-specific C-type lectin, CEL-I, from the Holothuroidea, Cucumaria echinata.

    PubMed

    Hatakeyama, Tomomitsu; Matsuo, Noriaki; Shiba, Kouhei; Nishinohara, Shoichi; Yamasaki, Nobuyuki; Sugawara, Hajime; Aoyagi, Haruhiko

    2002-01-01

    CEL-I is one of the Ca2+-dependent lectins that has been isolated from the sea cucumber, Cucumaria echinata. This protein is composed of two identical subunits held by a single disulfide bond. The complete amino acid sequence of CEL-I was determined by sequencing the peptides produced by proteolytic fragmentation of S-pyridylethylated CEL-I. A subunit of CEL-I is composed of 140 amino acid residues. Two intrachain (Cys3-Cys14 and Cys31-Cys135) and one interchain (Cys36) disulfide bonds were also identified from an analysis of the cystine-containing peptides obtained from the intact protein. The similarity between the sequence of CEL-I and that of other C-type lectins was low, while the C-terminal region, including the putative Ca2+ and carbohydrate-binding sites, was relatively well conserved. When the carbohydrate-binding activity was examined by a solid-phase microplate assay, CEL-I showed much higher affinity for N-acetyl-D-galactosamine than for other galactose-related carbohydrates. The association constant of CEL-I for p-nitrophenyl N-acetyl-beta-D-galactosaminide (NP-GalNAc) was determined to be 2.3 × 10^4 M^-1, and the maximum number of bound NP-GalNAc was estimated to be 1.6 by an equilibrium dialysis experiment.

  1. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  2. Mouse Vk gene classification by nucleic acid sequence similarity.

    PubMed

    Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

    1989-01-01

    Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.
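
    Grouping genes into families by a similarity threshold can be sketched as below. The sketch assumes pre-aligned, equal-length sequences, percent identity as the similarity measure, and single-linkage grouping (any pair above the threshold joins two families). The abstract states only the 80% criterion, so the grouping rule, the function names, and the toy sequences are assumptions made for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def group_into_families(seqs: dict[str, str], threshold: float = 80.0) -> list[set[str]]:
    """Single-linkage grouping: link any two genes whose identity exceeds the threshold."""
    names = list(seqs)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        # Union-find lookup with path compression.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(seqs[a], seqs[b]) > threshold:
                parent[find(a)] = find(b)

    families: dict[str, set[str]] = {}
    for n in names:
        families.setdefault(find(n), set()).add(n)
    return list(families.values())


# Toy aligned sequences, invented for illustration only.
toy_genes = {
    "g1": "ACGTACGTAC",
    "g2": "ACGTACGTAA",
    "g3": "TTTTTTTTTT",
}
print(group_into_families(toy_genes))
```

    Single linkage can chain loosely related genes into one family through intermediates, which is one way closely related families such as those described above may blur together; a stricter rule such as complete linkage would split them differently.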

  3. Analysis of key genes of jasmonic acid mediated signal pathway for defense against insect damages by comparative transcriptome sequencing.

    PubMed

    Yang, Fengshan; Zhang, Yuliang; Huang, Qixing; Yin, Guohua; Pennerman, Kayla K; Yu, Jiujiang; Liu, Zhixin; Li, Dafei; Guo, Anping

    2015-11-12

    Corn defense systems against insect herbivory involve activation of genes that lead to metabolic reconfigurations to produce toxic compounds, proteinase inhibitors, oxidative enzymes, and behavior-modifying volatiles. Similar responses occur when the plant is exposed to methyl jasmonate (MeJA). To compare the defense responses between stalk borer feeding and exogenous MeJA on a transcriptional level, we employed deep transcriptome sequencing methods following Ostrinia furnacalis leaf feeding and MeJA leaf treatment. 39,636 genes were found to be differentially expressed with O. furnacalis feeding, MeJA application, and O. furnacalis feeding and MeJA application. Following Gene Ontology enrichment analysis of the up- or down-regulated genes, many were implicated in metabolic processes, stimuli-responsive catalytic activity, and transfer activity. Fifteen genes showed significant changes in the O. furnacalis feeding group (LOX1, ASN1, eIF3, DXS, AOS, TIM, LOX5, BBTI2, BBTI11, BBTI12, BBTI13, Cl-1B, TPS10, DOX, and A20/AN1), and almost all of them were found to be involved in jasmonate defense signaling pathways. All of the data demonstrate that the jasmonate defense signal pathway is a major signaling pathway in the defense of corn against herbivory by the Asian corn borer. The transcriptome data are publicly available at NCBI SRA: SRS965087.

  4. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  5. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  6. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1999-10-26

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  7. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  8. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2001-06-05

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  9. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, M.S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device. 27 figs.

  10. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  11. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2003-08-19

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  12. Methods for analyzing nucleic acid sequences

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid. The method provides a complex comprising a polymerase enzyme, a target nucleic acid molecule, and a primer, wherein the complex is immobilized on a support. A fluorescent label is attached to a terminal phosphate group of the nucleotide or nucleotide analog. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. Labeled nucleotides or nucleotide analogs that become incorporated are distinguished from freely diffusing labels by the duration of their signal: incorporated nucleotides or analogs are retained longer in the observation volume than freely diffusing labels.

  13. Cloning, sequence analysis, and expression in Escherichia coli of the gene encoding an alpha-amino acid ester hydrolase from Acetobacter turbidans.

    PubMed

    Polderman-Tijmes, Jolanda J; Jekel, Peter A; de Vries, Erik J; van Merode, Annet E J; Floris, René; van der Laan, Jan-Metske; Sonke, Theo; Janssen, Dick B

    2002-01-01

    The alpha-amino acid ester hydrolase from Acetobacter turbidans ATCC 9325 is capable of hydrolyzing and synthesizing beta-lactam antibiotics, such as cephalexin and ampicillin. N-terminal amino acid sequencing of the purified alpha-amino acid ester hydrolase allowed cloning and genetic characterization of the corresponding gene from an A. turbidans genomic library. The gene, designated aehA, encodes a polypeptide with a molecular weight of 72,000. Comparison of the determined N-terminal sequence and the deduced amino acid sequence indicated the presence of an N-terminal leader sequence of 40 amino acids. The aehA gene was subcloned in the pET9 expression plasmid and expressed in Escherichia coli. The recombinant protein was purified and found to be dimeric with subunits of 70 kDa. A sequence similarity search revealed 26% identity with a glutaryl 7-ACA acylase precursor from Bacillus laterosporus, but no homology was found with other known penicillin or cephalosporin acylases. There was some similarity to serine proteases, including the conservation of the active site motif, GXSYXG. Together with database searches, this suggested that the alpha-amino acid ester hydrolase is a beta-lactam antibiotic acylase that belongs to a class of hydrolases that is different from the Ntn hydrolase superfamily to which the well-characterized penicillin acylase from E. coli belongs. The alpha-amino acid ester hydrolase of A. turbidans represents a subclass of this new class of beta-lactam antibiotic acylases.
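
    The active-site motif GXSYXG mentioned above can be searched for in a protein sequence with a regular expression. The sketch below is only an illustration: it assumes X stands for any single residue, and the toy sequence is hypothetical, not the aehA gene product.

```python
import re

# GXSYXG from the abstract; reading X as "any one residue" is an assumption.
MOTIF = re.compile(r"G.SY.G")

def find_motif(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched text) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in MOTIF.finditer(seq.upper())]

# Hypothetical toy sequence used only to exercise the function.
toy_sequence = "MKTAGASYAGLLPGWSYHGQ"
hits = find_motif(toy_sequence)
print(hits)
```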

  14. Amino acid sequence analysis and identification of mutations in the NS gene of 2009 influenza A (H1N1) isolates from Kenya.

    PubMed

    George, Gachara; Samuel, Symekher; John, Mbithi; James, Simwa; Musa, Ng'ayo; Japheth, Magana; Wallace, Bulimo

    2011-08-01

    Although the important role of the nonstructural (NS) gene of influenza A virus in virulence and replication is well-established, the knowledge about the extent of variation in the NS gene of 2009 influenza A (H1N1) viruses in Kenya and Africa is scanty. This study analysed the NS gene of 31 isolates from Kenya in order to obtain a more detailed knowledge about the genetic variation of NS gene of 2009 influenza A (H1N1) isolates from Kenya. A comparison with the vaccine strain and viruses isolated elsewhere in Africa was also made. The amino acid sequences of the non-structural protein, NS1 of the viruses from this study and the vaccine strain revealed 18 differences. Conversely, the nuclear export protein (NEP) of the isolates in this study had 11 differences from the vaccine strain. Analysis of the NS1 protein showed only one fixed amino acid change I123V which is one of the characteristics of clade 7 viruses. In the NEP, the amino acid at position 77 was the most mutable with 9 (39%) of all mutations seen in this protein. A mutation A115T which is a characteristic of clade 5 viruses was noted in the isolates from Lagos, Nigeria. The study shows a substantial number of mutations in the NS gene that has not been reported elsewhere and gives a glimpse of the evolution of this gene in the region.
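
    Substitutions such as I123V and A115T are written as reference residue, position, sample residue. A minimal sketch of how such a list could be derived from two pre-aligned, equal-length protein sequences follows; the function name and the toy sequences are hypothetical and are not taken from the study.

```python
def list_substitutions(reference: str, sample: str) -> list[str]:
    """List substitutions (e.g. 'I123V') between two aligned, equal-length sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref_res}{pos}{smp_res}"
        for pos, (ref_res, smp_res) in enumerate(zip(reference, sample), start=1)
        if ref_res != smp_res
    ]

# Hypothetical six-residue fragments, for illustration only.
print(list_substitutions("MDSNTI", "MDSNTV"))
```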

  15. E-probe Diagnostic Nucleic acid Analysis (EDNA): A theoretical approach for handling of next generation sequencing data for diagnostics

    USDA-ARS's Scientific Manuscript database

    There are many plant pathogen-specific diagnostic assays, based on PCR and immune-detection. However, the ability to test for large numbers of pathogens simultaneously is lacking. Next generation sequencing (NGS) allows one to detect all organisms within a given sample, but has computational limitat...

  16. Cladistic analysis of anuran POMC sequences.

    PubMed

    Alrubaian, Jasem; Danielson, Phillip; Walker, David; Dores, Robert M

    2002-03-01

    Procedures for performing cladistic analyses can provide powerful tools for understanding the evolution of neuropeptide and polypeptide hormone coding genes. These analyses can be done on either amino acid data sets or nucleotide data sets and can utilize several different algorithms that are dependent on distinct sets of operating assumptions and constraints. In some cases, the results of these analyses can be used to gauge phylogenetic relationships between taxa. Selecting the proper cladistic analysis strategy is dependent on the taxonomic level of analysis and the rate of evolution within the orthologous genes being evaluated. For example, previous studies have shown that the amino acid sequence of proopiomelanocortin (POMC), the common precursor for the melanocortins and beta-endorphin, can be used to resolve phylogenetic relationships at the class and order level. This study tested the hypothesis that POMC sequences could be used to resolve phylogenetic relationships at the family taxonomic level. Cladistic analyses were performed on amphibian POMC sequences characterized from the marine toad, Bufo marinus (family Bufonidae; this study), the spadefoot toad, Spea multiplicatus (family Pelobatidae), the African clawed frog, Xenopus laevis (family Pipidae) and the laughing frog, Rana ridibunda (family Ranidae). In these analyses the sequence of Australian lungfish POMC was used as the outgroup. The analyses were done at the amino acid level using the maximum parsimony algorithm and at the nucleotide level using the maximum likelihood algorithm. For the anuran POMC genes, analysis at the nucleotide level using the maximum likelihood algorithm generated a cladogram with higher bootstrap values than the maximum parsimony analysis of the POMC amino acid data set. For anuran POMC sequences, analysis of nucleotide sequences using the maximum likelihood algorithm would appear to be the preferred strategy for resolving phylogenetic relationships at the family taxonomic

  17. Uses of phage display in agriculture: sequence analysis and comparative modeling of late embryogenesis abundant client proteins suggest protein-nucleic acid binding functionality.

    PubMed

    Kushwaha, Rekha; Downie, A Bruce; Payne, Christina M

    2013-01-01

    A group of intrinsically disordered, hydrophilic proteins-Late Embryogenesis Abundant (LEA) proteins-has been linked to survival in plants and animals in periods of stress, putatively through safeguarding enzymatic function and prevention of aggregation in times of dehydration/heat. Yet despite decades of effort, the molecular-level mechanisms defining this protective function remain unknown. A recent effort to understand LEA functionality began with the unique application of phage display, wherein phage display and biopanning over recombinant Seed Maturation Protein homologs from Arabidopsis thaliana and Glycine max were used to retrieve client proteins at two different temperatures, with one intended to represent heat stress. From this previous study, we identified 21 client proteins for which clones were recovered, sometimes repeatedly. Here, we use sequence analysis and homology modeling of the client proteins to ascertain common sequence and structural properties that may contribute to binding affinity with the protective LEA protein. Our methods uncover what appears to be a predilection for protein-nucleic acid interactions among LEA client proteins, which is suggestive of subcellular residence. The results from this initial computational study will guide future efforts to uncover the protein protective mechanisms during heat stress, potentially leading to phage-display-directed evolution of synthetic LEA molecules.

  18. Fluorescence melting curve analysis using self-quenching dual-labeled peptide nucleic acid probes for simultaneously identifying multiple DNA sequences.

    PubMed

    Ahn, Jeong Jin; Kim, Youngjoo; Lee, Seung Yong; Hong, Ji Young; Kim, Gi Won; Hwang, Seung Yong

    2015-09-01

    Previous fluorescence melting curve analysis (FMCA) used intercalating dyes, and this method has restricted application. Therefore, FMCA methods such as probe-based FMCA and molecular beacons were studied. However, the usual dual-labeled probes do not possess adequate fluorescence quenching ability and sufficient specificity, and molecular beacons with the necessary stem structures are hard to design. Therefore, we have developed a peptide nucleic acid (PNA)-based FMCA method. PNA oligonucleotide can have a much higher melting temperature (Tm) value than DNA. Therefore, short PNA probes can have adequate Tm values for FMCA, and short probes can have higher specificity and accuracy in FMCA. Moreover, dual-labeled PNA probes have self-quenching ability via single-strand base stacking, which makes PNA more favorable. In addition, this method can facilitate simultaneous identification of multiple DNA templates. In conventional real-time polymerase chain reaction (PCR), one fluorescence channel can identify only one DNA template. However, this method uses two fluorescence channels to detect three types of DNA. Experiments were performed with one to three different DNA sequences mixed in a single tube. This method can be used to identify multiple DNA sequences in a single tube with high specificity and high clarity. Copyright © 2015 Elsevier Inc. All rights reserved.

  19. Porcine proinsulin: characterization and amino acid sequence.

    PubMed

    Chance, R E; Ellis, R M; Bromer, W W

    1968-07-12

    Proinsulin in nearly homogeneous form has been isolated from a preparation of porcine insulin. A molecular weight close to 9100 was calculated from the amino acid composition and from sedimentation-equilibrium studies. Through the action of trypsin this single-chain protein is transformed to desalanine insulin by cleavage of a polypeptide chain connecting the carboxy-terminus of the B chain to the amino-terminus of the A chain of insulin. The amino acid sequence of this connecting peptide was found to be Arg-Arg-Glu-Ala-Gln-Asn-Pro-Gln-Ala-Gly-Ala-Val-Glu-Leu-Gly-Gly-Gly-Leu-Gly-Gly-Leu-Gln-Ala-Leu-Ala-Leu-Glu-Gly-Pro-Pro-Gln-Lys-Arg.
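
    For comparison with sequence databases, the connecting-peptide sequence above can be converted from three-letter to one-letter residue codes. The sketch below uses the standard amino acid code table; the variable and function names are illustrative only.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Connecting-peptide sequence exactly as given in the abstract.
CONNECTING_PEPTIDE = (
    "Arg-Arg-Glu-Ala-Gln-Asn-Pro-Gln-Ala-Gly-Ala-Val-Glu-Leu-Gly-Gly-Gly-"
    "Leu-Gly-Gly-Leu-Gln-Ala-Leu-Ala-Leu-Glu-Gly-Pro-Pro-Gln-Lys-Arg"
)

def to_one_letter(hyphenated: str) -> str:
    """Convert a hyphen-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[residue] for residue in hyphenated.split("-"))

one_letter = to_one_letter(CONNECTING_PEPTIDE)
print(one_letter, len(one_letter))  # 33 residues
```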

  20. Transcriptome analysis of the pectoral muscles of local chickens and commercial broilers using Ribo-Zero ribonucleic acid sequencing.

    PubMed

    Zhang, Yanhua; Li, Donghua; Han, Ruili; Wang, Yanbin; Li, Guoxi; Liu, Xiaojun; Tian, Yadong; Kang, Xiangtao; Li, Zhuanjian

    2017-01-01

    The molecular mechanisms underlying meat quality and muscle growth are not clear. The meat quality and growth rates of local chickens and commercial broilers are very different. The Ribo-Zero RNA-Seq technology is an effective means of analyzing transcript groups to clarify molecular mechanisms. The aim of this study was to provide a reference for studies of the differences in the meat quality and growth of different breeds of chickens. Ribo-Zero RNA-Seq technology was used to analyze the pectoral muscle transcriptomes of Gushi chickens and AA broilers. Compared with AA broilers, 1649 genes with annotated information were significantly differentially expressed (736 upregulated and 913 downregulated) in Gushi chickens with Q≤0.05 (Q is the P-value corrected for multiple hypothesis testing) at a fold change ≥2 or ≤0.5. In addition, 2540 novel significantly differentially expressed (SDE) genes (1405 upregulated and 1135 downregulated) were discovered. The results showed that the main signal transduction pathways that differed between Gushi chickens and AA broilers were related to amino acid metabolism. Amino acids are important for protein synthesis, and they regulate key metabolic pathways to improve the growth, development and reproduction of organisms. This study showed that differentially expressed genes in the pectoral tissues of Gushi chickens and AA broilers were related to fat metabolism, which affects meat. Additionally, a large number of novel genes were found that may be involved in fat metabolism and thus may affect the formation of meat, which requires further study. The results of this study provide a reference for further studies of the molecular mechanisms of meat formation.
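
    The significance criteria quoted above (Q≤0.05 with a fold change ≥2 or ≤0.5) amount to a simple filter. The sketch below shows one way to apply them; the record type, the function name, and the example genes are hypothetical and are not taken from the study's pipeline.

```python
from dataclasses import dataclass

@dataclass
class GeneResult:
    gene: str
    fold_change: float  # expression ratio, Gushi chickens relative to AA broilers
    q_value: float      # corrected P-value

def classify(result: GeneResult, q_max: float = 0.05,
             fc_up: float = 2.0, fc_down: float = 0.5) -> str:
    """Label a gene 'up', 'down', or 'not significant' using the stated thresholds."""
    if result.q_value > q_max:
        return "not significant"
    if result.fold_change >= fc_up:
        return "up"
    if result.fold_change <= fc_down:
        return "down"
    return "not significant"

# Hypothetical records, for illustration only.
examples = [
    GeneResult("geneA", 3.1, 0.01),
    GeneResult("geneB", 0.4, 0.03),
    GeneResult("geneC", 2.5, 0.20),
    GeneResult("geneD", 1.3, 0.001),
]
for record in examples:
    print(record.gene, classify(record))
```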

  1. Amino acid sequence of bovine gamma E (IVa) lens crystallin.

    PubMed Central

    Kilby, G. W.; Sheil, M. M.; Shaw, D.; Harding, J. J.; Truscott, R. J.

    1997-01-01

    When electrospray ionization mass spectrometry (ES-MS) was used to analyze purified bovine gamma E (gamma IVa)-crystallin, it yielded a relative molecular mass (M(r)) of 20,955 ± 5. This mass is significantly different from that calculated from the published sequence (M(r) 20,894) (White HE et al., 1989, J Mol Biol 207:217-235). Further, ES-MS analysis of the protein after it had been reduced and carboxymethylated indicated the presence of five cysteine residues, whereas the published sequence contains six (Kilby GW et al., 1995, Eur Mass Spectrom 1:203-208). The entire protein sequence of gamma E crystallin has therefore been studied via a combination of ES-MS, ES-MS/MS, and Edman amino acid sequencing. The corrected sequence gives an M(r) of 20,955.3, which matches that obtained by ES-MS analysis of the purified native protein. The corrected sequence is also in agreement with a recent cDNA sequence obtained for a bovine gamma-crystallin by R. Hay (pers. comm.). PMID:9098901

  2. Amino acid sequence of bovine gamma E (IVa) lens crystallin.

    PubMed

    Kilby, G W; Sheil, M M; Shaw, D; Harding, J J; Truscott, R J

    1997-04-01

    When electrospray ionization mass spectrometry (ES-MS) was used to analyze purified bovine gamma E (gamma IVa)-crystallin, it yielded a relative molecular mass (M(r)) of 20,955 ± 5. This mass is significantly different from that calculated from the published sequence (M(r) 20,894) (White HE et al., 1989, J Mol Biol 207:217-235). Further, ES-MS analysis of the protein after it had been reduced and carboxymethylated indicated the presence of five cysteine residues, whereas the published sequence contains six (Kilby GW et al., 1995, Eur Mass Spectrom 1:203-208). The entire protein sequence of gamma E crystallin has therefore been studied via a combination of ES-MS, ES-MS/MS, and Edman amino acid sequencing. The corrected sequence gives an M(r) of 20,955.3, which matches that obtained by ES-MS analysis of the purified native protein. The corrected sequence is also in agreement with a recent cDNA sequence obtained for a bovine gamma-crystallin by R. Hay (pers. comm.).

  3. Amino acid sequence and comparative antigenicity of chicken metallothionein.

    PubMed Central

    McCormick, C C; Fullmer, C S; Garvey, J S

    1988-01-01

    The complete amino acid sequence of metallothionein (MT) from chicken liver is reported. The primary structure was determined by automated sequence analysis of peptides produced by limited acid hydrolysis and by trypsin digestion. The comparative antigenicity of chicken MT was determined by radioimmunoassay using rabbit anti-rat MT polyclonal antibody. Chicken MT consists of 63 amino acids as compared to 61 found in MTs from mammals. One insertion (and two substitutions) occurs in the amino-terminal region, a region considered invariant among mammalian MTs. Eighteen of the 20 cysteines in chicken MT were aligned with cysteines from other mammalian sequences. Two cysteines near the carboxyl terminus are shifted by one residue due to the insertion of proline in that region. Overall, the chicken protein showed approximately 68% sequence identity in a comparison with various mammalian MTs. The affinity of the polyclonal antibody for chicken MT was decreased by 2 orders of magnitude in comparison to that of a mammalian MT (rat MT isoforms). This reduced affinity is attributed to major substitutions in chicken MT in the regions of the principal determinants of mammalian MTs. Theoretical analysis of the primary structure predicted the secondary structure to consist of reverse turns and random coils with no stable beta or helix conformations. There is no evidence that chicken MT differs functionally from mammalian MTs. PMID:2448773

  4. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.

  5. Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences

    NASA Astrophysics Data System (ADS)

    Ekeberg, Magnus; Hartonen, Tuomo; Aurell, Erik

    2014-11-01

    Direct-coupling analysis is a group of methods to harvest information about coevolving residues in a protein family by learning a generative model in an exponential family from data. In protein families of realistic size, this learning can only be done approximately, and there is a trade-off between inference precision and computational speed. We here show that an earlier introduced l2-regularized pseudolikelihood maximization method called plmDCA can be modified as to be easily parallelizable, as well as inherently faster on a single processor, at negligible difference in accuracy. We test the new incarnation of the method on 143 protein family/structure-pairs from the Protein Families database (PFAM), one of the larger tests of this class of algorithms to date.

  6. Nanopore-based sequencing and detection of nucleic acids.

    PubMed

    Ying, Yi-Lun; Zhang, Junji; Gao, Rui; Long, Yi-Tao

    2013-12-09

    Nanopore-based techniques, which mimic the functions of natural ion channels, have attracted increasing attention as unique methods for single-molecule detection. The technology allows the real-time, selective, high-throughput analysis of nucleic acids through both biological and solid-state nanopores. In this Minireview, the background and latest progress in nanopore-based sequencing and detection of nucleic acids are summarized, and light is shed on a novel platform for nanopore-based detection. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Amino acid sequence of bovine heart coupling factor 6.

    PubMed Central

    Fang, J K; Jacobs, J W; Kanner, B I; Racker, E; Bradshaw, R A

    1984-01-01

    The amino acid sequence of bovine heart mitochondrial coupling factor 6 (F6) has been determined by automated Edman degradation of the whole protein and derived peptides. Preparations based on heat precipitation and ethanol extraction showed allotypic variation at three positions while material further purified by HPLC yielded only one sequence that also differed by a Phe-Thr replacement at residue 62. The mature protein contains 76 amino acids with a calculated molecular weight of 9006 and a pI of approximately 5, in good agreement with experimentally measured values. The charged amino acids are mainly clustered at the termini and in one section in the middle; these three polar segments are separated by two segments relatively rich in nonpolar residues. Chou-Fasman analysis suggests three stretches of alpha-helix coinciding with (or within) the high-charge-density sequences with a single beta-turn at the first polar-nonpolar junction. Comparison of the F6 sequence with those of other proteins did not reveal any homologous structures. PMID:6149548

  8. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  9. Amino acid sequence of tyrosinase from Neurospora crassa.

    PubMed Central

    Lerch, K

    1978-01-01

    The amino-acid sequence of tyrosinase from Neurospora crassa (monophenol,dihydroxyphenylalanine:oxygen oxidoreductase, EC 1.14.18.1) is reported. This copper-containing oxidase consists of a single polypeptide chain of 407 amino acids. The primary structure was determined by automated and manual sequence analysis on fragments produced by cleavage with cyanogen bromide and on peptides obtained by digestion with trypsin, pepsin, thermolysin, or chymotrypsin. The amino terminus of the protein is acetylated and the single cysteinyl residue 96 is covalently linked via a thioether bridge to histidyl residue 94. The formation and the possible role of this unusual structure in Neurospora tyrosinase is discussed. Dye-sensitized photooxidation of apotyrosinase and active-site-directed inactivation of the native enzyme indicate the possible involvement of histidyl residues 188, 192, 289, and 305 or 306 as ligands to the active-site copper as well as in the catalytic mechanism of this monooxygenase. PMID:151279

  10. Amino acid analysis

    NASA Technical Reports Server (NTRS)

    Winitz, M.; Graff, J. (Inventor)

    1974-01-01

    The process and apparatus for qualitative and quantitative analysis of the amino acid content of a biological sample are presented. The sample is deposited on a cation exchange resin and then is washed with suitable solvents. The amino acids and various cations and organic material with a basic function remain on the resin. The resin is eluted with an acid eluant, and the eluate containing the amino acids is transferred to a reaction vessel where the eluant is removed. Final analysis of the purified acylated amino acid esters is accomplished by gas-liquid chromatographic techniques.

  11. Molecular characterization of MT3 antigens by two-dimensional gel electrophoresis, NH2-terminal amino acid sequence analysis, and southern blot analysis.

    PubMed Central

    Sorrentino, R; Lillie, J; Strominger, J L

    1985-01-01

    The monoclonal antibody 109d6, which recognizes major histocompatibility antigen MT3-like serologic determinants, has been used to characterize the molecules bearing this determinant in HLA-DR4 and -DR7 homozygous cell lines by two-dimensional gel and sequencing analyses. By these two criteria, these molecules are identical to each other. Southern blot analysis of genomic DNA from HLA-DR1 through -DR7 homozygous cell lines with DR beta-chain gene probes reveals a striking similarity in the pattern of hybridizing fragments between DR4 and DR7 haplotypes and among DR3, DR5, and DRw6 haplotypes reminiscent of the MT3/MT2 allodeterminant distribution. The sharing of the MT2 determinant between DR3, DR5, and DRw6 haplotypes and of the MT3 determinant between DR4 and DR7 haplotypes is part of a broader "homology," which may be a consequence of more recent separation of the haplotypes sharing the MT2 determinant on the one hand and the haplotypes sharing the MT3 determinant on the other hand. PMID:2582424

  12. Molecular characterization of MT3 antigens by two-dimensional gel electrophoresis, NH2-terminal amino acid sequence analysis, and southern blot analysis.

    PubMed

    Sorrentino, R; Lillie, J; Strominger, J L

    1985-06-01

    The monoclonal antibody 109d6, which recognizes major histocompatibility antigen MT3-like serologic determinants, has been used to characterize the molecules bearing this determinant in HLA-DR4 and -DR7 homozygous cell lines by two-dimensional gel and sequencing analyses. By these two criteria, these molecules are identical to each other. Southern blot analysis of genomic DNA from HLA-DR1 through -DR7 homozygous cell lines with DR beta-chain gene probes reveals a striking similarity in the pattern of hybridizing fragments between DR4 and DR7 haplotypes and among DR3, DR5, and DRw6 haplotypes reminiscent of the MT3/MT2 allodeterminant distribution. The sharing of the MT2 determinant between DR3, DR5, and DRw6 haplotypes and of the MT3 determinant between DR4 and DR7 haplotypes is part of a broader "homology," which may be a consequence of more recent separation of the haplotypes sharing the MT2 determinant on the one hand and the haplotypes sharing the MT3 determinant on the other hand.

  13. Purification, characterization and amino-acid sequence analysis of a thermostable, low molecular mass endo-beta-1,4-glucanase from blue mussel, Mytilus edulis.

    PubMed

    Xu, B; Hellman, U; Ersson, B; Janson, J C

    2000-08-01

    A cellulase (endo-beta-1,4-D-glucanase, EC 3.2.1.4) from blue mussel (Mytilus edulis) was purified to homogeneity using a combination of acid precipitation, heat precipitation, immobilized metal ion affinity chromatography, size-exclusion chromatography and ion-exchange chromatography. Purity was analyzed by SDS/PAGE, IEF and RP-HPLC. The cellulase (endoglucanase) was characterized with regard to enzymatic properties, isoelectric point, molecular mass and amino-acid sequence. It is a single polypeptide chain of 181 amino acids cross-linked with six disulfide bridges. Its molecular mass, as measured by MALDI-MS, is 19 702 Da; a value of 19 710.57 Da was calculated from amino-acid composition. The isoelectric point of the enzyme was estimated by isoelectric focusing in a polyacrylamide gel to a value of 7.6. According to amino-acid composition, the theoretical pI is 7.011. The effect of temperature on the endoglucanase activity, with carboxymethyl cellulose and amorphous cellulose as substrates, respectively, was studied at pH 5.5 and displayed an unusually broad optimum activity temperature range between 30 and 50 degrees C. Another unusual feature is that the enzyme retains 55-60% of its maximum activity at 0 degrees C. The enzyme readily degrades amorphous cellulose and carboxymethyl cellulose but displays no hydrolytic activity towards crystalline cellulose (Avicel) and shows no cross-specificity for xylan; there is no binding to Avicel. The enzyme can withstand 10 min at 100 degrees C without irreversible loss of enzymatic activity. Amino-acid sequence-based classification has revealed that the enzyme belongs to the glycoside hydrolase family 45, subfamily 2 (B. Henrissat, Centre de Recherches sur les Macromolecules Végétales, CNRS, Joseph Fourier Université, Grenoble, France, personal communication).

  14. Molecular cloning and amino acid sequence of human 5-lipoxygenase

    SciTech Connect

    Matsumoto, T.; Funk, C.D.; Radmark, O.; Hoeoeg, J.O.; Joernvall, H.; Samuelsson, B.

    1988-01-01

    5-Lipoxygenase (EC 1.13.11.34), a Ca2+- and ATP-requiring enzyme, catalyzes the first two steps in the biosynthesis of the peptidoleukotrienes and the chemotactic factor leukotriene B4. A cDNA clone corresponding to 5-lipoxygenase was isolated from a human lung lambda gt11 expression library by immunoscreening with a polyclonal antibody. Additional clones from a human placenta lambda gt11 cDNA library were obtained by plaque hybridization with the 32P-labeled lung cDNA clone. Sequence data obtained from several overlapping clones indicate that the composite DNAs contain the complete coding region for the enzyme. From the deduced primary structure, the 5-lipoxygenase cDNA encodes a 673-amino-acid protein with a calculated molecular weight of 77,839. Direct analysis of the native protein and its proteolytic fragments confirmed the deduced composition, the amino-terminal amino acid sequence, and the structure of many internal segments. 5-Lipoxygenase has no apparent sequence homology with leukotriene A4 hydrolase or Ca2+-binding proteins. RNA blot analysis indicated substantial amounts of an mRNA species of approximately 2700 nucleotides in leukocytes, lung, and placenta.

  15. The complementary deoxyribonucleic acid sequence of guinea pig endometrial prorelaxin.

    PubMed

    Lee, Y A; Bryant-Greenwood, G D; Mandel, M; Greenwood, F C

    1992-03-01

    The nucleotide sequence of the relaxin gene transcript in the endometrium of the late pregnant guinea pig has been determined. The strategy used was a combination of polymerase chain reaction (PCR) with primers designed from the mRNA sequence of porcine preprorelaxin, rapid amplification of cDNA ends-PCR, and blunt end cloning in M13 mp18. With heterologous primers, a 226-basepair (bp) segment of the guinea pig relaxin gene sequence was obtained and was used to design a guinea pig-specific primer for use with the rapid amplification of cDNA ends-PCR method. The latter allowed completion of the sequence of 336 bp, with a 96-bp overlap. The sequence obtained shows greater homology at both the nucleotide and amino acid levels with porcine and human relaxins H1 and H2 than with rat relaxin, supporting the thesis that the guinea pig is not a rodent. The transcription of the guinea pig endometrial relaxin gene during pregnancy was confirmed by Northern analysis of guinea pig endometrial tissues with a species-specific cDNA probe. The endometrial relaxin gene is transcribed during pregnancy, but not in lactation, consistent with the observed immunostaining for relaxin.

  16. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  17. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  18. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the

  19. The amino acid sequence of wood duck lysozyme.

    PubMed

    Araki, T; Torikata, T

    1999-01-01

    The amino acid sequence of wood duck (Aix sponsa) lysozyme was analyzed. Carboxymethylated lysozyme was digested with trypsin and the resulting peptides were sequenced. The established amino acid sequence had the highest similarity to duck III lysozyme with four amino acid substitutions, and had eighteen amino acid substitutions from chicken lysozyme. The valine at position 75 was newly detected in chicken-type lysozymes. In the active site, Tyr34 and Glu57 were found at subsites F and D, respectively, when compared with chicken lysozyme.

  20. Sequence analysis of a gene cluster involved in metabolism of 2,4,5-trichlorophenoxyacetic acid by Burkholderia cepacia AC1100.

    PubMed Central

    Daubaras, D L; Hershberger, C D; Kitano, K; Chakrabarty, A M

    1995-01-01

    Burkholderia cepacia AC1100 utilizes 2,4,5-trichlorophenoxyacetic acid (2,4,5-T) as a sole source of carbon and energy. PT88 is a chromosomal deletion mutant of B. cepacia AC1100 and is unable to grow on 2,4,5-T. The nucleotide sequence of a 5.5-kb chromosomal fragment from B. cepacia AC1100 which complemented PT88 for growth on 2,4,5-T was determined. The sequence revealed the presence of six open reading frames, designated ORF1 to ORF6. Five polypeptides were produced when this DNA region was under control of the T7 promoter in Escherichia coli; however, no polypeptide was produced from the fourth open reading frame, ORF4. Homology searches of protein sequence databases were performed to determine if the proteins involved in 2,4,5-T metabolism were similar to other biodegradative enzymes. In addition, complementation studies were used to determine which genes were essential for the metabolism of 2,4,5-T. The first gene of the cluster, ORF1, encoded a 37-kDa polypeptide which was essential for complementation of PT88 and showed significant homology to putative trans-chlorodienelactone isomerases. The next gene, ORF2, was necessary for complementation and encoded a 47-kDa protein which showed homology to glutathione reductases. ORF3 was not essential for complementation; however, both the 23-kDa protein encoded by ORF3 and the predicted amino acid sequence of ORF4 showed homology to glutathione S-transferases. ORF5, which encoded an 11-kDa polypeptide, was essential for growth on 2,4,5-T, but the amino acid sequence did not show homology to those of any known proteins. The last gene of the cluster, ORF6, was necessary for complementation of PT88, and the 32-kDa protein encoded by this gene showed homology to catechol and chlorocatechol-1,2-dioxygenases. PMID:7538273

  1. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.

  2. Soil amino acid composition across a boreal forest successional sequence

    Treesearch

    Nancy R. Werdin-Pfisterer; Knut Kielland; Richard D. Boone

    2009-01-01

    Soil amino acids are important sources of organic nitrogen for plant nutrition, yet few studies have examined which amino acids are most prevalent in the soil. In this study, we examined the composition, concentration, and seasonal patterns of soil amino acids across a primary successional sequence encompassing a natural gradient of plant productivity and soil...

  3. Feature selection from short amino acid sequences in phosphorylation prediction problem

    NASA Astrophysics Data System (ADS)

    Wecławski, Jakub; Jankowski, Stanisław; Szymański, Zbigniew

    The paper describes a solution to the problem of feature selection from amino acid sequences for phosphorylation prediction. We show that even for short sequences, variable selection leads to better classification performance. Moreover, the simplicity of the final models allows for better understanding of the data, and the models can be used by an expert for further analysis. The feature selection process is divided into two parts: i) a classification tree is used to find the most relevant positions in amino acid sequences; ii) a contrast pattern kernel is then applied for pattern selection. This work summarizes the research conducted on classification of short amino acid sequences. The results of the research allowed us to propose a general scheme of amino acid sequence analysis.

  4. De novo sequencing and comparative transcriptome analysis of adventitious root development induced by exogenous indole-3-butyric acid in cuttings of tetraploid black locust.

    PubMed

    Quan, Jine; Meng, Seng; Guo, Erhui; Zhang, Sheng; Zhao, Zhong; Yang, Xitian

    2017-02-16

    Indole-3-butyric acid (IBA) is applied to the cuttings of various plant species to induce formation of adventitious roots (ARs) in commercial settings. Tetraploid black locust is an attractive ornamental tree that is drought resistant, sand tolerant, can prevent sand erosion and has various commercial uses. To further elucidate the mechanisms of AR formation, we used Illumina sequencing to analyze transcriptome dynamics and differential gene expression at four developmental stages in control (CK) and IBA-treated groups. The short reads were assembled into 127,038 unitranscripts and 101,209 unigenes, with average lengths of 986 and 852 bp. In total, 10,181 and 14,924 differentially expressed genes (DEGs) were detected in the CK and IBA-treated groups, respectively. Comparison of the four consecutive developmental stages showed that 282 and 260 DEGs were shared between IBA-treated and CK, suggesting that IBA treatment increased the number of DEGs. We observed 1,721 up-regulated and 849 down-regulated genes in CI vs. II, 849 up-regulated and 836 down-regulated genes in CC vs. IC, 881 up-regulated and 631 down-regulated genes in CRP vs. IRP, and 5,626 up-regulated and 4,932 down-regulated genes in CAR vs. IAR, of which 25 up-regulated DEGs were common to four pairs, and these DEGs were significantly up-regulated at AR. These results suggest that substantial changes in gene expression are associated with adventitious rooting. GO functional category analysis indicated that IBA significantly up- or down-regulated processes associated with regulation of transcription, transcription of DNA dependent, integral to membrane and ATP binding during the development process. KEGG pathway enrichment indicated that glycolysis/gluconeogenesis, cysteine and methionine metabolism, photosynthesis, nucleotide sugar metabolism, and lysosome were the pathways most highly regulated by IBA. We identified a number of differentially regulated unigenes, including 12 methionine-related genes

  5. Completion of the amino acid sequence of the alpha 1 chain from type I calf skin collagen. Amino acid sequence of alpha 1(I)B8.

    PubMed Central

    Glanville, R W; Breitkreutz, D; Meitinger, M; Fietzek, P P

    1983-01-01

    The complete amino acid sequence of the 279-residue CNBr peptide CB8 from the alpha 1 chain of type I calf skin collagen is presented. It was determined by sequencing overlapping fragments of CB8 produced by Staphylococcus aureus V8 proteinase, trypsin, Endoproteinase Arg-C and hydroxylamine. Tryptic cleavages were also made specific for lysine by blocking arginine residues with cyclohexane-1,2-dione. This completes the amino acid sequence analysis of the 1054-residue-long alpha 1(I) chain of calf skin collagen. PMID:6354180

  6. Automated carboxy-terminal sequence analysis of peptides.

    PubMed Central

    Bailey, J. M.; Shenoy, N. R.; Ronk, M.; Shively, J. E.

    1992-01-01

    Proteins and peptides can be sequenced from the carboxy-terminus with isothiocyanate reagents to produce amino acid thiohydantoin derivatives. Previous studies in our laboratory have focused on solution phase conditions for formation of the peptidylthiohydantoins with trimethylsilylisothiocyanate (TMS-ITC) and for hydrolysis of these peptidylthiohydantoins into an amino acid thiohydantoin derivative and a new shortened peptide capable of continued degradation (Bailey, J. M. & Shively, J. E., 1990, Biochemistry 29, 3145-3156). The current study is a continuation of this work and describes the construction of an instrument for automated C-terminal sequencing, the application of the thiocyanate chemistry to peptides covalently coupled to a novel polyethylene solid support (Shenoy, N. R., Bailey, J. M., & Shively, J. E., 1992, Protein Sci. 1, 58-67), the use of sodium trimethylsilanolate as a novel reagent for the specific cleavage of the derivatized C-terminal amino acid, and the development of methodology to sequence through the difficult amino acid, aspartate. Automated programs are described for the C-terminal sequencing of peptides covalently attached to carboxylic acid-modified polyethylene. The chemistry involves activation with acetic anhydride, derivatization with TMS-ITC, and cleavage of the derivatized C-terminal amino acid with sodium trimethylsilanolate. The thiohydantoin amino acid is identified by on-line high performance liquid chromatography using a Phenomenex Ultracarb 5 ODS(30) column and a triethylamine/phosphoric acid buffer system containing pentanesulfonic acid. The generality of our automated C-terminal sequencing methodology was examined by sequencing model peptides containing all 20 of the common amino acids. All of the amino acids were found to sequence in high yield (90% or greater) except for asparagine and aspartate, which could be only partially removed, and proline, which was found not to be capable of derivatization. In spite of these

  7. Amino acid sequence of mouse submaxillary gland renin.

    PubMed Central

    Misono, K S; Chang, J J; Inagami, T

    1982-01-01

    The complete amino acid sequences of the heavy chain and light chain of mouse submaxillary gland renin have been determined. The heavy chain consists of 288 amino acid residues having a Mr of 31,036 calculated from the sequence. The light chain contains 48 amino acid residues with a Mr of 5,458. The sequence of the heavy chain was determined by automated Edman degradations of the cyanogen bromide peptides and tryptic peptides generated after citraconylation, as well as other peptides generated therefrom. The sequence of the light chain was derived from sequence analyses of the peptides generated by cyanogen bromide cleavage or by digestion with Staphylococcus aureus protease. The sequences in the active site regions in renin containing two catalytically essential aspartyl residues 32 and 215 were found identical with those in pepsin, chymosin, and penicillopepsin. Comparison of the amino acid sequence of renin with that of porcine pepsin indicated a 42% sequence identity of the heavy chain with the amino-terminal and middle regions and a 46% identity of the light chain with the carboxyl-terminal region of the porcine pepsin sequence. Residues identical in renin and pepsin are distributed throughout the length of the molecules, suggesting a similarity in their overall structures. PMID:6812055
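
    The percent-identity figures quoted above (42% and 46%) are the kind of number obtained by counting identical residues in aligned positions. The following is a minimal sketch of that arithmetic, not the authors' procedure: the abstract does not say how gaps were treated, so skipping gapped columns in the denominator is an assumption, and the two toy sequences are invented for illustration and are not renin or pepsin.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator. Other
    conventions (e.g. dividing by the shorter sequence length) also exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical pre-aligned toy sequences (not taken from the record above).
toy_a = "DTGSS-NLWV"
toy_b = "DTGTSDNIWV"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

    The function expects sequences that have already been aligned; producing the alignment itself is a separate step that this sketch does not cover.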

  8. Bovine testis acylphosphatase: purification and amino acid sequence.

    PubMed

    Pazzagli, L; Cappugi, G; Camici, G; Manao, G; Ramponi, G

    1993-10-01

    Two acylphosphatase molecular forms have been isolated from bovine testis. Their amino acid sequences were determined. One (ACY1) consists of 98 amino acid residues, while the other one (ACY2) consists of 100 amino acid residues. Both molecular forms are N-acetylated and differ only in the amino terminus. ACY2 has an additional Ser-Met tail with respect to ACY1. Both ACY1 and ACY2 are organ-common type isoenzymes and thus differ at about half of the amino acid positions from the previously sequenced bovine muscle isoenzyme.

  9. An Integrated Sequence-Structure Database incorporating matching mRNA sequence, amino acid sequence and protein three-dimensional structure data.

    PubMed Central

    Adzhubei, I A; Adzhubei, A A; Neidle, S

    1998-01-01

    We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD) which comprises the coding sequences of genes, amino acid sequences of the corresponding proteins, their secondary structure and straight phi,psi angles assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of nucleotide sequence, amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for each entry are selected on the basis of exact matches of the source organism and cell environment. The current version 1.0 of ISSD is available on the WWW at http://www.protein.bio.msu.su/issd/ and includes 107 non-homologous mammalian proteins, of which 80 are human proteins. The database has been used by us for the analysis of synonymous codon usage patterns in mRNA sequences showing their correlation with the three-dimensional structure features in the encoded proteins. Possible ISSD applications include optimisation of protein expression, improvement of the protein structure prediction accuracy, and analysis of evolutionary aspects of the nucleotide sequence-protein structure relationship. PMID:9399866

  10. Prediction of protein antigenic determinants from amino acid sequences

    SciTech Connect

    Hopp, T.P.; Woods, K.R.

    1981-06-01

    A method is presented for locating protein antigenic determinants by analyzing amino acid sequences in order to find the point of greatest local hydrophilicity. This is accomplished by assigning each amino acid a numerical value (hydrophilicity value) and then repetitively averaging these values along the peptide chain. The point of highest local average hydrophilicity is invariably located in, or immediately adjacent to, an antigenic determinant. It was found that the prediction success rate depended on averaging group length, with hexapeptide averages yielding optimal results. The method was developed using 12 proteins for which extensive immunochemical analysis has been carried out and subsequently was used to predict antigenic determinants for the following proteins: hepatitis B surface antigen, influenza hemagglutinins, fowl plague virus hemagglutinin, human histocompatibility antigen HLA-B7, human interferons, Escherichia coli and cholera enterotoxins, ragweed allergens Ra3 and Ra5, and streptococcal M protein. The hepatitis B surface antigen sequence was synthesized by chemical means and was shown to have antigenic activity by radioimmunoassay.
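
    The averaging procedure described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' program: the abstract gives no numerical values, so the table below uses the hydrophilicity values as they are commonly tabulated for the Hopp-Woods scale and should be checked against the original paper. The function names and the toy sequence are hypothetical.

```python
# Hydrophilicity values as commonly tabulated for the Hopp-Woods scale
# (assumption: not given in the abstract; verify against the paper).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0,
    "S": 0.3, "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0,
    "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3,
    "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def window_averages(sequence: str, window: int = 6) -> list[float]:
    """Average hydrophilicity of every consecutive window along the chain."""
    values = [HOPP_WOODS[residue] for residue in sequence.upper()]
    if len(values) < window:
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def most_hydrophilic_window(sequence: str, window: int = 6):
    """Return (start index, peptide, average) of the highest-scoring window."""
    averages = window_averages(sequence, window)
    if not averages:
        raise ValueError("sequence is shorter than the averaging window")
    start = max(range(len(averages)), key=averages.__getitem__)
    return start, sequence[start:start + window], averages[start]


# Invented toy sequence with an obviously charged stretch in the middle.
toy = "AAAAAAKKDEERLLLLLL"
print(most_hydrophilic_window(toy))
```

    The default window of six residues follows the abstract's statement that hexapeptide averages gave the best results; the `window` parameter is exposed so that other group lengths can be compared.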

  11. De novo Sequencing and Transcriptome Analysis of Pinellia ternata Identify the Candidate Genes Involved in the Biosynthesis of Benzoic Acid and Ephedrine

    PubMed Central

    Zhang, Guang-hui; Jiang, Ni-hao; Song, Wan-ling; Ma, Chun-hua; Yang, Sheng-chao; Chen, Jun-wen

    2016-01-01

    Background: The medicinal herb, Pinellia ternata, is purported to be an anti-emetic with analgesic and sedative effects. Alkaloids are the main biologically active compounds in P. ternata, especially ephedrine, which is a phenylpropylamino alkaloid specifically produced by Ephedra and Catha edulis. However, how ephedrine is synthesized in plants is uncertain. Only the phenylalanine ammonia lyase (PAL) and relevant genes in this pathway have been characterized. Genomic information of P. ternata is also unavailable. Results: We analyzed the transcriptome of the tuber of P. ternata with the Illumina HiSeq™ 2000 sequencing platform. 66,813,052 high-quality reads were generated, and these reads were assembled de novo into 89,068 unigenes. Most known genes involved in benzoic acid biosynthesis were identified in the unigene dataset of P. ternata, and the expression patterns of some ephedrine biosynthesis-related genes were analyzed by reverse transcription quantitative real-time PCR (RT-qPCR). Also, 14,468 simple sequence repeats (SSRs) were identified from 12,000 unigenes. Twenty primer pairs for SSRs were randomly selected for the validation of their amplification effect. Conclusion: RNA-seq data were used for the first time to provide comprehensive gene information on P. ternata at the transcriptional level. These data will advance molecular genetics in this valuable medicinal plant. PMID:27579029
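
    To illustrate what identifying simple sequence repeats in assembled unigenes involves, here is a minimal regular-expression sketch. It is not the tool or the settings used in the study, which the abstract does not name: the minimum repeat counts in `MIN_REPEATS` are assumed thresholds chosen for illustration, and the toy sequence is invented.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
# The abstract does not state the criteria actually used.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence: str, min_repeats: dict[int, int] = MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = sequence.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are themselves repeats of a shorter motif
            # (e.g. "AA" or "ATAT"); those belong to the shorter class.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


# Invented toy sequence: a dinucleotide repeat and a mononucleotide run.
toy = "GCC" + "AT" * 7 + "GCC" + "A" * 12 + "GT"
print(find_ssrs(toy))
```

    The sketch only reports perfect repeats; compound or interrupted repeats, which dedicated SSR tools usually handle, are outside its scope.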

  12. Characterization, Genome Sequence, and Analysis of Escherichia Phage CICC 80001, a Bacteriophage Infecting an Efficient L-Aspartic Acid Producing Escherichia coli.

    PubMed

    Xu, Youqiang; Ma, Yuyue; Yao, Su; Jiang, Zengyan; Pei, Jiangsen; Cheng, Chi

    2016-03-01

    Escherichia phage CICC 80001 was isolated from the bacteriophage-contaminated medium of an Escherichia coli strain HY-05C (CICC 11022S) which could produce L-aspartic acid. The phage had a head diameter of 45-50 nm and a tail of about 10 nm. The one-step growth curve showed a latent period of 10 min and a rise period of about 20 min. The average burst size was about 198 phage particles per infected cell. Tests were conducted on the plaques, multiplicity of infection, and host range. The genome of CICC 80001, 38,810 bp in length, was sequenced and annotated. The key proteins leading to host-cell lysis were phylogenetically analyzed. One protein belonged to class II holin, and the other two belonged to the endopeptidase family and N-acetylmuramoyl-L-alanine amidase family, respectively. The genome showed a sequence identity of 82.7% with that of Enterobacteria phage T7, and carried ten unique open reading frames. The bacteriophage-resistant E. coli strain designated CICC 11021S was bred, and its L-aspartase activity was 84.4% of that of CICC 11022S.

  13. Inferring species trees from gene trees: a phylogenetic analysis of the Elapidae (Serpentes) based on the amino acid sequences of venom proteins.

    PubMed

    Slowinski, J B; Knight, A; Rooney, A P

    1997-12-01

    Toward the goal of recovering the phylogenetic relationships among elapid snakes, we separately found the shortest trees from the amino acid sequences for the venom proteins phospholipase A2 and the short neurotoxin, collectively representing 32 species in 16 genera. We then applied a method we term gene tree parsimony for inferring species trees from gene trees that works by finding the species tree which minimizes the number of deep coalescences or gene duplications plus unsampled sequences necessary to fit each gene tree to the species tree. This procedure, which is both logical and generally applicable, avoids many of the problems of previous approaches for inferring species trees from gene trees. The results support a division of the elapids examined into sister groups of the Australian and marine (laticaudines and hydrophiines) species, and the African and Asian species. Within the former clade, the sea snakes are shown to be diphyletic, with the laticaudines and hydrophiines having separate origins. This finding is corroborated by previous studies, which provide support for the usefulness of gene tree parsimony.

  14. Multiple Genome Sequences of Important Beer-Spoiling Lactic Acid Bacteria.

    PubMed

    Geissler, Andreas J; Behr, Jürgen; Vogel, Rudi F

    2016-10-06

    Seven strains of important beer-spoiling lactic acid bacteria were sequenced using single-molecule real-time sequencing. Complete genomes were obtained for strains of Lactobacillus paracollinoides, Lactobacillus lindneri, and Pediococcus claussenii. The analysis of these genomes emphasizes the role of plasmids as the genomic foundation of beer-spoiling ability.

  15. Multiple Genome Sequences of Important Beer-Spoiling Lactic Acid Bacteria

    PubMed Central

    Geissler, Andreas J.; Vogel, Rudi F.

    2016-01-01

    Seven strains of important beer-spoiling lactic acid bacteria were sequenced using single-molecule real-time sequencing. Complete genomes were obtained for strains of Lactobacillus paracollinoides, Lactobacillus lindneri, and Pediococcus claussenii. The analysis of these genomes emphasizes the role of plasmids as the genomic foundation of beer-spoiling ability. PMID:27795248

  16. Nucleotide sequence and functional analysis of the genes encoding 2,4,5-trichlorophenoxyacetic acid oxygenase in Pseudomonas cepacia AC1100.

    PubMed Central

    Danganan, C E; Ye, R W; Daubaras, D L; Xun, L; Chakrabarty, A M

    1994-01-01

    Pseudomonas cepacia AC1100 is able to use the chlorinated aromatic compound 2,4,5-trichlorophenoxyacetic acid (2,4,5-T) as the sole source of carbon and energy. One of the early steps in this pathway is the conversion of 2,4,5-T to 2,4,5-trichlorophenol (2,4,5-TCP). 2,4,5-TCP accumulates in the culture medium when AC1100 is grown in the presence of 2,4,5-T. A DNA region from the AC1100 genome has been subcloned as a 2.7-kb SstI-XbaI DNA fragment, which on transfer to Pseudomonas aeruginosa PAO1 allows the conversion of 2,4,5-T to 2,4,5-TCP. We have determined the directions of transcription of these genes as well as the complete nucleotide sequences of the genes and the number and sizes of the polypeptides synthesized by pulse-labeling experiments. This 2.7-kb DNA fragment encodes two polypeptides with calculated molecular masses of 51 and 18 kDa. Proteins of similar sizes were seen in the T7 pulse-labeling experiment in Escherichia coli. We have designated the genes for these proteins tftA1 (which encodes the 51-kDa protein) and tftA2 (which encodes the 18-kDa protein). TftA1 and TftA2 have strong amino acid sequence homology to BenA and BenB from the benzoate 1,2-dioxygenase system of Acinetobacter calcoaceticus, as well as to XylX and XylY from the toluate 1,2-dioxygenase system of Pseudomonas putida. The Pseudomonas aeruginosa PAO1 strain containing the 2.7-kb SstI-XbaI fragment was able to convert not only 2,4,5-T to 2,4,5-TCP but also 2,4-dichlorophenoxyacetic acid to 2,4-dichlorophenol and phenoxyacetate to phenol. PMID:7527626

  17. [Computer programs for the analysis of nucleotide sequences (MALK)].

    PubMed

    Mironov, A A; Aleksandrov, N N; Liunovskaia-Gurova, L V; Kister, A E

    1987-01-01

    A system for the computer analysis of nucleic acid and protein sequences ("Helix") is described. The format of the DNA sequences is EMBL-compatible, and sequences may be easily annotated with the help of convenient menus. "Helix" also offers the following capabilities: effective alignment of gel reading data and formation of the final sequence; simple construction of recombinant molecules "in calcular"; calculation of nucleotide and dinucleotide distribution along the sequence; searching for coding frames; calculation of the percentage of codons and amino acids in coding frames; searching for direct and inverted repeats; sequence alignment; protein secondary structure prediction; restriction mapping; DNA-protein translation. "Helix" also contains programs for RNA structure prediction, searching for homologies throughout the EMBL bank, choosing optimal sequences for probes, and searching for promoters. All the programs are written in FORTRAN-77 and automatically translated into FORTRAN-4. "Helix" requires only 64 kbyte.
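
    One of the listed capabilities, calculating nucleotide and dinucleotide distribution along a sequence, can be illustrated with a short sketch. This is written in Python rather than the FORTRAN of the original system, and it is not a reconstruction of "Helix": the function names, the choice of overlapping dinucleotides, and the fixed-window scheme are all assumptions made for illustration.

```python
from collections import Counter


def nucleotide_counts(sequence: str) -> Counter:
    """Count each base in the sequence."""
    return Counter(sequence.upper())


def dinucleotide_counts(sequence: str) -> Counter:
    """Count overlapping dinucleotides (positions 0-1, 1-2, 2-3, ...)."""
    seq = sequence.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def windowed_nucleotide_counts(sequence: str, window: int, step: int):
    """Base counts in successive windows, as (start, Counter) pairs.

    Assumption: fixed-size windows; a trailing partial window is dropped.
    """
    seq = sequence.upper()
    return [
        (start, Counter(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Invented toy sequence.
toy = "ATGCGCAT"
print(nucleotide_counts(toy))
print(dinucleotide_counts(toy))
print(windowed_nucleotide_counts(toy, window=4, step=4))
```

    Counting overlapping pairs means a sequence of length n contributes n - 1 dinucleotides; counting non-overlapping pairs would be an equally valid convention and gives different totals.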

  18. Amino Acid Sequence of Human Cholinesterase

    DTIC Science & Technology

    1985-10-01

    liquid chromatography (HPLC). Activity testing of the aged, DFP-labeled cholinesterase showed that 99.8% of the active sites had been labeled, since...acids were quantitated by ninhydrin at the AAA Labs, or by derivatization with phenylisothiocyanate at the University of Michigan. The latter method

  19. Collision-Induced Release, Ion Mobility Separation, and Amino Acid Sequence Analysis of Subunits from Mass-Selected Noncovalent Protein Complexes

    NASA Astrophysics Data System (ADS)

    Rathore, Deepali; Dodds, Eric D.

    2014-09-01

    In recent years, mass spectrometry has become a valuable tool for detecting and characterizing protein-protein interactions and for measuring the masses and subunit stoichiometries of noncovalent protein complexes. The gas-phase dissociation of noncovalent protein assemblies via tandem mass spectrometry can be useful in confirming subunit masses and stoichiometries; however, dissociation experiments that are able to yield subunit sequence information must usually be conducted separately. Here, we furnish proof of concept for a method that allows subunit sequence information to be directly obtained from a protein aggregate in a single gas-phase analysis. The experiments were carried out using a quadrupole time-of-flight mass spectrometer equipped with a traveling-wave ion mobility separator. This instrument configuration allows for a noncovalent protein assembly to be quadrupole selected, then subjected to two successive rounds of collision-induced dissociation with an intervening stage of ion mobility separation. This approach was applied to four model proteins as their corresponding homodimers: glucagon, ubiquitin, cytochrome c, and β-lactoglobulin. In each case, b- and y-type fragment ions were obtained upon further collisional activation of the collisionally-released subunits, resulting in up to 50% sequence coverage. Owing to the incorporation of an ion mobility separation, these results also suggest the intriguing possibility of measuring complex mass, complex collisional cross section, subunit masses, subunit collisional cross sections, and sequence information for the subunits in a single gas-phase experiment. Overall, these findings represent a significant contribution towards the realization of protein interactomic analyses, which begin with native complexes and directly yield subunit identities.
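
    For readers unfamiliar with b- and y-type fragment ions, the sketch below shows how their singly charged m/z values follow from residue masses, and one way a sequence coverage percentage can be computed from the ions observed. It is an illustration under stated assumptions, not the authors' analysis: the monoisotopic residue masses are standard reference values, the coverage definition (fraction of backbone bonds explained by at least one b or y ion) is one common convention that the abstract does not confirm, and the function names and toy peptide are hypothetical.

```python
PROTON = 1.007276   # mass of a proton, Da
WATER = 18.010565   # monoisotopic mass of H2O, Da

# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_y_ions(peptide: str):
    """Singly charged m/z of b1..b(n-1) and y1..y(n-1) for an unmodified peptide."""
    masses = [MONO[residue] for residue in peptide.upper()]
    b_ions, running = [], 0.0
    for mass in masses[:-1]:            # N-terminal fragments
        running += mass
        b_ions.append(running + PROTON)
    y_ions, running = [], 0.0
    for mass in reversed(masses[1:]):   # C-terminal fragments
        running += mass
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


def sequence_coverage(length: int, b_seen: set[int], y_seen: set[int]) -> float:
    """Percent of backbone bonds explained by an observed b or y ion.

    Assumed convention: bond i (between residues i and i+1) is covered
    if b_i or y_(length - i) was observed.
    """
    bonds = range(1, length)
    covered = sum(1 for i in bonds if i in b_seen or (length - i) in y_seen)
    return 100.0 * covered / (length - 1)


b, y = b_y_ions("GAK")  # invented toy peptide
print([round(m, 4) for m in b], [round(m, 4) for m in y])
print(sequence_coverage(5, b_seen={1, 2}, y_seen={1}))
```

    Multiply charged fragments and modified residues are not handled; both would be needed for real spectra of the proteins named above.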

  20. Collision-induced release, ion mobility separation, and amino acid sequence analysis of subunits from mass-selected noncovalent protein complexes.

    PubMed

    Rathore, Deepali; Dodds, Eric D

    2014-09-01

    In recent years, mass spectrometry has become a valuable tool for detecting and characterizing protein-protein interactions and for measuring the masses and subunit stoichiometries of noncovalent protein complexes. The gas-phase dissociation of noncovalent protein assemblies via tandem mass spectrometry can be useful in confirming subunit masses and stoichiometries; however, dissociation experiments that are able to yield subunit sequence information must usually be conducted separately. Here, we furnish proof of concept for a method that allows subunit sequence information to be directly obtained from a protein aggregate in a single gas-phase analysis. The experiments were carried out using a quadrupole time-of-flight mass spectrometer equipped with a traveling-wave ion mobility separator. This instrument configuration allows for a noncovalent protein assembly to be quadrupole selected, then subjected to two successive rounds of collision-induced dissociation with an intervening stage of ion mobility separation. This approach was applied to four model proteins as their corresponding homodimers: glucagon, ubiquitin, cytochrome c, and β-lactoglobulin. In each case, b- and y-type fragment ions were obtained upon further collisional activation of the collisionally-released subunits, resulting in up to 50% sequence coverage. Owing to the incorporation of an ion mobility separation, these results also suggest the intriguing possibility of measuring complex mass, complex collisional cross section, subunit masses, subunit collisional cross sections, and sequence information for the subunits in a single gas-phase experiment. Overall, these findings represent a significant contribution towards the realization of protein interactomic analyses, which begin with native complexes and directly yield subunit identities.

  1. Genome sequence of the acid-tolerant strain Rhizobium sp. LPU83.

    PubMed

    Wibberg, Daniel; Tejerizo, Gonzalo Torres; Del Papa, María Florencia; Martini, Carla; Pühler, Alfred; Lagares, Antonio; Schlüter, Andreas; Pistorio, Mariano

    2014-04-20

    Rhizobia are important members of the soil microbiome since they enter into nitrogen-fixing symbiosis with different legume host plants. Rhizobium sp. LPU83 is an acid-tolerant Rhizobium strain featuring a broad-host-range. However, it is ineffective in nitrogen fixation. Here, the improved draft genome sequence of this strain is reported. Genome sequence information provides the basis for analysis of its acid tolerance, symbiotic properties and taxonomic classification.

  2. ABRF ESRG 2005 Study: Identification of Seven Modified Amino Acids by Edman Sequencing

    PubMed Central

    Brune, D.; Denslow, N.D.; Kobayashi, R.; Lane, W.S.; Leone, J.W.; Madden, B.J.; Neveu, J. M.; Pohl, J.

    2006-01-01

    Identification of modified amino acids can be a challenging part of Edman degradation sequence analysis, largely because they are not included among the commonly used phenylthiohydantoin amino acid standards. Yet many can have unique retention times and can be assigned by an experienced researcher or through the use of a guide showing their typical chromatography characteristics. The Edman Sequencing Research Group (ESRG) 2005 study is a continuation of the 2004 study, in which the participating laboratories were provided a synthetic peptide and asked to identify the modified amino acids present in the sequence. The study sample provided an opportunity to sequence a peptide containing a variety of modified amino acids and note their retention times relative to the common amino acids. It also allowed the ESRG to compile the chromatographic properties and intensities from multiple instruments and tabulate an average elution position for these modified amino acids on commonly used instruments. Participating laboratories were given 2000 pmoles of a synthetic peptide, 18 amino acids long, containing the following modified amino acids: dimethyl- and trimethyl-lysine, 3-methyl-histidine, N-carbamyl-lysine, cystine, N-methyl-alanine, and isoaspartic acid. The modified amino acids were interspersed with standard amino acids to help in the assessment of initial and repetitive yields. In addition to filling in an assignment sheet, which included retention times and peak areas, participants were asked to provide specific details about the parameters used for the sequencing run. References for some of the modified amino acid elution characteristics were provided and the participants had the option of viewing a list of the modified amino acids present in the peptide at the ESRG Web site. The ABRF ESRG 2005 sample is the seventeenth in a series of studies designed to aid laboratories in evaluating their abilities to obtain and interpret amino acid sequence data. PMID:17122064

  3. Comparative genome sequence analysis of Sulfolobus acidocaldarius and 9 other isolates of its genus for factors influencing codon and amino acid usage.

    PubMed

    Nayak, Kinshuk Chandra

    2013-01-15

    In the present study, major constraints for codon and amino acid usage of Sulfolobus acidocaldarius, Sulfolobus solfataricus, Sulfolobus tokodaii, Sulfolobus islandicus and 6 other isolates from islandicus species of genus Sulfolobus were investigated. Correspondence analysis revealed a highly significant correlation between the major trend of synonymous codon usage and gene expression level, as assessed by the "Codon Adaptation Index" (CAI). There is a significant negative correlation between Nc (Effective number of codons) and CAI, demonstrating the role of codon bias as an important determinant of codon usage. The significant correlation between the major trend of synonymous codon usage and GC3s (G+C at third synonymous position) indicated a dominant role of mutational bias in the codon usage pattern. The result was further supported by SCUO (synonymous codon usage order) analysis. The amino acid usage was found to be significantly influenced by aromaticity and hydrophobicity of proteins. However, translational selection, which causes a preference for codons that are most rapidly translated by current tRNA with multiple copy numbers, was not found to be highly dominating for all studied isolates. Notably, 26 codons that were found to be optimally used by genes of S. acidocaldarius at higher expression level and its comparative analysis with 9 other isolates may provide some useful clues for further in vivo genetic studies on this genus.
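
    The GC3s statistic mentioned in this abstract (G+C at the third position of synonymous codons) can be illustrated with a minimal sketch. The function name `gc3s` and the exclusion set are assumptions made for illustration, the standard genetic code is assumed, and this is not the software used in the study.

```python
# Codons with no synonymous alternative (Met, Trp) and the three stop codons
# are left out of the count; this assumes the standard genetic code.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(cds: str) -> float:
    """Return the fraction of synonymous codons ending in G or C."""
    seq = cds.upper().replace("U", "T")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Keep only unambiguous codons that have at least one synonym.
    usable = [c for c in codons if c not in EXCLUDED and set(c) <= set("ACGT")]
    if not usable:
        raise ValueError("no synonymous codons found")
    gc_third = sum(1 for c in usable if c[2] in "GC")
    return gc_third / len(usable)
```

    For the toy coding sequence ATG GCC GCT TAA, only GCC and GCT are counted, so one of two usable codons ends in G or C.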

  4. Human liver apolipoprotein B-100 cDNA: complete nucleic acid and derived amino acid sequence.

    PubMed Central

    Law, S W; Grant, S M; Higuchi, K; Hospattankar, A; Lackner, K; Lee, N; Brewer, H B

    1986-01-01

    Human apolipoprotein B-100 (apoB-100), the ligand on low density lipoproteins that interacts with the low density lipoprotein receptor and initiates receptor-mediated endocytosis and low density lipoprotein catabolism, has been cloned, and the complete nucleic acid and derived amino acid sequences have been determined. ApoB-100 cDNAs were isolated from normal human liver cDNA libraries utilizing immunoscreening as well as filter hybridization with radiolabeled apoB-100 oligodeoxynucleotides. The apoB-100 mRNA is 14.1 kilobases long encoding a mature apoB-100 protein of 4536 amino acids with a calculated amino acid molecular weight of 512,723. ApoB-100 contains 20 potential glycosylation sites, and 12 of a total of 25 cysteine residues are located in the amino-terminal region of the apolipoprotein providing a potential globular structure of the amino terminus of the protein. ApoB-100 contains relatively few regions of amphipathic helices, but compared to other human apolipoproteins it is enriched in beta-structure. The delineation of the entire human apoB-100 sequence will now permit a detailed analysis of the conformation of the protein, the low density lipoprotein receptor binding domain(s), and the structural relationship between apoB-100 and apoB-48 and will provide the basis for the study of genetic defects in apoB-100 in patients with dyslipoproteinemias. PMID:3464946

  5. Amino acid sequence of toxin III from Anemonia sulcata.

    PubMed

    Bĕress, L; Wunderer, G; Wachter, E

    1977-08-01

    Toxin III, the smallest toxin component of the poison of the sea anemone Anemonia sulcata, is a polypeptide with 27 amino acids. Its structure is stabilized by three disulfide bridges. The amino acid sequence was determined by solid-phase Edman degradation of the aminoethylated derivative. The peptide was coupled to the carrier, porous glass, by thiourea bridges between the alpha-amino group of arginine-1 and the epsilon-amino group of lysine-26 and the isothiocyanate groups of the carrier. Another fraction of the polypeptide was bound by an acid-amide condensation of the C-terminal valine-27 with the aminopropyl group of the carrier. The sequence of toxin III has no regions homologous to the 47-residue toxin II. Comparison with the known partial sequence of toxin I, which contains 46 amino acids (Wunderer, G. & Eulitz, M., in preparation) also fails to reveal homologies.

  6. Ultrafast clustering algorithms for metagenomic sequence analysis

    PubMed Central

    Fu, Limin; Niu, Beifang; Wu, Sitao; Wooley, John

    2012-01-01

    The rapid advances of high-throughput sequencing technologies have dramatically prompted metagenomic studies of microbial communities that exist in various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations and their functions and interactions. However, the massive quantity and the comprehensive complexity of these sequence data pose tremendous challenges in data analysis. These challenges include but are not limited to ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. In addition, clustering analysis also addresses the challenges in metagenomics. Thus, a large redundant data set can be represented with a small non-redundant set, where each cluster can be represented by a single entry or a consensus. Artifacts can be rapidly detected through clustering. Errors can be identified, filtered or corrected by using consensus from sequences within clusters. PMID:22772836
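
    The idea of representing a redundant set by one entry per cluster can be sketched with a generic greedy incremental scheme. This is an illustration only, not the algorithm described by the authors: the names `identity` and `greedy_cluster` are hypothetical, and the identity measure is a toy position-by-position comparison that performs no alignment.

```python
def identity(a: str, b: str) -> float:
    """Toy identity: matching positions divided by the longer length (no alignment)."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs, threshold=0.9):
    """Assign each sequence to the first representative it matches, longest first."""
    clusters = []  # list of (representative, members) pairs
    for s in sorted(seqs, key=len, reverse=True):
        for rep, members in clusters:
            if identity(rep, s) >= threshold:
                members.append(s)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters.append((s, [s]))
    return clusters
```

    The representatives (the first element of each pair) form the small non-redundant set; real tools replace the toy identity with fast word-based filters and alignment.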

  7. Nonlinear analysis of biological sequences

    SciTech Connect

    Torney, D.C.; Bruno, W.; Detours, V.

    1998-11-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The main objectives of this project involved deriving new capabilities for analyzing biological sequences. The authors focused on tabulating the statistical properties exhibited by Human coding DNA sequences and on techniques of inferring the phylogenetic relationships among protein sequences related by descent.

  8. Analysis of amino acid sequences of penicillin-binding protein 2 in clinical isolates of Neisseria gonorrhoeae with reduced susceptibility to cefixime and ceftriaxone.

    PubMed

    Osaka, Kazuyoshi; Takakura, Tadakazu; Narukawa, Kayo; Takahata, Masahiro; Endo, Katsuhisa; Kiyota, Hiroshi; Onodera, Shoichi

    2008-06-01

    Neisseria gonorrhoeae strains with reduced susceptibility to cefixime and ceftriaxone, with minimum inhibitory concentrations (MICs) of cefixime of 0.125-0.25 microg/ml and ceftriaxone of 0.031-0.125 microg/ml, were isolated from male urethritis patients in Tokyo, Japan, in 2006. The amino acid sequences of PenA, penicillin-binding protein 2, in these strains were of two types: PenA mosaic and nonmosaic strains. In the PenA mosaic strain, some regions in the transpeptidase-encoding domain in PenA were similar to those of Neisseria perflava/sicca, Neisseria cinerea, Neisseria flavescens, Neisseria polysaccharea, and Neisseria meningitidis. In the PenA nonmosaic strain, there was a mutation of Ala-501 to Val in PenA. In addition, we performed homology modeling of PenA wild-type and mosaic strains and compared them. The results of the modeling studies suggested that reduced susceptibility to cephems such as cefixime and ceftriaxone is due to a conformational alteration of the beta-lactam-binding pocket. These results also indicated that the mosaic structures and the above point mutation in PenA make a major contribution to the reduced susceptibility to cephem antibiotics.

  9. Single-chain structure of human ceruloplasmin: the complete amino acid sequence of the whole molecule.

    PubMed Central

    Takahashi, N; Ortel, T L; Putnam, F W

    1984-01-01

    We have determined the amino acid sequence of the amino-terminal 67,000-dalton (67-kDa) fragment of human ceruloplasmin and have established overlapping sequences between the 67-kDa and 50-kDa fragments and between the 50-kDa and 19-kDa fragments. The 67-kDa fragment contains 480 amino acid residues and three glucosamine oligosaccharides. These results together with our previous sequence data for the 50-kDa and 19-kDa fragments complete the amino acid sequence of human ceruloplasmin. The polypeptide chain has a total of 1,046 amino acid residues (Mr 120,085) and has attachment sites for four glucosamine oligosaccharides; together these account for the total molecular mass of human ceruloplasmin (132 kDa). The sequence analysis of the peptides overlapping the fragments showed that one additional amino acid, arginine, is present between the 67-kDa and 50-kDa fragments, and another, lysine, is between the 50-kDa and 19-kDa fragments. Only two apparent sites of amino acid interchange have been identified in the polypeptide chain. Both involve a single-point interchange of glycine and lysine that would result in a difference in charge. The results of the complete sequence analysis verified that human ceruloplasmin is composed of a single polypeptide chain and that the subunit-like fragments are produced by proteolytic cleavage during purification (and possibly also in vivo). PMID:6582496

  10. Myoglobin of the shark Heterodontus portusjacksoni: isolation and amino acid sequence.

    PubMed

    Fisher, W K; Thompson, E O

    1979-06-01

    Myoglobin isolated from red muscle of the shark H. portusjacksoni was purified by ion-exchange chromatography on sulfopropyl-Sephadex and gel-filtration. Amino acid analysis and sequence determination showed 148 amino acid residues. The amino terminal residue is acetylated as shown by mass spectrographic analysis of N-terminal peptides. There is a deletion of four residues at the amino terminal end as well as one residue in the CD interhelical area relative to other myoglobins. The complete amino acid sequence has been determined following digestion with trypsin, chymotrypsin, pepsin and staphylococcal protease. Sequences of the purified peptides were determined by the dansyl-Edman procedure. The amino acid sequence showed approximately 85 differences from mammalian, monotreme and bird myoglobins. The date of divergence of the shark H. portusjacksoni from these other orders was estimated at 450 +/- 16 million years, based on the number of amino acid differences between species and allowing for multiple mutations during the evolutionary period. This estimate agrees well with similar estimates made using alpha- and beta-globin sequences, in contrast to widely differing estimates of dates of divergence for monotremes using the same three globin chains. Compared with myoglobins from species previously studied, there are many more differences in amino acid sequences, and in many positions residues are found that are more characteristic of alpha- and beta-globins, suggesting a conservation of residues over a long period of evolutionary time. There are fewer stabilizing hydrogen bonds and salt-linkages than in other myoglobins.
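
    The abstract does not say which correction for multiple mutations was applied. One common choice, shown here purely as an assumption, is the Poisson correction, which converts the observed fraction of differing residues p into an estimated number of substitutions per site, d = -ln(1 - p). The function name is hypothetical.

```python
import math


def poisson_corrected_distance(differences: int, length: int) -> float:
    """Estimate substitutions per site from observed amino acid differences.

    Uses the Poisson correction d = -ln(1 - p), where p is the observed
    fraction of differing sites. This is an illustrative choice; the source
    does not state which correction the authors used.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    p = differences / length
    if not 0 <= p < 1:
        raise ValueError("fraction of differences must be in [0, 1)")
    return -math.log(1.0 - p)
```

    Dividing such a distance by twice an assumed substitution rate gives a divergence time; the rate itself would have to come from calibration data not given in this record.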

  11. Joint Sequence Analysis: Association and Clustering

    ERIC Educational Resources Information Center

    Piccarreta, Raffaella

    2017-01-01

    In its standard formulation, sequence analysis aims at finding typical patterns in a set of life courses represented as sequences. Recently, some proposals have been introduced to jointly analyze sequences defined on different domains (e.g., work career, partnership, and parental histories). We introduce measures to evaluate whether a set of…

  12. RNA sequence analysis using covariance models.

    PubMed Central

    Eddy, S R; Durbin, R

    1994-01-01

    We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family. We call these models 'covariance models'. A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases. A model can be built automatically from an existing sequence alignment. We also describe an algorithm for learning a model and hence a consensus secondary structure from initially unaligned example sequences and no prior structural information. Models trained on unaligned tRNA examples correctly predict tRNA secondary structure and produce high-quality multiple alignments. The approach may be applied to any family of small RNA sequences. PMID:8029015

  13. The amino acid sequence of monal pheasant lysozyme and its activity.

    PubMed

    Araki, T; Matsumoto, T; Torikata, T

    1998-10-01

    The amino acid sequence of monal pheasant lysozyme and its activity were analyzed. Carboxymethylated lysozyme was digested with trypsin and the resulting peptides were sequenced. The established amino acid sequence had one amino acid substitution at position 102 (Arg to Gly) compared with Indian peafowl lysozyme and four amino acid substitutions at positions 3 (Phe to Tyr), 15 (His to Leu), 41 (Gln to His), and 121 (Gln to His) compared with chicken lysozyme. Analysis of the time-courses of reaction using N-acetylglucosamine pentamer as a substrate showed a difference of binding free energy change (-0.4 kcal/mol) at subsites A between monal pheasant and Indian peafowl lysozyme. This was assumed to be caused by the amino acid substitution at subsite A with loss of a positive charge at position 102 (Arg102 to Gly).

  14. Amino acid sequences of proteins from Leptospira serovar pomona.

    PubMed

    Alves, S F; Lefebvre, R B; Probert, W

    2000-01-01

    This report describes partial amino acid sequences from three putative outer envelope proteins from Leptospira serovar pomona. In order to obtain internal fragments for protein sequencing, enzymatic and chemical digestion was performed. The enzyme clostripain was used to digest the 32 and 45 kDa proteins. In situ digestion of the 40 kDa protein was accomplished using cyanogen bromide. The 32 kDa protein generated two fragments, one of 21 kDa and another of 10 kDa that yielded five residues. A 24 kDa fragment that yielded nineteen amino acid residues was obtained from the 45 kDa protein. A 20 kDa fragment yielding a sequence of twenty amino acids was obtained from the 40 kDa protein.

  15. Parvalbumins from coelacanth muscle. III. Amino acid sequence of the major component.

    PubMed

    Jauregui-Adell, J; Pechere, J F

    1978-09-26

    The primary structure of the major parvalbumin (pI = 4.52) from coelacanth muscle (Latimeria chalumnae) has been determined. Sequence analysis of the tryptic peptides, in some cases obtained with beta-trypsin, accounts for the total amino acid content of the protein. Chymotryptic peptides provide appropriate sequence overlaps, to complete the localization of the tryptic peptides. Examination of the amino acid sequence of this protein shows the typical structure of a beta-parvalbumin. Its position in the dendrogram of related calcium-binding proteins corresponds to that usually accepted for crossopterygians.

  16. Immunological responses of turbot (Psetta maxima) to nodavirus infection or polyriboinosinic polyribocytidylic acid (pIC) stimulation, using expressed sequence tags (ESTs) analysis and cDNA microarrays.

    PubMed

    Park, Kyoung C; Osborne, Jane A; Montes, Ariana; Dios, Sonia; Nerland, Audun H; Novoa, Beatriz; Figueras, Antonio; Brown, Laura L; Johnson, Stewart C

    2009-01-01

    To investigate the immunological responses of turbot to nodavirus infection or pIC stimulation, we constructed cDNA libraries from liver, kidney and gill tissues of nodavirus-infected fish and examined the differential gene expression within turbot kidney in response to nodavirus infection or pIC stimulation using a turbot cDNA microarray. Turbot were experimentally infected with nodavirus and samples of each tissue were collected at selected time points post-infection. Using equal amounts of total RNA at each sampling time, we made three tissue-specific cDNA libraries. After sequencing 3230 clones we obtained 3173 (98.2%) high quality sequences from our liver, kidney and gill libraries. Of these, 2568 (80.9%) were identified as known genes and 605 (19.1%) as unknown genes. A total of 768 unique genes were identified. The two largest groups resulting from the classification of ESTs according to function were the cell/organism defense genes (71 uni-genes) and apoptosis-related processes (23 uni-genes). Using these clones, a 1920 element cDNA microarray was constructed and used to investigate the differential gene expression within turbot in response to experimental nodavirus infection or pIC stimulation. Kidney tissue was collected at selected times post-infection (HPI) or stimulation (HPS), and total RNA was isolated for microarray analysis. Of the 1920 genes studied on the microarray, we identified a total of 121 differentially expressed genes in the kidney: 94 genes from nodavirus-infected animals and 79 genes from those stimulated with pIC. Within the nodavirus-infected fish we observed the highest number of differentially expressed genes at 24 HPI. Our results indicate that certain genes in turbot have important roles in immune responses to nodavirus infection and dsRNA stimulation.

  17. The primary structure of E. coli RNA polymerase, Nucleotide sequence of the rpoC gene and amino acid sequence of the beta'-subunit.

    PubMed

    Ovchinnikov YuA; Monastyrskaya, G S; Gubanov, V V; Guryev, S O; Salomatina, I S; Shuvaeva, T M; Lipkin, V M; Sverdlov, E D

    1982-07-10

    The primary structure of the E. coli rpoC gene (5321 base pairs) coding the beta'-subunit of RNA polymerase as well as its adjacent segment have been determined. The structure analysis of the peptides obtained by cleavage of the protein with cyanogen bromide and trypsin has confirmed the amino acid sequence of the beta'-subunit deduced from the nucleotide sequence analysis. The beta'-subunit of E. coli RNA polymerase contains 1407 amino acid residues. Its translation is initiated by codon GUG and terminated by codon TAA. It has been detected that the sequence following the terminating codon is strikingly homologous to known sequences of rho-independent terminators.

  18. N-terminal sequence analysis of proteins and peptides.

    PubMed

    Reim, D F; Speicher, D W

    2001-05-01

    Amino-terminal (N-terminal) sequence analysis is used to identify the order of amino acids of proteins or peptides, starting at their N-terminal end. This unit describes the sequence analysis of protein or peptide samples in solution or bound to PVDF membranes using a Perkin-Elmer Procise Sequencer. Sequence analysis of protein or peptide samples in solution or bound to PVDF membranes using a Hewlett-Packard Model G1005A sequencer is also described. Methods are provided for optimizing separation of PTH amino acid derivatives on Perkin-Elmer instruments and for increasing the proportion of sample injected onto the PTH analyzer on older Perkin-Elmer instruments by installing a modified sample loop. The amount of data obtained from a single sequencer run is substantial, and careful interpretation of this data by an experienced scientist familiar with the current operation performance of the instrument used for this analysis is critically important. A discussion of data interpretation is therefore provided. Finally, discussion of optimization of sequencer performance as well as possible solutions to frequently encountered problems is included.

  19. Extensive amino acid sequence homologies between animal lectins

    SciTech Connect

    Paroutaud, P.; Levi, G.; Teichberg, V.I.; Strosberg, A.D.

    1987-09-01

    The authors have established the amino acid sequence of the β-D-galactoside binding lectin from the electric eel and the sequences of several peptides from a similar lectin isolated from human placenta. These sequences were compared with the published sequences of peptides derived from the β-D-galactoside binding lectin from human lung and with sequences deduced from cDNAs assigned to the β-D-galactoside binding lectins from chicken embryo skin and human hepatomas. Significant homologies were observed. One of the highly conserved regions that contains a tryptophan residue and two glutamic acid residues is probably part of the β-D-galactoside binding site, which, on the basis of spectroscopic studies of the electric eel lectin, is expected to contain such residues. The similarity of the hydropathy profiles and the predicted secondary structure of the lectins from chicken skin and electric eel, in spite of differences in their amino acid sequences, strongly suggests that these proteins have maintained structural homologies during evolution and together with the other β-D-galactoside binding lectins were derived from a common ancestor gene.

  20. Amino acid sequence of porcine spleen cathepsin D.

    PubMed Central

    Shewale, J G; Tang, J

    1984-01-01

    The amino acid sequence of porcine spleen cathepsin D heavy chain has been determined and, hence, the complete structure of this enzyme is now known. The sequence of heavy chain was constructed by aligning the structures of peptides generated by cyanogen bromide, trypsin, and endo-proteinase Lys C cleavages. The structure of the light chain has been published previously. The cathepsin D molecule contains 339 amino acid residues in two polypeptide chains: a 97-residue light chain and a 242-residue heavy chain, with a combined Mr of 36,779 (without carbohydrate). There are two carbohydrate units linked to asparagine residues 70 and 192. The disulfide bond arrangement in cathepsin D is probably similar to that of pepsin, because the positions of six half-cystine residues are conserved. The active site aspartyl residues, corresponding to aspartic acid-32 and -215 of pepsin, are located at residues 33 and 224 in the cathepsin D molecule. The amino acid sequence around these aspartyl residues is strongly conserved. Cathepsin D shows a strong homology with other acid proteases. When the sequence of cathepsin D, renin, and pepsin are aligned, 32.7% of the residues are identical. The homology is observed throughout the length of the molecules, indicating that three-dimensional structures of all three molecules are similar. PMID:6587385

  1. Genome Sequencing and Analysis Conference IV

    SciTech Connect

    Not Available

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  2. Analysis of expressed sequence tags (ESTs) from Agrostis species obtained using sequence related amplified polymorphism.

    PubMed

    Dinler, Gizem; Budak, Hikmet

    2008-10-01

    Bentgrass (Agrostis spp.), a genus of the Poaceae family, consists of more than 200 species and is mainly used in athletic fields and golf courses. Creeping bentgrass (A. stolonifera L.) is the most commonly used species in maintaining golf courses, followed by colonial bentgrass (A. capillaris L.) and velvet bentgrass (A. canina L.). The presence and nature of sequence related amplified polymorphism (SRAP) at the cDNA level were investigated. We isolated 80 unique cDNA fragment bands from these species using 56 SRAP primer combinations. Sequence analysis of cDNA clones and analysis of putative translation products revealed that some encoded amino acid sequences were similar to proteins involved in DNA synthesis, transcription, and signal transduction. The cytosolic glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene (GenBank accession no. EB812822) was also identified from velvet bentgrass, and the corresponding protein sequence is further analyzed due to its critical role in many cellular processes. The partial peptide sequence obtained was 112 amino acids long, presenting a high degree of homology to parts of the N-terminal and C-terminal regions of cytosolic phosphorylating GAPDH (GapC). The existence of common expressed sequence tags (ESTs) revealed by a minimum evolutionary dendrogram among the Agrostis ESTs indicated the usefulness of SRAP for comparative genome analysis of transcribed genes in the grass species.

  3. Amino acid sequences of bacterial cytochromes c' and c-556.

    PubMed Central

    Ambler, R P; Bartsch, R G; Daniel, M; Kamen, M D; McLellan, L; Meyer, T E; Van Beeumen, J

    1981-01-01

    The cytochromes c' are electron transport proteins widely distributed in photosynthetic and aerobic bacteria. We report the amino acid sequences of the proteins from 12 different bacterial species, and we show by sequences that the cytochromes c-556 from 2 different bacteria are structurally related to the cytochromes c'. Unlike the mitochondrial cytochromes c, the heme binding site in the cytochromes c' and c-556 is near the COOH terminus. The cytochromes c-556 probably have a methionine sixth heme ligand located near the NH2 terminus, whereas the cytochromes c' may be pentacoordinate. Quantitative comparison of cytochrome c' and c-556 sequences indicates a relatively low 28% average identity. PMID:6273892

  4. Amino acid sequence of fibrolase, a direct-acting fibrinolytic enzyme from Agkistrodon contortrix contortrix venom.

    PubMed Central

    Randolph, A.; Chamberlain, S. H.; Chu, H. L.; Retzios, A. D.; Markland, F. S.; Masiarz, F. R.

    1992-01-01

    The complete amino acid sequence of fibrolase, a fibrinolytic enzyme from southern copperhead (Agkistrodon contortrix contortrix) venom, has been determined. This is the first report of the sequence of a direct-acting, nonhemorrhagic fibrinolytic enzyme found in snake venom. The majority of the sequence was established by automated Edman degradation of overlapping peptides generated by a variety of selective cleavage procedures. The amino-terminus is blocked by a cyclized glutamine (pyroglutamic acid) residue, and the sequence of this region of the molecule was determined by mass spectrometry. Fibrolase is composed of 203 residues in a single polypeptide chain with a molecular weight of 22,891, as determined by the sequence. Its sequence is homologous to the sequence of the hemorrhagic toxin Ht-d of Crotalus atrox venom and with the sequences of two metalloproteinases from Trimeresurus flavoviridis venom. Microheterogeneity in the sequence was found at both the amino-terminus and at residues 189 and 192. All six cysteine residues in fibrolase are involved in disulfide bonds. A disulfide bond between cysteine-118 and cysteine-198 has been established and bonds between cysteines-158/165 and between cysteines-160/192 are inferred from the homology to Ht-d. Secondary structure prediction reveals a very low percentage of alpha-helix (4%), but much greater beta-structure (39.5%). Analysis of the sequence reveals the absence of asparagine-linked glycosylation sites defined by the consensus sequence: asparagine-X-serine/threonine. PMID:1304358
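
    The consensus sequence quoted at the end of this abstract (asparagine-X-serine/threonine) lends itself to a simple pattern scan. The sketch below is an illustration under stated assumptions, not the authors' procedure: the function name `find_sequons` and the `exclude_proline` switch are hypothetical, and the optional proline exclusion reflects a commonly applied refinement that the abstract itself does not mention.

```python
import re


def find_sequons(protein: str, exclude_proline: bool = False) -> list[int]:
    """Return 1-based positions of Asn residues in N-X-S/T motifs.

    With exclude_proline=True the middle residue X may not be proline,
    a refinement that goes beyond the consensus quoted in the abstract.
    """
    middle = "[^P]" if exclude_proline else "."
    # A lookahead is used so that overlapping motifs are all reported.
    pattern = re.compile(rf"(?=N{middle}[ST])")
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]
```

    An empty result for a full-length sequence corresponds to the abstract's finding that no such sites are present; the toy strings in any quick check are made up and are not the fibrolase sequence.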

  5. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    PubMed

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  6. Sequencing and comparative analysis of the gorilla MHC genomic sequence

    PubMed Central

    Wilming, Laurens G.; Hart, Elizabeth A.; Coggill, Penny C.; Horton, Roger; Gilbert, James G. R.; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L.

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC. PMID:23589541

  7. Comprehensive global amino acid sequence analysis of PB1F2 protein of influenza A H5N1 viruses and the influenza A virus subtypes responsible for the 20th-century pandemics.

    PubMed

    Pasricha, Gunisha; Mishra, Akhilesh C; Chakrabarti, Alok K

    2013-07-01

    PB1F2 is the 11th protein of influenza A virus translated from +1 alternate reading frame of PB1 gene. Since the discovery, varying sizes and functions of the PB1F2 protein of influenza A viruses have been reported. Selection of PB1 gene segment in the pandemics, variable size and pleiotropic effect of PB1F2 intrigued us to analyze amino acid sequences of this protein in various influenza A viruses. Amino acid sequences for PB1F2 protein of influenza A H5N1, H1N1, H2N2, and H3N2 subtypes were obtained from Influenza Research Database. Multiple sequence alignments of the PB1F2 protein sequences of the aforementioned subtypes were used to determine the size, variable and conserved domains and to perform mutational analysis. Analysis showed that 96·4% of the H5N1 influenza viruses harbored full-length PB1F2 protein. Except for the 2009 pandemic H1N1 virus, all the subtypes of the 20th-century pandemic influenza viruses contained full-length PB1F2 protein. Through the years, PB1F2 protein of the H1N1 and H3N2 viruses has undergone much variation. PB1F2 protein sequences of H5N1 viruses showed both human- and avian host-specific conserved domains. Global database of PB1F2 protein revealed that N66S mutation was present only in 3·8% of the H5N1 strains. We found a novel mutation, N84S in the PB1F2 protein of 9·35% of the highly pathogenic avian influenza H5N1 influenza viruses. Varying sizes and mutations of the PB1F2 protein in different influenza A virus subtypes with pandemic potential were obtained. There was genetic divergence of the protein in various hosts which highlighted the host-specific evolution of the virus. However, studies are required to correlate this sequence variability with the virulence and pathogenicity. © 2012 John Wiley & Sons Ltd.

  8. Purification, characterization and partial amino acid sequence of glycogen synthase from Saccharomyces cerevisiae.

    PubMed Central

    Carabaza, A; Arino, J; Fox, J W; Villar-Palasi, C; Guinovart, J J

    1990-01-01

    Glycogen synthase from Saccharomyces cerevisiae was purified to homogeneity. The enzyme showed a subunit molecular mass of 80 kDa. The holoenzyme appears to be a tetramer. Antibodies developed against purified yeast glycogen synthase inactivated the enzyme in yeast extracts and allowed the detection of the protein in Western blots. Amino acid analysis showed that the enzyme is very rich in glutamate and/or glutamine residues. The N-terminal sequence (11 amino acid residues) was determined. In addition, selected tryptic-digest peptides were purified by reverse-phase h.p.l.c. and submitted to gas-phase sequencing. Up to eight sequences (79 amino acid residues) could be aligned with the human muscle enzyme sequence. Levels of identity range between 37 and 100%, indicating that, although human and yeast glycogen synthases probably share some conserved regions, significant differences in their primary structure should be expected. PMID:2114092

  9. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza

    PubMed Central

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  10. Patterns in protein primary sequences: classification, display and analysis.

    PubMed Central

    Saurugger, P. N.; Metfessel, B. A.

    1991-01-01

    The protein folding code, which is contained in the amino acid chain of a protein, has so far eluded elucidation. However, patterns of hydrophobic residues have previously been identified which show a specificity towards certain secondary structural elements. We are developing an analysis toolkit to find, visualize, and analyze patterns in primary sequences. Preliminary results show that there exist patterns in primary sequences which are useful for predicting the structural class of amino acid chains, performing especially well for the all-alpha helix and all-beta sheet classes. PMID:1807631

  11. Analysis of DNA Sequence Variants Detected by High Throughput Sequencing

    PubMed Central

    Adams, David R; Sincan, Murat; Fajardo, Karin Fuentes; Mullikin, James C; Pierson, Tyler M; Toro, Camilo; Boerkoel, Cornelius F; Tifft, Cynthia J; Gahl, William A; Markello, Tom C

    2014-01-01

    The Undiagnosed Diseases Program at the National Institutes of Health uses High Throughput Sequencing (HTS) to diagnose rare and novel diseases. HTS techniques generate large numbers of DNA sequence variants, which must be analyzed and filtered to find candidates for disease causation. Despite the publication of an increasing number of successful exome-based projects, there has been little formal discussion of the analytic steps applied to HTS variant lists. We present the results of our experience with over 30 families for whom HTS sequencing was used in an attempt to find clinical diagnoses. For each family, exome sequence was augmented with high-density SNP-array data. We present a discussion of the theory and practical application of each analytic step and provide example data to illustrate our approach. The paper is designed to provide an analytic roadmap for variant analysis, thereby enabling a wide range of researchers and clinical genetics practitioners to perform direct analysis of HTS data for their patients and projects. PMID:22290882

  12. Cannabinoid acids analysis.

    PubMed

    Lercker, G; Bocci, F; Frega, N; Bortolomeazzi, R

    1992-03-01

    The cannabinoid pattern of vegetable preparations from Cannabis sativa (hashish, marijuana) makes it possible to recognize the phenotype of the plants, that is, whether they are intended for use as a drug or for fiber. Analytical determination of cannabinoids has presented some problems because of the complex composition of the hexane extract. Capillary gas chromatography of the hexane extracts of vegetable samples shows the presence of rather polar constituents that elute, with noticeable interactions, only on a polar phase. The compounds can be methylated with diazomethane and silanized (TMS) with silylating reagents. The methyl and methyl-TMS derivatives are analyzed by high resolution gas chromatography (HRGC) and by gas chromatography-mass spectrometry (GC-MS). Identification of the compounds shows them to be cannabinoid acids, of which the main one in quantitative terms is cannabidiolic acid (CBDA). The cannabinoid acids are known to be thermally unstable and are transformed into the corresponding cannabinoids by decarboxylation. This is of interest in forensic analysis, where the aim is to establish the total amount of THC, the active component, in Cannabis preparations.

  13. Amino acid sequence of homologous rat atrial peptides: natriuretic activity of native and synthetic forms.

    PubMed Central

    Seidah, N G; Lazure, C; Chrétien, M; Thibault, G; Garcia, R; Cantin, M; Genest, J; Nutt, R F; Brady, S F; Lyle, T A

    1984-01-01

    A substance called atrial natriuretic factor (ANF), localized in secretory granules of atrial cardiocytes, was isolated as four homologous natriuretic peptides from homogenates of rat atria. The complete sequence of the longest form showed that it is composed of 33 amino acids. The three other shorter forms (2-33, 3-33, and 8-33) represent amino-terminally truncated versions of the 33 amino acid parent molecule as shown by analysis of sequence, amino acid composition, or both. The proposed primary structure agrees entirely with the amino acid composition and reveals no significant sequence homology with any known protein or segment of protein. The short form ANF-(8-33) was synthesized by a multi-fragment condensation approach and the synthetic product was shown to exhibit specific activity comparable to that of the natural ANF-(3-33). PMID:6232612

  14. Active site amino acid sequence of human factor D.

    PubMed

    Davis, A E

    1980-08-01

    Factor D was isolated from human plasma by chromatography on CM-Sephadex C50, Sephadex G-75, and hydroxylapatite. Digestion of reduced, S-carboxymethylated factor D with cyanogen bromide resulted in three peptides which were isolated by chromatography on Sephadex G-75 (superfine) equilibrated in 20% formic acid. NH2-Terminal sequences were determined by automated Edman degradation with a Beckman 890C sequencer using a 0.1 M Quadrol program. The smallest peptide (CNBr III) consisted of the NH2-terminal 14 amino acids. The other two peptides had molecular weights of 17,000 (CNBr I) and 7000 (CNBr II). Overlap of the NH2-terminal sequence of factor D with the NH2-terminal sequence of CNBr I established the order of the peptides. The NH2-terminal 53 residues of factor D are somewhat more homologous with the group-specific protease of rat intestine than with other serine proteases. The NH2-terminal sequence of CNBr II revealed the active site serine of factor D. The typical serine protease active site sequence (Gly-Asp-Ser-Gly-Gly-Pro) was found at residues 12-17. The region surrounding the active site serine does not appear to be more highly homologous with any one of the other serine proteases. The structural data obtained point out the similarities between factor D and the other proteases. However, complete definition of the degree of relationship between factor D and other proteases will require determination of the remainder of the primary structure.

  15. The amino acid sequence of iguana (Iguana iguana) pancreatic ribonuclease.

    PubMed

    Zhao, W; Beintema, J J; Hofsteenge, J

    1994-01-15

    The pyrimidine-specific ribonuclease superfamily constitutes a group of homologous proteins so far found only in higher vertebrates. Four separate families are found in mammals, which have resulted from gene duplications in mammalian ancestors. To learn more about the evolutionary history of this superfamily, the primary structure and other characteristics of the pancreatic enzyme from iguana (Iguana iguana), a herbivorous lizard species belonging to the reptiles, have been determined. The polypeptide chain consists of 119 amino acid residues. The positions of insertions and deletions in the sequence are identical to those in the enzyme from snapping turtle. However, the two enzymes differ at 54% of the amino acid positions. Iguana ribonuclease contains no carbohydrate, although the enzyme possesses three recognition sites for carbohydrate attachment, and has a high number of acidic residues in a localized part of the sequence.

  16. Fractal analysis of DNA sequence data

    SciTech Connect

    Berthelsen, C.L.

    1993-01-01

    DNA sequence databases are growing at an almost exponential rate. New analysis methods are needed to extract knowledge about the organization of nucleotides from this vast amount of data. Fractal analysis is a new scientific paradigm that has been used successfully in many domains including the biological and physical sciences. Biological growth is a nonlinear dynamic process and some have suggested that to consider fractal geometry as a biological design principle may be most productive. This research is an exploratory study of the application of fractal analysis to DNA sequence data. A simple random fractal, the random walk, is used to represent DNA sequences. The fractal dimension of these walks is then estimated using the 'sandbox method'. Analysis of 164 human DNA sequences compared to three types of control sequences (random, base-content matched, and dimer-content matched) reveals that long-range correlations are present in DNA that are not explained by base or dimer frequencies. The study also revealed that the fractal dimension of coding sequences was significantly lower than sequences that were primarily noncoding, indicating the presence of longer-range correlations in functional sequences. The multifractal spectrum is used to analyze fractals that are heterogeneous and have a different fractal dimension for subsets with different scalings. The multifractal spectrum of the random walks of twelve mitochondrial genome sequences was estimated. Eight vertebrate mtDNA sequences had uniformly lower spectra values than did four invertebrate mtDNA sequences. Thus, vertebrate mitochondria show significantly longer-range correlations than invertebrate mitochondria. The higher multifractal spectra values for invertebrate mitochondria suggest a more random organization of the sequences. This research also includes considerable theoretical work on the effects of finite size, embedding dimension, and scaling ranges.
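    As a rough illustration of the first step in the abstract above, representing a DNA sequence as a random walk, the following Python sketch builds a one-dimensional walk. The abstract does not give the step rule, so the purine/pyrimidine mapping used here (A and G step up, C and T step down) is an assumption, the name `dna_walk` is hypothetical, and no attempt is made to reproduce the sandbox estimate of the fractal dimension.

```python
from itertools import accumulate

# Assumed step rule (not specified in the abstract):
# purines (A, G) step +1, pyrimidines (C, T) step -1.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(seq: str) -> list[int]:
    """Return the cumulative walk position after each recognised base.

    Characters other than A, C, G and T (for example N) are skipped.
    """
    steps = [STEP[base] for base in seq.upper() if base in STEP]
    return list(accumulate(steps))


if __name__ == "__main__":
    print(dna_walk("AGCT"))  # two steps up, then two steps down
```

    A walk built this way could then be handed to whichever dimension estimator is preferred; the choice of step rule affects which correlations the walk exposes.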

  17. Sequence Quality Analysis Tool for HIV Type 1 Protease and Reverse Transcriptase

    PubMed Central

    DeLong, Allison K.; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W.

    2012-01-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature (http://hivdb.Stanford.edu). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if they have >five 1–2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or <0.5% or >15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences. PMID:21916749
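    Two of the flagging rules listed in the abstract above (more than six PR or 18 RT ambiguous bases, and more than zero stop codons) can be sketched as follows. This is not SQUAT's code, which runs in R and also identifies and aligns sequences first; the function name `quality_flags`, the flag labels, the assumption that the input is already in frame from its first base, and the choice to treat alignment gaps as non-ambiguous are all illustrative assumptions.

```python
# Thresholds taken from the abstract: flag when the count is GREATER than these.
AMBIGUOUS_LIMIT = {"PR": 6, "RT": 18}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def quality_flags(nt_seq: str, gene: str) -> list[str]:
    """Return quality flags for an in-frame nucleotide sequence (assumed)."""
    seq = nt_seq.upper()
    flags = []

    # Anything other than A, C, G, T or a gap counts as ambiguous (assumption).
    ambiguous = sum(1 for base in seq if base not in "ACGT-")
    if ambiguous > AMBIGUOUS_LIMIT[gene]:
        flags.append("ambiguous_bases")

    # Split into complete codons starting at the first base.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if any(codon in STOP_CODONS for codon in codons):
        flags.append("stop_codon")

    return flags


if __name__ == "__main__":
    print(quality_flags("ATGTAAGGG", "PR"))
```

    Keeping the limits in a dictionary mirrors the abstract's note that thresholds are user modifiable.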

  18. Sequence variation divides Equine rhinitis B virus into three distinct phylogenetic groups that correlate with serotype and acid stability.

    PubMed

    Black, Wesley D; Hartley, Carol A; Ficorilli, Nino P; Studdert, Michael J

    2005-08-01

    Equine rhinitis B virus (ERBV), genus Erbovirus, family Picornaviridae, occurs as two serotypes, ERBV1 and ERBV2, and the few isolates previously tested were acid labile. Of 24 ERBV1 isolates tested in the studies reported here, 19 were acid labile and five were acid stable. The two available ERBV2 isolates, as expected, were acid labile. Nucleotide sequences of the P1 region encoding the capsid proteins VP1, VP2, VP3 and VP4 were determined for five acid-labile and three acid-stable ERBV1 isolates and one acid-labile ERBV2 isolate. The sequences were aligned with the published sequences of the prototype acid-labile ERBV1.1436/71 and the prototype ERBV2.313/75. The three acid-stable ERBV1 were closely related in a phylogenetic group that was distinct from the group of six acid-labile ERBV1, which were also closely related to each other. The two acid-labile ERBV2 formed a third distinct group. One acid-labile ERBV1 had a chimeric acid-labile/acid-stable ERBV1 P1 sequence, presumably because of a recombination event within VP2 and this was supported by SimPlot analysis. ERBV1 rabbit antiserum neutralized acid-stable and acid-labile ERBV1 isolates similarly. Accordingly, three distinct phylogenetic groups of erboviruses exist that are consistent with serotype and acid stability phenotypes.

  19. Constrained Multistate Sequence Design for Nucleic Acid Reaction Pathway Engineering.

    PubMed

    Wolfe, Brian R; Porubsky, Nicholas J; Zadeh, Joseph N; Dirks, Robert M; Pierce, Niles A

    2017-03-01

    We describe a framework for designing the sequences of multiple nucleic acid strands intended to hybridize in solution via a prescribed reaction pathway. Sequence design is formulated as a multistate optimization problem using a set of target test tubes to represent reactant, intermediate, and product states of the system, as well as to model crosstalk between components. Each target test tube contains a set of desired "on-target" complexes, each with a target secondary structure and target concentration, and a set of undesired "off-target" complexes, each with vanishing target concentration. Optimization of the equilibrium ensemble properties of the target test tubes implements both a positive design paradigm, explicitly designing for on-pathway elementary steps, and a negative design paradigm, explicitly designing against off-pathway crosstalk. Sequence design is performed subject to diverse user-specified sequence constraints including composition constraints, complementarity constraints, pattern prevention constraints, and biological constraints. Constrained multistate sequence design facilitates nucleic acid reaction pathway engineering for diverse applications in molecular programming and synthetic biology. Design jobs can be run online via the NUPACK web application.

  20. CROSS-DISCIPLINARY PHYSICS AND RELATED AREAS OF SCIENCE AND TECHNOLOGY: The structural analysis of protein sequences based on the quasi-amino acids code

    NASA Astrophysics Data System (ADS)

    Zhu, Ping; Tang, Xu-Qing; Xu, Zhen-Yuan

    2009-01-01

    Proteomics is the study of proteins and their interactions in a cell. With the successful completion of the Human Genome Project comes the postgenome era, in which proteomics technology is emerging. This paper studies the protein molecule from the algebraic point of view. The algebraic system (Σ, +, *) is introduced, where Σ is the set of 64 codons. According to the characteristics of (Σ, +, *), a novel quasi-amino acids code classification method is introduced and the corresponding algebraic operation table over the set ZU of the 16 kinds of quasi-amino acids is established. The internal relation among the quasi-amino acids is revealed. The results show that there exist some very close correlations between the properties of the quasi-amino acids and the codon. All these correlation relationships may play an important part in establishing the logic relationship between codons and the quasi-amino acids during the course of life origination. According to Ma F et al (2003 J. Anhui Agricultural University 30 439), the corresponding relation and the excellent properties of the amino acids code are very difficult to observe. The present paper shows that (ZU, ⊕, ⊗) is a field. Furthermore, the operational results show that the codon tga has a different property from the other stop codons. In fact, in the human and ox mitochondrial genomes, tga codes for tryptophan and is not a stop codon as in other genetic codes, as is the case in Chen W C et al (2002 Acta Biophysica Sinica 18(1) 87). The present theory avoids some inexplicable events of the 20 kinds of amino acids code; in other words, it solves the problem of 'the 64 codon assignments of mRNA to amino acids is probably completely wrong' proposed by Yang (2006 Progress in Modern Biomedicine 6 3).

  1. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  2. Structure and sequence based analysis of alpha-amylase evolution.

    PubMed

    Singh, Swati; Guruprasad, Lalitha

    2014-01-01

    α-Amylases hydrolyze α-1,4-glycosidic bonds during assimilation of biological macromolecules. The amino acid sequences of these enzymes in thousands of diverse organisms are known and the 3D structures of several proteins have been solved. The 3D structure analysis of these universal enzymes from diverse organisms has been studied by the generation of phylogenetic trees and structure based sequence analysis to generate a metric for the degree of conservation that is responsible for individual speciation. Greater similarities are observed between reference NCBI tree and structure based phylogenetic tree compared to sequence based phylogenetic tree indicating that structures truly represent the functional aspects of proteins than from the sequence information alone. We report differences in the profile specific conserved and insertion/deletion regions, factors responsible for the Ca(2+) and Cl(-) ion binding and the disulfide connectivity pattern that discriminate the enzymes over evolution.

  3. Scale-PC shielding analysis sequences

    SciTech Connect

    Bowman, S.M.

    1996-05-01

    The SCALE computational system is a modular code system for analyses of nuclear fuel facility and package designs. With the release of SCALE-PC Version 4.3, the radiation shielding analysis community now has the capability to execute the SCALE shielding analysis sequences contained in the control modules SAS1, SAS2, SAS3, and SAS4 on an MS-DOS personal computer (PC). In addition, SCALE-PC includes two new sequences, QADS and ORIGEN-ARP. The capabilities of each sequence are presented, along with example applications.

  4. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  6. Diagnostics based on nucleic acid sequence variant profiling: PCR, hybridization, and NGS approaches.

    PubMed

    Khodakov, Dmitriy; Wang, Chunyan; Zhang, David Yu

    2016-10-01

    Nucleic acid sequence variations have been implicated in many diseases, and reliable detection and quantitation of DNA/RNA biomarkers can inform effective therapeutic action, enabling precision medicine. Nucleic acid analysis technologies being translated into the clinic can broadly be classified into hybridization, PCR, and sequencing, as well as their combinations. Here we review the molecular mechanisms of popular commercial assays, and their progress in translation into in vitro diagnostics. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  7. Quantiprot - a Python package for quantitative analysis of protein sequences.

    PubMed

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
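    One of the quantitative measures named in the abstract above, the distribution of n-grams in a sequence, can be illustrated with the Python standard library alone. The sketch below deliberately does not use the Quantiprot API, whose function names are not given in the abstract; `ngram_counts` is a hypothetical helper written only to show the idea of counting overlapping n-residue windows.

```python
from collections import Counter


def ngram_counts(seq: str, n: int = 2) -> Counter:
    """Count overlapping n-residue windows in a protein sequence."""
    seq = seq.upper()
    # range() is empty when the sequence is shorter than n,
    # so short inputs simply yield an empty Counter.
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


if __name__ == "__main__":
    counts = ngram_counts("AAAB", n=2)
    print(counts.most_common())
```

    Counts like these can be normalised into frequencies and used as coordinates in the kind of feature space the abstract describes for alignment-free comparison.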

  8. Complete amino acid sequence of three reptile lysozymes.

    PubMed

    Ponkham, Pornpimol; Daduang, Sakda; Kitimasak, Wachira; Krittanai, Chartchai; Chokchaichamnankit, Daranee; Srisomsap, Chantragan; Svasti, Jisnuson; Kawamura, Shunsuke; Araki, Tomohiro; Thammasirirak, Sompong

    2010-01-01

    To study the structure and function of reptile lysozymes, we have reported their purification, and in this study we have established the amino acid sequence of three egg white lysozymes in soft-shelled turtle eggs (SSTL A and SSTL B from Trionyx sinensis, ASTL from Amyda cartilaginea) by using the rapid peptide mapping method. The established amino acid sequences of SSTL A, SSTL B, and ASTL showed substitutions of 43, 42, and 44 residues respectively when compared with the HEWL (hen egg white lysozyme) sequence. Among these reptile lysozymes, SSTL A had one substitution compared with SSTL B (Gly126Asp) and had an extra N-terminal Gly and 11 substitutions compared with ASTL. SSTL B had an extra N-terminal Gly and 10 residues different from ASTL. The sequence of SSTL B was identical to soft-shelled turtle lysozyme from STL (Trionyx sinensis japonicus). The Ile residue at position 93 of ASTL is the first reported among C-type lysozymes. Furthermore, amino acid substitutions (Phe34His, Arg45Tyr, Thr47Arg, and Arg114Tyr) were also found at subsites E and F when compared with HEWL. The time course using N-acetylglucosamine pentamer as a substrate exhibited a reduction of the rate constant of glycosidic cleavage and an increase of binding free energy for subsites E and F, which demonstrated the contribution of the amino acids mentioned above to substrate binding at subsites E and F. Interestingly, the variable binding free energy values observed for ASTL may result from substitutions outside subsites E and F.

  9. Isolation and amino-acid sequence determination of monkey insulin and proinsulin.

    PubMed

    Naithani, V K; Steffens, G J; Tager, H S; Buse, G; Rubenstein, A H; Steiner, D F

    1984-05-01

    Insulin has been isolated and purified from rhesus monkey pancreas by means of acid-ethanol extraction, gel filtration and ion exchange chromatography. The complete amino-acid sequence of the hormone has been determined by amino-acid analysis of the oxidized A- and B-chains, by end group determination, by the identification of the C-terminal residues (AsnA21 and ThrB30) by carboxypeptidase A digestion and by Edman degradation of the S-carboxymethylated A- and B-chains. The 51-residue monkey insulin was shown to be identical to human insulin. From the known insulin and C-peptide sequence the primary sequence of monkey proinsulin has been proposed.

  10. Sequence Analysis and Evolutionary Studies of Reelin Proteins

    PubMed Central

    Manoharan, Malini; Muhammad, Sayyed Auwn; Sowdhamini, Ramanathan

    2015-01-01

    The reelin gene is conserved across many vertebrate species, including humans. The protein product of this gene plays several important roles in early brain development and regulation of neural network plasticity of a matured brain structure. With an extended structure of 3461 amino acid sequences, consisting of eight reelin repeats, the human reelin sequence stands out as an exceptional model for evolutionary studies. In this study, sequence analysis of the human reelin and its homologues and reelin sequences from 104 other species is described in detail. Interesting sequence conservation patterns of individual repeats have been highlighted. Sequence phylogeny of the reelin sequences indicates a pattern similar to the evolution of the species, thereby serving as a highly conserved family for evolutionary purposes. Multiple sequence alignment of different reelin domain repeats, derived from homologues, suggests specific functions for individual repeats and high sequence conservation across reelin repeats from different organisms, albeit with few unusual domain architectures. A three-dimensional structural model of the full-length human reelin is now available that provides clues on residues at the dimer interface. PMID:26715843

  11. Amino-acid sequence of toxin I from Anemonia sulcata.

    PubMed

    Wunderer, G; Eulitz, M

    1978-08-15

    Toxin I from Anemonia sulcata, a major component of the sea anemone venom, consists of 46 amino acid residues which are linked by three disulfide bridges. The [14C]carboxymethylated polypeptide was sequenced to position 29 by automated Edman degradation. The remaining sequence was determined from cyanogen bromide peptides and from tryptic peptides of the citraconylated [14C]carboxymethylated toxin. Toxin I is homologous to toxin II from Anemonia sulcata and to anthopleurin A, a toxin from the sea anemone Anthopleura xanthogrammica. These toxins constitute a new class of polypeptide toxins. No significant homologies exist with toxin III from Anemonia sulcata nor with known sequences of neurotoxins or cardiotoxins of various origin.

  12. RNA internal standard synthesis by nucleic acid sequence-based amplification for competitive quantitative amplification reactions.

    PubMed

    Lo, Wan-Yu; Baeumner, Antje J

    2007-02-15

    Nucleic acid sequence-based amplification (NASBA) reactions have been demonstrated to successfully synthesize new sequences based on deletion and insertion reactions. Two RNA internal standards were synthesized for use in competitive amplification reactions in which quantitative analysis can be achieved by coamplifying the internal standard with the wild type sample. The sequences were created in two consecutive NASBA reactions using the E. coli clpB mRNA sequence as model analyte. The primer sequences of the wild type sequence were maintained, and a 20-nt-long segment inside the amplicon region was exchanged for a new segment of similar GC content and melting temperature. The new RNA sequence was thus amplifiable using the wild type primers and detectable via a new inserted sequence. In the first reaction, the forwarding primer and an additional 20-nt-long sequence was deleted and replaced by a new 20-nt-long sequence. In the second reaction, a forwarding primer containing as 5' overhang sequence the wild type primer sequence was used. The presence of pure internal standard was verified using electrochemiluminescence and RNA lateral-flow biosensor analysis. Additional sequence deletion in order to shorten the internal standard amplicons and thus generate higher detection signals was found not to be required. Finally, a competitive NASBA reaction between one internal standard and the wild type sequence was carried out proving its functionality. This new rapid construction method via NASBA provides advantages over the traditional techniques since it requires no traditional cloning procedures, no thermocyclers, and can be completed in less than 4 h.
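    The abstract says the replacement 20-nt segment was chosen to have similar GC content and melting temperature to the segment it replaced. A rough sketch of such a comparison is given below. The abstract does not say how melting temperature was estimated; the Wallace rule used here (2 °C per A/T, 4 °C per G/C) is a common approximation for short oligonucleotides and is an assumption, as are the function names and the example sequence.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Approximate melting temperature in degrees C by the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C); U is counted with A/T so RNA-style
    input is accepted. This is a rough estimate for short oligos only.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T") + seq.count("U")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical 20-nt segment, not taken from the paper.
segment = "ATGCATGCATGCATGCATGC"
print(gc_content(segment), wallace_tm(segment))
```

    Two candidate segments could be treated as interchangeable when both values fall within a chosen tolerance of each other; the tolerance itself would have to be set by the experimenter.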

  13. Mechanism Analysis of Acid Tolerance Response of Bifidobacterium longum subsp. longum BBMN 68 by Gene Expression Profile Using RNA-Sequencing

    PubMed Central

    Jin, Junhua; Zhang, Bing; Guo, Huiyuan; Cui, Jianyun; Jiang, Lu; Song, Shuhui; Sun, Min; Ren, Fazheng

    2012-01-01

    To analyze the mechanism of the acid tolerance response (ATR) in Bifidobacterium longum subsp. longum BBMN68, we optimized the acid-adaptation condition to stimulate ATR effectively and analyzed the change of gene expression profile after acid-adaptation using high-throughput RNA-Seq. After acid-adaptation at pH 4.5 for 2 hours, the survival rate of BBMN68 at lethal pH 3.5 for 120 min increased 70-fold; 293 genes were upregulated more than 2-fold and 245 genes were downregulated more than 2-fold. Gene expression profiling of ATR in BBMN68 suggested that, when the bacteria faced acid stress, the cells strengthened the integrity of the cell wall and changed the permeability of the membrane to keep the H+ from entering. Once the H+ entered the cytoplasm, the cells showed four main responses: First, the F0F1-ATPase system was initiated to discharge H+. Second, the ability to produce NH3 by cysteine-cystathionine-cycle was strengthened to neutralize excess H+. Third, the cells started NER-UVR and NER-VSR systems to minimize the damage to DNA and upregulated HtpX, IbpA, and γ-glutamylcysteine production to protect proteins against damage. Fourth, the cells initiated global response signals ((p)ppGpp, polyP, and Sec-SRP) to bring the whole cell into a state of response to the stress. The cells also secreted the quorum sensing signal (AI-2) to communicate between intraspecies cells by the cellular signal system, such as two-component systems, to improve the overall survival rate. In addition, the cells varied the pathways of producing energy by shifting to BCAA metabolism and enhanced the ability to utilize sugar to supply sufficient energy for the operation of the mechanisms mentioned above. Based on these results, it was inferred that, during industrial applications, the acid resistance of bifidobacteria could be improved by adding BCAA, γ-glutamylcysteine, cysteine, and cystathionine into the acid-stress environment. PMID:23236393

  14. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy have been used extensively in physical surface sciences to study quantum tunneling, to measure the electronic local density of states of nanomaterials, and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecules of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique "electronic fingerprints" for all nucleotides on DNA and RNA using Q-Seq, along with their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition, we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  15. Auditory sequence analysis and phonological skill.

    PubMed

    Grube, Manon; Kumar, Sukhbinder; Cooper, Freya E; Turton, Stuart; Griffiths, Timothy D

    2012-11-07

    This work tests the relationship between auditory and phonological skill in a non-selected cohort of 238 school students (age 11) with the specific hypothesis that sound-sequence analysis would be more relevant to phonological skill than the analysis of basic, single sounds. Auditory processing was assessed across the domains of pitch, time and timbre; a combination of six standard tests of literacy and language ability was used to assess phonological skill. A significant correlation between general auditory and phonological skill was demonstrated, plus a significant, specific correlation between measures of phonological skill and the auditory analysis of short sequences in pitch and time. The data support a limited but significant link between auditory and phonological ability with a specific role for sound-sequence analysis, and provide a possible new focus for auditory training strategies to aid language development in early adolescence.

  16. Auditory sequence analysis and phonological skill

    PubMed Central

    Grube, Manon; Kumar, Sukhbinder; Cooper, Freya E.; Turton, Stuart; Griffiths, Timothy D.

    2012-01-01

    This work tests the relationship between auditory and phonological skill in a non-selected cohort of 238 school students (age 11) with the specific hypothesis that sound-sequence analysis would be more relevant to phonological skill than the analysis of basic, single sounds. Auditory processing was assessed across the domains of pitch, time and timbre; a combination of six standard tests of literacy and language ability was used to assess phonological skill. A significant correlation between general auditory and phonological skill was demonstrated, plus a significant, specific correlation between measures of phonological skill and the auditory analysis of short sequences in pitch and time. The data support a limited but significant link between auditory and phonological ability with a specific role for sound-sequence analysis, and provide a possible new focus for auditory training strategies to aid language development in early adolescence. PMID:22951739

  17. Sequencing and Analysis of Neanderthal Genomic DNA

    PubMed Central

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Pääbo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2008-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library are of Neanderthal origin, the strongest being the ascertainment of sequence identities between Neanderthal and chimpanzee at sites where the human genomic sequence is different. These results enabled us to calculate the human-Neanderthal divergence time based on multiple randomly distributed autosomal loci. Our analyses suggest that on average the Neanderthal genomic sequence we obtained and the reference human genome sequence share a most recent common ancestor ~706,000 years ago, and that the human and Neanderthal ancestral populations split ~370,000 years ago, before the emergence of anatomically modern humans. Our finding that the Neanderthal and human genomes are at least 99.5% identical led us to develop and successfully implement a targeted method for recovering specific ancient DNA sequences from metagenomic libraries. This initial analysis of the Neanderthal genome advances our understanding of the evolutionary relationship of Homo sapiens and Homo neanderthalensis and signifies the dawn of Neanderthal genomics. PMID:17110569

  18. Optimizing cancer genome sequencing and analysis

    PubMed Central

    Griffith, Malachi; Miller, Christopher A.; Griffith, Obi L.; Krysiak, Kilannin; Skidmore, Zachary L.; Ramu, Avinash; Walker, Jason R.; Dang, Ha X.; Trani, Lee; Larson, David E.; Demeter, Ryan T.; Wendl, Michael C.; McMichael, Joshua F.; Austin, Rachel E.; Magrini, Vincent; McGrath, Sean D.; Ly, Amy; Kulkarni, Shashikant; Cordes, Matthew G.; Fronick, Catrina C.; Fulton, Robert S.; Maher, Christopher A.; Ding, Li; Klco, Jeffery M.; Mardis, Elaine R.; Ley, Timothy J.; Wilson, Richard K.

    2015-01-01

    Summary: Tumors are typically sequenced to depths of 75–100× (exome) or 30–50× (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312×) whole genome sequencing (WGS) and exome capture (up to ~433×) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ~200,000 putative SNVs by sequencing them to depths of ~1,000×. Additional targeted sequencing provided over 10,000× coverage and ddPCR assays provided up to ~250,000× sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP accession id phs000159). PMID:26645048

  19. RSAT 2015: Regulatory Sequence Analysis Tools

    PubMed Central

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-01-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  20. RSAT 2015: Regulatory Sequence Analysis Tools.

    PubMed

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/.

  1. Phylogenetic analysis of adenovirus sequences.

    PubMed

    Harrach, Balázs; Benko, Mária

    2007-01-01

    Members of the family Adenoviridae have been isolated from a large variety of hosts, including representatives from every major vertebrate class from fish to mammals. The high prevalence, together with the fairly conserved organization of the central part of their genomes, make the adenoviruses one of (if not the) best models for studying viral evolution on a larger time scale. Phylogenetic calculation can infer the evolutionary distance among adenovirus strains on serotype, species, and genus levels, thus helping the establishment of a correct taxonomy on the one hand, and speeding up the process of typing new isolates on the other. Initially, four major lineages corresponding to four genera were recognized. Later, the demarcation criteria of lower taxon levels, such as species or types, could also be defined with phylogenetic calculations. A limited number of possible host switches have been hypothesized and convincingly supported. Application of the web-based BLAST and MultAlin programs and the freely available PHYLIP package, along with the TreeView program, enables everyone to make correct calculations. In addition to step-by-step instruction on how to perform phylogenetic analysis, critical points where typical mistakes or misinterpretation of the results might occur will be identified and hints for their avoidance will be provided.

  2. Trichomonas vaginalis acidic phospholipase A2: isolation and partial amino acid sequence.

    PubMed

    Escobedo-Guajardo, Brenda L; González-Salazar, Francisco; Palacios-Corona, Rebeca; Torres de la Cruz, Víctor M; Morales-Vallarta, Mario; Mata-Cárdenas, Benito D; Garza-González, Jesús N; Rivera-Silva, Gerardo; Vargas-Villarreal, Javier

    2013-12-01

    Sexually transmitted diseases are a major cause of acute disease worldwide, and trichomoniasis is the most common curable one, with more than 170 million cases annually. Trichomonas vaginalis is the causal agent of trichomoniasis and has the ability to destroy in vitro cell monolayers of the vaginal mucosa, where the phospholipases A2 (PLA2) have been reported as potential virulence factors. These enzymes have been partially characterized from the subcellular fraction S30 of pathogenic T. vaginalis strains. The main objective of this study was to purify a phospholipase A2 from T. vaginalis, make a partial characterization, obtain a partial amino acid sequence, and determine its enzymatic participation as a hemolytic factor causing lysis of erythrocytes. Trichomonas S30, RF30 and UFF30 sub-fractions from the GT-15 strain have the capacity to hydrolyze [2-(14)C-PA]-PC at pH 6.0. Proteins from the UFF30 sub-fraction were separated by affinity chromatography into two eluted fractions with detectable PLA2 activity. The EDTA-eluted fraction was analyzed by on-line HPLC-tandem mass spectrometry and two protein peaks were observed at 8.2 and 13 kDa. Peptide sequences were identified from the proteins present in the EDTA-eluted UFF30 fraction; bioinformatic analysis using Protein Link Global Server loaded with the T. vaginalis protein database suggests that the eluted peptides correspond to a putative ubiquitin protein in the 8.2 kDa fraction and a phospholipase preserved in the 13 kDa fraction. The EDTA-eluted fraction hydrolyzed [2-(14)C-PA]-PC and lysed erythrocytes from Sprague-Dawley rats in a time- and dose-dependent manner. The acidic hemolytic activity decreased by 84% with the addition of 100 μM of Rosenthal's inhibitor.

  3. Computer analysis and structure prediction of nucleic acids and proteins.

    PubMed Central

    Kanehisa, M; Klein, P; Greif, P; DeLisi, C

    1984-01-01

    We have developed an integrated computer system for analysis of nucleic acid and protein sequences, which consists of sequence and structure databases, a relational database, and software for structural analysis. The system is potentially applicable to a number of problems in structural biology including predictive classification of the function and location of oncogene products. PMID:6546426

  4. A classification of glycosyl hydrolases based on amino acid sequence similarities.

    PubMed Central

    Henrissat, B

    1991-01-01

    The amino acid sequences of 301 glycosyl hydrolases and related enzymes have been compared. A total of 291 sequences corresponding to 39 EC entries could be classified into 35 families. Only ten sequences (less than 5% of the sample) could not be assigned to any family. With the sequences available for this analysis, 18 families were found to be monospecific (containing only one EC number) and 17 were found to be polyspecific (containing at least two EC numbers). Implications on the folding characteristics and mechanism of action of these enzymes and on the evolution of carbohydrate metabolism are discussed. With the steady increase in sequence and structural data, it is suggested that the enzyme classification system should perhaps be revised. PMID:1747104

  5. Analysis of Organic Acids.

    ERIC Educational Resources Information Center

    Griswold, John R.; Rauner, Richard A.

    1990-01-01

    Presented are the procedures and a discussion of the results for an experiment in which students select unknown carboxylic acids, determine their melting points, and investigate their solubility behavior in water and ethanol. A table of selected carboxylic acids is included. (CW)

  7. Sequence analysis of diamine oxidase gene from fava bean and its expression related to γ-aminobutyric acid accumulation in seeds germinating under hypoxia-NaCl stress.

    PubMed

    Yang, Runqiang; Yin, Yongqi; Guo, Liping; Han, Yongbin; Gu, Zhenxin

    2014-06-01

    γ-Aminobutyric acid (GABA) is synthesized via the polyamine degradation pathway in plants, with diamine oxidase (DAO) being the key enzyme. In this study the cDNA of DAO in fava bean was cloned and its expression in seeds germinating under hypoxia-NaCl stress was investigated. Fava bean DAO cDNA is 2199 bp long and contains 2025 bp of open reading frame that encodes 675 amino acid peptides with a calculated molecular weight of 76.31 kDa and a pI of 5.41. Hypoxia and hypoxia-NaCl stress enhanced DAO activity and resulted in GABA accumulation in germinating fava bean. However, DAO gene expression was down-regulated under hypoxia compared with non-stress condition, while its expression in the cotyledon and shoot was up-regulated under hypoxia-NaCl. In addition, DAO expression could be promoted to enhance GABA accumulation after increasing the stress intensity using NaCl. DAO gene expression was significantly inhibited by aminoguanidine treatment under hypoxia but increased under hypoxia-NaCl. Under hypoxia, GABA accumulation due to NaCl was mainly concentrated in the cotyledon. The GABA content increase under hypoxia did not result from DAO gene expression, but DAO existing in seeds was activated under hypoxia. DAO gene expression was up-regulated to enhance GABA accumulation after increasing the stress intensity. © 2013 Society of Chemical Industry.
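    The figures quoted in this abstract can be checked with simple codon arithmetic: an open reading frame of 2025 bp spans 2025 / 3 = 675 codons, which matches the 675 residues reported, and leaves 2199 - 2025 = 174 bp of untranslated sequence in the cDNA. A small sketch of that check follows; the function name is hypothetical, and whether the stop codon is counted within the 2025 bp is not stated in the abstract.

```python
def codons_in_orf(orf_length_bp):
    """Number of codons in an open reading frame of the given length.

    Raises ValueError if the length is not a multiple of three, since a
    complete reading frame consists of whole codons.
    """
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_length_bp // 3


cdna_length_bp = 2199  # full cDNA length given in the abstract
orf_length_bp = 2025   # open reading frame length given in the abstract

print(codons_in_orf(orf_length_bp))    # codons in the ORF
print(cdna_length_bp - orf_length_bp)  # untranslated bases in the cDNA
```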

  8. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    DOEpatents

    Nolan, John P.; White, P. Scott

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  9. Ab initio detection of fuzzy amino acid tandem repeats in protein sequences

    PubMed Central

    2012-01-01

    Background: Tandem repetitions within protein amino acid sequences often correspond to regular secondary structures and form multi-repeat 3D assemblies of varied size and function. Developing internal repetitions is one of the evolutionary mechanisms that proteins employ to adapt their structure and function under evolutionary pressure. While there is keen interest in understanding such phenomena, detection of repeating structures based only on sequence analysis is considered an arduous task, since structure and function are often preserved even under considerable sequence divergence (fuzzy tandem repeats). Results: In this paper we present PTRStalker, a new algorithm for ab-initio detection of fuzzy tandem repeats in protein amino acid sequences. In the reported results we show that by feeding PTRStalker with amino acid sequences from the UniProtKB/Swiss-Prot database we detect novel tandemly repeated structures not captured by other state-of-the-art tools. Experiments with membrane proteins indicate that PTRStalker can detect global symmetries in the primary structure which are then reflected in the tertiary structure. Conclusions: PTRStalker is able to detect fuzzy tandem repeating structures in protein sequences, with performance beyond the current state of the art. Such a tool may be a valuable support to investigating protein structural properties when tertiary X-ray data is not available. PMID:22536906

  10. The amino acid sequence of chymopapain from Carica papaya.

    PubMed Central

    Watson, D C; Yaguchi, M; Lynn, K R

    1990-01-01

    Chymopapain is a polypeptide of 218 amino acid residues. It has considerable structural similarity with papain and papaya proteinase omega, including conservation of the catalytic site and of the disulphide bonding. Chymopapain is like papaya proteinase omega in carrying four extra residues between papain positions 168 and 169, but differs from both papaya proteinases in the composition of its S2 subsite, as well as in having a second thiol group, Cys-117. Some evidence for the amino acid sequence of chymopapain has been deposited as Supplementary Publication SUP 50153 (12 pages) at the British Library Document Supply Centre, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms indicated in Biochem. J. (1990) 265, 5. The information comprises Supplement Tables 1-4, which contain, in order, amino acid compositions of peptides from tryptic, peptic, CNBr and mild acid cleavages; Supplement Fig. 1, showing re-fractionation of selected peaks from Fig. 2 of the main paper; Supplement Fig. 2, showing cation-exchange chromatography of the earliest-eluted peak of Fig. 3 of the main paper; Supplement Fig. 3, showing reverse-phase h.p.l.c. of the later-eluted peak from Fig. 3 of the main paper; and Supplement Fig. 4, showing the separation of peptides after mild acid hydrolysis of CNBr-cleavage fragment CB3. PMID:2106878

  11. Improved algorithm for analysis of DNA sequences using multiresolution transformation.

    PubMed

    Inbamalar, T M; Sivakumar, R

    2015-01-01

    Bioinformatics and genomic signal processing use computational techniques to solve various biological problems. They aim to study the information allied with genetic materials such as deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. Fast and precise identification of the protein coding regions in a DNA sequence is one of the most important tasks in analysis. Existing digital signal processing (DSP) methods provide less accurate, computationally complex solutions with greater background noise. Hence, improvements in accuracy and computational complexity, and a reduction in background noise, are essential in identification of the protein coding regions in DNA sequences. In this paper, a new DSP-based method is introduced to detect the protein coding regions in DNA sequences. Here, the DNA sequences are converted into numeric sequences using the electron ion interaction potential (EIIP) representation. A discrete wavelet transformation is then applied, the absolute value of the energy is computed, and a suitable threshold is applied. The method is tested using the databases available on the National Center for Biotechnology Information (NCBI) site. A comparative analysis confirms the efficiency of the proposed system.
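    The pipeline described in this abstract (EIIP mapping, discrete wavelet transform, energy, threshold) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the wavelet or the threshold, so a single-level Haar transform and a caller-supplied threshold are assumed here, and all function names are hypothetical. The EIIP values are the commonly tabulated ones for the four nucleotides.

```python
import math

# Commonly tabulated EIIP values for the four DNA nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def to_eiip(seq):
    """Map a DNA string to its numeric EIIP sequence."""
    return [EIIP[base] for base in seq.upper()]


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform (assumed wavelet).

    Returns (approximation, detail) coefficient lists. If the input has
    odd length, the trailing sample is dropped.
    """
    approx, detail = [], []
    for a, b in zip(signal[0::2], signal[1::2]):
        approx.append((a + b) / math.sqrt(2))
        detail.append((a - b) / math.sqrt(2))
    return approx, detail


def flag_by_energy(coeffs, threshold):
    """Mark coefficients whose energy (squared value) exceeds the threshold."""
    return [c * c > threshold for c in coeffs]


# Hypothetical input and threshold, for illustration only.
numeric = to_eiip("ATGGCGTACGTTAGCA")
approx, detail = haar_dwt(numeric)
flags = flag_by_energy(detail, threshold=1e-4)
print(flags)
```

    In practice the threshold would be tuned on annotated sequences, and a multi-level transform or a different wavelet family might be preferred; neither choice is specified in the abstract.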

  12. Improved Algorithm for Analysis of DNA Sequences Using Multiresolution Transformation

    PubMed Central

    Inbamalar, T. M.; Sivakumar, R.

    2015-01-01

    Bioinformatics and genomic signal processing use computational techniques to solve various biological problems. They aim to study the information allied with genetic materials such as deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. Fast and precise identification of the protein coding regions in a DNA sequence is one of the most important tasks in analysis. Existing digital signal processing (DSP) methods provide less accurate, computationally complex solutions with greater background noise. Hence, improvements in accuracy and computational complexity, and a reduction in background noise, are essential in identification of the protein coding regions in DNA sequences. In this paper, a new DSP-based method is introduced to detect the protein coding regions in DNA sequences. Here, the DNA sequences are converted into numeric sequences using the electron ion interaction potential (EIIP) representation. A discrete wavelet transformation is then applied, the absolute value of the energy is computed, and a suitable threshold is applied. The method is tested using the databases available on the National Center for Biotechnology Information (NCBI) site. A comparative analysis confirms the efficiency of the proposed system. PMID:26000337

  13. Sequence analysis by iterated maps, a review.

    PubMed

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
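    The Chaos Game Representation mentioned in this abstract is a simple iterated map: each nucleotide is assigned a corner of the unit square, and each successive point lies halfway between the previous point and the corner of the current nucleotide. A minimal sketch follows; the corner assignment and the starting point at the centre of the square follow a common convention but are assumptions here, and the function name is hypothetical.

```python
# Corner of the unit square assigned to each nucleotide (assumed convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return the Chaos Game Representation coordinates of a DNA string.

    Each point is the midpoint between the previous point and the corner
    of the current nucleotide, so the last k symbols fix the position to
    within a sub-square of side 2**-k.
    """
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


print(cgr_points("AG"))  # [(0.25, 0.25), (0.625, 0.625)]
```

    Because the final coordinate encodes the sequence suffix at every resolution at once, two sequences ending in the same k symbols land in the same sub-square of side 2**-k, which is the scale-free property the review refers to.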

  14. Design, synthesis, and characterization of a protein sequencing reagent yielding amino acid derivatives with enhanced detectability by mass spectrometry.

    PubMed Central

    Aebersold, R.; Bures, E. J.; Namchuk, M.; Goghari, M. H.; Shushan, B.; Covey, T. C.

    1992-01-01

    We report the design, chemical synthesis, and structural and functional characterization of a novel reagent for protein sequence analysis by the Edman degradation, yielding amino acid derivatives rapidly detectable at high sensitivity by ion-evaporation mass spectrometry. We demonstrate that the reagent 3-[4'(ethylene-N,N,N-trimethylamino)phenyl]-2-isothiocyanate is chemically stable and shows coupling and cyclization/cleavage yields comparable to phenylisothiocyanate, the standard reagent in chemical sequence analysis, under conditions typically encountered in manual or automated sequence analysis. Amino acid derivatives generated with this reagent were detectable by ion-evaporation mass spectrometry at the subfemtomole sensitivity level at a pace of one sample per minute. Furthermore, derivatives were identified by their mass, thus permitting the rapid and highly sensitive determination of the molecular nature of modified amino acids. Derivatives of amino acids with acidic, basic, polar, or hydrophobic side chains were reproducibly detectable at comparable sensitivities. The polar nature of the reagent required covalent immobilization of polypeptides prior to automated sequence analysis. This reagent, used in automated sequence analysis, has the potential for overcoming the limitations in sensitivity, speed, and the ability to characterize modified amino acid residues inherent in the chemical sequencing methods that are currently used. PMID:1304351

  15. The amino acid sequence of rabbit cardiac troponin I.

    PubMed Central

    Grand, R J; Wilkinson, J M

    1976-01-01

    The complete amino acid sequence of troponin I from rabbit cardiac muscle was determined by the isolation of four unique CNBr fragments, together with overlapping tryptic peptides containing radioactive methionine residues. Overlap data for residues 35-36, 93-94 and 140-145 are incomplete, the sequence at these positions being based on homology with the sequence of the fast-skeletal-muscle protein. Cardiac troponin I is a single polypeptide chain of 206 residues with mol.wt. 23550 and an extinction coefficient, E(1%, 1 cm) at 280 nm, of 4.37. The protein has a net positive charge of 14 and is thus somewhat more basic than troponin I from fast-skeletal muscle. Comparison of the sequences of troponin I from cardiac and fast skeletal muscle shows that the cardiac protein has 26 extra residues at the N-terminus which account for the larger size of the protein. In the remainder of the sequence there is a considerable degree of homology, this being greater in the C-terminal two-thirds of the molecule. The region in the cardiac protein corresponding to the peptide with inhibitory activity from the fast-skeletal-muscle protein is very similar and it seems unlikely that this is the cause of the difference in inhibitory activity between the two proteins. The region responsible for binding troponin C, however, possesses a lower degree of homology. Detailed evidence on which the sequence is based has been deposited as Supplementary Publication SUP 50072 (20 pages), at the British Library Lending Division, Boston Spa, Wetherby, West Yorkshire LS23 7QB, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1976) 153, 5. PMID:1008822

  16. Amino acid sequence of a mouse immunoglobulin mu chain.

    PubMed Central

    Kehry, M; Sibley, C; Fuhrman, J; Schilling, J; Hood, L E

    1979-01-01

    The complete amino acid sequence of the mouse mu chain from the BALB/c myeloma tumor MOPC 104E is reported. The C mu region contains four consecutive homology regions of approximately 110 residues and a COOH-terminal region of 19 residues. A comparison of this mu chain from mouse with a complete mu sequence from human (Ou) and a partial mu chain sequence from dog (Moo) reveals a striking gradient of increasing homology from the NH2-terminal to the COOH-terminal portion of these mu chains, with the former being the least and the latter the most highly conserved. Four of the five sites of carbohydrate attachment appear to be at identical residue positions when the constant regions of the mouse and human mu chains are compared. The mu chain of MOPC 104E has a carbohydrate moiety attached in the second hypervariable region. This is particularly interesting in view of the fact that MOPC 104E binds alpha-(1→3)-dextran, a simple carbohydrate. The structural and functional constraints imposed by these comparative sequence analyses are discussed. PMID:111247

  17. Bacteriorhodopsin: partial sequence of mRNA provides amino acid sequence in the precursor region.

    PubMed Central

    Chang, S H; Majumdar, A; Dunn, R; Makabe, O; RajBhandary, U L; Khorana, H G; Ohtsuka, E; Tanaka, T; Taniyama, Y O; Ikehara, M

    1981-01-01

    mRNA for bacteriorhodopsin from Halobacterium halobium has been partially purified. By using this mRNA as template in the presence of reverse transcriptase (RNA-dependent DNA nucleotidyltransferase) and a 5'-[32P] synthetic oligodeoxyribonucleotide corresponding to amino acids 9-12 of bacteriorhodopsin as primer, we have isolated the major 5'-[32P]cDNA product, approximately 80 nucleotides long, and determined its sequence. Based on the cDNA sequence, the 5'-proximal sequence of bacteriorhodopsin mRNA is G-C-A-U-G-U-U-G-G-A-G-U-U-A-U-U-G-C-C-A-A-C-A-G-C-A-G-U-G-G-A-G-G-G-G-G-U-A-U-C-G-C-A-G-G-C-C-C-A-G-A-U-C-A-C-C-G-G-A-C-G-U-C-C-G. This includes the expected sequence for amino acids 1-8 and shows that bacteriorhodopsin is synthesized as a precursor that is at least 13 amino acids longer (Met-Leu-Glu-Leu-Leu-Pro-Thr-Ala-Val-Glu-Gly-Val-Ser) at the NH2 terminus. Agarose/urea gel electrophoresis of the partially purified mRNA showed several bands; of these, a major one hybridized with 5'-[32P]cDNA. These results suggest that the bacteriorhodopsin mRNA in the partially purified preparation is homogeneous in size and that it constitutes a substantial portion of the RNA preparation subjected to electrophoresis. PMID:6943548
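
    The reading of the precursor sequence from the mRNA can be reproduced with a short sketch. It translates the 5'-proximal sequence quoted in the abstract from its first AUG using the standard genetic code; the function and variable names are illustrative, and starting at the first AUG is an assumption that happens to match the 13-residue precursor reported.

```python
# Standard genetic code, indexed by bases in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR" "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# 5'-proximal mRNA sequence quoted in the abstract (hyphen separators stripped below).
RAW = (
    "G-C-A-U-G-U-U-G-G-A-G-U-U-A-U-U-G-C-C-A-A-C-A-G-C-A-G-U-G-G-A-G-G-G-G-G-U-A-U-C-"
    "G-C-A-G-G-C-C-C-A-G-A-U-C-A-C-C-G-G-A-C-G-U-C-C-G"
)
MRNA = RAW.replace("-", "").replace(" ", "")

def translate_from_first_aug(mrna):
    """Translate from the first AUG until a stop codon or the end of the fragment."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

print(translate_from_first_aug(MRNA))  # MLELLPTAVEGVSQAQITGRP
```

    The first 13 residues (MLELLPTAVEGVS) correspond to the precursor extension listed in the abstract, and the remaining eight codons cover amino acids 1-8 of the mature protein.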

  18. Relationship between peptide amino acid sequence and membrane curvature generation

    NASA Astrophysics Data System (ADS)

    Schmidt, Nathan; Kuo, David; Hwee Lai, Ghee; Mishra, Abhijit; Wong, Gerard

    2012-02-01

    Amphipathic peptides and amphipathic domains in proteins can perturb and restructure biological membranes. For example, it is believed that the cationic, amphipathic motif found in membrane-active antimicrobial peptides (AMPs) is responsible for their membrane-disrupting mechanisms of action. Similarly, ApoA-I, the main apolipoprotein in high-density lipoprotein, contains a series of amphipathic α-helical repeats which are responsible for its lipid-associating properties. We use small angle x-ray scattering (SAXS) to investigate the interaction of model cell membranes with prototypical AMPs and consensus peptides derived from the helical structural motif of ApoA-I. The relationship between peptide sequence and the peptide-induced changes in membrane curvature and topology is examined. By comparing the membrane rearrangement and corresponding phase behavior induced by these two distinct classes of membrane-restructuring peptides, we will discuss the role of amino acid sequence in membrane curvature generation.

  19. Nucleotide sequence and the encoded amino acids of human apolipoprotein A-I mRNA.

    PubMed Central

    Law, S W; Brewer, H B

    1984-01-01

    The cDNA clones encoding the precursor form of human liver apolipoprotein A-I (apoA-I), preproapoA-I, have been isolated from a cDNA library. A 17-base synthetic oligonucleotide based on residues 108-113 of apoA-I and a 26-base primer-extended, dideoxynucleotide-terminated cDNA were used as hybridization probes to select for recombinant plasmids bearing the apoA-I sequence. The complete nucleic acid sequence of human liver preproapoA-I has been determined by analysis of the cloned cDNA. The sequence is composed of 801 nucleotides encoding 267 amino acid residues. PreproapoA-I contains an 18-amino-acid prepeptide and a 6-amino-acid propeptide connected to the amino terminus of the 243-amino acid mature apoA-I. Southern blotting analysis of chromosomal DNA obtained from peripheral blood indicated the apoA-I gene is contained in a 2.1-kilobase-pair Pst I fragment and there is no gross difference in structural organization between the normal apoA-I gene and the Tangier disease apoA-I gene. PMID:6198645

  20. AcalPred: a sequence-based tool for discriminating between acidic and alkaline enzymes.

    PubMed

    Lin, Hao; Chen, Wei; Ding, Hui

    2013-01-01

    The structure and activity of enzymes are influenced by the pH of their surroundings. Although many enzymes work well in the pH range from 6 to 8, some specific enzymes have good efficiencies only in acidic (pH<5) or alkaline (pH>9) solution. Studies have demonstrated that the activities of enzymes correlate with their primary sequences. It is crucial to judge an enzyme's adaptation to an acidic or alkaline environment from its amino acid sequence, both for clarifying molecular mechanisms and for designing highly efficient enzymes. In this study, we developed a sequence-based method to discriminate acidic enzymes from alkaline enzymes. Analysis of variance was used to choose the optimal discriminating features derived from g-gap dipeptide compositions, and a support vector machine was used to establish the prediction model. In a rigorous jackknife cross-validation, an overall accuracy of 96.7% was achieved. The method correctly predicts 96.3% of acidic and 97.1% of alkaline enzymes. A comparison with previous methods demonstrates that the proposed method is more accurate. On the basis of this method, we have built an online web server called AcalPred, which can be freely accessed at http://lin.uestc.edu.cn/server/AcalPred. We believe that AcalPred will become a powerful tool for studying enzyme adaptation to acidic or alkaline environments.
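
    The g-gap dipeptide composition mentioned above can be illustrated with a small sketch. It counts residue pairs separated by g intervening positions and normalizes by the number of such pairs, giving a 400-dimensional feature vector. The function name and the handling of non-standard residues are assumptions of this example, not details taken from the AcalPred paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def g_gap_dipeptide_composition(seq, g):
    """Frequency of each residue pair (x, y) with exactly g residues between them."""
    total = len(seq) - g - 1  # number of (i, i + g + 1) position pairs
    if total <= 0:
        raise ValueError("sequence too short for this gap")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(total):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # pairs with non-standard residues are ignored (assumption)
            counts[pair] += 1
    return {pair: n / total for pair, n in counts.items()}

features = g_gap_dipeptide_composition("ACDA", g=1)
print(features["AD"], features["CA"])  # 0.5 0.5
```

    With g = 0 this reduces to the ordinary adjacent dipeptide composition; feature selection and the classifier would then operate on these vectors.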

  1. Ultrasensitive nucleic acid sequence detection by single-molecule electrophoresis

    SciTech Connect

    Castro, A; Shera, E.B.

    1996-09-01

    This is the final report of a one-year laboratory-directed research and development project at Los Alamos National Laboratory. There has been considerable interest in the development of very sensitive clinical diagnostic techniques over the last few years. Many pathogenic agents are often present in extremely small concentrations in clinical samples, especially at the initial stages of infection, making their detection very difficult. This project sought to develop a new technique for the detection and accurate quantification of specific bacterial and viral nucleic acid sequences in clinical samples. The scheme involved the use of novel hybridization probes for the detection of nucleic acids combined with our recently developed technique of single-molecule electrophoresis. This project is directly relevant to the DOE's Defense Programs strategic directions in the area of biological warfare counter-proliferation.

  2. Role of the two-component leader sequence and mature amino acid sequences in extracellular export of endoglucanase EGL from Pseudomonas solanacearum.

    PubMed Central

    Huang, J Z; Schell, M A

    1992-01-01

    The egl gene of Pseudomonas solanacearum encodes a 43-kDa extracellular endoglucanase (mEGL) involved in wilt disease caused by this phytopathogen. Egl is initially translated with a 45-residue, two-part leader sequence. The first 19 residues are apparently removed by signal peptidase II during export of Egl across the inner membrane (IM); the remaining residues of the leader sequence (modified with palmitate) are removed during export across the outer membrane (OM). Localization of Egl-PhoA fusion proteins showed that the first 26 residues of the Egl leader sequence are required and sufficient to direct lipid modification, processing, and export of Egl or PhoA across the IM but not the OM. Fusions of the complete 45-residue leader sequence or of the leader and increasing portions of mEgl sequences to PhoA did not cause its export across the OM. In-frame deletion of portions of mEGL-coding sequences blocked export of the truncated polypeptides across the OM without affecting export across the IM. These results indicate that the first part of the leader sequence functions independently to direct export of Egl across the IM while the second part and sequences and structures in mEGL are involved in export across the OM. Computer analysis of the mEgl amino acid sequence obtained from its nucleotide sequence identified a region of mEGL similar in amino acid sequence to regions in other prokaryotic endoglucanases. PMID:1735723

  3. Synthesis and use of universal sequence probes in fluorogenic multi-strand hybridisation complexes for economical nucleic acid testing.

    PubMed

    French, David J; Richardson, James A; Howard, Rebecca L; Brown, Tom; Debenham, Paul G

    2015-08-01

    Analysis of nucleic acid amplification products has become the gold standard for applications such as pathogen detection and characterisation of single nucleotide polymorphisms and short tandem repeat sequences. The development of real-time PCR and melting curve analysis using fluorescent probes has simplified nucleic acid analyses. However, the cost of probe synthesis can be prohibitive when developing large panels of tests. We describe an economic two-stage method for probe synthesis, and a new method for nucleic acid sequence analysis which together considerably reduce costs. The analysis method utilises three-strand and four-strand hybridisation complexes for the detection and identification of nucleic acid target sequences by real-time PCR and fluorescence melting. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. Complete genome sequence of Lactococcus lactis IO-1, a lactic acid bacterium that utilizes xylose and produces high levels of L-lactic acid.

    PubMed

    Kato, Hiroaki; Shiwa, Yuh; Oshima, Kenshiro; Machii, Miki; Araya-Kojima, Tomoko; Zendo, Takeshi; Shimizu-Kadota, Mariko; Hattori, Masahira; Sonomoto, Kenji; Yoshikawa, Hirofumi

    2012-04-01

    We report the complete genome sequence of Lactococcus lactis IO-1 (= JCM7638). It is a nondairy lactic acid bacterium that produces nisin Z, ferments xylose, and produces predominantly L-lactic acid at high xylose concentrations. From ortholog analysis with five other L. lactis strains, IO-1 was identified as L. lactis subsp. lactis.

  5. RSAT 2011: regulatory sequence analysis tools

    PubMed Central

    Thomas-Chollier, Morgane; Defrance, Matthieu; Medina-Rivera, Alejandra; Sand, Olivier; Herrmann, Carl; Thieffry, Denis

    2011-01-01

    RSAT (Regulatory Sequence Analysis Tools) comprises a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. Thirteen new programs have been added to the 30 described in the 2008 NAR Web Software Issue, including an automated sequence retrieval from EnsEMBL (retrieve-ensembl-seq), two novel motif discovery algorithms (oligo-diff and info-gibbs), a 100-times faster version of matrix-scan enabling the scanning of genome-scale sequence sets, and a series of facilities for random model generation and statistical evaluation (random-genome-fragments, random-motifs, random-sites, implant-sites, sequence-probability, permute-matrix). Our most recent work also focused on motif comparison (compare-matrices) and evaluation of motif quality (matrix-quality) by combining theoretical and empirical measures to assess the predictive capability of position-specific scoring matrices. To process large collections of peak sequences obtained from ChIP-seq or related technologies, RSAT provides a new program (peak-motifs) that combines several efficient motif discovery algorithms to predict transcription factor binding motifs, match them against motif databases and predict their binding sites. Availability (web site, stand-alone programs and SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services): http://rsat.ulb.ac.be/rsat/. PMID:21715389

  6. Sequence analysis of the AAA protein family.

    PubMed Central

    Beyer, A.

    1997-01-01

    The AAA protein family, a recently recognized group of Walker-type ATPases, has been subjected to an extensive sequence analysis. Multiple sequence alignments revealed the existence of a region of sequence similarity, the so-called AAA cassette. The borders of this cassette were localized and within it, three boxes of a high degree of conservation were identified. Two of these boxes could be assigned to substantial parts of the ATP binding site (namely, to Walker motifs A and B); the third may be a portion of the catalytic center. Phylogenetic trees were calculated to obtain insights into the evolutionary history of the family. Subfamilies with varying degrees of intra-relatedness could be discriminated; these relationships are also supported by analysis of sequences outside the canonical AAA boxes: within the cassette are regions that are strongly conserved within each subfamily, whereas little or even no similarity between different subfamilies can be observed. These regions are well suited to define fingerprints for subfamilies. A secondary structure prediction utilizing all available sequence information was performed and the result was fitted to the general 3D structure of a Walker A/GTPase. The agreement was unexpectedly high and strongly supports the conclusion that the AAA family belongs to the Walker superfamily of A/GTPases. PMID:9336829

  7. Molecular cloning and sequencing of the human erythrocyte 2,3-bisphosphoglycerate mutase cDNA: revised amino acid sequence.

    PubMed Central

    Joulin, V; Peduzzi, J; Roméo, P H; Rosa, R; Valentin, C; Dubart, A; Lapeyre, B; Blouquit, Y; Garel, M C; Goossens, M

    1986-01-01

    The human erythrocyte 2,3-bisphosphoglycerate mutase (BPGM) is a multifunctional enzyme which controls the metabolism of 2,3-diphosphoglycerate, the main allosteric effector of haemoglobin. Several cDNA banks were constructed from reticulocyte mRNA, either by conventional cloning methods in pBR322 and screening with specific mixed oligonucleotide probes, or in the expression vector lambda gt 11. The largest cDNA isolated contained 1673 bases [plus the poly(A) tail], which is slightly smaller than the size of the intact mRNA as estimated by Northern blot analysis (approximately 1800 bases). This cDNA encodes a protein of 258 residues; the protein yielded 34 tryptic peptides which were subsequently isolated by h.p.l.c. Our nucleotide sequence data were entirely confirmed by the amino acid composition of these tryptic peptides and reveal several major differences from the published sequence; the revised amino acid sequence of human BPGM is presented. These findings represent the first step in the study of the expression and regulation of this enzyme as a specific marker of the erythroid cell line. PMID:3023066

  8. Information theory applications for biological sequence analysis.

    PubMed

    Vinga, Susana

    2014-05-01

    Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
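
    As a concrete instance of the entropy measures the review discusses, the Shannon entropy of the k-mer distribution of a sequence can be computed as below. This is a minimal sketch; the function name and the use of overlapping k-mers are choices made for this example.

```python
import math
from collections import Counter

def shannon_entropy(seq, k=1):
    """Shannon entropy (bits) of the overlapping k-mer frequency distribution."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence shorter than k")
    n = len(kmers)
    counts = Counter(kmers)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy("ACGT"))  # 2.0 bits: four equally frequent symbols
```

    A uniform four-letter sequence reaches the 2-bit maximum, while a homopolymer scores zero; block entropies are obtained by raising k.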

  9. Nucleic acid (cDNA) and amino acid sequences of alpha-type gliadins from wheat (Triticum aestivum).

    PubMed Central

    Kasarda, D D; Okita, T W; Bernardin, J E; Baecker, P A; Nimmo, C C; Lew, E J; Dietler, M D; Greene, F C

    1984-01-01

    The complete amino acid sequence for an alpha-type gliadin protein of wheat (Triticum aestivum Linnaeus) endosperm has been derived from a cloned cDNA sequence. An additional cDNA clone that corresponds to about 75% of a similar alpha-type gliadin has been sequenced and shows some important differences. About 97% of the composite sequence of A-gliadin (an alpha-type gliadin fraction) has also been obtained by direct amino acid sequencing. This sequence shows a high degree of similarity with amino acid sequences derived from both cDNA clones and is virtually identical to one of them. On the basis of sequence information, after loss of the signal sequence, the mature alpha-type gliadins may be divided into five different domains, two of which may have evolved from an ancestral gliadin gene, whereas the remaining three contain repeating sequences that may have developed independently. PMID:6589619

  10. Coding and 3' non-coding nucleotide sequence of chalcone synthase mRNA and assignment of amino acid sequence of the enzyme

    PubMed Central

    Reimold, Ursula; Kröger, Manfred; Kreuzaler, Fritz; Hahlbrock, Klaus

    1983-01-01

    The nucleotide sequence of an almost complete cDNA copy of chalcone synthase mRNA from cultured parsley cells (Petroselinum hortense) has been determined. The cDNA copy comprised the complete coding sequence for chalcone synthase, a short A-rich stretch of the 5' non-coding region and the complete 3' non-coding region including a poly(A) tail. The amino acid sequence deduced from the nucleotide sequence of the cDNA is consistent with a partial N-terminal sequence analysis, the total amino acid composition, the cyanogen bromide cleavage pattern, and the apparent mol. wt. of the subunit of the purified enzyme. PMID:16453477

  11. CISAPS: Complex Informational Spectrum for the Analysis of Protein Sequences.

    PubMed

    Chrysostomou, Charalambos; Seker, Huseyin; Aydin, Nizamettin

    2015-01-01

    Complex informational spectrum analysis for protein sequences (CISAPS) and its web-based server are developed and presented. Recent studies show that using only the absolute spectrum in informational spectrum analysis of protein sequences is insufficient. CISAPS is therefore developed to provide results in three forms: the absolute, real, and imaginary spectra. Biologically relevant features, illustrated here with a case study on influenza A subtypes, can also appear individually in either the real or the imaginary spectrum. The results show that protein classes can exhibit similarities or differences according to the features extracted from the CISAPS web server. These associations are likely to be related to the protein property that the specific amino acid index represents. In addition, various technical issues that may affect the analysis, such as zero-padding and windowing, are addressed. CISAPS performs the analysis using an expanded list of 611 unique amino acid indices, each representing a different property. This web-based server enables researchers with little knowledge of signal processing methods to apply complex informational spectrum analysis in their work.

  12. CISAPS: Complex Informational Spectrum for the Analysis of Protein Sequences

    PubMed Central

    Seker, Huseyin; Aydin, Nizamettin

    2015-01-01

    Complex informational spectrum analysis for protein sequences (CISAPS) and its web-based server are developed and presented. Recent studies show that using only the absolute spectrum in informational spectrum analysis of protein sequences is insufficient. CISAPS is therefore developed to provide results in three forms: the absolute, real, and imaginary spectra. Biologically relevant features, illustrated here with a case study on influenza A subtypes, can also appear individually in either the real or the imaginary spectrum. The results show that protein classes can exhibit similarities or differences according to the features extracted from the CISAPS web server. These associations are likely to be related to the protein property that the specific amino acid index represents. In addition, various technical issues that may affect the analysis, such as zero-padding and windowing, are addressed. CISAPS performs the analysis using an expanded list of 611 unique amino acid indices, each representing a different property. This web-based server enables researchers with little knowledge of signal processing methods to apply complex informational spectrum analysis in their work. PMID:25632276
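
    The idea of reporting absolute, real, and imaginary spectra can be sketched as follows: residues are mapped to numbers through an amino acid index, the signal is optionally zero-padded, and a discrete Fourier transform is taken. This is only an illustrative sketch, not the CISAPS implementation; the function name, the toy two-letter index, and the plain DFT without windowing are all assumptions made here.

```python
import cmath

def complex_spectrum(seq, index, n_points=None):
    """Return (absolute, real, imaginary) DFT values for frequencies 1..n/2.

    `index` maps each residue to a numeric property value; the signal is
    zero-padded to `n_points` when that is larger than the sequence length.
    """
    signal = [index[residue] for residue in seq]
    n = max(n_points or 0, len(signal))
    signal = signal + [0.0] * (n - len(signal))  # zero-padding
    rows = []
    for k in range(1, n // 2 + 1):
        x_k = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(signal))
        rows.append((abs(x_k), x_k.real, x_k.imag))
    return rows

# Hypothetical toy index, used only to make the example checkable by hand.
toy_index = {"G": 1.0, "A": 0.0}
spectrum = complex_spectrum("GAGA", toy_index)
```

    For the alternating toy sequence, all of the energy falls at the highest frequency (the second row), and it appears entirely in the real part; in general, features can separate differently between the real and imaginary parts, which is the motivation the abstract gives.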

  13. Sequence analysis by iterated maps, a review

    PubMed Central

    2014-01-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, ‘Chaos Game Representation’. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage now appears to be set for a recasting of IMs with a central role in processing nextgen sequencing results. PMID:24162172

  14. Structural gene and complete amino acid sequence of Vibrio alginolyticus collagenase.

    PubMed Central

    Takeuchi, H; Shibano, Y; Morihara, K; Fukushima, J; Inami, S; Keil, B; Gilles, A M; Kawamoto, S; Okuda, K

    1992-01-01

    The DNA encoding the collagenase of Vibrio alginolyticus was cloned, and its complete nucleotide sequence was determined. When the cloned gene was ligated to pUC18, the Escherichia coli expression vector, bacteria carrying the gene exhibited both collagenase antigen and collagenase activity. The open reading frame from the ATG initiation codon was 2442 bp in length for the collagenase structural gene. The amino acid sequence, deduced from the nucleotide sequence, revealed that the mature collagenase consists of 739 amino acids with an Mr of 81875. The amino acid sequences of 20 polypeptide fragments were completely identical with the deduced amino acid sequences of the collagenase gene. The amino acid composition predicted from the DNA sequence was similar to the chemically determined composition of purified collagenase reported previously. The analyses of both the DNA and amino acid sequences of the collagenase gene were rigorously performed, but we could not detect any significant sequence similarity to other collagenases. PMID:1311172

  15. The primary structure of E. coli RNA polymerase. Nucleotide sequence of the rpoC gene and amino acid sequence of the beta'-subunit.

    PubMed Central

    Ovchinnikov, Yu A; Monastyrskaya, G S; Gubanov, V V; Guryev, S O; Salomatina, I S; Shuvaeva, T M; Lipkin, V M; Sverdlov, E D

    1982-01-01

    The primary structure of the E. coli rpoC gene (5321 base pairs) coding the beta'-subunit of RNA polymerase as well as its adjacent segment have been determined. The structure analysis of the peptides obtained by cleavage of the protein with cyanogen bromide and trypsin has confirmed the amino acid sequence of the beta'-subunit deduced from the nucleotide sequence analysis. The beta'-subunit of E. coli RNA polymerase contains 1407 amino acid residues. Its translation is initiated by codon GUG and terminated by codon TAA. It has been detected that the sequence following the terminating codon is strikingly homologous to known sequences of rho-independent terminators. PMID:6287430

  16. Mathematical Characterization of Protein Sequences Using Patterns as Chemical Group Combinations of Amino Acids.

    PubMed

    Das, Jayanta Kumar; Das, Provas; Ray, Korak Kumar; Choudhury, Pabitra Pal; Jana, Siddhartha Sankar

    2016-01-01

    Comparison of amino acid sequence similarity is the fundamental concept behind protein phylogenetic tree construction. With this method we can explain evolutionary relationships, but further explanations are not possible unless sequences are studied through the chemical nature of individual amino acids. Here we develop a new methodology to characterize protein sequences on the basis of the chemical nature of the amino acids. We design various algorithms for studying the variation of chemical group transitions and various chemical group combinations as patterns in the protein sequences. The amino acid sequences of the conventional myosin II head domain of 14 family members are used to illustrate this new approach. We find two blocks of maximum length 6 aa, 'FPKATD' and 'Y/FTNEKL', without repetition of the same chemical nature, and one block of maximum length 20 aa with repetition of chemical nature, which are common among all 14 members. We also check commonality with another motor protein sub-family, the kinesin KIF1A. Based on our analysis we find a common block of length 8 aa in both myosin II and KIF1A. This motif is located in the neck linker region, which could be responsible for the generation of mechanical force, enabling us to find the unique blocks which remain chemically conserved across the family. We also validate our methodology with different protein families such as MYOI, Myosin light chain kinase (MLCK) and Rho-associated protein kinase (ROCK), Na+/K+-ATPase and Ca2+-ATPase. Altogether, our studies provide a new methodology for investigating the conserved amino acid patterns in different proteins.

  17. High sequence homology between protein tyrosine acid phosphatase from boar seminal vesicles and human prostatic acid phosphatase.

    PubMed

    Wysocki, Paweł; Płucienniczak, Grazyna; Strzezek, Jerzy

    2009-01-01

    Boar seminal vesicle protein tyrosine acid phosphatase (PTAP) and human prostatic acid phosphatase (PAP) show high affinity for protein phosphotyrosine residues. The physico-chemical and kinetic properties of the boar and human enzymes are different. The main objective of this study was to establish the nucleotide sequence of cDNA encoding boar PTAP and compare it with that of human PAP cDNA. Also, the amino-acid sequence of boar PTAP was compared with the sequence of human PAP. PTAP was isolated from boar seminal vesicle fluid and sequenced. cDNA to boar seminal vesicle RNA was synthesized, amplified by PCR, cloned in E. coli and sequenced. The obtained N-terminal amino-acid sequence of boar PTAP showed 92% identity with the N-terminal amino-acid sequence of human PAP. The determined sequence of a 354 bp nucleotide fragment (GenBank accession number: GQ184596) showed 90% identity with the corresponding sequence of human PAP. On the basis of this sequence a 118 amino acid fragment of boar PTAP was predicted. This fragment showed 89% identity with the corresponding fragment of human PAP and had a similar hydropathy profile. The compared sequences differ in terms of their isoelectric points and amino-acid composition. This may explain the differences in substrate specificity and inhibitor resistance of boar PTAP and human PAP.
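
    Percent identity figures such as those quoted above are computed over an alignment of the two sequences. A minimal sketch for two already aligned, equal-length strings is shown here; counting only gap-free columns in the denominator is an assumption of this example (other conventions exist), and the function name is illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)

print(percent_identity("ACDE", "ACDF"))  # 75.0
```

    Because the choice of denominator changes the reported value, identity percentages from different studies are only comparable when the convention is stated.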

  18. Complete Amino Acid Sequence of a Copper/Zinc-Superoxide Dismutase from Ginger Rhizome.

    PubMed

    Nishiyama, Yuki; Fukamizo, Tamo; Yoneda, Kazunari; Araki, Tomohiro

    2017-04-01

    Superoxide dismutase (SOD) is an antioxidant enzyme protecting cells from oxidative stress. Ginger (Zingiber officinale) is known for its antioxidant properties, however, there are no data on SODs from ginger rhizomes. In this study, we purified SOD from the rhizome of Z. officinale (Zo-SOD) and determined its complete amino acid sequence using N terminal sequencing, amino acid analysis, and de novo sequencing by tandem mass spectrometry. Zo-SOD consists of 151 amino acids with two signature Cu/Zn-SOD motifs and has high similarity to other plant Cu/Zn-SODs. Multiple sequence alignment showed that Cu/Zn-binding residues and cysteines forming a disulfide bond, which are highly conserved in Cu/Zn-SODs, are also present in Zo-SOD. Phylogenetic analysis revealed that plant Cu/Zn-SODs clustered into distinct chloroplastic, cytoplasmic, and intermediate groups. Among them, only chloroplastic enzymes carried amino acid substitutions in the region functionally important for enzymatic activity, suggesting that chloroplastic SODs may have a function distinct from those of SODs localized in other subcellular compartments. The nucleotide sequence of the Zo-SOD coding region was obtained by reverse-translation, and the gene was synthesized, cloned, and expressed. The recombinant Zo-SOD demonstrated pH stability in the range of 5-10, which is similar to other reported Cu/Zn-SODs, and thermal stability in the range of 10-60 °C, which is higher than that for most plant Cu/Zn-SODs but lower compared to the enzyme from a Z. officinale relative Curcuma aromatica.

  19. [Measurement of the amino acid sequence for the fusion protein FP3 with LC-MS/MS].

    PubMed

    Li, Xiang; Gao, Xiang-Dong; Tao, Lei; Pei, De-Ning; Guo, Ying; Rao, Chun-Ming; Wang, Jun-Zhi

    2012-02-01

    The amino acid sequence of the fusion protein FP3 was measured by two types of LC-MS/MS and its primary structure was confirmed. After reduction and alkylation, the protein was digested with trypsin, and glycosyl groups in the glycopeptides were removed by PNGase F. The mixed peptides were separated by LC; Q-TOF and ion trap tandem mass spectrometry were then used to measure the b and y fragment ions of each peptide to analyze the amino acid sequence of fusion protein FP3. Seventy-six percent of the full amino acid sequence of the fusion protein FP3 was measured by LC-ESI-Q-TOF, with the remaining 24% completed by LC-ESI-Trap. As LC-MS and tandem mass spectrometry are rapid, sensitive, and accurate for measuring protein amino acid sequences, they are an important approach to the structural analysis and identification of recombinant proteins.

  20. [MOLECULAR EVOLUTION OF ION CHANNELS: AMINO ACID SEQUENCES AND 3D STRUCTURES].

    PubMed

    Korkosh, V S; Zhorov, B S; Tikhonov, D B

    2016-01-01

    An integral part of modern evolutionary biology is comparative analysis of structure and function of macromolecules such as proteins. The first and critical step to understand evolution of homologous proteins is their amino acid sequence alignment. However, standard algorithms do not provide unambiguous sequence alignments for proteins of poor homology. More reliable results can be obtained by comparing experimental 3D structures obtained at atomic resolution, for instance, with the aid of X-ray structural analysis. If such structures are lacking, homology modeling is used, which may take into account indirect experimental data on functional roles of individual amino-acid residues. An important problem is that the sequence alignment, which reflects genetic modifications, does not necessarily correspond to the functional homology. The latter depends on three-dimensional structures which are critical for natural selection. Since alignment techniques relying only on the analysis of primary structures carry no information on the functional properties of proteins, including 3D structures into consideration is very important. Here we consider several examples involving ion channels and demonstrate that alignment of their three-dimensional structures can significantly improve sequence alignments obtained by traditional methods.

  1. Giant panda ribosomal protein S14: cDNA, genomic sequence cloning, sequence analysis, and overexpression.

    PubMed

    Wu, G-F; Hou, Y-L; Hou, W-R; Song, Y; Zhang, T

    2010-10-13

    RPS14 is a component of the 40S ribosomal subunit encoded by the RPS14 gene and is required for its maturation. The cDNA and the genomic sequence of RPS14 were cloned successfully from the giant panda (Ailuropoda melanoleuca) using RT-PCR technology and touchdown-PCR, respectively; they were both sequenced and analyzed. The length of the cloned cDNA fragment was 492 bp; it contained an open-reading frame of 456 bp, encoding 151 amino acids. The length of the genomic sequence is 3421 bp; it contains four exons and three introns. Alignment analysis indicates that the nucleotide sequence shares a high degree of homology with those of Homo sapiens, Bos taurus, Mus musculus, Rattus norvegicus, Gallus gallus, Xenopus laevis, and Danio rerio (93.64, 83.37, 92.54, 91.89, 87.28, 84.21, and 84.87%, respectively). Comparison of the deduced amino acid sequences of the giant panda with those of these other species revealed that the RPS14 of giant panda is highly homologous with those of B. taurus, R. norvegicus and D. rerio (85.99, 99.34 and 99.34%, respectively), and is 100% identical with the others. This degree of conservation of RPS14 suggests evolutionary selection. Topology prediction shows that there are two N-glycosylation sites, three protein kinase C phosphorylation sites, two casein kinase II phosphorylation sites, four N-myristoylation sites, two amidation sites, and one ribosomal protein S11 signature in the RPS14 protein of the giant panda. The RPS14 gene can be readily expressed in Escherichia coli. When it was fused with the N-terminally His-tagged protein, it gave rise to accumulation of an expected 22-kDa polypeptide, in good agreement with the predicted molecular weight. The expression product obtained can be purified for studies of its function.
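
    The relation between the reported open-reading-frame length and protein length can be checked with simple codon arithmetic. A minimal sketch, assuming the stated ORF length includes the terminal stop codon; the function name residues_from_orf is hypothetical:

```python
def residues_from_orf(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of the given length in base pairs."""
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    # The stop codon is counted in the ORF length but encodes no residue.
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    # 456 bp is the ORF length reported in the abstract above.
    print(residues_from_orf(456))
```

    For the 456 bp ORF this gives 152 codons, one of which is the stop codon, consistent with the 151 amino acids reported.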

  2. Complete amino acid sequence of the A chain of human complement-classical-pathway enzyme C1r.

    PubMed Central

    Arlaud, G J; Willis, A C; Gagnon, J

    1987-01-01

    The amino acid sequence of human C1r A chain was determined, from sequence analysis performed on fragments obtained from C1r autolytic cleavage, cleavage of methionyl bonds, tryptic cleavages at arginine and lysine residues, and cleavages by staphylococcal proteinase. The polypeptide chain has an N-terminal serine residue and contains 446 amino acid residues (Mr 51,200). The sequence data allow chemical characterization of fragments alpha (positions 1-211), beta (positions 212-279) and gamma (positions 280-446) yielded from C1r autolytic cleavage, and identification of the two major cleavage sites generating these fragments. Position 150 of C1r A chain is occupied by a modified amino acid residue that, upon acid hydrolysis, yields erythro-beta-hydroxyaspartic acid, and that is located in a sequence homologous to the beta-hydroxyaspartic acid-containing regions of Factor IX, Factor X, protein C and protein Z. Sequence comparison reveals internal homology between two segments (positions 10-78 and 186-257). Two carbohydrate moieties are attached to the polypeptide chain, both via asparagine residues at positions 108 and 204. Combined with the previously determined sequence of C1r B chain [Arlaud & Gagnon (1983) Biochemistry 22, 1758-1764], these data give the complete sequence of human C1r. PMID:3036070

  3. Common recognition principles across diverse sequence and structural families of sialic acid binding proteins.

    PubMed

    Bhagavat, Raghu; Chandra, Nagasuma

    2014-01-01

    Sialic acids form a large family of 9-carbon monosaccharides and are integral components of glycoconjugates. They are known to bind to a wide range of receptors belonging to diverse sequence families and fold classes and are key mediators in a plethora of cellular processes. Thus, it is of great interest to understand the features that give rise to such a recognition capability. Structural analyses using a non-redundant data set of known sialic acid binding proteins were carried out, including exhaustive binding site comparisons and site alignments using in-house algorithms, followed by clustering and tree computation, which led to the derivation of sialic acid recognition principles. Although the proteins in the data set belong to several sequence and structure families, their binding sites could be grouped into only six types. Structural comparison of the binding sites indicates that all sites contain one or more different combinations of key structural features over a common scaffold. The six binding site types thus serve as structural motifs for recognizing sialic acid. Scanning the motifs against a non-redundant set of binding sites from PDB indicated the motifs to be specific for sialic acid recognition. Knowledge of determinants obtained from this study will be useful for detecting function in unknown proteins. As an example analysis, a genome-wide scan for the motifs in structures of Mycobacterium tuberculosis proteome identified 17 hits that contain combinations of the features, suggesting a possible function of sialic acid binding by these proteins.

  4. Nucleotide sequences of the Pseudomonas savastanoi indoleacetic acid genes show homology with Agrobacterium tumefaciens T-DNA

    PubMed Central

    Yamada, Tetsuji; Palm, Curtis J.; Brooks, Bob; Kosuge, Tsune

    1985-01-01

    We report the nucleotide sequences of iaaM and iaaH, the genetic determinants for, respectively, tryptophan 2-monooxygenase and indoleacetamide hydrolase, the enzymes that catalyze the conversion of L-tryptophan to indoleacetic acid in the tumor-forming bacterium Pseudomonas syringae pv. savastanoi. The sequence analysis indicates that the iaaM locus contains an open reading frame encoding 557 amino acids that would comprise a protein with a molecular weight of 61,783; the iaaH locus contains an open reading frame of 455 amino acids that would comprise a protein with a molecular weight of 48,515. Significant amino acid sequence homology was found between the predicted sequence of the tryptophan monooxygenase of P. savastanoi and the deduced product of the T-DNA tms-1 gene of the octopine-type plasmid pTiA6NC from Agrobacterium tumefaciens. Strong homology was found in the 25 amino acid sequence in the putative FAD-binding region of tryptophan monooxygenase. Homology was also found in the amino acid sequences representing the central regions of the putative products of iaaH and tms-2 T-DNA. The results suggest a strong similarity in the pathways for indoleacetic acid synthesis encoded by genes in P. savastanoi and in A. tumefaciens T-DNA. PMID:16593610

  5. Cloning and sequence analysis of an actin gene in aloe.

    PubMed

    Wen, S S; He, D W; Liao, C M; Li, J; Wen, G Q; Liu, X H

    2014-07-04

    Aloe (Aloe spp), containing abundant polysaccharides and numerous bioactive ingredients, has remarkable medical, ornamental, calleidic, and edible values. In the present study, the total RNA was extracted from aloe leaf tissue. The isolated high-quality RNA was further used to clone actin gene by using reverse transcription-polymerase chain reaction (RT-PCR). The result of sequence analysis for the amplified fragment revealed that the cloned actin gene was 1012 bp in length (GenBank accession No. KC751541.1) and contained a 924-bp coding region and encoded a protein consisting of 307 amino acids. Homologous alignment showed that it shared over 80 and 96% identity with the nucleotide and amino acid sequences of actin from other plants, respectively. In addition, the cloned gene was used for phylogenetic analyses based on the deduced amino acid sequences, and the results suggested that the actin gene is highly conserved in evolution. The findings of this study will be useful for investigating the expression patterns of other genes in Aloe.

  6. The `heavy' subunit of the photosynthetic reaction centre from Rhodopseudomonas viridis: isolation of the gene, nucleotide and amino acid sequence

    PubMed Central

    Michel, H.; Weyer, K. A.; Gruenberg, H.; Lottspeich, F.

    1985-01-01

    The gene coding for the `heavy' subunit of the photosynthetic reaction centre from Rhodopseudomonas viridis was isolated in an expression vector. Expression of the heavy subunit in Escherichia coli was detected with antibodies raised against crystalline reaction centres. The entire subunit, and not a fusion protein, was expressed in E. coli. The protein coding region of the gene was sequenced and the amino acid sequence derived. Part of the amino acid sequence was confirmed by chemical sequence analysis of the protein. The heavy subunit consists of 258 amino acids and its mol. wt. is 28 345. It possesses one membrane-spanning α-helical segment, as was revealed by the concomitant X-ray structure analysis. PMID:16453623

  7. Reticuloendotheliosis Virus Nucleic Acid Sequences in Cellular DNA

    PubMed Central

    Kang, Chil-Yong; Temin, Howard M.

    1974-01-01

    Reticuloendotheliosis virus 60S RNA labeled with 125I, or reticuloendotheliosis virus complementary DNA labeled with 3H, were hybridized to DNAs from infected chicken and pheasant cells. Most of the sequences of the viral RNA were found in the infected cell DNAs. The reticuloendotheliosis viruses, therefore, replicate through a DNA intermediate. The same labeled nucleic acids were hybridized to DNA of uninfected chicken, pheasant, quail, turkey, and duck. About 10% of the sequences of reticuloendotheliosis virus RNA were present in the DNA of uninfected chicken, pheasant, quail, and turkey. None were detected in DNA of duck. The specificity of the hybridization was shown by competition between unlabeled and 125I-labeled viral RNAs and by determination of melting temperatures. In contrast, 125I-labeled RNA of Rous-associated virus-O, an avian leukosis-sarcoma virus, hybridized 55% to DNA of uninfected chicken, 20% to DNA of uninfected pheasant, 15% to DNA of uninfected quail, 10% to DNA of uninfected turkey, and less than 1% to DNA of uninfected duck. PMID:4372393

  8. Nucleic acid (cDNA) and amino acid sequences of the maize endosperm protein glutelin-2.

    PubMed Central

    Prat, S; Cortadas, J; Puigdomènech, P; Palau, J

    1985-01-01

    The cDNA coding for a glutelin-2 protein from maize endosperm has been cloned and the complete amino acid sequence of the protein derived for the first time. An immature maize endosperm cDNA bank was screened for the expression of a beta-lactamase:glutelin-2 (G2) fusion polypeptide by using antibodies against the purified 28 kd G2 protein. A clone corresponding to the 28 kd G2 protein was sequenced and the primary structure of this protein was derived. Five regions can be defined in the protein sequence: an 11 residue N-terminal part, a repeated region formed by eight units of the sequence Pro-Pro-Pro-Val-His-Leu, an alternating Pro-X stretch 21 residues long, a Cys rich domain and a C-terminal part rich in Gln. The protein sequence is preceded by 19 residues which have the characteristics of the signal peptide found in secreted proteins. Unlike zeins, the main maize storage proteins, 28 kd glutelin-2 has several homologous sequences in common with other cereal storage proteins. PMID:3839076

  9. NexGen Production – Sequencing and Analysis

    SciTech Connect

    Muzny, Donna

    2010-06-02

    Donna Muzny of the Baylor College of Medicine Human Genome Sequencing Center discusses next generation sequencing platforms and evaluating pipeline performance on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  10. Multiple site-selective insertions of non-canonical amino acids into sequence-repetitive polypeptides

    PubMed Central

    Wu, I-Lin; Patterson, Melissa A.; Carpenter Desai, Holly E.; Mehl, Ryan A.; Giorgi, Gianluca

    2013-01-01

    A simple and efficient method is described for introduction of non-canonical amino acids at multiple, structurally defined sites within recombinant polypeptide sequences. E. coli MRA30, a bacterial host strain with attenuated activity for release factor 1 (RF1), is assessed for its ability to support the incorporation of a diverse range of non-canonical amino acids in response to multiple encoded amber (TAG) codons within genetic templates derived from superfolder GFP and an elastin-mimetic protein polymer. Suppression efficiency and isolated protein yield were observed to depend on the identity of the orthogonal aminoacyl-tRNA synthetase/tRNACUA pair and the non-canonical amino acid substrate. This approach afforded elastin-mimetic protein polymers containing non-canonical amino acid derivatives at up to twenty-two positions within the repeat sequence with high levels of substitution. The identity and position of the variant residues were confirmed by mass spectrometric analysis of the full-length polypeptides and proteolytic cleavage fragments resulting from thermolysin digestion. The accumulated data suggest that this multi-site suppression approach permits the preparation of protein-based materials in which novel chemical functionality can be introduced at precisely defined positions within the polypeptide sequence. PMID:23625817

  11. The complete amino acid sequence of a trypsin inhibitor from Bauhinia variegata var. candida seeds.

    PubMed

    Di Ciero, L; Oliva, M L; Torquato, R; Köhler, P; Weder, J K; Camillo Novello, J; Sampaio, C A; Oliveira, B; Marangoni, S

    1998-11-01

    Trypsin inhibitors of two varieties of Bauhinia variegata seeds have been isolated and characterized. Bauhinia variegata candida trypsin inhibitor (BvcTI) and B. variegata lilac trypsin inhibitor (BvlTI) are proteins with Mr of about 20,000 without free sulfhydryl groups. Amino acid analysis shows a high content of aspartic acid, glutamic acid, serine, and glycine, and a low content of histidine, tyrosine, methionine, and lysine in both inhibitors. Isoelectric focusing for both varieties detected three isoforms (pI 4.85, 5.00, and 5.15), which were resolved by HPLC procedure. The trypsin inhibitors show Ki values of 6.9 and 1.2 nM for BvcTI and BvlTI, respectively. The N-terminal sequences of the three trypsin inhibitor isoforms from both varieties of Bauhinia variegata and the complete amino acid sequence of B. variegata var. candida L. trypsin inhibitor isoform 3 (BvcTI-3) are presented. The sequences have been determined by automated Edman degradation of the reduced and carboxymethylated proteins of the peptides resulting from Staphylococcus aureus protease and trypsin digestion. BvcTI-3 is composed of 167 residues and has a calculated molecular mass of 18,529. Homology studies with other trypsin inhibitors show that BvcTI-3 belongs to the Kunitz family. The putative active site encompasses Arg (63)-Ile (64).

  12. Deduced amino acid sequence of human pulmonary surfactant proteolipid: SPL(pVal)

    SciTech Connect

    Whitsett, J.A.; Glasser, S.W.; Korfhagen, T.R.; Weaver, T.E.; Clark, J.; Pilot-Matias, T.; Meuth, J.; Fox, J.L.

    1987-05-01

    A hydrophobic, proteolipid-like protein of Mr 6500 was isolated from ether/ethanol extracts of human, canine and bovine pulmonary surfactant. Amino acid composition of the protein demonstrated a remarkable abundance of hydrophobic residues, particularly valine and leucine. The N-terminal amino acid sequence of the human protein was determined: N-Leu-Ile-Pro-Cys-Cys-Pro-Val-Asn-Leu-Lys-Arg-Leu-Leu-Ile-Val4... An oligonucleotide probe was used to screen an adult human lung cDNA library and resulted in detection of cDNA clones with predicted amino acid sequence with close identity to the N-terminal amino acid sequence of the human peptide. SPL(pVal) was found within the reading frame of a larger peptide. SPL(pVal) results from proteolytic processing of a larger preprotein. Northern blot analysis detected a single 1.0 kilobase SPL(pVal) RNA which was less abundant in fetal than in adult lung. Mixtures of purified canine and bovine SPL(pVal) and synthetic phospholipids display properties of rapid adsorption and surface tension lowering activity characteristic of surfactant. Human SPL(pVal) is a pulmonary surfactant proteolipid which may therefore be useful in combination with phospholipids and/or other surfactant proteins for the treatment of surfactant deficiency such as hyaline membrane disease in newborn infants.

  13. Predicting protein amidation sites by orchestrating amino acid sequence features

    NASA Astrophysics Data System (ADS)

    Zhao, Shuqiu; Yu, Hua; Gong, Xiujun

    2017-08-01

    Amidation is the fourth major category of post-translational modifications and plays an important role in physiological and pathological processes. Identifying amidation sites can help us understand amidation and recognize the underlying causes of many kinds of diseases. However, traditional experimental methods for identifying amidation sites are often time-consuming and expensive. In this study, we propose a computational method for predicting amidation sites by orchestrating amino acid sequence features. Three kinds of feature extraction methods are used to build a feature vector that captures not only the physicochemical properties but also the position-related information of the amino acids. An extremely randomized trees algorithm is applied in a supervised fashion to choose the optimal features, removing redundancy and dependence among components of the feature vector. Finally, a support vector machine classifier is used to label the amidation sites. When tested on an independent data set, the proposed method performs better than all previous ones, with a prediction accuracy of 0.962, a Matthews correlation coefficient of 0.89 and an area under the curve of 0.964.
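
    The accuracy and Matthews correlation coefficient quoted above are both derived from a binary confusion matrix. A minimal sketch of the two standard formulas; the function names and the toy counts are assumptions for illustration and are not data from the study:

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Toy counts only.
    print(accuracy(tp=45, tn=50, fp=0, fn=5))
    print(mcc(tp=45, tn=50, fp=0, fn=5))
```

    Unlike accuracy, the coefficient stays near zero for a classifier that simply predicts the majority class, which is why it is often reported alongside accuracy for imbalanced site-prediction data.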

  14. Multilevel analysis of sports video sequences

    NASA Astrophysics Data System (ADS)

    Han, Jungong; Farin, Dirk; de With, Peter H. N.

    2006-01-01

    We propose a fully automatic and flexible framework for analysis and summarization of tennis broadcast video sequences, using visual features and specific game-context knowledge. Our framework can analyze a tennis video sequence at three levels, which provides a broad range of different analysis results. The proposed framework includes novel pixel-level and object-level tennis video processing algorithms, such as a moving-player detection taking both the color and the court (playing-field) information into account, and a player-position tracking algorithm based on a 3-D camera model. Additionally, we employ scene-level models for detecting events, like service, base-line rally and net-approach, based on a number of real-world visual features. The system can summarize three forms of information: (1) all court-view playing frames in a game, (2) the moving trajectory and real-speed of each player, as well as relative position between the player and the court, (3) the semantic event segments in a game. The proposed framework is flexible in choosing the level of analysis that is desired. It is effective because the framework makes use of several visual cues obtained from the real-world domain to model important events like service, thereby increasing the accuracy of the scene-level analysis. The paper presents attractive experimental results highlighting the system efficiency and analysis capabilities.

  15. Complete nucleic acid sequence of Penaeus stylirostris densovirus (PstDNV) from India.

    PubMed

    Rai, Praveen; Safeena, Muhammed P; Karunasagar, Iddya; Karunasagar, Indrani

    2011-06-01

    Infectious hypodermal and hematopoietic necrosis virus (IHHNV) of shrimp has recently been classified as Penaeus stylirostris densovirus (PstDNV). The complete nucleic acid sequence of PstDNV from India was obtained by cloning and sequencing different DNA fragments of the virus. The genome organisation of PstDNV revealed that there were three major coding domains: a left ORF (NS1) of 2001 bp, a mid ORF (NS2) of 1092 bp and a right ORF (VP) of 990 bp. The complete genome and amino acid sequences of three proteins viz., NS1, NS2 and VP were compared with the genomes of the virus reported from Hawaii, China and Mexico and with partial sequences available from isolates from different regions. The phylogenetic analysis of shrimp, insect and vertebrate parvovirus sequences showed that the Indian PstDNV isolate is phylogenetically more closely related to one of the three isolates from Taiwan (AY355307), and two isolates (AY362547 and AY102034) from Thailand.

  16. Molecular cloning and amino acid sequence of human plakoglobin, the common junctional plaque protein

    SciTech Connect

    Franke, W.W.; Goldschmidt, M.D.; Zimbelmann, R.; Mueller, H.M.; Schiller, D.L.; Cowin, P.

    1989-06-01

    Plakoglobin is a major cytoplasmic protein that occurs in a soluble and a membrane-associated form and is the only known constituent common to the submembranous plaques of both kinds of adhering junctions, the desmosomes and the intermediate junctions. Using a partial cDNA clone for bovine plakoglobin, the authors isolated cDNAs encoding human plakoglobin, determined its nucleotide sequence, and deduced the complete amino acid sequence. The polypeptide encoded by the cDNA was synthesized by in vitro transcription and translation and identified by its comigration with authentic plakoglobin in two-dimensional gel electrophoresis. The identity was further confirmed by comparison of the deduced sequence with the directly determined amino acid sequence of two fragments from bovine plakoglobin. Analysis of the plakoglobin sequence showed the protein to be unrelated to any other known proteins, highly conserved between human and bovine tissues, and characterized by numerous changes between hydrophilic and hydrophobic sections. Only one kind of plakoglobin mRNA was found in most tissues, but an additional mRNA was detected in certain human tumor cell lines. This longer mRNA may be represented by a second type of plakoglobin cDNA, which contains an insertion of 297 nucleotides in the 3′ noncoding region.

  17. Complete amino acid sequence of chicken liver acyl carrier protein derived from the fatty acid synthase.

    PubMed

    Huang, W Y; Stoops, J K; Wakil, S J

    1989-04-01

    The acyl carrier protein domain of the chicken liver fatty acid synthase has been isolated after tryptic treatment of the synthase. The isolated domain functions as an acceptor of acetyl and malonyl moieties in the synthase-catalyzed transfer of these groups from their coenzyme A esters and therefore indicates that the acyl carrier protein domain exists in the complex as a discrete entity. The amino acid sequence of the acyl carrier protein was derived from analyses of peptide fragments produced by cyanogen bromide cleavage and trypsin and Staphylococcus aureus V8 protease digestions of the molecule. The isolated acyl carrier protein domain consists of 89 amino acid residues and has a calculated molecular weight of 10,127. The protein contains the phosphopantetheine group attached to the serine residue at position 38. The isolated acyl carrier protein peptide shows some sequence homology with the acyl carrier protein of Escherichia coli, particularly in the vicinity of the site of phosphopantetheine attachment, and shows extensive sequence homology with the acyl carrier protein from the uropygial gland of goose.

  18. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  19. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  20. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  1. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2012-07-01 2012-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  2. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  3. ANTICALIgN: visualizing, editing and analyzing combined nucleotide and amino acid sequence alignments for combinatorial protein engineering.

    PubMed

    Jarasch, Alexander; Kopp, Melanie; Eggenstein, Evelyn; Richter, Antonia; Gebauer, Michaela; Skerra, Arne

    2016-07-01

    ANTICALIgN is an interactive software developed to simultaneously visualize, analyze and modify alignments of DNA and/or protein sequences that arise during combinatorial protein engineering, design and selection. ANTICALIgN combines powerful functions known from currently available sequence analysis tools with unique features for protein engineering, in particular the possibility to display and manipulate nucleotide sequences and their translated amino acid sequences at the same time. ANTICALIgN offers both template-based multiple sequence alignment (MSA), using the unmutated protein as reference, and conventional global alignment, to compare sequences that share an evolutionary relationship. The application of similarity-based clustering algorithms facilitates the identification of duplicates or of conserved sequence features among a set of selected clones. Imported nucleotide sequences from DNA sequence analysis are automatically translated into the corresponding amino acid sequences and displayed, offering numerous options for selecting reading frames, highlighting of sequence features and graphical layout of the MSA. The MSA complexity can be reduced by hiding the conserved nucleotide and/or amino acid residues, thus putting emphasis on the relevant mutated positions. ANTICALIgN is also able to handle suppressed stop codons or even to incorporate non-natural amino acids into a coding sequence. We demonstrate crucial functions of ANTICALIgN in an example of Anticalins selected from a lipocalin random library against the fibronectin extradomain B (ED-B), an established marker of tumor vasculature. Apart from engineered protein scaffolds, ANTICALIgN provides a powerful tool in the area of antibody engineering and for directed enzyme evolution. © The Author 2016. Published by Oxford University Press.
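
    Translating a nucleotide sequence in a chosen reading frame, with optional handling of a suppressed amber (TAG) stop codon, can be sketched as follows. This is not ANTICALIgN code: the function translate, its amber_as parameter and the example sequence are hypothetical, and only the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0, amber_as=None):
    """Translate DNA in reading frame 0, 1 or 2.

    Stop codons are shown as '*'. If amber_as is given, TAG codons are
    written as that letter instead, mimicking a suppressed amber stop.
    Codons containing unrecognized bases are shown as 'X'.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    sequence = dna.upper()
    protein = []
    # Trailing bases that do not fill a complete codon are ignored.
    for start in range(frame, len(sequence) - 2, 3):
        codon = sequence[start:start + 3]
        if codon == "TAG" and amber_as is not None:
            protein.append(amber_as)
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


if __name__ == "__main__":
    example = "ATGGCCTAGAAA"  # toy sequence containing one amber codon
    for frame in (0, 1, 2):
        print(frame, translate(example, frame=frame))
    print(translate(example, amber_as="Q"))
```

    Showing all three forward frames side by side is a simple way to spot frameshifts in sequenced clones; which residue a suppressed amber codon actually encodes depends on the suppressor strain or tRNA used.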

  4. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genomes of five L. helveticus strains were sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  5. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  6. Mathematical Characterization of Protein Sequences Using Patterns as Chemical Group Combinations of Amino Acids

    PubMed Central

    Choudhury, Pabitra Pal; Jana, Siddhartha Sankar

    2016-01-01

    Comparison of amino acid sequence similarity is the fundamental concept behind the protein phylogenetic tree formation. By virtue of this method, we can explain the evolutionary relationships, but further explanations are not possible unless sequences are studied through the chemical nature of individual amino acids. Here we develop a new methodology to characterize the protein sequences on the basis of the chemical nature of the amino acids. We design various algorithms for studying the variation of chemical group transitions and various chemical group combinations as patterns in the protein sequences. The amino acid sequence of conventional myosin II head domain of 14 family members are taken to illustrate this new approach. We find two blocks of maximum length 6 aa as ‘FPKATD’ and ‘Y/FTNEKL’ without repeating the same chemical nature and one block of maximum length 20 aa with the repetition of chemical nature which are common among all 14 members. We also check commonality with another motor protein sub-family kinesin, KIF1A. Based on our analysis we find a common block of length 8 aa both in myosin II and KIF1A. This motif is located in the neck linker region which could be responsible for the generation of mechanical force, enabling us to find the unique blocks which remain chemically conserved across the family. We also validate our methodology with different protein families such as MYOI, Myosin light chain kinase (MLCK) and Rho-associated protein kinase (ROCK), Na+/K+-ATPase and Ca2+-ATPase. Altogether, our studies provide a new methodology for investigating the conserved amino acids’ pattern in different proteins. PMID:27930687

  7. Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

    PubMed

    Tan, M K

    1991-08-01

    A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.

  8. Sequence analysis of myostatin promoter in cattle.

    PubMed

    Crisà, A; Marchitelli, C; Savarese, M C; Valentini, A

    2003-01-01

    Myostatin (GDF8) acts as a negative regulator of muscle growth. Mutations in the gene are responsible for the double muscling phenotype in several European cattle breeds. Here we describe the sequence of the upstream 5' region of the myostatin gene. The sequence analysis was carried out on three animals of nine European cattle breeds, with the aim to search for polymorphisms. A T/A polymorphism at -371 and a G/C polymorphism at -805 (relative to ATG) were found. PCR-RFLP was used to further screen 353 animals of the nine breeds studied and to assess the frequencies of the SNPs. The promoter region of the gene contains several binding sites for transcription factors found also in other myogenic genes. This may play an important role in the regulation of the protein and consequently on muscular development.

  9. Computer selection of oligonucleotide probes from amino acid sequences for use in gene library screening.

    PubMed

    Yang, J H; Ye, J H; Wallace, D C

    1984-01-11

    We present a computer program, FINPROBE, which utilizes known amino acid sequence data to deduce minimum redundancy oligonucleotide probes for use in screening cDNA or genomic libraries or in primer extension. The user enters the amino acid sequence of interest, the desired probe length, the number of probes sought, and the constraints on oligonucleotide synthesis. The computer generates a table of possible probes listed in increasing order of redundancy and provides the location of each probe in the protein and mRNA coding sequence. Activation of a next function provides the amino acid and mRNA sequences of each probe of interest as well as the complementary sequence and the minimum dissociation temperature of the probe. A final routine prints out the amino acid sequence of the protein in parallel with the mRNA sequence listing all possible codons for each amino acid.
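
    The ranking idea in this abstract, listing candidate probes in increasing order of redundancy, can be sketched as follows. This is not FINPROBE's algorithm, which the abstract does not specify in detail. It is a simplified illustration that assumes the standard genetic code, defines redundancy as the product of the number of codons per residue, and ignores synthesis constraints. The name `rank_probe_windows` is hypothetical.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def rank_probe_windows(protein, n_residues, top=3):
    """Rank peptide windows by the redundancy of their back-translation.

    Returns up to `top` tuples (redundancy, start_index, peptide), sorted
    by increasing redundancy. A window of n residues corresponds to an
    oligonucleotide of 3 * n bases.
    """
    windows = []
    for start in range(len(protein) - n_residues + 1):
        peptide = protein[start:start + n_residues]
        redundancy = prod(CODON_COUNT[aa] for aa in peptide)
        windows.append((redundancy, start, peptide))
    return sorted(windows)[:top]


if __name__ == "__main__":
    for redundancy, start, peptide in rank_probe_windows("MWLSR", 2, top=2):
        print(peptide, "at residue", start + 1, "redundancy", redundancy)
```

    Windows rich in methionine and tryptophan rank first because each has a single codon, while leucine, serine and arginine (six codons each) push a window down the list.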

  10. Designing novel kinases using evolutionary sequence analysis

    NASA Astrophysics Data System (ADS)

    Mody, Areez; Weiner, Joan; Iyer, Lakshman; Ramanathan, Sharad

    2006-03-01

    Cellular pathways with new functions are thought to arise from the duplication and divergence of proteins in existing pathways. The MAP kinase pathways in eukaryotes provide one example of this. These pathways consist of the MAP kinase proteins which are responsible for evoking the correct response to external stimuli. In the yeast Saccharomyces cerevisiae these pathways detect pheromones, osmolar stresses and nutrient levels, leading the cell into dramatic changes of morphology. Despite being homologous to each other, the MAP kinase proteins show specificity of function. We investigate the nature of the amino acid sequences conferring this specificity. To this end, we i) search the sequences of similar proteins in other eukaryote species, ii) make a study of simple theoretical models exploring the constraints felt by these protein segments and iii) experimentally construct a large suite of hybrid proteins made of segments taken from the homologous proteins. These are then expressed in yeast cells to see what function they are able to perform. In particular, we also ask whether it is possible to design a new kinase protein possessing new function and specificity.

  11. Novel Numerical Characterization of Protein Sequences Based on Individual Amino Acid and Its Application

    PubMed Central

    Zhang, Yan-ping; Sheng, Ya-jun; He, Ping-an; Ruan, Ji-shuo

    2015-01-01

    The hydrophobicity and hydrophilicity of amino acids play a very important role in protein folding and its interaction with the environment and other molecules, as well as its catalytic mechanism. Based on the two physicochemical indexes, a 2D graphical representation of protein sequences is introduced; meanwhile, a new numerical characteristic has been proposed to compute the distance of different sequences for analysis of sequence similarity/dissimilarity on the basis of this graphical representation. Furthermore, we apply the new distance in the similarities/dissimilarities of ND5 proteins of nine species and predict the four major classes based on the dataset containing 639 domains. The results show that the method is simple and effective. PMID:25705698

  12. Amino acid sequence similarity between rabies virus glycoprotein and snake venom curaremimetic neurotoxins.

    PubMed

    Lentz, T L; Wilson, P T; Hawrot, E; Speicher, D W

    1984-11-16

    Evidence was presented earlier that a host-cell receptor for the highly neurotropic rabies virus might be the acetylcholine receptor. The amino acid sequence of the glycoprotein of rabies virus was compared by computer analysis with that of snake venom curaremimetic neurotoxins, potent ligands of the acetylcholine receptor. A statistically significant sequence relation was found between a segment of the rabies glycoprotein and the entire sequence of long neurotoxins. The greatest identity occurs with residues considered most important in neurotoxicity, including those interacting with the acetylcholine binding site of the acetylcholine receptor. Because of the similarity between the glycoprotein and the receptor-binding region of the neurotoxins, this region of the viral glycoprotein may function as a recognition site for the acetylcholine receptor. Direct binding of the rabies virus glycoprotein to the acetylcholine receptor could contribute to the neurotropism of this virus.

  13. Castor Bean Organelle Genome Sequencing and Worldwide Genetic Diversity Analysis

    PubMed Central

    Chan, Agnes P.; Williams, Amber L.; Rice, Danny W.; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M. J.; Khouri, Hoda M.; Beckstrom-Sternberg, Stephen M.; Allan, Gerard J.; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D.

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade. PMID:21750729

  14. Castor bean organelle genome sequencing and worldwide genetic diversity analysis.

    PubMed

    Rivarola, Maximo; Foster, Jeffrey T; Chan, Agnes P; Williams, Amber L; Rice, Danny W; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M J; Khouri, Hoda M; Beckstrom-Sternberg, Stephen M; Allan, Gerard J; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade.

  15. Nucleotide and deduced amino acid sequences of rat myosin binding protein H (MyBP-H).

    PubMed

    Jung, J; Oh, J; Lee, K

    1998-12-01

    The complete nucleotide sequence of the cDNA clone encoding rat skeletal muscle myosin-binding protein H (MyBP-H) was determined and the amino acid sequence was deduced from the nucleotide sequence (GenBank accession number AF077338). The full-length cDNA of 1782 base pairs (bp) contains a single open reading frame of 1454 bp encoding a rat MyBP-H protein of predicted molecular mass 52.7 kDa and includes the common consensus 'CA__TG' protein binding motif. The cDNA sequence of rat MyBP-H shows 92%, 84% and 41% homology with those of mouse, human and chicken, respectively. The protein contains a tandem array of internal motifs (-FN III-Ig C2-FN III-Ig C2-) in the C-terminal region which resemble the immunoglobulin superfamily C2 and fibronectin type III motifs. The amino acid sequence of the C-terminal Ig C2 was highly conserved among the MyBP family and other thick filament binding proteins, suggesting that the C-terminal Ig C2 might play an important role in its function. All proteins belonging to the MyBP-H group contain an 'RKPS' sequence which is assumed to be a cAMP- and cGMP-dependent protein kinase A phosphorylation site. Computer analysis of the primary sequence of rat MyBP-H predicted 11 protein kinase C (PKC) phosphorylation sites, 7 casein kinase II (CK2) phosphorylation sites and 4 N-myristoylation sites.

  16. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  17. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  18. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  19. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  20. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  1. Integrating Sequence Evolution into Probabilistic Orthology Analysis.

    PubMed

    Ullah, Ikram; Sjöstrand, Joel; Andersson, Peter; Sennblad, Bengt; Lagergren, Jens

    2015-11-01

    Orthology analysis, that is, finding out whether a pair of homologous genes are orthologs - stemming from a speciation - or paralogs - stemming from a gene duplication - is of central importance in computational biology, genome annotation, and phylogenetic inference. In particular, an orthologous relationship makes functional equivalence of the two genes highly likely. A major approach to orthology analysis is to reconcile a gene tree to the corresponding species tree, (most commonly performed using the most parsimonious reconciliation, MPR). However, most such phylogenetic orthology methods infer the gene tree without considering the constraints implied by the species tree and, perhaps even more importantly, only allow the gene sequences to influence the orthology analysis through the a priori reconstructed gene tree. We propose a sound, comprehensive Bayesian Markov chain Monte Carlo-based method, DLRSOrthology, to compute orthology probabilities. It efficiently sums over the possible gene trees and jointly takes into account the current gene tree, all possible reconciliations to the species tree, and the, typically strong, signal conveyed by the sequences. We compare our method with PrIME-GEM, a probabilistic orthology approach built on a probabilistic duplication-loss model, and MrBayesMPR, a probabilistic orthology approach that is based on conventional Bayesian inference coupled with MPR. We find that DLRSOrthology outperforms these competing approaches on synthetic data as well as on biological data sets and is robust to incomplete taxon sampling artifacts.

  2. Multilocus sequence analysis (MLSA) in prokaryotic taxonomy.

    PubMed

    Glaeser, Stefanie P; Kämpfer, Peter

    2015-06-01

    To obtain a higher resolution of the phylogenetic relationships of species within a genus or genera within a family, multilocus sequence analysis (MLSA) is currently a widely used method. In MLSA studies, partial sequences of genes coding for proteins with conserved functions ('housekeeping genes') are used to generate phylogenetic trees and subsequently deduce phylogenies. However, MLSA is not only suggested as a phylogenetic tool to support and clarify the resolution of bacterial species with a higher resolution, as in 16S rRNA gene-based studies, but has also been discussed as a replacement for DNA-DNA hybridization (DDH) in species delineation. Nevertheless, despite the fact that MLSA has become an accepted and widely used method in prokaryotic taxonomy, no common generally accepted recommendations have been devised to date for either the whole area of microbial taxonomy or for taxa-specific applications of individual MLSA schemes. The different ways MLSA is performed can vary greatly for the selection of genes, their number, and the calculation method used when comparing the sequences obtained. Here, we provide an overview of the historical development of MLSA and critically review its current application in prokaryotic taxonomy by highlighting the advantages and disadvantages of the method's numerous variations. This provides a perspective for its future use in forthcoming genome-based genotypic taxonomic analyses.

  3. Vitellogenin, a biomarker for environmental estrogenic pollution, of Reeves' pond turtles: analysis of similarity for its amino acid sequence and cognate mRNA expression after exposure to estrogen.

    PubMed

    Tada, Noriko; Nakao, Aya; Hoshi, Hidenobu; Saka, Masahiro; Kamata, Yoichi

    2008-03-01

    Vitellogenin (VTG), a biomarker for environmental estrogenic pollution, can be detected in the bloodstream of oviparous animals before morphological and functional abnormalities appear due to exposure to environmental estrogens. Reports observing VTG in turtles have been limited. We therefore cloned and sequenced a partial cDNA of VTG in Reeves' pond turtle, Chinemys reevesii. The cloned cDNA fragment possessed the start codon and 2,229 bp, encoding 743 amino acid residues. A sequence of deduced amino acid from the cDNA did not contain a high serine content, such as that which exists in phosvitin. Two N-glycosylation sites were found in the sequence. The sequence was compared to those of two birds (chicken and herring gull), one amphibian (Xenopus), and five fishes (carp, zebrafish, eel, haddock, and red seabream). The C. reevesii VTG was similar to that of herring gull (78%, value of positives), chicken (76%), Xenopus (69%), eel (63%), red seabream (62%), haddock (62%), carp (62%), and zebrafish (61%). The phylogenetic tree showed that C. reevesii VTG existed between the amphibian and birds, and it was present far from fish VTGs. A reverse transcription-polymerase chain reaction method was employed to detect the mRNA expression of the C. reevesii VTG through the use of primers designed from our sequence. The VTG mRNA expression (292 bp) was proven in the total RNA extraction from the liver of the juvenile turtles which were treated with estradiol-17beta. The information herein would be useful for ecotoxicological studies using freshwater turtles and these findings are expected to contribute positively towards wildlife conservation.

  4. Sequence analysis of the 47-kilodalton major integral membrane immunogen of Treponema pallidum.

    PubMed Central

    Hsu, P L; Chamberlain, N R; Orth, K; Moomaw, C R; Zhang, L Q; Slaughter, C A; Radolf, J D; Sell, S; Norgard, M V

    1989-01-01

    The complete primary amino acid sequence for the 47-kilodalton (kDa) major integral membrane immunogen of Treponema pallidum subsp. pallidum was obtained by using a combined strategy of DNA sequencing (of the cloned gene in Escherichia coli) and N-terminal amino acid sequencing of the native (T. pallidum subsp. pallidum-derived) antigen. An open reading frame believed to encode the 47-kDa antigen comprised 367 amino acid codons, which gave rise to a calculated molecular weight for the corresponding antigen of 40,701. Of the 367 amino acids, 113 (31%) were sequenced by N-terminal amino acid sequencing of trypsin and hydroxylamine cleavage fragments of the native molecule isolated from T. pallidum subsp. pallidum; amino acid sequence data had a 100% correlation with that of the amino acid sequence predicted from DNA sequencing of the cloned gene in E. coli. Although no consensus sequences for the initiation of transcription or translation were readily identifiable immediately 5' to the putative methionine start codon, a 63-base-pair PstI fragment located 159 nucleotides upstream was required for expression of the 47-kDa antigen in E. coli. The 47-kDa antigen sequence did not reveal a typical leader sequence. The overall G+C content for the DNA corresponding to the structural gene was 53%. Hydrophilicity analysis identified at least one major hydrophilic domain of the protein near the N terminus of the molecule which potentially represents an immunodominant epitope. No repetitive primary sequence epitopes were found. The combined data provide the molecular basis for further structural and functional studies regarding the role of the antigen in the immunopathogenesis of treponemal disease. PMID:2642466

  5. Gene sequence and predicted amino acid sequence of the motA protein, a membrane-associated protein required for flagellar rotation in Escherichia coli.

    PubMed Central

    Dean, G E; Macnab, R M; Stader, J; Matsumura, P; Burks, C

    1984-01-01

    The motA and motB gene products of Escherichia coli are integral membrane proteins necessary for flagellar rotation. We determined the DNA sequence of the region containing the motA gene and its promoter. Within this sequence, there is an open reading frame of 885 nucleotides, which with high probability (98% confidence level) meets criteria for a coding sequence. The 295-residue amino acid translation product had a molecular weight of 31,974, in good agreement with the value determined experimentally by gel electrophoresis. The amino acid sequence, which was quite hydrophobic, was subjected to a theoretical analysis designed to predict membrane-spanning alpha-helical segments of integral membrane proteins; four such hydrophobic helices were predicted by this treatment. Additional amphipathic helices may also be present. A remarkable feature of the sequence is the existence of two segments of high uncompensated charge density, one positive and the other negative. Possible organization of the protein in the membrane is discussed. Asymmetry in the amino acid composition of translated DNA sequences was used to distinguish between two possible initiation codons. The use of this method as a criterion for authentication of coding regions is described briefly in an Appendix. PMID:6090403
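
    Two quantitative points in this abstract lend themselves to a short illustration: the open reading frame arithmetic (885 nucleotides divided by 3 gives 295 residues) and the search for hydrophobic stretches that could span the membrane. The abstract does not name the prediction method used, so the sketch below substitutes a plain sliding-window average over the Kyte-Doolittle hydropathy scale. The 19-residue window and the 1.6 threshold are commonly cited heuristics for that scale and are assumptions here, as are the function names.

```python
# Kyte-Doolittle hydropathy values (swapped-in scale; not the paper's method).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def residues_from_orf(n_nucleotides):
    """Number of residues encoded by an ORF whose length is a multiple of 3."""
    if n_nucleotides % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return n_nucleotides // 3


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_spans(profile, threshold=1.6):
    """Start indices of windows whose mean hydropathy exceeds the threshold."""
    return [i for i, mean in enumerate(profile) if mean > threshold]


if __name__ == "__main__":
    print(residues_from_orf(885))
    toy = "L" * 19 + "D" * 19  # artificial sequence, for illustration only
    print(candidate_spans(hydropathy_profile(toy)))
```

    Adjacent window starts that pass the threshold would normally be merged into one candidate segment; that step is left out to keep the sketch short.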

  6. The complete amino acid sequence of growth hormone of an elasmobranch, the blue shark (Prionace glauca).

    PubMed

    Yamaguchi, K; Yasuda, A; Lewis, U J; Yokoo, Y; Kawauchi, H

    1989-02-01

    The complete amino acid sequence of growth hormone (GH) from a phylogenetically ancient fish, the blue shark (Prionace glauca), was determined. The shark GH isolated from pituitary glands by U. J. Lewis, R. N. P. Singh, B. K. Seavey, R. Lasker, and G. E. Pickford (1972, Fish. Bull. 70, 933-939) was purified by reversed-phase high-performance liquid chromatography. The hormone was reduced, carboxymethylated, and subsequently cleaved in turn with cyanogen bromide and Staphylococcus aureus protease. The intact protein was also cleaved with lysyl endopeptidase and o-iodosobenzoic acid. The resulting peptide fragments were separated by rpHPLC and submitted to sequence analysis by automated and manual Edman methods. The shark GH consists of 183 amino acid residues with a calculated molecular weight of 21,081. Sequence comparisons revealed that the elasmobranch GH is considerably more similar to tetrapod GHs (e.g., 68% identity with sea turtle GH, 63% with chicken GH, and 58% with ovine GH) than teleostean GHs (e.g., 38% identities with salmon GH and 42% with bonito GH) except for eel GH (61% identity), and substantiates the earlier finding derived from the immunochemical and biological studies (Hayashida and Lewis, 1978) that the primitive fish are less diverged from the main line of vertebrate evolution leading to the tetrapod than are the modern bony fish.
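
    The percent identities quoted in this abstract come from pairwise sequence comparisons. A minimal sketch of how such a figure can be computed from two pre-aligned sequences is given below. The abstract does not state how gaps were treated, so the choice here (count only columns where neither sequence has a gap) is an assumption, and `percent_identity` is a hypothetical helper name.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns in which either sequence has a gap are excluded from both the
    numerator and the denominator (one of several conventions in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences for illustration; not taken from the paper.
    print(round(percent_identity("ACDEFG", "ACDQFG"), 1))
```

    Other conventions divide by the full alignment length or by the length of the shorter sequence, so reported identities are only comparable when the denominator is stated.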

  7. Complete amino acid sequences of three proteinase inhibitors from white sword bean (Canavalia gladiata).

    PubMed

    Park, S S; Sumi, T; Ohba, H; Nakamura, O; Kimura, M

    2000-10-01

    Three major serine proteinase inhibitors (SBI-1, -2, and -3) were purified from the seeds of white sword bean (Canavalia gladiata) by FPLC and reversed-phase HPLC. The sequences of these inhibitors were established by automatic Edman degradation and TOF-mass spectrometry. SBI-1, -2, and -3 consisted of 72, 73, and 75 amino acid residues, with molecular masses of 7806.5, 7919.8, and 8163.4, respectively. The sequences of SBI-1 and -2 coincided with those of CLT I and II [Terada et al. (1994) Biosci. Biotech. Biochem., 58, 376-379] except for the N- or C-terminal amino acid residues. Analysis of the amino acid sequences showed that the active sites of the inhibitors contained Lys21-Ser22 against trypsin and Leu48-Ser49 against chymotrypsin, respectively. Further, it became apparent that about seven disulfide bonds were present. These results suggest that sword bean inhibitors are members of the Bowman-Birk proteinase inhibitor family.

  8. Random Amino Acid Mutations and Protein Misfolding Lead to Shannon Limit in Sequence-Structure Communication

    PubMed Central

    Lisewski, Andreas Martin

    2008-01-01

    The transmission of genomic information from coding sequence to protein structure during protein synthesis is subject to stochastic errors. To analyze transmission limits in the presence of spurious errors, Shannon's noisy channel theorem is applied to a communication channel between amino acid sequences and their structures established from a large-scale statistical analysis of protein atomic coordinates. While Shannon's theorem confirms that in close to native conformations information is transmitted with limited error probability, additional random errors in sequence (amino acid substitutions) and in structure (structural defects) trigger a decrease in communication capacity toward a Shannon limit at 0.010 bits per amino acid symbol at which communication breaks down. In several controls, simulated error rates above a critical threshold and models of unfolded structures always produce capacities below this limiting value. Thus an essential biological system can be realistically modeled as a digital communication channel that is (a) sensitive to random errors and (b) restricted by a Shannon error limit. This forms a novel basis for predictions consistent with observed rates of defective ribosomal products during protein synthesis, and with the estimated excess of mutual information in protein contact potentials. PMID:18769673

  9. Whole-Genome Sequence Analysis of Bombella intestini LMG 28161T, a Novel Acetic Acid Bacterium Isolated from the Crop of a Red-Tailed Bumble Bee, Bombus lapidarius

    PubMed Central

    Li, Leilei; Illeghems, Koen; Van Kerrebroeck, Simon; Borremans, Wim; Cleenwerck, Ilse; Smagghe, Guy; De Vuyst, Luc

    2016-01-01

    The whole-genome sequence of Bombella intestini LMG 28161T, an endosymbiotic acetic acid bacterium (AAB) occurring in bumble bees, was determined to investigate the molecular mechanisms underlying its metabolic capabilities. The draft genome sequence of B. intestini LMG 28161T was 2.02 Mb. Metabolic carbohydrate pathways were in agreement with the metabolite analyses of fermentation experiments and revealed its oxidative capacity towards sucrose, D-glucose, D-fructose and D-mannitol, but not ethanol and glycerol. The results of the fermentation experiments also demonstrated that the lack of effective aeration in small-scale carbohydrate consumption experiments may be responsible for the lack of reproducibility of such results in taxonomic studies of AAB. Finally, compared to the genome sequences of its nearest phylogenetic neighbor and of three other insect associated AAB strains, the B. intestini LMG 28161T genome lost 69 orthologs and included 89 unique genes. Although many of the latter were hypothetical they also included several type IV secretion system proteins, amino acid transporter/permeases and membrane proteins which might play a role in the interaction with the bumble bee host. PMID:27851750

  10. FAST: FAST Analysis of Sequences Toolbox

    PubMed Central

    Lawrence, Travis J.; Kauffman, Kyle T.; Amrine, Katherine C. H.; Carper, Dana L.; Lee, Raymond S.; Becich, Peter J.; Canales, Claudia J.; Ardell, David H.

    2015-01-01

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought. PMID:26042145
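    The grep-like and cut-like operations on Multi-FastA records described above can be illustrated with a minimal Python sketch. This is not FAST's Perl implementation and does not reproduce the command syntax of fasgrep or fascut; the function names `parse_fasta`, `grep_records` and `cut_sites` and the example data are hypothetical, chosen only to show the idea of selecting records by a pattern on the description and extracting a range of sites.

```python
import re
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from Multi-FastA text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def grep_records(text: str, pattern: str) -> List[Tuple[str, str]]:
    """Keep records whose header matches a regular expression (hypothetical helper)."""
    rx = re.compile(pattern)
    return [(h, s) for h, s in parse_fasta(text) if rx.search(h)]


def cut_sites(seq: str, start: int, end: int) -> str:
    """Return sites start..end of a sequence, 1-based and inclusive (hypothetical helper)."""
    return seq[start - 1:end]


# Made-up example data, for illustration only.
EXAMPLE = """>seq1 Homo sapiens
ACGTACGT
ACGT
>seq2 Mus musculus
TTGACA
"""

if __name__ == "__main__":
    for header, seq in grep_records(EXAMPLE, r"sapiens"):
        print(header, cut_sites(seq, 2, 4))
```

    Keeping selection and extraction as separate small functions mirrors the composable, Textutils-like style the abstract describes, in which simple tools are chained into a workflow.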

  11. Bayesian Correlation Analysis for Sequence Count Data.

    PubMed

    Sánchez-Taltavull, Daniel; Ramachandran, Parameswaran; Lau, Nelson; Perkins, Theodore J

    2016-01-01

    Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.

  12. Bayesian Correlation Analysis for Sequence Count Data

    PubMed Central

    Lau, Nelson; Perkins, Theodore J.

    2016-01-01

    Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities’ measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low—especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities’ signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset. PMID:27701449

  13. The amino acid sequence of the aspartate aminotransferase from baker's yeast (Saccharomyces cerevisiae).

    PubMed Central

    Cronin, V B; Maras, B; Barra, D; Doonan, S

    1991-01-01

    1. The single (cytosolic) aspartate aminotransferase was purified in high yield from baker's yeast (Saccharomyces cerevisiae). 2. Amino-acid-sequence analysis was carried out by digestion of the protein with trypsin and with CNBr; some of the peptides produced were further subdigested with Staphylococcus aureus V8 proteinase or with pepsin. Peptides were sequenced by the dansyl-Edman method and/or by automated gas-phase methods. The amino acid sequence obtained was complete except for a probable gap of two residues as indicated by comparison with the structures of counterpart proteins in other species. 3. The N-terminus of the enzyme is blocked. Fast-atom-bombardment m.s. was used to identify the blocking group as an acetyl one. 4. Alignment of the sequence of the enzyme with those of vertebrate cytosolic and mitochondrial aspartate aminotransferases and with the enzyme from Escherichia coli showed that about 25% of residues are conserved between these distantly related forms. 5. Experimental details and confirmatory data for the results presented here are given in a Supplementary Publication (SUP 50164, 25 pages) that has been deposited at the British Library Document Supply Centre, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1991) 273, 5. PMID:1859361

  14. CAFE: aCcelerated Alignment-FrEe sequence analysis.

    PubMed

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

    2017-05-03

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
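    As a rough illustration of what a dissimilarity "based solely on the k-mer frequencies" looks like, the following Python sketch counts k-mers and compares two sequences by a cosine-style dissimilarity of their count vectors. It is not CAFE's code and is not one of the CVTree, $d_2^*$ or $d_2^S$ measures, which additionally model background word frequencies; the function names and the choice of measure are assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_dissimilarity(a: str, b: str, k: int) -> float:
    """1 - cosine similarity of the two k-mer count vectors (illustrative measure)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Only k-mers present in both sequences contribute to the inner product.
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("both sequences must be at least k bases long")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_dissimilarity("ACGTACGTAC", "ACGTTCGTAC", k=3))
```

    A pairwise matrix of such values over a set of genomes is the kind of input that tree-building or ordination steps, such as the dendrogram and principal coordinate analysis mentioned in the abstract, would consume.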

  15. Sequence analysis of porothramycin biosynthetic gene cluster.

    PubMed

    Najmanova, Lucie; Ulanova, Dana; Jelinkova, Marketa; Kamenik, Zdenek; Kettnerova, Eliska; Koberska, Marketa; Gazak, Radek; Radojevic, Bojana; Janata, Jiri

    2014-11-01

    The biosynthetic gene cluster of porothramycin, a sequence-selective DNA alkylating compound, was identified in the genome of producing strain Streptomyces albus subsp. albus (ATCC 39897) and sequentially characterized. A 39.7 kb long DNA region contains 27 putative genes, 18 of them revealing high similarity with homologous genes from biosynthetic gene cluster of closely related pyrrolobenzodiazepine (PBD) compound anthramycin. However, considering the structures of both compounds, the number of differences in the gene composition of compared biosynthetic gene clusters was unexpectedly high, indicating participation of alternative enzymes in biosynthesis of both porothramycin precursors, anthranilate, and branched L-proline derivative. Based on the sequence analysis of putative NRPS modules Por20 and Por21, we suppose that in porothramycin biosynthesis, the methylation of anthranilate unit occurs prior to the condensation reaction, while modifications of branched proline derivative, oxidation, and dimethylation of the side chain occur on already condensed PBD core. Corresponding two specific methyltransferase encoding genes por26 and por25 were identified in the porothramycin gene cluster. Surprisingly, also methyltransferase gene por18 homologous to orf19 from anthramycin biosynthesis was detected in porothramycin gene cluster even though the appropriate biosynthetic step is missing, as suggested by ultra high-performance liquid chromatography-diode array detection-mass spectrometry (UHPLC-DAD-MS) analysis of the product in the S. albus culture broth.

  16. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  17. Efficient algorithms for molecular sequence analysis.

    PubMed Central

    Karlin, S; Morris, M; Ghandour, G; Leung, M Y

    1988-01-01

    Efficient (linear time) algorithms are described for identifying global molecular sequence features allowing for errors including repeats, matches between sequences, dyad symmetry pairings, and other sequence patterns. A multiple sequence alignment algorithm is also described. Specific applications are given to hepatitis B viruses and the J5-C (J, joining; C, constant) region of the immunoglobulin kappa gene. PMID:3124111

  18. The complete amino acid sequence of ubiquitin, an adenylate cyclase stimulating polypeptide probably universal in living cells.

    PubMed

    Schlesinger, D H; Goldstein, G; Niall, H D

    1975-05-20

    The complete amino acid sequence was determined for bovine ubiquitin, an adenylate cyclase stimulating polypeptide, which is probably represented universally in living cells. Ubiquitin has a molecular weight of 8451 and consists of a single polypeptide chain containing 74 amino acid residues. It contains four arginine residues but no cysteine or tryptophan residues. The first 61 amino acid residues were obtained by automated Edman degradations. Tryptic digestion of maleated ubiquitin yielded four peptide fragments that were resolved by molecular sieve chromatography and coded in order of decreasing chain length (MT-1, MT-2, MT-3, and MT-4). The automated sequenator determinations on native ubiquitin provided overlapping sequence data for three of these fragments that gave an order of MT-1, MT-3, and then MT-2; peptide MT-4, a dipeptide, was therefore assigned to the C terminus, and the placement of peptide MT-2 was corroborated by analysis of data from carboxypeptidase digestions of maleated ubiquitin. Peptide MT-2 was demaleated and sequenced by manual Edman degradations through a single lysine residue. It was cleaved at this residue with trypsin, and the two resultant peptides were separated by ion-exchange chromatography. Manual sequencing of the C-terminal demaleated tryptic peptide of MT-2 completed the sequence of MT-2 and that of native ubiquitin. The sequence of ubiquitin was further confirmed and supported by amino acid and partial sequence analysis of fragments obtained by digestion of maleated ubiquitin with chymotrypsin or staphylococcal protease.

  19. Complete amino acid sequence of a histidine-rich proteolytic fragment of human ceruloplasmin.

    PubMed

    Kingston, I B; Kingston, B L; Putnam, F W

    1979-04-01

    The complete amino acid sequence has been determined for a fragment of human ceruloplasmin [ferroxidase; iron(II):oxygen oxidoreductase, EC 1.16.3.1]. The fragment (designated Cp F5) contains 159 amino acid residues and has a molecular weight of 18,650; it lacks carbohydrate, is rich in histidine, and contains one free cysteine that may be part of a copper-binding site. This fragment is present in most commercial preparations of ceruloplasmin, probably owing to proteolytic degradation, but can also be obtained by limited cleavage of single-chain ceruloplasmin with plasmin. Cp F5 probably is an intact domain attached to the COOH-terminal end of single-chain ceruloplasmin via a labile interdomain peptide bond. A model of the secondary structure predicted by empirical methods suggests that almost one-third of the amino acid residues are distributed in alpha helices, about a third in beta-sheet structure, and the remainder in beta turns and unidentified structures. Computer analysis of the amino acid sequence has not demonstrated a statistically significant relationship between this ceruloplasmin fragment and any other protein, but there is some evidence for an internal duplication.

  20. The ABRF Edman Sequencing Research Group 2008 Study: Investigation into Homopolymeric Amino Acid N-Terminal Sequence Tags and Their Effects on Automated Edman Degradation

    PubMed Central

    Thoma, R. S.; Smith, J. S.; Sandoval, W.; Leone, J. W.; Hunziker, P.; Hampton, B.; Linse, K. D.; Denslow, N. D.

    2009-01-01

    The Edman Sequence Research Group (ESRG) of the Association of Biomolecular Resource Facilities designs and executes interlaboratory studies investigating the use of automated Edman degradation for protein and peptide analysis. In 2008, the ESRG enlisted the help of core sequencing facilities to investigate the effects of a repeating amino acid tag at the N-terminus of a protein. Commonly, to facilitate protein purification, an affinity tag containing a polyhistidine sequence is conjugated to the N-terminus of the protein. After expression, polyhistidine-tagged protein is readily purified via chelation with an immobilized metal affinity resin. The addition of the polyhistidine tag presents unique challenges for the determination of protein identity using Edman degradation chemistry. Participating laboratories were asked to sequence one protein engineered in three configurations: with an N-terminal polyhistidine tag; with an N-terminal polyalanine tag; or with no tag. Study participants were asked to return a data file containing the uncorrected amino acid picomole yields for the first 17 cycles. Initial and repetitive yield (R.Y.) information and the amount of lag were evaluated. Information about instrumentation and sample treatment was also collected as part of the study. For this study, the majority of participating laboratories successfully called the amino acid sequence for 17 cycles for all three test proteins. In general, laboratories found it more difficult to call the sequence containing the polyhistidine tag. Lag was observed earlier and more consistently with the polyhistidine-tagged protein than the polyalanine-tagged protein. Histidine yields were significantly less than the alanine yields in the tag portion of each analysis. The polyhistidine and polyalanine protein-R.Y. calculations were found to be equivalent. These calculations showed that the nontagged portion from each protein was equivalent. The terminal histidines from the tagged portion of the protein

  1. Multilocus sequence analysis of the family Halomonadaceae.

    PubMed

    de la Haba, Rafael R; Márquez, M Carmen; Papke, R Thane; Ventosa, Antonio

    2012-03-01

    Multilocus sequence analysis (MLSA) protocols have been developed for species circumscription for many taxa. However, at present, no studies based on MLSA have been performed within any moderately halophilic bacterial group. To test the usefulness of MLSA with these kinds of micro-organisms, the family Halomonadaceae, which includes mainly halophilic bacteria, was chosen as a model. This family comprises ten genera with validly published names and 85 species of environmental, biotechnological and clinical interest. In some cases, the phylogenetic relationships between members of this family, based on 16S rRNA gene sequence comparisons, are not clear and a deep phylogenetic analysis using several housekeeping genes seemed appropriate. Here, MLSA was applied using the 16S rRNA, 23S rRNA, atpA, gyrB, rpoD and secA genes for species of the family Halomonadaceae. Phylogenetic trees based on the individual and concatenated gene sequences revealed that the family Halomonadaceae formed a monophyletic group of micro-organisms within the order Oceanospirillales. With the exception of the genera Halomonas and Modicisalibacter, all other genera within this family were phylogenetically coherent. Five of the six studied genes (16S rRNA, 23S rRNA, gyrB, rpoD and secA) showed a consistent evolutionary history. However, the results obtained with the atpA gene were different; thus, this gene may not be considered useful as an individual gene phylogenetic marker within this family. The phylogenetic methods produced variable results, with those generated from the maximum-likelihood and neighbour-joining algorithms being more similar than those obtained by maximum-parsimony methods. Horizontal gene transfer (HGT) plays an important evolutionary role in the family Halomonadaceae; however, the impact of recombination events in the phylogenetic analysis was minimized by concatenating the six loci, which agreed with the current taxonomic scheme for this family. Finally, the findings of

  2. Position-dependent effects of locked nucleic acid (LNA) on DNA sequencing and PCR primers

    PubMed Central

    Levin, Joshua D.; Fiala, Dean; Samala, Meinrado F.; Kahn, Jason D.; Peterson, Raymond J.

    2006-01-01

    Genomes are becoming heavily annotated with important features. Analysis of these features often employs oligonucleotides that hybridize at defined locations. When the defined location lies in a poor sequence context, traditional design strategies may fail. Locked Nucleic Acid (LNA) can enhance oligonucleotide affinity and specificity. Though LNA has been used in many applications, formal design rules are still being defined. To further this effort we have investigated the effect of LNA on the performance of sequencing and PCR primers in AT-rich regions, where short primers yield poor sequencing reads or PCR yields. LNA was used in three positional patterns: near the 5′ end (LNA-5′), near the 3′ end (LNA-3′) and distributed throughout (LNA-Even). Quantitative measures of sequencing read length (Phred Q30 count) and real-time PCR signal (cycle threshold, CT) were characterized using two-way ANOVA. LNA-5′ increased the average Phred Q30 score by 60% and it was never observed to decrease performance. LNA-5′ generated cycle thresholds in quantitative PCR that were comparable to high-yielding conventional primers. In contrast, LNA-3′ and LNA-Even did not improve read lengths or CT. ANOVA demonstrated the statistical significance of these results and identified significant interaction between the positional design rule and primer sequence. PMID:17071964
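    The "Phred Q30 count" used above as a read-length measure rests on the standard Phred definition Q = -10 log10(P), where P is the estimated probability that a base call is wrong, so Q30 corresponds to an error probability of 0.001. A minimal Python sketch of counting bases at or above Q30 follows. It assumes, purely for illustration, that quality values are available as a Sanger-style FastQ quality string (ASCII offset 33); the study itself does not state its file format, and the function names are hypothetical.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into Phred scores (offset 33 is an assumption)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: float) -> float:
    """Invert Q = -10 * log10(P) to get the base-call error probability."""
    return 10 ** (-q / 10)


def q30_count(quality: str, offset: int = 33, threshold: int = 30) -> int:
    """Number of bases whose Phred score is at least the threshold."""
    return sum(1 for q in phred_scores(quality, offset) if q >= threshold)


if __name__ == "__main__":
    # 'I' encodes Q40, '?' encodes Q30, '5' encodes Q20, '!' encodes Q0.
    print(q30_count("II?5!"))
```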

  3. Repetitive sequence analysis and karyotyping reveals centromere-associated DNA sequences in radish (Raphanus sativus L.).

    PubMed

    He, Qunyan; Cai, Zexi; Hu, Tianhua; Liu, Huijun; Bao, Chonglai; Mao, Weihai; Jin, Weiwei

    2015-04-18

    Radish (Raphanus sativus L., 2n = 2x = 18) is a major root vegetable crop especially in eastern Asia. Radish root contains various nutrients which play an important role in strengthening immunity. Repetitive elements are primary components of the genomic sequence and the most important factors in genome size variations in higher eukaryotes. To date, studies about repetitive elements of radish are still limited. To better understand genome structure of radish, we undertook a study to evaluate the proportion of repetitive elements and their distribution in radish. We conducted genome-wide characterization of repetitive elements in radish with low coverage genome sequencing followed by similarity-based cluster analysis. Results showed that about 31% of the genome was composed of repetitive sequences. Satellite repeats were the most dominant elements of the genome. The distribution pattern of three satellite repeat sequences (CL1, CL25, and CL43) on radish chromosomes was characterized using fluorescence in situ hybridization (FISH). CL1 was predominantly located at the centromeric region of all chromosomes, CL25 located at the subtelomeric region, and CL43 was a telomeric satellite. FISH signals of two satellite repeats, CL1 and CL25, together with 5S rDNA and 45S rDNA, provide useful cytogenetic markers to identify each individual somatic metaphase chromosome. The centromere-specific histone H3 (CENH3) has been used as a marker to identify centromere DNA sequences. One putative CENH3 (RsCENH3) was characterized and cloned from radish. Its deduced amino acid sequence shares high similarities to those of the CENH3s in Brassica species. An antibody against B. rapa CENH3 specifically stained radish centromeres. Immunostaining and chromatin immunoprecipitation (ChIP) tests with anti-BrCENH3 antibody demonstrated that both the centromere-specific retrotransposon (CR-Radish) and satellite repeat (CL1) are directly associated with RsCENH3 in radish. Proportions

  4. DNA methylation detection: bisulfite genomic sequencing analysis.

    PubMed

    Li, Yuanyuan; Tollefsbol, Trygve O

    2011-01-01

    DNA methylation, which most commonly occurs at the C5 position of cytosines within CpG dinucleotides, plays a pivotal role in many biological processes such as gene expression, embryonic development, cellular proliferation, differentiation, and chromosome stability. Aberrant DNA methylation is often associated with loss of DNA homeostasis and genomic instability leading to the development of human diseases such as cancer. The importance of DNA methylation creates an urgent demand for effective methods with high sensitivity and reliability to explore innovative diagnostic and therapeutic strategies. Bisulfite genomic sequencing developed by Frommer and colleagues was recognized as a revolution in DNA methylation analysis based on conversion of genomic DNA by using sodium bisulfite. Besides various merits of the bisulfite genomic sequencing method such as being highly qualitative and quantitative, it serves as a fundamental principle to many derived methods to better interpret the mystery of DNA methylation. Here, we present a protocol frequently used in our laboratory that has proven to yield optimal outcomes. We also discuss the potential technical problems and troubleshooting notes for a variety of applications in this field.
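    The principle behind bisulfite sequencing is that sodium bisulfite deaminates unmethylated cytosine to uracil, which is read as thymine after amplification, while 5-methylcytosine is left unchanged. A minimal Python sketch of that logic is given below. It is an in-silico illustration only, not the laboratory protocol the abstract refers to; it assumes complete conversion, a read already aligned base-for-base to the reference, and hypothetical function names.

```python
from typing import Dict, Set


def bisulfite_convert(seq: str, methylated_positions: Set[int]) -> str:
    """Simulate conversion: unmethylated C reads as T, methylated C stays C.

    Positions are 0-based indices into seq (an assumption of this sketch).
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_cpg_methylation(reference: str, read: str) -> Dict[int, bool]:
    """For each CpG in the reference, report whether the read retains C (methylated)."""
    if len(reference) != len(read):
        raise ValueError("read must be aligned to the reference without gaps")
    ref, rd = reference.upper(), read.upper()
    calls: Dict[int, bool] = {}
    for i in range(len(ref) - 1):
        if ref[i:i + 2] == "CG":
            if rd[i] == "C":
                calls[i] = True    # C retained: methylated
            elif rd[i] == "T":
                calls[i] = False   # C converted: unmethylated
            # Any other base is left uncalled (sequencing error or variant).
    return calls


if __name__ == "__main__":
    reference = "ACGTCGAC"
    read = bisulfite_convert(reference, methylated_positions={1})
    print(read, call_cpg_methylation(reference, read))
```

    Comparing each read against the unconverted reference in this way is what makes the method quantitative: the fraction of reads retaining C at a given CpG estimates the methylation level at that site.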

  5. Whole-Genome Sequencing in Outbreak Analysis

    PubMed Central

    Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

    2015-01-01

    SUMMARY In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  6. SBSPKS: structure based sequence analysis of polyketide synthases

    PubMed Central

    Anand, Swadha; Prasad, M. V. R.; Yadav, Gitanjali; Kumar, Narendra; Shehara, Jyoti; Ansari, Md. Zeeshan; Mohanty, Debasisa

    2010-01-01

    Polyketide synthases (PKSs) catalyze biosynthesis of a diverse family of pharmaceutically important secondary metabolites. Bioinformatics analysis of sequence and structural features of PKS proteins plays a crucial role in discovery of new natural products by genome mining, as well as in design of novel secondary metabolites by biosynthetic engineering. The availability of the crystal structures of various PKS catalytic and docking domains, and mammalian fatty acid synthase module prompted us to develop SBSPKS software which consists of three major components. Model_3D_PKS can be used for modeling, visualization and analysis of 3D structure of individual PKS catalytic domains, dimeric structures for complete PKS modules and prediction of substrate specificity. Dock_Dom_Anal identifies the key interacting residue pairs in inter-subunit interfaces based on alignment of inter-polypeptide linker sequences to the docking domain structure. In case of modular PKS with multiple open reading frames (ORFs), it can predict the cognate order of substrate channeling based on combinatorial evaluation of all possible interface contacts. NRPS–PKS provides user friendly tools for identifying various catalytic domains in the sequence of a Type I PKS protein and comparing them with experimentally characterized PKS/NRPS clusters cataloged in the backend databases of SBSPKS. SBSPKS is available at http://www.nii.ac.in/sbspks.html. PMID:20444870

  7. Amino acid substitutions in genetic variants of human serum albumin and in sequences inferred from molecular cloning

    SciTech Connect

    Takahashi, N.; Takahashi, Y.; Blumberg, B.S.; Putnam, F.W.

    1987-07-01

    The structural changes in four genetic variants of human serum albumin were analyzed by tandem high-pressure liquid chromatography (HPLC) of the tryptic peptides, HPLC mapping and isoelectric focusing of the CNBr fragments, and amino acid sequence analysis of the purified peptides. Lysine-372 of normal (common) albumin A was changed to glutamic acid both in albumin Naskapi, a widespread polymorphic variant of North American Indians, and in albumin Mersin found in Eti Turks. The two variants also exhibited anomalous migration in NaDodSO4/PAGE, which is attributed to a conformational change. The identity of albumins Naskapi and Mersin may have originated through descent from a common mid-Asiatic founder of the two migrating ethnic groups, or it may represent identical but independent mutations of the albumin gene. In albumin Adana, from Eti Turks, the substitution site was not identified but was localized to the region from positions 447 through 548. The substitution of aspartic acid-550 by glycine was found in albumin Mexico-2 from four individuals of the Pima tribe. Although only single-point substitutions have been found in these and in certain other genetic variants of human albumin, five differences exist in the amino acid sequences inferred from cDNA sequences by workers in three other laboratories. However, our results on albumin A and on 14 different genetic variants accord with the amino acid sequence of albumin deduced from the genomic sequence. The apparent amino acid substitutions inferred from comparison of individual cDNA sequences probably reflect artifacts in cloning or in cDNA sequence analysis rather than polymorphism of the coding sections of the albumin gene.

  8. RIKEN integrated sequence analysis (RISA) system--384-format sequencing pipeline with 384 multicapillary sequencer.

    PubMed

    Shibata, K; Itoh, M; Aizawa, K; Nagaoka, S; Sasaki, N; Carninci, P; Konno, H; Akiyama, J; Nishi, K; Kitsunai, T; Tashiro, H; Itoh, M; Sumi, N; Ishii, Y; Nakamura, S; Hazama, M; Nishine, T; Harada, A; Yamamoto, R; Matsumoto, H; Sakaguchi, S; Ikegami, T; Kashiwagi, K; Fujiwake, S; Inoue, K; Togawa, Y

    2000-11-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3' end and 5' end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can be

  9. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    PubMed Central

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can

  10. Sequence analysis of the Choristoneura occidentalis granulovirus genome.

    PubMed

    Escasa, Shannon R; Lauzon, Hilary A M; Mathur, Amanda C; Krell, Peter J; Arif, Basil M

    2006-07-01

    The genome of the Choristoneura occidentalis granulovirus (ChocGV) isolated from the western spruce budworm, Choristoneura occidentalis, was sequenced completely. It was 104,710 bp long, with a 67.3% A+T content and contained 116 potential open reading frames (ORFs) covering 88.4% of the genome. Of these, 29 ORFs were conserved in all fully sequenced baculovirus genomes, 30 were GV-specific, 53 were present in some nucleopolyhedroviruses (NPVs) and/or GVs, three were common to ChocGV and Choristoneura fumiferana GV (ChfuGV) and one was so far unique. To date, ChocGV is the only GV identified that contains a homologue of the apoptosis inhibitor protein P35/P49, present in some group I NPVs. It is also the first GV without a Xestia c-nigrum GV ORF 26 homologue. Five homologous regions (hrs)/repeat regions, lacking typical NPV hr palindromes were identified. ChocGV hrs were similar to each other but not to other GV hrs. A 1.8 kb repeat region with a high A+T content (81%) and multiple repeats of 21-210 bp was found between choc36 and 37. This area resembled the non-homologous region origin of DNA replication (non-hr ori) identified in Cryptophlebia leucotreta GV (CrleGV) and Cydia pomonella GV (CpGV). Based on the mean amino acid identities of homologous proteins, ChocGV was closest to fully sequenced genomes CpGV (52.3%) and CrleGV (52.1%). The closest amino acid identity was to individual ORFs from the partially sequenced ChfuGV genome (97.2% in 38 ORFs). Phylogenetic analysis placed ChocGV in a clade with CrleGV and CpGV.

  11. Time fluctuation analysis of forest fire sequences

    NASA Astrophysics Data System (ADS)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding their dynamics and pattern distribution is of great importance in order to improve resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires that occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (e.g. crime, epidemiology, biodiversity, geomarketing). An important contribution of this research deals with the analysis and estimation of local measures of clustering that help to understand their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value
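
    One of the measures listed above, the Allan Factor, can be sketched in a few lines to show how a comparison against a Poisson reference works. This is an illustrative sketch, not the authors' code: it assumes the common definition AF(T) = <(N_{k+1} - N_k)^2> / (2 <N_k>), where N_k is the event count in the k-th window of length T, and the function name and synthetic event times are hypothetical.

```python
import random


def allan_factor(event_times, t_start, t_end, window):
    """Allan Factor of a point process for one window length.

    Counts events in consecutive windows and compares the mean squared
    difference of neighbouring counts with twice the mean count.
    Requires at least two full windows and at least one event.
    """
    n_windows = int((t_end - t_start) // window)
    counts = [0] * n_windows
    for t in event_times:
        k = int((t - t_start) // window)
        if 0 <= k < n_windows:
            counts[k] += 1
    diffs_sq = [(b - a) ** 2 for a, b in zip(counts, counts[1:])]
    mean_count = sum(counts) / n_windows
    return (sum(diffs_sq) / len(diffs_sq)) / (2 * mean_count)


random.seed(0)
# Reference pattern: events placed uniformly at random (Poisson-like).
poisson_times = [random.uniform(0, 1000) for _ in range(20000)]
# Regular pattern: evenly spaced events, offset to avoid window boundaries.
regular_times = [(i + 0.5) * 0.05 for i in range(20000)]

poisson_af = allan_factor(poisson_times, 0, 1000, 1.0)
regular_af = allan_factor(regular_times, 0, 1000, 1.0)
print(poisson_af, regular_af)
```

    Under these assumptions a value near 1 indicates random occurrence, values well above 1 indicate clustering, and values below 1 indicate a regular pattern.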

  12. Entropy analysis of substitutive sequences revisited

    NASA Astrophysics Data System (ADS)

    Karamanos, K.

    2001-11-01

    A given finite sequence of letters over a finite alphabet can always be algorithmically generated, in particular by a Turing machine. This fact is at the heart of complexity theory in the sense of Kolmogorov and Chaitin. A relevant question in this context is whether, given a statistically 'sufficiently long' sequence, there exists a deterministic finite automaton that generates it. In this paper we propose a simple criterion, based on measuring block entropies by lumping, which is satisfied by all automatic sequences. On the basis of this, one can determine that a given sequence is not automatic and obtain interesting information when the sequence is automatic. Following previous work on the Feigenbaum sequence, we give a necessary entropy-based condition valid for all automatic sequences read by lumping. Applications of these ideas to representative examples are discussed. In particular, we establish new entropic decimation schemes for the Thue-Morse, the Rudin-Shapiro and the paperfolding sequences read by lumping.
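
    Reading a sequence "by lumping" means cutting it into consecutive non-overlapping blocks of length n and computing the Shannon entropy of the block frequencies. The sketch below applies this to the Thue-Morse sequence. It only illustrates the measurement and does not reproduce the paper's criterion; the function names are hypothetical, and entropies are reported in bits.

```python
from collections import Counter
from math import log2


def thue_morse(length):
    """First `length` symbols of the Thue-Morse sequence as a string.

    The n-th symbol is the parity of the number of 1 bits in n.
    """
    return "".join(str(bin(n).count("1") % 2) for n in range(length))


def block_entropy_lumping(seq, n):
    """Shannon entropy (bits) of non-overlapping blocks of length n."""
    blocks = [seq[i:i + n] for i in range(0, len(seq) - n + 1, n)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum(c / total * log2(c / total) for c in counts.values())


seq = thue_morse(4096)
entropies = {n: block_entropy_lumping(seq, n) for n in (1, 2, 4, 8)}
print(entropies)
```

    For this sequence the lumped entropy stays at 1 bit for block lengths that are powers of two, because every such block is the image of a single symbol under the generating substitution.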

  13. Complete amino acid sequence and structure characterization of the taste-modifying protein, miraculin.

    PubMed

    Theerasilp, S; Hitotsuya, H; Nakajo, S; Nakaya, K; Nakamura, Y; Kurihara, Y

    1989-04-25

    The taste-modifying protein, miraculin, has the unusual property of modifying sour taste into sweet taste. The complete amino acid sequence of miraculin purified from miracle fruits by a newly developed method (Theerasilp, S., and Kurihara, Y. (1988) J. Biol. Chem. 263, 11536-11539) was determined by an automatic Edman degradation method. Miraculin was a single polypeptide with 191 amino acid residues. The calculated molecular weight based on the amino acid sequence and the carbohydrate content (13.9%) was 24,600. Asn-42 and Asn-186 were linked N-glycosidically to carbohydrate chains. High homology was found between the amino acid sequences of miraculin and soybean trypsin inhibitor.

  14. Detection and isolation of nucleic acid sequences using a bifunctional hybridization probe

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2000-01-01

    A method for detecting and isolating a target sequence in a sample of nucleic acids is provided using a bifunctional hybridization probe capable of hybridizing to the target sequence that includes a detectable marker and a first complexing agent capable of forming a binding pair with a second complexing agent. A kit is also provided for detecting a target sequence in a sample of nucleic acids using a bifunctional hybridization probe according to this method.

  15. Structural similarity between native proteins and chimera constructs obtained by inverting the amino acid sequence.

    PubMed

    Carugo, Oliviero

    2010-12-01

    The analysis of the symmetry of protein three-dimensional structures can be extremely useful in order to understand and classify the protein structural universe. The structures of proteins with back-traced amino acid sequence were modeled and compared to the structures of their native counterparts. Only in a very limited set of cases did the two objects show a significant level of similarity. These extremely symmetric examples can be of any structural class and of any dimension. The lack of biunique "N to C" and "C to N" symmetry at the structural level mirrors that at the sequence level, and we propose to designate as "dlof" symmetry the cases in which a protein structure is similar to its back-traced variant.

  16. Microbial community dynamics in bioaugmented sequencing batch reactors for bromoamine acid removal.

    PubMed

    Qu, Yuanyuan; Zhou, Jiti; Wang, Jing; Fu, Xiang; Xing, Linlin

    2005-05-01

    Sphingomonas xenophaga QYY with the ability to degrade bromoamine acid (BAA) was previously isolated from sludge samples. The enhancement of BAA removal by strain QYY in sequencing batch reactors (SBRs) was investigated in this study. The results showed that augmented SBRs exhibited stronger abilities to degrade BAA than the non-augmented control one. In order to estimate the relationship between community dynamics and function of augmented SBRs, a combined method based on fingerprints (ribosomal intergenic spacer analysis, RISA) and 16S rRNA gene sequencing was used. The results indicated that the microbial community dynamics were substantially changed, and the introduced strain QYY was persistent in the augmented systems. This study suggests that it is feasible and potentially useful to enhance BAA removal using BAA-degrading bacteria, such as S. xenophaga QYY.

  17. Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding

    PubMed Central

    Brozynska, Marta; Furtado, Agnelo; Henry, Robert James

    2014-01-01

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis. PMID:25329378

  18. Comparative Analysis of Genome Sequences Covering the Seven Cronobacter Species

    PubMed Central

    Cummings, Craig A.; Shih, Rita; Degoricija, Lovorka; Rico, Alain; Brzoska, Pius; Hamby, Stephen E.; Masood, Naqash; Hariri, Sumyya; Sonbol, Hana; Chuzhanova, Nadia; McClelland, Michael; Furtado, Manohar R.; Forsythe, Stephen J.

    2012-01-01

    Background Species of Cronobacter are widespread in the environment and are occasional food-borne pathogens associated with serious neonatal diseases, including bacteraemia, meningitis, and necrotising enterocolitis. The genus is composed of seven species: C. sakazakii, C. malonaticus, C. turicensis, C. dublinensis, C. muytjensii, C. universalis, and C. condimenti. Clinical cases are associated with three species, C. malonaticus, C. turicensis and, in particular, with C. sakazakii multilocus sequence type 4. Thus, it is plausible that virulence determinants have evolved in certain lineages. Methodology/Principal Findings We generated high quality sequence drafts for eleven Cronobacter genomes representing the seven Cronobacter species, including an ST4 strain of C. sakazakii. Comparative analysis of these genomes together with the two publicly available genomes revealed Cronobacter has over 6,000 genes in one or more strains and over 2,000 genes shared by all Cronobacter. Considerable variation in the presence of traits such as type six secretion systems, metal resistance (tellurite, copper and silver), and adhesins were found. C. sakazakii is unique in the Cronobacter genus in encoding genes enabling the utilization of exogenous sialic acid which may have clinical significance. The C. sakazakii ST4 strain 701 contained additional genes as compared to other C. sakazakii but none of them were known specific virulence-related genes. Conclusions/Significance Genome comparison revealed that pair-wise DNA sequence identity varies between 89 and 97% in the seven Cronobacter species, and also suggested various degrees of divergence. Sets of universal core genes and accessory genes unique to each strain were identified. These gene sequences can be used for designing genus/species specific detection assays. 
    Genes encoding adhesins, T6SS, and metal resistance genes as well as prophages are found in only subsets of genomes and have contributed considerably to the variation of

  19. Spermatogenesis of the lizard Lacerta vivipara: histological studies and amino acid sequence of a protamine lacertine 1.

    PubMed

    Martinage, A; Depeiges, A; Wouters, D; Morel, L; Sautière, P

    1996-06-01

    The lizard Lacerta vivipara is a seasonal breeder with a well characterized reproductive cycle. A histological study of the lizard testis has been performed at different stages of spermatogenesis and the nuclear basic protein content was assessed by electrophoretic analysis. Two protamines, lacertines 1 and 2, are present in spermatozoa in April and May. We have isolated lacertine 1 and characterized a protamine with a mass of 4,963.7 Da. The amino acid sequence of this protamine (41 residues) was established from data provided by automated Edman degradation. It is characterized by a basic amino acid stretch in the N- and C-terminal regions and by a central part which only consists of 3 different intermingled amino acids. This protamine presents 62% homology with scylliorhinine Z3 from dogfish Scylliorhinus caniculus and 58% homology with quail protamine. The reported lizard protamine sequence is the first reptilian protamine sequence available so far.

  20. Whole exome sequence analysis of Peters anomaly.

    PubMed

    Weh, Eric; Reis, Linda M; Happ, Hannah C; Levin, Alex V; Wheeler, Patricia G; David, Karen L; Carney, Erin; Angle, Brad; Hauser, Natalie; Semina, Elena V

    2014-12-01

    Peters anomaly is a rare form of anterior segment ocular dysgenesis, which can also be associated with additional systemic defects. At this time, the majority of cases of Peters anomaly lack a genetic diagnosis. We performed whole exome sequencing of 27 patients with syndromic or isolated Peters anomaly to search for pathogenic mutations in currently known ocular genes. Among the eight previously recognized Peters anomaly genes, we identified a de novo missense mutation in PAX6, c.155G>A, p.(Cys52Tyr), in one patient. Analysis of 691 additional genes currently associated with a different ocular phenotype identified a heterozygous splicing mutation c.1025+2T>A in TFAP2A, a de novo heterozygous nonsense mutation c.715C>T, p.(Gln239*) in HCCS, a hemizygous mutation c.385G>A, p.(Glu129Lys) in NDP, a hemizygous mutation c.3446C>T, p.(Pro1149Leu) in FLNA, and compound heterozygous mutations c.1422T>A, p.(Tyr474*) and c.2544G>A, p.(Met848Ile) in SLC4A11; all mutations, except for the FLNA and SLC4A11 c.2544G>A alleles, are novel. This is the first study to use whole exome sequencing to discern the genetic etiology of a large cohort of patients with syndromic or isolated Peters anomaly. We report five new genes associated with this condition and suggest screening of TFAP2A and FLNA in patients with Peters anomaly and relevant syndromic features and HCCS, NDP and SLC4A11 in patients with isolated Peters anomaly.

  1. Whole exome sequence analysis of Peters anomaly

    PubMed Central

    Weh, Eric; Reis, Linda M.; Happ, Hannah C.; Levin, Alex V.; Wheeler, Patricia G.; David, Karen L.; Carney, Erin; Angle, Brad; Hauser, Natalie

    2015-01-01

    Peters anomaly is a rare form of anterior segment ocular dysgenesis, which can also be associated with additional systemic defects. At this time, the majority of cases of Peters anomaly lack a genetic diagnosis. We performed whole exome sequencing of 27 patients with syndromic or isolated Peters anomaly to search for pathogenic mutations in currently known ocular genes. Among the eight previously recognized Peters anomaly genes, we identified a de novo missense mutation in PAX6, c.155G>A, p.(Cys52Tyr), in one patient. Analysis of 691 additional genes currently associated with a different ocular phenotype identified a heterozygous splicing mutation c.1025+2T>A in TFAP2A, a de novo heterozygous nonsense mutation c.715C>T, p.(Gln239*) in HCCS, a hemizygous mutation c.385G>A, p.(Glu129Lys) in NDP, a hemizygous mutation c.3446C>T, p.(Pro1149Leu) in FLNA, and compound heterozygous mutations c.1422T>A, p.(Tyr474*) and c.2544G>A, p.(Met848Ile) in SLC4A11; all mutations, except for the FLNA and SLC4A11 c.2544G>A alleles, are novel. This is the first study to use whole exome sequencing to discern the genetic etiology of a large cohort of patients with syndromic or isolated Peters anomaly. We report five new genes associated with this condition and suggest screening of TFAP2A and FLNA in patients with Peters anomaly and relevant syndromic features and HCCS, NDP and SLC4A11 in patients with isolated Peters anomaly. PMID:25182519

  2. The Role of HIV-1 gp41 Glycoprotein in Infectious Tropism Inferred from Physico-Chemical Properties of its Amino Acid Sequence

    NASA Astrophysics Data System (ADS)

    Figueroa, E.; Villarreal, C.; Huerta, L.; Cocho, G.

    2006-09-01

    We performed a statistical analysis of the amino acid sequence of the gp41 ectodomain of the Human Immunodeficiency Virus type 1. We found strong correlations between physicochemical properties of highly variable residues and the viral infectious tropism.

  3. Analysis of human immunodeficiency virus type 1 nef gene sequences present in vivo.

    PubMed Central

    Shugars, D C; Smith, M S; Glueck, D H; Nantermet, P V; Seillier-Moiseiwitsch, F; Swanstrom, R

    1993-01-01

    The nef genes of the human immunodeficiency viruses type 1 and 2 (HIV-1 and HIV-2) and the related simian immunodeficiency viruses (SIVs) encode a protein (Nef) whose role in virus replication and cytopathicity remains uncertain. As an attempt to elucidate the function of nef, we characterized the nucleotide and corresponding protein sequences of naturally occurring nef genes obtained from several HIV-1-infected individuals. A consensus Nef sequence was derived and used to identify several features that were highly conserved among the Nef sequences. These features included a nearly invariant myristylation signal, regions of sequence polymorphism and variable duplication, a region with an acidic charge, a (Pxx)4 repeat sequence, and a potential protein kinase C phosphorylation site. Clustering of premature stop codons at position 124 was noted in 6 of the 54 Nef sequences. Further analysis revealed four stretches of residues that were highly conserved not only among the patient-derived HIV-1 Nef sequences, but also among the Nef sequences of HIV-2 and the SIVs, suggesting that Nef proteins expressed by these retroviruses are functionally equivalent. The "Nef-defining" sequences were used to evaluate the sequence alignments of known proteins reported to share sequence similarity with Nef sequences and to conduct additional computer-based searches for similar protein sequences. A gene encoding the consensus Nef sequence was also generated. This gene encodes a full-length Nef protein that should be a valuable tool in further studies of Nef function. PMID:8043040

  4. Bacteria obtained from a sequencing batch reactor that are capable of growth on dehydroabietic acid.

    PubMed Central

    Mohn, W W

    1995-01-01

    Eleven isolates capable of growth on the resin acid dehydroabietic acid (DhA) were obtained from a sequencing batch reactor designed to treat a high-strength process stream from a paper mill. The isolates belonged to two groups, represented by strains DhA-33 and DhA-35, which were characterized. In the bioreactor, bacteria like DhA-35 were more abundant than those like DhA-33. The population in the bioreactor of organisms capable of growth on DhA was estimated to be 1.1 × 10^6 propagules per ml, based on a most-probable-number determination. Analysis of small-subunit rRNA partial sequences indicated that DhA-33 was most closely related to Sphingomonas yanoikuyae (Sab = 0.875) and that DhA-35 was most closely related to Zoogloea ramigera (Sab = 0.849). Both isolates additionally grew on other abietanes, i.e., abietic and palustric acids, but not on the pimaranes, pimaric and isopimaric acids. For DhA-33 and DhA-35 with DhA as the sole organic substrate, doubling times were 2.7 and 2.2 h, respectively, and growth yields were 0.30 and 0.25 g of protein per g of DhA, respectively. Glucose as a cosubstrate stimulated growth of DhA-33 on DhA and stimulated DhA degradation by the culture. Pyruvate as a cosubstrate did not stimulate growth of DhA-35 on DhA and reduced the specific rate of DhA degradation of the culture. DhA induced DhA and abietic acid degradation activities in both strains, and these activities were heat labile. Cell suspensions of both strains consumed DhA at a rate of 6 µmol mg of protein^-1 h^-1. (ABSTRACT TRUNCATED AT 250 WORDS) PMID:7793937
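
    The doubling times reported above can be converted to specific growth rates with the standard relation mu = ln(2) / t_d. The short sketch below does that conversion; it assumes simple exponential growth, and the function name is hypothetical rather than taken from the paper.

```python
from math import log


def specific_growth_rate(doubling_time_h):
    """Specific growth rate (per hour) from a doubling time in hours.

    Assumes exponential growth, so mu = ln(2) / t_d.
    """
    return log(2) / doubling_time_h


# Doubling times quoted in the abstract for growth on DhA.
rates = {
    "DhA-33": specific_growth_rate(2.7),
    "DhA-35": specific_growth_rate(2.2),
}
for strain, mu in rates.items():
    print(strain, round(mu, 3), "per hour")
```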

  5. Bacteria obtained from a sequencing batch reactor that are capable of growth on dehydroabietic acid.

    PubMed

    Mohn, W W

    1995-06-01

    Eleven isolates capable of growth on the resin acid dehydroabietic acid (DhA) were obtained from a sequencing batch reactor designed to treat a high-strength process stream from a paper mill. The isolates belonged to two groups, represented by strains DhA-33 and DhA-35, which were characterized. In the bioreactor, bacteria like DhA-35 were more abundant than those like DhA-33. The population in the bioreactor of organisms capable of growth on DhA was estimated to be 1.1 × 10^6 propagules per ml, based on a most-probable-number determination. Analysis of small-subunit rRNA partial sequences indicated that DhA-33 was most closely related to Sphingomonas yanoikuyae (Sab = 0.875) and that DhA-35 was most closely related to Zoogloea ramigera (Sab = 0.849). Both isolates additionally grew on other abietanes, i.e., abietic and palustric acids, but not on the pimaranes, pimaric and isopimaric acids. For DhA-33 and DhA-35 with DhA as the sole organic substrate, doubling times were 2.7 and 2.2 h, respectively, and growth yields were 0.30 and 0.25 g of protein per g of DhA, respectively. Glucose as a cosubstrate stimulated growth of DhA-33 on DhA and stimulated DhA degradation by the culture. Pyruvate as a cosubstrate did not stimulate growth of DhA-35 on DhA and reduced the specific rate of DhA degradation of the culture. DhA induced DhA and abietic acid degradation activities in both strains, and these activities were heat labile. Cell suspensions of both strains consumed DhA at a rate of 6 µmol mg of protein^-1 h^-1. (ABSTRACT TRUNCATED AT 250 WORDS)

  6. Cloning and sequence analysis of the muramidase-2 gene from Enterococcus hirae.

    PubMed Central

    Chu, C P; Kariyama, R; Daneo-Moore, L; Shockman, G D

    1992-01-01

    Extracellular muramidase-2 of Enterococcus hirae ATCC 9790 was purified to homogeneity by substrate binding, guanidine-HCl extraction, and reversed-phase chromatography. A monoclonal antibody, 2F8, which specifically recognizes muramidase-2, was used to screen a genomic library of E. hirae ATCC 9790 DNA in bacteriophage lambda gt11. A positive phage clone containing a 4.5-kb DNA insert was isolated and analyzed. The EcoRI-digested 4.5-kb fragment was cut into 2.3-, 1.0-, and 1.5-kb pieces by using restriction enzymes KpnI, Sau3AI, and PstI, and each fragment was subcloned into plasmid pJDC9 or pUC19. The nucleotide sequence of each subclone was determined. The sequence data indicated an open reading frame encoding a polypeptide of 666 amino acid residues, with a calculated molecular mass of 70,678 Da. The first 24 N-terminal amino acids of purified extracellular muramidase-2 were in very good agreement with the deduced amino acid sequence after a 49-amino-acid putative signal sequence. Analysis of the deduced amino acid sequence showed the presence at the C-terminal region of the protein of six highly homologous repeat units separated by nonhomologous intervening sequences that are highly enriched in serine and threonine. The overall sequence showed a high degree of homology with a recently cloned Streptococcus faecalis autolysin. PMID:1347040

  7. Amino acid sequences and structures of chicken and turkey beta 2-microglobulin.

    PubMed

    Welinder, K G; Jespersen, H M; Walther-Rasmussen, J; Skjødt, K

    1991-01-01

    The complete amino acid sequences of chicken and turkey beta 2-microglobulins have been determined by analyses of tryptic, V8-proteolytic and cyanogen bromide fragments, and by N-terminal sequencing. Mass spectrometric analysis of chicken beta 2-microglobulin supports the sequence-derived Mr of 11,048. The higher apparent Mr obtained for the avian beta 2-microglobulins as compared to human beta 2-microglobulin by SDS-PAGE is not understood. Chicken and turkey beta 2-microglobulin consist of 98 residues and deviate at seven positions: 60, 66, 74-76, 78 and 82. The chicken and turkey sequences are identical to human beta 2-microglobulin at 46 and 47 positions, respectively, and to bovine beta 2-microglobulin at 47 positions, i.e. there is about 47% identity between avian and mammalian beta 2-microglobulins. The known X-ray crystallographic structures of bovine beta 2-microglobulin and human HLA-A2 complex suggest that the seven chicken to turkey differences are exposed to solvent in the avian MHC class I complex. The key residues of beta 2-microglobulin involved in alpha chain contacts within the MHC class I molecule are highly conserved between chicken and man. This explains that heterologous human beta 2-microglobulin can substitute the chicken beta 2-microglobulin in exchange studies with B-F (chicken MHC class I molecule), and suggests that the MHC class I structure is conserved over long evolutionary distances.
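
    The identity figures quoted above follow from simple counting over the 98 aligned positions. The sketch below shows the arithmetic and a generic helper for two pre-aligned sequences; it assumes an ungapped, equal-length alignment, and the function name and the toy strings are hypothetical (the actual beta 2-microglobulin sequences are not given in the abstract).

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with identical residues in two aligned sequences.

    Assumes the sequences are already aligned, ungapped, and equal in length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Arithmetic from the counts in the abstract (98 residues in total).
avian_vs_mammal = [round(100 * matches / 98, 1) for matches in (46, 47)]
chicken_vs_turkey = round(100 * (98 - 7) / 98, 1)
print(avian_vs_mammal, chicken_vs_turkey)

# Toy aligned strings, invented purely to exercise the helper.
toy_identity = percent_identity("ACDE", "ACDF")
print(toy_identity)
```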

  8. Coupling sequencing by hybridization (SBH) with gel sequencing for an inexpensive analysis of genes and genomes

    SciTech Connect

    Drmanac, S.; Labat, I.; Hauser, B.; Drmanac, R.

    1996-11-01

    The speed and cost of DNA sequencing are bottlenecks in the analysis of genes and genomes. Sequencing by hybridization (SBH) is a versatile method with several applications which can accelerate DNA screening, mapping and sequencing. Requirements, achievements and problems in the development of the SBH format 1 (DNA samples arrayed) are presented and schemes for its synergetic coupling with gel sequencing techniques are discussed. It appears that with one hybridization machine with 24 boxes and four ABI gel sequencers, 100-300 Mb of DNA sequence can be determined per year. Various genetic studies based on computer assisted analysis of large collections of partial or complete DNA sequences ('sequenetics') may be achieved in this century.

  9. Microfluidic devices for DNA sequencing: sample preparation and electrophoretic analysis.

    PubMed

    Paegel, Brian M; Blazej, Robert G; Mathies, Richard A

    2003-02-01

    Modern DNA sequencing 'factories' have revolutionized biology by completing the human genome sequence, but in the race to completion we are left with inefficient, cumbersome, and costly macroscale processes and supporting facilities. During the same period, microfabricated DNA sequencing, sample processing and analysis devices have advanced rapidly toward the goal of a 'sequencing lab-on-a-chip'. Integrated microfluidic processing dramatically reduces analysis time and reagent consumption, and eliminates costly and unreliable macroscale robotics and laboratory apparatus. A microfabricated device for high-throughput DNA sequencing that couples clone isolation, template amplification, Sanger extension, purification, and electrophoretic analysis in a single microfluidic circuit is now attainable.

  10. A 25-Amino Acid Sequence of the Arabidopsis TGD2 Protein Is Sufficient for Specific Binding of Phosphatidic Acid*

    PubMed Central

    Lu, Binbin; Benning, Christoph

    2009-01-01

    Genetic analysis suggests that the TGD2 protein of Arabidopsis is required for the biosynthesis of endoplasmic reticulum derived thylakoid lipids. TGD2 is proposed to be the substrate-binding protein of a presumed lipid transporter consisting of the TGD1 (permease) and TGD3 (ATPase) proteins. The TGD1, -2, and -3 proteins are localized in the inner chloroplast envelope membrane. TGD2 appears to be anchored with an N-terminal membrane-spanning domain into the inner envelope membrane, whereas the C-terminal domain faces the intermembrane space. It was previously shown that the C-terminal domain of TGD2 binds phosphatidic acid (PtdOH). To investigate the PtdOH binding site of TGD2 in detail, the C-terminal domain of the TGD2 sequence lacking the transit peptide and transmembrane sequences was fused to the C terminus of the Discosoma sp. red fluorescent protein (DR). This greatly improved the solubility of the resulting DR-TGD2C fusion protein following production in Escherichia coli. The DR-TGD2C protein bound PtdOH with high specificity, as demonstrated by membrane lipid-protein overlay and liposome association assays. Internal deletion and truncation mutagenesis identified a previously undescribed minimal 25-amino acid fragment in the C-terminal domain of TGD2 that is sufficient for PtdOH binding. Binding characteristics of this 25-mer were distinctly different from those of TGD2C, suggesting that additional sequences of TGD2 providing the proper context for this 25-mer are needed for wild type-like PtdOH binding. PMID:19416982

  11. Identification of Nucleic Acid High Affinity Binding Sequences of Proteins by SELEX.

    PubMed

    Bouvet, Philippe

    2015-01-01

    A technique is described for the identification of nucleic acid sequences bound with high affinity by proteins or by other molecules suitable for a partitioning assay. Here, a histidine-tagged protein is allowed to interact with a pool of nucleic acids and the protein-nucleic acid complexes formed are retained on a Ni-NTA matrix. Nucleic acids with a low level of recognition by the protein are washed away. The pool of recovered nucleic acids is amplified by the polymerase chain reaction and is submitted to further rounds of selection. Each round of selection increases the proportion of sequences that are avidly bound by the protein of interest. The cloning and sequencing of these sequences finally completes their identification.

  12. Identification of nucleic acid high-affinity binding sequences of proteins by SELEX.

    PubMed

    Bouvet, Philippe

    2009-01-01

    A technique is described for the identification of nucleic acid sequences bound with high affinity by proteins or by other molecules suitable for a partitioning assay. Here, a histidine-tagged protein is allowed to interact with a pool of nucleic acids and the protein-nucleic acid complexes formed are retained on a Ni-NTA matrix. Nucleic acids with a low level of recognition by the protein are washed away. The pool of recovered nucleic acids is amplified by the polymerase chain reaction and is submitted to further rounds of selection. Each round of selection increases the proportion of sequences that are avidly bound by the protein of interest. The cloning and sequencing of these sequences finally completes their identification.

  13. Molecular cloning, encoding sequence, and expression of vaccinia virus nucleic acid-dependent nucleoside triphosphatase gene.

    PubMed Central

    Rodriguez, J F; Kahn, J S; Esteban, M

    1986-01-01

    A rabbit poxvirus genomic library contained within the expression vector lambda gt11 was screened with polyclonal antiserum prepared against vaccinia virus nucleic acid-dependent nucleoside triphosphatase (NTPase)-I enzyme. Five positive phage clones containing from 0.72- to 2.5-kilobase-pair (kbp) inserts expressed a beta-galactosidase fusion protein that was reactive by immunoblotting with the NTPase-I antibody. Hybridization analysis allowed the location of this gene within the vaccinia HindIIID restriction fragment. From the known nucleotide sequence of the 16-kbp vaccinia HindIIID fragment, we identified a region that contains a 1896-base open reading frame coding for a 631-amino acid protein. Analysis of the complete sequence revealed a highly basic protein, with hydrophilic COOH and NH2 termini, various hydrophobic domains, and no significant homology to other known proteins. Translational studies demonstrate that NTPase-I belongs to a late class of viral genes. This protein is highly conserved among Orthopoxviruses. PMID:3025846
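
    The relationship between the 1896-base open reading frame and the 631-residue protein can be checked with a short calculation. This is a minimal sketch that assumes the quoted ORF length includes the terminating stop codon (the abstract does not say so explicitly); the function name protein_length_from_orf is hypothetical.

```python
def protein_length_from_orf(orf_bases: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length in bases.

    Assumption: when includes_stop is True, the last codon is a stop codon
    and does not contribute a residue.
    """
    if orf_bases <= 0 or orf_bases % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bases // 3
    return codons - 1 if includes_stop else codons


# 1896 bases -> 632 codons -> 631 residues plus one stop codon.
print(protein_length_from_orf(1896))
```

    Under that assumption the result matches the 631 amino acids reported for NTPase-I.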

  14. Sequence-Specific Covalent Capture Coupled with High-Contrast Nanopore Detection of a Disease-Derived Nucleic Acid Sequence.

    PubMed

    Nejad, Maryam Imani; Shi, Ruicheng; Zhang, Xinyue; Gu, Li-Qun; Gates, Kent S

    2017-07-18

    Hybridization-based methods for the detection of nucleic acid sequences are important in research and medicine. Short probes provide sequence specificity, but do not always provide a durable signal. Sequence-specific covalent crosslink formation can anchor probes to target DNA and might also provide an additional layer of target selectivity. Here, we developed a new crosslinking reaction for the covalent capture of specific nucleic acid sequences. This process involved reaction of an abasic (Ap) site in a probe strand with an adenine residue in the target strand and was used for the detection of a disease-relevant T→A mutation at position 1799 of the human BRAF kinase gene sequence. Ap-containing probes were easily prepared and displayed excellent specificity for the mutant sequence under isothermal assay conditions. It was further shown that nanopore technology provides a high-contrast (in essence, digital) signal that enables sensitive, single-molecule sensing of the cross-linked duplexes. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Project Report: Automatic Sequence Processor Software Analysis

    NASA Technical Reports Server (NTRS)

    Benjamin, Brandon

    2011-01-01

    The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. Jet Propulsion Laboratory (JPL) is currently flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence and ultimately a sequence becomes a sequence of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow process involves sending a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system that is comprised of scripts written in perl, c-shell and awk. Once this start process is complete, the system checks for errors and aborts if there are any; otherwise the system converts the commands to binary, and then sends the resultant information to be radiated to the spacecraft.

  16. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.

  17. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-03-24

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.

  18. PrDOS: prediction of disordered protein regions from amino acid sequence.

    PubMed

    Ishida, Takashi; Kinoshita, Kengo

    2007-07-01

    PrDOS is a server that predicts the disordered regions of a protein from its amino acid sequence (http://prdos.hgc.jp). The server accepts a single protein amino acid sequence, in either plain text or FASTA format. The prediction system is composed of two predictors: a predictor based on local amino acid sequence information and one based on template proteins. The server combines the results of the two predictors and returns a two-state prediction (order/disorder) and a disorder probability for each residue. The prediction results are sent by e-mail, and the server also provides a web-interface to check the results.

  19. The amino acid sequence of protein CM-3 from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J

    1985-01-01

    Protein CM-3 from Dendroaspis polylepis polylepis venom was purified by gel filtration and ion exchange chromatography. It comprises 65 amino acids including eight half-cystines. The complete amino acid sequence of protein CM-3 has been elucidated. The sequence (residues 1-50) resembles that of the N-terminal sequence of the subunits of a synergistic type protein and residues 51-65 that of the C-terminal sequence of an angusticeps type protein. Mixtures of protein CM-3 and angusticeps type proteins showed no apparent synergistic effect, in that their toxicity in combination was no greater than the sum of their individual toxicities.

  20. The amino acid sequences of the Fd fragments of two human γ heavy chains

    PubMed Central

    Press, E. M.; Hogg, N. M.

    1970-01-01

    The amino acid sequences of the Fd fragments of two human pathological immunoglobulins of the immunoglobulin G1 class are reported. Comparison of the two sequences shows that the heavy-chain variable regions are similar in length to those of the light chains. The existence of heavy chain variable region subgroups is also deduced, from a comparison of these two sequences with those of another γ 1 chain, Eu, a μ chain, Ou, and the partial sequence of a fourth γ 1 chain, Ste. Carbohydrate has been found to be linked to an aspartic acid residue in the variable region of one of the γ 1 chains, Cor. PMID:5449120

  1. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665

  2. Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius.

    PubMed

    Al-Swailem, Abdulaziz M; Shehata, Maher M; Abu-Duhier, Faisel M; Al-Yamani, Essam J; Al-Busadah, Khalid A; Al-Arawi, Mohammed S; Al-Khider, Ali Y; Al-Muhaimeed, Abdullah N; Al-Qahtani, Fahad H; Manee, Manee M; Al-Shomrani, Badr M; Al-Qhtani, Saad M; Al-Harthi, Amer S; Akdemir, Kadir C; Inan, Mehmet S; Otu, Hasan H

    2010-05-19

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and approximately 40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism.

  3. RED: the analysis, management and dissemination of expressed sequence tags.

    PubMed

    Everitt, R; Minnema, S E; Wride, M A; Koster, C S; Hance, J E; Mansergh, F C; Rancourt, D E

    2002-12-01

    The Rancourt EST Database (RED) is a web-based system for the analysis, management, and dissemination of expressed sequence tags (ESTs). RED represents a flexible template DNA sequence database that can be easily manipulated to suit the needs of other laboratories undertaking mid-size sequencing projects.

  4. Evolutionary insights from suffix array-based genome sequence analysis.

    PubMed

    Poddar, Anindya; Chandra, Nagasuma; Ganapathiraju, Madhavi; Sekar, K; Klein-Seetharaman, Judith; Reddy, Raj; Balakrishnan, N

    2007-08-01

    Gene and protein sequence analyses, central components of studies in modern biology, are easily amenable to string matching and pattern recognition algorithms. The growing need to analyse whole genome sequences more efficiently and thoroughly has led to the emergence of new computational methods. Suffix trees and suffix arrays are data structures well known in many other areas, and are highly suited for sequence analysis too. Here we report an improvement to the design of construction of suffix arrays. Enhancement in versatility and scalability, enabled by this approach, is demonstrated through the use of real-life examples. The scalability of the algorithm to whole genomes renders it suitable to address many biologically interesting problems. One example is the evolutionary insight gained by analysing unigrams, bi-grams and higher n-grams, indicating that the genetic code has a direct influence on the overall composition of the genome. Further, different proteomes have been analysed for the coverage of the possible peptide space, which indicates that as much as a quarter of the total space at the tetra-peptide level is left un-sampled in prokaryotic organisms, although almost all tri-peptides can be seen in one protein or another in a proteome. Besides, distinct patterns begin to emerge for the counts of particular tetra and higher peptides, indicative of a 'meaning' for tetra and higher n-grams. The toolkit has also been used to demonstrate the usefulness of identifying repeats in whole proteomes efficiently. As an example, 16 members of one COG, coded by the genome of Mycobacterium tuberculosis H37Rv, have been found to contain a repeating sequence of 300 amino acids.
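
    For readers unfamiliar with the data structure, the sketch below shows a deliberately naive suffix array, a binary-search lookup over it, and n-gram counting with peptide-space coverage. It is not the improved construction reported in this abstract, which is not described here; the function names and the toy inputs are hypothetical, and the 20-letter amino acid alphabet is the only biological assumption.

```python
from collections import Counter


def suffix_array(text: str) -> list[int]:
    """Naive construction: sort suffix start positions by the suffix itself."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of pattern, found by binary search over the array."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def ngram_counts(seq: str, n: int) -> Counter:
    """Count every overlapping n-gram in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def peptide_space_coverage(proteins: list[str], n: int) -> float:
    """Fraction of the 20**n possible n-peptides seen in at least one protein."""
    seen = set()
    for p in proteins:
        seen.update(ngram_counts(p, n))
    return len(seen) / 20 ** n


text = "banana"  # toy string, not biological data
sa = suffix_array(text)
print(sa, find_occurrences(text, sa, "ana"))
print(ngram_counts(text, 2).most_common())
```

    Sorting whole suffixes costs far more than the specialised construction algorithms used for genomes, so this version is suitable only for short illustrative inputs.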

  5. Fragmentation Characteristics of Deprotonated N-linked Glycopeptides: Influences of Amino Acid Composition and Sequence

    NASA Astrophysics Data System (ADS)

    Nishikaze, Takashi; Kawabata, Shin-ichirou; Tanaka, Koichi

    2014-06-01

    Glycopeptide structural analysis using tandem mass spectrometry is becoming a common approach for elucidating site-specific N-glycosylation. The analysis is generally performed in positive-ion mode. Therefore, fragmentation of protonated glycopeptides has been extensively investigated; however, few studies are available on deprotonated glycopeptides, despite the usefulness of negative-ion mode analysis in detecting glycopeptide signals. Here, large sets of glycopeptides derived from well-characterized glycoproteins were investigated to understand the fragmentation behavior of deprotonated N-linked glycopeptides under low-energy collision-induced dissociation (CID) conditions. The fragment ion species were found to be significantly variable depending on their amino acid sequence and could be classified into three types: (i) glycan fragment ions, (ii) glycan-lost fragment ions and their secondary cleavage products, and (iii) fragment ions with intact glycan moiety. The CID spectra of glycopeptides having a short peptide sequence were dominated by type (i) glycan fragments (e.g., 2,4AR, 2,4AR-1, D, and E ions). These fragments define detailed structural features of the glycan moiety such as branching. For glycopeptides with medium or long peptide sequences, the major fragments were type (ii) ions (e.g., [peptide + 0,2X0-H]- and [peptide-NH3-H]-). The appearance of type (iii) ions strongly depended on the peptide sequence, and especially on the presence of Asp, Asn, and Glu. When a glycosylated Asn is located on the C-terminus, an interesting fragment having an Asn residue with intact glycan moiety, [glycan + Asn-36]-, was abundantly formed. Observed fragments are reasonably explained by a combination of existing fragmentation rules suggested for N-glycans and peptides.

  6. An analysis of the feasibility of short read sequencing

    PubMed Central

    Whiteford, Nava; Haslam, Niall; Weber, Gerald; Prügel-Bennett, Adam; Essex, Jonathan W.; Roach, Peter L.; Bradley, Mark; Neylon, Cameron

    2005-01-01

    Several methods for ultra high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). Here we report on an analysis showing the level of genome sequencing possible as a function of read length. It is shown that re-sequencing and de novo sequencing of the majority of a bacterial genome is possible with read lengths of 20–30 nt, and that reads of 50 nt can provide reconstructed contigs (a contiguous fragment of sequence data) of 1000 nt and greater that cover 80% of human chromosome 1. PMID:16275781
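
    One way to build intuition for why read length matters is to ask what fraction of length-k windows in a sequence occur exactly once, since a read can be placed unambiguously only if its sequence is unique. The sketch below is an illustrative simplification and not the analysis performed in the paper: it considers a single strand, ignores sequencing errors, and uses a hypothetical toy string rather than a genome.

```python
from collections import Counter


def unique_kmer_fraction(genome: str, k: int) -> float:
    """Fraction of k-length windows whose sequence occurs exactly once."""
    n_windows = len(genome) - k + 1
    if k <= 0 or n_windows <= 0:
        raise ValueError("k must be between 1 and len(genome)")
    counts = Counter(genome[i:i + k] for i in range(n_windows))
    unique_windows = sum(1 for c in counts.values() if c == 1)
    return unique_windows / n_windows


toy = "ACGTACGA"  # hypothetical sequence for illustration only
for k in (2, 3, 5):
    print(k, round(unique_kmer_fraction(toy, k), 3))
```

    On real genomes the same function could be applied per chromosome for a range of k, although memory use grows with the number of distinct k-mers.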

  7. The Chinese hamster Alu-equivalent sequence: a conserved highly repetitious, interspersed deoxyribonucleic acid sequence in mammals has a structure suggestive of a transposable element.

    PubMed Central

    Haynes, S R; Toomey, T P; Leinwand, L; Jelinek, W R

    1981-01-01

    A consensus sequence has been determined for a major interspersed deoxyribonucleic acid repeat in the genome of Chinese hamster ovary cells (CHO cells). This sequence is extensively homologous to (i) the human Alu sequence (P. L. Deininger et al., J. Mol. Biol., in press), (ii) the mouse B1 interspersed repetitious sequence (Krayev et al., Nucleic Acids Res. 8:1201-1215, 1980), (iii) an interspersed repetitious sequence from African green monkey deoxyribonucleic acid (Dhruva et al., Proc. Natl. Acad. Sci. U.S.A. 77:4514-4518, 1980) and (iv) the CHO and mouse 4.5S ribonucleic acid (this report; F. Harada and N. Kato, Nucleic Acids Res. 8:1273-1285, 1980). Because the CHO consensus sequence shows significant homology to the human Alu sequence it is termed the CHO Alu-equivalent sequence. A conserved structure surrounding CHO Alu-equivalent family members can be recognized. It is similar to that surrounding the human Alu and the mouse B1 sequences, and is represented as follows: direct repeat-CHO-Alu-A-rich sequence-direct repeat. A composite interspersed repetitious sequence has been identified. Its structure is represented as follows: direct repeat-residue 47 to 107 of CHO-Alu-non-Alu repetitious sequence-A-rich sequence-direct repeat. Because the Alu flanking sequences resemble those that flank known transposable elements, we think it likely that the Alu sequence dispersed throughout the mammalian genome by transposition. PMID:9279371

  8. Formation Sequences of Iron Minerals in the Acidic Alteration Products and Variation of Hydrothermal Fluid Conditions

    NASA Astrophysics Data System (ADS)

    Isobe, H.; Yoshizawa, M.

    2008-12-01

    Iron minerals have an important role in environmental issues not only on the Earth but also on other terrestrial planets. Iron mineral species related to alteration products of primary minerals with surface or subsurface fluids are characterized by temperature, acidity and redox conditions of the fluids. We can see various iron-bearing alteration products in alteration products around fumaroles in geothermal/volcanic areas. In this study, zonal structures of iron minerals in alteration products of the geothermal area are observed to elucidate temporal and spatial variation of hydrothermal fluids. Alteration of the pyroxene-amphibole andesite of Garan-dake volcano, Oita, Japan occurs by the acidic hydrothermal fluid to form cristobalite leaching out elements other than Si. Hand specimens with unaltered or weakly altered core and cristobalite crust show various sequences of layers. XRD analysis revealed that the alteration degree is represented by abundance of cristobalite. Intermediately altered layers are characterized by occurrence including alunite, pyrite, kaolinite, goethite and hematite. A specimen with reddish brown core surrounded by cristobalite-rich white crust has brown colored layers at the boundary of core and the crust. Reddish core is characterized by occurrence of crystalline hematite by XRD. Another hand specimen has light gray core, which represents reduced conditions, and white cristobalite crust with light brown and reddish brown layers of ferric iron minerals between the core and the crust. On the other hand, hornblende crystals, typical ferrous iron-bearing mineral of the host rock, are well preserved in some samples with strongly decolorized cristobalite-rich groundmass. Hydrothermal alteration experiments of iron-rich basaltic material show that iron mineral species depend on acidity and temperature of the fluid. Oxidation states of the iron-bearing mineral species are strongly influenced by the acidity and redox conditions.
Variations of alteration

  9. The amino acid sequence of goat beta-lactoglobulin.

    PubMed

    Préaux, G; Braunitzer, G; Schrank, B; Stangl, A

    1979-11-01

    The isolation of beta-lactoglobulin from milk of the goat is described. The purified protein was checked for purity and has been characterized by its gross composition and end groups. The native or the modified protein was then degraded by tryptic and cyanogen bromide cleavage. The cleavage products were isolated and sequenced in the sequenator using a Quadrol and propyne program. These data provide the complete sequence of beta-lactoglobulin of the goat. The results are discussed and compared particularly with bovine beta-lactoglobulin components AB. Some biological aspects are described.

  10. Layered materials with coexisting acidic and basic sites for catalytic one-pot reaction sequences.

    PubMed

    Motokura, Ken; Tada, Mizuki; Iwasawa, Yasuhiro

    2009-06-17

    Acidic montmorillonite-immobilized primary amines (H-mont-NH(2)) were found to be excellent acid-base bifunctional catalysts for one-pot reaction sequences, which are the first materials with coexisting acid and base sites active for acid-base tandem reactions. For example, tandem deacetalization-Knoevenagel condensation proceeded successfully with the H-mont-NH(2), affording the corresponding condensation product in a quantitative yield. The acidity of the H-mont-NH(2) was strongly influenced by the preparation solvent, and the base-catalyzed reactions were enhanced by interlayer acid sites.

  11. Expressed sequence tags analysis of Blattella germanica.

    PubMed

    Chung, Hyang Suk; Yu, Tai Hyun; Kim, Bong Jin; Kim, Sun Mi; Kim, Joo Yeong; Yu, Hak Sun; Jeong, Hae Jin; Ock, Mee Sun

    2005-12-01

    Four hundred and sixty five randomly selected clones from a cDNA library of Blattella germanica were partially sequenced and searched using BLAST as a means of analyzing the transcribed sequences of its genome. A total of 363 expressed sequence tags (ESTs) were generated from 465 clones after editing and trimming the vector and ambiguous sequences. About 42% (154/363) of these clones showed significant homology with other data base registered genes. These new B. germanica genes constituted a broad range of transcripts distributed among ribosomal proteins, energy metabolism, allergens, proteases, protease inhibitors, enzymes, translation, cell signaling pathways, and proteins of unknown function. Eighty clones were not well-matched by database searches, and these represent new B. germanica-specific ESTs. Some genes which drew our attention are discussed. The information obtained increases our understanding of the B. germanica genome.

  12. Expressed sequence tags analysis of Blattella germanica

    PubMed Central

    Chung, Hyang Suk; Yu, Tai Hyun; Kim, Bong Jin; Kim, Sun Mi; Kim, Joo Yeong; Yu, Hak Sun; Jeong, Hae Jin

    2005-01-01

    Four hundred and sixty five randomly selected clones from a cDNA library of Blattella germanica were partially sequenced and searched using BLAST as a means of analyzing the transcribed sequences of its genome. A total of 363 expressed sequence tags (ESTs) were generated from 465 clones after editing and trimming the vector and ambiguous sequences. About 42% (154/363) of these clones showed significant homology with other data base registered genes. These new B. germanica genes constituted a broad range of transcripts distributed among ribosomal proteins, energy metabolism, allergens, proteases, protease inhibitors, enzymes, translation, cell signaling pathways, and proteins of unknown function. Eighty clones were not well-matched by database searches, and these represent new B. germanica-specific ESTs. Some genes which drew our attention are discussed. The information obtained increases our understanding of the B. germanica genome. PMID:16340304

  13. Microbial Contamination in Next Generation Sequencing: Implications for Sequence-Based Analysis of Clinical Samples

    PubMed Central

    Strong, Michael J.; Xu, Guorong; Morici, Lisa; Splinter Bon-Durant, Sandra; Baddoo, Melody; Lin, Zhen; Fewell, Claire; Taylor, Christopher M.; Flemington, Erik K.

    2014-01-01

    The high level of accuracy and sensitivity of next generation sequencing for quantifying genetic material across organismal boundaries gives it tremendous potential for pathogen discovery and diagnosis in human disease. Despite this promise, substantial bacterial contamination is routinely found in existing human-derived RNA-seq datasets that likely arises from environmental sources. This raises the need for stringent sequencing and analysis protocols for studies investigating sequence-based microbial signatures in clinical samples. PMID:25412476

  14. Analysis of Neuronal Sequences Using Pairwise Biases

    DTIC Science & Technology

    2015-08-27

    sequences have been tied to memory formation and spatial navigation in the hippocampus, a region of mammalian brains. Traditionally, neuronal...brain functions. In particular, these sequences have been tied to memory formation and spatial navigation in the hippocampus, a region of mammalian...hippocampus is widely believed to aid in two main functions: episodic memory and spatial navigation. Episodic memories are, broadly speaking, memories that

  15. Genetic characterization of three novel chicken parvovirus strains based on analysis of their coding sequences.

    PubMed

    Koo, Bon-Sang; Lee, Hae-Rim; Jeon, Eun-Ok; Han, Moo-Sung; Min, Kyeong-Cheol; Lee, Seung-Baek; Bae, Yeon-Ji; Cho, Sun-Hyung; Mo, Jong-Suk; Kwon, Hyuk Moo; Sung, Haan Woo; Kim, Jong-Nyeo; Mo, In-Pil

    2015-01-01

    Chicken parvovirus (ChPV) is one of the causative agents of viral enteritis. Recently, the genome of the ABU-P1 strain of ChPV was fully sequenced and determined to have a distinct genomic composition compared with that of vertebrate parvoviruses. However, no comparative sequence analysis of coding regions of ChPVs was possible because of the lack of other sequence information. In this study, we obtained the nucleotide sequences of all genomic coding regions of three ChPVs by polymerase chain reaction using 13 primer sets, and deduced the amino acid sequences from the nucleotide sequences. The non-structural protein 1 (NS1) gene of the three ChPVs showed 95.0 to 95.5% nucleotide sequence identity and 96.5 to 98.1% amino acid sequence identity to those of NS1 from the ABU-P1 strain, respectively, and even higher nucleotide and amino acid similarities to one another. The viral proteins (VP) gene was more divergent between the three ChPV Korean strains and ABU-P1, with 88.1 to 88.3% nucleotide identity and 93.0% amino acid identity. Analysis of the putative tertiary structure of the ChPV VP2 protein showed that variable regions with less than 80% nucleotide similarity between the three Korean strains and ABU-P1 occurred in large loops of the VP2 protein believed to be involved in antigenicity, pathogenicity, and tissue tropism in other parvoviruses. Based on our analysis of full-length coding sequences, we discovered greater variation in ChPV strains than reported previously, especially in partial regions of the VP2 protein.
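
    The identity percentages quoted above are the kind of figure obtained by counting matching columns in a pairwise alignment. A minimal sketch follows, assuming the two sequences are already aligned and that gap columns are excluded from the denominator (one common convention; the abstract does not say which was used). The function name and the toy fragments are hypothetical and are not ChPV data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gap columns are skipped, not counted as mismatches
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical): 9 ungapped columns, 8 of them identical.
print(round(percent_identity("ATGGCT-AAC", "ATGGCTTAAT"), 1))
```

    The same function works for amino acid alignments, since it only compares characters column by column.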

  16. Computer Simulation of the Determination of Amino Acid Sequences in Polypeptides

    ERIC Educational Resources Information Center

    Daubert, Stephen D.; Sontum, Stephen F.

    1977-01-01

    Describes a computer program that generates a random string of amino acids and guides the student in determining the correct sequence of a given protein by using experimental analytic data for that protein. (MLH)
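
    The 1977 program itself is not reproduced in this record. As a rough modern illustration of its first step, a random peptide "unknown" and its residue composition could be generated as follows; the function names and the use of Python are assumptions, not the authors' code.

```python
import random
from collections import Counter
from typing import Dict, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues


def random_peptide(length: int, seed: Optional[int] = None) -> str:
    """Return a random peptide; a fixed seed makes the exercise reproducible."""
    rng = random.Random(seed)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def composition(peptide: str) -> Dict[str, int]:
    """Residue counts, i.e. the unordered data a student would start from."""
    return dict(Counter(peptide))


unknown = random_peptide(12, seed=1)
print(composition(unknown))  # the student sees counts, not the order
```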

  18. Synthesis of gamma,delta-unsaturated glycolic acids via sequenced Brook and Ireland-Claisen rearrangements.

    PubMed

    Schmitt, Daniel C; Johnson, Jeffrey S

    2010-03-05

    Organozinc, -magnesium, and -lithium nucleophiles initiate a Brook/Ireland-Claisen rearrangement sequence of allylic silyl glyoxylates resulting in the formation of gamma,delta-unsaturated alpha-silyloxy acids.

  19. Rethinking microbial diversity analysis in the high throughput sequencing era.

    PubMed

    Lemos, Leandro N; Fulthorpe, Roberta R; Triplett, Eric W; Roesch, Luiz F W

    2011-07-01

    The analysis of amplified and sequenced 16S rRNA genes has become the most important single approach for microbial diversity studies. The new sequencing technologies allow for sequencing thousands of reads in a single run, and a cost-effective option is to split a single run across many samples. However, for this type of investigation the key question that needs to be answered is how many samples can be sequenced without biasing the results due to lack of sequence representativeness. In this work we demonstrated that the level of sequencing effort used for analyzing soil microbial communities biases the results and determines the most effective type of analysis for small and large datasets. Many simulations were performed with four independent pyrosequencing-generated 16S rRNA gene libraries from different environments. The analysis performed here illustrates the lack of resolution of OTU-based approaches for datasets with low sequence coverage. This analysis should be performed with at least 90% of sequence coverage. Diversity index values increase with sample size, making normalization of the number of sequences in all samples crucial. An important finding of this study was the advantage of phylogenetic approaches for examining microbial communities with low sequence coverage. However, if the environments being compared were closely related, a deeper sequencing would be necessary to detect the variation in the microbial composition. Copyright © 2011 Elsevier B.V. All rights reserved.
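
    Two operations mentioned above, estimating sequence coverage and normalizing the number of sequences per sample, can be sketched briefly. The sketch assumes Good's coverage estimator (1 minus singletons over total reads) and random subsampling without replacement; the abstract does not state which estimator or normalization procedure the authors used, and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict


def goods_coverage(otu_counts: Dict[str, int]) -> float:
    """Good's coverage: 1 - (number of singleton OTUs / total reads)."""
    total = sum(otu_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    singletons = sum(1 for count in otu_counts.values() if count == 1)
    return 1.0 - singletons / total


def rarefy(otu_counts: Dict[str, int], depth: int, seed: int = 0) -> Counter:
    """Subsample a sample to a fixed number of reads, without replacement."""
    reads = [otu for otu, count in otu_counts.items() for _ in range(count)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    return Counter(random.Random(seed).sample(reads, depth))


sample = {"otu1": 5, "otu2": 4, "otu3": 1}
print(goods_coverage(sample))   # one singleton among ten reads
print(rarefy(sample, depth=4))  # same depth would be applied to every sample
```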

  20. Complete amino acid sequence of a Lolium perenne (perennial rye grass) pollen allergen, Lol p II.

    PubMed

    Ansari, A A; Shenbagamurthi, P; Marsh, D G

    1989-07-05

    The complete amino acid sequence of a Lolium perenne (rye grass) pollen allergen, Lol p II was determined by automated Edman degradation of the protein and selected fragments. Cleavage of the protein by enzymatic and chemical techniques established an unambiguous sequence for the protein. Lol p II contains 97 amino acid residues, with a calculated molecular weight of 10,882. The protein lacks cysteine and glutamine and shows no evidence of glycosylation. Theoretical predictions by Fraga's (Fraga, S. (1982) Can. J. Chem. 60, 2606-2610) and Hopp and Woods' (Hopp, T. P., and Woods, K. R. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 3824-3828) methods indicate the presence of four hydrophilic regions, which may contribute to sequential or parts of conformational B-cell epitopes. Analysis of amphipathic regions by Berzofsky's method indicates the presence of a highly amphipathic region, which may contain, or contribute to, an Ia/T-cell epitope. This latter segment of Lol p II was found to be highly homologous with an antibody-binding segment of the major rye allergen Lol p I and may explain why immune responsiveness to both the allergens is associated with HLA-DR3.

  1. Partial amino acid sequences around sulfhydryl groups of soybean beta-amylase.

    PubMed

    Nomura, K; Mikami, B; Morita, Y

    1987-08-01

    Sulfhydryl (SH) groups of soybean beta-amylase were modified with 5-(iodoaceto-amidoethyl)aminonaphthalene-1-sulfonate (IAEDANS) and the SH-containing peptides exhibiting fluorescence were purified after chymotryptic digestion of the modified enzyme. The sequence analysis of the peptides derived from the modification of all SH groups in the denatured enzyme revealed the existence of six SH groups, in contrast to five reported previously. One of them was found to have extremely low reactivity toward SH-reagents without reduction. In the native state, IAEDANS reacted with 2 mol of SH groups per mol of the enzyme (SH1 and SH2) accompanied with inactivation of the enzyme owing to the modification of SH2 located near the active site of this enzyme. The selective modification of SH2 with IAEDANS was attained after the blocking of SH1 with 5,5'-dithiobis-(2-nitrobenzoic acid). The amino acid sequences of the peptides containing SH1 and SH2 were determined to be Cys-Ala-Asn-Pro-Gln and His-Gln-Cys-Gly-Gly-Asn-Val-Gly-Asp-Ile-Val-Asn-Ile-Pro-Ile-Pro-Gln-Trp, respectively.

  2. GATA: a graphic alignment tool for comparative sequence analysis.

    PubMed

    Nix, David A; Eisen, Michael B

    2005-01-17

    Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution, yet the enhancer retains its activity. Dot plot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments. To address some of these issues, we created a stand-alone, platform-independent, graphic alignment tool for comparative sequence analysis (GATA, http://gata.sourceforge.net/). GATA uses the NCBI-BLASTN program and extensive post-processing to identify all small sub-alignments above a low cut-off score. These are graphed as two shaded boxes, one for each sequence, connected by a line using the coordinate system of their parent sequence. Shading and colour are used to indicate score and orientation. A variety of options exist for querying, modifying and retrieving conserved sequence elements. Extensive gene annotation can be added to both sequences using a standardized General Feature Format (GFF) file. GATA uses the NCBI-BLASTN program in conjunction with post-processing to exhaustively align two DNA sequences. It provides researchers with a fine
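
    To make the collinearity point concrete, the sketch below finds exact shared words between two DNA sequences in both orientations, so that rearranged or inverted elements still produce hits. This is not GATA's method (GATA relies on NCBI-BLASTN plus post-processing); it is a simplified exact-match stand-in, and the function names are hypothetical. Input is assumed to contain only A, C, G and T.

```python
from typing import Dict, List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def word_matches(seq_a: str, seq_b: str, k: int) -> List[Tuple[int, int, str]]:
    """Return (pos_in_a, pos_in_b, strand) for every shared k-mer.

    Strand '-' means the word in seq_b matches seq_a as a reverse complement.
    """
    index: Dict[str, List[int]] = {}
    for i in range(len(seq_a) - k + 1):
        index.setdefault(seq_a[i:i + k], []).append(i)
    hits = []
    for j in range(len(seq_b) - k + 1):
        word = seq_b[j:j + k]
        for i in index.get(word, []):
            hits.append((i, j, "+"))
        for i in index.get(reverse_complement(word), []):
            hits.append((i, j, "-"))
    return hits


print(word_matches("AAACCC", "TTAAACCC", 6))  # forward hit, shifted by two bases
print(word_matches("AAACCC", "GGGTTT", 6))    # inverted element
```

    Each hit carries coordinates in its own parent sequence, which is the information a box-and-line display of sub-alignments needs.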

  3. The BsaHI restriction-modification system: cloning, sequencing and analysis of conserved motifs.

    PubMed

    Neely, Robert K; Roberts, Richard J

    2008-05-14

    Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R. BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
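
    The degenerate recognition sequence GRCGYC uses IUPAC codes (R = A or G, Y = C or T). A minimal sketch of locating such sites in a DNA string is given below; the helper name and the test string are hypothetical. Because the reverse complement of GRCGYC is again GRCGYC, scanning one strand is sufficient.

```python
import re
from typing import List

# Subset of the IUPAC nucleotide codes needed for this site.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def find_sites(dna: str, site: str = "GRCGYC") -> List[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    body = "".join(IUPAC[code] for code in site.upper())
    pattern = re.compile("(?=(" + body + "))")  # lookahead keeps overlapping hits
    return [match.start() for match in pattern.finditer(dna.upper())]


# Hypothetical test string containing GACGTC at 2 and GGCGCC at 10.
print(find_sites("TTGACGTCAAGGCGCCTT"))
```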

  4. Full-length sequence analysis of four IBDV strains with different pathogenicities.

    PubMed

    Petkov, Daniel; Linnemann, Erich; Kapczynski, Darrell R; Sellers, Holly S

    2007-06-01

    Characterization of field isolate 9109, Lukert, Edgar cell culture-adapted (CCA), and Edgar chicken embryo-adapted (CEA) serotype 1 IBDV strains using full-length genomic sequences is reported. IBDV genomic segments A and B were sequenced and the nucleotide and deduced amino acid (aa) sequences were compared with previously reported full-length sequenced IBDV strains. We found that the viral protein VPX and amino acid sequences between aa 202-451 and 210-473 of VP2, but not the entire VP2 protein, are the best representatives of the entire IBDV genome. The greatest variability was found in the VP2 and 5' non-coding region of segment B among IBDV strains. The deduced amino acid sequences of the VP1 protein vary in length among the strains analyzed. The RNA-dependent RNA polymerase motifs within VP1 and the VP5 protein were highly conserved among isolates. Although within the VP2 processing site, amino acid sequence of Lukert was similar to the classical while the Edgar CCA, and CEA were more similar to the very virulent strains, it was determined that these strains have sequence characteristics of the classical strains. In addition, close relatedness between Lukert, Edgar CCA and CEA was observed. Although phylogenetic analysis of the VP1, VP3, and VP4 proteins indicated that 9109 is a classical type virus, this isolate shares unique amino acid changes with very virulent strains within the same proteins. Phylogenetic analysis of the 3' and 5' non-coding regions of segment A revealed that 9109 is more similar to the very virulent strains compared to the classical strains. In the VP2 protein, several amino acids were conserved between variant E and 9109 strains. Thus, it appears that 9109 isolate has characteristics of classical, very virulent, and variant strains. Our analysis indicates that although VPX amino acid comparison may be initially useful for molecular typing, full-length genomic sequence analysis is essential for thorough molecular characterization as

  5. Automated carboxy-terminal sequence analysis of peptides and proteins using diphenyl phosphoroisothiocyanatidate.

    PubMed Central

    Bailey, J. M.; Nikfarjam, F.; Shenoy, N. R.; Shively, J. E.

    1992-01-01

    peptides covalently attached to carboxylic acid-modified polyethylene and proteins (200 pmol to 5 nmol) noncovalently applied to Zitex (porous Teflon). The generality of our automated C-terminal sequencing methodology was examined by sequencing model peptides containing all 20 of the common amino acids. All of the amino acids tested were found to sequence in good yield except for proline, which was found not to be capable of derivatization. In spite of this limitation, the methodology should be a valuable tool for the C-terminal sequence analysis of peptides and proteins.(ABSTRACT TRUNCATED AT 400 WORDS) PMID:1304893

  6. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    ERIC Educational Resources Information Center

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…

  8. [Tabular Excel editor for analysis of aligned nucleotide sequences].

    PubMed

    Demkin, V V

    2010-01-01

    The Excel platform was used to convert multiple nucleotide sequence alignments obtained from the BLAST network service into a form suitable for visual analysis and editing. Two macro operators for MS Excel 2007 were constructed. The array of aligned sequences, transformed into an Excel table and processed with the macro operators, is more suitable for analysis than the initial HTML data.

  9. Rare variant detection using family-based sequencing analysis.

    PubMed

    Peng, Gang; Fan, Yu; Palculict, Timothy B; Shen, Peidong; Ruteshouser, E Cristy; Chi, Aung-Kyaw; Davis, Ronald W; Huff, Vicki; Scharfe, Curt; Wang, Wenyi

    2013-03-05

    Next-generation sequencing is revolutionizing genomic analysis, but this analysis can be compromised by high rates of missing true variants. To develop a robust statistical method capable of identifying variants that would otherwise not be called, we conducted sequence data simulations and both whole-genome and targeted sequencing data analysis of 28 families. Our method (Family-Based Sequencing Program, FamSeq) integrates Mendelian transmission information and raw sequencing reads. Sequence analysis using FamSeq reduced the number of false negative variants by 14-33% as assessed by HapMap sample genotype confirmation. In a large family affected with Wilms tumor, 84% of variants uniquely identified by FamSeq were confirmed by Sanger sequencing. In children with early-onset neurodevelopmental disorders from 26 families, de novo variant calls in disease candidate genes were corrected by FamSeq as mendelian variants, and the number of uniquely identified variants in affected individuals increased proportionally as additional family members were included in the analysis. To gain insight into maximizing variant detection, we studied factors impacting actual improvements of family-based calling, including pedigree structure, allele frequency (common vs. rare variants), prior settings of minor allele frequency, sequence signal-to-noise ratio, and coverage depth (∼20× to >200×). These data will help guide the design, analysis, and interpretation of family-based sequencing studies to improve the ability to identify new disease-associated genes.
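
    The idea of combining Mendelian transmission information with raw read evidence can be illustrated with a small Bayesian calculation. This is a deliberately simplified sketch, not FamSeq's algorithm: it assumes a biallelic site, parental genotypes known without error, and no de novo mutation. Genotypes are coded as the number of alternate alleles (0, 1 or 2), and all names are hypothetical.

```python
from typing import List, Sequence


def mendelian_prior(mother_gt: int, father_gt: int) -> List[float]:
    """P(child genotype = 0, 1, 2 alternate alleles) given parental genotypes."""
    p_m = mother_gt / 2.0  # chance the mother transmits the alternate allele
    p_f = father_gt / 2.0  # chance the father transmits the alternate allele
    return [(1 - p_m) * (1 - p_f),
            p_m * (1 - p_f) + (1 - p_m) * p_f,
            p_m * p_f]


def child_posterior(read_likelihoods: Sequence[float],
                    mother_gt: int, father_gt: int) -> List[float]:
    """Combine P(reads | genotype) with the Mendelian prior via Bayes' rule."""
    prior = mendelian_prior(mother_gt, father_gt)
    unnormalized = [lik * p for lik, p in zip(read_likelihoods, prior)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("reads are incompatible with Mendelian transmission")
    return [value / total for value in unnormalized]


# Low-coverage reads weakly favour homozygous reference, but one parent is
# homozygous alternate, so the child must carry at least one alternate allele.
print(child_posterior([0.5, 0.4, 0.1], mother_gt=2, father_gt=0))
```

    In this toy case the family information rescues a heterozygous call that the reads alone would have missed, which is the kind of false negative the abstract describes.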

  10. Establishing a framework for comparative analysis of genome sequences

    SciTech Connect

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied to protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.
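
    A constrained column, in the sense used above, is one where residues differ between sequences yet all share a property. The sketch below shows one simple way to detect such columns. It is written in Python rather than Prolog, ignores the phylogenetic tree entirely, and uses an illustrative residue grouping chosen for this example; none of it is taken from the toolkit.

```python
from typing import Dict, List, Set, Tuple

# Illustrative grouping only; real analyses use more careful property classes.
PROPERTY_CLASSES: Dict[str, Set[str]] = {
    "hydrophobic": set("AVLIMFW"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "polar": set("STNQ"),
}


def constrained_columns(alignment: List[str],
                        classes: Dict[str, Set[str]]) -> List[Tuple[int, str]]:
    """Return (column index, property) for columns that vary but keep a property."""
    results = []
    for idx, column in enumerate(zip(*alignment)):
        residues = set(column) - {"-"}
        if len(residues) < 2:
            continue  # identical or empty columns show no mutation to tolerate
        for name, members in classes.items():
            if residues <= members:
                results.append((idx, name))
                break
    return results


toy_alignment = ["AKDG", "VREG", "LKDG"]  # hypothetical three-sequence alignment
print(constrained_columns(toy_alignment, PROPERTY_CLASSES))
```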

  11. Relationships among genera of the Saccharomycotina from multigene sequence analysis

    USDA-ARS's Scientific Manuscript database

    Most known species of the subphylum Saccharomycotina (budding ascomycetous yeasts) have now been placed in phylogenetically defined clades following multigene sequence analysis. Terminal clades, which are usually well supported from bootstrap analysis, are viewed as phylogenetically circumscribed ge...

  12. Cloning and sequence analysis of IL-2, IL-4 and IFN-γ from Indian Dromedary camels (Camelus dromedarius).

    PubMed

    Nagarajan, G; Swami, Shelesh Kumar; Ghorui, S K; Pathak, K M L; Singh, R K; Patil, N V

    2012-06-01

    The cDNAs of three cytokines, viz., IL-2, IL-4 and IFN-γ from Dromedary camels were amplified by PCR using Bactrian camel sequences and subsequently cloned for sequence analysis. Sequence comparison revealed that Dromedary camel IL-2 shared 99.5% and 99.3% identity at the nucleotide and amino acid levels, respectively, with Bactrian camel IL-2. In the case of IL-4, the identity of Dromedary camel was 99.7% and 99.2% at the nucleotide and amino acid levels, respectively, with that of Bactrian camel. The Dromedary camel IFN-γ shared 100% identity both at nucleotide and amino acid levels with Bactrian camel IFN-γ. Phylogenetic analysis based on amino acid sequences indicated the close relationship in these cytokine genes between the Dromedary camel and other camelids.

  13. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

    PubMed

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy

    2015-05-01

    We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate--slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

  14. Wavelet Analysis on Symbolic Sequences and Two-Fold de Bruijn Sequences

    NASA Astrophysics Data System (ADS)

    Osipov, V. Al.

    2016-07-01

    The concept of symbolic sequences plays an important role in the study of complex systems. In this work we are interested in the ultrametric structure of the set of cyclic sequences naturally arising in the theory of dynamical systems. Aiming at the construction of analytic and numerical methods for the investigation of clusters, we introduce an operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for the study of the cluster hierarchy. The analytic power of the approach is demonstrated by the derivation of a formula for counting two-fold de Bruijn sequences, an extension of the notion of de Bruijn sequences. Possible advantages of the developed description are also discussed in the context of the applied problem of constructing efficient DNA sequence assembly algorithms.
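
    For readers unfamiliar with the underlying object, an ordinary de Bruijn sequence of order n over an alphabet is a cyclic string in which every word of length n occurs exactly once. The sketch below generates one with the standard recursive construction based on Lyndon words. It covers only the ordinary case, not the two-fold extension studied in the paper, and the function name is hypothetical.

```python
from typing import List


def de_bruijn(alphabet: str, n: int) -> str:
    """Return a cyclic de Bruijn sequence of order n over the given alphabet."""
    k = len(alphabet)
    a = [0] * (k * n)
    sequence: List[int] = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])  # append the current Lyndon word
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


s = de_bruijn("ACGT", 3)
# Read cyclically, every DNA 3-mer should appear exactly once.
windows = {(s + s[:2])[i:i + 3] for i in range(len(s))}
print(len(s), len(windows))
```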

  15. Human protein cluster analysis using amino acid frequencies.

    PubMed

    Vernone, Annamaria; Berchialla, Paola; Pescarmona, Gianpiero

    2013-01-01

    The paper focuses on the development of a software tool for protein clustering according to their amino acid content. All known human proteins were clustered according to the relative frequencies of their amino acids starting from the UniProtKB/Swiss-Prot reference database and making use of hierarchical cluster analysis. Results were compared to those based on sequence similarities. Proteins display different clustering patterns according to type. Many extracellular proteins with highly specific and repetitive sequences (keratins, collagens etc.) cluster clearly, confirming the accuracy of the clustering method. In our case clustering by sequence and amino acid content overlaps. Proteins with a more complex structure with multiple domains (catalytic, extracellular, transmembrane etc.), even if classified very similar according to sequence similarity and function (aquaporins, cadherins, steroid 5-alpha reductase etc.) showed different clustering according to amino acid content. Availability of essential amino acids according to local conditions (starvation, low or high oxygen, cell cycle phase etc.) may be a limiting factor in protein synthesis, whatever the mRNA level. This type of protein clustering may therefore prove a valuable tool in identifying so far unknown metabolic connections and constraints.
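
    The input to such a clustering is a 20-dimensional vector of relative amino acid frequencies per protein, plus a distance between vectors. A minimal sketch of those two pieces follows. Euclidean distance is an assumption (the abstract does not name the metric or linkage), and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(protein: str) -> List[float]:
    """Relative frequency of each standard residue, in AMINO_ACIDS order."""
    counts = Counter(res for res in protein.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return [counts[res] / total for res in AMINO_ACIDS]


def euclidean(u: List[float], v: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Order does not matter for composition: these two toy repeats are at distance 0.
print(euclidean(aa_frequencies("GPGPGP"), aa_frequencies("PGPGPG")))
```

    A pairwise distance matrix built this way could then be passed to any hierarchical clustering routine. Because composition ignores residue order, proteins with similar sequences and proteins with similar content need not cluster the same way.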

  16. Molecular cloning and sequence analysis of prion protein gene in Xiji donkey in China.

    PubMed

    Zhang, Zhuming; Wang, Renli; Xu, Lihua; Yuan, Fangzhong; Zhou, Xiangmei; Yang, Lifeng; Yin, Xiaomin; Xu, Binrui; Zhao, Deming

    2013-10-25

    Prion diseases are a group of human and animal neurodegenerative disorders caused by the deposition of an abnormal isoform prion protein (PrP(Sc)) encoded by a single copy prion protein gene (PRNP). Prion disease has been reported in many herbivores but not in Equus and the species barrier might be playing a role in resistance of these species to the disease. Therefore, analysis of genotype of prion protein (PrP) in these species may help understand the transmission of the disease. Xiji donkey is a rare species of Equus not widely reared in Ningxia, China, for service, food and medicine, but its PRNP has not been studied. Based on the reported PrP sequence in GenBank we designed primers and amplified, cloned and sequenced the PRNP of Xiji donkey. The sequence analysis showed that the Xiji donkey PRNP consisted of an open reading frame of 768 nucleotides encoding 256 amino acids. Amino acid residues unique to donkey as compared with some Equus animals, mink, cow, sheep, human, dog, sika deer, rabbit and hamster were identified. The results showed that the amino acid sequence of Xiji donkey PrP starts with the consensus sequence MVKSH, with almost identical amino acid sequence to the PrP of other Equus species in this study. Amino acid sequence analysis showed high identity within species and close relation to the PRNP of sika deer, sheep, dog, camel, cow, mink, rabbit and hamster with 83.1-99.7% identity. The results provided the PRNP data for an additional Equus species, which should be useful to the study of the prion disease pathogenesis, resistance and cross species transmission.

  17. Nucleic acid sequence of an internal image-bearing monoclonal anti-idiotype and its comparison to the sequence of the external antigen.

    PubMed Central

    Bruck, C; Co, M S; Slaoui, M; Gaulton, G N; Smith, T; Fields, B N; Mullins, J I; Greene, M I

    1986-01-01

    The monoclonal anti-idiotypic antibody (mAb2) 87.92.6 directed against the 9B.G5 antibody specific for the virus neutralizing epitope on the mammalian reovirus type 3 hemagglutinin was previously demonstrated to express an internal image of the receptor binding epitope of the reovirus type 3. Furthermore, this mAb2 has autoimmune reactivity to the cell surface receptor of the reovirus. The nucleotide and deduced amino acid sequences of the 87.92.6 mAb2 heavy and light chains are described in this report. The sequence analysis reveals that the same heavy chain variable and joining (VH and JH) gene segments are used by the 87.92.6 anti-idiotypic mAb2 and by the dominant idiotypes of the BALB/c anti-GAT (cGAT) and anti-NP (NPa) responses. [GAT; random polymer that is 60% glutamic acid, 30% alanine, and 10% tyrosine. NP; (4-hydroxy-3-nitrophenyl)-acetyl.] Despite extensive homology at the level of the heavy chain variable regions, the NPa positive BALB/c anti-NP monoclonal antibody 17.2.25 binds neither 9B.G5 nor the cellular receptor for the hemagglutinin. Amino acid sequence comparison between the viral hemagglutinin and the 87.92.6 mAb2 light chain "internal image," reveals an area of significant homology indicating that antigen mimicry by antibodies may be achieved by sharing primary structure. PMID:2428036

  18. Neurofibromatosis type 1 gene mutation analysis using sequence capture and high-throughput sequencing.

    PubMed

    Uusitalo, Elina; Hammais, Anna; Palonen, Elina; Brandt, Annika; Mäkelä, Ville-Veikko; Kallionpää, Roope; Jouhilahti, Eeva-Mari; Pöyhönen, Minna; Soini, Juhani; Peltonen, Juha; Peltonen, Sirkku

    2014-11-01

    Neurofibromatosis type 1 syndrome (NF1) is caused by mutations in the NF1 gene. Availability of new sequencing technology prompted us to search for an alternative method for NF1 mutation analysis. Genomic DNA was isolated from saliva, avoiding invasive sampling. The NF1 exons with an additional 50 bp of flanking intronic sequences were captured and enriched using the SeqCap EZ Choice Library protocol. The captured DNA was sequenced with the Roche/454 GS Junior system. The mean coverages of the targeted regions were 41x and 74x in 2 separate sets of samples. An NF1 mutation was discovered in 10 out of 16 separate patient samples. Our study provides proof of principle that the sequence capture methodology combined with high-throughput sequencing is applicable to NF1 mutation analysis. Deep intronic mutations may however remain undetectable, and change at the DNA level may not predict the outcome at the mRNA or protein levels.
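
    The two quantities in this design, an exon padded with flanking intronic sequence and the mean coverage of that target, can be computed as sketched below. Coordinates are assumed to be 0-based and half-open, reads are represented only by their aligned start and end, and the function names are hypothetical. The sketch does not reproduce the study's actual pipeline.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, end exclusive


def pad_exon(exon: Interval, flank: int = 50) -> Interval:
    """Extend an exon by `flank` bases of intronic sequence on each side."""
    start, end = exon
    return (max(0, start - flank), end + flank)


def mean_coverage(target: Interval, reads: List[Interval]) -> float:
    """Average number of reads covering each base of the target."""
    t_start, t_end = target
    if t_end <= t_start:
        raise ValueError("target must have positive length")
    covered_bases = 0
    for r_start, r_end in reads:
        overlap = min(r_end, t_end) - max(r_start, t_start)
        if overlap > 0:
            covered_bases += overlap
    return covered_bases / (t_end - t_start)


target = pad_exon((100, 200))  # hypothetical exon coordinates
print(target, mean_coverage(target, [(40, 160), (120, 260), (300, 400)]))
```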

  19. Design and Analysis of Single-Cell Sequencing Experiments.

    PubMed

    Grün, Dominic; van Oudenaarden, Alexander

    2015-11-05

    Recent advances in single-cell sequencing hold great potential for exploring biological systems with unprecedented resolution. Sequencing the genome of individual cells can reveal somatic mutations and allows the investigation of clonal dynamics. Single-cell transcriptome sequencing can elucidate the cell type composition of a sample. However, single-cell sequencing comes with major technical challenges and yields complex data output. In this Primer, we provide an overview of available methods and discuss experimental design and single-cell data analysis. We hope that these guidelines will enable a growing number of researchers to leverage the power of single-cell sequencing.

  20. Draft Genome Sequence of Gephyronic Acid Producer Cystobacter violaceus Strain Cb vi76

    PubMed Central

    Stevens, D. Cole; Young, Jeanette; Carmichael, Rory; Tan, John

    2014-01-01

    A draft genome sequence of Cystobacter violaceus strain Cb vi76, which produces the eukaryotic protein synthesis inhibitor gephyronic acid, has been obtained. The genome contains numerous predicted secondary metabolite clusters, including the gephyronic acid biosynthetic pathway. This genome will contribute to the investigation of secondary metabolism in other Cystobacter strains. PMID:25502681

  1. SETG: Nucleic Acid Extraction and Sequencing for In Situ Life Detection on Mars

    NASA Astrophysics Data System (ADS)

    Mojarro, A.; Hachey, J.; Tani, J.; Smith, A.; Bhattaru, S. A.; Pontefract, A.; Doebler, R.; Brown, M.; Ruvkun, G.; Zuber, M. T.; Carr, C. E.

    2016-10-01

    We are developing an integrated nucleic acid extraction and sequencing instrument: the Search for Extra-Terrestrial Genomes (SETG) for in situ life detection on Mars. Our goals are to identify related or unrelated nucleic acid-based life on Mars.

  2. Draft Genome Sequence of Cyanobacterium sp. Strain IPPAS B-1200 with a Unique Fatty Acid Composition

    PubMed Central

    Starikov, Alexander Y.; Usserbaeva, Aizhan A.; Sinetova, Maria A.; Sarsekeyeva, Fariza K.; Zayadan, Bolatkhan K.; Ustinova, Vera V.; Kupriyanova, Elena V.; Los, Dmitry A.

    2016-01-01

    Here, we report the draft genome of Cyanobacterium sp. IPPAS strain B-1200, isolated from Lake Balkhash, Kazakhstan, and characterized by the unique fatty acid composition of its membrane lipids, which are enriched with myristic and myristoleic acids. The approximate genome size is 3.4 Mb, and the predicted number of coding sequences is 3,119. PMID:27856596

  3. Molecular cloning and sequencing of a cDNA encoding the thioesterase domain of the rat fatty acid synthetase.

    PubMed

    Naggert, J; Witkowski, A; Mikkelsen, J; Smith, S

    1988-01-25

    A cloned cDNA containing the entire coding sequence for the long-chain S-acyl fatty acid synthetase thioester hydrolase (thioesterase I) component as well as the 3'-noncoding region of the fatty acid synthetase has been isolated using an expression vector and domain-specific antibodies. The coding region was assigned to the thioesterase I domain by identification of sequences coding for characterized peptide fragments, amino-terminal analysis of the isolated thioesterase I domain and the presence of the serine esterase active-site sequence motif. The thioesterase I domain is 306 amino acids long with a calculated molecular mass of 33,476 daltons; its DNA is flanked at the 5'-end by a region coding for the acyl carrier protein domain and at the 3'-end by a 1,537-base pairs-long noncoding sequence with a poly(A) tail. The thioesterase I domain exhibits a low, albeit discernible, homology with the discrete medium-chain S-acyl fatty acid synthetase thioester hydrolases (thioesterase II) from rat mammary gland and duck uropygial gland, suggesting a distant but common evolutionary ancestry for these proteins.

  4. Alfresco—A Workbench for Comparative Genomic Sequence Analysis

    PubMed Central

    Jareborg, Niclas; Durbin, Richard

    2000-01-01

    Comparative analysis of genomic sequences provides a powerful tool for identifying regions of potential biologic function; by comparing corresponding regions of genomes from suitable species, protein coding or regulatory regions can be identified by their homology. This requires the use of several specific types of computational analysis tools. Many programs exist for these types of analysis; not many exist for overall view/control of the results, which is necessary for large-scale genomic sequence analysis. Using Java, we have developed a new visualization tool that allows effective comparative genome sequence analysis. The program handles a pair of sequences from putatively homologous regions in different species. Results from various different existing external analysis programs, such as database searching, gene prediction, repeat masking, and alignment programs, are visualized and used to find corresponding functional sequence domains in the two sequences. The user interacts with the program through a graphic display of the genome regions, in which an independently scrollable and zoomable symbolic representation of the sequences is shown. As an example, the analysis of two unannotated orthologous genomic sequences from human and mouse containing parts of the UTY locus is presented. PMID:10958633

  5. Practical guidelines for B-cell receptor repertoire sequencing analysis.

    PubMed

    Yaari, Gur; Kleinstein, Steven H

    2015-11-20

    High-throughput sequencing of B-cell immunoglobulin repertoires is increasingly being applied to gain insights into the adaptive immune response in healthy individuals and in those with a wide range of diseases. Recent applications include the study of autoimmunity, infection, allergy, cancer and aging. As sequencing technologies continue to improve, these repertoire sequencing experiments are producing ever larger datasets, with tens- to hundreds-of-millions of sequences. These data require specialized bioinformatics pipelines to be analyzed effectively. Numerous methods and tools have been developed to handle different steps of the analysis, and integrated software suites have recently been made available. However, the field has yet to converge on a standard pipeline for data processing and analysis. Common file formats for data sharing are also lacking. Here we provide a set of practical guidelines for B-cell receptor repertoire sequencing analysis, starting from raw sequencing reads and proceeding through pre-processing, determination of population structure, and analysis of repertoire properties. These include methods for unique molecular identifiers and sequencing error correction, V(D)J assignment and detection of novel alleles, clonal assignment, lineage tree construction, somatic hypermutation modeling, selection analysis, and analysis of stereotyped or convergent responses. The guidelines presented here highlight the major steps involved in the analysis of B-cell repertoire sequencing data, along with recommendations on how to avoid common pitfalls.

  6. Sequence and phylogenetic analysis of chicken anaemia virus obtained from backyard and commercial chickens in Nigeria.

    PubMed

    Oluwayelu, D O; Todd, D; Olaleye, O D

    2008-12-01

    This work reports the first molecular analysis study of chicken anaemia virus (CAV) in backyard chickens in Africa using molecular cloning and sequence analysis to characterize CAV strains obtained from commercial chickens and Nigerian backyard chickens. Partial VP1 gene sequences were determined for three CAVs from commercial chickens and for six CAV variants present in samples from a backyard chicken. Multiple alignment analysis revealed that the 6% and 4% nucleotide diversity obtained respectively for the commercial and backyard chicken strains translated to only 2% amino acid diversity for each breed. Overall, the amino acid composition of Nigerian CAVs was found to be highly conserved. Since the partial VP1 gene sequence of two backyard chicken cloned CAV strains (NGR/CI-8 and NGR/CI-9) were almost identical and evolutionarily closely related to the commercial chicken strains NGR-1, and NGR-4 and NGR-5, respectively, we concluded that CAV infections had crossed the farm boundary.

  7. Analysis of expressed sequence tags of the water flea Daphnia magna.

    PubMed

    Watanabe, Hajime; Tatarazako, Norihisa; Oda, Shigeto; Nishide, Hiroyo; Uchiyama, Ikuo; Morita, Masatoshi; Iguchi, Taisen

    2005-08-01

    To study gene expression in the water flea Daphnia magna we constructed a cDNA library and characterized the expressed sequence tags (ESTs) of 7210 clones. The EST sequences clustered into 2958 nonredundant groups. BLAST analyses of both protein and DNA databases showed that 1218 (41%) of the unique sequences shared significant similarities to known nucleotide or amino acid sequences, whereas the remaining 1740 (59%) showed no significant similarities to other genes. Clustering analysis revealed particularly high expression of genes related to ATP synthesis, structural proteins, and proteases. The cDNA clones and EST sequence information should be useful for future functional analysis of daphnid biology and investigation of the links between ecology and genomics.

  8. UNIT 11.10 N-Terminal Sequence Analysis of Proteins and Peptides

    PubMed Central

    Speicher, Kaye D.; Gorman, Nicole; Speicher, David W.

    2009-01-01

    Automated N-terminal sequence analysis involves a series of chemical reactions that derivatize and remove one amino acid at a time from the N-terminal of purified peptides or intact proteins. At least several pmoles of a purified protein or 10 to 20 pmoles of a purified peptide with an unmodified N-terminal is required in order to obtain useful sequence information. In recent years the demand for N-terminal sequencing has decreased substantially as some applications for protein identification and characterization can now be more effectively performed using mass spectrometry. However, N-terminal sequencing remains the method of choice for verifying the N-terminal boundary of recombinant proteins, determining the N-terminal of protease-resistant domains, identifying proteins isolated from species where most of the genome has not yet been sequenced, and mapping modified or crosslinked sites in proteins that prove to be refractory to analysis by mass spectrometry. PMID:18429102

  9. Sequence and phylogenetic analysis of M-class genome segments of novel duck reovirus NP03

    PubMed Central

    Wang, Shao; Chen, Shilong; Cheng, Xiaoxia; Chen, Shaoying; Lin, FengQiang; Jiang, Bing; Zhu, Xiaoli; Li, Zhaolong; Wang, Jinxiang

    2015-01-01

    We report the sequence and phylogenetic analysis of the entire M1, M2, and M3 genome segments of the novel duck reovirus (NDRV) NP03. Alignment between the newly determined nucleotide sequences as well as their deduced amino acid sequences and the published sequences of avian reovirus (ARV) was carried out with DNASTAR software. Sequence comparison showed that the M2 gene had the most variability among the M-class genes of DRV. Phylogenetic analysis of the M-class genes of ARV strains revealed different lineages and clusters within DRVs. The 5 NDRV strains used in this study fall into a well-supported lineage that includes chicken ARV strains, whereas Muscovy DRV (MDRV) strains are separate from NDRV strains and form a distinct genetic lineage in the M2 gene tree. However, the MDRV and NDRV strains are closely related and located in a common lineage in the M1 and M3 gene trees, respectively. PMID:25852231

  10. Purification and partial amino acid sequence of the chloroplast cytochrome b-559.

    PubMed

    Widger, W R; Cramer, W A; Hermodson, M; Meyer, D; Gullifor, M

    1984-03-25

    The hydrophobic cytochrome b-559, purified from unstacked, ethanol-washed spinach thylakoid membranes, using extraction with 2% Triton X-100 in 4 M urea and three chromatographic steps in the presence of protease inhibitors, has a dominant band on sodium dodecyl sulfate-urea gels corresponding to Mr = 10,000. The yield of this preparation is 30-50% (5-10 mg) starting with 600 mg of chlorophyll. The heme content yields a calculated molecular weight of no more than 17,500/heme, and perhaps somewhat smaller after correction for impurities. The Mr = 10,000 band is stained by the tetramethylbenzidine-H2O2 heme reagent on lithium dodecyl sulfate gels run at 0 degrees C. The Mr = 10,000 protein, further separated by high performance liquid chromatography, contains a unique NH2 terminus that is not blocked, and the amino acid sequence for the first 27 residues is NH2-Ser-Gly-Ser-Thr-Gly-Glu-Arg-Ser-Phe-Ala-Asp-Ile-Ile-Thr-Ser-Ile-Arg-Tyr-Trp-Val-Ile-X-Ser-Ile-Thr-Ile-Pro. . . COOH. Approximately 55% of the amino acids are hydrophobic, based on amino acid analysis of the Mr = 10,000 peptide, which also indicated the presence of at least one histidine. Only one cytochrome b-559 component could be identified, whose yield indicated that it arises from a single b-559 protein in chloroplasts corresponding to the in situ high potential cytochrome of the chloroplast photosystem II.

  11. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments.

    PubMed

    Schwarz, Roland F; Tamuri, Asif U; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M; Schultz, Jörg; Goldman, Nick

    2016-05-05

    Sequence Logos and their variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles). © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Stratigraphic sequence analysis of the Antler foreland

    SciTech Connect

    Silberling, N.J.; Nichols, K.M.; Macke, D.L.

    1993-04-01

    Mid-Upper Devonian to Upper Mississippian strata in western Utah were deposited in the distal Antler foreland. They record lateral and vertical changes in depositional environments that define five successive stratigraphic sequences, each representing a third-order transgressive-regressive cycle. In ascending order, these sequences are informally named the Langenheim (LA) of late Frasnian to mid-Famennian age, the Gutschick (GU) of late Famennian to early Kinderhookian age, the Morris (MO) of late Kinderhookian age; the Sadlick (SA) of Osagean to early Meramecian age, and the Maughan (MA) of mid-Meramecian to Chesterian age. MO is widespread and recognized within carbonate rocks of the Fitchville Formation and Joana Limestone. SA formed in concert with and to the east and south of the Wendover foreland high; the Delle phosphatic event marks maximum marine flooding during SA deposition. The transgressive systems tract of MA includes rhythmic-bedded limestone in the upper part of the Deseret Limestone in west-central Utah and, farther west, the hypoxic limestone and black shale of the Skunk Spring Limestone Bed and part of the overlying Chainman Shale. Traced westward into Nevada, MA first oversteps SA and then MO. Lithostratigraphic correlation of these sequences still farther west into the Eureka thrust belt (ETB) could mean that the youngest strata truncated by the Roberts Mountains thrust belong to the MA and that this thrust is simply part of the post-Mississippian ETB. However, some strata in central Nevada that lithically resemble those of the MA are paleontologically dated as Early Mississippian, the age of sequences overstepped by MA not far to the east. Thus, at least some imbricates of the ETB may contain a sequence stratigraphy which reflects local tectonic control.

  13. A novel regucalcin gene promoter region-related protein: comparison of nucleotide and amino acid sequences in vertebrate species.

    PubMed

    Sawada, Natsumi; Yamaguchi, Masayoshi

    2005-01-01

    The molecular cloning and sequencing of the cDNA coding for a novel regucalcin gene promoter region-related protein (RGPR-p117) from bovine, rabbit and chicken livers was investigated using the rapid amplification of cDNA ends (RACE) method. Their nucleotide and amino acid sequences were compared with human, rat and mouse sequences published previously. RGPR-p117 of bovine, rabbit and chicken livers consisted of 1052, 1045, and 929 amino acid residues with calculated molecular mass of 117, 114, and 103 kDa, and estimated pI of 5.64, 5.84, and 5.59, respectively. Comparison analysis revealed that the nucleotide sequences of RGPR-p117 from mammalian species were highly conserved in their coding region, and the homologies were at least 72.9%. The RGPR-p117 proteins in mammalian species consisted of 1045-1060 amino acids, and had 63.1-90.2% identity. Meanwhile, the nucleotide and amino acid sequences of chicken RGPR-p117 had at least 36.4 and 43.7% identities, respectively. Phylogenetic analysis showed that RGPR-p117 in six vertebrates appears to form a single cluster. Mammalian RGPR-p117 conserved a leucine zipper motif. Moreover, the analysis for subcellular localization of RGPR-p117 from six vertebrates showed the probability of nuclear localization >52.2%; the nuclear localization in rat and mouse was 78.3%. This study demonstrates a great conservation of RGPR-p117 genes throughout evolution.
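
    The identity percentages quoted in this record depend on how identity is defined. The abstract does not state the authors' definition, so the following is only a minimal sketch under one common assumption: identity is the share of matching residues among alignment columns in which neither sequence has a gap. The function name and the gap character are illustrative choices, not taken from the paper.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Assumption: only columns where both sequences carry a residue are
    counted; other definitions (e.g. dividing by alignment length)
    give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

    The same function works for nucleotide and amino acid alignments, since it only compares characters column by column.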

  14. Clostridium sticklandii, a specialist in amino acid degradation: revisiting its metabolism through its genome sequence

    PubMed Central

    2010-01-01

    Background: Clostridium sticklandii belongs to a cluster of non-pathogenic proteolytic clostridia which utilize amino acids as carbon and energy sources. Isolated by T.C. Stadtman in 1954, it has been generally regarded as a "gold mine" for novel biochemical reactions and is used as a model organism for studying metabolic aspects such as the Stickland reaction, coenzyme-B12- and selenium-dependent reactions of amino acids. With the goal of revisiting its carbon, nitrogen, and energy metabolism, and comparing studies with other clostridia, its genome has been sequenced and analyzed. Results: C. sticklandii is one of the best biochemically studied proteolytic clostridial species. Useful additional information has been obtained from the sequencing and annotation of its genome, which is presented in this paper. Besides, experimental procedures reveal that C. sticklandii degrades amino acids in a preferential and sequential way. The organism prefers threonine, arginine, serine, cysteine, proline, and glycine, whereas glutamate, aspartate and alanine are excreted. Energy conservation is primarily obtained by substrate-level phosphorylation in fermentative pathways. The reactions catalyzed by different ferredoxin oxidoreductases and the exergonic NADH-dependent reduction of crotonyl-CoA point to a possible chemiosmotic energy conservation via the Rnf complex. C. sticklandii possesses both the F-type and V-type ATPases. The discovery of an as yet unrecognized selenoprotein in the D-proline reductase operon suggests a more detailed mechanism for NADH-dependent D-proline reduction. A rather unusual metabolic feature is the presence of genes for all the enzymes involved in two different CO2-fixation pathways: C. sticklandii harbours both the glycine synthase/glycine reductase and the Wood-Ljungdahl pathways. This unusual pathway combination has retrospectively been observed in only four other sequenced microorganisms. Conclusions: Analysis of the C. sticklandii genome and

  15. 5S ribosomal ribonucleic acid sequences in Bacteroides and Fusobacterium: evolutionary relationships within these genera and among eubacteria in general

    NASA Technical Reports Server (NTRS)

    Van den Eynde, H.; De Baere, R.; Shah, H. N.; Gharbia, S. E.; Fox, G. E.; Michalik, J.; Van de Peer, Y.; De Wachter, R.

    1989-01-01

    The 5S ribosomal ribonucleic acid (rRNA) sequences were determined for Bacteroides fragilis, Bacteroides thetaiotaomicron, Bacteroides capillosus, Bacteroides veroralis, Porphyromonas gingivalis, Anaerorhabdus furcosus, Fusobacterium nucleatum, Fusobacterium mortiferum, and Fusobacterium varium. A dendrogram constructed by a clustering algorithm from these sequences, which were aligned with all other hitherto known eubacterial 5S rRNA sequences, showed differences as well as similarities with respect to results derived from 16S rRNA analyses. In the 5S rRNA dendrogram, Bacteroides clustered together with Cytophaga and Fusobacterium, as in 16S rRNA analyses. Intraphylum relationships deduced from 5S rRNAs suggested that Bacteroides is specifically related to Cytophaga rather than to Fusobacterium, as was suggested by 16S rRNA analyses. Previous taxonomic considerations concerning the genus Bacteroides, based on biochemical and physiological data, were confirmed by the 5S rRNA sequence analysis.

  17. Sequence analysis in molecular biology: Treasure trove or trivial pursuit

    SciTech Connect

    Von Heijne, G.

    1987-01-01

    This book deals with sequence analysis on the computer. One of its aims is to serve as a brief survey of what one can do with protein and DNA sequences either directly on a microcomputer or by using one of the main sequence/programs data banks such as BioNet or the Wisconsin package. Equally important, the book traces the origins of some of the ideas that have come to be embodied in these programs from both biological and methodological points of view: What do the standard sequence analysis algorithms really analyze, and to what degree can we trust their outputs?

  18. Modern Computational Techniques for the HMMER Sequence Analysis

    PubMed Central

    2013-01-01

    This paper focuses on the latest research and critical reviews on modern computing architectures, software and hardware accelerated algorithms for bioinformatics data analysis with an emphasis on one of the most important sequence analysis applications—hidden Markov models (HMM). We show the detailed performance comparison of sequence analysis tools on various computing platforms recently developed in the bioinformatics society. The characteristics of the sequence analysis, such as data and compute-intensive natures, make it very attractive to optimize and parallelize by using both traditional software approach and innovated hardware acceleration technologies. PMID:25937944

  19. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues

    PubMed Central

    Sun, Jun; Liu, Rong

    2015-01-01

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder. PMID:26176857

  20. Amino acid sequence of winged bean (Psophocarpus tetragonolobus (L.) DC.) chymotrypsin inhibitor, WCI-3.

    PubMed

    Shibata, H; Hara, S; Ikenaka, T

    1988-10-01

    The complete amino acid sequence of winged bean chymotrypsin inhibitor 3 (WCI-3) was determined by the conventional methods. WCI-3 consisted of 183 amino acid residues, but was heterogeneous in the carboxyl terminal region owing to the loss of one to four carboxyl terminal amino acid residues. The sequence of WCI-3 was highly homologous with those of soybean trypsin inhibitor Tia, winged bean trypsin inhibitor WTI-1, and Erythrina latissima trypsin inhibitor DE-3. One of the reactive site peptide bonds of WCI-3 was identified as Leu(65)-Ser(66), which was located at the same position as those of the other Kunitz-family leguminous proteinase inhibitors.

  1. Amino acid sequence of anionic peroxidase from the windmill palm tree Trachycarpus fortunei.

    PubMed

    Baker, Margaret R; Zhao, Hongwei; Sakharov, Ivan Yu; Li, Qing X

    2014-12-10

    Palm peroxidases are extremely stable and have uncommon substrate specificity. This study was designed to fill in the knowledge gap about the structures of a peroxidase from the windmill palm tree Trachycarpus fortunei. The complete amino acid sequence and partial glycosylation were determined by MALDI-top-down sequencing of native windmill palm tree peroxidase (WPTP), MALDI-TOF/TOF MS/MS of WPTP tryptic peptides, and cDNA sequencing. The propeptide of WPTP contained N- and C-terminal signal sequences which contained 21 and 17 amino acid residues, respectively. Mature WPTP was 306 amino acids in length, and its carbohydrate content ranged from 21% to 29%. Comparison to closely related royal palm tree peroxidase revealed structural features that may explain differences in their substrate specificity. The results can be used to guide engineering of WPTP and its novel applications.

  2. Amino Acid Sequence of Anionic Peroxidase from the Windmill Palm Tree Trachycarpus fortunei

    PubMed Central

    2015-01-01

    Palm peroxidases are extremely stable and have uncommon substrate specificity. This study was designed to fill in the knowledge gap about the structures of a peroxidase from the windmill palm tree Trachycarpus fortunei. The complete amino acid sequence and partial glycosylation were determined by MALDI-top-down sequencing of native windmill palm tree peroxidase (WPTP), MALDI-TOF/TOF MS/MS of WPTP tryptic peptides, and cDNA sequencing. The propeptide of WPTP contained N- and C-terminal signal sequences which contained 21 and 17 amino acid residues, respectively. Mature WPTP was 306 amino acids in length, and its carbohydrate content ranged from 21% to 29%. Comparison to closely related royal palm tree peroxidase revealed structural features that may explain differences in their substrate specificity. The results can be used to guide engineering of WPTP and its novel applications. PMID:25383699

  3. Frequencies of amino acid strings in globular protein sequences indicate suppression of blocks of consecutive hydrophobic residues

    PubMed Central

    Schwartz, Russell; Istrail, Sorin; King, Jonathan

    2001-01-01

    Patterns of hydrophobic and hydrophilic residues play a major role in protein folding and function. Long, predominantly hydrophobic strings of 20–22 amino acids each are associated with transmembrane helices and have been used to identify such sequences. Much less attention has been paid to hydrophobic sequences within globular proteins. In prior work on computer simulations of the competition between on-pathway folding and off-pathway aggregate formation, we found that long sequences of consecutive hydrophobic residues promoted aggregation within the model, even controlling for overall hydrophobic content. We report here on an analysis of the frequencies of different lengths of contiguous blocks of hydrophobic residues in a database of amino acid sequences of proteins of known structure. Sequences of three or more consecutive hydrophobic residues are found to be significantly less common in actual globular proteins than would be predicted if residues were selected independently. The result may reflect selection against long blocks of hydrophobic residues within globular proteins relative to what would be expected if residue hydrophobicities were independent of those of nearby residues in the sequence. PMID:11316883
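
    The comparison described in this record (observed block lengths versus what independent residue selection would predict) can be sketched as follows. This is not the authors' code: the hydrophobic residue set below is an assumption chosen for illustration, and the paper's classification may differ. The expectation is the standard count of maximal runs in a sequence of independent trials, each hydrophobic with probability p.

```python
from collections import Counter
from itertools import groupby

# Assumed hydrophobic set (illustrative only; not taken from the paper).
HYDROPHOBIC = set("AVLIMFWC")


def hydrophobic_run_lengths(seq):
    """Count maximal blocks of consecutive hydrophobic residues by length."""
    counts = Counter()
    for is_hydrophobic, group in groupby(seq.upper(), key=lambda r: r in HYDROPHOBIC):
        if is_hydrophobic:
            counts[sum(1 for _ in group)] += 1
    return counts


def expected_runs_independent(n, p, k):
    """Expected number of maximal runs of exactly length k in a sequence
    of n residues, each independently hydrophobic with probability p."""
    if k > n or k < 1:
        return 0.0
    if k == n:
        return p ** n
    # Two runs touching a sequence end need one flanking non-hydrophobic
    # residue; the n - k - 1 interior placements need two.
    return 2 * p ** k * (1 - p) + (n - k - 1) * p ** k * (1 - p) ** 2
```

    Summing observed counts over a database and comparing them with the expectation for each protein's length and hydrophobic fraction would give the kind of over- or under-representation the abstract reports for blocks of three or more residues.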

  4. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations.

    PubMed

    Abascal, Federico; Zardoya, Rafael; Telford, Maximilian J

    2010-07-01

    We present TranslatorX, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. Alignments represent a statement regarding the homology between individual nucleotides or amino acids within homologous genes. As protein-coding DNA sequences evolve as triplets of nucleotides (codons) and it is known that sequence similarity degrades more rapidly at the DNA than at the amino acid level, alignments are generally more accurate when based on amino acids than on their corresponding nucleotides. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. The TranslatorX server is freely available at http://translatorx.co.uk.
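
    The core idea of translation-guided alignment, mapping each aligned amino acid back to its codon and each protein gap to a three-nucleotide gap, can be sketched as below. This is not TranslatorX code; the function name is hypothetical, and the sketch assumes the coding sequence is in frame, has no terminal stop codon, and corresponds residue for residue to the aligned protein.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Build a codon alignment row from an aligned protein row and the
    unaligned coding sequence that encodes it.

    Assumptions (illustrative): cds length is a multiple of 3 and it
    holds exactly one codon per non-gap residue in aligned_protein.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for residue in aligned_protein if residue != gap)
    if n_residues != len(codons):
        raise ValueError("protein and coding sequence lengths disagree")
    codon_iter = iter(codons)
    # Each protein gap becomes three nucleotide gaps; each residue
    # consumes the next codon in order.
    return "".join(gap * 3 if residue == gap else next(codon_iter)
                   for residue in aligned_protein)
```

    For example, the aligned protein row `M-K` with coding sequence `ATGAAA` yields `ATG---AAA`, so codon boundaries are preserved in the nucleotide alignment.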

  5. RNAblueprint: flexible multiple target nucleic acid sequence design.

    PubMed

    Hammer, Stefan; Tschiatschek, Birgit; Flamm, Christoph; Hofacker, Ivo L; Findeiß, Sven

    2017-09-15

    Realizing the value of synthetic biology in biotechnology and medicine requires the design of molecules with specialized functions. Due to its close structure-to-function relationship, and the availability of good structure prediction methods and energy models, RNA is perfectly suited to be synthetically engineered with predefined properties. However, currently available RNA design tools cannot be easily adapted to accommodate new design specifications. Furthermore, complicated sampling and optimization methods are often developed to suit a specific RNA design goal, adding to their inflexibility. We developed a C++ library implementing a graph coloring approach to stochastically sample sequences compatible with structural and sequence constraints from the typically very large solution space. The approach allows the solution space to be specified and explored in a well-defined way. Our library also guarantees uniform sampling, which makes optimization runs performant by not only avoiding re-evaluation of already found solutions, but also by raising the probability of finding better solutions for long optimization runs. We show that our software can be combined with any other software package to allow diverse RNA design applications. Scripting interfaces allow the easy adaptation of existing code to accommodate new scenarios, making the whole design process very flexible. We implemented example design approaches written in Python to demonstrate these advantages. RNAblueprint, Python implementations and benchmark datasets are available at GitHub: https://github.com/ViennaRNA. Contact: s.hammer@univie.ac.at, ivo@tbi.univie.ac.at or sven@tbi.univie.ac.at. Supplementary data are available at Bioinformatics online.
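
    To illustrate what uniform sampling of structure-compatible sequences means, here is a minimal sketch for the simplest case only: a single target structure in dot-bracket notation. It does not use or reproduce the RNAblueprint API, and it does not implement the graph coloring needed for multiple targets. With one structure, every paired position depends on exactly one partner, so drawing each pair uniformly from the six canonical and wobble pairs and each unpaired base uniformly from four letters samples the solution space uniformly. All names are hypothetical.

```python
import random

# Watson-Crick and G-U wobble pairs, written as (5' base, 3' base).
PAIRS = ["AU", "UA", "GC", "CG", "GU", "UG"]
BASES = "ACGU"


def parse_pairs(structure):
    """Map each opening-bracket index to its closing-bracket index."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs[stack.pop()] = i
        elif ch != ".":
            raise ValueError("unexpected character: %r" % ch)
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def sample_sequence(structure, rng=random):
    """Draw one sequence uniformly from all sequences compatible with a
    single dot-bracket structure (single-target case only)."""
    seq = [None] * len(structure)
    for i, j in parse_pairs(structure).items():
        seq[i], seq[j] = rng.choice(PAIRS)  # one independent draw per pair
    for i, base in enumerate(seq):
        if base is None:
            seq[i] = rng.choice(BASES)  # unpaired positions are unconstrained
    return "".join(seq)
```

    Once two or more target structures share positions, the pairing constraints form larger dependency graphs and independent draws are no longer valid, which is the problem the library described above addresses.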

  6. Caraparu virus (group C Orthobunyavirus): sequencing and phylogenetic analysis based on the conserved region 3 of the RNA polymerase gene.

    PubMed

    de Brito Magalhães, Cintia Lopes; Quinan, Bárbara Resende; Novaes, Renata Franco Vianna; dos Santos, João Rodrigues; Kroon, Erna Geessien; Bonjardim, Cláudio Antônio; Ferreira, Paulo César Peregrino

    2007-12-01

    Here, for the first time, we report the nucleotide sequence of Caraparu virus (CARV) L segment and the analysis of the RNA polymerase region 3 encoded by this segment. The 1,404 bp nucleotide sequence shares the highest identity with Bunyamwera, La Crosse, Oropouche, and Akabane virus sequences. The amino acid sequence was deduced and aligned with sequences from members of the Bunyaviridae family and used for phylogenetic analysis. The CARV clustered in the Orthobunyavirus genus. The premotif A and motifs A-E, which are present in region 3 of the Bunyaviridae family, were also conserved in the CARV L protein, as were other regions conserved among the Orthobunyavirus genus.

  7. Nucleotide and deduced amino acid sequences of a new subtilisin from an alkaliphilic Bacillus isolate.

    PubMed

    Saeki, Katsuhisa; Magallones, Marietta V; Takimura, Yasushi; Hatada, Yuji; Kobayashi, Tohru; Kawai, Shuji; Ito, Susumu

    2003-10-01

    The gene for a new subtilisin from the alkaliphilic Bacillus sp. KSM-LD1 was cloned and sequenced. The open reading frame of the gene encoded a 97 amino-acid prepro-peptide plus a 307 amino-acid mature enzyme that contained a possible catalytic triad of residues, Asp32, His66, and Ser224. The deduced amino acid sequence of the mature enzyme (LD1) showed approximately 65% identity to those of subtilisins SprC and SprD from alkaliphilic Bacillus sp. LG12. The amino acid sequence identities of LD1 to those of previously reported true subtilisins and high-alkaline proteases were below 60%. LD1 was characteristically stable during incubation with surfactants and chemical oxidants. Interestingly, an oxidizable Met residue is located next to the catalytic Ser224 of the enzyme as in the cases of the oxidation-susceptible subtilisins reported to date.

  8. Identification and sequence analysis of grain softness protein in selected wheat, rye and triticale.

    PubMed

    Kharrazi, M A S; Bobojonov, V

    2012-08-16

    Grain softness protein (GSP) is an important protein for overcoming milling and grain defenses in the innate immunity systems of cereals. The objective of this study was to evaluate and understand GSP sequences in selected wheat, rye and triticale. Using sequences for this gene from a sequence database, we performed clustering analysis to compare the sequences obtained from 3 germplasms with other studied sequences for GSP. The maximum difference between the Hirmand GSP genotype in wheat and the database sequences was 23% in EF109396 and EF109399. Most amino acid variation between the GSP sequences involved the same amino acids. The Nikita rye GSP gene showed 64% identity with DQ269918 and AY667063. The isoelectric point in the GSP of wheat and Lasko triticale was significantly higher than that of rye GSP. In addition, parameters such as optical density, grand average of hydrophobicity, percentage of hydrophobicity and hydrophilic amino acids, and number of alpha helices and beta sheets in GSP were similar in wheat and triticale but not in wheat and rye.

  9. Shark myelin basic protein: amino acid sequence, secondary structure, and self-association.

    PubMed

    Milne, T J; Atkins, A R; Warren, J A; Auton, W P; Smith, R

    1990-09-01

    Myelin basic protein (MBP) from the Whaler shark (Carcharhinus obscurus) has been purified from acid extracts of a chloroform/methanol pellet from whole brains. The amino acid sequence of the majority of the protein has been determined and compared with the sequences of other MBPs. The shark protein has only 44% homology with the bovine protein, but, in common with other MBPs, it has basic residues distributed throughout the sequence and no extensive segments that are predicted to have an ordered secondary structure in solution. Shark MBP lacks the triproline sequence previously postulated to form a hairpin bend in the molecule. The region containing the putative consensus sequence for encephalitogenicity in the guinea pig contains several substitutions, thus accounting for the lack of activity of the shark protein. Studies of the secondary structure and self-association have shown that shark MBP possesses solution properties similar to those of the bovine protein, despite the extensive differences in primary structure.

  10. Peptide Mass Fingerprinting and N-Terminal Amino Acid Sequencing of Glycosylated Cysteine Protease of Euphorbia nivulia Buch.-Ham.

    PubMed Central

    Badgujar, Shamkant B.; Mahajan, Raghunath T.

    2013-01-01

    A new cysteine protease named Nivulian-II has been purified from the latex of Euphorbia nivulia Buch.-Ham. The apparent molecular mass of Nivulian-II is 43670.846 Da (MALDI TOF/MS). Peptide mass fingerprint analysis revealed peptide matches to Maturase K (Q52ZV1_9MAGN) of Banksia quercifolia. The N-terminal sequence (DFPPNTCCCICC) showed partial homology with those of other cysteine proteinases of biological origin. This is the first paper to characterize a Nivulian-II of E. nivulia latex with respect to amino acid sequencing. PMID:23476742

  11. Detailed Analysis of a Multiplet Earthquake Sequence

    NASA Astrophysics Data System (ADS)

    Iglesias, A.; Singh, S. K.; Garduño, V. H.

    2014-12-01

    The Mexican National Seismological Service reported a sequence of four small earthquakes (2.5 < M < 3.0) occurring in Morelia, a city of 1,000,000 that is the capital of Michoacán State. A careful review of the records from a three-component broadband station, located ~10 km from the earthquakes, showed a sequence of 7 earthquakes in a period of about 36 hours. The waveforms are remarkably similar to one another, and the events may be considered a "multiplet". In this work, we use the records from the broadband station and a methodology based on coda wave interferometry to obtain the relative distance between pairs of events. The 21 inter-event distances obtained are treated as an over-determined system for the relative positions of the events. A non-linear damped scheme is used to solve the over-determined system and to obtain the spatial distribution of the 7 earthquakes. Results show that (1) distances between events are < 200 m, and (2) the sequence has an approximately linear distribution.
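
    The count of 21 distances follows from the 7 events: there are 7 × 6 / 2 = 21 unordered pairs, while 7 positions in three dimensions have at most 21 − 6 = 15 free parameters once rigid-body translation and rotation are removed, so the system is over-determined. The sketch below illustrates the general idea with synthetic data only. The abstract does not specify the damped scheme, so a gradient iteration with step-size damping stands in for it here; all function names and parameter values are assumptions.

```python
import itertools
import math
import random


def pair_indices(n_events):
    """All unordered event pairs; 7 events give 21 pairs."""
    return list(itertools.combinations(range(n_events), 2))


def misfit(positions, distances):
    """Sum of squared differences between modelled and observed pair distances."""
    return sum((math.dist(positions[i], positions[j]) - d) ** 2
               for (i, j), d in distances.items())


def gradient(positions, distances):
    """Gradient of the misfit with respect to every event coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in positions]
    for (i, j), d in distances.items():
        cur = math.dist(positions[i], positions[j])
        if cur == 0.0:
            continue  # direction undefined for coincident events
        scale = 2.0 * (cur - d) / cur
        for k in range(3):
            g = scale * (positions[i][k] - positions[j][k])
            grad[i][k] += g
            grad[j][k] -= g
    return grad


def relocate(start, distances, n_iter=200, step=0.1):
    """Damped gradient iteration: a step is kept only if it lowers the misfit,
    otherwise the step length is halved (the damping used in this sketch)."""
    positions = [list(p) for p in start]
    best = misfit(positions, distances)
    for _ in range(n_iter):
        grad = gradient(positions, distances)
        trial = [[p[k] - step * g[k] for k in range(3)]
                 for p, g in zip(positions, grad)]
        trial_misfit = misfit(trial, distances)
        if trial_misfit < best:
            positions, best = trial, trial_misfit
        else:
            step *= 0.5
    return positions, best
```

    Because only distances are used, the recovered configuration is defined up to a rigid-body motion or reflection; an absolute location would need additional information.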

  12. Complete cDNA and derived amino acid sequence of human factor V

    SciTech Connect

    Jenny, R.J.; Pittman, D.D.; Toole, J.J.; Kriz, R.W.; Aldape, R.A.; Hewick, R.M.; Kaufman, R.J.; Mann, K.G.

    1987-07-01

    cDNA clones encoding human factor V have been isolated from an oligo(dT)-primed human fetal liver cDNA library prepared with vector Charon 21A. The cDNA sequence of factor V from three overlapping clones includes a 6672-base-pair (bp) coding region, a 90-bp 5' untranslated region, and a 163-bp 3' untranslated region within which is a poly(A) tail. The deduced amino acid sequence consists of 2224 amino acids inclusive of a 28-amino acid leader peptide. Direct comparison with human factor VIII reveals considerable homology between proteins in amino acid sequence and domain structure: a triplicated A domain and duplicated C domain show approx. 40% identity with the corresponding domains in factor VIII. As in factor VIII, the A domains of factor V share approx. 40% amino acid-sequence homology with the three highly conserved domains in ceruloplasmin. The B domain of factor V contains 35 tandem and approx. 9 additional semiconserved repeats of nine amino acids of the form Asp-Leu-Ser-Gln-Thr-Thr/Asn-Leu-Ser-Pro and 2 additional semiconserved repeats of 17 amino acids. Factor V contains 37 potential N-linked glycosylation sites, 25 of which are in the B domain, and a total of 19 cysteine residues.

  13. Complete nucleotide sequence analysis of a Dengue-1 virus isolated on Easter Island, Chile.

    PubMed

    Cáceres, C; Yung, V; Araya, P; Tognarelli, J; Villagra, E; Vera, L; Fernández, J

    2008-01-01

    Dengue-1 viruses responsible for the dengue fever outbreak in Easter Island in 2002 were isolated from acute-phase sera of dengue fever patients. In order to analyze the complete genome sequence, we designed primers to amplify contiguous segments across the entire sequence of the viral genome. RT-PCR products obtained were cloned, and complete nucleotide and deduced amino acid sequences were determined. This report constitutes the first complete genetic characterization of a DENV-1 isolate from Chile. Phylogenetic analysis shows that an Easter Island isolate is most closely related to Pacific DENV-1 genotype IV viruses.

  14. Lipoic acid metabolism in Escherichia coli: sequencing and functional characterization of the lipA and lipB genes.

    PubMed Central

    Reed, K E; Cronan, J E

    1993-01-01

    Two genes, lipA and lipB, involved in lipoic acid biosynthesis or metabolism were characterized by DNA sequence analysis. The translational initiation site of the lipA gene was established, and the lipB gene product was identified as a 25-kDa protein. Overproduction of LipA resulted in the formation of inclusion bodies, from which the protein was readily purified. Cells grown under strictly anaerobic conditions required the lipA and lipB gene products for the synthesis of a functional glycine cleavage system. Mutants carrying a null mutation in the lipB gene retained a partial ability to synthesize lipoic acid and produced low levels of pyruvate dehydrogenase and alpha-ketoglutarate dehydrogenase activities. The lipA gene product failed to convert protein-bound octanoic acid moieties to lipoic acid moieties in vivo; however, the growth of both lipA and lipB mutants was supported by either 6-thiooctanoic acid or 8-thiooctanoic acid in place of lipoic acid. These data suggest that LipA is required for the insertion of the first sulfur into the octanoic acid backbone. LipB functions downstream of LipA, but its role in lipoic acid metabolism remains unclear. PMID:8444795

  15. Characterisation and Next-generation Sequencing Analysis of Unknown Arboviruses

    DTIC Science & Technology

    2012-09-01

    using techniques such as PCR-select subtraction and next-generation sequencing. Preliminary analysis of the four sequenced viruses has shown that they...HOJV) and Harrison Dam virus (HARDV), and two unknown bunyaviruses, Buffalo Creek Virus (BCV) and Maprik virus (MPKV). It describes the techniques such...unknown viruses with greater speed and at lower cost. The rapid advancement of new generation sequencing techniques allows for highly specific acquisition

  16. Nonlinear multiscale analysis of three-dimensional echocardiographic sequences

    SciTech Connect

    Sarti, A.; Mikula, K.; Sgallari, F.

    1999-06-01

    The authors introduce a new model for multiscale analysis of space-time echocardiographic sequences. The proposed nonlinear partial differential equation, representing the multiscale analysis, filters the sequence while keeping the space-time coherent structures. It combines the ideas of regularized Perona-Malik anisotropic diffusion and the Galilean invariant movie multiscale analysis of Alvarez, Guichard, Lions and Morel. A numerical method for solving the proposed partial differential equation is suggested and its stability is shown. Computational results on synthesized and real sequences are provided. A qualitative and quantitative evaluation of the accuracy of the method is presented.

  17. Nonlinear multiscale analysis of three-dimensional echocardiographic sequences.

    PubMed

    Sarti, A; Mikula, K; Sgallari, F

    1999-06-01

    We introduce a new model for multiscale analysis of space-time echocardiographic sequences. The proposed nonlinear partial differential equation, representing the multiscale analysis, filters the sequence while keeping the space-time coherent structures. It combines the ideas of regularized Perona-Malik anisotropic diffusion and the Galilean invariant movie multiscale analysis of Alvarez, Guichard, Lions and Morel. A numerical method for solving the proposed partial differential equation is suggested and its stability is shown. Computational results on synthesized and real sequences are provided. A qualitative and quantitative evaluation of the accuracy of the method is presented.

  18. Error analysis of deep sequencing of phage libraries: peptides censored in sequencing.

    PubMed

    Matochko, Wadim L; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N × N unity (identity) matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes the elimination, or statistically significant downsampling, of specific reads during the sequencing process.
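
    The operator picture in this abstract can be made concrete with a toy library. The sketch below is only an illustration under stated assumptions: diagonal matrices are stored as lists of their diagonal entries, Sa is built by drawing reads with replacement in proportion to copy number, and biased sequencing is modelled as Seq = Sa CEN, a composition that the abstract implies but does not write out. The function names, the toy diversity N = 4, and the copy numbers are all hypothetical.

```python
import random


def apply_diagonal(diag, vector):
    """Multiply a diagonal matrix (stored as its diagonal) by a vector."""
    return [d * v for d, v in zip(diag, vector)]


def sampling_operator(vector, n_reads, rng):
    """Return the diagonal of a random matrix Sa such that Sa applied to
    `vector` gives the read counts of n_reads draws made with replacement,
    each clone being drawn in proportion to its copy number."""
    if sum(vector) == 0:
        return [0.0] * len(vector)
    draws = rng.choices(range(len(vector)), weights=vector, k=n_reads)
    counts = [0] * len(vector)
    for idx in draws:
        counts[idx] += 1
    return [c / v if v > 0 else 0.0 for c, v in zip(counts, vector)]


def sequence(library, censorship_diag, n_reads, rng):
    """Apply Seq = Sa CEN to the library vector n (assumed composition)."""
    censored = apply_diagonal(censorship_diag, library)  # CEN n
    sa = sampling_operator(censored, n_reads, rng)       # random diagonal Sa
    return apply_diagonal(sa, censored)                  # Sa CEN n
```

    With a censorship diagonal of all ones the call reduces to unbiased sequencing, Seq = Sa I_N; setting an entry to 0 removes that sequence from the reads entirely, and an entry between 0 and 1 downsamples it.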

  19. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    PubMed Central

    Matochko, Wadim L.; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N × N unity (identity) matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes the elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071

  20. Molecular cloning, sequence analysis, and functional expression of a novel growth regulator, oncostatin M.

    PubMed Central

    Malik, N; Kallestad, J C; Gunderson, N L; Austin, S D; Neubauer, M G; Ochs, V; Marquardt, H; Zarling, J M; Shoyab, M; Wei, C M

    1989-01-01

    Oncostatin M is a polypeptide of Mr approximately 28,000 that acts as a growth regulator for many cultured mammalian cells. We report the cDNA and genomic cloning, sequence analysis, and functional expression in heterologous cells of oncostatin M. cDNA clones were isolated from mRNA of U937 cells that had been induced to differentiate into macrophagelike cells by treatment with phorbol 12-myristate 13-acetate, and a genomic clone was also isolated from human brain DNA. Sequence analysis of these clones established the 1,814-base-pair cDNA sequence as well as exon boundaries. This sequence predicted that oncostatin M is synthesized as a 252-amino-acid polypeptide, with a 25-residue hydrophobic sequence resembling a signal peptide at the N terminus. The predicted oncostatin M amino acid sequence shared no homology with other known proteins, but the sequence of the 3' noncoding region of the cDNA contained an A + T-rich stretch with sequence motifs found in the 3' untranslated regions of many cytokine and lymphokine cDNAs. Oncostatin M mRNA of approximately 2 kilobase pairs was detected in phorbol 12-myristate 13-acetate-treated U937 cells and in activated human T cells. Transfection of cDNA encoding the oncostatin M precursor into COS cells resulted in the secretion of proteins with the structural and functional properties of oncostatin M. The unique amino acid sequence, expression by lymphoid cells, and growth-regulatory activities of oncostatin M suggest that it is a novel cytokine. PMID:2779549

  1. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    PubMed Central

    Sinclair, Robert M.; Ravantti, Janne J.

    2017-01-01

    ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids

  2. High Throughput Sequence Analysis for Disease Resistance in Maize

    USDA-ARS's Scientific Manuscript database

    Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...

  3. Initial sequencing and analysis of the human genome.

    PubMed

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; 
Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

  4. Classification of mouse VK groups based on the partial amino acid sequence to the first invariant tryptophan: impact of 14 new sequences from IgG myeloma proteins.

    PubMed

    Potter, M; Newell, J B; Rudikoff, S; Haber, E

    1982-12-01

    Fourteen new VK sequences derived from BALB/c IgG myeloma proteins were determined to the first invariant tryptophan (Trp 35). These partial sequences were compared with 65 other published VK sequences using a computer program. The 79 sequences were organized according to the length of the sequence from the amino terminus to the first invariant tryptophan (Trp 35), into seven groups (33, 34, 35, 36, 39, 40 and 41 aa). A distance matrix of all 79 sequences was then computed, i.e. the number of amino acid substitutions necessary to convert one sequence to another was determined. From these data a dendrogram was constructed. Most of the VK sequences fell into clusters or closely related groups. The definition of a sequence group is arbitrary but facilitates the classification of VK proteins. We used 12 substitutions as the basis for defining a sequence group based on the known number of substitutions that are found in the VK21 proteins. By this criterion there were 18 groups in the Trp 35 dendrogram. Twelve of the 14 new sequences fell into one of these sequence groups; two formed new sequence groups. Collective amino acid sequencing is still encountering new VK structures, indicating that more sequences will be required to attain an accurate estimate of the total number of VK groups. Updated dendrograms can be quickly generated to include newly generated sequences.
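
    The procedure described here, a substitution-count distance matrix followed by grouping at a fixed threshold, can be sketched in a few lines. The abstract does not say how the dendrogram was built or whether the 12-substitution cut-off is inclusive, so the sketch assumes single-linkage grouping with an inclusive threshold, and it only handles sequences of equal length. The function names are hypothetical.

```python
def substitutions(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Symmetric matrix of pairwise substitution counts."""
    n = len(seqs)
    return [[substitutions(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]


def group_by_threshold(matrix, threshold):
    """Single-linkage groups: two sequences share a group when a chain of
    pairwise distances <= threshold connects them (assumed criterion)."""
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

    Applied to made-up six-residue strings such as `["DIVMTQ", "DIVMTQ", "DIQMTQ", "EAALSW"]` with a threshold of 1, the first three fall into one group and the fourth forms its own; the study itself applied a threshold of 12 to sequences of 33 to 41 residues.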

  5. Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

    PubMed Central

    Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

    2003-01-01

    To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979

  6. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1997-04-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided. 7 figs.

  7. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1997-01-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.

  8. MESSA: MEta-Server for protein Sequence Analysis

    PubMed Central

    2012-01-01

    Background: Computational sequence analysis, that is, prediction of local sequence properties, homologs, spatial structure and function from the sequence of a protein, offers an efficient way to obtain needed information about proteins under study. Since reliable prediction is usually based on the consensus of many computer programs, meta-servers have been developed to fit such needs. Most meta-servers focus on one aspect of sequence analysis, while others incorporate more information, such as PredictProtein for local sequence feature predictions, SMART for domain architecture and sequence motif annotation, and GeneSilico for secondary and spatial structure prediction. However, as predictions of local sequence properties, three-dimensional structure and function are usually intertwined, it is beneficial to address them together. Results: We developed a MEta-Server for protein Sequence Analysis (MESSA) to facilitate comprehensive protein sequence analysis and gather structural and functional predictions for a protein of interest. For an input sequence, the server exploits a number of select tools to predict local sequence properties, such as secondary structure, structurally disordered regions, coiled coils, signal peptides and transmembrane helices; detect homologous proteins and assign the query to a protein family; identify three-dimensional structure templates and generate structure models; and provide predictive statements about the protein's function, including functional annotations, Gene Ontology terms, enzyme classification and possible functionally associated proteins. We tested MESSA on the proteome of Candidatus Liberibacter asiaticus. Manual curation shows that three-dimensional structure models generated by MESSA covered around 75% of all the residues in this proteome and the function of 80% of all proteins could be predicted. Availability: MESSA is free for non-commercial use at http://prodata.swmed.edu/MESSA/ PMID:23031578

  9. The dog genome: survey sequencing and comparative analysis.

    PubMed

    Kirkness, Ewen F; Bafna, Vineet; Halpern, Aaron L; Levy, Samuel; Remington, Karin; Rusch, Douglas B; Delcher, Arthur L; Pop, Mihai; Wang, Wei; Fraser, Claire M; Venter, J Craig

    2003-09-26

    A survey of the dog genome sequence (6.22 million sequence reads; 1.5x coverage) demonstrates the power of sample sequencing for comparative analysis of mammalian genomes and the generation of species-specific resources. More than 650 million base pairs (>25%) of dog sequence align uniquely to the human genome, including fragments of putative orthologs for 18,473 of 24,567 annotated human genes. Mutation rates, conserved synteny, repeat content, and phylogeny can be compared among human, mouse, and dog. A variety of polymorphic elements are identified that will be valuable for mapping the genetic basis of diseases and traits in the dog.

  10. Gene structure and amino acid sequence of Latimeria chalumnae (coelacanth) myelin DM20: phylogenetic relation of the fish.

    PubMed

    Tohyama, Y; Kasama-Yoshida, H; Sakuma, M; Kobayashi, Y; Cao, Y; Hasegawa, M; Kojima, H; Tamai, Y; Tanokura, M; Kurihara, T

    1999-07-01

    The structure of Latimeria chalumnae (coelacanth) proteolipid protein/DM20 gene excluding exon 1 was determined, and the amino acid sequence of Latimeria DM20 corresponding to exons 2-7 was deduced. The nucleotide sequence of exon 3 suggests that only DM20 isoform is expressed in Latimeria. The structure of proteolipid protein/DM20 gene is well preserved among human, dog, mouse, and Latimeria. Southern blot analysis indicates that Latimeria DM20 gene is a single-copy gene. When the amino acid sequences of DM20 were compared among various species, Latimeria was more similar to tetrapods than other fishes including lungfish, confirming the previous finding by immunoreactivity (Waehneldt and Malotka 1989 J. Neurochem. 52:1941-1943). However, when phylogenetic trees were constructed from the DM20 sequences, lungfish was clearly the closest to tetrapods. Latimeria was situated outside of lungfish by the maximum likelihood method. The apparent similarity of Latimeria DM20 to tetrapod proteolipid protein/DM20 is explained by the slow amino acid substitution rate of Latimeria DM20.

  11. A rapid method for manual or automated purification of fluorescently labeled nucleic acids for sequencing, genotyping, and microarrays.

    PubMed

    Springer, Amy L; Booth, Lisa R; Braid, Michael D; Houde, Christiane M; Hughes, Karin A; Kaiser, Robert J; Pedrak, Casandra; Spicer, Douglas A; Stolyar, Sergey

    2003-03-01

    Fluorescent dyes provide specific, sensitive, and multiplexed detection of nucleic acids. To maximize sensitivity, fluorescently labeled reaction products (e.g., cycle sequencing or primer extension products) must be purified away from residual dye-labeled precursors. Successful high-throughput analyses require that this purification be reliable, rapid, and amenable to automation. Common methods for purifying reaction products involve several steps and require processes that are not easily automated. Prolinx, Inc. has developed RapXtract superparamagnetic separation technology affording rapid and easy-to-perform methods that yield high-quality product and are easily automated. The technology uses superparamagnetic particles that specifically remove unincorporated dye-labeled precursors. These particles are efficiently pelleted in the presence of a magnetic field, making them ideal for purification because of the rapid separations that they allow. RapXtract-purified sequencing reactions yield data with good signal and high Phred quality scores, and they work with various sequencing dye chemistries, including BigDye and near-infrared fluorescence IRDyes. RapXtract technology can also be used to purify dye primer sequencing reactions, primer extension reactions for genotyping analysis, and nucleic acid labeling reactions for microarray hybridization. The ease of use and versatility of RapXtract technology makes it a good choice for manual or automated purification of fluorescently labeled nucleic acids.

  12. Amino acid sequence around the active-site serine residue in the acyltransferase domain of goat mammary fatty acid synthetase.

    PubMed Central

    Mikkelsen, J; Højrup, P; Rasmussen, M M; Roepstorff, P; Knudsen, J

    1985-01-01

    Goat mammary fatty acid synthetase was labelled in the acyltransferase domain by formation of O-ester intermediates by incubation with [1-14C]acetyl-CoA and [2-14C]malonyl-CoA. Tryptic-digest and CNBr-cleavage peptides were isolated and purified by high-performance reverse-phase and ion-exchange liquid chromatography. The sequences of the malonyl- and acetyl-labelled peptides were shown to be identical. The results confirm the hypothesis that both acetyl and malonyl groups are transferred to the mammalian fatty acid synthetase complex by the same transferase. The sequence is compared with those of other fatty acid synthetase transferases. PMID:3922356

  13. Ligation with nucleic acid sequence-based amplification.

    PubMed

    Ong, Carmichael; Tai, Warren; Sarma, Aartik; Opal, Steven M; Artenstein, Andrew W; Tripathi, Anubhav

    2012-01-01

    This work presents a novel method for detecting nucleic acid targets using a ligation step along with an isothermal, exponential amplification step. We use an engineered ssDNA with two variable regions on the ends, allowing us to design the probe for optimal reaction kinetics and primer binding. This two-part probe is ligated by T4 DNA Ligase only when both parts bind adjacently to the target. The assay demonstrates that the expected 72-nt RNA product appears only when the synthetic target, T4 ligase, and both probe fragments are present during the ligation step. An extraneous 38-nt RNA product also appears due to linear amplification of unligated probe (P3), but its presence does not cause a false-positive result. In addition, 40 mmol/L KCl in the final amplification mix was found to be optimal. It was also found that increasing P5 in excess of P3 helped with ligation and reduced the extraneous 38-nt RNA product. The assay was also tested with a single nucleotide polymorphism target, changing one base at the ligation site. The assay was able to yield a negative signal despite only a single-base change. Finally, using P3 and P5 with longer binding sites results in increased overall sensitivity of the reaction, showing that increasing ligation efficiency can improve the assay overall. We believe that this method can be used effectively for a number of diagnostic assays.

  14. Irritable bowel syndrome-diarrhea: characterization of genotype by exome sequencing, and phenotypes of bile acid synthesis and colonic transit.

    PubMed

    Camilleri, Michael; Klee, Eric W; Shin, Andrea; Carlson, Paula; Li, Ying; Grover, Madhusudan; Zinsmeister, Alan R

    2014-01-01

    The study objectives were: to mine the complete exome to identify putative rare single nucleotide variants (SNVs) associated with irritable bowel syndrome (IBS)-diarrhea (IBS-D) phenotype, to assess genes that regulate bile acids in IBS-D, and to explore univariate associations of SNVs with symptom phenotype and quantitative traits in an independent IBS cohort. Using principal components analysis, we identified two groups of IBS-D (n = 16) with increased fecal bile acids: rapid colonic transit or high bile acids synthesis. DNA was sequenced in depth, analyzing SNVs in bile acid genes (ASBT, FXR, OSTα/β, FGF19, FGFR4, KLB, SHP, CYP7A1, LRH-1, and FABP6). Exome findings were compared with those of 50 similar ethnicity controls. We assessed univariate associations of each SNV with quantitative traits and a principal components analysis and associations between SNVs in KLB and FGFR4 and symptom phenotype in 405 IBS, 228 controls and colonic transit in 70 IBS-D, 71 IBS-constipation. Mining the complete exome did not reveal significant associations with IBS-D over controls. There were 54 SNVs in 10 of 11 bile acid-regulating genes, with no SNVs in FGF19; 15 nonsynonymous SNVs were identified in similar proportions of IBS-D and controls. Variations in KLB (rs1015450, downstream) and FGFR4 [rs434434 (intronic), rs1966265, and rs351855 (nonsynonymous)] were associated with colonic transit (rs1966265; P = 0.043), fecal bile acids (rs1015450; P = 0.064), and principal components analysis groups (all 3 FGFR4 SNVs; P < 0.05). In the 633-person cohort, FGFR4 rs434434 was associated with symptom phenotype (P = 0.027) and rs1966265 with 24-h colonic transit (P = 0.066). Thus exome sequencing identified additional variants in KLB and FGFR4 associated with bile acids or colonic transit in IBS-D.

  15. Irritable bowel syndrome-diarrhea: characterization of genotype by exome sequencing, and phenotypes of bile acid synthesis and colonic transit

    PubMed Central

    Klee, Eric W.; Shin, Andrea; Carlson, Paula; Li, Ying; Grover, Madhusudan; Zinsmeister, Alan R.

    2013-01-01

    The study objectives were: to mine the complete exome to identify putative rare single nucleotide variants (SNVs) associated with irritable bowel syndrome (IBS)-diarrhea (IBS-D) phenotype, to assess genes that regulate bile acids in IBS-D, and to explore univariate associations of SNVs with symptom phenotype and quantitative traits in an independent IBS cohort. Using principal components analysis, we identified two groups of IBS-D (n = 16) with increased fecal bile acids: rapid colonic transit or high bile acids synthesis. DNA was sequenced in depth, analyzing SNVs in bile acid genes (ASBT, FXR, OSTα/β, FGF19, FGFR4, KLB, SHP, CYP7A1, LRH-1, and FABP6). Exome findings were compared with those of 50 similar ethnicity controls. We assessed univariate associations of each SNV with quantitative traits and a principal components analysis and associations between SNVs in KLB and FGFR4 and symptom phenotype in 405 IBS, 228 controls and colonic transit in 70 IBS-D, 71 IBS-constipation. Mining the complete exome did not reveal significant associations with IBS-D over controls. There were 54 SNVs in 10 of 11 bile acid-regulating genes, with no SNVs in FGF19; 15 nonsynonymous SNVs were identified in similar proportions of IBS-D and controls. Variations in KLB (rs1015450, downstream) and FGFR4 [rs434434 (intronic), rs1966265, and rs351855 (nonsynonymous)] were associated with colonic transit (rs1966265; P = 0.043), fecal bile acids (rs1015450; P = 0.064), and principal components analysis groups (all 3 FGFR4 SNVs; P < 0.05). In the 633-person cohort, FGFR4 rs434434 was associated with symptom phenotype (P = 0.027) and rs1966265 with 24-h colonic transit (P = 0.066). Thus exome sequencing identified additional variants in KLB and FGFR4 associated with bile acids or colonic transit in IBS-D. PMID:24200957

  16. Deep sequencing and human antibody repertoire analysis

    PubMed Central

    Boyd, Scott D; Crowe, James E

    2016-01-01

    In the past decade, high-throughput DNA sequencing (HTS) methods and improved approaches for isolating antigen-specific B cells and their antibody genes have been applied in many areas of human immunology. This work has greatly increased our understanding of human antibody repertoires and the specific clones responsible for protective immunity or immune-mediated pathogenesis. Although the principles underlying selection of individual B cell clones in the intact immune system are still under investigation, the combination of more powerful genetic tracking of antibody lineage development and functional testing of the encoded proteins promises to transform therapeutic antibody discovery and optimization. Here, we highlight recent advances in this fast-moving field. PMID:27065089

  17. SAW: a graphical user interface for the analysis of immunoglobulin variable domain sequences.

    PubMed

    Elgavish, R A; Schroeder, H W

    1993-12-01

    The Sequence Analysis Workshop (SAW) is an interactive program for sequence analysis of immunoglobulin variable domains. Sequences for SAW can be obtained from GenBank or from a standard text file. SAW can compare a variable domain to as many as 100 different sequences, calculate the extent of homology, sort the sequences by their degree of similarity, translate the nucleotide codons into amino acids and then display the results in either a graphical or text format. These comparisons allow the investigator to determine the likely germ-line progenitors of a variable domain and to visualize how it differs from other antibody genes by functional region. SAW supports replacement and silent site substitution analysis by either codon or region, thus providing rapid insight into the forces that have shaped mutations. The sequence comparisons can be printed out as an aid for paper analysis or for preparation of figures for publication. SAW is written in Microsoft C for use with the Microsoft Windows graphics environment. The use of color and graphics, the generation of subsidiary windows that contain the results of specific analyses and the mouse-driven control of the program make SAW an easy-to-use tool for immunoglobulin sequence comparison.
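
    The comparison steps described above (translation, extent of homology, sorting by similarity, and replacement versus silent substitution counts by codon) can be sketched roughly as follows. This is an illustrative Python sketch, not SAW's own code (SAW is written in Microsoft C). It assumes the sequences are already aligned, gap-free, in frame, and written in upper-case A/C/G/T; the function names and the use of percent identity as the similarity measure are assumptions made for the example.

```python
from itertools import product

# Standard genetic code, built from the conventional T/C/A/G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions over the shared (pre-aligned) length."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / n


def substitutions_by_codon(germline: str, mutated: str) -> tuple[int, int]:
    """Count codons that differ silently versus with an amino acid replacement."""
    silent = replacement = 0
    usable = min(len(germline), len(mutated)) // 3 * 3
    for i in range(0, usable, 3):
        g, m = germline[i:i + 3], mutated[i:i + 3]
        if g == m:
            continue
        if CODON_TABLE[g] == CODON_TABLE[m]:
            silent += 1
        else:
            replacement += 1
    return silent, replacement


def rank_by_similarity(query: str, candidates: dict[str, str]) -> list[str]:
    """Return candidate names sorted from most to least similar to the query."""
    return sorted(candidates,
                  key=lambda name: percent_identity(query, candidates[name]),
                  reverse=True)
```

    For example, substitutions_by_codon("GCTAAAGGT", "GCCGAAGGT") counts one silent change (GCT to GCC, both alanine) and one replacement (AAA lysine to GAA glutamate). Real variable-domain comparisons would also need gap handling and region boundaries, which this sketch leaves out.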

  18. Rational design of translational pausing without altering the amino acid sequence dramatically promotes soluble protein expression: a strategic demonstration.

    PubMed

    Chen, Wei; Jin, Jingjie; Gu, Wei; Wei, Bo; Lei, Yun; Xiong, Sheng; Zhang, Gong

    2014-11-10

    The production of many pharmaceutical and industrial proteins in prokaryotic hosts is hindered by the insolubility of industrial expression products resulting from misfolding. Even with a correct primary sequence, an improper translation elongation rate in a heterologous expression system is an important cause of misfolding. In silico analysis revealed that most of the endogenous Escherichia coli genes display translational pausing sites that promote correct folding, and almost one-fifth of the genes have pausing sites at the 3'-termini of their coding sequence. Therefore, we established a novel strategy to efficiently promote the expression of soluble and active proteins without altering the amino acid sequence or expression conditions. This strategy uses the rational design of translational pausing based on structural information solely through synonymous substitutions, i.e. with no change in the amino acid sequence. We demonstrated this strategy on a promising antiviral candidate, Cyanovirin-N (CVN), which could not be efficiently expressed in any previously reported system. By introducing silent mutations, we increased the soluble expression level in E. coli by 2000-fold without altering the CVN protein sequence, and the specific activity was slightly higher for the optimized CVN than for the wild-type variant. This strategy introduces new possibilities for the production of bioactive recombinant proteins. Copyright © 2014 Elsevier B.V. All rights reserved.

  19. Computational simulations of protein folding to engineer amino acid sequences to encourage desired supersecondary structure formation.

    PubMed

    Gerstman, Bernard S; Chapagain, Prem P

    2013-01-01

    The dynamics of protein folding are complicated because of the various types of amino acid interactions that create secondary, supersecondary, and tertiary interactions. Computational modeling can be used to simulate the biophysical and biochemical interactions that determine protein folding. Effective folding to a desired protein configuration requires a compromise between speed, stability, and specificity. If the primary sequence of amino acids emphasizes one of these characteristics, the others might suffer and the folding process may not be optimized. We provide an example of a model peptide whose primary sequence produces a highly stable supersecondary two-helix bundle structure, but at the expense of lower speed and specificity of the folding process. We show how computational simulations can be used to discover the configuration of the kinetic trap that causes the degradation in the speed and specificity of folding. We also show how amino acid sequences can be engineered by specific substitutions to optimize the folding to the desired supersecondary structure.

  20. Thin-film technology for direct visual detection of nucleic acid sequences: applications in clinical research.

    PubMed

    Jenison, Robert D; Bucala, Richard; Maul, Diana; Ward, David C

    2006-01-01

    Certain optical conditions permit the unaided eye to detect thickness changes on surfaces on the order of 20 Å, which are of similar dimensions to monomolecular interactions between proteins or hybridization of complementary nucleic acid sequences. Such detection exploits specific interference of reflected white light, wherein thickness changes are perceived as surface color changes. This technology, termed thin-film detection, allows for the visualization of subattomole amounts of nucleic acid targets, even in complex clinical samples. Thin-film technology has been applied to a broad range of clinically relevant indications, including the detection of pathogenic bacterial and viral nucleic acid sequences and the discrimination of sequence variations in human genes causally related to susceptibility or severity of disease.

  1. Amino acid sequences of two trypsin inhibitors from winged bean seeds (Psophocarpus tetragonolobus (L)DC.).

    PubMed

    Yamamoto, M; Hara, S; Ikenaka, T

    1983-09-01

    The trypsin inhibitor (WTI-1) purified from winged bean seeds is a Kunitz type protease inhibitor having a molecular weight of 19,200. WTI-1 inhibits bovine trypsin stoichiometrically, but not bovine alpha-chymotrypsin. The approximate Ki value for the trypsin-inhibitor complex is 2.5 × 10^-9 M. The complete amino acid sequence of WTI-1 was determined by conventional methods. Comparison of the sequence with that of soybean trypsin inhibitor (STI) indicated that the sequence of WTI-1 had 50% homology with that of STI. WTI-1 was separated into 2 homologous inhibitors, WTI-1A and WTI-1B, by isoelectric focusing. The isoelectric points of WTI-1A and WTI-1B were 8.5 and 9.4, respectively, and their sequences were presumed from their amino acid compositions.

  2. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
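
    As a rough illustration of the composition-based randomness measure and the Monte Carlo comparison with random sequences of a given length, the sketch below computes a single-residue Shannon entropy and a redundancy defined as R = 1 - H / log2(20). That definition, the function names, and the uniform random model are assumptions made for this example; the record above does not give the paper's exact parameters, and the pair-frequency measure is not covered here.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def shannon_entropy(sequence: str) -> float:
    """Entropy in bits per residue of the single-residue composition."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    total = len(sequence)
    counts = Counter(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy(sequence: str, alphabet_size: int = 20) -> float:
    """Assumed definition: R = 1 - H / log2(alphabet_size).

    R is 0 for a perfectly even composition and 1 for a single repeated residue.
    """
    return 1.0 - shannon_entropy(sequence) / math.log2(alphabet_size)


def random_baseline(length: int, trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean redundancy of uniform random chains."""
    rng = random.Random(seed)
    values = [
        redundancy("".join(rng.choice(AMINO_ACIDS) for _ in range(length)))
        for _ in range(trials)
    ]
    return sum(values) / len(values)
```

    Comparing redundancy(protein) with random_baseline(len(protein)) separates real compositional bias from the apparent redundancy that any short chain shows simply because it cannot sample all 20 residues evenly, which is one way to read the abstract's remark about constraints due to short chain length.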

  3. Laser Desorption Mass Spectrometry for DNA Sequencing and Analysis

    NASA Astrophysics Data System (ADS)

    Chen, C. H. Winston; Taranenko, N. I.; Golovlev, V. V.; Isola, N. R.; Allman, S. L.

    1998-03-01

    Rapid DNA sequencing and/or analysis is critically important for biomedical research. In the past, gel electrophoresis has been the primary tool to achieve DNA analysis and sequencing. However, gel electrophoresis is a time-consuming and labor-intensive process. Recently, we have developed and used laser desorption mass spectrometry (LDMS) to achieve sequencing of ss-DNA longer than 100 nucleotides. With LDMS, we succeeded in sequencing DNA in seconds instead of the hours or days required by gel electrophoresis. In addition to sequencing, we also applied LDMS to the detection of DNA probes for hybridization. LDMS was also used to detect short tandem repeats for forensic applications. Clinical applications for disease diagnosis, such as cystic fibrosis caused by base deletion and point mutation, have also been demonstrated. Experimental details will be presented at the meeting.

  4. Sequence analysis with the Kestrel SIMD parallel processor.

    PubMed

    Grate, L; Diekhans, M; Dahle, D; Hughey, R

    2001-01-01

    Computer aided sequence analysis is a critical aspect of current biological research. Sequence information from the genome sequencing projects fills databases so quickly that humans cannot examine it all. Hence there is a heavy reliance on computer algorithms to point out the few important nuggets for human examination. Sequence search algorithms range from simple to complex, as does the representation of the biological data. Typically though, simple algorithms are used on the simplest of data representations because of the large computational demands of anything more complex. This leads to missed hits because the simple search techniques are often not sufficiently sensitive. Here we describe the implementation of several sensitive sequence analysis algorithms on the Kestrel parallel processor, a single-instruction multiple-data (SIMD) processor developed and built at UCSC. The Smith-Waterman and Hidden Markov Model algorithms, the latter with both Viterbi and Expectation Maximization methods, run 6 to 20 times faster than on standard computers.
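
    For readers unfamiliar with the Smith-Waterman algorithm named above, the following is a minimal serial sketch in Python with a linear gap penalty. It is not the Kestrel implementation; the scoring values (match=2, mismatch=-1, gap=-2) and the function name are assumptions chosen for illustration.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Return (best local score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; scores are clamped at zero so that
    # an alignment may start anywhere (this is what makes it local).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

    Each cell depends only on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal can be computed independently. That property is what makes the recurrence a natural fit for data-parallel hardware in general; how Kestrel itself maps the computation is not stated in the abstract.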

  5. [Automatic analysis pipeline of next-generation sequencing data].

    PubMed

    Wenke, Li; Fengyu, Li; Siyao, Zhang; Bin, Cai; Na, Zheng; Yu, Nie; Dao, Zhou; Qian, Zhao

    2014-06-01

    The development of next-generation sequencing has generated high demand for data processing and analysis. Although there are many software tools for analyzing next-generation sequencing data, most of them are designed for one specific function (e.g., alignment, variant calling or annotation). Therefore, it is necessary to combine them for data analysis and to generate interpretable results for biologists. This study designed a pipeline to process Illumina sequencing data based on the Perl programming language and the SGE system. The pipeline takes original sequence data (fastq format) as input, calls the standard data processing software (e.g., BWA, Samtools, GATK, and Annovar), and finally outputs a list of annotated variants that researchers can further analyze. The pipeline simplifies manual operation and improves efficiency through automation and parallel computation. Users can easily run the pipeline by editing the configuration file or using the graphical interface. Our work will facilitate research projects that use sequencing technology.

  6. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  7. Initial sequencing and comparative analysis of the mouse genome.

    PubMed

    Waterston, Robert H; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R; Brown, Daniel G; Brown, Stephen D; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T; Church, Deanna M; Clamp, Michele; Clee, Christopher; Collins, Francis S; Cook, Lisa L; Copley, Richard R; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D; Deri, Justin; Dermitzakis, Emmanouil T; Dewey, Colin; Dickens, Nicholas J; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M; Eddy, Sean R; Elnitski, Laura; Emes, Richard D; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A; Flicek, Paul; Foley, Karen; Frankel, Wayne N; Fulton, Lucinda A; Fulton, Robert S; Furey, Terrence S; Gage, Diane; Gibbs, Richard A; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A; Green, Eric D; Gregory, Simon; Guigó, Roderic; Guyer, Mark; Hardison, Ross C; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B; Johnson, L Steven; Jones, Matthew; Jones, Thomas A; Joy, Ann; Kamal, Michael; Karlsson, Elinor K; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W James; Kirby, Andrew; Kolbe, Diana L; Korf, Ian; Kucherlapati, Raju S; Kulbokas, Edward J; Kulp, David; Landers, Tom; Leger, J P; Leonard, Steven; Letunic, Ivica; Levine, Rosie; Li, Jia; Li, Ming; Lloyd, Christine; Lucas, Susan; Ma, Bin; Maglott, 
Donna R; Mardis, Elaine R; Matthews, Lucy; Mauceli, Evan; Mayer, John H; McCarthy, Megan; McCombie, W Richard; McLaren, Stuart; McLay, Kirsten; McPherson, John D; Meldrim, Jim; Meredith, Beverley; Mesirov, Jill P; Miller, Webb; Miner, Tracie L; Mongin, Emmanuel; Montgomery, Kate T; Morgan, Michael; Mott, Richard; Mullikin, James C; Muzny, Donna M; Nash, William E; Nelson, Joanne O; Nhan, Michael N; Nicol, Robert; Ning, Zemin; Nusbaum, Chad; O'Connor, Michael J; Okazaki, Yasushi; Oliver, Karen; Overton-Larty, Emma; Pachter, Lior; Parra, Genís; Pepin, Kymberlie H; Peterson, Jane; Pevzner, Pavel; Plumb, Robert; Pohl, Craig S; Poliakov, Alex; Ponce, Tracy C; Ponting, Chris P; Potter, Simon; Quail, Michael; Reymond, Alexandre; Roe, Bruce A; Roskin, Krishna M; Rubin, Edward M; Rust, Alistair G; Santos, Ralph; Sapojnikov, Victor; Schultz, Brian; Schultz, Jörg; Schwartz, Matthias S; Schwartz, Scott; Scott, Carol; Seaman, Steven; Searle, Steve; Sharpe, Ted; Sheridan, Andrew; Shownkeen, Ratna; Sims, Sarah; Singer, Jonathan B; Slater, Guy; Smit, Arian; Smith, Douglas R; Spencer, Brian; Stabenau, Arne; Stange-Thomann, Nicole; Sugnet, Charles; Suyama, Mikita; Tesler, Glenn; Thompson, Johanna; Torrents, David; Trevaskis, Evanne; Tromp, John; Ucla, Catherine; Ureta-Vidal, Abel; Vinson, Jade P; Von Niederhausern, Andrew C; Wade, Claire M; Wall, Melanie; Weber, Ryan J; Weiss, Robert B; Wendl, Michael C; West, Anthony P; Wetterstrand, Kris; Wheeler, Raymond; Whelan, Simon; Wierzbowski, Jamey; Willey, David; Williams, Sophie; Wilson, Richard K; Winter, Eitan; Worley, Kim C; Wyman, Dudley; Yang, Shan; Yang, Shiaw-Pyng; Zdobnov, Evgeny M; Zody, Michael C; Lander, Eric S

    2002-12-05

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  8. Snake venom toxins. The amino acid sequence of toxin Vi2, a homologue of pancreatic trypsin inhibitor, from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Strydom, D J

    1977-04-25

    The amino acid sequence of venom component Vi2, a protein of low toxicity from Dendroaspis polylepis polylepis venom was determined by automatic sequence analysis in combination with sequence studies on tryptic peptides. This protein, the most retarded fraction of this venom on a cation-exchange resin, is a homologue of bovine pancreatic trypsin inhibitor consisting of a single chain of 57 amino acid residues containing six half-cystine residues. The active site lysyl residue of bovine trypsin inhibitor is conserved in Vi2 although large differences are found in the rest of the molecule.

  9. Physiology of acetic acid bacteria in light of the genome sequence of Gluconobacter oxydans.

    PubMed

    Deppenmeier, Uwe; Ehrenreich, Armin

    2009-01-01

    Acetic acid bacteria are a distinct group of microorganisms within the family Acetobacteraceae. They are characterized by their ability to incompletely oxidize a wide range of carbohydrates and alcohols. The great advantage of these reactions is that many substrates are regio- and stereoselectively oxidized. This feature is already exploited in several combined biotechnological-chemical procedures for the synthesis of sugar derivatives. Therefore, it is important to understand the basic concepts of this type of physiology to construct strains for improved or new oxidative fermentations. Based on the genome sequence of Gluconobacter oxydans, we will shed light on the central carbon metabolism, the composition of the respiratory chain and the analysis of uncharacterized oxidoreductases. In this context, the role of membrane-bound and -soluble dehydrogenases is of major importance in the process of incomplete oxidation. Other topics deal with the question of how these organisms generate energy and assimilate carbon. Furthermore, we will discuss how acetic acid bacteria thrive in their nutrient-rich environment and how they outcompete other microorganisms. Copyright (c) 2008 S. Karger AG, Basel.

  10. Sequencing and Analysis of Neanderthal Genomic DNA

    SciTech Connect

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  11. A convenient and adaptable microcomputer environment for DNA and protein sequence manipulation and analysis.

    PubMed Central

    Pustell, J; Kafatos, F C

    1986-01-01

    We describe the further development of a widely used package of DNA and protein sequence analysis programs for microcomputers (1,2,3). The package now provides a screen oriented user interface, and an enhanced working environment with powerful formatting, disk access, and memory management tools. The new GenBank floppy disk database is supported transparently to the user and a similar version of the NBRF protein database is provided. The programs can use sequence file annotation to automatically annotate printouts and translate or extract specified regions from sequences by name. The sequence comparison programs can now perform a 5000 X 5000 bp analysis in 12 minutes on an IBM PC. A program to locate potential protein coding regions in nucleic acids, a digitizer interface, and other additions are also described. PMID:3753784

  12. Cloning and sequence analysis of candidate human natural killer-enhancing factor genes

    SciTech Connect

    Shau, H.; Butterfield, L.H.; Chiu, R.; Kim, A.

    1994-12-31

    A cytosol factor from human red blood cells enhances natural killer (NK) activity. This factor, termed NK-enhancing factor (NKEF), is a protein of 44000 M_r consisting of two subunits of equal size linked by disulfide bonds. NKEF is expressed in the NK-sensitive erythroleukemic cell line K562. Using an antibody specific for NKEF as a probe for immunoblot screening, we isolated several clones from a λgt11 cDNA library of K562. Additional subcloning and sequencing revealed that the candidate NKEF cDNAs fell into one of two categories of closely related but non-identical genes, referred to as NKEF A and B. They are 88% identical in amino acid sequence and 71% identical in nucleotide sequence. Southern blot analysis suggests that there are two to three NKEF family members in the genome. Analysis of predicted amino acid sequences indicates that both NKEF A and B are cytosol proteins with several phosphorylation sites each, but that they have no glycosylation sites. They are significantly homologous to several other proteins from a wide variety of organisms ranging from prokaryotes to mammals, especially with regard to several well-conserved motifs within the amino acid sequences. The biological functions of these proteins in other species are mostly unknown, but some of them were reported to be induced by oxidative stress. Therefore, as well as for immunoregulation of NK activity, NKEF may be important for cells in coping with oxidative insults. 32 refs., 3 figs.

  13. Cloning and sequence analysis of Hemonchus contortus HC58cDNA.

    PubMed

    Muleke, Charles I; Ruofeng, Yan; Lixin, Xu; Xinwen, Bo; Xiangrui, Li

    2007-06-01

    The complete coding sequence of Hemonchus contortus HC58cDNA was generated by rapid amplification of cDNA ends and polymerase chain reaction using primers based on the 5' and 3' ends of the parasite mRNA, accession no. AF305964. The HC58cDNA gene was 851 bp long, with open reading frame of 717 bp, precursors to 239 amino acids coding for approximately 27 kDa protein. Analysis of amino acid sequence revealed conserved residues of cysteine, histidine, asparagine, occluding loop pattern, hemoglobinase motif and glutamine of the oxyanion hole characteristic of cathepsin B like proteases (CBL). Comparison of the predicted amino acid sequences showed the protein shared 33.5-58.7% identity to cathepsin B homologues in the papain clan CA family (family C1). Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the CBL, suggesting that HC58cDNA was a member of the papain family.

  14. Parameters of proteome evolution from histograms of amino-acid sequence identities of paralogous proteins

    PubMed Central

    Axelsen, Jacob Bock; Yan, Koon-Kiu; Maslov, Sergei

    2007-01-01

    Background The evolution of the full repertoire of proteins encoded in a given genome is mostly driven by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about relative rates and other intrinsic parameters of these three basic processes is contained in the proteome-wide distribution of sequence identities of pairs of paralogous proteins. Results We introduce a simple mathematical framework based on a stochastic birth-and-death model that allows one to extract some of this information and apply it to the set of all pairs of paralogous proteins in H. pylori, E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens. It was found that the histogram of sequence identities p generated by an all-to-all alignment of all protein sequences encoded in a genome is well fitted with a power-law form ~ p^(-γ) with the value of the exponent γ around 4 for the majority of organisms used in this study. This implies that the intra-protein variability of substitution rates is best described by the Gamma-distribution with the exponent α ≈ 0.33. Different features of the shape of such histograms allow us to quantify the ratio between the genome-wide average deletion/duplication rates and the amino-acid substitution rate. Conclusion We separately measure the short-term ("raw") duplication and deletion rates r*_dup, r*_del which include gene copies that will be removed soon after the duplication event and their dramatically reduced long-term counterparts r_dup, r_del. High deletion rate among recently duplicated proteins is consistent with a scenario in which they didn't have enough time to significantly change their functional roles and thus are to a large degree disposable. Systematic trends of each of the four duplication/deletion rates with the total number of genes in the genome were analyzed. All but the deletion rate of recent duplicates r*_del were shown to systematically increase with N_genes. Abnormally flat shapes
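
The power-law fit mentioned in this record can be illustrated with a minimal sketch. It assumes a histogram of identity bins and counts is already available and estimates the exponent by ordinary least squares in log-log space; the function name, the bin values, and the synthetic counts are illustrative assumptions, not the authors' data or fitting procedure.

```python
import math

def fit_power_law_exponent(identities, counts):
    """Estimate gamma for counts ~ p**(-gamma) by a least-squares line in log-log space."""
    points = [(math.log(p), math.log(c))
              for p, c in zip(identities, counts) if p > 0 and c > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two positive (identity, count) pairs")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # The slope of log(count) against log(p) is -gamma.
    return -sxy / sxx

# Synthetic histogram (hypothetical values) that follows p**(-4) exactly.
bins = [0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
synthetic_counts = [1000.0 * p ** -4 for p in bins]
gamma = fit_power_law_exponent(bins, synthetic_counts)
print(round(gamma, 3))
```

On real histograms a log-log regression is sensitive to binning and to sparsely populated bins, so it is best read as a rough first estimate.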

  15. Conversion of amino-acid sequence in proteins to classical music: search for auditory patterns

    PubMed Central

    2007-01-01

    We have converted genome-encoded protein sequences into musical notes to reveal auditory patterns without compromising musicality. We derived a reduced range of 13 base notes by pairing similar amino acids and distinguishing them using variations of three-note chords and codon distribution to dictate rhythm. The conversion will help make genomic coding sequences more approachable for the general public, young children, and vision-impaired scientists. PMID:17477882

  16. Tools for integrated sequence-structure analysis with UCSF Chimera

    PubMed Central

    Meng, Elaine C; Pettersen, Eric F; Couch, Gregory S; Huang, Conrad C; Ferrin, Thomas E

    2006-01-01

    Background Comparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a) provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b) facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit); (c) can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d) interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals. Results The molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided. Conclusion The tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines. 
UCSF Chimera is free for non-commercial use and is available for Microsoft

  17. Nucleotide and deduced amino acid sequences of Torpedo californica acetylcholine receptor gamma subunit.

    PubMed Central

    Claudio, T; Ballivet, M; Patrick, J; Heinemann, S

    1983-01-01

    The nucleotide sequence has been determined of a cDNA clone that codes for the 60,000-dalton gamma subunit of Torpedo californica acetylcholine receptor. The length of the cDNA clone is 2,010 base pairs. The 5' and 3' untranslated regions have respective lengths of 31 and 461 base pairs. Data suggest that the putative polyadenylylation consensus sequence A-A-T-A-A-A may not be required for polyadenylylation of the mRNA corresponding to the cDNA clone described in this study. From the DNA sequence data, the amino acid sequence of the gamma subunit was deduced. The subunit is composed of 489 amino acids giving a molecular mass of 56,600 daltons. The deduced amino acid sequence data also indicate the presence of a 17-amino acid extension or signal peptide on this subunit. From these data, structural predictions for the gamma subunit are made such as potential membrane-spanning regions, possible asparagine-linked glycosylation sites, and the assignment of regions of the protein to the extracellular, internal, and cytoplasmic domains of the lipid bilayer. PMID:6573658
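
The lengths quoted in this record can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in the abstract; the variable names are illustrative, and the reading that the stop codon is counted within the 3' untranslated figure is an assumption made so that the numbers reconcile.

```python
# Figures taken from the abstract above.
CDNA_LENGTH_BP = 2010
FIVE_PRIME_UTR_BP = 31
THREE_PRIME_UTR_BP = 461
MATURE_SUBUNIT_AA = 489
SIGNAL_PEPTIDE_AA = 17

# Base pairs left for the coding region once both untranslated regions are removed.
coding_bp = CDNA_LENGTH_BP - FIVE_PRIME_UTR_BP - THREE_PRIME_UTR_BP
codons, leftover_bp = divmod(coding_bp, 3)

# Precursor length = signal peptide + mature subunit.
precursor_aa = SIGNAL_PEPTIDE_AA + MATURE_SUBUNIT_AA

print(coding_bp, codons, leftover_bp, precursor_aa)  # 1518 506 0 506
```

The 1,518 remaining base pairs correspond to exactly 506 codons, matching the 17-residue extension plus the 489-residue subunit. No codon is left over for a stop codon, which is why the sketch assumes it was counted as part of the 3' untranslated region.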

  18. Sequence Comparison and Phylogeny of Nucleotide Sequence of Coat Protein and Nucleic Acid Binding Protein of a Distinct Isolate of Shallot virus X from India.

    PubMed

    Majumder, S; Baranwal, V K

    2011-06-01

    Shallot virus X (ShVX), a type species in the genus Allexivirus of the family Alfaflexiviridae, has been associated with shallot plants in India and other shallot-growing countries such as Russia, Germany, the Netherlands, and New Zealand. The coat protein (CP) and nucleic acid binding protein (NB) regions of the virus were obtained by reverse transcriptase polymerase chain reaction from scale leaves of shallot bulbs. The partial cDNA contained two open reading frames encoding proteins of molecular weights of 28.66 and 14.18 kDa belonging to the Flexi_CP super-family and the viral NB super-family, respectively. The percent identity and phylogenetic analysis of amino acid sequences of the CP and NB regions of the virus associated with shallot indicated that it was a distinct isolate of ShVX.

  19. The complete amino acid sequence of chicken skeletal-muscle enolase.

    PubMed Central

    Russell, G A; Dunbar, B; Fothergill-Gilmore, L A

    1986-01-01

    The complete amino acid sequence of chicken skeletal-muscle enolase, comprising 433 residues, was determined. The sequence was deduced by automated sequencing of hydroxylamine-cleavage, CNBr-cleavage, o-iodosobenzoic acid-cleavage, clostripain-digest and staphylococcal-proteinase-digest fragments. The presence of several acid-labile peptide bonds and the tenacious aggregation of most CNBr-cleavage fragments meant that a commonly used sequencing strategy involving initial CNBr cleavage was unproductive. Cleavage at the single Asn-Gly peptide bond with hydroxylamine proved to be particularly useful. Comparison of the sequence of chicken enolase with the two yeast enolase isoenzyme sequences shows that the enzyme is strongly conserved, with 60% of the residues identical. The histidine and arginine residues implicated as being important for the activity of yeast enolase are conserved in the chicken enzyme. Secondary-structure predictions are analysed in an accompanying paper [Sawyer, Fothergill-Gilmore & Russell (1986) Biochem. J. 236, 127-130]. PMID:3539098

  20. The amino acid motif L/IIxxFE defines a novel actin-binding sequence in PDZ-RhoGEF.

    PubMed

    Banerjee, Jayashree; Fischer, Christopher C; Wedegaertner, Philip B

    2009-08-25

    PDZ-RhoGEF is a member of the regulator family of G protein signaling (RGS) domain-containing RhoGEFs (RGS-RhoGEFs) that link activated heterotrimeric G protein alpha subunits of the G12 family to activation of the small GTPase RhoA. Unique among the RGS-RhoGEFs, PDZ-RhoGEF contains a short sequence that localizes the protein to the actin cytoskeleton. In this report, we demonstrate that the actin-binding domain, located between amino acids 561 and 585, directly binds to F-actin in vitro. Extensive mutagenesis identifies isoleucine 568, isoleucine 569, phenylalanine 572, and glutamic acid 573 as being necessary for binding to actin and for colocalization with the actin cytoskeleton in cells. These results define a novel actin-binding sequence in PDZ-RhoGEF with a critical amino acid motif of IIxxFE. Moreover, sequence analysis identifies a similar actin-binding motif in the N-terminus of the RhoGEF frabin, and as with PDZ-RhoGEF, mutagenesis and actin interaction experiments demonstrate an LIxxFE motif, consisting of the key amino acids leucine 23, isoleucine 24, phenylalanine 27, and glutamic acid 28. Taken together, results with PDZ-RhoGEF and frabin identify a novel actin-binding sequence. Lastly, inducible dimerization of the actin-binding region of PDZ-RhoGEF revealed a dimerization-dependent actin bundling activity in vitro. PDZ-RhoGEF exists in cells as a dimer, raising the possibility that PDZ-RhoGEF could influence actin structure in a manner independent of its ability to activate RhoA.
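
A motif such as L/IIxxFE translates directly into a pattern search. The sketch below scans a protein string for it with a regular expression; the function name and the toy sequence are hypothetical and are not taken from PDZ-RhoGEF or frabin.

```python
import re

# L or I, then I, then any two residues, then F and E.
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=([LI]I..FE))")

def find_actin_binding_motif(protein):
    """Return (1-based start position, matched residues) for every motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(protein.upper())]

# Toy sequence invented for illustration only.
toy_protein = "MSAGLIKDFEQQTAIIPRFEK"
hits = find_actin_binding_motif(toy_protein)
print(hits)  # [(5, 'LIKDFE'), (15, 'IIPRFE')]
```

A pattern hit only flags a candidate site; the abstract establishes binding by mutagenesis and actin interaction experiments, which a sequence scan cannot replace.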

  1. Computer analysis of HIV epitope sequences

    SciTech Connect

    Gupta, G.; Myers, G.

    1990-01-01

    Phylogenetic tree analyses provide us with important general information regarding the extent and rate of HIV variation. Currently we are attempting to extend computer analysis and modeling to the V3 loop of the type 2 virus and its simian homologues, especially in light of the prominent role the latter will play in animal model studies. Moreover, it might be possible to attack the slightly similar V4 loop by this approach. However, the strategy relies very heavily upon "natural" information and constraints, thus there exist severe limitations upon the general applicability, in addition to uncertainties with regard to long-range residue interactions. 5 refs., 3 figs.

  2. Image sequence analysis and face feature extraction

    NASA Astrophysics Data System (ADS)

    Ravaut, Frederic; Stamon, Georges

    1997-04-01

    Based on the hypothesis of a one-to-one relationship between the external symptoms of epileptic fits and the abnormal cerebral functioning which causes them, the computerized study of epileptic fit video tapes brings new information on abnormal neuron activity. This insight will improve specialists' analysis in their diagnoses.

  3. Genome sequencing and analysis conference grant

    SciTech Connect

    Venter, J.C.

    1995-10-01

    The 14 plenary session presentations focused on nematode; yeast; fruit fly; plants; mycobacteria; and man. In addition there were presentations on a variety of technical innovations including database developments and refinements, bioelectronic genesensors, computer-assisted multiplex techniques, and hybridization analysis with DNA chip technology. This document includes a list of exhibitors and abstracts of sessions.

  4. Phylogenetic distribution of phenotypic traits in Bacillus thuringiensis determined by multilocus sequence analysis.

    PubMed

    Blackburn, Michael B; Martin, Phyllis A W; Kuhar, Daniel; Farrar, Robert R; Gundersen-Rindal, Dawn E

    2013-01-01

    Diverse isolates from a world-wide collection of Bacillus thuringiensis were classified based on phenotypic profiles resulting from six biochemical tests: production of amylase (T), lecithinase (L), urease (U), acid from sucrose (S) and salicin (A), and the hydrolysis of esculin (E). Eighty-two isolates representing the 15 most common phenotypic profiles were subjected to phylogenetic analysis by multilocus sequence typing; these were found to be distributed among 19 sequence types, 8 of which were novel. Approximately 70% of the isolates belonged to sequence types corresponding to the classical B. thuringiensis varieties kurstaki (20 isolates), finitimus (15 isolates), morrisoni (11 isolates) and israelensis (11 isolates). Generally, there was little apparent correlation between phenotypic traits and phylogenetic position, and phenotypic variation was often substantial within a sequence type. Isolates of the sequence type corresponding to kurstaki displayed the greatest apparent phenotypic variation with 6 of the 15 phenotypic profiles represented. Despite the phenotypic variation often observed within a given sequence type, certain phenotypes appeared highly correlated with particular sequence types. Isolates with the phenotypic profiles TLUAE and LSAE were found to be exclusively associated with sequence types associated with varieties kurstaki and finitimus, respectively, and 7 of 8 TS isolates were found to be associated with the morrisoni sequence type. Our results suggest that the B. thuringiensis varieties israelensis and kurstaki represent the most abundant varieties of Bt in soil.

  5. The amino acid sequence around the active-site cysteine and histidine residues of stem bromelain

    PubMed Central

    Husain, S. S.; Lowe, G.

    1970-01-01

    Stem bromelain that had been irreversibly inhibited with 1,3-dibromo[2-14C]-acetone was reduced with sodium borohydride and carboxymethylated with iodoacetic acid. After digestion with trypsin and α-chymotrypsin three radioactive peptides were isolated chromatographically. The amino acid sequences around the cross-linked cysteine and histidine residues were determined and showed a high degree of homology with those around the active-site cysteine and histidine residues of papain and ficin. PMID:5420046

  6. Amino acid sequences of two nonspecific lipid-transfer proteins from germinated castor bean.

    PubMed

    Takishima, K; Watanabe, S; Yamada, M; Suga, T; Mamiya, G

    1988-11-01

    The amino acid sequences of two nonspecific lipid-transfer proteins (nsLTP), B and C, from germinated castor bean seeds have been determined. Both proteins consist of 92 residues, as for the nsLTP previously reported, and their calculated Mr values are 9847 and 9593 for nsLTP-B and nsLTP-C, respectively. The sequences of nsLTP-B and nsLTP-C, compared to the known sequence of nsLTP-A from the same source, are 68% and 35% similar, respectively. No variation was found at the positions of the cysteine residues, indicating that they might be involved in disulfide bridges.

  7. Sequencing and characterization of oligosaccharides using infrared multiphoton dissociation and boronic acid derivatization in a quadrupole ion trap.

    PubMed

    Pikulski, Michael; Hargrove, Amanda; Shabbir, Shagufta H; Anslyn, Eric V; Brodbelt, Jennifer S

    2007-12-01

    A simplified method for determining the sequence and branching of oligosaccharides using infrared multiphoton dissociation (IRMPD) in a quadrupole ion trap (QIT) is described. An IR-active boronic acid (IRABA) reagent is used to derivatize the oligosaccharides before IRMPD analysis. The IRABA ligand is designed both to enhance the efficiency of the derivatization reaction and to facilitate the photon absorption process. The resulting IRMPD spectra display oligosaccharide fragments that are formed from primarily one type of diagnostic cleavage, thus making sequencing straightforward. The presence of sequential fragment ions, a phenomenon of IRMPD, permits the comprehensive sequencing of the oligosaccharides studied in a single stage of activation. We demonstrate this approach for two series of oligosaccharides, the lacto-N-fucopentaoses (LNFPs) and the lacto-N-difucohexaoses (LNDFHs).

  8. The amino acid sequence and reactive site of a single-headed trypsin inhibitor from wheat endosperm.

    PubMed

    Poerio, E; Caporale, C; Carrano, L; Caruso, C; Vacca, F; Buonocore, V

    1994-02-01

    The sequence of a trypsin inhibitor, isolated from wheat endosperm, is reported. The primary structure was obtained by automatic sequence analysis of the S-alkylated protein and of purified peptides derived from chemical cleavage by cyanogen bromide and digestion with Staphylococcus aureus V8 protease. This protein, named wheat trypsin inhibitor (WTI), which is comprised of a total of 71 amino acid residues, has 12 cysteines, all involved in disulfide bridges. The primary site of interaction (reactive site) with bovine trypsin has been identified as the dipeptide arginyl-methionyl at positions 19 and 20. WTI has a high degree of sequence identity with a number of serine proteinase inhibitors isolated from both cereal and leguminous plants. On the basis of the findings presented, this protein has been classified as a single-headed trypsin inhibitor of Bowman-Birk type.

  9. A simple ligation-based method to increase the information density in sequencing reactions used to deconvolute nucleic acid selections

    PubMed Central

    Childs-Disney, Jessica L.; Disney, Matthew D.

    2008-01-01

    Herein, a method is described to increase the information density of sequencing experiments used to deconvolute nucleic acid selections. The method is facile and should be applicable to any selection experiment. A critical feature of this method is the use of biotinylated primers to amplify and encode a BamHI restriction site on both ends of a PCR product. After amplification, the PCR reaction is captured onto streptavidin resin, washed, and digested directly on the resin. Resin-based digestion affords clean product that is devoid of partially digested products and unincorporated PCR primers. The product's complementary ends are annealed and ligated together with T4 DNA ligase. Analysis of ligation products shows formation of concatemers of different length and little detectable monomer. Sequencing results produced data that routinely contained three to four copies of the library. This method allows for more efficient formulation of structure-activity relationships since multiple active sequences are identified from a single clone. PMID:18065718

  10. DNA Sequence and Expression Variation of Hop (Humulus lupulus) Valerophenone Synthase (VPS), a Key Gene in Bitter Acid Biosynthesis

    PubMed Central

    Castro, Consuelo B.; Whittock, Lucy D.; Whittock, Simon P.; Leggett, Grey; Koutoulis, Anthony

    2008-01-01

    Background The hop plant (Humulus lupulus) is a source of many secondary metabolites, with bitter acids essential in the beer brewing industry and others having potential applications for human health. This study investigated variation in DNA sequence and gene expression of valerophenone synthase (VPS), a key gene in the bitter acid biosynthesis pathway of hop. Methods Sequence variation was studied in 12 varieties, and expression was analysed in four of the 12 varieties in a series across the development of the hop cone. Results Nine single nucleotide polymorphisms (SNPs) were detected in VPS, seven of which were synonymous. The two non-synonymous polymorphisms did not appear to be related to typical bitter acid profiles of the varieties studied. However, real-time quantitative reverse-transcription polymerase chain reaction (qRT-PCR) analysis of VPS expression during hop cone development showed a clear link with the bitter acid content. The highest levels of VPS expression were observed in two triploid varieties, ‘Symphony’ and ‘Ember’, which typically have high bitter acid levels. Conclusions In all hop varieties studied, VPS expression was lowest in the leaves and an increase in expression was consistently observed during the early stages of cone development. PMID:18519445

  11. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
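
Steps (i) and (ii) of the workflow described in this record can be sketched in a few lines. This is not DSAP's code: the adaptor string, the minimum tag length, and the reading of "poly-A/T/C/G/N" as tags made of a single repeated nucleotide are all assumptions chosen for illustration.

```python
from collections import Counter

ADAPTOR = "TCGTATGCCG"  # hypothetical 3' adaptor prefix, not a DSAP setting
MIN_TAG_LENGTH = 15     # hypothetical length cutoff

def clean_tag(tag, adaptor=ADAPTOR, min_len=MIN_TAG_LENGTH):
    """Step (i): trim at the adaptor, then drop short or single-nucleotide tags."""
    tag = tag.strip().upper()
    cut = tag.find(adaptor)
    if cut != -1:
        tag = tag[:cut]
    if len(tag) < min_len or len(set(tag)) == 1:
        return None
    return tag

def cluster_tags(lines):
    """Step (ii): group identical cleaned tags and sum their copy numbers."""
    clusters = Counter()
    for line in lines:
        if not line.strip():
            continue
        tag, copies = line.rstrip("\n").split("\t")
        cleaned = clean_tag(tag)
        if cleaned is not None:
            clusters[cleaned] += int(copies)
    return clusters

# Tab-delimited input: sequence tag, then number of copies (invented records).
example_lines = [
    "TGAGGTAGTAGGTTGTATAGTTTCGTATGCCG\t120",
    "TGAGGTAGTAGGTTGTATAGTT\t30",
    "AAAAAAAAAAAAAAAAAAAA\t7",
    "TCGTATGCCGTCTTCTGCTTG\t5",
]
clusters = cluster_tags(example_lines)
print(dict(clusters))  # {'TGAGGTAGTAGGTTGTATAGTT': 150}
```

The first two records collapse into one cluster after trimming, the homopolymer tag is discarded, and the adaptor-only read leaves nothing to keep. Steps (iii) and (iv) would then match each cluster against Rfam and miRBase.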

  12. D-Tailor: automated analysis and design of DNA sequences

    PubMed Central

    Guimaraes, Joao C.; Rocha, Miguel; Arkin, Adam P.; Cambray, Guillaume

    2014-01-01

    Motivation: Current advances in DNA synthesis, cloning and sequencing technologies afford high-throughput implementation of artificial sequences into living cells. However, flexible computational tools for multi-objective sequence design are lacking, limiting the potential of these technologies. Results: We developed DNA-Tailor (D-Tailor), a fully extendable software framework, for property-based design of synthetic DNA sequences. D-Tailor permits the seamless integration of multiple sequence analysis tools into a generic Monte Carlo simulation that evolves sequences toward any combination of rationally defined properties. As proof of principle, we show that D-Tailor is capable of designing sequence libraries comprising all possible combinations among three different sequence properties influencing translation efficiency in Escherichia coli. The capacity to design artificial sequences that systematically sample any given parameter space should support the implementation of more rigorous experimental designs. Availability: Source code is available for download at https://sourceforge.net/projects/dtailor/ Contact: aparkin@lbl.gov or cambray.guillaume@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online (D-Tailor Tutorial). PMID:24398007
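
The idea of a Monte Carlo procedure that evolves a sequence toward a defined property can be shown with a toy example. The sketch below is not D-Tailor's API: it targets a single property (GC content), uses a simple rule that accepts a point mutation only if it does not move the sequence further from the target, and all names and parameters are hypothetical.

```python
import random

def gc_content(seq):
    """Fraction of G and C bases in a sequence (string or list of bases)."""
    return sum(base in "GC" for base in seq) / len(seq)

def evolve_toward_gc(seq, target_gc, steps=2000, seed=0):
    """Propose random point mutations; keep those that do not worsen the GC distance."""
    rng = random.Random(seed)
    current = list(seq)
    score = abs(gc_content(current) - target_gc)
    for _ in range(steps):
        pos = rng.randrange(len(current))
        old_base = current[pos]
        current[pos] = rng.choice("ACGT")
        new_score = abs(gc_content(current) - target_gc)
        if new_score <= score:
            score = new_score          # accept the mutation
        else:
            current[pos] = old_base    # reject and restore
    return "".join(current)

start = "AT" * 20  # toy starting sequence with 0% GC
evolved = evolve_toward_gc(start, target_gc=0.5)
print(gc_content(start), gc_content(evolved))
```

A multi-objective design would replace the single distance with a combined score over several properties, and a less greedy acceptance rule would help avoid local optima.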

  13. GeneQuiz: A workbench for sequence analysis

    SciTech Connect

    Scharf, M.; Schneider, R.; Casari, G.; Bork, P.

    1994-12-31

    We present the prototype of a software system, called GeneQuiz, for large-scale biological sequence analysis. The system was designed to meet the needs that arise in computational sequence analysis and our past experience with the analysis of 171 protein sequences of yeast chromosome III. We explain the cognitive challenges associated with this particular research activity and present our model of the sequence analysis process. The prototype system consists of two parts: (i) the database update and search system (driven by perl programs and rdb, a simple relational database engine also written in perl) and (ii) the visualization and browsing system (developed under C++/ET++). The principal design requirement for the first part was the complete automation of all repetitive actions: database updates, efficient sequence similarity searches and sampling of results in a uniform fashion. The user is then presented with "hit-lists" that summarize the results from heterogeneous database searches. The expert's primary task now simply becomes the further analysis of the candidate entries, where the problem is to extract adequate information about functional characteristics of the query protein rapidly. This second task is tremendously accelerated by a simple combination of the heterogeneous output into uniform relational tables and the provision of browsing mechanisms that give access to database records, sequence entries and alignment views. Indexing of molecular sequence databases provides fast retrieval of individual entries with the use of unique identifiers as well as browsing through databases using pre-existing cross-references. The presentation here covers an overview of the architecture of the system prototype and our experiences on its applicability in sequence analysis.

  14. Synthetic oligonucleotide probes deduced from amino acid sequence data. Theoretical and practical considerations.

    PubMed

    Lathe, R

    1985-05-05

    Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. The redundancy of the genetic code dictates that a choice must be made between (1) a mixture of probes reflecting all codon combinations, and (2) a single longer "optimal" probe. The second strategy is examined in detail. The frequency of sequences matching a given probe by chance alone can be determined and also the frequency of sequences closely resembling the probe and contributing to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides, and probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone. Probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example, and considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between using multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.
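
The trade-off between a mixed probe and a single long probe rests on two simple counts, sketched below. The codon counts come from the standard genetic code; the peptide strings are invented examples, and the chance-match estimate assumes equal, independent base frequencies, which is exactly the random model the abstract says real gene banks do not follow.

```python
# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def mixed_probe_degeneracy(peptide):
    """Number of distinct oligonucleotides needed to cover every codon combination."""
    total = 1
    for residue in peptide.upper():
        total *= CODON_COUNTS[residue]
    return total

def chance_match_frequency(probe_length):
    """Probability that a random site matches a probe exactly, for equal base frequencies."""
    return 0.25 ** probe_length

low = mixed_probe_degeneracy("MWKDEF")   # hypothetical peptide rich in 1- and 2-codon residues
high = mixed_probe_degeneracy("LSRMW")   # hypothetical peptide with three 6-codon residues
print(low, high)  # 16 216
```

Peptide stretches rich in Met and Trp keep a mixed probe small, while Leu, Ser and Arg inflate it quickly, which is one reason a single longer probe built from codon-usage statistics can be the more practical choice.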

  15. Complete amino acid sequence of the N-terminal extension of calf skin type III procollagen.

    PubMed Central

    Brandt, A; Glanville, R W; Hörlein, D; Bruckner, P; Timpl, R; Fietzek, P P; Kühn, K

    1984-01-01

    The N-terminal extension peptide of type III procollagen, isolated from foetal-calf skin, contains 130 amino acid residues. To determine its amino acid sequence, the peptide was reduced and carboxymethylated or aminoethylated and fragmented with trypsin, Staphylococcus aureus V8 proteinase and bacterial collagenase. Pyroglutamate aminopeptidase was used to deblock the N-terminal collagenase fragment to enable amino acid sequencing. The type III collagen extension peptide is homologous to that of the alpha 1 chain of type I procollagen with respect to a three-domain structure. The N-terminal 79 amino acids, which contain ten of the 12 cysteine residues, form a compact globular domain. The next 39 amino acids are in a collagenase triplet sequence (Gly-Xaa-Yaa)n with a high hydroxyproline content. Finally, another short non-collagenous domain of 12 amino acids ends at the cleavage site for procollagen aminopeptidase, which cleaves a proline-glutamine bond. In contrast with type I procollagen, the type III procollagen extension peptides contain interchain disulphide bridges located at the C-terminus of the triple-helical domain. PMID:6331392

  16. Detection of multiple, novel reverse transcriptase coding sequences in human nucleic acids: relation to primate retroviruses

    SciTech Connect

    Shih, A.; Misra, R.; Rush, M.G.

    1989-01-01

    A variety of chemically synthesized oligonucleotides designed on the basis of amino acid and/or nucleotide sequence data were used to detect a large number of novel reverse transcriptase coding sequences in human and mouse DNAs. Procedures involving Southern blotting, library screening, and the polymerase chain reaction were all used to detect such sequences; the polymerase chain reaction was the most rapid and productive approach. In the polymerase chain reaction, oligonucleotide mixtures based on consensus sequence homologies to reverse transcriptase coding sequences and unique oligonucleotides containing perfect homology to the coding sequences of human T-cell leukemia virus types I and II were both effective in amplifying reverse transcriptase-related DNA. It is shown that human DNA contains a wide spectrum of retrovirus-related reverse transcriptase coding sequences, including some that are clearly related to human T-cell leukemia virus types I and II, some that are related to the L-1 family of long interspersed nucleotide sequences, and others that are related to previously described human endogenous proviral DNAs. In addition, human T-cell leukemia virus type I-related sequences appear to be transcribed in both normal human T cells and in a cell line derived from a human teratocarcinoma.

  17. Purification and N-terminal amino acid sequence comparisons of structural proteins from retrovirus-D/Washington and Mason-Pfizer monkey virus.

    PubMed Central

    Henderson, L E; Sowder, R; Smythers, G; Benveniste, R E; Oroszlan, S

    1985-01-01

    A new D-type retrovirus originally designated SAIDS-D/Washington and here referred to as retrovirus-D/Washington (R-D/W) was recently isolated at the University of Washington Primate Center, Seattle, Wash., from a rhesus monkey with an acquired immunodeficiency syndrome and retroperitoneal fibromatosis. To better establish the relationship of this new D-type virus to the prototype D-type virus, Mason-Pfizer monkey virus (MPMV), we have purified and compared six structural proteins from each virus. The proteins purified from each D-type retrovirus include p4, p10, p12, p14, p27, and a phosphoprotein designated pp18 for MPMV and pp20 for R-D/W. Amino acid analysis and N-terminal amino acid sequence analysis show that the p4, p12, p14, and p27 proteins of R-D/W are distinct from the homologous proteins of MPMV but that these proteins from the two different viruses share a high degree of amino acid sequence homology. The p10 proteins from the two viruses have similar amino acid compositions, and both are blocked to N-terminal Edman degradation. The phosphoproteins from the two viruses each contain phosphoserine but are different from each other in amino acid composition, molecular weight, and N-terminal amino acid sequence. The data thus show that each of the R-D/W proteins examined is distinguishable from its MPMV homolog and that a major difference between these two D-type retroviruses is found in the viral phosphoproteins. The N-terminal amino acid sequences of D-type retroviral proteins were used to search for sequence homologies between D-type and other retroviral amino acid sequences. An unexpected amino acid sequence homology was found between R-D/W pp20 (a gag protein) and a 28-residue segment of the env precursor polyprotein of Rous sarcoma virus. The N-terminal amino acid sequences of the D-type major gag protein (p27) and the nucleic acid-binding protein (p14) show only limited amino acid sequence homology to functionally homologous proteins of C

  18. Isolation and sequence analysis of the gene encoding triose phosphate isomerase from Zygosaccharomyces bailii.

    PubMed

    Merico, A; Rodrigues, F; Côrte-Real, M; Porro, D; Ranzi, B M; Compagno, C

    2001-06-30

    The ZbTPI1 gene encoding triose phosphate isomerase (TIM) was cloned from a Zygosaccharomyces bailii genomic library by complementation of the Saccharomyces cerevisiae tpi1 mutant strain. The nucleotide sequence of a 1.5 kb fragment showed an open reading frame (ORF) of 746 bp, encoding a protein of 248 amino acid residues. The deduced amino acid sequence shares a high degree of homology with TIMs from other yeast species, including some highly conserved regions. The analysis of the promoter sequence of the ZbTPI1 revealed the presence of putative motifs known to have regulatory functions in S. cerevisiae. The GenBank Accession No. of ZbTPI1 is AF325852.

  19. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence editors...

  20. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence editors...

  1. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence editors...

  2. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence editors...

  3. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence editors...

  4. A comparative analysis of HIV drug resistance interpretation based on short reverse transcriptase sequences versus full sequences

    PubMed Central

    2010-01-01

    Background: As second-line antiretroviral treatment (ART) becomes more accessible in resource-limited settings (RLS), the need for more affordable monitoring tools such as point-of-care viral load assays and simplified genotypic HIV drug resistance (HIVDR) tests increases substantially. The prohibitive expenses of genotypic HIVDR assays could partly be addressed by focusing on a smaller region of the HIV reverse transcriptase gene (RT) that encompasses the majority of HIVDR mutations for people on ART in RLS. In this study, an in silico analysis of 125,329 RT sequences was performed to investigate the effect of submitting short RT sequences (codon 41 to 238) to the commonly used virco®TYPE and Stanford genotype interpretation tools. Results: Pair-wise comparisons between full-length and short RT sequences were performed. Additionally, a non-inferiority approach with a concordance limit of 95% and two-sided 95% confidence intervals was used to demonstrate concordance between HIVDR calls based on full-length and short RT sequences. The results of this analysis showed that HIVDR interpretations based on full-length versus short RT sequences, using the Stanford algorithms, had concordance significantly above 95%. When using the virco®TYPE algorithm, similar concordance was demonstrated (>95%), but some differences were observed for d4T, AZT and TDF, where predictions were affected in more than 5% of the sequences. Most differences in interpretation, however, were due to shifts from fully susceptible to reduced susceptibility (d4T) or from reduced response to minimal response (AZT, TDF) or vice versa, as compared to the predicted full RT sequence. The virco®TYPE prediction uses many more mutations outside the RT 41-238 amino acid domain, which significantly contribute to the HIVDR prediction for these 3 antiretroviral agents. Conclusions: This study illustrates the acceptability of using a shortened RT sequence (codon 41-238) to obtain reliable genotype interpretations
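
    The non-inferiority check described above (a 95% concordance limit tested with a two-sided 95% confidence interval) can be sketched as follows. The abstract does not say which interval method was used; the Wilson score interval here is an assumption, and the function names and any counts passed in are hypothetical rather than taken from the study.

```python
from math import sqrt

Z_95 = 1.959963984540054  # normal quantile for a two-sided 95% interval


def wilson_interval(successes: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def concordance_above_limit(concordant: int, n: int, limit: float = 0.95) -> bool:
    """True when the lower confidence bound of the concordance
    proportion lies above the chosen limit."""
    lower, _ = wilson_interval(concordant, n)
    return lower > limit


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(wilson_interval(990, 1000))
    print(concordance_above_limit(990, 1000))
```

    Requiring the lower bound, rather than the point estimate, to exceed the limit is what makes the claim "significantly above 95%" rather than merely "above 95% in this sample".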

  5. Mutational analysis of DBD*--a unique antileukemic gene sequence.

    PubMed

    Ji, Yan-shan; Johnson, Betty H; Webb, M Scott; Thompson, E Brad

    2002-01-01

    DBD* is a novel gene encoding an 89 amino acid peptide that is constitutively lethal to leukemic cells. DBD* was derived from the DNA binding domain of the human glucocorticoid receptor by a frameshift that replaces the final 21 C-terminal amino acids of the domain. Previous studies suggested that DBD* no longer acted as the natural DNA binding domain. To confirm and extend these results, we mutated DBD* in 29 single amino acid positions, critical for the function in the native domain or of possible functional significance in the novel 21 amino acid C-terminal sequence. Steroid-resistant leukemic ICR-27-4 cells were transiently transfected by electroporation with each of the 29 mutants. Cell kill was evaluated by trypan blue dye exclusion, a WST-1 tetrazolium-based assay for cell respiration, propidium iodide exclusion, and Hoechst 33258 staining of chromatin. Eleven of the 29 point mutants increased, whereas four decreased antileukemic activity. The remainder had no effect on activity. The nonconcordances between these effects and native DNA binding domain function strongly suggest that the lethality of DBD* is distinct from that of the glucocorticoid receptor. Transfections of fragments of DBD* showed that optimal activity localized to the sequence for its C-terminal 32 amino acids.

  6. Applying machine learning techniques to DNA sequence analysis

    SciTech Connect

    Shavlik, J.W.

    1992-01-01

    We are developing a machine learning system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information (which we call a "domain theory"), our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, the KBANN algorithm maps inference rules, such as consensus sequences, into a neural (connectionist) network. Neural network training techniques then use the training examples to refine these inference rules. We have been applying this approach to several problems in DNA sequence analysis and have also been extending the capabilities of our learning system along several dimensions.

  7. Applying machine learning techniques to DNA sequence analysis

    SciTech Connect

    Shavlik, J.W. (Dept. of Computer Sciences); Noordewier, M.O. (Dept. of Computer Science)

    1992-01-01

    We are primarily developing a machine learning (ML) system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information, our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, our KBANN algorithm maps inference rules about a given recognition task into a neural network. Neural network training techniques then use the training examples to refine these inference rules. We call these rules a domain theory, following the convention in the machine learning community. We have been applying this approach to several problems in DNA sequence analysis. In addition, we have been extending the capabilities of our learning system along several dimensions. We have also been investigating parallel algorithms that perform sequence alignments in the presence of frameshift errors.
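
    To make the idea of mapping an inference rule into a network unit more concrete, the sketch below encodes one conjunctive rule as a single sigmoid unit. The abstract gives no numerical details, so the weight magnitude (4.0), the threshold formula, and the rule and feature names are all assumptions chosen to follow the commonly described KBANN-style initialisation; the subsequent training step is not shown.

```python
from math import exp

OMEGA = 4.0  # assumed weight magnitude for rule-derived links


def rule_to_unit(positive: list[str], negated: list[str],
                 omega: float = OMEGA) -> tuple[dict[str, float], float]:
    """Translate a conjunctive rule into weights and a bias so the unit
    is active only when every positive antecedent is true and no
    negated antecedent is true."""
    weights = {name: omega for name in positive}
    weights.update({name: -omega for name in negated})
    # Threshold sits half a weight below the all-antecedents-true input.
    bias = -(len(positive) - 0.5) * omega
    return weights, bias


def activation(weights: dict[str, float], bias: float,
               inputs: dict[str, float]) -> float:
    """Sigmoid activation of the unit; missing inputs count as 0."""
    net = bias + sum(w * inputs.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + exp(-net))


if __name__ == "__main__":
    # Hypothetical rule: promoter_like :- minus35_box, minus10_box, not gc_rich_spacer
    w, b = rule_to_unit(["minus35_box", "minus10_box"], ["gc_rich_spacer"])
    print(activation(w, b, {"minus35_box": 1, "minus10_box": 1}))
    print(activation(w, b, {"minus35_box": 1}))
```

    Because the rule becomes ordinary weights and a bias, standard gradient-based training can then adjust them from examples, which is the refinement step the abstract refers to.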

  8. Boric Acid in Kjeldahl Analysis

    ERIC Educational Resources Information Center

    Cruz, Gregorio

    2013-01-01

    The use of boric acid in the Kjeldahl determination of nitrogen is a variant of the original method widely applied in many laboratories all over the world. Its use is recommended by control organizations such as ISO, IDF, and EPA because it yields reliable and accurate results. However, the chemical principles the method is based on are not…

  10. Principal component analysis of phenolic acid spectra

    USDA-ARS's Scientific Manuscript database

    Phenolic acids are common plant metabolites that exhibit bioactive properties and have applications in functional food and animal feed formulations. The ultraviolet (UV) and infrared (IR) spectra of four closely related phenolic acid structures were evaluated by principal component analysis (PCA) to...

  11. Amino acid isotopic analysis in agricultural systems

    USDA-ARS's Scientific Manuscript database

    A relatively new approach to stable isotopic analysis—referred to as compound-specific isotopic analysis (CSIA)—has emerged, centering on the measurement of 15N:14N ratios in amino acids (glutamic acid and phenylalanine). CSIA has recently been used to generate trophic position estimates among anima...

  12. [Cloning and sequence analysis of 55 K protein of egg drop syndrome virus].

    PubMed

    Zhu, L; Jin, Q; Zeng, L

    1999-06-30

    To understand the genomic structure of egg drop syndrome virus (EDSV), nucleic acid was extracted by routine methods from the weakly virulent EDSV strain AA-2, isolated from sick hens in China. A whole-genome library was constructed by digestion with Hind III, and the strand encoding the 55 K gene, located in the Hind III-A segment, was sequenced and analyzed. The open reading frame is 1,014 nt long and encodes a polypeptide of 337 amino acids with a molecular weight of 38,200. Analysis of the amino acid sequence revealed 25.5%-32.4% homology to the 55 K protein of human adenovirus types 2, 12 and 40, canine adenovirus and group 1 fowl adenoviruses, and 46.4% homology to that of ovine adenovirus. The genomic structure of EDSV is therefore related to that of the adenoviruses.

  13. EST sequencing of Onychophora and phylogenomic analysis of Metazoa.

    PubMed

    Roeding, Falko; Hagner-Holler, Silke; Ruhberg, Hilke; Ebersberger, Ingo; von Haeseler, Arndt; Kube, Michael; Reinhardt, Richard; Burmester, Thorsten

    2007-12-01

    Onychophora (velvet worms) represent a small animal taxon considered to be related to Euarthropoda. We have obtained 1873 5' cDNA sequences (expressed sequence tags, ESTs) from the velvet worm Epiperipatus sp., which were assembled into 833 contigs. BLAST similarity searches revealed that 51.9% of the contigs had matches in the protein databases with expectation values lower than 10^-4. Most ESTs had the best hit with proteins from either Chordata or Arthropoda (approximately 40% each). The ESTs included sequences of 27 ribosomal proteins. The orthologous sequences from 28 other species of a broad range of phyla were obtained from the databases, including other EST projects. A concatenated amino acid alignment comprising 5021 positions was constructed, which covers 4259 positions when problematic regions were removed. Bayesian and maximum likelihood methods place Epiperipatus within the monophyletic Ecdysozoa (Onychophora, Arthropoda, Tardigrada and Nematoda), but its exact relation to the Euarthropoda remained unresolved. The "Articulata" concept was not supported. Tardigrada and Nematoda formed a well-supported monophylum, suggesting that Tardigrada are actually Cycloneuralia. In agreement with previous studies, we have demonstrated that random sequencing of cDNAs results in sequence information suitable for phylogenomic approaches to resolve metazoan relationships.

  14. Rat mammary-gland transferrin: nucleotide sequence, phylogenetic analysis and glycan structure.

    PubMed Central

    Escrivá, H; Pierce, A; Coddeville, B; González, F; Benaissa, M; Léger, D; Wieruszeski, J M; Spik, G; Pamblanco, M

    1995-01-01

    The complete cDNA for rat mammary-gland transferrin (Tf) has been sequenced and also the native protein isolated from milk in order to analyse the structure of the main glycan variants present. A lactating-rat mammary-gland cDNA library in lambda gt10 was screened with a partial cDNA copy of rat liver Tf and subsequently rescreened with 5' fragments of the longest clones. This produced a 2275 bp insert coding for an open reading frame of 695 amino acid residues. This includes a 19-amino acid signal sequence and the mature protein containing 676 amino acids and one N-glycosylation site in the C-terminal domain at residue 490. Phylogenetic analysis was carried out using 14 translated Tf nucleotide sequences, and the derived evolutionary tree shows that at least three gene duplication events have occurred during Tf evolution, one of which generated the N- and C-terminal domains and occurred before separation of arthropods and chordates. The two halves of human melanotransferrin are more similar to each other than to any other sequence, which contrasts with the pattern shown by the remaining sequences. Native rat milk Tf is separated into four bands on native PAGE that differ only in their sialic acid content: one biantennary glycan is present containing either no sialic acid residues or up to three. The complete structures of the two major variants were determined by methylation, m.s. and 400 MHz 1H-n.m.r. spectroscopy. They contain either one or two neuraminic acid residues (alpha 2-->6)-linked to galactose in conventional biantennary N-acetyl-lactosamine-type glycans. Most contain fucose (alpha 1-->6)-linked to the terminal non-reducing N-acetylglucosamine. PMID:7717992

  15. Analyses of mitochondrial amino acid sequence datasets support the proposal that specimens of Hypodontus macropi from three species of macropodid hosts represent distinct species

    PubMed Central

    2013-01-01

    Background: Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence data sets, H. macropi has been proposed to be a complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different species of hosts (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results: The mt genomes sequenced by next-generation (454) technology from H. macropi from the three host species varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence data sets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions: The present analyses of inferred mt protein sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823
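
    A pairwise percentage difference of the kind reported above can be computed from two aligned amino acid sequences as sketched below. The abstract does not describe how gaps or ambiguous residues were handled; skipping any column that contains a gap is an assumption made here, and the function name and the toy sequences are hypothetical.

```python
def percent_difference(a: str, b: str, gap: str = "-") -> float:
    """Percentage of differing residues between two aligned sequences,
    ignoring alignment columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return 100.0 * differing / compared


if __name__ == "__main__":
    # Toy aligned fragments, not real mitochondrial protein sequences.
    print(percent_difference("MKTAYIAKQR", "MKSAYIAKQK"))
```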

  16. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants.

    PubMed

    Gundry, Michael; Vijg, Jan

    2012-01-03

    DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative for the genome overall, current platforms allow whole genome sequencing for less than $5000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a brief

  17. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants

    PubMed Central

    Gundry, Michael; Vijg, Jan

    2011-01-01

    DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative for the genome overall, current platforms allow whole genome sequencing for less than $5,000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a

  18. The amino acid sequence of cytochromes c-551 from three species of Pseudomonas

    PubMed Central

    Ambler, R. P.; Wynn, Margaret

    1973-01-01

    The amino acid sequences of the cytochromes c-551 from three species of Pseudomonas have been determined. Each resembles the protein from Pseudomonas strain P6009 (now known to be Pseudomonas aeruginosa, not Pseudomonas fluorescens) in containing 82 amino acids in a single peptide chain, with a haem group covalently attached to cysteine residues 12 and 15. In all four sequences 43 residues are identical. Although by bacteriological criteria the organisms are closely related, the differences between pairs of sequences range from 22% to 39%. These values should be compared with the differences in the sequence of mitochondrial cytochrome c between mammals and amphibians (about 18%) or between mammals and insects (about 33%). Detailed evidence for the amino acid sequences of the proteins has been deposited as Supplementary Publication SUP 50015 at the National Lending Library for Science and Technology, Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1973), 131, 5. PMID:4352718

  19. Draft Genome Sequence of Sorghum Grain Mold Fungus Epicoccum sorghinum, a Producer of Tenuazonic Acid

    PubMed Central

    Oliveira, Rodrigo C.; Davenport, Karen W.; Hovde, Blake; Silva, Danielle; Chain, Patrick S. G.; Correa, Benedito

    2017-01-01

    The facultative plant pathogen Epicoccum sorghinum is associated with grain mold of sorghum and produces the mycotoxin tenuazonic acid. This fungus can have serious economic impact on sorghum production. Here, we report the draft genome sequence of E. sorghinum (USPMTOX48). PMID:28126937

  20. Snake venom. The amino acid sequence of protein A from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J; Strydom, D J

    1980-12-01

    Protein A from Dendroaspis polylepis polylepis venom comprises 81 amino acids, including ten half-cystine residues. The complete primary structures of protein A and its variant A' were elucidated. The sequences of proteins A and A', which differ in a single position, show no homology with various neurotoxins and non-neurotoxic proteins and represent a new type of elapid venom protein.

  1. Draft Genome Sequence of Bacillus coagulans NL01, a Wonderful l-Lactic Acid Producer

    PubMed Central

    Zheng, Zhaojuan; Jiang, Ting; Lin, Xi; Zhou, Jie

    2015-01-01

    Here, we report the draft genome sequence of Bacillus coagulans NL01, which could produce high optically pure l-lactic acid using xylose as a sole carbon source. The draft genome is 3,505,081 bp, with 144 contigs. About 3,903 protein-coding genes and 92 rRNAs are predicted from this assembly. PMID:26089419

  2. [DNA analysis for the post genome-sequencing era].

    PubMed

    Kambara, Hideki

    2002-05-01

    With the completion of human genome sequencing, the post genome-sequencing era has started. The major tasks now are to clarify the functions of genes and to apply this information in medicine and in various industrial fields. Various DNA analysis methods and instruments for gene expression profiling and for analyzing genetic diversity, including SNP typing, are required and have been developed. Here, the history and technologies related to DNA analysis, including the Wada project in the early 1980s and the Human Genome Project from 1990, are described. Various new technologies have been developed in this decade. They include a capillary gel array DNA sequencer, DNA chips, bead probe arrays, a new DNA sequencing method using pyrosequencing and an efficient SNP typing method by BAMPER.

  3. Basic Sequence Analysis Techniques for Use with Audit Trail Data

    ERIC Educational Resources Information Center

    Judd, Terry; Kennedy, Gregor

    2008-01-01

    Audit trail analysis can provide valuable insights to researchers and evaluators interested in comparing and contrasting designers' expectations of use and students' actual patterns of use of educational technology environments (ETEs). Sequence analysis techniques are particularly effective but have been neglected to some extent because of real…

  4. Food Fish Identification from DNA Extraction through Sequence Analysis

    ERIC Educational Resources Information Center

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed 3rd- and 4th-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  6. Applications of recursive segmentation to the analysis of DNA sequences.

    PubMed

    Li, Wentian; Bernaola-Galván, Pedro; Haghighi, Fatameh; Grosse, Ivo

    2002-07-01

    Recursive segmentation is a procedure that partitions a DNA sequence into domains with a homogeneous composition of the four nucleotides A, C, G and T. This procedure can also be applied to any sequence converted from a DNA sequence, such as to a binary strong(G + C)/weak(A + T) sequence, to a binary sequence indicating the presence or absence of the dinucleotide CpG, or to a sequence indicating both the base and the codon position information. We apply various conversion schemes in order to address the following five DNA sequence analysis problems: isochore mapping, CpG island detection, locating the origin and terminus of replication in bacterial genomes, finding complex repeats in telomere sequences, and delineating coding and noncoding regions. We find that the recursive segmentation procedure can successfully detect isochore borders, CpG islands, and the origin and terminus of replication, but it needs improvement for detecting complex repeats as well as borders between coding and noncoding regions.

  7. DNA shotgun sequencing analysis of Garcinia mangostana L. variety Mesta.

    PubMed

    Abu Bakar, Syuhaidah; Kumar, Suresh; Loke, Kok-Keong; Goh, Hoe-Han; Mohd Noor, Normah

    2017-06-01

    Mangosteen (Garcinia mangostana Linn.) is an ultra-tropical tree characterized by its unique dark purple fruits with white flesh. The xanthone-rich purple pericarp tissue contains valuable compounds with medicinal properties. Following previously reported genome sequencing of a common variety of mangosteen [1], we performed another whole genome sequencing of a commercially popular variety of this fruit species (var. Mesta) for comparative analysis of its genome composition. Raw reads of the DNA sequencing project were deposited to SRA database with the accession number SRX2709728.

  8. Amino acid sequence of myoglobin from white-tailed deer (Odocoileus virginianus).

    PubMed

    Joseph, Poulson; Suman, Surendranath P; Li, Shuting; Fontaine, Michele; Steinke, Laurey

    2012-10-01

    Our objective was to determine the primary structure of white-tailed deer myoglobin (Mb). White-tailed deer Mb was isolated from cardiac muscles employing ammonium sulfate precipitation and gel-filtration chromatography. The amino acid sequence was determined by Edman degradation. Sequence analyses of intact Mb as well as tryptic- and cyanogen bromide-peptides yielded the complete primary structure of white-tailed deer Mb, which shared 100% similarity with red deer Mb. White-tailed deer Mb consists of 153 amino acid residues and shares more than 96% sequence similarity with myoglobins from meat-producing ruminants, such as cattle, buffalo, sheep, and goat. Similar to sheep and goat myoglobins, white-tailed deer Mb contains 12 histidine residues. Proximal (position 93) and distal (position 64) histidine residues responsible for maintaining the stability of heme are conserved in white-tailed deer Mb.
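
    To put the figures in this abstract in concrete terms: if "similarity" is read as plain percent identity over an ungapped alignment (an assumption; the abstract does not define the measure), then more than 96% over 153 residues allows at most 6 differing positions, since 147/153 is about 96.1% and 146/153 is about 95.4%. The sketch below shows that calculation and a histidine count. The function names and the short peptide strings are hypothetical placeholders, not the published myoglobin sequences.

```python
def percent_identity(a, b):
    """Percent of positions with identical residues in two aligned,
    equal-length sequences (no gap handling in this sketch)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def count_residue(seq, residue="H"):
    """Count occurrences of a one-letter residue code (default: histidine)."""
    return seq.upper().count(residue)

# Hypothetical toy peptides, for illustration only.
toy_a = "HAKHLLGEHA"
toy_b = "HAKHLLGEHG"
identity = percent_identity(toy_a, toy_b)   # 9 of 10 positions match
histidines = count_residue(toy_a)
```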

  9. Amino acid sequences of heterotrophic and photosynthetic ferredoxins from the tomato plant (Lycopersicon esculentum Mill.).