Science.gov

Sample records for abrf edman sequencing

  1. ABRF ESRG 2005 Study: Identification of Seven Modified Amino Acids by Edman Sequencing

    PubMed Central

    Brune, D.; Denslow, N.D.; Kobayashi, R.; Lane, W.S.; Leone, J.W.; Madden, B.J.; Neveu, J. M.; Pohl, J.

    2006-01-01

    Identification of modified amino acids can be a challenging part of Edman degradation sequence analysis, largely because they are not included among the commonly used phenylthiohydantoin amino acid standards. Yet many can have unique retention times and can be assigned by an experienced researcher or through the use of a guide showing their typical chromatography characteristics. The Edman Sequencing Research Group (ESRG) 2005 study is a continuation of the 2004 study, in which the participating laboratories were provided a synthetic peptide and asked to identify the modified amino acids present in the sequence. The study sample provided an opportunity to sequence a peptide containing a variety of modified amino acids and note their retention times relative to the common amino acids. It also allowed the ESRG to compile the chromatographic properties and intensities from multiple instruments and tabulate an average elution position for these modified amino acids on commonly used instruments. Participating laboratories were given 2000 pmoles of a synthetic peptide, 18 amino acids long, containing the following modified amino acids: dimethyl- and trimethyl-lysine, 3-methyl-histidine, N-carbamyl-lysine, cystine, N-methyl-alanine, and isoaspartic acid. The modified amino acids were interspersed with standard amino acids to help in the assessment of initial and repetitive yields. In addition to filling in an assignment sheet, which included retention times and peak areas, participants were asked to provide specific details about the parameters used for the sequencing run. References for some of the modified amino acid elution characteristics were provided and the participants had the option of viewing a list of the modified amino acids present in the peptide at the ESRG Web site. The ABRF ESRG 2005 sample is the seventeenth in a series of studies designed to aid laboratories in evaluating their abilities to obtain and interpret amino acid sequence data. PMID:17122064
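The initial and repetitive yields mentioned in the abstract above are commonly estimated from a log-linear fit of PTH-amino-acid yield against cycle number. The following is a minimal sketch of that calculation, not the ESRG's own procedure: the function names are hypothetical, and extrapolating the fit to cycle 0 to obtain the initial yield is an assumed convention (some laboratories report the yield at cycle 1 instead).

```python
import math


def fit_repetitive_yield(cycles, yields_pmol):
    """Least-squares fit of log10(yield) against cycle number.

    Returns (repetitive_yield, cycle0_yield_pmol), where repetitive_yield
    is the average fraction of signal retained per cycle and
    cycle0_yield_pmol is the fitted yield extrapolated to cycle 0
    (an assumed convention for this sketch).
    """
    if len(cycles) != len(yields_pmol) or len(cycles) < 2:
        raise ValueError("need at least two (cycle, yield) pairs")
    logs = [math.log10(y) for y in yields_pmol]
    n = len(cycles)
    mean_x = sum(cycles) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    if sxx == 0:
        raise ValueError("cycle numbers must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** slope, 10 ** intercept


def initial_yield_percent(cycle0_yield_pmol, loaded_pmol):
    """Initial yield as a percentage of the amount of sample loaded."""
    return 100.0 * cycle0_yield_pmol / loaded_pmol
```

Only cycles containing stable, well-recovered residues would normally be passed to the fit, which is presumably why the study peptide interspersed standard amino acids among the modified ones.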

  2. The ABRF Edman Sequencing Research Group 2008 Study: Investigation into Homopolymeric Amino Acid N-Terminal Sequence Tags and Their Effects on Automated Edman Degradation

    PubMed Central

    Thoma, R. S.; Smith, J. S.; Sandoval, W.; Leone, J. W.; Hunziker, P.; Hampton, B.; Linse, K. D.; Denslow, N. D.

    2009-01-01

    The Edman Sequencing Research Group (ESRG) of the Association of Biomolecular Resource Facilities designs and executes interlaboratory studies investigating the use of automated Edman degradation for protein and peptide analysis. In 2008, the ESRG enlisted the help of core sequencing facilities to investigate the effects of a repeating amino acid tag at the N-terminus of a protein. Commonly, to facilitate protein purification, an affinity tag containing a polyhistidine sequence is conjugated to the N-terminus of the protein. After expression, polyhistidine-tagged protein is readily purified via chelation with an immobilized metal affinity resin. The addition of the polyhistidine tag presents unique challenges for the determination of protein identity using Edman degradation chemistry. Participating laboratories were asked to sequence one protein engineered in three configurations: with an N-terminal polyhistidine tag; with an N-terminal polyalanine tag; or with no tag. Study participants were asked to return a data file containing the uncorrected amino acid picomole yields for the first 17 cycles. Initial and repetitive yield (R.Y.) information and the amount of lag were evaluated. Information about instrumentation and sample treatment was also collected as part of the study. For this study, the majority of participating laboratories successfully called the amino acid sequence for 17 cycles for all three test proteins. In general, laboratories found it more difficult to call the sequence containing the polyhistidine tag. Lag was observed earlier and more consistently with the polyhistidine-tagged protein than the polyalanine-tagged protein. Histidine yields were significantly less than the alanine yields in the tag portion of each analysis. The polyhistidine and polyalanine protein-R.Y. calculations were found to be equivalent. These calculations showed that the nontagged portion from each protein was equivalent. The terminal histidines from the tagged portion of the protein

  3. High-throughput sequencing of peptoids and peptide-peptoid hybrids by partial edman degradation and mass spectrometry.

    PubMed

    Thakkar, Amit; Cohen, Allison S; Connolly, Michael D; Zuckermann, Ronald N; Pei, Dehua

    2009-03-09

    A method for the rapid sequence determination of peptoids [oligo(N-substituted glycines)] and peptide-peptoid hybrids selected from one-bead-one-compound combinatorial libraries has been developed. In this method, beads carrying unique peptoid (or peptide-peptoid) sequences were subjected to multiple cycles of partial Edman degradation (PED) by treatment with a 1:3 (mol/mol) mixture of phenyl isothiocyanate (PITC) and 9-fluorenylmethyl chloroformate (Fmoc-Cl) to generate a series of N-terminal truncation products for each resin-bound peptoid. After PED, the Fmoc group was removed from the N-terminus and any reacted side chains via piperidine treatment. The resulting mixture of the full-length peptoid and its truncation products was analyzed by matrix-assisted laser desorption ionization (MALDI) mass spectrometry, to reveal the sequence of the full-length peptoid. With a slight modification, the method was also effective in the sequence determination of peptide-peptoid hybrids. This rapid, high-throughput, sensitive, and inexpensive sequencing method should greatly expand the utility of combinatorial peptoid libraries in biomedical and materials research.
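The ladder-reading step described in the abstract above can be sketched as follows: consecutive peaks in the truncation series differ by the mass of one N-terminal residue, so the sequence is read from the mass differences. This is an illustrative sketch only, not the authors' software. It uses monoisotopic residue masses of a few standard amino acids as a stand-in, because the peptoid monomer masses depend on the library and are not given here; the function names and the 0.02 Da tolerance are assumptions.

```python
# Monoisotopic residue masses (Da) of a few standard amino acids,
# used as a stand-in for a library-specific peptoid monomer table.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "L": 113.08406,
    "F": 147.06841,
}
WATER = 18.01056


def read_ladder(peak_masses, tolerance=0.02):
    """Read an N-terminal sequence from a truncation ladder.

    peak_masses: masses of the full-length compound and its N-terminal
    truncation products, in any order. Each difference between adjacent
    peaks is matched against RESIDUE_MASS; unmatched or ambiguous
    differences are reported as '?'.
    """
    peaks = sorted(peak_masses, reverse=True)
    sequence = []
    for heavier, lighter in zip(peaks, peaks[1:]):
        delta = heavier - lighter
        matches = [name for name, mass in RESIDUE_MASS.items()
                   if abs(mass - delta) <= tolerance]
        sequence.append(matches[0] if len(matches) == 1 else "?")
    return "".join(sequence)


def ladder_for(sequence):
    """Simulate the ideal ladder (neutral masses) for a known sequence."""
    masses = [RESIDUE_MASS[r] for r in sequence]
    return [sum(masses[i:]) + WATER for i in range(len(masses))]
```

A ladder of n peaks yields n - 1 residues, so the residue left in the shortest truncation product is not read; in practice it would have to be inferred from that product's absolute mass.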

  4. Primary structure of three cationic peptides from porcine neutrophils. Sequence determination by the combined usage of electrospray ionization mass spectrometry and Edman degradation.

    PubMed

    Mirgorodskaya, O A; Shevchenko, A A; Abdalla, K O; Chernushevich, I V; Egorov, T A; Musoliamov, A X; Kokryakov, V N; Shamova, O V

    1993-09-20

    The primary structure of three major cationic peptides from porcine neutrophils has been determined. The sequencing was performed by the combined use of electrospray ionization mass spectrometry and Edman degradation. The determined sequences unambiguously show that these peptides cannot be considered defensins.

  5. Rapid on-membrane proteolytic cleavage for Edman sequencing and mass spectrometric identification of proteins.

    PubMed

    Pham, Victoria C; Henzel, William J; Lill, Jennie R

    2005-11-01

    A method for the rapid limited enzymatic cleavage of PVDF membrane-immobilized proteins is described. This method allows the fast characterization of PVDF blotted proteins by peptide mass fingerprinting (Henzel, W. J., Billeci, T. M., Stults, J. T., Wong, S. C., Grimley, C., Wantanabe, C., Proc. Natl. Acad. Sci. USA 1993, 90, 5011-5015), LC-MS/MS, or N-terminal sequencing and has been demonstrated on a range of proteins using a full complement of proteolytic enzymes. This technique allows the generation of proteolytic fragments between 5 and 60 min (depending on the enzyme employed), which is significantly faster than previously reported on-membrane digestion methods. To date, this on-membrane rapid digestion protocol has aided in the identification and confirmation of mutation sites in over 200 recombinant proteins.

  6. Identification of Optimal Protocols for Sequencing Difficult Templates: Results of the 2008 ABRF DNA Sequencing Research Group Difficult Template Study 2008

    PubMed Central

    Kieleczawa, Jan; Adam, Debbie; Bintzler, Doug; Detwiler, Michelle; Needleman, David; Schweitzer, Peter; Singh, Sushmita; Steen, Robert; Zianni, Michael

    2009-01-01

    The 2008 ABRF DNA Sequencing Research Group (DSRG) difficult template sequencing study was designed to identify a general set of guidelines that would constitute the best approaches for sequencing difficult templates. This was a continuation of previous DSRG difficult template studies performed in 1996, 1997, and 2003. The distinguishing factors in the present study were the number of DNA templates used, the number of different types of difficult regions tested, and the inclusion of a follow-up phase of the study to identify optimal protocols for each type of difficult template. DNA templates with associated sequencing primers were distributed to participating laboratories and each laboratory returned their sequencing results along with descriptions of the experimental conditions used. The data were analyzed and the best protocols were identified for each difficult template. This information was subsequently distributed to the participating laboratories for a second round of sequencing to evaluate the general applicability of the optimized protocols. The average improvements in sequencing results were 11% overall, with a range of −25% to +43% using the optimized protocols. The full results from this study are presented here and they demonstrate that general experimental protocols and common additives can be used to improve the sequencing success for many difficult templates. PMID:19503623

  7. Identification of optimal protocols for sequencing difficult templates: results of the 2008 ABRF DNA Sequencing Research Group difficult template study 2008.

    PubMed

    Kieleczawa, Jan; Adam, Debbie; Bintzler, Doug; Detwiler, Michelle; Needleman, David; Schweitzer, Peter; Singh, Sushmita; Steen, Robert; Zianni, Michael

    2009-04-01

    The 2008 ABRF DNA Sequencing Research Group (DSRG) difficult template sequencing study was designed to identify a general set of guidelines that would constitute the best approaches for sequencing difficult templates. This was a continuation of previous DSRG difficult template studies performed in 1996, 1997, and 2003. The distinguishing factors in the present study were the number of DNA templates used, the number of different types of difficult regions tested, and the inclusion of a follow-up phase of the study to identify optimal protocols for each type of difficult template. DNA templates with associated sequencing primers were distributed to participating laboratories and each laboratory returned their sequencing results along with descriptions of the experimental conditions used. The data were analyzed and the best protocols were identified for each difficult template. This information was subsequently distributed to the participating laboratories for a second round of sequencing to evaluate the general applicability of the optimized protocols. The average improvements in sequencing results were 11% overall, with a range of -25% to +43% using the optimized protocols. The full results from this study are presented here and they demonstrate that general experimental protocols and common additives can be used to improve the sequencing success for many difficult templates.

  8. Multi-platform and cross-methodological reproducibility of transcriptome profiling by RNA-seq in the ABRF Next-Generation Sequencing Study

    PubMed Central

    Nicolet, Charles M.; Grove, Deborah; Levy, Shawn; Farmerie, William; Viale, Agnes; Wright, Chris; Schweitzer, Peter A.; Gao, Yuan; Kim, Dewey; Boland, Joe; Hicks, Belynda; Kim, Ryan; Chhangawala, Sagar; Jafari, Nadereh; Raghavachari, Nalini; Gandara, Jorge; Garcia-Reyero, Natàlia; Hendrickson, Cynthia; Roberson, David; Rosenfeld, Jeffrey; Smith, Todd; Underwood, Jason G.; Wang, May; Zumbo, Paul; Baldwin, Don A.; Grills, George S.; Mason, Christopher E.

    2014-01-01

    High-throughput RNA sequencing (RNA-seq) dramatically expands the potential for novel genomics discoveries, but the wide variety of platforms, protocols and performance has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We tested replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (polyA-selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies’ PGM and Proton, Pacific Biosciences RS and Roche’s 454). The results show high intra-platform and inter-platform concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms. These data also demonstrate that ribosomal RNA depletion can both enable effective analysis of degraded RNA samples and be readily compared to polyA-enriched fractions. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq. PMID:25150835

  9. Detection of DBD-carbamoyl amino acids in amino acid sequence and D/L configuration determination of peptides with fluorogenic Edman reagent 7-[(N,N-dimethylamino)sulfonyl]-2,1,3-benzoxadiazol-4-yl isothiocyanate.

    PubMed

    Huang, Y; Matsunaga, H; Toriba, A; Santa, T; Fukushima, T; Imai, K

    1999-06-01

    A method for amino acid sequence and D/L configuration identification of peptides using the fluorogenic Edman reagent 7-[(N,N-dimethylamino)sulfonyl]-2,1,3-benzoxadiazol-4-yl isothiocyanate (DBD-NCS) has been developed. This method was based on the Edman degradation principle with some modifications. A peptide or protein was coupled with DBD-NCS under basic conditions and then cyclized/cleaved to produce the DBD-thiazolinone (TZ) derivative by BF3, a Lewis acid, which could significantly suppress amino acid racemization. The liberated DBD-TZ amino acid was hydrolyzed to the DBD-thiocarbamoyl (TC) amino acid under a weakly acidic condition and then oxidized by NaNO2/H+ to the DBD-carbamoyl (CA) amino acid, which was stable and had a strong fluorescence intensity. The individual DBD-CA amino acids were separated by reversed-phase high-performance liquid chromatography (RP-HPLC) for amino acid sequencing, and their enantiomers were resolved by chiral stationary-phase HPLC to identify their D/L configurations. By combining the two HPLC systems, the amino acid sequence and D/L configuration of peptides could be determined. This method will be useful for searching for D-amino-acid-containing peptides in animals.

  10. A photothermally responsive nanoprobe for bioimaging based on Edman degradation

    NASA Astrophysics Data System (ADS)

    Liu, Yi; Wang, Zhantong; Zhang, Huimin; Lang, Lixin; Ma, Ying; He, Qianjun; Lu, Nan; Huang, Peng; Liu, Yijing; Song, Jibin; Liu, Zhibo; Gao, Shi; Ma, Qingjie; Kiesewetter, Dale O.; Chen, Xiaoyuan

    2016-05-01

    A new type of photothermally responsive nanoprobe based on Edman degradation has been synthesized and characterized. Under irradiation by an 808 nm laser, the heat generated by the gold nanorod core breaks the thiocarbamide structure and releases the fluorescent dye Cy5.5 with increased near-infrared (NIR) fluorescence under mild acidic conditions. This RGD modified nanoprobe is capable of fluorescence imaging of αvβ3 over-expressing U87MG cells in vitro and in vivo. This Edman degradation-based nanoprobe provides a novel strategy to design activatable probes for biomedical imaging and drug/gene delivery. Electronic supplementary information (ESI) available: HPLC, MS and 1H NMR spectrum. See DOI: 10.1039/c6nr01400c

  11. THE ABRF MARG MICROARRAY SURVEY 2005: TAKING THE PULSE ON THE MICROARRAY FIELD

    EPA Science Inventory

    Over the past several years microarray technology has evolved into a critical component of any discovery based program. Since 1999, the Association of Biomolecular Resource Facilities (ABRF) Microarray Research Group (MARG) has conducted biennial surveys designed to generate a pr...

  13. ABRF-PRG07: advanced quantitative proteomics study.

    PubMed

    Falick, Arnold M; Lane, William S; Lilley, Kathryn S; MacCoss, Michael J; Phinney, Brett S; Sherman, Nicholas E; Weintraub, Susan T; Witkowska, H Ewa; Yates, Nathan A

    2011-04-01

    A major challenge for core facilities is determining quantitative protein differences across complex biological samples. Although there are numerous techniques in the literature for relative and absolute protein quantification, the majority is nonroutine and can be challenging to carry out effectively. There are few studies comparing these technologies in terms of their reproducibility, accuracy, and precision, and no studies to date deal with performance across multiple laboratories with varied levels of expertise. Here, we describe an Association of Biomolecular Resource Facilities (ABRF) Proteomics Research Group (PRG) study based on samples composed of a complex protein mixture into which 12 known proteins were added at varying but defined ratios. All of the proteins were present at the same concentration in each of three tubes that were provided. The primary goal of this study was to allow each laboratory to evaluate its capabilities and approaches with regard to: detection and identification of proteins spiked into samples that also contain complex mixtures of background proteins and determination of relative quantities of the spiked proteins. The results returned by 43 participants were compiled by the PRG, which also collected information about the strategies used to assess overall performance and as an aid to development of optimized protocols for the methodologies used. The most accurate results were generally reported by the most experienced laboratories. Among laboratories that used the same technique, values that were closer to the expected ratio were obtained by more experienced groups.

  14. ABRF-PRG07: Advanced Quantitative Proteomics Study

    PubMed Central

    Falick, Arnold M.; Lane, William S.; Lilley, Kathryn S.; MacCoss, Michael J.; Phinney, Brett S.; Sherman, Nicholas E.; Weintraub, Susan T.; Witkowska, H. Ewa; Yates, Nathan A.

    2011-01-01

    A major challenge for core facilities is determining quantitative protein differences across complex biological samples. Although there are numerous techniques in the literature for relative and absolute protein quantification, the majority is nonroutine and can be challenging to carry out effectively. There are few studies comparing these technologies in terms of their reproducibility, accuracy, and precision, and no studies to date deal with performance across multiple laboratories with varied levels of expertise. Here, we describe an Association of Biomolecular Resource Facilities (ABRF) Proteomics Research Group (PRG) study based on samples composed of a complex protein mixture into which 12 known proteins were added at varying but defined ratios. All of the proteins were present at the same concentration in each of three tubes that were provided. The primary goal of this study was to allow each laboratory to evaluate its capabilities and approaches with regard to: detection and identification of proteins spiked into samples that also contain complex mixtures of background proteins and determination of relative quantities of the spiked proteins. The results returned by 43 participants were compiled by the PRG, which also collected information about the strategies used to assess overall performance and as an aid to development of optimized protocols for the methodologies used. The most accurate results were generally reported by the most experienced laboratories. Among laboratories that used the same technique, values that were closer to the expected ratio were obtained by more experienced groups. PMID:21455478

  15. THE ABRF-MARG MICROARRAY SURVEY 2004: TAKING THE PULSE OF THE MICROARRAY FIELD

    EPA Science Inventory

    Over the past several years, the field of microarrays has grown and evolved drastically. In its continued efforts to track this evolution, the ABRF-MARG has once again conducted a survey of international microarray facilities and individual microarray users. The goal of the surve...

  17. Interlaboratory Study on Differential Analysis of Protein Glycosylation by Mass Spectrometry: The ABRF Glycoprotein Research Multi-Institutional Study 2012*

    PubMed Central

    Leymarie, Nancy; Griffin, Paula J.; Jonscher, Karen; Kolarich, Daniel; Orlando, Ron; McComb, Mark; Zaia, Joseph; Aguilan, Jennifer; Alley, William R.; Altmann, Friederich; Ball, Lauren E.; Basumallick, Lipika; Bazemore-Walker, Carthene R.; Behnken, Henning; Blank, Michael A.; Brown, Kristy J.; Bunz, Svenja-Catharina; Cairo, Christopher W.; Cipollo, John F.; Daneshfar, Rambod; Desaire, Heather; Drake, Richard R.; Go, Eden P.; Goldman, Radoslav; Gruber, Clemens; Halim, Adnan; Hathout, Yetrib; Hensbergen, Paul J.; Horn, David M.; Hurum, Deanna; Jabs, Wolfgang; Larson, Göran; Ly, Mellisa; Mann, Benjamin F.; Marx, Kristina; Mechref, Yehia; Meyer, Bernd; Möginger, Uwe; Neusüß, Christian; Nilsson, Jonas; Novotny, Milos V.; Nyalwidhe, Julius O.; Packer, Nicolle H.; Pompach, Petr; Reiz, Bela; Resemann, Anja; Rohrer, Jeffrey S.; Ruthenbeck, Alexandra; Sanda, Miloslav; Schulz, Jan Mirco; Schweiger-Hufnagel, Ulrike; Sihlbom, Carina; Song, Ehwang; Staples, Gregory O.; Suckau, Detlev; Tang, Haixu; Thaysen-Andersen, Morten; Viner, Rosa I.; An, Yanming; Valmu, Leena; Wada, Yoshinao; Watson, Megan; Windwarder, Markus; Whittal, Randy; Wuhrer, Manfred; Zhu, Yiying; Zou, Chunxia

    2013-01-01

    One of the principal goals of glycoprotein research is to correlate glycan structure and function. Such correlation is necessary in order for one to understand the mechanisms whereby glycoprotein structure elaborates the functions of myriad proteins. The accurate comparison of glycoforms and quantification of glycosites are essential steps in this direction. Mass spectrometry has emerged as a powerful analytical technique in the field of glycoprotein characterization. Its sensitivity, high dynamic range, and mass accuracy provide both quantitative and sequence/structural information. As part of the 2012 ABRF Glycoprotein Research Group study, we explored the use of mass spectrometry and ancillary methodologies to characterize the glycoforms of two sources of human prostate specific antigen (PSA). PSA is used as a tumor marker for prostate cancer, with increasing blood levels used to distinguish between normal and cancer states. The glycans on PSA are believed to be biantennary N-linked, and it has been observed that prostate cancer tissues and cell lines contain more antennae than their benign counterparts. Thus, the ability to quantify differences in glycosylation associated with cancer has the potential to positively impact the use of PSA as a biomarker. We studied standard peptide-based proteomics/glycomics methodologies, including LC-MS/MS for peptide/glycopeptide sequencing and label-free approaches for differential quantification. We performed an interlaboratory study to determine the ability of different laboratories to correctly characterize the differences between glycoforms from two different sources using mass spectrometry methods. We used clustering analysis and ancillary statistical data treatment on the data sets submitted by participating laboratories to obtain a consensus of the glycoforms and abundances. The results demonstrate the relative strengths and weaknesses of top-down glycoproteomics, bottom-up glycoproteomics, and glycomics methods. 

  18. Interlaboratory study on differential analysis of protein glycosylation by mass spectrometry: the ABRF glycoprotein research multi-institutional study 2012.

    PubMed

    Leymarie, Nancy; Griffin, Paula J; Jonscher, Karen; Kolarich, Daniel; Orlando, Ron; McComb, Mark; Zaia, Joseph; Aguilan, Jennifer; Alley, William R; Altmann, Friederich; Ball, Lauren E; Basumallick, Lipika; Bazemore-Walker, Carthene R; Behnken, Henning; Blank, Michael A; Brown, Kristy J; Bunz, Svenja-Catharina; Cairo, Christopher W; Cipollo, John F; Daneshfar, Rambod; Desaire, Heather; Drake, Richard R; Go, Eden P; Goldman, Radoslav; Gruber, Clemens; Halim, Adnan; Hathout, Yetrib; Hensbergen, Paul J; Horn, David M; Hurum, Deanna; Jabs, Wolfgang; Larson, Göran; Ly, Mellisa; Mann, Benjamin F; Marx, Kristina; Mechref, Yehia; Meyer, Bernd; Möginger, Uwe; Neusüß, Christian; Nilsson, Jonas; Novotny, Milos V; Nyalwidhe, Julius O; Packer, Nicolle H; Pompach, Petr; Reiz, Bela; Resemann, Anja; Rohrer, Jeffrey S; Ruthenbeck, Alexandra; Sanda, Miloslav; Schulz, Jan Mirco; Schweiger-Hufnagel, Ulrike; Sihlbom, Carina; Song, Ehwang; Staples, Gregory O; Suckau, Detlev; Tang, Haixu; Thaysen-Andersen, Morten; Viner, Rosa I; An, Yanming; Valmu, Leena; Wada, Yoshinao; Watson, Megan; Windwarder, Markus; Whittal, Randy; Wuhrer, Manfred; Zhu, Yiying; Zou, Chunxia

    2013-10-01

    One of the principal goals of glycoprotein research is to correlate glycan structure and function. Such correlation is necessary in order for one to understand the mechanisms whereby glycoprotein structure elaborates the functions of myriad proteins. The accurate comparison of glycoforms and quantification of glycosites are essential steps in this direction. Mass spectrometry has emerged as a powerful analytical technique in the field of glycoprotein characterization. Its sensitivity, high dynamic range, and mass accuracy provide both quantitative and sequence/structural information. As part of the 2012 ABRF Glycoprotein Research Group study, we explored the use of mass spectrometry and ancillary methodologies to characterize the glycoforms of two sources of human prostate specific antigen (PSA). PSA is used as a tumor marker for prostate cancer, with increasing blood levels used to distinguish between normal and cancer states. The glycans on PSA are believed to be biantennary N-linked, and it has been observed that prostate cancer tissues and cell lines contain more antennae than their benign counterparts. Thus, the ability to quantify differences in glycosylation associated with cancer has the potential to positively impact the use of PSA as a biomarker. We studied standard peptide-based proteomics/glycomics methodologies, including LC-MS/MS for peptide/glycopeptide sequencing and label-free approaches for differential quantification. We performed an interlaboratory study to determine the ability of different laboratories to correctly characterize the differences between glycoforms from two different sources using mass spectrometry methods. We used clustering analysis and ancillary statistical data treatment on the data sets submitted by participating laboratories to obtain a consensus of the glycoforms and abundances. The results demonstrate the relative strengths and weaknesses of top-down glycoproteomics, bottom-up glycoproteomics, and glycomics methods.

  19. Identification of novel periviscerokinins from single neurohaemal release sites in insects. MS/MS fragmentation complemented by Edman degradation.

    PubMed

    Predel, R; Kellner, R; Baggerman, G; Steinmetzer, T; Schoofs, L

    2000-06-01

    Three novel members of the periviscerokinin family could be identified directly from extracts of single abdominal perisympathetic organs of blaberoid cockroaches by means of electrospray ionization-quadrupole time of flight (ESI-QTOF) MS. Sequences of these periviscerokinins were confirmed by Edman degradation. Their primary structures are GSSGLIPFGRT-NH2 (Lem-PVK-1), GSSGLISMPRV-NH2 (Lem-PVK-2), and GSSGMIPFPRV-NH2 (Lem-PVK-3). Hitherto only known from the American cockroach, this neuropeptide family contains a highly conserved N-terminus whereas, at the C-terminus, only the penultimate amino-acid residue (Arg) has been found in all members of this peptide family. The identified periviscerokinins are the only abundant myoactive peptides in abdominal perisympathetic organs of blaberoid cockroaches and they appear to be absent in the retrocerebral complex. Screening of extracts of single abdominal perisympathetic organs (70-90 µm in diameter), from five different species of the suborder Blaberoidea, revealed that they all contain the three neuropeptides which are described here for the first time.

  20. ABRF Proteome Informatics Research Group (iPRG) 2015 Study: Detection of Differentially Abundant Proteins in Label-Free Quantitative LC-MS/MS Experiments.

    PubMed

    Choi, Meena; Eren-Dogu, Zeynep F; Colangelo, Christopher; Cottrell, John; Hoopmann, Michael R; Kapp, Eugene A; Kim, Sangtae; Lam, Henry; Neubert, Thomas A; Palmblad, Magnus; Phinney, Brett S; Weintraub, Susan T; MacLean, Brendan; Vitek, Olga

    2017-02-03

    Detection of differentially abundant proteins in label-free quantitative shotgun liquid chromatography-tandem mass spectrometry (LC-MS/MS) experiments requires a series of computational steps that identify and quantify LC-MS features. It also requires statistical analyses that distinguish systematic changes in abundance between conditions from artifacts of biological and technical variation. The 2015 study of the Proteome Informatics Research Group (iPRG) of the Association of Biomolecular Resource Facilities (ABRF) aimed to evaluate the effects of the statistical analysis on the accuracy of the results. The study used LC-tandem mass spectra acquired from a controlled mixture, and made the data available to anonymous volunteer participants. The participants used methods of their choice to detect differentially abundant proteins, estimate the associated fold changes, and characterize the uncertainty of the results. The study found that multiple strategies (including the use of spectral counts versus peak intensities, and various software tools) could lead to accurate results, and that the performance was primarily determined by the analysts' expertise. This manuscript summarizes the outcome of the study, and provides representative examples of good computational and statistical practice. The data set generated as part of this study is publicly available.

  21. The 2012/2013 ABRF Proteomic Research Group Study: Assessing Longitudinal Intralaboratory Variability in Routine Peptide Liquid Chromatography Tandem Mass Spectrometry Analyses*

    PubMed Central

    Bennett, Keiryn L.; Wang, Xia; Bystrom, Cory E.; Chambers, Matthew C.; Andacht, Tracy M.; Dangott, Larry J.; Elortza, Félix; Leszyk, John; Molina, Henrik; Moritz, Robert L.; Phinney, Brett S.; Thompson, J. Will; Bunger, Maureen K.; Tabb, David L.

    2015-01-01

    Questions concerning longitudinal data quality and reproducibility of proteomic laboratories spurred the Protein Research Group of the Association of Biomolecular Resource Facilities (ABRF-PRG) to design a study to systematically assess the reproducibility of proteomic laboratories over an extended period of time. Developed as an open study, initially 64 participants were recruited from the broader mass spectrometry community to analyze provided aliquots of a six bovine protein tryptic digest mixture every month for a period of nine months. Data were uploaded to a central repository, and the operators answered an accompanying survey. Ultimately, 45 laboratories submitted a minimum of eight LC-MSMS raw data files collected in data-dependent acquisition (DDA) mode. No standard operating procedures were enforced; rather the participants were encouraged to analyze the samples according to usual practices in the laboratory. Unlike previous studies, this investigation was not designed to compare laboratories or instrument configuration, but rather to assess the temporal intralaboratory reproducibility. The outcome of the study was reassuring with 80% of the participating laboratories performing analyses at a medium to high level of reproducibility and quality over the 9-month period. For the groups that had one or more outlying experiments, the major contributing factor that correlated to the survey data was the performance of preventative maintenance prior to the LC-MSMS analyses. Thus, the Protein Research Group of the Association of Biomolecular Resource Facilities recommends that laboratories closely scrutinize the quality control data following such events. Additionally, improved quality control recording is imperative. This longitudinal study provides evidence that mass spectrometry-based proteomics is reproducible. When quality control measures are strictly adhered to, such reproducibility is comparable among many disparate groups. Data from the study are

  2. Unraveling the sequence and structure of the protein osteocalcin from a 42 ka fossil horse

    NASA Astrophysics Data System (ADS)

    Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Andrews, Philip C.; Leykam, Joseph; Stafford, Thomas W.; Kelly, Robert L.; Walker, Danny N.; Buckley, Mike; Humpula, James

    2006-04-01

    We report the first complete amino acid sequence and evidence of secondary structure for osteocalcin from a temperate fossil. The osteocalcin derives from a 42 ka equid bone excavated from Juniper Cave, Wyoming. Results were determined by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-MS) and Edman sequencing with independent confirmation of the sequence in two laboratories. The ancient sequence was compared to that of three modern taxa: horse (Equus caballus), zebra (Equus grevyi), and donkey (Equus asinus). Although there was no difference in sequence among modern taxa, MALDI-MS and Edman sequencing show that residues 48 and 49 of our modern horse are Thr, Ala rather than Pro, Val as previously reported (Carstanjen B., Wattiez, R., Armory, H., Lepage, O.M., Remy, B., 2002. Isolation and characterization of equine osteocalcin. Ann. Med. Vet. 146(1), 31-38). MALDI-MS and Edman sequencing data indicate that the osteocalcin sequence of the 42 ka fossil is similar to that of modern horse. Previously inaccessible structural attributes for ancient osteocalcin were observed. Glu 39 rather than Gln 39 is consistent with deamidation, a process known to occur during fossilization and aging. Two post-translational modifications were documented: Hyp 9 and a disulfide bridge. The latter suggests at least partial retention of secondary structure. As has been done for ancient DNA research, we recommend standards for preparation and criteria for authenticating results of ancient protein sequencing.

  3. Protein Sequencing with Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Ziady, Assem G.; Kinter, Michael

    The recent introduction of electrospray ionization techniques that are suitable for peptides and whole proteins has allowed for the design of mass spectrometric protocols that provide accurate sequence information for proteins. The advantages gained by these approaches over traditional Edman Degradation sequencing include faster analysis and femtomole, sometimes attomole, sensitivity. The ability to efficiently identify proteins has allowed investigators to conduct studies on their differential expression or modification in response to various treatments or disease states. In this chapter, we discuss the use of electrospray tandem mass spectrometry, a technique whereby protein-derived peptides are subjected to fragmentation in the gas phase, revealing sequence information for the protein. This powerful technique has been instrumental for the study of proteins and markers associated with various disorders, including heart disease, cancer, and cystic fibrosis. We use the study of protein expression in cystic fibrosis as an example.
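
    The gas-phase fragmentation described above is usually read as ladders of b ions (N-terminal fragments) and y ions (C-terminal fragments). The following is a minimal sketch, not the chapter's own protocol: it assumes singly charged fragments, unmodified residues, and standard monoisotopic residue masses. The function name `fragment_ions` and the example peptide are hypothetical and chosen only for illustration.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O (Da)
PROTON = 1.00728   # mass of a proton (Da)


def fragment_ions(peptide):
    """Return (b_ions, y_ions): m/z lists for singly charged b and y fragments.

    b_ions[i] is b(i+1) and y_ions[i] is y(i+1); the full-length fragment is omitted.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide.upper()]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # accumulate from the N terminus
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # accumulate from the C terminus
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")  # hypothetical example peptide
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}   y{i} = {yi:.4f}")
```

    Mass differences between consecutive ions in either ladder correspond to residue masses, which is how a sequence is read from the spectrum. Leucine and isoleucine share the same mass and cannot be distinguished this way.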

  4. Primary structure of a histidine-rich proteolytic fragment of human ceruloplasmin. I. Amino acid sequence of the cyanogen bromide peptides.

    PubMed

    Kingston, I B; Kingston, B L; Putnam, F W

    1980-04-10

    A histidine-rich fragment, Cp F5, with a molecular weight of 18,650 was isolated from human ceruloplasmin. It consists of 159 amino acids and contains a possible copper-binding site. The sequence of the first 18 NH2-terminal residues of Cp F5 was determined by automated Edman degradation. Cp F5 was cleaved by cyanogen bromide to produce nine fragments of from 2 to 63 residues. The amino acid sequence of all of the cyanogen bromide fragments was investigated using automated and manual Edman degradation, the fragments being digested with trypsin, chymotrypsin, thermolysin, staphylococcal protease, and pepsin as appropriate. The results, in conjunction with the data on the tryptic peptides reported in the accompanying paper (Kingston, I.B., Kingston, B.L., and Putnam, F.L. (1980) J. Biol. Chem. 255, 2886-2896), establish the complete amino acid sequence of Cp F5.

  5. De novo proteomic sequencing of a monoclonal antibody raised against OX40 ligand.

    PubMed

    Pham, Victoria; Henzel, William J; Arnott, David; Hymowitz, Sarah; Sandoval, Wendy N; Truong, Bao-Tran; Lowman, Henry; Lill, Jennie R

    2006-05-01

    De novo sequencing of a full-length monoclonal antibody raised against OX40 ligand is described. Using a combination of overlapping complementary proteolytic and chemical digestions, with analysis by mass spectrometry and Edman degradation, both the heavy and light chains were fully sequenced. Particular attention was paid to those modifications that could be susceptible to degradation in the complementarity determining region and Fc region. An overview of the protocol is described, and suggestions for improvements to aid in such sequencing projects in the future are discussed.

  6. Complete amino acid sequence and structure characterization of the taste-modifying protein, miraculin.

    PubMed

    Theerasilp, S; Hitotsuya, H; Nakajo, S; Nakaya, K; Nakamura, Y; Kurihara, Y

    1989-04-25

    The taste-modifying protein, miraculin, has the unusual property of modifying sour taste into sweet taste. The complete amino acid sequence of miraculin purified from miracle fruits by a newly developed method (Theerasilp, S., and Kurihara, Y. (1988) J. Biol. Chem. 263, 11536-11539) was determined by an automatic Edman degradation method. Miraculin was a single polypeptide with 191 amino acid residues. The calculated molecular weight based on the amino acid sequence and the carbohydrate content (13.9%) was 24,600. Asn-42 and Asn-186 were linked N-glycosidically to carbohydrate chains. High homology was found between the amino acid sequences of miraculin and soybean trypsin inhibitor.

  7. Dipeptide Sequence Determination: Analyzing Phenylthiohydantoin Amino Acids by HPLC

    NASA Astrophysics Data System (ADS)

    Barton, Janice S.; Tang, Chung-Fei; Reed, Steven S.

    2000-02-01

    Amino acid composition and sequence determination, important techniques for characterizing peptides and proteins, are essential for predicting conformation and studying sequence alignment. This experiment presents improved, fundamental methods of sequence analysis for an upper-division biochemistry laboratory. Working in pairs, students use the Edman reagent to prepare phenylthiohydantoin derivatives of amino acids for determination of the sequence of an unknown dipeptide. With a single HPLC technique, students identify both the N-terminal amino acid and the composition of the dipeptide. This method yields good precision of retention times and allows use of a broad range of amino acids as components of the dipeptide. Students learn fundamental principles and techniques of sequence analysis and HPLC.

  8. Amino acid sequence of versutoxin, a lethal neurotoxin from the venom of the funnel-web spider Atrax versutus.

    PubMed

    Brown, M R; Sheumack, D D; Tyler, M I; Howden, M E

    1988-03-01

    The complete amino acid sequence of versutoxin, a lethal neurotoxic polypeptide isolated from the venom of male and female funnel-web spiders of the species Atrax versutus, was determined. Sequencing was performed in a gas-phase protein sequencer by automated Edman degradation of the S-carboxymethylated toxin and fragments of it produced by reaction with CNBr. Versutoxin consisted of a single chain of 42 amino acid residues. It was found to have a high proportion of basic residues and of cystine. The primary structure showed marked homology with that of robustoxin, a novel neurotoxin recently isolated from the venom of another funnel-web-spider species, Atrax robustus.

  9. Amino acid sequence of versutoxin, a lethal neurotoxin from the venom of the funnel-web spider Atrax versutus.

    PubMed Central

    Brown, M R; Sheumack, D D; Tyler, M I; Howden, M E

    1988-01-01

    The complete amino acid sequence of versutoxin, a lethal neurotoxic polypeptide isolated from the venom of male and female funnel-web spiders of the species Atrax versutus, was determined. Sequencing was performed in a gas-phase protein sequencer by automated Edman degradation of the S-carboxymethylated toxin and fragments of it produced by reaction with CNBr. Versutoxin consisted of a single chain of 42 amino acid residues. It was found to have a high proportion of basic residues and of cystine. The primary structure showed marked homology with that of robustoxin, a novel neurotoxin recently isolated from the venom of another funnel-web-spider species, Atrax robustus. PMID:3355530

  10. The application of 0.1 M quadrol to the microsequence of proteins and the sequence of tryptic peptides.

    PubMed

    Brauer, A W; Margolies, M N; Haber, E

    1975-07-01

    In an effort to extend automated Edman degradation to nanomole quantities of protein, the method of sequenator analysis described by Edman and Begg (Edman, P., and Begg, G. (1967), Eur. J. Biochem. 1, 80) was modified to permit long degradations in the absence of carrier proteins. By using an aqueous 0.1 M Quadrol program with limited, combined benzene-ethyl acetate solvent extractions, as well as a change in the delivery system for heptafluorobutyric acid, it was possible to recover and identify the first 30 amino acid residues from a sequenator run on 7 nmol of myoglobin. For 3 nmol of myoglobin, 20 steps could be identified. PTH-amino acids were identified by gas-liquid chromatography and thin-layer chromatography on polyamide sheets. Without using a carrier protein in the cup to prevent mechanical losses (Niall, H. D., Jacobs, J. W., Van Rietshoten, J., and Tregear, G. W. (1974), FEBS Lett. 41, 62), the repetitive yield using this program was 93-96%. The same program has been applied successfully to peptides of 14 or more residues with or without modification by Braunitzer's reagent and to a number of larger peptides and proteins including a 216 residue segment of rabbit antibody heavy chain in which a sequence of 35 steps was accomplished on 25 nmol.
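
    The repetitive yield quoted above compounds from cycle to cycle, which is why a few percentage points matter for long runs. The sketch below assumes a simple geometric model (signal at cycle n equals the initial signal times the yield raised to n-1); this is a common simplification, not the analysis used by the authors, and the function names are hypothetical.

```python
def expected_signal(initial_amount, repetitive_yield, cycle):
    """Expected signal at a 1-based cycle number under a geometric decay model."""
    if cycle < 1:
        raise ValueError("cycle must be >= 1")
    return initial_amount * repetitive_yield ** (cycle - 1)


def repetitive_yield_from_signals(signal_a, cycle_a, signal_b, cycle_b):
    """Estimate the per-cycle yield from the signals observed at two cycles."""
    if cycle_b <= cycle_a:
        raise ValueError("cycle_b must be greater than cycle_a")
    return (signal_b / signal_a) ** (1.0 / (cycle_b - cycle_a))


if __name__ == "__main__":
    # Compare the two ends of the 93-96% range at cycle 30.
    for y in (0.93, 0.96):
        fraction = expected_signal(1.0, y, 30)
        print(f"repetitive yield {y:.2f}: fraction of initial signal at cycle 30 = {fraction:.3f}")
```

    Under this model the fraction of signal remaining at a late cycle differs several-fold between the low and high ends of the reported range, which is one reason long degradations depend so strongly on the per-cycle yield.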

  11. Characterization of a benzyladenine binding-site peptide isolated from a wheat cytokinin-binding protein: Sequence analysis and identification of a single affinity-labeled histidine residue by mass spectrometry

    SciTech Connect

    Brinegar, A.C.; Cooper, G.; Stevens, A.; Hauer, C.R.; Shabanowitz, J.; Hunt, D.F.; Fox, J.E.

    1988-08-01

    A wheat embryo cytokinin-binding protein was covalently modified with the radiolabeled photoaffinity ligand 2-azido-N6-[14C]benzyladenine. A single labeled peptide was obtained after proteolytic digestion and isolation by reversed-phase and anion-exchange HPLC. Sequencing by classical Edman degradation identified 11 of the 12 residues but failed to identify the labeled amino acid. Analysis by laser photodissociation Fourier-transform mass spectrometry of 10 pmol of the peptide independently confirmed the Edman data and also demonstrated that the histidine residue nearest the C terminus (marked with an asterisk) was modified by the reagent in the sequence Ala-Phe-Leu-Gln-Pro-Ser-His-His*-Asp-Ala-Asp-Glu.

  12. Amino acid sequence of toxin III from Anemonia sulcata.

    PubMed

    Bĕress, L; Wunderer, G; Wachter, E

    1977-08-01

    Toxin III, the smallest toxin component of the poison of the sea anemone Anemonia sulcata, is a polypeptide with 27 amino acids. Its structure is stabilized by three disulfide bridges. The amino acid sequence was determined by solid-phase Edman degradation of the aminoethylated derivative. The peptide was coupled to the carrier, porous glass, by thiourea bridges between the alpha-amino group of arginine-1 and the epsilon-amino group of lysine-26 and the isothiocyanate groups of the carrier. Another fraction of the polypeptide was bound by an acid-amide condensation of the C-terminal valine-27 with the aminopropyl group of the carrier. The sequence of toxin III has no regions homologous to the 47-residue toxin II. Comparison with the known partial sequence of toxin I, which contains 46 amino acids (Wunderer, G. & Eulitz, M., in preparation) also fails to reveal homologies.

  13. Amino-acid sequence of toxin I from Anemonia sulcata.

    PubMed

    Wunderer, G; Eulitz, M

    1978-08-15

    Toxin I from Anemonia sulcata, a major component of the sea anemone venom, consists of 46 amino acid residues which are linked by three disulfide bridges. The [14C]carboxymethylated polypeptide was sequenced to position 29 by automated Edman degradation. The remaining sequence was determined from cyanogen bromide peptides and from tryptic peptides of the citraconylated [14C]carboxymethylated toxin. Toxin I is homologous to toxin II from Anemonia sulcata and to anthopleurin A, a toxin from the sea anemone Anthopleura xanthogrammica. These toxins constitute a new class of polypeptide toxins. No significant homologies exist with toxin III from Anemonia sulcata nor with known sequences of neurotoxins or cardiotoxins of various origin.

  14. The complete amino acid sequence of ubiquitin, an adenylate cyclase stimulating polypeptide probably universal in living cells.

    PubMed

    Schlesinger, D H; Goldstein, G; Niall, H D

    1975-05-20

    The complete amino acid sequence was determined for bovine ubiquitin, an adenylate cyclase stimulating polypeptide, which is probably represented universally in living cells. Ubiquitin has a molecular weight of 8451 and consists of a single polypeptide chain containing 74 amino acid residues. It contains four arginine residues but no cysteine or tryptophan residues. The first 61 amino acid residues were obtained by automated Edman degradations. Tryptic digestion of maleated ubiquitin yielded four peptide fragments that were resolved by molecular sieve chromatography and coded in order of decreasing chain length (MT-1, MT-2, MT-3, and MT-4). The automated sequenator determinations on native ubiquitin provided overlapping sequence data for three of these fragments that gave an order of MT-1, MT-3, and then MT-2; peptide MT-4, a dipeptide, was therefore assigned to the C terminus, and the placement of peptide MT-2 was corroborated by analysis of data from carboxypeptidase digestions of maleated ubiquitin. Peptide MT-2 was demaleated and sequenced by manual Edman degradations through a single lysine residue. It was cleaved at this residue with trypsin, and the two resultant peptides were separated by ion-exchange chromatography. Manual sequencing of the C-terminal demaleated tryptic peptide of MT-2 completed the sequence of MT-2 and that of native ubiquitin. The sequence of ubiquitin was further confirmed and supported by amino acid and partial sequence analysis of fragments obtained by digestion of maleated ubiquitin with chymotrypsin or staphylococcal protease.

  15. Purification and partial sequence analysis of human T-cell growth factor.

    PubMed Central

    Robb, R J; Kutny, R M; Chowdhry, V

    1983-01-01

    A murine monoclonal antibody directed against human T-cell growth factor (TCGF) from the JURKAT cell line was used for affinity column purification of the factor. Bound TCGF was eluted nearly quantitatively at low pH, and the recovered factor appeared homogeneous by two-dimensional gel electrophoresis. The molecule is markedly hydrophobic, with a high content of leucine. A single NH2-terminal sequence of 36 residues was obtained by automated Edman degradation, further supporting the homogeneity of the material. Thus, significant quantities of purified TCGF have been prepared in a single step, making possible detailed analysis of its molecular structure and biological role. PMID:6604277

  16. Amino acid sequence of neurotoxin III of the scorpion Androctonus australis Hector.

    PubMed

    Kopeyan, C; Martinez, G; Rochat, H

    1979-03-01

    The amino acid sequence of neurotoxin III, purified from the venom of the North African scorpion Androctonus australis Hector, has been determined by Edman degradation using a liquid-phase sequencer. Carboxypeptidase A hydrolyses confirmed not only the sequence of the last five residues but also the presence of a free alpha-carboxylic group at the C-terminus. Edman degradation was conducted on the one hand with the Quadrol [N,N,N',N'-tetrakis(2-hydroxypropyl)ethylene diamine] program and S-alkylated protein before or after coupling with sulfophenylisothiocyanate (the first 34 residues were thus identified), and on the other hand on tryptic and chymotryptic peptides with a dimethylbenzylamine program (residues 1-23 and 31-34 were confirmed, and the positions of residues 35-64 were established). Neurotoxin III was found to belong to the same group of scorpion toxins active on mammals as neurotoxin I purified from the same venom (50 homologous positions exist in the two proteins).

  17. Amino acid sequence of mouse submaxillary gland renin.

    PubMed Central

    Misono, K S; Chang, J J; Inagami, T

    1982-01-01

    The complete amino acid sequences of the heavy chain and light chain of mouse submaxillary gland renin have been determined. The heavy chain consists of 288 amino acid residues having a Mr of 31,036 calculated from the sequence. The light chain contains 48 amino acid residues with a Mr of 5,458. The sequence of the heavy chain was determined by automated Edman degradations of the cyanogen bromide peptides and tryptic peptides generated after citraconylation, as well as other peptides generated therefrom. The sequence of the light chain was derived from sequence analyses of the peptides generated by cyanogen bromide cleavage or by digestion with Staphylococcus aureus protease. The sequences in the active site regions in renin containing two catalytically essential aspartyl residues 32 and 215 were found identical with those in pepsin, chymosin, and penicillopepsin. Comparison of the amino acid sequence of renin with that of porcine pepsin indicated a 42% sequence identity of the heavy chain with the amino-terminal and middle regions and a 46% identity of the light chain with the carboxyl-terminal region of the porcine pepsin sequence. Residues identical in renin and pepsin are distributed throughout the length of the molecules, suggesting a similarity in their overall structures. PMID:6812055
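
    Percent identity figures such as those quoted above are computed from an alignment of the two sequences. The sketch below is an illustration under stated assumptions, not the method used in the paper: it takes two sequences that are already aligned (equal length, gaps written as "-") and counts identical residues over the columns where neither sequence has a gap. Other conventions for the denominator exist and give different percentages. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue                 # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy aligned fragments, not real renin or pepsin sequences.
    print(f"{percent_identity('ACDE-GHIK', 'ACDFWGH-K'):.1f}% identity")
```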

  18. Further characterization and amino acid sequence of m-type thioredoxins from spinach chloroplasts.

    PubMed

    Maeda, K; Tsugita, A; Dalzoppo, D; Vilbois, F; Schürmann, P

    1986-01-02

    The complete primary structure of m-type thioredoxin from spinach chloroplasts has been sequenced by conventional sequencing including fragmentation, Edman degradation and carboxypeptidase digestion. As already reported [Tsugita, A., Maeda, K. & Schürmann, P. (1983) Biochem. Biophys. Res. Commun. 115, 1-7] these thioredoxins contain the same active-site sequence as thioredoxins from other sources. Based on the amino acid sequence thioredoxin mc contains 103 residues, has a relative molecular mass of 11425 and a molar absorption coefficient at 280 nm of 19 300 M⁻¹ cm⁻¹. The spinach thioredoxin mc has an overall homology of 44% with the thioredoxin from Escherichia coli mainly due to differences in the N-terminal and C-terminal regions.

  19. Amino acid sequence of bovine gamma E (IVa) lens crystallin.

    PubMed Central

    Kilby, G. W.; Sheil, M. M.; Shaw, D.; Harding, J. J.; Truscott, R. J.

    1997-01-01

    When electrospray ionization mass spectrometry (ES-MS) was used to analyze purified bovine gamma E (gamma IVa)-crystallin, it yielded a relative molecular mass (M(r)) of 20,955 +/- 5. This mass is significantly different from that calculated from the published sequence (M(r) 20,894) (White HE et al., 1989, J Mol Biol 207:217-235). Further, ES-MS analysis of the protein after it had been reduced and carboxymethylated indicated the presence of five cysteine residues, whereas the published sequence contains six (Kilby GW et al., 1995, Eur Mass Spectrom 1:203-208). The entire protein sequence of gamma E crystallin has therefore been studied via a combination of ES-MS, ES-MS/MS, and Edman amino acid sequencing. The corrected sequence gives an M(r) of 20,955.3, which matches that obtained by ES-MS analysis of the purified native protein. The corrected sequence is also in agreement with a recent cDNA sequence obtained for a bovine gamma-crystallin by R. Hay (pers. comm.). PMID:9098901

  20. Amino acid sequence of bovine gamma E (IVa) lens crystallin.

    PubMed

    Kilby, G W; Sheil, M M; Shaw, D; Harding, J J; Truscott, R J

    1997-04-01

    When electrospray ionization mass spectrometry (ES-MS) was used to analyze purified bovine gamma E (gamma IVa)-crystallin, it yielded a relative molecular mass (M(r)) of 20,955 +/- 5. This mass is significantly different from that calculated from the published sequence (M(r) 20,894) (White HE et al., 1989, J Mol Biol 207:217-235). Further, ES-MS analysis of the protein after it had been reduced and carboxymethylated indicated the presence of five cysteine residues, whereas the published sequence contains six (Kilby GW et al., 1995, Eur Mass Spectrom 1:203-208). The entire protein sequence of gamma E crystallin has therefore been studied via a combination of ES-MS, ES-MS/MS, and Edman amino acid sequencing. The corrected sequence gives an M(r) of 20,955.3, which matches that obtained by ES-MS analysis of the purified native protein. The corrected sequence is also in agreement with a recent cDNA sequence obtained for a bovine gamma-crystallin by R. Hay (pers. comm.).

  1. Amino acid sequence of myoglobin from white-tailed deer (Odocoileus virginianus).

    PubMed

    Joseph, Poulson; Suman, Surendranath P; Li, Shuting; Fontaine, Michele; Steinke, Laurey

    2012-10-01

    Our objective was to determine the primary structure of white-tailed deer myoglobin (Mb). White-tailed deer Mb was isolated from cardiac muscles employing ammonium sulfate precipitation and gel-filtration chromatography. The amino acid sequence was determined by Edman degradation. Sequence analyses of intact Mb as well as tryptic- and cyanogen bromide-peptides yielded the complete primary structure of white-tailed deer Mb, which shared 100% similarity with red deer Mb. White-tailed deer Mb consists of 153 amino acid residues and shares more than 96% sequence similarity with myoglobins from meat-producing ruminants, such as cattle, buffalo, sheep, and goat. Similar to sheep and goat myoglobins, white-tailed deer Mb contains 12 histidine residues. Proximal (position 93) and distal (position 64) histidine residues responsible for maintaining the stability of heme are conserved in white-tailed deer Mb.

  2. Isolation and amino-acid sequence determination of monkey insulin and proinsulin.

    PubMed

    Naithani, V K; Steffens, G J; Tager, H S; Buse, G; Rubenstein, A H; Steiner, D F

    1984-05-01

    Insulin has been isolated and purified from rhesus monkey pancreas by means of acid-ethanol extraction, gel filtration and ion exchange chromatography. The complete amino-acid sequence of the hormone has been determined by amino-acid analysis of the oxidized A- and B-chains, by end group determination, by the identification of the C-terminal residues (AsnA21 and ThrB30) by carboxypeptidase A digestion and by Edman degradation of the S-carboxymethylated A- and B-chains. The 51-residue monkey insulin was shown to be identical to human insulin. From the known insulin and C-peptide sequence the primary sequence of monkey proinsulin has been proposed.

  3. Active site amino acid sequence of human factor D.

    PubMed

    Davis, A E

    1980-08-01

    Factor D was isolated from human plasma by chromatography on CM-Sephadex C50, Sephadex G-75, and hydroxylapatite. Digestion of reduced, S-carboxymethylated factor D with cyanogen bromide resulted in three peptides which were isolated by chromatography on Sephadex G-75 (superfine) equilibrated in 20% formic acid. NH2-Terminal sequences were determined by automated Edman degradation with a Beckman 890C sequencer using a 0.1 M Quadrol program. The smallest peptide (CNBr III) consisted of the NH2-terminal 14 amino acids. The other two peptides had molecular weights of 17,000 (CNBr I) and 7000 (CNBr II). Overlap of the NH2-terminal sequence of factor D with the NH2-terminal sequence of CNBr I established the order of the peptides. The NH2-terminal 53 residues of factor D are somewhat more homologous with the group-specific protease of rat intestine than with other serine proteases. The NH2-terminal sequence of CNBr II revealed the active site serine of factor D. The typical serine protease active site sequence (Gly-Asp-Ser-Gly-Gly-Pro) was found at residues 12-17. The region surrounding the active site serine does not appear to be more highly homologous with any one of the other serine proteases. The structural data obtained point out the similarities between factor D and the other proteases. However, complete definition of the degree of relationship between factor D and other proteases will require determination of the remainder of the primary structure.

  4. Purification, amino acid sequence and immunological characterization of Ole e 6, a cysteine-enriched allergen from olive tree pollen.

    PubMed

    Batanero, E; Ledesma, A; Villalba, M; Rodríguez, R

    1997-06-30

    The Ole e 6 allergen from olive tree pollen has been isolated by combining gel permeation and reverse-phase chromatographies. It is a single and highly acidic (pI 4.2) polypeptide chain protein. Its NH2-terminal amino acid sequence has been determined by Edman degradation. Total RNA from the olive tree pollen was isolated, and a specific cDNA was amplified by the polymerase chain reaction using a degenerate oligonucleotide primer designed according to the NH2-terminal sequence of the protein. The nucleotide sequencing of the cDNA rendered an open reading frame encoding a 50 amino acid polypeptide chain, in which two sets of the sequential motif Cys-X3-Cys-X3-Cys are present. No sequence similarity has been found between this protein and other previously described polypeptides.

  5. Amino acid sequence of fibrolase, a direct-acting fibrinolytic enzyme from Agkistrodon contortrix contortrix venom.

    PubMed Central

    Randolph, A.; Chamberlain, S. H.; Chu, H. L.; Retzios, A. D.; Markland, F. S.; Masiarz, F. R.

    1992-01-01

    The complete amino acid sequence of fibrolase, a fibrinolytic enzyme from southern copperhead (Agkistrodon contortrix contortrix) venom, has been determined. This is the first report of the sequence of a direct-acting, nonhemorrhagic fibrinolytic enzyme found in snake venom. The majority of the sequence was established by automated Edman degradation of overlapping peptides generated by a variety of selective cleavage procedures. The amino-terminus is blocked by a cyclized glutamine (pyroglutamic acid) residue, and the sequence of this region of the molecule was determined by mass spectrometry. Fibrolase is composed of 203 residues in a single polypeptide chain with a molecular weight of 22,891, as determined by the sequence. Its sequence is homologous to the sequence of the hemorrhagic toxin Ht-d of Crotalus atrox venom and with the sequences of two metalloproteinases from Trimeresurus flavoviridis venom. Microheterogeneity in the sequence was found at both the amino-terminus and at residues 189 and 192. All six cysteine residues in fibrolase are involved in disulfide bonds. A disulfide bond between cysteine-118 and cysteine-198 has been established and bonds between cysteines-158/165 and between cysteines-160/192 are inferred from the homology to Ht-d. Secondary structure prediction reveals a very low percentage of alpha-helix (4%), but much greater beta-structure (39.5%). Analysis of the sequence reveals the absence of asparagine-linked glycosylation sites defined by the consensus sequence: asparagine-X-serine/threonine. PMID:1304358
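
    The consensus sequence named above (asparagine-X-serine/threonine) can be searched for with a short pattern match. The sketch below is only an illustration of that scan, not the authors' analysis: it treats X as any residue, as the abstract states, whereas many definitions additionally exclude proline at X. The function name and the toy sequences are hypothetical and are not the fibrolase sequence.

```python
import re

# Asn, then any residue, then Ser or Thr. The lookahead lets overlapping sites be reported.
SEQUON = re.compile(r"(?=(N[A-Z][ST]))")


def find_sequons(sequence):
    """Return a list of (1-based position of Asn, matched tripeptide) for each N-X-S/T site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_sequence = "MNGTANNSSK"     # hypothetical sequence containing three sites
    for position, site in find_sequons(toy_sequence):
        print(f"Asn-{position}: {site}")
    print(find_sequons("ACDEFGHIK"))  # a sequence with no site yields an empty list
```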

  6. A manual sequence method of peptides and phosphopeptides using 4-(1'-cyanoisoindolyl)phenylisothiocyanate.

    PubMed

    Shibata, Takayuki; Wainaina, Moses N; Miyoshi, Takayuki; Kabashima, Tsutomu; Kai, Masaaki

    2011-06-17

    A method for sequence analysis and identification of phosphoamino acids in peptides based on high performance liquid chromatography (HPLC) is described. The peptides were derivatized with an Edman type reagent, 4-(1'-cyanoisoindolyl)phenylisothiocyanate (CIPIC) and subsequently cleaved to generate stable and fluorescent 4-(1'-cyanoisoindolyl)phenylthiazolinone (CIP-TZ)-amino acids. Several experimental factors that affected derivatization on membranes were examined. Under the optimized conditions, the CIP-TZ derivatives of Tyr(p), Thr(p) and Ser(p) were obtained and separated from their parent amino acids with baseline resolution using an isocratic elution system. Up to the 4th residue of phosphorylated pentapeptides was successfully identified, whereas phosphoamino acid residues could not be detected by the conventional procedure using phenylisothiocyanate (PITC). The results demonstrated the potential of CIPIC as a derivatization reagent for peptide sequencing and the applicability of the method for the study and identification of phosphoamino acids in peptides.

  7. Spermatogenesis of the lizard Lacerta vivipara: histological studies and amino acid sequence of a protamine lacertine 1.

    PubMed

    Martinage, A; Depeiges, A; Wouters, D; Morel, L; Sautière, P

    1996-06-01

    The lizard Lacerta vivipara is a seasonal breeder with a well characterized reproductive cycle. A histological study of the lizard testis has been performed at different stages of spermatogenesis and the nuclear basic protein content was assessed by electrophoretic analysis. Two protamines, lacertines 1 and 2, are present in spermatozoa in April and May. We have isolated lacertine 1 and characterized a protamine with a mass of 4,963.7 Da. The amino acid sequence of this protamine (41 residues) was established from data provided by automated Edman degradation. It is characterized by a basic amino acid stretch in the N- and C-terminal regions and by a central part which consists of only 3 different intermingled amino acids. This protamine presents 62% homology with scylliorhinine Z3 from dog-fish Scylliorhinus caniculus and 58% homology with quail protamine. The reported lizard protamine sequence is the first reptilian protamine sequence available so far.

  8. Amino acid sequence of bovine heart coupling factor 6.

    PubMed Central

    Fang, J K; Jacobs, J W; Kanner, B I; Racker, E; Bradshaw, R A

    1984-01-01

    The amino acid sequence of bovine heart mitochondrial coupling factor 6 (F6) has been determined by automated Edman degradation of the whole protein and derived peptides. Preparations based on heat precipitation and ethanol extraction showed allotypic variation at three positions while material further purified by HPLC yielded only one sequence that also differed by a Phe-Thr replacement at residue 62. The mature protein contains 76 amino acids with a calculated molecular weight of 9006 and a pI of approximately 5, in good agreement with experimentally measured values. The charged amino acids are mainly clustered at the termini and in one section in the middle; these three polar segments are separated by two segments relatively rich in nonpolar residues. Chou-Fasman analysis suggests three stretches of alpha-helix coinciding with (or within) the high-charge-density sequences, with a single beta-turn at the first polar-nonpolar junction. Comparison of the F6 sequence with those of other proteins did not reveal any homologous structures. PMID:6149548

  9. Recognizing Sequences of Sequences

    PubMed Central

    Kiebel, Stefan J.; von Kriegstein, Katharina; Daunizeau, Jean; Friston, Karl J.

    2009-01-01

    The brain's decoding of fast sensory streams is currently impossible to emulate, even approximately, with artificial agents. For example, robust speech recognition is relatively easy for humans but exceptionally difficult for artificial speech recognition systems. In this paper, we propose that recognition can be simplified with an internal model of how sensory input is generated, when formulated in a Bayesian framework. We show that a plausible candidate for an internal or generative model is a hierarchy of ‘stable heteroclinic channels’. This model describes continuous dynamics in the environment as a hierarchy of sequences, where slower sequences cause faster sequences. Under this model, online recognition corresponds to the dynamic decoding of causal sequences, giving a representation of the environment with predictive power on several timescales. We illustrate the ensuing decoding or recognition scheme using synthetic sequences of syllables, where syllables are sequences of phonemes and phonemes are sequences of sound-wave modulations. By presenting anomalous stimuli, we find that the resulting recognition dynamics disclose inference at multiple time scales and are reminiscent of neuronal dynamics seen in the real brain. PMID:19680429

  10. Amino-acid sequence data of beta-tubulin from Physarum polycephalum myxamoebae.

    PubMed

    Singhofer-Wowra, M; Clayton, L; Dawson, P; Gull, K; Little, M

    1986-12-15

    Starting with 7.7 mg of a beta-tubulin isolated from myxamoebae of the slime mould Physarum polycephalum, 90% of the sequence has been determined by the Edman degradation of peptides generated by cyanogen bromide, trypsin and Staphylococcus aureus protease. Differences to other beta-tubulins are mainly conservative and spread evenly throughout the chain except for a high concentration at the C-terminus. The Physarum beta-tubulin shows most homology to Chlamydomonas beta-tubulin (90.5%) and least homology to yeast beta-tubulin (S. cerevisiae, 73.4%). Two tryptic peptides were isolated in approximately equal quantities which were identical except in one position (S/ALTVPELTQRMFDA) showing that at least two beta-tubulins are present in myxamoebae. However, since this was the only heterogeneity found, these beta-tubulins are probably very similar.

  11. Amino acid sequence of two neurotoxins from the venom of the Egyptian black snake (Walterinnesia aegyptia).

    PubMed

    Samejima, Y; Aoki-Tomomatsu, Y; Yanagisawa, M; Mebs, D

    1997-02-01

    The venom of the Egyptian black snake Walterinnesia aegyptia contains at least three toxins, which act postsynaptically to block the neuromuscular transmission of isolated rat phrenic nerve-diaphragm and chicken biventer cervicis muscle. The complete amino acid sequence of the two toxins, W-III and W-IV, consisting of 62 amino acid residues, was elucidated by Edman degradation of fragments obtained after Staphylococcus aureus protease and prolylpeptidase digestion. Although the toxins exhibit close structural homology to other short-chain postsynaptic neurotoxins from Elapidae venoms, toxin IV is unique by having a free SH-group (cysteine) at position 16. In position 35 of W-III, which is located at the tip of the central loop, threonine is replaced by lysine, which may alter the interaction of the toxin with the acetylcholine receptor, since the toxin is seven times less lethal than toxin W-IV.

  12. Rapid and sensitive amino-acid sequencing of cloning Thermus thermophilus HB8 ferredoxin by proteomics.

    PubMed

    Kaneko, Maki; Masui, Ryoji; Ake, Kojiro; Kousumi, Yukihide; Kuramitsu, Seiki; Yamaguchi, Minoru; Kuyama, Hiroki; Ando, Eiji; Norioka, Shigemi; Nakazawa, Takashi; Okamura, Taka-Aki; Yamamoto, Hitoshi; Ueyama, Norikazu

    2004-01-01

    Recombinant holo Thermus thermophilus [7Fe-8S] ferredoxin was synthesized by cloning from Thermus thermophilus HB8 gene. A specific sequence (Pro-His-Val-Ile) at the N-terminus of the recombinant ferredoxin was determined by a rapid and highly sensitive mass spectral method using a novel Ru(II) Edman reagent, [(tpy)Ru(tpy-C6H4-NCS)](PF6)2 (tpy=terpyridine). The formation of the recombinant holoTtFd was established by the characteristic absorptions and CD extrema as [7Fe-8S] ferredoxin. The catalytic electron-transfer reactivity of the [7Fe-8S] ferredoxin between ferredoxin-NADP+ reductase and cytochrome c was recognized.

  13. Complete amino acid sequence of the catalytic subunit of bovine cardiac muscle cyclic AMP-dependent protein kinase.

    PubMed Central

    Shoji, S; Parmelee, D C; Wade, R D; Kumar, S; Ericsson, L H; Walsh, K A; Neurath, H; Long, G L; Demaille, J G; Fischer, E H; Titani, K

    1981-01-01

    The complete amino acid sequence of the 349-residue catalytic subunit of cyclic AMP-dependent protein kinase from bovine cardiac muscle is presented. The sequence of the subunit (Mr 40,580 including phosphate groups at threonine-196 and serine-337) was derived largely by automated Edman degradation of nine fragments generated from the carboxymethylated protein by cleavage of methionyl bonds with cyanogen bromide. These fragments were aligned along the polypeptide chain by analysis of methionine-containing tryptic peptides isolated from protein radiolabeled in vitro by [14C]methyl exchange at methionyl residues. The molecule contains only two cysteinyl residues, at positions 198 and 342. It is relatively polar, containing clusters of cationic residues toward the amino terminus and anionic residues towards the carboxyl terminus. Predictions of secondary structure suggest the presence of three major domains with approximately half of the residues occurring in alpha-helices and 12% in beta-strands. PMID:6262777

  14. Myoglobins of cartilaginous fishes III. Amino acid sequence of myoglobin of the shark Galeorhinus australis.

    PubMed

    Fisher, W K; Koureas, D D; Thompson, E O

    1981-01-01

    Myoglobin isolated from the red muscle of the school shark Galeorhinus australis was purified by gel filtration and ion-exchange chromatography. The amino acid sequence was determined following digestion with trypsin and purification of the peptides by paper ionophoresis and chromatography. Sequences of purified peptides were determined by the dansyl-Edman procedure and the peptides aligned by homology with the sequence of the myoglobin of the gummy shark Mustelus antarcticus. The two myoglobin sequences showed a marked similarity (16 differences), but both sequences showed approximately the same number of differences (68) from myoglobin of the Port Jackson shark Heterodontus portusjacksoni. There are 19 residues unique to three shark myoglobin sequences. As found with other fish myoglobins there are 148 residues with deletions of four residues at the amino terminal end as well as one residue in the CD region. The amino terminal residue is acetylated. The distal E7 histidine residue was found to be replaced by glutamine, as only previously reported for the myoglobin sequence of gummy shark.

  15. Myoglobin of the shark Heterodontus portusjacksoni: isolation and amino acid sequence.

    PubMed

    Fisher, W K; Thompson, E O

    1979-06-01

    Myoglobin isolated from red muscle of the shark H. portusjacksoni was purified by ion-exchange chromatography on sulfopropyl-Sephadex and gel-filtration. Amino acid analysis and sequence determination showed 148 amino acid residues. The amino terminal residue is acetylated as shown by mass spectrographic analysis of N-terminal peptides. There is a deletion of four residues at the amino terminal end as well as one residue in the CD interhelical area relative to other myoglobins. The complete amino acid sequence has been determined following digestion with trypsin, chymotrypsin, pepsin and staphylococcal protease. Sequences of the purified peptides were determined by the dansyl-Edman procedure. The amino acid sequence showed approximately 85 differences from mammalian, monotreme and bird myoglobins. The date of divergence of the shark H. portusjacksoni from these other orders was estimated at 450 +/- 16 million years, based on the number of amino acid differences between species and allowing for multiple mutations during the evolutionary period. This estimate agrees well with similar estimates made using alpha- and beta-globin sequences, in contrast to widely differing estimates of dates of divergence for monotremes using the same three globin chains. Compared with myoglobins from species previously studied, there are many more differences in amino acid sequences, and in many positions residues are found that are more characteristic of alpha- and beta-globins, suggesting a conservation of residues over a long period of evolutionary time. There are fewer stabilizing hydrogen bonds and salt-linkages than in other myoglobins.

  16. Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications

    NASA Astrophysics Data System (ADS)

    Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.

    2007-12-01

    Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel (Camelus bactrianus); dromedary camel (Camelus dromedarius) and guanaco (Llama guanacoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translational modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade within Artiodactyla. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes. Our preliminary analysis of osteocalcin evolution represents an initial step towards a

  17. Genome Sequencing.

    PubMed

    Verma, Mansi; Kulshrestha, Samarth; Puri, Ayush

    2017-01-01

    Genome sequencing is an important step toward correlating genotypes with phenotypic characters. Sequencing technologies are important in many fields in the life sciences, including functional genomics, transcriptomics, oncology, evolutionary biology, forensic sciences, and many more. The era of sequencing has been divided into three generations. First generation sequencing involved sequencing by synthesis (Sanger sequencing) and sequencing by cleavage (Maxam-Gilbert sequencing). Sanger sequencing led to the completion of various genome sequences (including human) and provided the foundation for development of other sequencing technologies. Since then, various techniques have been developed which can overcome some of the limitations of Sanger sequencing. These techniques are collectively known as "Next-generation sequencing" (NGS), and are further classified into second and third generation technologies. Although NGS methods have many advantages in terms of speed, cost, and parallelism, the accuracy and read length of Sanger sequencing is still superior and has confined the use of NGS mainly to resequencing genomes. Consequently, there is a continuing need to develop improved real time sequencing techniques. This chapter reviews some of the options currently available and provides a generic workflow for sequencing a genome.

  18. Purification and N-terminal sequence of a serine proteinase-like protein (BMK-CBP) from the venom of the Chinese scorpion (Buthus martensii Karsch).

    PubMed

    Gao, Rong; Zhang, Yong; Gopalakrishnakone, Ponnampalam

    2008-08-01

    A serine proteinase-like protein was isolated from the venom of the Chinese red scorpion (Buthus martensii Karsch) by a combination of gel filtration, ion-exchange and reverse-phase chromatography and named BMK-CBP. The apparent molecular weight of BMK-CBP was identified as 33 kDa by SDS-PAGE under non-reducing conditions. The sequence of the N-terminal 40 amino acids was obtained by Edman degradation. The sequence shows highest similarity to proteinases from insect sources. When tested with commonly used proteinase substrates, no significant hydrolytic activity was observed for BMK-CBP. The purified BMK-CBP was found to bind to the cancer cell line MCF-7 and the cell binding ability was dose-dependent.

  19. Sequence comparison of pepsin-resistant segments of basement-membrane collagen alpha 1(IV) chains from bovine lens capsule and mouse tumour.

    PubMed Central

    Schuppan, D; Glanville, R W; Timpl, R; Dixit, S N; Kang, A H

    1984-01-01

    The C-terminal peptic fragment P1 (about 518 amino acid residues) of bovine lens-capsule collagen alpha 1(IV) chain was cleaved with CNBr and trypsin. The peptides were purified and characterized, allowing their ordering within the P1 fragment by comparison with a corresponding section of mouse collagen alpha 1(IV) chain [Schuppan, Glanville & Timpl (1982) Eur. J. Biochem. 123, 505-512]. About 67% of the sequence of bovine collagen fragment P1 was determined by Edman degradation. Comparison with the sequence of the corresponding mouse collagen fragment P1 showed 76% identity for positions Xaa and Yaa of the triplet structures Gly-Xaa-Yaa. Invariance was found for the positions of two non-triplet interruptions and of 3-hydroxyproline residues, pointing to the functional importance of these structures. PMID:6430279

  20. Amino-Terminal Oriented Mass Spectrometry of Substrates (ATOMS) N-terminal sequencing of proteins and proteolytic cleavage sites by quantitative mass spectrometry.

    PubMed

    Doucet, Alain; Overall, Christopher M

    2011-01-01

    Edman degradation is a long-established technique for N-terminal sequencing of proteins and cleavage fragments. However, for accurate data analysis and amino acid assignments, Edman sequencing proceeds on samples of single proteins only and so lacks high-throughput capabilities. We describe a new method for the high-throughput determination of N-terminal sequences of multiple protein fragments in solution. Proteolytic processing can change the activity of bioactive proteins and also reveal cryptic binding sites and generate proteins with new functions (neoproteins) not found in the parent molecule. For example, extracellular matrix (ECM) protein processing often produces multiple proteolytic fragments with the generation of cryptic binding sites and neoproteins by ECM protein processing being well documented. The exact proteolytic cleavage sites need to be identified to fully understand the functions of the cleavage fragments and biological roles of proteases in vivo. However, the identification of cleavage sites in complex high molecular proteins such as those composing the ECM is not trivial. N-terminal microsequencing of proteolytic fragments is the usual method employed, but it suffers from poor resolution of sodium dodecylsulfate-polyacrylamide gel electrophoresis gels and is inefficient at identifying multiple cleavages, requiring preparation of numerous gels or membrane slices for analysis. We recently developed Amino-Terminal Oriented Mass spectrometry of Substrates (ATOMS) to overcome these limitations as a complement for N-terminal sequencing. ATOMS employs isotopic labeling and quantitative tandem mass spectrometry to identify cleavage sites in a fast and accurate manner. We successfully used ATOMS to identify nearly 100 cleavage sites in the ECM proteins laminin and fibronectin. Presented herein is the detailed step-by-step protocol for ATOMS.

  1. The amino acid sequence of the aspartate aminotransferase from baker's yeast (Saccharomyces cerevisiae).

    PubMed Central

    Cronin, V B; Maras, B; Barra, D; Doonan, S

    1991-01-01

    1. The single (cytosolic) aspartate aminotransferase was purified in high yield from baker's yeast (Saccharomyces cerevisiae). 2. Amino-acid-sequence analysis was carried out by digestion of the protein with trypsin and with CNBr; some of the peptides produced were further subdigested with Staphylococcus aureus V8 proteinase or with pepsin. Peptides were sequenced by the dansyl-Edman method and/or by automated gas-phase methods. The amino acid sequence obtained was complete except for a probable gap of two residues as indicated by comparison with the structures of counterpart proteins in other species. 3. The N-terminus of the enzyme is blocked. Fast-atom-bombardment m.s. was used to identify the blocking group as an acetyl one. 4. Alignment of the sequence of the enzyme with those of vertebrate cytosolic and mitochondrial aspartate aminotransferases and with the enzyme from Escherichia coli showed that about 25% of residues are conserved between these distantly related forms. 5. Experimental details and confirmatory data for the results presented here are given in a Supplementary Publication (SUP 50164, 25 pages) that has been deposited at the British Library Document Supply Centre, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1991) 273, 5. PMID:1859361

  2. Unconventional amino acid sequence of the sun anemone (Stoichactis helianthus) polypeptide neurotoxin

    SciTech Connect

    Kem, W.; Dunn, B.; Parten, B.; Pennington, M.; Price, D.

    1986-05-01

    A 5000 dalton polypeptide neurotoxin (Sh-NI) purified by G50 Sephadex, P-cellulose, and SP-Sephadex chromatography was homogeneous by isoelectric focusing. Sh-NI was highly toxic to crayfish (LD50 0.6 µg/kg) but without effect upon mice at 15,000 µg/kg (i.p. injection). The reduced, ³H-carboxymethylated toxin and its fragments were subjected to automatic Edman degradation and the resulting PTH-amino acids were identified by HPLC, back hydrolysis, and scintillation counting. Peptides resulting from proteolytic (clostripain, staphylococcal protease) and chemical (tryptophan) cleavage were sequenced. The sequence is: AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK. This sequence differs considerably from the homologous Anemonia and Anthopleura toxins; many of the identical residues (6 half-cystines, G9, P10, R13, G19, G29, W30) are probably critical for folding rather than receptor recognition. However, the Sh-NI sequence closely resembles Radioanthus macrodactylus neurotoxin III and R. paumotensis II. The authors propose that Sh-NI and related Radioanthus toxins act upon a different site on the sodium channel.
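
    The residue labels quoted in this abstract (G9, P10, R13, G19, G29, W30 and the six half-cystines) can be checked directly against the printed sequence. The following is a minimal sketch, not part of the original record; the names SH_NI, CONSERVED, residue_at, check_labels and positions_of are hypothetical helpers introduced only for illustration, and positions are assumed to be 1-based from the N-terminus.

```python
# Sequence exactly as quoted in the abstract (one-letter amino acid codes).
SH_NI = "AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK"

# Residue labels the abstract lists as identical to the homologous toxins.
CONSERVED = ["G9", "P10", "R13", "G19", "G29", "W30"]


def residue_at(seq: str, position: int) -> str:
    """Return the residue at a 1-based position (assumed numbering)."""
    return seq[position - 1]


def check_labels(seq: str, labels: list[str]) -> dict[str, bool]:
    """Map each label such as 'G9' to whether seq has that residue there."""
    return {lab: residue_at(seq, int(lab[1:])) == lab[0] for lab in labels}


def positions_of(seq: str, residue: str) -> list[int]:
    """1-based positions of every occurrence of residue in seq."""
    return [i for i, aa in enumerate(seq, start=1) if aa == residue]


if __name__ == "__main__":
    print("length:", len(SH_NI))
    print("labels:", check_labels(SH_NI, CONSERVED))
    print("Cys positions:", positions_of(SH_NI, "C"))
```

    Run on the quoted string, the sketch reports 48 residues, confirms all six labels, and finds cysteines at positions 3, 5, 26, 33, 43 and 44, consistent with the six half-cystines mentioned.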

  3. N-terminal amino acid sequence of proalbumin from inbred buffalo rats.

    PubMed

    Millership, A; Edwards, K; Chelladurai, M; Dryburgh, H; Inglis, A S; Urban, J; Schreiber, G

    1980-03-01

    The sequence of radioactively labelled amino acids at the N-terminus of proalbumin was determined by automated Edman degradation. [3H]Valine, [3H]phenylalanine or [14C]arginine was incorporated into protein in vivo for a time period of 10 min after injection. Since albumin remains unlabelled during this time period (Urban et al., 1976), separation of proalbumin and albumin was not required for this work. Hence, compared to previous methods, a shorter purification procedure could be used which increased the yield of anti-albumin-precipitable protein and reduced the risk of proteolysis. Microsomes were prepared from livers removed 10 min after injection of the radioactively labelled amino acids. A buffer extract of the acetone-dried powder from these microsomes was chromatographed on DEAE-cellulose. All protein obtained after chromatography which could be precipitated with antiserum to serum albumin was isolated by immunoprecipitation and subsequent separation of the antigen-antibody complex. The sequence of radioactive amino acids in this antigen preparation suggests that about 20-25% of proalbumin possessed at the N-terminus the pentapeptide sequence X-Val-Phe-Arg-Arg- whereas 75-80% contained the hexapeptide sequence Arg-X-Val-Phe-Arg-Arg-.
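
    The reasoning behind the two-species interpretation can be illustrated with a small calculation: if two N-terminal variants are sequenced together, a labelled amino acid is released at a different Edman cycle for each variant, and the relative signal reflects the proportion of each. The sketch below is an idealised illustration only. It assumes 100% repetitive yield and equal specific activity in both species, which a real sequencer run would not achieve, and the function name expected_label and the 0.25/0.75 split (one point within the reported 20-25% and 75-80% ranges) are choices made for the example, not values taken from the study's data.

```python
from collections import defaultdict

# N-terminal sequences from the abstract; "X" is the unidentified residue.
PENTA = ["X", "Val", "Phe", "Arg", "Arg"]
HEXA = ["Arg", "X", "Val", "Phe", "Arg", "Arg"]


def expected_label(species, labeled_residue):
    """Relative label released at each Edman cycle (1-based).

    species: list of (sequence, fraction) pairs.
    Idealised: assumes 100% repetitive yield and equal specific activity.
    """
    signal = defaultdict(float)
    for sequence, fraction in species:
        for cycle, residue in enumerate(sequence, start=1):
            if residue == labeled_residue:
                signal[cycle] += fraction
    return dict(signal)


if __name__ == "__main__":
    # Assumed split, chosen from within the ranges quoted in the abstract.
    mixture = [(PENTA, 0.25), (HEXA, 0.75)]
    for label in ("Val", "Phe", "Arg"):
        print(label, sorted(expected_label(mixture, label).items()))
```

    Under these assumptions, labelled valine appears at cycles 2 and 3 in a 1:3 ratio and phenylalanine at cycles 3 and 4 in the same ratio, which is the kind of pattern that points to a shorter and a longer N-terminal form being present together.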

  4. Haemoglobins of the shark, Heterodontus portusjacksoni II. Amino acid sequence of the alpha-chain.

    PubMed

    Nash, A R; Fisher, W K; Thompson, E O

    1976-03-01

    The amino acid sequence of the alpha-chain of the principal haemoglobin from the shark, H. portusjacksoni has been determined. The chain has 148 residues and is acetylated at the amino terminal. The soluble peptides obtained by tryptic and chymotryptic digestion of the protein or its cyanogen bromide fragments were isolated by gel filtration, paper ionophoresis and paper chromatography. The amino acid sequences were determined by the dansyl-Edman procedure. The insoluble "core" peptide from the tryptic digestion contained 34 residues and required cleavage by several proteases before the sequence was established. Compared with human alpha-chain there are 88 amino acid differences including the additional seven residues which appear on the amino terminal of the shark chain. There is also one deletion and one insertion. The chain contains no tryptophan but has four cysteinyl residues, which is the highest number of such residues recorded for a vertebrate globin. In the alpha1beta1 contact sites there are four changes in the oxyhaemoglobin form and six in the deoxy form. Nine of the 16 alpha1beta1 contact sites show variation while three of the haem contact sites have changed in comparison to the residues known to be involved in these interactions in horse haemoglobin alpha-chain. Use of the sequence data to estimate a time of divergence of the shark from the main vertebrate line yielded the value of 410 +/- 46 million years. The data, in general, support the palaeontological view that bony fishes arose before the elasmobranchs.

  5. A proposal for a coherent mammalian histone H1 nomenclature correlated with amino acid sequences.

    PubMed

    Parseghian, M H; Henschen, A H; Krieglstein, K G; Hamkalo, B A

    1994-04-01

    Bio-Rex 70 chromatography was combined with reverse-phase (RP) HPLC to fractionate histone H1 zero and 4 histone H1 subtypes from human placental nuclei as previously described (Parseghian MH et al., 1993, Chromosome Res 1:127-139). After proteolytic digestion of the subtypes with Staphylococcus aureus V8 protease, peptides were fractionated by RP-HPLC and partially sequenced by Edman degradation in order to correlate them with human spleen subtypes (Ohe Y, Hayashi H, Iwai K, 1986, J Biochem (Tokyo) 100:359-368; 1989, J Biochem (Tokyo) 106:844-857). Based on comparisons with the sequence data available from other mammalian species, subtypes were grouped. These groupings were used to construct a coherent nomenclature for mammalian somatic H1s. Homologous subtypes possess characteristic patterns of growth-related and cAMP-dependent phosphorylation sites. The groupings defined by amino acid sequence also were used to correlate the elution profiles and electrophoretic mobilities of subtypes derived from different species. Previous attempts at establishing an H1 nomenclature by chromatographic or electrophoretic fractionations have resulted in several misidentifications. We present here, for the first time, a nomenclature for somatic H1s based on amino acid sequences that are analogous to those for H1 zero and H1t. The groupings defined should be useful in correlating the many observations regarding H1 subtypes in the literature.

  6. Sequence analysis and location of capsid proteins within RNA 2 of strawberry latent ringspot virus.

    PubMed

    Kreiah, S; Strunk, G; Cooper, J I

    1994-09-01

    The nucleotide sequence of the RNA 2 of a strawberry isolate (H) of strawberry latent ringspot virus (SLRSV) comprised 3824 nucleotides and contained one long open reading frame with a theoretical coding capacity of 890 amino acids equivalent to a protein of 98.8K. The N-terminal amino acid sequences of virion-derived proteins were determined by Edman degradation allowing the capsid coding regions to be located and serine/glycine cleavage sites to be identified within the polyprotein. The amino acid sequence in the capsid coding region of an isolate of SLRSV from flowering cherry in New Zealand was 97% identical to that of SLRSV-H. Except in the 3' and 5' terminal non-coding sequences, computer-based alignment and comparison algorithms did not reveal any substantial homologies between RNA 2 of SLRSV-H and the equivalent genomic segments in the nepoviruses arabis mosaic, cherry leaf roll, grapevine fanleaf, raspberry ringspot, grapevine hungarian chrome mosaic, tomato blackring, tomato ringspot, tobacco ringspot, or in the comoviruses cowpea mosaic and red clover mottle. Despite the similarities in overall genome organization, data from RNA 2 remain insufficient for unambiguous positioning of SLRSV in relation to species/genera in the Comoviridae.

  7. Design, synthesis, and characterization of a protein sequencing reagent yielding amino acid derivatives with enhanced detectability by mass spectrometry.

    PubMed Central

    Aebersold, R.; Bures, E. J.; Namchuk, M.; Goghari, M. H.; Shushan, B.; Covey, T. C.

    1992-01-01

    We report the design, chemical synthesis, and structural and functional characterization of a novel reagent for protein sequence analysis by the Edman degradation, yielding amino acid derivatives rapidly detectable at high sensitivity by ion-evaporation mass spectrometry. We demonstrate that the reagent 3-[4'(ethylene-N,N,N-trimethylamino)phenyl]-2-isothiocyanate is chemically stable and shows coupling and cyclization/cleavage yields comparable to phenylisothiocyanate, the standard reagent in chemical sequence analysis, under conditions typically encountered in manual or automated sequence analysis. Amino acid derivatives generated with this reagent were detectable by ion-evaporation mass spectrometry at the subfemtomole sensitivity level at a pace of one sample per minute. Furthermore, derivatives were identified by their mass, thus permitting the rapid and highly sensitive determination of the molecular nature of modified amino acids. Derivatives of amino acids with acidic, basic, polar, or hydrophobic side chains were reproducibly detectable at comparable sensitivities. The polar nature of the reagent required covalent immobilization of polypeptides prior to automated sequence analysis. This reagent, used in automated sequence analysis, has the potential for overcoming the limitations in sensitivity, speed, and the ability to characterize modified amino acid residues inherent in the chemical sequencing methods that are currently used. PMID:1304351

  8. Sequence landscapes.

    PubMed Central

    Clift, B; Haussler, D; McConnell, R; Schneider, T D; Stormo, G D

    1986-01-01

    We describe a method for representing the structure of repeating sequences in nucleic-acids, proteins and other texts. A portion of the sequence is presented at the bottom of a CRT screen. Above the sequence is its landscape, which looks like a mountain range. Each mountain corresponds to a subsequence of the sequence. At the peak of every mountain is written the number of times that the subsequence appears. A data structure called a DAWG, which can be built in time proportional to the length of the sequence, is used to construct the landscape. For the 40 thousand bases of bacteriophage T7, the DAWG can be built in 30 seconds. The time to display any portion of the landscape is less than a second. Using sequence landscapes, one can quickly locate significant repeats. PMID:3753762
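
    The core operation behind the landscape, looking up how many times any subsequence occurs, can be sketched with a suffix automaton, a form of DAWG that accepts every substring of the text. This is an illustrative sketch under that assumption and is not claimed to be the authors' implementation; the class name Dawg and the method occurrences are hypothetical. Construction is linear in the sequence length apart from the final sort, which a counting sort would also make linear.

```python
class Dawg:
    """Suffix automaton: a DAWG recognising every substring of the text."""

    def __init__(self, text: str):
        self.next = [{}]    # outgoing edges per state
        self.link = [-1]    # suffix link per state (root has none)
        self.length = [0]   # length of the longest substring in each state
        self.count = [0]    # number of occurrences (end positions)
        last = 0
        for ch in text:
            last = self._extend(last, ch)
        # Propagate counts along suffix links, longest states first.
        order = sorted(range(1, len(self.length)),
                       key=self.length.__getitem__, reverse=True)
        for state in order:
            self.count[self.link[state]] += self.count[state]

    def _new_state(self, length, edges, link, count):
        self.next.append(edges)
        self.link.append(link)
        self.length.append(length)
        self.count.append(count)
        return len(self.length) - 1

    def _extend(self, last, ch):
        cur = self._new_state(self.length[last] + 1, {}, -1, 1)
        p = last
        while p != -1 and ch not in self.next[p]:
            self.next[p][ch] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][ch]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone takes over the shorter substrings.
                clone = self._new_state(self.length[p] + 1,
                                        dict(self.next[q]), self.link[q], 0)
                while p != -1 and self.next[p].get(ch) == q:
                    self.next[p][ch] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        return cur

    def occurrences(self, pattern: str) -> int:
        """Number of (possibly overlapping) occurrences of pattern."""
        state = 0
        for ch in pattern:
            state = self.next[state].get(ch)
            if state is None:
                return 0
        return self.count[state]


if __name__ == "__main__":
    dawg = Dawg("GATTACAGATTACA")
    for sub in ("GATTACA", "TT", "A", "CAT"):
        print(sub, dawg.occurrences(sub))
```

    Each lookup costs time proportional to the length of the queried subsequence, independent of the length of the text, which is consistent with the abstract's point that any portion of the landscape can be displayed quickly once the structure is built.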

  9. Sequencing technologies and genome sequencing.

    PubMed

    Pareek, Chandra Shekhar; Smoczynski, Rafal; Tretyn, Andrzej

    2011-11-01

    High-throughput next-generation sequencing (HT-NGS) technologies are currently the hottest topic in human and animal genomics research, and can produce over 100 times more data than the most sophisticated capillary sequencers based on the Sanger method. With the ongoing development of high-throughput sequencing machines and the advancement of modern bioinformatics tools at an unprecedented pace, the goal of sequencing individual genomes of living organisms at a cost of $1,000 each seems realistically feasible in the near future. In the relatively short time frame since 2005, HT-NGS technologies have been revolutionizing human and animal genome research through analysis of chromatin immunoprecipitation coupled to DNA microarray (ChIP-chip) or sequencing (ChIP-seq), RNA sequencing (RNA-seq), whole genome genotyping, genome-wide structural variation, de novo assembly and re-assembly of genomes, mutation detection and carrier screening, detection of inherited disorders and complex human diseases, DNA library preparation, paired ends and genomic captures, sequencing of mitochondrial genomes and personal genomics. In this review, we address the important features of HT-NGS, such as first generation DNA sequencers, the birth of HT-NGS, second generation HT-NGS platforms, third generation HT-NGS platforms (including single molecule Heliscope™, SMRT™ and RNAP sequencers, and Nanopore), the Archon Genomics X PRIZE foundation, comparison of second and third generation HT-NGS platforms, and applications, advances and future perspectives of sequencing technologies in human and animal genome research.

  10. The complete amino acid sequence of a trypsin inhibitor from Bauhinia variegata var. candida seeds.

    PubMed

    Di Ciero, L; Oliva, M L; Torquato, R; Köhler, P; Weder, J K; Camillo Novello, J; Sampaio, C A; Oliveira, B; Marangoni, S

    1998-11-01

    Trypsin inhibitors of two varieties of Bauhinia variegata seeds have been isolated and characterized. Bauhinia variegata candida trypsin inhibitor (BvcTI) and B. variegata lilac trypsin inhibitor (BvlTI) are proteins with Mr of about 20,000 without free sulfhydryl groups. Amino acid analysis shows a high content of aspartic acid, glutamic acid, serine, and glycine, and a low content of histidine, tyrosine, methionine, and lysine in both inhibitors. Isoelectric focusing for both varieties detected three isoforms (pI 4.85, 5.00, and 5.15), which were resolved by HPLC procedure. The trypsin inhibitors show Ki values of 6.9 and 1.2 nM for BvcTI and BvlTI, respectively. The N-terminal sequences of the three trypsin inhibitor isoforms from both varieties of Bauhinia variegata and the complete amino acid sequence of B. variegata var. candida L. trypsin inhibitor isoform 3 (BvcTI-3) are presented. The sequences have been determined by automated Edman degradation of the reduced and carboxymethylated proteins of the peptides resulting from Staphylococcus aureus protease and trypsin digestion. BvcTI-3 is composed of 167 residues and has a calculated molecular mass of 18,529. Homology studies with other trypsin inhibitors show that BvcTI-3 belongs to the Kunitz family. The putative active site encompasses Arg (63)-Ile (64).

  11. Purification, characterization, and complete amino acid sequence of a thioredoxin from a green alga, Chlamydomonas reinhardtii.

    PubMed

    Decottignies, P; Schmitter, J M; Jacquot, J P; Dutka, S; Picaud, A; Gadal, P

    1990-07-01

    Two thioredoxins (named Ch1 and Ch2 in reference to their elution pattern on an anion-exchange column) have been purified to homogeneity from the green alga, Chlamydomonas reinhardtii. In this paper, we describe the properties and the sequence of the most abundant form, Ch2. Its activity in various enzymatic assays has been compared with those of Escherichia coli and spinach thioredoxins. C. reinhardtii thioredoxin Ch2 can serve as a substrate for E. coli thioredoxin reductase with a lower efficiency when compared to the homologous system. In the presence of dithiothreitol (DTT), the protein is able to catalyze the reduction of porcine insulin. Thioredoxin Ch2 is as efficient as its spinach counterpart in the DTT or light activation of corn NADP-malate dehydrogenase, but it only activates spinach fructose-1,6-bisphosphatase at very high concentrations. The complete primary structure of the C. reinhardtii thioredoxin Ch2 was determined by automated Edman degradation of the intact protein and of peptides derived from trypsin, chymotrypsin, clostripain, and SV8 protease digestions. It consists of a polypeptide of 106 amino acids (MW 11,808) and contains the well-conserved active site sequence Trp-Cys-Gly-Pro-Cys. The sequence of the algal thioredoxin Ch2 has been compared to that of thioredoxins from other sources and has the greatest similarity (67%) with the thioredoxin from Anabaena 7119.

  12. The complete amino acid sequence of growth hormone of an elasmobranch, the blue shark (Prionace glauca).

    PubMed

    Yamaguchi, K; Yasuda, A; Lewis, U J; Yokoo, Y; Kawauchi, H

    1989-02-01

    The complete amino acid sequence of growth hormone (GH) from a phylogenetically ancient fish, the blue shark (Prionace glauca), was determined. The shark GH isolated from pituitary glands by U. J. Lewis, R. N. P. Singh, B. K. Seavey, R. Lasker, and G. E. Pickford (1972, Fish. Bull. 70, 933-939) was purified by reversed-phase high-performance liquid chromatography. The hormone was reduced, carboxymethylated, and subsequently cleaved in turn with cyanogen bromide and Staphylococcus aureus protease. The intact protein was also cleaved with lysyl endopeptidase and o-iodosobenzoic acid. The resulting peptide fragments were separated by rpHPLC and submitted to sequence analysis by automated and manual Edman methods. The shark GH consists of 183 amino acid residues with a calculated molecular weight of 21,081. Sequence comparisons revealed that the elasmobranch GH is considerably more similar to tetrapod GHs (e.g., 68% identity with sea turtle GH, 63% with chicken GH, and 58% with ovine GH) than teleostean GHs (e.g., 38% identities with salmon GH and 42% with bonito GH) except for eel GH (61% identity), and substantiates the earlier finding derived from the immunochemical and biological studies (Hayashida and Lewis, 1978) that the primitive fish are less diverged from the main line of vertebrate evolution leading to the tetrapod than are the modern bony fish.

  13. Complete amino acid sequences of three proteinase inhibitors from white sword bean (Canavalia gladiata).

    PubMed

    Park, S S; Sumi, T; Ohba, H; Nakamura, O; Kimura, M

    2000-10-01

    Three major serine proteinase inhibitors (SBI-1, -2, and -3) were purified from the seeds of white sword bean (Canavalia gladiata) by FPLC and reversed-phase HPLC. The sequences of these inhibitors were established by automatic Edman degradation and TOF-mass spectrometry. SBI-1, -2, and -3 consisted of 72, 73, and 75 amino acid residues, with molecular masses of 7806.5, 7919.8, and 8163.4, respectively. The sequences of SBI-1 and -2 coincided with those of CLT I and II [Terada et al. (1994) Biosci. Biotech. Biochem., 58, 376-379] except for the N- or C-terminal amino acid residues. Analysis of the amino acid sequences showed that the active sites of the inhibitors contained Lys21-Ser22 against trypsin and Leu48-Ser49 against chymotrypsin, respectively. Further, it became apparent that about seven disulfide bonds were present. These results suggest that sword bean inhibitors are members of the Bowman-Birk proteinase inhibitor family.

  14. The amino acid sequence of Ole e I, the major allergen from olive tree (Olea europaea) pollen.

    PubMed

    Villalba, M; Batanero, E; López-Otín, C; Sánchez, L M; Monsalve, R I; González de la Peña, M A; Lahoz, C; Rodríguez, R

    1993-09-15

    The complete primary structure of the major allergen from Olea europaea (olive tree) pollen, Ole e I (IUIS nomenclature), has been determined. The amino acid sequence was established by automated Edman degradation of the reduced and alkylated molecule as well as of selected fragments obtained by proteolytic digestions. Ole e I contains a single polypeptide chain of 145 amino acid residues with a calculated molecular mass of 16331 Da. No free sulfhydryl groups have been detected in the native protein. The molecule contains a putative glycosylation site. A high degree of microheterogeneity has been observed, mainly centered in the first 33% of the molecule. Comparison of Ole e I sequence with protein sequence databases showed no similarity with other known allergens. However, it has a 36% and 38% sequence identity with the putative polypeptide structures, deduced, respectively, from nucleotide sequences of genes isolated from tomato anthers and corn pollen, which have been suggested to be involved in the growing of the pollen tube. Therefore, the olive tree allergen may be a constitutive protein of the pollen involved in reproductive functions.

  15. Isolation and sequence of tryptic peptides from the proton-pumping ATPase of the oat plasma membrane.

    PubMed

    Schaller, G E; Sussman, M R

    1988-02-01

    In crude extracts of plant tissue, the M(r) = 100,000 proton-pumping ATPase constitutes less than 0.01% of the total cell protein. A large-scale purification procedure is described that has been used to obtain extensive protein sequence information from this enzyme. Plasma membrane vesicles enriched in ATPase activity were obtained from extracts of oat roots by routine differential and density gradient centrifugation. Following a detergent wash, the ATPase was resolved from other integral membrane proteins by size fractionation at 4 degrees C in the presence of lithium dodecyl sulfate. After carboxymethylation of cysteine residues and removal of detergent, the ATPase was digested with trypsin and resultant peptide fragments separated by reverse phase high performance liquid chromatography. Peptides were recovered with high yield and were readily sequenced by automated Edman degradation on a gas-phase sequencer. Of the eight peptides sequenced, six showed strong homology with known amino acid sequences of the fungal proton-pumping and other cation-transporting ATPases.

  16. Skin peptides from anurans of the Litoria rubella Group: sequence determination using electrospray mass spectrometry. Opioid activity of two major peptides.

    PubMed

    Jackway, Rebecca J; Maselli, Vita M; Musgrave, Ian F; Maclean, Micheal J; Tyler, Michael J; Bowie, John H

    2009-04-01

    Many species of frogs of the genus Litoria secrete bioactive peptides from their skin glands. These peptides are normally host-defence compounds and may have one or more of the following activities: smooth muscle contraction, analgesic, antimicrobial, antiviral, lymphocyte proliferator (immunomodulator) and neuronal nitric oxide synthase (nNOS) inactivation. Two frog species of the Litoria rubella Group that have been studied before, namely, Litoria electrica and Litoria rubella, are different from other species of the genus Litoria in that they produce small peptides that show neither membrane, lymphocyte nor nNOS activity. In this study we have used electrospray mass spectrometry together with Edman sequencing to identify eight skin peptides of the third member of this Group, Litoria dentata: surprisingly, none of these peptides show activity in our biological screening program. However, two major peptides (FPWL-NH(2) and FPWP-NH(2)) from L. electrica and L. rubella are opioids at the micromolar concentration.

  17. Complete amino acid sequence of branched-chain amino acid aminotransferase (transaminase B) of Salmonella typhimurium, identification of the coenzyme-binding site and sequence comparison analysis

    SciTech Connect

    Feild, M.J.

    1988-01-01

    The complete amino acid sequence of the subunit of branched-chain amino acid aminotransferase of Salmonella typhimurium was determined by automated Edman degradation of peptide fragments generated by chemical and enzymatic digestion of S-carboxymethylated and S-pyridylethylated transaminase B. Peptide fragments of transaminase B were generated by treatment of the enzyme with trypsin, Staphylococcus aureus V8 protease, endoproteinase Lys-C, and cyanogen bromide. Protocols were developed for separation of the peptide fragments by reverse-phase high performance liquid chromatography (HPLC), ion-exchange HPLC, and SDS-urea gel electrophoresis. The enzyme subunit contains 308 amino acid residues and has a molecular weight of 33,920 daltons. The coenzyme-binding site was determined by treatment of the enzyme, containing bound pyridoxal 5-phosphate, with tritiated sodium borohydride prior to trypsin digestion. Monitoring radioactivity incorporation and peptide map comparisons with an apoenzyme tryptic digest, allowed identification of the pyridoxylated-peptide which was isolated by reverse-phase HPLC and sequenced. The coenzyme-binding site is a lysyl residue at position 159. Some peptides were further characterized by fast atom bombardment mass spectrometry.

  18. Sample sequencing

    SciTech Connect

    Prange, C.

    1994-04-01

    The goal of the Human Genome Project is to sequence all 3 billion basepairs of human DNA. At Lawrence Livermore Lab, attention is focused on Chromosome 19, which has been estimated to contain approximately 2000 genes. So far, only 200 have been mapped to specific areas on the chromosome. For this reason, a simple method is needed to predict the most likely locations of the coding regions in the DNA. In addition, there is also a need for unique marker sites (STSs) along the chromosome. Sample sequencing uses standard cloning techniques to prepare DNA for sequencing. Once sequence is obtained, it is analyzed using databases to predict the regions most likely to contain genes. All sequences may also be used to generate STSs. So far, 21 fragments from five different clones have been completely sequenced, with fragments from eight more clones in progress. Constant improvement of methods to increase efficiency and accuracy combined with utilization of the most current databases available make sample sequencing a useful tool for reaching the goals of the Human Genome Project.

  19. DNA Sequencing

    DOEpatents

    Tabor, Stanley; Richardson, Charles C.

    1995-04-25

    A method for sequencing a strand of DNA, including the steps of: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three deoxyribonucleoside triphosphates in different amounts, under conditions favoring primer extension to form nucleic acid fragments complementary to the DNA to be sequenced; labelling the nucleic acid fragments; separating them and determining the position of the deoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby to determine the DNA sequence.

  20. Bacteriocuprein superoxide dismutase of Photobacterium leiognathi. Isolation and sequence of the gene and evidence for a precursor form.

    PubMed

    Steinman, H M

    1987-02-05

    The gene encoding the bacteriocuprein superoxide dismutase from Photobacterium leiognathi, American Type Culture Collection strain 25521, was cloned in a pUC12 vector and sequenced. The nucleotide sequence predicted a 22-residue leader peptide amino-terminal to the known bacteriocuprein sequence. The expected precursor bacteriocuprein was directly identified in the in vitro translation products of the cloned gene by polyacrylamide gel electrophoresis and automated Edman degradation. Enzymatically active bacteriocuprein that lacked the leader peptide was identified in sonic extracts of Escherichia coli hosts containing the cloned gene. A single transcript of 580 nucleotides was observed in blots of total P. leiognathi RNA, and a unique site of transcriptional initiation was identified by primer extension analysis. P. leiognathi bacteriocuprein is the first bacteriocuprein whose gene has been isolated and sequenced and the first copper-zinc superoxide dismutase in which a leader peptide has been found. The presence of a leader peptide suggests that the bacteriocuprein is localized in the membrane or periplasm, in contrast to the eukaryotic copper-zinc superoxide dismutases, which are cytoplasmic enzymes. Such a difference in intracellular location could be important for understanding the presence and function of the uncommon, bacteriocuprein superoxide dismutase in P. leiognathi.

  1. Complete amino acid sequence of a Lolium perenne (perennial rye grass) pollen allergen, Lol p II.

    PubMed

    Ansari, A A; Shenbagamurthi, P; Marsh, D G

    1989-07-05

    The complete amino acid sequence of a Lolium perenne (rye grass) pollen allergen, Lol p II was determined by automated Edman degradation of the protein and selected fragments. Cleavage of the protein by enzymatic and chemical techniques established an unambiguous sequence for the protein. Lol p II contains 97 amino acid residues, with a calculated molecular weight of 10,882. The protein lacks cysteine and glutamine and shows no evidence of glycosylation. Theoretical predictions by Fraga's (Fraga, S. (1982) Can. J. Chem. 60, 2606-2610) and Hopp and Woods' (Hopp, T. P., and Woods, K. R. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 3824-3828) methods indicate the presence of four hydrophilic regions, which may contribute to sequential or parts of conformational B-cell epitopes. Analysis of amphipathic regions by Berzofsky's method indicates the presence of a highly amphipathic region, which may contain, or contribute to, an Ia/T-cell epitope. This latter segment of Lol p II was found to be highly homologous with an antibody-binding segment of the major rye allergen Lol p I and may explain why immune responsiveness to both the allergens is associated with HLA-DR3.

  2. Plasma-desorption mass spectrometry as an aid in protein sequence determination. Application of the method on a cuticular protein from the migratory locust (Locusta migratoria).

    PubMed Central

    Klarskov, K; Højrup, P; Andersen, S O; Roepstorff, P

    1989-01-01

    The complete amino acid sequence of a structural protein, protein 8, isolated from the pharate cuticle of the locust Locusta migratoria was determined. Protein 8 contains 148 amino acid residues and has an Mr of 15,224. By the extensive use of information obtained by plasma-desorption mass spectrometry (p.d.m.s.) it was possible to reduce the need for conventional sequence determination and to improve the reliability of the results. On the basis of the determined Mr of the intact protein all the peptides that constitute the complete sequence could be isolated from a time-course enzymic digestion. The isolated peptides were sequenced by using a combination of Edman degradation and carboxypeptidase digestion monitored by p.d.m.s. The alignment of the peptides was established from the time-course digestion and further verified by a second enzymic digestion. The primary structure of the protein consists of two hydrophilic and two hydrophobic regions. The hydrophobic regions are enriched in alanine, valine and proline and dominated by a repetitive sequence Ala-Ala-Pro-(Ala/Val). The sequence strengthens the view that the cuticle proteins belong to a unique family of structural proteins. PMID:2590176

  3. MSLICE Sequencing

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Norris, Jeffrey S.; Morris, John R.

    2011-01-01

    MSLICE Sequencing is a graphical tool for writing sequences and integrating them into RML files, as well as for producing SCMF files for uplink. When operated in a testbed environment, it also supports uplinking these SCMF files to the testbed via Chill. This software features a free-form textual sequence editor featuring syntax coloring, automatic content assistance (including command and argument completion proposals), complete with types, value ranges, units, and descriptions from the command dictionary that appear as they are typed. The sequence editor also has a "field mode" that allows tabbing between arguments and displays type/range/units/description for each argument as it is edited. Color-coded error and warning annotations on problematic tokens are included, as well as indications of problems that are not visible in the current scroll range. "Quick Fix" suggestions are made for resolving problems, and all the features afforded by modern source editors are also included such as copy/cut/paste, undo/redo, and a sophisticated find-and-replace system optionally using regular expressions. The software offers a full XML editor for RML files, which features syntax coloring, content assistance and problem annotations as above. There is a form-based, "detail view" that allows structured editing of command arguments and sequence parameters when preferred. The "project view" shows the user's "workspace" as a tree of "resources" (projects, folders, and files) that can subsequently be opened in editors by double-clicking. Files can be added, deleted, dragged-dropped/copied-pasted between folders or projects, and these operations are undoable and redoable. A "problems view" contains a tabular list of all problems in the current workspace. Double-clicking on any row in the table opens an editor for the appropriate sequence, scrolling to the specific line with the problem, and highlighting the problematic characters. From there, one can invoke "quick fix" as described

  4. Amino acid sequence and physiological characterization of toxins from the venom of the scorpion Centruroides limpidus tecomanus Hoffmann.

    PubMed

    Martin, B M; Carbone, E; Yatani, A; Brown, A M; Ramírez, A N; Gurrola, G B; Possani, L D

    1988-01-01

    The complete amino acid sequence of the major toxic component (II.20.3.4), named toxin 1, from the venom of the Mexican scorpion C. l. tecomanus is reported. The sequence (66 amino acids) was obtained by direct Edman degradation of reduced and alkylated toxin, followed by sequence determination of selected peptides separated after enzymatic cleavage with S. aureus V8 protease. In cultured chick dorsal root ganglion cells, 0.5 microM toxin 1 specifically slowed the time course of Na+ current inactivation, while Ca2+ currents from the same preparation were little affected. In neonatal rat ventricular heart cells, toxin 1, at concentrations between 0.1 and 0.5 microM, reduced Na+ currents without changing the kinetics, and Ca2+ currents were unaffected. Comparative analysis of the primary structure of this toxin with other scorpion toxins shows a high degree of similarity with the North American scorpion toxins. This analysis suggests that the 'fine tuning' of the molecular mechanism of action of these toxins is related to variations in the primary structure as well as to the type of membrane under study (tissue specificity).

  5. Rapid 'de novo' peptide sequencing by a combination of nanoelectrospray, isotopic labeling and a quadrupole/time-of-flight mass spectrometer.

    PubMed

    Shevchenko, A; Chernushevich, I; Ens, W; Standing, K G; Thomson, B; Wilm, M; Mann, M

    1997-01-01

    Protein microanalysis usually involves the sequencing of gel-separated proteins available in very small amounts. While mass spectrometry has become the method of choice for identifying proteins in databases, in almost all laboratories 'de novo' protein sequencing is still performed by Edman degradation. Here we show that a combination of the nanoelectrospray ion source, isotopic end labeling of peptides and a quadrupole/time-of-flight instrument allows facile read-out of the sequences of tryptic peptides. Isotopic labeling was performed by enzymatic digestion of proteins in 1:1 16O/18O water, eliminating the need for peptide derivatization. A quadrupole/time-of-flight mass spectrometer was constructed from a triple quadrupole and an electrospray time-of-flight instrument. Tandem mass spectra of peptides were obtained with better than 50 ppm mass accuracy and resolution routinely in excess of 5000. Unique and error tolerant identification of yeast proteins as well as the sequencing of a novel protein illustrate the potential of the approach. The high data quality in tandem mass spectra and the additional information provided by the isotopic end labeling of peptides enabled automated interpretation of the spectra via simple software algorithms. The technique demonstrated here removes one of the last obstacles to routine and high throughput protein sequencing by mass spectrometry.

  6. Insertion Sequences

    PubMed Central

    Mahillon, Jacques; Chandler, Michael

    1998-01-01

    Insertion sequences (ISs) constitute an important component of most bacterial genomes. Over 500 individual ISs have been described in the literature to date, and many more are being discovered in the ongoing prokaryotic and eukaryotic genome-sequencing projects. The last 10 years have also seen some striking advances in our understanding of the transposition process itself. Not least of these has been the development of various in vitro transposition systems for both prokaryotic and eukaryotic elements and, for several of these, a detailed understanding of the transposition process at the chemical level. This review presents a general overview of the organization and function of insertion sequences of eubacterial, archaebacterial, and eukaryotic origins with particular emphasis on bacterial elements and on different aspects of the transposition mechanism. It also attempts to provide a framework for classification of these elements by assigning them to various families or groups. A total of 443 members of the collection have been grouped in 17 families based on combinations of the following criteria: (i) similarities in genetic organization (arrangement of open reading frames); (ii) marked identities or similarities in the enzymes which mediate the transposition reactions, the recombinases/transposases (Tpases); (iii) similar features of their ends (terminal IRs); and (iv) fate of the nucleotide sequence of their target sites (generation of a direct target duplication of determined length). A brief description of the mechanism(s) involved in the mobility of individual ISs in each family and of the structure-function relationships of the individual Tpases is included where available. PMID:9729608

  7. Protein identification with N and C-terminal sequence tags in proteome projects.

    PubMed

    Wilkins, M R; Gasteiger, E; Tonella, L; Ou, K; Tyler, M; Sanchez, J C; Gooley, A A; Walsh, B J; Bairoch, A; Appel, R D; Williams, K L; Hochstrasser, D F

    1998-05-08

    Genome sequences are available for increasing numbers of organisms. The proteomes (protein complement expressed by the genome) of many such organisms are being studied with two-dimensional (2D) gel electrophoresis. Here we have investigated the application of short N-terminal and C-terminal sequence tags to the identification of proteins separated on 2D gels. The theoretical N and C termini of 15,519 proteins, representing all SWISS-PROT entries for the organisms Mycoplasma genitalium, Bacillus subtilis, Escherichia coli, Saccharomyces cerevisiae and human, were analysed. Sequence tags were found to be surprisingly specific, with N-terminal tags of four amino acid residues found to be unique for between 43% and 83% of proteins, and C-terminal tags of four amino acid residues unique for between 74% and 97% of proteins, depending on the species studied. Sequence tags of five amino acid residues were found to be even more specific. To utilise this specificity of sequence tags for protein identification, we created a world-wide web-accessible protein identification program, TagIdent (http://www.expasy.ch/www/tools.html), which matches sequence tags of up to six amino acid residues as well as estimated protein pI and mass against proteins in the SWISS-PROT database. We demonstrate the utility of this identification approach with sequence tags generated from 91 different E. coli proteins purified by 2D gel electrophoresis. Fifty-one proteins were unambiguously identified by virtue of their sequence tags and estimated pI and mass, and a further 11 proteins identified when sequence tags were combined with protein amino acid composition data. We conclude that the TagIdent identification approach is best suited to the identification of proteins from prokaryotes whose complete genome sequences are available. The approach is less well suited to proteins from eukaryotes, as many eukaryotic proteins are not amenable to sequencing via Edman degradation, and tag protein
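
    The tag-specificity measurement described in this record can be sketched as a simple count: take the first (or last) few residues of every protein and ask what fraction of proteins carry a tag shared by no other protein. The code below is an illustration under stated assumptions, not the TagIdent program; the function name and the four toy sequences are hypothetical and are not SWISS-PROT entries.

```python
from collections import Counter


def unique_tag_fraction(sequences, tag_length=4, terminus="N"):
    """Fraction of sequences whose terminal tag is shared by no other sequence.

    Hypothetical helper for illustration; not part of TagIdent.
    """
    if terminus == "N":
        tags = [seq[:tag_length] for seq in sequences]
    elif terminus == "C":
        tags = [seq[-tag_length:] for seq in sequences]
    else:
        raise ValueError("terminus must be 'N' or 'C'")
    counts = Counter(tags)
    unique = sum(1 for tag in tags if counts[tag] == 1)
    return unique / len(tags)


# Toy protein set (made up for this example).
toy_proteins = ["MKTAYIAKQR", "MKTALLVGSW", "MSEQNNTEMT", "MAHHHLLKQR"]

print(unique_tag_fraction(toy_proteins, tag_length=4, terminus="N"))
print(unique_tag_fraction(toy_proteins, tag_length=5, terminus="N"))
print(unique_tag_fraction(toy_proteins, tag_length=4, terminus="C"))
```

    In this toy set two proteins share the four-residue N-terminal tag MKTA, so extending the tag to five residues raises the unique fraction, which mirrors the trend reported in the abstract.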

  8. Isolation, characterization, and cDNA sequencing of alpha-1-antiproteinase-like protein from rainbow trout seminal plasma.

    PubMed

    Mak, Monika; Mak, Paweł; Olczak, Mariusz; Szalewicz, Agata; Glogowski, Jan; Dubin, Adam; Watorek, Wiesław; Ciereszko, Andrzej

    2004-03-17

    Seminal plasma of teleost fish contains serine proteinase inhibitors related to those present in blood. These inhibitors can be bound to Q-Sepharose and sequentially eluted with a NaCl gradient. In the present study, using a two-step procedure, we purified (73-fold to homogeneity) and characterized the inhibitor eluted as the second fraction of antitrypsin activity (inhibitor II) from Q-Sepharose. The molecular weight of this inhibitor was estimated to be 56 kDa with an isoelectric point of 5.4. It effectively inhibited trypsin and chymotrypsin but was less effective against elastase. It formed SDS-stable complexes with cod and bovine trypsin. Inhibitor II appeared to be a glycoprotein. Carbohydrate content was determined to be 16%. N-terminal Edman sequencing allowed identification of the first 30 N-terminal amino acids HDGDHAGHTEDHHHHLHHIAGEAHPQHSHG and 25 amino acids within the reactive loop IMPMSLPDTIMLNRPFLLFILEDST. The N-terminal sequence did not match any known sequence, however, the sequence within the reactive loop was significantly similar to carp and mammalian alpha1-antiproteinases. Both sequences were used to construct primers and obtain a cDNA sequence from liver. The mRNA coding the protein is 1675 nt in length including a single open reading frame of 1281 nt that encodes 426 amino acid residues. Analysis of this sequence indicated the presence of putative conserved serpin domains and confirmed the similarity to carp alpha1-antiproteinase and mammalian alpha1-antiproteinase. Our results indicate that inhibitor II belongs to the serpin superfamily and is similar to alpha1-antiproteinase.
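
    The reading-frame figures in this record can be checked with a line of arithmetic: 1281 nt is 427 codons, which gives 426 residues if the reported open reading frame length includes the stop codon. That inclusion is an assumption made here, not something the abstract states, and the function name below is hypothetical.

```python
def residues_from_orf(orf_nt, includes_stop=True):
    """Number of encoded residues for an open reading frame of orf_nt nucleotides.

    Assumes (by default) that the ORF length counts the terminating stop codon.
    """
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


# 1281 nt -> 427 codons -> 426 residues when the stop codon is counted in the ORF.
print(residues_from_orf(1281))
```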

  9. Bacterial pro-transglutaminase from Streptoverticillium mobaraense--purification, characterisation and sequence of the zymogen.

    PubMed

    Pasternack, R; Dorsch, S; Otterbach, J T; Robenek, I R; Wolf, S; Fuchsbauer, H L

    1998-11-01

    The zymogen of bacterial transglutaminase was found during cultivation of Streptoverticillium mobaraense (DSMZ strain) using rabbit antibodies raised against the active enzyme. Ion-exchange chromatography at pH 5.0 yielded a highly purified pro-enzyme. Structure information was obtained by means of Edman degradation and analysis of PCR amplified nucleotide fragments. The data revealed an excess of negatively charged amino acids in the pro-region resulting in a decreased isoelectric point of the zymogen. Additionally, the new sequence gave rise to some modifications to the previously published hypothetical structure of prepro-transglutaminase derived from genomic DNA [Washizu, K., Ando, K., Koikeda, S., Hirose, S., Matsuura, A., Takagi, H., Motoki, M. & Takeuchi, K. (1994) Biosci. Biotechnol. Biochem. 58, 82-87]. Inactive transglutaminase, which carries an activation peptide of 45 amino acids, has a calculated molecular mass of 42445 Da. Its pro-region provides for both suppression of activity and increased thermostability. Furthermore, it could be shown that the micro-organism produces a protease which cleaves pro-transglutaminase at the C-side of Pro45. Rapid transformation of the mature enzyme also occurs by addition of other proteases. During conversion, 43 and 41 amino acid peptides are released by bovine trypsin and dispase from Bacillus polymyxa, respectively. The detection of endogenous substrates in the murein layer makes discussion of the physiological role of bacterial transglutaminases necessary.

  10. Complete primary structure of a Lolium perenne (perennial rye grass) pollen allergen, Lol p III: comparison with known Lol p I and II sequences.

    PubMed

    Ansari, A A; Shenbagamurthi, P; Marsh, D G

    1989-10-17

    The complete amino acid sequence of a Lolium perenne (rye grass) pollen allergen, Lol p III, determined by the automated Edman degradation of the protein and its selected fragments, is reported in this paper. Cleavage by enzymatic and chemical techniques established unambiguously the sequence for this 97-residue protein (Mr = 10,909), which lacks cysteine and shows no evidence of glycosylation. The sequence of Lol p III is very similar to that of another L. perenne allergen, Lol p II, which was sequenced recently; of the 97 positions in the two proteins, 57 are occupied by identical amino acids (59% identity). In addition, both allergens share a similar structure with an antibody-binding fragment of a third L. perenne allergen, Lol p I. Since human antibody responsiveness to all these three allergens is associated with HLA-DR3, and since the structure common to the three molecules shows high degrees of amphipathicity in Lol p II and III, we speculate that this common segment in the three molecules might contain or contribute to the respective Ia/T-cell sites.
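
    The identity figure quoted in this record follows directly from the counts: 57 identical positions out of 97 is about 58.8%, which rounds to 59%. The sketch below shows the calculation for two pre-aligned, equal-length sequences; the function name and the toy peptides are hypothetical and are not Lol p II or Lol p III data.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with identical residues in two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Toy aligned peptides (made up): 8 of 10 positions match.
print(percent_identity("ACDEFGHIKL", "ACDEYGHIRL"))

# Counts reported in the abstract: 57 identical positions out of 97.
print(round(100 * 57 / 97))
```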

  11. Molecular characterization of a cytokinin-inducible periwinkle protein showing sequence homology with pathogenesis-related proteins and the Bet v 1 allergen family.

    PubMed

    Carpin, S; Laffer, S; Schoentgen, F; Valenta, R; Chénieux, J C; Rideau, M; Hamdi, S

    1998-03-01

    Cytokinin treatment of periwinkle callus cultures increased the accumulation of a protein, designated T1, in two-dimensional separated protein extracts. The first 30 NH2-terminal amino acids were determined by Edman degradation and showed significant sequence homology with intracellular pathogenesis-related (IPR) plant proteins and the Bet v 1 allergen family. The deduced amino acid sequence of cDNAs coding for T1, isolated by RT-PCR and 5' RACE-PCR, exhibited an average sequence identity of 40% with both IPR and Bet v 1-related allergens. T1 and all related proteins contained a p-loop motif typically found in nucleotide-binding proteins as the most conserved sequence feature. Northern blot analysis showed that cytokinin treatment of periwinkle callus induced T1 transcripts, whereas addition of 2,4-dichlorophenoxyacetic acid inhibited this accumulation. Hybridization of genomic periwinkle DNA with the T1 cDNA suggested that the protein is encoded by a single-copy gene. Immunoblot studies with a panel of Bet v 1-specific antibodies and sera from Bet v 1 allergic individuals identified T1 as a protein that is immunologically distinct from the Bet v 1 allergen family and has no allergenic properties.

  12. Sequencing and analysis of the gene encoding the alpha-toxin of Clostridium novyi proves its homology to toxins A and B of Clostridium difficile.

    PubMed

    Hofmann, F; Herrmann, A; Habermann, E; von Eichel-Streiber, C

    1995-06-25

    A library of total Clostridium novyi DNA was established and screened for the alpha-toxin gene (tcn alpha) by hybridization with oligonucleotides derived from a partial N-terminal sequence and by using specific antisera. Overlapping subgenic tcn alpha fragments were isolated and subsequently the total sequence of tcn alpha was determined. The 6534 nucleotide open reading frame encodes a polypeptide of M(r) 250,166 and pI 5.9. The N-terminal alpha-toxin (Tcn alpha) sequence MLITREQLMKIASIP determined by Edman degradation confirmed the identity of the reading frame and the assignment of the translation start point. The toxin is not modified posttranslationally at its N-terminus nor does it consist of different subunits. Overall the amino acid sequence shows 48% homology between the Tcn alpha and both toxins A (TcdA) and B (TcdB) of Clostridium difficile. The C-terminal 382 residues of Tcn alpha constitute a repetitive domain similar to those reported for TcdA and TcdB of C. difficile. The individual repeat motifs of these three toxins consist of oligopeptides some 19-52 amino acids in length, arranged in four to five different groups. Genetic, biochemical and pharmacological data thus confirm that the three toxins belong to one subgroup, designated large clostridial cytotoxins (LCT). Further definition of their structure and detailed molecular action should allow the LCTs to be used as tools for the analysis of microfilament assembly and function.

  13. Production, purification, sequencing and activity spectra of mutacins D-123.1 and F-59.1

    PubMed Central

    2011-01-01

    Background The increase in bacterial resistance to antibiotics impels the development of new anti-bacterial substances. Mutacins (bacteriocins) are small antibacterial peptides produced by Streptococcus mutans showing activity against bacterial pathogens. The objective of the study was to produce and characterise additional mutacins in order to find new useful antibacterial substances. Results Mutacin F-59.1 was produced in liquid media by S. mutans 59.1 while production of mutacin D-123.1 by S. mutans 123.1 was obtained in semi-solid media. Mutacins were purified by hydrophobic chromatography. The amino acid sequences of the mutacins were obtained by Edman degradation and their molecular mass was determined by mass spectrometry. Mutacin F-59.1 consists of 25 amino acids, containing the YGNGV consensus sequence of pediocin-like bacteriocins with a molecular mass calculated at 2719 Da. Mutacin D-123.1 has an identical molecular mass (2364 Da) with the same first 9 amino acids as mutacin I. Mutacins D-123.1 and F-59.1 have wide activity spectra inhibiting human and food-borne pathogens. The lantibiotic mutacin D-123.1 possesses a broader activity spectrum than mutacin F-59.1 against the bacterial strains tested. Conclusion Mutacin F-59.1 is the first pediocin-like bacteriocin identified and characterised that is produced by Streptococcus mutans. Mutacin D-123.1 appears to be identical to mutacin I previously identified in different strains of S. mutans. PMID:21477375

  14. Novel proline-hydroxyproline glycopeptides from the dandelion (Taraxacum officinale Wigg.) flowers: de novo sequencing and biological activity.

    PubMed

    Astafieva, Alexandra A; Enyenihi, Atim A; Rogozhin, Eugene A; Kozlov, Sergey A; Grishin, Eugene V; Odintsova, Tatyana I; Zubarev, Roman A; Egorov, Tsezi A

    2015-09-01

    Two novel homologous peptides named ToHyp1 and ToHyp2 that show no similarity to any known proteins were isolated from Taraxacum officinale Wigg. flowers by multidimensional liquid chromatography. Amino acid and mass spectrometry analyses demonstrated that the peptides have an unusual structure: they are cysteine-free, proline-hydroxyproline-rich and post-translationally glycosylated by pentoses, with 5 carbohydrates in ToHyp2 and 10 in ToHyp1. The ToHyp2 peptide with a monoisotopic molecular mass of 4350.3 Da was completely sequenced by a combination of Edman degradation and de novo sequencing via top down multistage collision induced dissociation (CID) and higher energy dissociation (HCD) tandem mass spectrometry (MS(n)). ToHyp2 consists of 35 amino acids and contains 18 proline residues, 8 of which are hydroxylated. The peptide displays antifungal activity and inhibits growth of Gram-positive and Gram-negative bacteria. We further showed that carbohydrate moieties have no significant impact on the peptide structure, but are important for antifungal activity although not absolutely necessary. The deglycosylated ToHyp2 peptide was less active against the susceptible fungus Bipolaris sorokiniana than the native peptide. Unique structural features of the ToHyp2 peptide place it into a new family of plant defense peptides. The discovery of ToHyp peptides in T. officinale flowers expands the repertoire of molecules of plant origin with practical applications.

  15. Purification and sequencing of radish seed calmodulin antagonists phosphorylated by calcium-dependent protein kinase.

    PubMed

    Polya, G M; Chandra, S; Condron, R

    1993-02-01

    A family of radish (Raphanus sativus) calmodulin antagonists (RCAs) was purified from seeds by extraction, centrifugation, batch-wise elution from carboxymethyl-cellulose, and high performance liquid chromatography (HPLC) on an SP5PW cation-exchange column. This RCA fraction was further resolved into three calmodulin antagonist polypeptides (RCA1, RCA2, and RCA3) by denaturation in the presence of guanidinium HCl and mercaptoethanol and subsequent reverse-phase HPLC on a C8 column eluted with an acetonitrile gradient in the presence of 0.1% trifluoroacetic acid. The RCA preparation, RCA1, RCA2, RCA3, and other radish seed proteins are phosphorylated by wheat embryo Ca(2+)-dependent protein kinase (CDPK). The RCA preparation contains other CDPK substrates in addition to RCA1, RCA2, and RCA3. The RCA preparation, RCA1, RCA2, and RCA3 inhibit chicken gizzard calmodulin-dependent myosin light chain kinase assayed with a myosin-light chain-based synthetic peptide substrate (fifty percent inhibitory concentrations of RCA2 and RCA3 are about 7 and 2 microM, respectively). N-terminal sequencing by sequential Edman degradation of RCA1, RCA2, and RCA3 revealed sequences having a high homology with the small subunit of the storage protein napin from Brassica napus and with related proteins. The deduced amino acid sequences of RCA1, RCA2, RCA3, and RCA3' (a subform of RCA3) agree with average molecular masses from electrospray mass spectrometry of 4537, 4543, 4532, and 4560 Da, respectively. The only sites for serine phosphorylation are near or at the C termini and hence adjacent to the sites of proteolytic precursor cleavage.

  16. Purification and sequencing of radish seed calmodulin antagonists phosphorylated by calcium-dependent protein kinase.

    PubMed Central

    Polya, G M; Chandra, S; Condron, R

    1993-01-01

    A family of radish (Raphanus sativus) calmodulin antagonists (RCAs) was purified from seeds by extraction, centrifugation, batch-wise elution from carboxymethyl-cellulose, and high performance liquid chromatography (HPLC) on an SP5PW cation-exchange column. This RCA fraction was further resolved into three calmodulin antagonist polypeptides (RCA1, RCA2, and RCA3) by denaturation in the presence of guanidinium HCl and mercaptoethanol and subsequent reverse-phase HPLC on a C8 column eluted with an acetonitrile gradient in the presence of 0.1% trifluoroacetic acid. The RCA preparation, RCA1, RCA2, RCA3, and other radish seed proteins are phosphorylated by wheat embryo Ca(2+)-dependent protein kinase (CDPK). The RCA preparation contains other CDPK substrates in addition to RCA1, RCA2, and RCA3. The RCA preparation, RCA1, RCA2, and RCA3 inhibit chicken gizzard calmodulin-dependent myosin light chain kinase assayed with a myosin-light chain-based synthetic peptide substrate (fifty percent inhibitory concentrations of RCA2 and RCA3 are about 7 and 2 microM, respectively). N-terminal sequencing by sequential Edman degradation of RCA1, RCA2, and RCA3 revealed sequences having a high homology with the small subunit of the storage protein napin from Brassica napus and with related proteins. The deduced amino acid sequences of RCA1, RCA2, RCA3, and RCA3' (a subform of RCA3) agree with average molecular masses from electrospray mass spectrometry of 4537, 4543, 4532, and 4560 Da, respectively. The only sites for serine phosphorylation are near or at the C termini and hence adjacent to the sites of proteolytic precursor cleavage. PMID:8278508

  17. The sequence of sequencers: The history of sequencing DNA.

    PubMed

    Heather, James M; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way.

  18. Preserving sequence annotations across reference sequences.

    PubMed

    Tatum, Zuotian; Roos, Marco; Gibson, Andrew P; Taschner, Peter Em; Thompson, Mark; Schultes, Erik A; Laros, Jeroen Fj

    2014-01-01

    Matching and comparing sequence annotations of different reference sequences is vital to genomics research, yet many annotation formats do not specify the reference sequence types or versions used. This makes the integration of annotations from different sources difficult and error prone. As part of our effort to create linked data for interoperable sequence annotations, we present an RDF data model for sequence annotation using the ontological framework established by the OBO Foundry ontologies and the Basic Formal Ontology (BFO). We defined reference sequences as the common domain of integration for sequence annotations, and identified three semantic relationships between sequence annotations. In doing so, we created the Reference Sequence Annotation to compensate for gaps in the SO and in its mapping to BFO, particularly for annotations that refer to versions of consensus reference sequences. Moreover, we present three integration models for sequence annotations using different reference assemblies. We demonstrated a working example of a sequence annotation instance, and how this instance can be linked to other annotations on different reference sequences. Sequence annotations in this format are semantically rich and can be integrated easily with different assemblies. We also identify other challenges of modeling reference sequences with the BFO.

  19. Preserving sequence annotations across reference sequences

    PubMed Central

    2014-01-01

    Background Matching and comparing sequence annotations of different reference sequences is vital to genomics research, yet many annotation formats do not specify the reference sequence types or versions used. This makes the integration of annotations from different sources difficult and error prone. Results As part of our effort to create linked data for interoperable sequence annotations, we present an RDF data model for sequence annotation using the ontological framework established by the OBO Foundry ontologies and the Basic Formal Ontology (BFO). We defined reference sequences as the common domain of integration for sequence annotations, and identified three semantic relationships between sequence annotations. In doing so, we created the Reference Sequence Annotation to compensate for gaps in the SO and in its mapping to BFO, particularly for annotations that refer to versions of consensus reference sequences. Moreover, we present three integration models for sequence annotations using different reference assemblies. Conclusions We demonstrated a working example of a sequence annotation instance, and how this instance can be linked to other annotations on different reference sequences. Sequence annotations in this format are semantically rich and can be integrated easily with different assemblies. We also identify other challenges of modeling reference sequences with the BFO. PMID:25093075

  20. Purification, amino acid sequence, and some properties of rabbit kidney lysozyme.

    PubMed

    Ito, Y; Yamada, H; Nakamura, S; Imoto, T

    1990-02-01

    The lysozyme (rabbit kidney lysozyme) from the homogenate of rabbit kidney (Japanese white) was purified by repeated cation-exchange chromatography on Bio-Rex 70. The amino acid sequence was determined by automated gas-phase Edman degradation of the peptides obtained from the digestion of reduced and S-carboxymethylated rabbit lysozyme with Achromobacter protease I (lysyl endopeptidase). The sequence thus determined was KIYERCELARTLKKLGLDGYKGVSLANWMCLAKWESSYNTRATNYNPGDKSTDYGIFQINSRYWCNDGKTPRAVNACHIPCSDLLKDDITQAVACAKRVVSDPQGIRAWVAWRNHCQNQDLTPYIRGCGV, indicating 25 amino acid substitutions from human lysozyme. The lytic activity of rabbit lysozyme against Micrococcus lysodeikticus at pH 7, ionic strength of 0.1, and 30 degrees C was found to be 190 and 60% of those of hen and human lysozymes, respectively. The lytic activity-pH profile of rabbit lysozyme was slightly different from those of hen and human lysozymes. While hen and human lysozymes had wide optimum activities at around pH 5.5-8.5, the optimum activity of rabbit lysozyme was at around pH 5.5-7.0. The high proline content (five residues per molecule compared with two prolines per molecule in hen or human lysozyme) is one of the interesting features of rabbit lysozyme. The transition temperatures for the unfolding of rabbit, human, and hen lysozymes in 3 M guanidine hydrochloride at pH 5.5 were 51.2, 45.5, and 45.4 degrees C, respectively, indicating that rabbit lysozyme is more stable than the other two lysozymes. The high proline content may be responsible for the increased stability of rabbit lysozyme.
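
    The proline content discussed in this abstract can be read directly off the reported sequence. The following is a minimal sketch, not part of the original study: it joins the three printed sequence segments into one string (assuming the spaces in the listing are line-break artifacts) and tallies residues with the standard-library Counter. The variable and function names are illustrative.

```python
from collections import Counter

# Sequence as printed in the abstract, with the three segments joined.
RABBIT_LYSOZYME = (
    "KIYERCELARTLKKLGLDGYKGVSLANWMCLAKWESSYNTRATNYNPGDKSTDYGIFQ"
    "INSRYWCNDGKTPRAVNACHIPCSDLLKDDITQAVACAKRVVSDPQGIRAWVAWRNHCQ"
    "NQDLTPYIRGCGV"
)


def composition(sequence: str) -> Counter:
    """Count occurrences of each one-letter residue code."""
    return Counter(sequence)


counts = composition(RABBIT_LYSOZYME)
print("length:", len(RABBIT_LYSOZYME))
print("prolines:", counts["P"])
```

    Run on the sequence above, the tally gives 130 residues and five prolines, in line with the "five residues per molecule" stated in the abstract.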

  1. Contrasting Sequence Groups by Emerging Sequences

    NASA Astrophysics Data System (ADS)

    Deng, Kang; Zaïane, Osmar R.

    Group comparison per se is a fundamental task in many scientific endeavours but is also the basis of any classifier. Contrast sets and emerging patterns contrast between groups of categorical data. Comparing groups of sequence data is a relevant task in many applications. We define Emerging Sequences (ESs) as subsequences that are frequent in sequences of one group and less frequent in the sequences of another, and thus distinguishing or contrasting sequences of different classes. There are two challenges to distinguish sequence classes: the extraction of ESs is not trivially efficient and only exact matches of sequences are considered. In our work we address those problems by a suffix tree-based framework and a similar matching mechanism. We propose a classifier based on Emerging Sequences. Evaluating against two learning algorithms based on frequent subsequences and exact matching subsequences, the experiments on two datasets show that our model outperforms the baseline approaches by up to 20% in prediction accuracy.
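
    The definition of an Emerging Sequence given in this abstract can be made concrete with a small example. The sketch below is a brute-force illustration only: it is not the suffix tree-based framework the authors describe, it treats patterns as contiguous substrings with exact matching (an assumption), and the thresholds, function names, and toy data are all hypothetical.

```python
from typing import List, Sequence, Set


def substrings(seq: str, max_len: int) -> Set[str]:
    """All contiguous substrings of seq with length 1..max_len."""
    return {
        seq[i:i + n]
        for n in range(1, max_len + 1)
        for i in range(len(seq) - n + 1)
    }


def support(pattern: str, group: Sequence[str]) -> float:
    """Fraction of sequences in group that contain pattern."""
    if not group:
        return 0.0
    return sum(pattern in s for s in group) / len(group)


def emerging_sequences(
    target: Sequence[str],
    other: Sequence[str],
    min_support: float = 0.6,        # assumed threshold for "frequent"
    max_other_support: float = 0.2,  # assumed threshold for "less frequent"
    max_len: int = 3,
) -> List[str]:
    """Patterns frequent in target and infrequent in other, by exhaustive search."""
    candidates: Set[str] = set()
    for s in target:
        candidates |= substrings(s, max_len)
    return sorted(
        p for p in candidates
        if support(p, target) >= min_support
        and support(p, other) <= max_other_support
    )


# Toy data, invented for illustration.
group_a = ["ABCD", "XABC", "ABCY"]
group_b = ["BCDA", "DCBA", "CBAD"]
print(emerging_sequences(group_a, group_b))
```

    With the toy groups above, the patterns AB and ABC occur in every sequence of the first group and in none of the second, so they are the only ones returned.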

  2. Amino acid sequence, S-S bridge arrangement and distribution in plant tissues of thionins from Viscum album.

    PubMed

    Orrù, S; Scaloni, A; Giannattasio, M; Urech, K; Pucci, P; Schaller, G

    1997-09-01

    The complete primary structure of a cytotoxic 5 kDa polypeptide, viscotoxin A1, isolated from Viscum album L., has been determined by combining classical Edman degradation methodology with advanced mass spectrometric procedures. The same integrated approach allowed correction of the sequence of viscotoxin A2 and definition of the pattern of the disulfide bridges. The arrangement of the cysteine pairing was determined as Cys3-Cys40, Cys4-Cys32 and Cys16-Cys26. The primary structure of viscotoxin A1 shares a high degree of similarity with the known viscotoxins and more generally with the plant alpha- and beta-thionins. The pattern of S-S bridges determined for viscotoxin A2 and A1 is similar to that inferred by X-ray and NMR analysis in crambin and related to that present in alpha-purothionin and beta-hordothionin, thus indicating a highly conserved organization of the S-S pairings within the entire family. This arrangement of S-S bridges describes a peculiar structural motif, indicated as 'concentric motif', which is suggested to stabilize a common structure occurring in various small proteins able to interact with cell membranes. The distribution of the new variant toxin in different mistletoe subspecies was investigated. Viscotoxin A1 is abundant in the seeds of the three European subspecies of V. album whereas it represents a minor component in the shoots.

  3. Isolation, amino acid sequence and biological characterization of an "aspartic-49" phospholipase A₂ from Bothrops (Rhinocerophis) ammodytoides venom.

    PubMed

    Clement, Herlinda; Costa de Oliveira, Vanessa; Zamudio, Fernando Z; Lago, Néstor R; Valdez-Cruz, Norma A; Bérnard Valle, Melisa; Hajos, Silvia E; Alagón, Alejandro; Possani, Lourival D; de Roodt, Adolfo R

    2012-12-01

    A phospholipase enzyme was separated by chromatography from the venom of the snake Bothrops (Rhinocerophis) ammodytoides and characterized. The experimentally determined molecular weight was 13,853.65 Da, and the full primary structure was determined by Edman degradation and mass spectrometry analysis. The enzyme contains 122 amino acid residues stabilized by 7 disulfide bridges and has an isoelectric point of 6.13. Sequence comparison with other known secretory PLA2s shows that the enzyme isolated belongs to group II, presenting an aspartic acid residue at position 48 (numbered by convention as Asp49) of the active site, and accordingly displaying enzymatic activity. The enzyme corresponds to 3% of the total mass of the venom. The enzyme is mildly toxic to mice. The intravenous LD₅₀ of this phospholipase in CD-1 mice was around 6 μg/g of mouse body weight (more exactly 117 μg/mouse of 20 g) and the minimal mortal dose (MMD) was estimated to be close to 10 μg/g. In contrast, the LD₅₀ of the venom was circa 2 μg/g mouse body weight. Toxicological analyses of the purified enzyme were performed in vitro and in vivo using experimental animals (mice and rats). The enzyme at high doses caused pulmonary congestion, intraperitoneal bleeding, inhibition of clot retraction and muscle tissue alterations with increased creatine kinase levels.
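
    The two ways this abstract states the enzyme LD₅₀ (per gram and per 20 g mouse) can be reconciled with a one-line conversion. This is a sketch for checking the arithmetic only; the function name is hypothetical and the input values are those quoted in the abstract.

```python
def dose_per_gram(dose_ug_per_mouse: float, body_weight_g: float) -> float:
    """Convert a per-animal dose (ug) into ug per gram of body weight."""
    if body_weight_g <= 0:
        raise ValueError("body_weight_g must be positive")
    return dose_ug_per_mouse / body_weight_g


# Values quoted in the abstract: 117 ug per 20 g mouse; venom LD50 about 2 ug/g.
enzyme_ld50 = dose_per_gram(117.0, 20.0)
venom_ld50 = 2.0
ratio = enzyme_ld50 / venom_ld50

print(f"enzyme LD50: {enzyme_ld50:.2f} ug/g")
print(f"enzyme LD50 / venom LD50: {ratio:.1f}")
```

    The conversion gives 5.85 μg/g, consistent with "around 6 μg/g", and shows that roughly three times more purified enzyme than whole venom is needed per gram for the same lethality.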

  4. Purification and complete amino acid sequence of a new type of sweet protein with taste-modifying activity, curculin.

    PubMed

    Yamashita, H; Theerasilp, S; Aiuchi, T; Nakaya, K; Nakamura, Y; Kurihara, Y

    1990-09-15

    A new taste-modifying protein named curculin was extracted with 0.5 M NaCl from the fruits of Curculigo latifolia and purified by ammonium sulfate fractionation, CM-Sepharose ion-exchange chromatography, and gel filtration. Purified curculin thus obtained gave a single band having a Mr of 12,000 on sodium dodecyl sulfate-polyacrylamide gel electrophoresis in the presence of 8 M urea. The molecular weight determined by low-angle laser light scattering was 27,800. These results suggest that native curculin is a dimer of a 12,000-Da polypeptide. The complete amino acid sequence of curculin was determined by automatic Edman degradation. Curculin consists of 114 residues. Curculin itself elicits a sweet taste. After curculin, water elicits a sweet taste, and sour substances induce a stronger sense of sweetness. No protein with both sweet-tasting and taste-modifying activities has ever been found. There are five sets of tripeptides common to miraculin (a taste-modifying protein), six sets of tripeptides common to thaumatin (a sweet protein), and two sets of tripeptides common to monellin (a sweet protein). Anti-miraculin serum was not immunologically reactive with curculin. The mechanism of the taste-modifying action of curculin is discussed.
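
    The dimer inference in this abstract follows from comparing the native mass with the subunit mass. The sketch below only reproduces that ratio; the function name is hypothetical, and rounding to the nearest integer is an assumption that is reasonable here because the SDS-PAGE value is approximate.

```python
def estimated_subunits(native_mass: float, subunit_mass: float) -> int:
    """Nearest whole number of subunits implied by two mass estimates."""
    if subunit_mass <= 0:
        raise ValueError("subunit_mass must be positive")
    return round(native_mass / subunit_mass)


# Values from the abstract: 27,800 by light scattering, 12,000 by SDS-PAGE.
mass_ratio = 27_800 / 12_000
print(f"ratio: {mass_ratio:.2f}")
print("subunits:", estimated_subunits(27_800, 12_000))
```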

  5. Hydrogen ion titration of 12 S rape seed protein and partial N-terminal sequence of one of its subunits.

    PubMed

    Bhushan, R; Mahesh, V K; Mallikharjun, P V

    1989-10-01

    The high molecular weight 12 S protein from rape seed was isolated in a homogeneous form and characterized. Six subunits were isolated by PAGE in the presence of SDS and 0.2 M 2-mercaptoethanol. These subunits (s1 to s6) were found in the protein in the weight ratio of 1.32:1.2:1.15:1.0:1.21:1.11. The molecular weights and first two N-terminal amino acids of the isolated subunits were 64,800 and phenylalanine, alanine (s1), 50,650 and valine, tyrosine (s2), 42,500 and phenylalanine, leucine (s3), 28,800 and threonine, glutamic acid (s4), 19,100 and cystine, isoleucine (s5) and 15,600 and alanine, phenylalanine (s6). The numbers of side chain carboxyl, imidazole and epsilon-amino groups were calculated from the hydrogen ion titrations and were in agreement with the amino acid assay. In addition, the N-terminal amino acid sequence of up to 43 residues for one subunit (s6), determined by Edman degradation, is reported.

  6. X-ray sequence and crystal structure of luffaculin 1, a novel type 1 ribosome-inactivating protein

    PubMed Central

    Hou, Xiaomin; Chen, Minghuang; Chen, Liqing; Meehan, Edward J; Xie, Jieming; Huang, Mingdong

    2007-01-01

    Background Protein sequence can be obtained through Edman degradation, mass spectrometry, or cDNA sequencing. High resolution X-ray crystallography can also be used to derive protein sequence information, but faces the difficulty in distinguishing the Asp/Asn, Glu/Gln, and Val/Thr pairs. Luffaculin 1 is a new type 1 ribosome-inactivating protein (RIP) isolated from the seeds of Luffa acutangula. Besides rRNA N-glycosidase activity, luffaculin 1 also demonstrates activities including inhibiting tumor cells' proliferation and inducing tumor cells' differentiation. Results The crystal structure of luffaculin 1 was determined at 1.4 Å resolution. Its amino-acid sequence was derived from this high resolution structure using the following criteria: 1) high resolution electron density; 2) comparison of electron density between two molecules that exist in the same crystal; 3) evaluation of the chemical environment of residues to break down the sequence assignment ambiguity in residue pairs Glu/Gln, Asp/Asn, and Val/Thr; 4) comparison with sequences of the homologous proteins. Using the criteria 1 and 2, 66% of the residues can be assigned. By incorporating with criterion 3, 86% of the residues were assigned, suggesting the effectiveness of chemical environment evaluation in breaking down residue ambiguity. In total, 94% of the luffaculin 1 sequence was assigned with high confidence using this improved X-ray sequencing strategy. Two N-acetylglucosamine moieties, linked respectively to the residues Asn77 and Asn84, can be identified in the structure. Residues Tyr70, Tyr110, Glu159 and Arg162 define the active site of luffaculin 1 as an RNA N-glycosidase. Conclusion X-ray sequencing method can be effective to derive sequence information of proteins. The evaluation of the chemical environment of residues is a useful method to break down the assignment ambiguity in Glu/Gln, Asp/Asn, and Val/Thr pairs. The sequence and the crystal structure confirm that luffaculin 1 is a new

  7. Shotgun protein sequencing.

    SciTech Connect

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternative spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.
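
    The idea of rebuilding a sequence from overlapping peptides can be illustrated with a simple greedy merge. This is a sketch under stated assumptions, not the method of the SAND report: it assumes error-free peptides, a hypothetical minimum overlap of three residues, and a single protein. The test peptides are hand-cut, invented fragments of the 15-residue string MLITREQLMKIASIP, not real digest data.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(peptides: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    # Drop fragments fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps; return the contigs found so far
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Invented overlapping fragments standing in for peptides from different digests.
peptides = ["LMKIASIP", "MLITREQ", "TREQLMKI"]
print(greedy_assemble(peptides))
```

    A greedy merge is the simplest choice and can mis-assemble repetitive sequences; it is used here only to show how overlaps between peptides from different digests determine their order.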

  8. The sequence of sequencers: The history of sequencing DNA

    PubMed Central

    Heather, James M.; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. PMID:26554401

  9. Multimodal sequence learning.

    PubMed

    Kemény, Ferenc; Meier, Beat

    2016-02-01

    While sequence learning research models complex phenomena, previous studies have mostly focused on unimodal sequences. The goal of the current experiment is to put implicit sequence learning into a multimodal context: to test whether it can operate across different modalities. We used the Task Sequence Learning paradigm to test whether sequence learning varies across modalities, and whether participants are able to learn multimodal sequences. Our results show that implicit sequence learning is very similar regardless of the source modality. However, the presence of correlated task and response sequences was required for learning to take place. The experiment provides new evidence for implicit sequence learning of abstract conceptual representations. In general, the results suggest that correlated sequences are necessary for implicit sequence learning to occur. Moreover, they show that elements from different modalities can be automatically integrated into one unitary multimodal sequence.

  10. Whole Genome Sequencing

    MedlinePlus

    What is whole genome sequencing? Whole genome sequencing is the mapping out ...

  11. Complete amino acid sequence of an acidic, cardiotoxic phospholipase A2 from the venom of Ophiophagus hannah (King Cobra): a novel cobra venom enzyme with "pancreatic loop".

    PubMed

    Huang, M Z; Gopalakrishnakone, P; Chung, M C; Kini, R M

    1997-02-15

    A phospholipase A2 (OHV A-PLA2) from the venom of Ophiophagus hannah (King cobra) is an acidic protein exhibiting cardiotoxicity, myotoxicity, and antiplatelet activity. The complete amino acid sequence of OHV A-PLA2 has been determined using a combination of Edman degradation and mass spectrometric techniques. OHV A-PLA2 is composed of a single chain of 124 amino acid residues with 14 cysteines and a calculated molecular weight of 13719 Da. It contains the loop of residues (62-66) found in pancreatic PLA2s and hence belongs to class IB enzymes. This pancreatic loop is between two proline residues (Pro 59 and Pro 68) and contains several hydrophilic amino acids (Ser and Asp). This region has a high degree of conformational flexibility and is on the surface of the molecule, and hence it may be a potential protein-protein interaction site. A relatively low sequence homology is found between OHV A-PLA2 and other known cardiotoxic PLA2s, and hence a contiguous segment could not be identified as a site responsible for the cardiotoxic activity.

  12. Isolation, amino acid sequence and biological activities of novel long-chain polyamine-associated peptide toxins from the sponge Axinyssa aculeata.

    PubMed

    Matsunaga, Satoko; Jimbo, Mitsuru; Gill, Martin B; Wyhe, L Leanne Lash-Van; Murata, Michio; Nonomura, Ken'ichi; Swanson, Geoffrey T; Sakai, Ryuichi

    2011-09-19

    A novel family of functionalized peptide toxins, aculeines (ACUs), was isolated from the marine sponge Axinyssa aculeata. ACUs are polypeptides with N-terminal residues that are modified by the addition of long-chain polyamines (LCPA). Aculeines were present in the sponge extract as a complex mixture with differing polyamine chain lengths and peptide structures. ACU-A and B, which were purified in this study, share a common polypeptide chain but differ in their N-terminal residue modifications. The amino acid sequence of the polypeptide portion of ACU-A and B was deduced from 3' and 5' RACE, and supported by Edman degradation and mass spectral analysis of peptide fragments. ACU induced convulsions upon intracerebroventricular (i.c.v.) injection in mice, and disrupted neuronal membrane integrity in electrophysiological assays. ACU also lysed erythrocytes with a potency that differed between animal species. Here we describe the isolation, amino acid sequence, and biological activity of this new group of cytotoxic sponge peptides.

  13. De novo sequencing and characterization of a novel Bowman-Birk inhibitor from Lathyrus sativus L. seeds by electrospray mass spectrometry.

    PubMed

    Tamburino, Rachele; Severino, Valeria; Sandomenico, Annamaria; Ruvo, Menotti; Parente, Augusto; Chambery, Angela; Di Maro, Antimo

    2012-10-30

    Bowman-Birk serine protease inhibitors (BBIs) from legume seeds are small proteins showing a two-head structure with distinct reactive site loops, which inhibit two molecules of the same enzyme or two different proteases. Purification and characterization of new BBIs is of broad interest for understanding the basic molecular mechanisms underlying natural defence against the action of proteolytic enzymes. In this study, two novel acidic BBIs (LSI-1a and LSI-2a) were isolated from L. sativus seeds using classical biochemical techniques and characterized for their inhibitory activity. In addition, the N-terminal sequencing of LSI-1a was performed by Edman degradation up to residue 10 and the complete primary structure of the most abundant form (LSI-2a) was determined by using a combination of mass spectrometry approaches, including MALDI-TOF MS, tandem MS and Electron Transfer Dissociation coupled with Proton Transfer Reaction (ETD/PTR) top-down sequencing of N- and C-termini. Furthermore, the LSI-2a dimerization surface has also been investigated by a combination of gel filtration, electrophoretic techniques and homology modelling. Knowing the structure of small proteins inhibiting proteolytic enzymes is of general importance for understanding the defence mechanisms against degradation for their use in biological applications as well as for designing artificial inhibitors.

  14. Purification, Characterization, and Gene Sequence of Michiganin A, an Actagardine-Like Lantibiotic Produced by the Tomato Pathogen Clavibacter michiganensis subsp. michiganensis

    PubMed Central

    Holtsmark, I.; Mantzilas, D.; Eijsink, V. G. H.; Brurberg, M. B.

    2006-01-01

    Members of the actinomycete genus Clavibacter are known to produce antimicrobial compounds, but so far none of these compounds has been purified and characterized. We have isolated an antimicrobial peptide, michiganin A, from the tomato pathogen Clavibacter michiganensis subsp. michiganensis, using ammonium sulfate precipitation followed by cation-exchange and reversed-phase chromatography steps. Upon chemical derivatization of putative dehydrated amino acids and lanthionine bridges by alkaline ethanethiol, Edman degradation yielded sequence information that proved to be sufficient for cloning of the gene by a genome-walking strategy. The mature unmodified peptide consists of 21 amino acids, SSSGWLCTLTIECGTIICACR. All of the threonine residues undergo dehydration, and three of them interact with cysteines via thioether bonds to form methyllanthionine bridges. Michiganin A resembles actagardine, a type B lantibiotic with a known three-dimensional structure, produced by Actinoplanes liguriae, which is a filamentous actinomycete. The DNA sequence of the gene showed that the michiganin A precursor contains an unusual putative signal peptide with no similarity to well-known secretion signals and only very limited similarity to the (only two) available leader peptides of other type B lantibiotics. Michiganin A inhibits the growth of Clavibacter michiganensis subsp. sepedonicus, the causal agent of ring rot of potatoes, with MICs in the low nanomolar range. Thus, michiganin A may have some potential in biological control of potato ring rot. PMID:16957199

  15. Purification, characterization, and gene sequence of michiganin A, an actagardine-like lantibiotic produced by the tomato pathogen Clavibacter michiganensis subsp. michiganensis.

    PubMed

    Holtsmark, I; Mantzilas, D; Eijsink, V G H; Brurberg, M B

    2006-09-01

    Members of the actinomycete genus Clavibacter are known to produce antimicrobial compounds, but so far none of these compounds has been purified and characterized. We have isolated an antimicrobial peptide, michiganin A, from the tomato pathogen Clavibacter michiganensis subsp. michiganensis, using ammonium sulfate precipitation followed by cation-exchange and reversed-phase chromatography steps. Upon chemical derivatization of putative dehydrated amino acids and lanthionine bridges by alkaline ethanethiol, Edman degradation yielded sequence information that proved to be sufficient for cloning of the gene by a genome-walking strategy. The mature unmodified peptide consists of 21 amino acids, SSSGWLCTLTIECGTIICACR. All of the threonine residues undergo dehydration, and three of them interact with cysteines via thioether bonds to form methyllanthionine bridges. Michiganin A resembles actagardine, a type B lantibiotic with a known three-dimensional structure, produced by Actinoplanes liguriae, which is a filamentous actinomycete. The DNA sequence of the gene showed that the michiganin A precursor contains an unusual putative signal peptide with no similarity to well-known secretion signals and only very limited similarity to the (only two) available leader peptides of other type B lantibiotics. Michiganin A inhibits the growth of Clavibacter michiganensis subsp. sepedonicus, the causal agent of ring rot of potatoes, with MICs in the low nanomolar range. Thus, michiganin A may have some potential in biological control of potato ring rot.

  16. Isolation, Amino Acid Sequence and Biological Activities of Novel Long-Chain Polyamine-Associated Peptide Toxins from the Sponge Axinyssa aculeata

    PubMed Central

    Matsunaga, Satoko; Jimbo, Mitsuru; Gill, Martin B.; Lash-Van Wyhe, L. Leanne; Murata, Michio; Nonomura, Ken’ichi; Swanson, Geoffrey T.

    2012-01-01

    A novel family of functionalized peptide toxins, aculeines (ACUs), was isolated from the marine sponge Axinyssa aculeata. ACUs are polypeptides with N-terminal residues that are modified by the addition of long-chain polyamines (LCPA). Aculeines were present in the sponge extract as a complex mixture with differing polyamine chain lengths and peptide structures. ACU-A and B, which were purified in this study, share a common polypeptide chain but differ in their N-terminal residue modifications. The amino acid sequence of the polypeptide portion of ACU-A and B was deduced from 3′ and 5′ RACE, and supported by Edman degradation and mass spectral analysis of peptide fragments. ACU induced convulsions upon intracerebroventricular (i.c.v.) injection in mice, and disrupted neuronal membrane integrity in electrophysiological assays. ACU also lysed erythrocytes with a potency that differed between animal species. Here we describe the isolation, amino acid sequence, and biological activity of this new group of cytotoxic sponge peptides. PMID:21830292

  17. Coordinate cytokine regulatory sequences

    DOEpatents

    Frazer, Kelly A.; Rubin, Edward M.; Loots, Gabriela G.

    2005-05-10

    The present invention provides CNS sequences that regulate the cytokine gene expression, expression cassettes and vectors comprising or lacking the CNS sequences, host cells and non-human transgenic animals comprising the CNS sequences or lacking the CNS sequences. The present invention also provides methods for identifying compounds that modulate the functions of CNS sequences as well as methods for diagnosing defects in the CNS sequences of patients.

  18. Amino acid sequence and chemical modification of a novel alpha-neurotoxin (Oh-5) from king cobra (Ophiophagus hannah) venom.

    PubMed

    Lin, S R; Leu, L F; Chang, L S; Chang, C C

    1997-04-01

    A novel alpha-neurotoxin, Oh-5, was isolated from king cobra (Ophiophagus hannah) venom and purified by successive SP-Sephadex C-25 column chromatography and reversed-phase HPLC. The complete sequence of Oh-5 was determined by Edman degradation of peptide fragments generated by endopeptidases, i.e., trypsin, Staphylococcus aureus V8 protease and lysyl endopeptidase. This novel toxin comprises 72 amino acid residues with 10 cysteines. The sequence shows 89% sequence homology with Oh-4, and 60% with Toxins a and b from the same venom. The tyrosine, tryptophan, lysine and arginine residues in Oh-5 were modified with tetranitromethane (TNM), 2-nitrophenylsulfenyl (NPS) chloride, trinitrobenzene sulfonate (TNBS), and p-hydroxyphenylglyoxal (HPG), respectively. Modification of Tyr-4 or Trp-27 did not affect the lethal toxicity at all, while the Tyr-4 and 23 nitrated derivative retained about 50% of the lethality of native toxin. Selective trinitrophenylation of Lys-51 or 69 resulted in a decrease in lethality by 29%, and 50% lethality was retained after modification of Lys-2, 51, and 69. A drastic decrease in lethality to 26% was observed when both Arg-35 and 37 were modified. The neurotoxicity was further decreased when Arg-9 was additionally modified. These results suggest that the aromatic residues, Tyr-4 and Trp-27, are not crucial for the neurotoxicity, whereas the cationic residues are involved in multipoint contact between the toxin molecule and the nicotinic acetylcholine receptor (nAChR). The residues Tyr-23 and Arg-35 and 37 in the central loop of Oh-5 seem to contribute greatly to the neurotoxicity.

  19. MRO Sequence Checking Tool

    NASA Technical Reports Server (NTRS)

    Fisher, Forest; Gladden, Roy; Khanampornpan, Teerapat

    2008-01-01

    The MRO Sequence Checking Tool program, mro_check, automates significant portions of the MRO (Mars Reconnaissance Orbiter) sequence checking procedure. Though MRO has similar checks to the ODY's (Mars Odyssey) Mega Check tool, the checks needed for MRO are unique to the MRO spacecraft. The MRO sequence checking tool automates the majority of the sequence validation procedure and check lists that are used to validate the sequences generated by MRO MPST (mission planning and sequencing team). The tool performs more than 50 different checks on the sequence. The automation varies from summarizing data about the sequence needed for visual verification of the sequence, to performing automated checks on the sequence and providing a report for each step. To allow for the addition of new checks as needed, this tool is built in a modular fashion.

  20. Fine structural analysis of the Zoogloea ramigera phbA-phbB locus encoding beta-ketothiolase and acetoacetyl-CoA reductase: nucleotide sequence of phbB.

    PubMed

    Peoples, O P; Sinskey, A J

    1989-03-01

    A series of expression plasmids containing either the complete insert from plasmid pUCDBK1 (Peoples et al., 1987) or sub-fragments thereof were constructed in a tac promoter vector. Analysis of protein lysates of induced cultures of these clones identified the gene encoding NADPH-specific acetoacetyl-CoA reductase in the 2.3kb of sequence located downstream from the beta-ketothiolase gene in plasmid pUCDBK1. The complete nucleotide sequence (2.1kb) of this region was determined. An open reading frame was located 88bp downstream from the stop codon of the thiolase gene encoding a potential polypeptide of Mr 25,000, which is in good agreement with that observed for the overexpressed protein on SDS-PAGE. N-terminal protein sequence data obtained by Edman degradation of the purified Mr = 25,000 polypeptide were used to identify the correct start of the NADPH-specific acetoacetyl-CoA reductase gene. Hence in Z. ramigera, the genes encoding beta-ketothiolase (phbA) and NADPH-specific acetoacetyl-CoA reductase (phbB) are organized as phbA-phbB. S1-nuclease analysis of Z. ramigera RNA identified a transcription start site 85 bp upstream from the phbA structural gene locating the promoter region.

  1. The Connell Sum Sequence

    NASA Astrophysics Data System (ADS)

    Bullington, Grady D.

    2007-01-01

    The Connell sum sequence refers to the partial sums of the Connell sequence. In this paper, the Connell sequence, Connell sum sequence and generalizations from Iannucci and Mills-Taylor are interpreted as sums of elements of triangles, relating them to polygonal number-stuttered arithmetic progressions. The n-th element of the Connell sum sequence is established as a sharp upper bound for the value of a gamma-labeling of a graph of size n. The limiting behavior and an explicit formula for the Connell (m,r)-sum sequence are also given.
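
    As a small illustration of the objects named in this abstract, the sketch below generates the classical Connell sequence (one odd number, then two even numbers, then three odd numbers, and so on) and its partial sums. The abstract itself does not spell out this definition, so the construction here follows the usual textbook one, and the function names are hypothetical. The (m,r) generalizations are not covered.

```python
from itertools import accumulate, islice


def connell():
    """Yield the classical Connell sequence: 1, 2, 4, 5, 7, 9, 10, ...

    Block k holds k terms of the same parity; each block starts at the
    integer right after the last term of the previous block.
    """
    value = 0
    block = 1
    while True:
        value += 1              # first term of the current block
        yield value
        for _ in range(block - 1):
            value += 2          # remaining terms keep the same parity
            yield value
        block += 1


def connell_terms(n):
    """Return the first n terms of the Connell sequence."""
    return list(islice(connell(), n))


def connell_sums(n):
    """Return the first n terms of the Connell sum sequence (partial sums)."""
    return list(accumulate(connell_terms(n)))


if __name__ == "__main__":
    print(connell_terms(10))
    print(connell_sums(10))
```

    Under this definition the first ten terms are 1, 2, 4, 5, 7, 9, 10, 12, 14, 16 and the first ten partial sums are 1, 3, 7, 12, 19, 28, 38, 50, 64, 80.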

  2. Genome Sequence Databases (Overview): Sequencing and Assembly

    SciTech Connect

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  3. Automated DNA Sequencing System

    SciTech Connect

    Armstrong, G.A.; Ekkebus, C.P.; Hauser, L.J.; Kress, R.L.; Mural, R.J.

    1999-04-25

    Oak Ridge National Laboratory (ORNL) is developing a core DNA sequencing facility to support biological research endeavors at ORNL and to conduct basic sequencing automation research. This facility is novel because its development is based on existing standard biology laboratory equipment; thus, the development process is of interest to the many small laboratories trying to use automation to control costs and increase throughput. Before automation, biology laboratory personnel purified DNA, completed cycle sequencing, and prepared 96-well sample plates with commercially available hardware designed specifically for each step in the process. Following purification and thermal cycling, an automated sequencing machine was used for the sequencing. A technician handled all movement of the 96-well sample plates between machines. To automate the process, ORNL is adding a CRS Robotics A-465 arm, ABI 377 sequencing machine, automated centrifuge, automated refrigerator, and possibly an automated SpeedVac. The entire system will be integrated with one central controller that will direct each machine and the robot. The goal of this system is to completely automate the sequencing procedure from bacterial cell samples through ready-to-be-sequenced DNA and ultimately to completed sequence. The system will be flexible and will accommodate different chemistries than existing automated sequencing lines. The system will be expanded in the future to include colony picking and/or actual sequencing. This discrete event, DNA sequencing system will demonstrate that smaller sequencing labs can achieve cost-effective automation as the laboratory grows.

  4. Sequence information signal processor

    NASA Technical Reports Server (NTRS)

    Peterson, John C. (Inventor); Chow, Edward T. (Inventor); Waterman, Michael S. (Inventor); Hunkapillar, Timothy J. (Inventor)

    1999-01-01

    An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
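
    The comparison described in this abstract (a score for the best segment ending at each pair of elements, with the overall maximum reported) resembles local-alignment dynamic programming, so a software sketch of that idea may help. The sketch below is not the patented circuit: the function name, the scoring values (match +2, mismatch -1, gap -1) and the use of gap penalties are all assumptions made for illustration. Each pass of the outer loop plays the role of one stored element being compared against successive elements of the other sequence.

```python
def best_local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the highest local-alignment score between sequences a and b.

    score[i][j] is the best score of any aligned segment that ends at
    a[i-1] and b[j-1]; only the previous row is kept in memory.
    Scoring values are illustrative assumptions, not from the source.
    """
    prev = [0] * (len(b) + 1)   # scores for the previous element of a
    best = 0
    for x in a:
        curr = [0]              # column 0: empty prefix of b
        for j, y in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if x == y else mismatch)
            score = max(
                0,                  # start a fresh segment here
                diagonal,           # extend by pairing x with y
                prev[j] + gap,      # x aligned against a gap
                curr[j - 1] + gap,  # y aligned against a gap
            )
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(best_local_alignment_score("GGACGTCC", "TTACGTAA"))
```

    In this sketch the inner loop runs serially; in a linear processor array like the one the abstract describes, the cells along each anti-diagonal could be evaluated at the same time, because each cell depends only on its left, upper and upper-left neighbours.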

  5. Sequence information signal processor

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1999-01-01

    An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.

  6. Roles of repetitive sequences

    SciTech Connect

    Bell, G.I.

    1991-12-31

    The DNA of higher eukaryotes contains many repetitive sequences. The study of repetitive sequences is important, not only because many have important biological function, but also because they provide information on genome organization, evolution and dynamics. In this paper, I will first discuss some generic effects that repetitive sequences will have upon genome dynamics and evolution. In particular, it will be shown that repetitive sequences foster recombination among, and turnover of, the elements of a genome. I will then consider some examples of repetitive sequences, notably minisatellite sequences and telomere sequences as examples of tandem repeats, without and with respectively known function, and Alu sequences as an example of interspersed repeats. Some other examples will also be considered in less detail.

  7. Nonparametric Combinatorial Sequence Models

    NASA Astrophysics Data System (ADS)

    Wauthier, Fabian L.; Jordan, Michael I.; Jojic, Nebojsa

    This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology but methodologies for analyzing them are still being developed. This paper presents a nonparametric prior on sequences which allows combinatorial structures to emerge and which induces a posterior distribution over factorized sequence representations. We carry out experiments on three sequence datasets which indicate that combinatorial structures are indeed present and that combinatorial sequence models can more succinctly describe them than simpler mixture models. We conclude with an application to MHC binding prediction which highlights the utility of the posterior distribution induced by the prior. By integrating out the posterior our method compares favorably to leading binding predictors.

  8. Cellulases and coding sequences

    SciTech Connect

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-01-01

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  9. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-02-20

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  10. DNA sequencing conference, 2

    SciTech Connect

    Cook-Deegan, R.M.; Venter, J.C.; Gilbert, W.; Mulligan, J.; Mansfield, B.K.

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  11. Schur monotone decreasing sequences

    NASA Astrophysics Data System (ADS)

    Ganikhodjaev, Rasul; Saburov, Mansoor; Saburov, Khikmat

    2013-09-01

    In this paper, we introduce Schur monotone decreasing sequences in an n-dimensional space by considering a majorization pre-order. By means of down arrow mappings, we study omega limiting points of bounded Schur monotone decreasing sequences. We provide convergence criteria for such kinds of sequences. We prove that a Cesaro mean (or an arithmetic mean) of any bounded Schur monotone decreasing sequences converges to a unique limiting point.

  12. Homology of the NH2-terminal amino acid sequences of the heavy and light chains of human monoclonal lupus autoantibodies containing the dominant 16/6 idiotype.

    PubMed Central

    Atkinson, P M; Lampman, G W; Furie, B C; Naparstek, Y; Schwartz, R S; Stollar, B D; Furie, B

    1985-01-01

    The NH2-terminal amino acid sequences have been determined by automated Edman degradation for the heavy and light chains of five monoclonal IgM anti-DNA autoantibodies that were produced by human-human hybridomas derived from lymphocytes of two patients with systemic lupus erythematosus. Four of the antibodies were closely related to the idiotype system 16/6, whereas the fifth antibody was unrelated idiotypically. The light chains of the 16/6 idiotype-positive autoantibodies (HF2-1/13b, HF2-1/17, HF2-18/2, and HF3-16/6) had identical amino acid sequences from residues 1 to 40. Their framework structures were characteristic of VKI light chains. The light chain of the 16/6 idiotype-negative autoantibody HF6-21/28 was characteristic of the VKII subgroup. The heavy chains of the 16/6 idiotype-positive autoantibodies had nearly identical amino acid sequences from residues 1 to 40. The framework structures were characteristic of the VHIII subgroup. In contrast, the GM4672 fusion partner of the hybridoma produced small quantities of an IgG with a VHI heavy chain and a VKI light chain. The heavy chains of the lupus autoantibodies and the light chains of those autoantibodies that were idiotypically related to the 16/6 system had marked sequence homology with WEA, a Waldenstrom IgM that binds to Klebsiella polysaccharides and expresses the 16/6 idiotype. These results indicate a striking homology in the amino termini of the heavy and light chains of the lupus autoantibodies studied and suggest that the V regions of the heavy and light chains of the 16/6 idiotype-positive DNA-binding lupus auto-antibodies are each encoded by a single germ line gene. PMID:3921567

  13. Enhanced virome sequencing using targeted sequence capture.

    PubMed

    Wylie, Todd N; Wylie, Kristine M; Herter, Brandi N; Storch, Gregory A

    2015-12-01

    Metagenomic shotgun sequencing (MSS) is an important tool for characterizing viral populations. It is culture independent, requires no a priori knowledge of the viruses in the sample, and may provide useful genomic information. However, MSS can lack sensitivity and may yield insufficient data for detailed analysis. We have created a targeted sequence capture panel, ViroCap, designed to enrich nucleic acid from DNA and RNA viruses from 34 families that infect vertebrate hosts. A computational approach condensed ∼1 billion bp of viral reference sequence into <200 million bp of unique, representative sequence suitable for targeted sequence capture. We compared the effectiveness of detecting viruses in standard MSS versus MSS following targeted sequence capture. First, we analyzed two sets of samples, one derived from samples submitted to a diagnostic virology laboratory and one derived from samples collected in a study of fever in children. We detected 14 and 18 viruses in the two sets, comprising 19 genera from 10 families, with dramatic enhancement of genome representation following capture enrichment. The median fold-increases in percentage viral reads post-capture were 674 and 296. Median breadth of coverage increased from 2.1% to 83.2% post-capture in the first set and from 2.0% to 75.6% in the second set. Next, we analyzed samples containing a set of diverse anellovirus sequences and demonstrated that ViroCap could be used to detect viral sequences with up to 58% variation from the references used to select capture probes. ViroCap substantially enhances MSS for a comprehensive set of viruses and has utility for research and clinical applications. © 2015 Wylie et al.; Published by Cold Spring Harbor Laboratory Press.

  14. Enhanced virome sequencing using targeted sequence capture

    PubMed Central

    Wylie, Todd N.; Wylie, Kristine M.; Herter, Brandi N.; Storch, Gregory A.

    2015-01-01

    Metagenomic shotgun sequencing (MSS) is an important tool for characterizing viral populations. It is culture independent, requires no a priori knowledge of the viruses in the sample, and may provide useful genomic information. However, MSS can lack sensitivity and may yield insufficient data for detailed analysis. We have created a targeted sequence capture panel, ViroCap, designed to enrich nucleic acid from DNA and RNA viruses from 34 families that infect vertebrate hosts. A computational approach condensed ∼1 billion bp of viral reference sequence into <200 million bp of unique, representative sequence suitable for targeted sequence capture. We compared the effectiveness of detecting viruses in standard MSS versus MSS following targeted sequence capture. First, we analyzed two sets of samples, one derived from samples submitted to a diagnostic virology laboratory and one derived from samples collected in a study of fever in children. We detected 14 and 18 viruses in the two sets, comprising 19 genera from 10 families, with dramatic enhancement of genome representation following capture enrichment. The median fold-increases in percentage viral reads post-capture were 674 and 296. Median breadth of coverage increased from 2.1% to 83.2% post-capture in the first set and from 2.0% to 75.6% in the second set. Next, we analyzed samples containing a set of diverse anellovirus sequences and demonstrated that ViroCap could be used to detect viral sequences with up to 58% variation from the references used to select capture probes. ViroCap substantially enhances MSS for a comprehensive set of viruses and has utility for research and clinical applications. PMID:26395152

  15. Career Academy Course Sequences.

    ERIC Educational Resources Information Center

    Markham, Thom; Lenz, Robert

    This career academy course sequence guide is designed to give teachers a quick overview of the course sequences of well-known career academy and career pathway programs from across the country. The guide presents a variety of sample course sequences for the following academy themes: (1) arts and communication; (2) business and finance; (3)…

  16. Uncorrectable sequences and telecommand

    NASA Technical Reports Server (NTRS)

    Ekroot, Laura; Mceliece, R.; Dolinar, S.; Swanson, L.

    1993-01-01

    The purpose of a tail sequence for command link transmission units is to fail to decode, so that the command decoder will begin searching for the start of the next unit. A tail sequence used by several missions and recommended for this purpose by the Consultative Committee on Space Data Standards is analyzed. A single channel error can cause the sequence to decode. An alternative sequence requiring at least two channel errors before it can possibly decode is presented. (No sequence requiring more than two channel errors before it can possibly decode exists for this code.)

  17. [Sequencing - classical method].

    PubMed

    Sedivcová, Monika; Martínek, Petr; Stehlík, Jan; Grossmann, Petr; Kašpírková, Jana; Vaneček, Tomáš

    2013-06-01

    In this article the basic methods of reading nucleotide sequences in DNA molecules are summarized. Sanger sequencing is described most thoroughly as it is the most frequent routine method currently being utilized. The article describes in detail the principle of sequence determination through the production of fragments with a known end base using chain termination synthesis of DNA and ways of separation and detection of the fragments. Some alternative methods of sequencing are mentioned in short. Basic approaches of analyzing sequence data are explained as well as different outcomes, obstacles and challenges.

  18. Low autocorrelation binary sequences

    NASA Astrophysics Data System (ADS)

    Packebusch, Tom; Mertens, Stephan

    2016-04-01

    Binary sequences with minimal autocorrelations have applications in communication engineering, mathematics and computer science. In statistical physics they appear as groundstates of the Bernasconi model. Finding these sequences is a notoriously hard problem, that so far can be solved only by exhaustive search. We review recent algorithms and present a new algorithm that finds optimal sequences of length N in time O(N 1.73^N). We computed all optimal sequences for N ≤ 66 and all optimal skewsymmetric sequences for N ≤ 119.
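
    To make the quantity being minimized concrete, the sketch below uses the standard definitions, which the abstract does not state: for a sequence s of +1/-1 values of length N, the aperiodic autocorrelations are C_k = sum_i s_i s_(i+k), the energy is E = sum over k = 1..N-1 of C_k^2, and the merit factor is F = N^2 / (2E). The search shown is the naive one over all 2^N sequences, usable only for very small N; it is not the branch-and-bound algorithm the authors present. Function names are hypothetical.

```python
from itertools import product


def autocorrelations(s):
    """Aperiodic autocorrelations C_1 .. C_(N-1) of a +1/-1 sequence."""
    n = len(s)
    return [sum(s[i] * s[i + k] for i in range(n - k)) for k in range(1, n)]


def energy(s):
    """Sum of squared off-peak autocorrelations (lower is better)."""
    return sum(c * c for c in autocorrelations(s))


def merit_factor(s):
    """F = N^2 / (2E); defined for sequences of length N >= 2."""
    return len(s) ** 2 / (2 * energy(s))


def exhaustive_minimum(n):
    """Minimum energy over all 2^n sequences (naive search, small n only)."""
    return min(energy(s) for s in product((1, -1), repeat=n))


if __name__ == "__main__":
    barker13 = (1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1)
    print(energy(barker13), merit_factor(barker13))
    print(exhaustive_minimum(7))
```

    The length-13 Barker sequence used in the example has energy 6, giving a merit factor of 169/12, or about 14.08.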

  19. HIV Sequence Databases

    PubMed Central

    Kuiken, Carla; Korber, Bette; Shafer, Robert W.

    2008-01-01

    Two important databases are often used in HIV genetic research, the HIV Sequence Database in Los Alamos, which collects all sequences and focuses on annotation and data analysis, and the HIV RT/Protease Sequence Database in Stanford, which collects sequences associated with the development of viral resistance against anti-retroviral drugs and focuses on analysis of those sequences. The types of data and services these two databases offer, the tools they provide, and the way they are set up and operated are described in detail. PMID:12875108

  20. Multiple Sequence Alignment.

    PubMed

    Bawono, Punto; Dijkstra, Maurits; Pirovano, Walter; Feenstra, Anton; Abeln, Sanne; Heringa, Jaap

    2017-01-01

    The increasing importance of Next Generation Sequencing (NGS) techniques has highlighted the key role of multiple sequence alignment (MSA) in comparative structure and function analysis of biological sequences. MSA often leads to fundamental biological insight into sequence-structure-function relationships of nucleotide or protein sequence families. Significant advances have been achieved in this field, and many useful tools have been developed for constructing alignments, although many biological and methodological issues are still open. This chapter first provides some background information and considerations associated with MSA techniques, concentrating on the alignment of protein sequences. Then, a practical overview of currently available methods and a description of their specific advantages and limitations are given, to serve as a helpful guide or starting point for researchers who aim to construct a reliable MSA.

  1. Characterization of ostrich (Struthio camelus) β-microseminoprotein (MSP): Identification of homologous sequences in EST databases and analysis of their evolution during speciation

    PubMed Central

    Lazure, Claude; Villemure, Michéle; Gauthier, Dany; Naudé, Ryno J.; Mbikay, Majambu

    2001-01-01

    β-Microseminoprotein, alternatively called prostatic secretory protein of 94 amino acids, is a hydrophilic, unglycosylated, small protein rich in conserved half-cystine residues. Originally found in human seminal plasma and prostatic fluids, its presence was later shown in numerous secretions and its homologs were described in many vertebrate species. These studies showed that this protein had rapidly evolved, but they failed to unambiguously identify its biological role. Here, we show that a protein isolated from ostrich pituitary gland is closely related to a similar one isolated from chicken serum and that the two are structurally related to the mammalian β-microseminoprotein. The complete 90–amino acid sequence of the ostrich molecule was established through a combination of automated Edman degradation and matrix-assisted laser desorption ionization–time of flight (MALDI-TOF) mass spectrometric procedures, including postsource decay (PSD) and ladder sequencing analyses. This study documents for the first time that β-microseminoprotein is present in aves. It is also the first report of a C-terminal amidated form for a member of this protein family and the first in which the disulfide linkages are established. Database searches using the herein-described amino acid sequence allowed identification of related proteins in numerous species such as cow, African clawed frog, zebrafish, and Japanese flounder. These small proteins show a strikingly high rate of amino acid substitutions, especially across phyla boundaries. Noticeably, no β-microseminoprotein–related gene could be found in the recently completed fruit fly genome, indicating that if such a gene exists in arthropods, it must have extensively diverged from the vertebrate ones. PMID:11604528

  2. Purification and N-terminal amino acid sequence comparisons of structural proteins from retrovirus-D/Washington and Mason-Pfizer monkey virus.

    PubMed Central

    Henderson, L E; Sowder, R; Smythers, G; Benveniste, R E; Oroszlan, S

    1985-01-01

    A new D-type retrovirus originally designated SAIDS-D/Washington and here referred to as retrovirus-D/Washington (R-D/W) was recently isolated at the University of Washington Primate Center, Seattle, Wash., from a rhesus monkey with an acquired immunodeficiency syndrome and retroperitoneal fibromatosis. To better establish the relationship of this new D-type virus to the prototype D-type virus, Mason-Pfizer monkey virus (MPMV), we have purified and compared six structural proteins from each virus. The proteins purified from each D-type retrovirus include p4, p10, p12, p14, p27, and a phosphoprotein designated pp18 for MPMV and pp20 for R-D/W. Amino acid analysis and N-terminal amino acid sequence analysis show that the p4, p12, p14, and p27 proteins of R-D/W are distinct from the homologous proteins of MPMV but that these proteins from the two different viruses share a high degree of amino acid sequence homology. The p10 proteins from the two viruses have similar amino acid compositions, and both are blocked to N-terminal Edman degradation. The phosphoproteins from the two viruses each contain phosphoserine but are different from each other in amino acid composition, molecular weight, and N-terminal amino acid sequence. The data thus show that each of the R-D/W proteins examined is distinguishable from its MPMV homolog and that a major difference between these two D-type retroviruses is found in the viral phosphoproteins. The N-terminal amino acid sequences of D-type retroviral proteins were used to search for sequence homologies between D-type and other retroviral amino acid sequences. An unexpected amino acid sequence homology was found between R-D/W pp20 (a gag protein) and a 28-residue segment of the env precursor polyprotein of Rous sarcoma virus. The N-terminal amino acid sequences of the D-type major gag protein (p27) and the nucleic acid-binding protein (p14) show only limited amino acid sequence homology to functionally homologous proteins of C

  3. Computer assisted multiplex sequencing

    SciTech Connect

    Church, G.M.

    1992-08-01

    The objectives of this project are automation and optimization of multiplex sequencing. This year we have integrated direct transfer electrophoresis, automated multiplex hybridizations and automated film reading and applied this toward sequencing of three contiguous E. coli cosmids. Primers for the directed dideoxy sequence walking and sequence confirmation steps were synthesized with a 15 base tag complementary to an alkaline phosphatase conjugate. A higher throughput synthesis device is well along in testing as are new automated hybridization devices. We have developed software for automatically annotating ORFs and databases of precise termini of proteins and RNA.

  4. Cosmetology: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a cosmetology vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  5. Sequences for Student Investigation

    ERIC Educational Resources Information Center

    Barton, Jeffrey; Feil, David; Lartigue, David; Mullins, Bernadette

    2004-01-01

    We describe two classes of sequences that give rise to accessible problems for undergraduate research. These problems may be understood with virtually no prerequisites and are well suited for computer-aided investigation. The first sequence is a variation of one introduced by Stephen Wolfram in connection with his study of cellular automata. The…

  6. Agriculture: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 3-year program in agriculture. The guide consists of a course description; general course objectives;…

  7. Sequences, Series, and Mathematica.

    ERIC Educational Resources Information Center

    Mathews, John H.

    1992-01-01

    Describes how the computer algebra system Mathematica can be used to enhance the teaching of the topics of sequences and series. Examines its capabilities to find exact, approximate, and graphically generated approximate solutions to problems from these topics and to understand proofs about sequences. (MDH)

  8. Lichenase and coding sequences

    SciTech Connect

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-15

    The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-β-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.

  9. Cosmetology: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a cosmetology vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  10. Can sequence determine function?

    PubMed Central

    Gerlt, John A; Babbitt, Patricia C

    2000-01-01

    The functional annotation of proteins identified in genome sequencing projects is based on similarities to homologs in the databases. As a result of the possible strategies for divergent evolution, homologous enzymes frequently do not catalyze the same reaction, and we conclude that assignment of function from sequence information alone should be viewed with some skepticism. PMID:11178260

  11. Biotools: Patenting DNA sequences

    SciTech Connect

    Yablonsky, M.D.; Hone, W.J.

    1995-07-01

    The decision, known as In re Deuel, rejects the PTO's interpretation of a previous decision of the Federal Circuit and makes it more possible that a "nucleic acid of a particular sequence" - commonly known as a gene sequence - may be patentable. 15 refs.

  12. Agriculture: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 3-year program in agriculture. The guide consists of a course description; general course objectives;…

  13. Sequence History Update Tool

    NASA Technical Reports Server (NTRS)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

    The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable time and effort savings. With the use of The Sequence History Update Tool what previously took minutes is now done in less than 30 seconds, and now provides a more accurate archival record of the sequence commanding for MRO.

  14. Sequencing the maize genome.

    PubMed

    Martienssen, Robert A; Rabinowicz, Pablo D; O'Shaughnessy, Andrew; McCombie, W Richard

    2004-04-01

    Sequencing of complex genomes can be accomplished by enriching shotgun libraries for genes. In maize, gene-enrichment by copy-number normalization (high C(0)t) and methylation filtration (MF) have been used to generate up to two-fold coverage of the gene-space with less than 1 million sequencing reads. Simulations using sequenced bacterial artificial chromosome (BAC) clones predict that 5x coverage of gene-rich regions, accompanied by less than 1x coverage of subclones from BAC contigs, will generate high-quality mapped sequence that meets the needs of geneticists while accommodating unusually high levels of structural polymorphism. By sequencing several inbred strains, we propose a strategy for capturing this polymorphism to investigate hybrid vigor or heterosis.

  15. Amino acid sequence and posttranslational modifications of human factor VIIa from plasma and transfected baby hamster kidney cells

    SciTech Connect

    Thim, L.; Bjoern, S.; Christensen, M.; Nicolaisen, E.M.; Lund-Hansen, T.; Pedersen, A.H.; Hedner, U.

    1988-10-04

    Blood coagulation factor VII is a vitamin K dependent glycoprotein which in its activated form, factor VIIa, participates in the coagulation process by activating factor X and/or factor IX in the presence of Ca²⁺ and tissue factor. Three types of potential posttranslational modifications exist in the human factor VIIa molecule, namely, 10 γ-carboxylated, N-terminally located glutamic acid residues, 1 β-hydroxylated aspartic acid residue, and 2 N-glycosylated asparagine residues. In the present study, the amino acid sequence and posttranslational modifications of recombinant factor VIIa as purified from the culture medium of a transfected baby hamster kidney cell line have been compared to human plasma factor VIIa. By use of HPLC, amino acid analysis, peptide mapping, and automated Edman degradation, the protein backbone of recombinant factor VIIa was found to be identical with human factor VIIa. Asparagine residues 145 and 322 were found to be fully N-glycosylated in human plasma factor VIIa. In the recombinant factor VIIa, asparagine residue 322 was fully glycosylated whereas asparagine residue 145 was only partially (approximately 66%) glycosylated. Besides minor differences in the sialic acid and fucose contents, the overall carbohydrate compositions were nearly identical in recombinant factor VIIa and human plasma factor VIIa. These results show that factor VIIa as produced in the transfected baby hamster kidney cells is very similar to human plasma factor VIIa and that this cell line thus might represent an alternative source for human factor VIIa.

  16. cDNA and deduced amino acid sequence of human pulmonary surfactant-associated proteolipid SPL(Phe)

    SciTech Connect

    Glasser, S.W.; Korfhagen, T.R.; Weaver, T.; Pilot-Matias, T.; Fox, J.L.; Whitsett, J.A.

    1987-06-01

    Hydrophobic surfactant-associated protein of Mr 6000-14,000 was isolated from ether/ethanol or chloroform/methanol extracts of mammalian pulmonary surfactant. Automated Edman degradation in a gas-phase sequencer showed the major N-terminus of the human low molecular weight protein to be Phe-Pro-Ile-Pro-Leu-Pro-Tyr-Cys-Trp-Leu-Cys-Arg-Ala-Leu-. Because of the N-terminal phenylalanine, the surfactant protein was designated SPL(Phe). Antiserum generated against hydrophobic surfactant protein(s) from bovine pulmonary surfactant recognized protein of Mr 6000-14,000 in immunoblot analysis and was used to screen a λgt11 expression library constructed from adult human lung poly(A)+ RNA. This resulted in identification of a 1.4-kilobase cDNA clone that was shown to encode the N-terminus of the surfactant polypeptide SPL(Phe) (Phe-Pro-Ile-Pro-Leu-Pro-) within an open reading frame for a larger protein. Expression of a fused β-galactosidase-SPL(Phe) gene in Escherichia coli yielded an immunoreactive Mr 34,000 fusion peptide. Hybrid-arrested translation with the cDNA and immunoprecipitation of [35S]methionine-labeled in vitro translation products of human poly(A)+ RNA with a surfactant polyclonal antibody resulted in identification of a Mr 40,000 precursor protein. Blot hybridization analysis of electrophoretically fractionated RNA from human lung detected a 2.0-kilobase RNA that was more abundant in adult lung than in fetal lung. These proteins, and specifically SPL(Phe), may therefore be useful for synthesis of replacement surfactants for treatment of hyaline membrane disease in newborn infants or of other surfactant-deficient states.

  17. Next-Generation Sequencing.

    PubMed

    Le Gallo, Matthieu; Lozy, Fred; Bell, Daphne W

    2017-01-01

    Endometrial cancers are the most frequently diagnosed gynecological malignancy and were expected to be the seventh leading cause of cancer death among American women in 2015. The majority of endometrial cancers are of serous or endometrioid histology. Most human tumors, including endometrial tumors, are driven by the acquisition of pathogenic mutations in cancer genes. Thus, the identification of somatic mutations within tumor genomes is an entry point toward cancer gene discovery. However, efforts to pinpoint somatic mutations in human cancers have, until recently, relied on high-throughput sequencing of single genes or gene families using Sanger sequencing. Although this approach has been fruitful, the cost and throughput of Sanger sequencing generally prohibits systematic sequencing of the ~22,000 genes that make up the exome. The recent development of next-generation sequencing technologies changed this paradigm by providing the capability to rapidly sequence exomes, transcriptomes, and genomes at relatively low cost. Remarkably, the application of this technology to catalog the mutational landscapes of endometrial tumor exomes, transcriptomes, and genomes has revealed, for the first time, that serous and endometrioid endometrial cancers can be classified into four distinct molecular subgroups. In this chapter, we overview the characteristic genomic features of each subgroup and discuss the known and putative cancer genes that have emerged from next-generation sequencing of endometrial carcinomas.

  18. HIV Sequence Compendium 2015

    SciTech Connect

    Foley, Brian Thomas; Leitner, Thomas Kenneth; Apetrei, Cristian; Hahn, Beatrice; Mizrachi, Ilene; Mullins, James; Rambaut, Andrew; Wolinsky, Steven; Korber, Bette Tina Marie

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  19. Personalized Course Sequence Recommendations

    NASA Astrophysics Data System (ADS)

    Xu, Jie; Xing, Tianwei; van der Schaar, Mihaela

    2016-10-01

    Given the variability in student learning it is becoming increasingly important to tailor courses as well as course sequences to student needs. This paper presents a systematic methodology for offering personalized course sequence recommendations to students. First, a forward-search backward-induction algorithm is developed that can optimally select course sequences to decrease the time required for a student to graduate. The algorithm accounts for prerequisite requirements (typically present in higher level education) and course availability. Second, using the tools of multi-armed bandits, an algorithm is developed that can optimally recommend a course sequence that both reduces the time to graduate while also increasing the overall GPA of the student. The algorithm dynamically learns how students with different contextual backgrounds perform for given course sequences and then recommends an optimal course sequence for new students. Using real-world student data from the UCLA Mechanical and Aerospace Engineering department, we illustrate how the proposed algorithms outperform other methods that do not include student contextual information when making course sequence recommendations.

  20. Phylogenetic Trees From Sequences

    NASA Astrophysics Data System (ADS)

    Ryvkin, Paul; Wang, Li-San

    In this chapter, we review important concepts and approaches for phylogeny reconstruction from sequence data. We first cover some basic definitions and properties of phylogenetics, and briefly explain how scientists model sequence evolution and measure sequence divergence. We then discuss three major approaches for phylogenetic reconstruction: distance-based phylogenetic reconstruction, maximum parsimony, and maximum likelihood. In the third part of the chapter, we review how multiple phylogenies are compared by consensus methods and how to assess confidence using bootstrapping. At the end of the chapter are two sections that list popular software packages and additional reading.
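
    As an illustration of the "measure sequence divergence" step that feeds distance-based reconstruction, the sketch below computes the proportion of differing sites (p-distance) between aligned sequences and applies the Jukes-Cantor correction. It is not taken from the chapter: the function names, the choice of the Jukes-Cantor model, and the toy sequences are assumptions made for this example.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs: dict) -> dict:
    """Pairwise corrected distances, keyed by (name1, name2)."""
    return {
        (n1, n2): jukes_cantor(p_distance(seqs[n1], seqs[n2]))
        for n1, n2 in combinations(sorted(seqs), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    toy = {"taxonA": "ACGTACGT", "taxonB": "ACGTACGA", "taxonC": "ACGAACTA"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 4))
```

    A matrix of this kind is the usual input to distance-based tree builders; the correction matters because raw p-distances underestimate divergence once multiple substitutions hit the same site.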

  1. Automatic Command Sequence Generation

    NASA Technical Reports Server (NTRS)

    Fisher, Forest; Gladden, Roy; Khanampompan, Teerapat

    2007-01-01

    Automatic Sequence Generator (Autogen) Version 3.0 software automatically generates command sequences for the Mars Reconnaissance Orbiter (MRO) and several other JPL spacecraft operated by the multi-mission support team. Autogen uses standard JPL sequencing tools like APGEN, ASP, SEQGEN, and the DOM database to automate the generation of uplink command products, Spacecraft Command Message Format (SCMF) files, and the corresponding ground command products, DSN Keywords Files (DKF). Autogen supports all the major multi-mission mission phases including the cruise, aerobraking, mapping/science, and relay mission phases. Autogen is a Perl script, which functions within the mission operations UNIX environment. It consists of two parts: a set of model files and the autogen Perl script. Autogen encodes the behaviors of the system into a model and encodes algorithms for context sensitive customizations of the modeled behaviors. The model includes knowledge of different mission phases and how the resultant command products must differ for these phases. The executable software portion of Autogen automates the setup and use of APGEN for constructing a spacecraft activity sequence file (SASF). The setup includes file retrieval through the DOM (Distributed Object Manager), an object database used to store project files. This step retrieves all the needed input files for generating the command products. Depending on the mission phase, Autogen also uses the ASP (Automated Sequence Processor) and SEQGEN to generate the command product sent to the spacecraft. Autogen also provides the means for customizing sequences through the use of configuration files. By automating the majority of the sequence generation process, Autogen eliminates many sequence generation errors commonly introduced by manually constructing spacecraft command sequences. Through the layering of commands into the sequence by a series of scheduling algorithms, users are able to rapidly and reliably construct the

  2. Toward nanoscale genome sequencing.

    PubMed

    Ryan, Declan; Rahimi, Maryam; Lund, John; Mehta, Ranjana; Parviz, Babak A

    2007-09-01

    This article reports on the state-of-the-art technologies that sequence DNA using miniaturized devices. The article considers the miniaturization of existing technologies for sequencing DNA and the opportunities for cost reduction that 'on-chip' devices can deliver. The ability to construct nano-scale structures and perform measurements using novel nano-scale effects has provided new opportunities to identify nucleotides directly using physical, and not chemical, methods. The challenges that these technologies need to overcome to provide a US$1000-genome sequencing technology are also presented.

  3. DNA sequences encoding osteoinductive products

    SciTech Connect

    Wang, E.A.; Wozney, J.M.; Rosen, V.

    1991-05-07

    This patent describes an isolated DNA sequence encoding an osteoinductive protein, the DNA sequence comprising a coding sequence. It comprises: nucleotide No.1 through nucleotide No.387, nucleotide No.356 through nucleotide No.1543, nucleotide No.402 through nucleotide No.1626, naturally occurring allelic sequences and equivalent degenerative codon sequences and sequences which hybridize to any of these sequences under stringent hybridization conditions; and encode a protein characterized by the ability to induce the formation of bone and/or cartilage.

  4. Advances in sequence analysis.

    PubMed

    Califano, A

    2001-06-01

    In its early days, the entire field of computational biology revolved almost entirely around biological sequence analysis. Over the past few years, however, a number of new non-sequence-based areas of investigation have become mainstream, from the analysis of gene expression data from microarrays, to whole-genome association discovery, and to the reverse engineering of gene regulatory pathways. Nonetheless, with the completion of private and public efforts to map the human genome, as well as those of other organisms, sequence data continue to be a veritable mother lode of valuable biological information that can be mined in a variety of contexts. Furthermore, the integration of sequence data with a variety of alternative information is providing valuable and fundamentally new insight into biological processes, as well as an array of new computational methodologies for the analysis of biological data.

  5. Authentication of byte sequences

    SciTech Connect

    Stearns, S.D.

    1991-06-01

    Algorithms for the authentication of byte sequences are described. The algorithms are designed to authenticate data in the Storage, Retrieval, Analysis, and Display (SRAD) Test Data Archive of the Radiation Effects and Testing Directorate (9100) at Sandia National Laboratories, and may be used in similar situations where authentication of stored data is required. The algorithms use a well-known error detection method called the Cyclic Redundancy Check (CRC). When a byte sequence is authenticated and stored, CRC bytes are generated and attached to the end of the sequence. When the authenticated data is retrieved, the authentication check consists of processing the entire sequence, including the CRC bytes, and checking for a remainder of zero. The error detection properties of the CRC are extensive and result in a reliable authentication of SRAD data.
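
    The attach-then-check-for-zero-remainder scheme described above can be sketched as follows. The report does not state which CRC polynomial or width it uses, so the 16-bit polynomial 0x1021 with a zero initial value (the CRC-16/XMODEM parameters) is an assumption for this example, as are the function names.

```python
POLY = 0x1021  # assumed generator polynomial (CRC-16/XMODEM); not from the report


def crc16(data: bytes, crc: int = 0) -> int:
    """Bitwise, MSB-first CRC-16 with zero initial value and no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def attach_crc(data: bytes) -> bytes:
    """Append the two CRC bytes (big-endian) to the end of the sequence."""
    return data + crc16(data).to_bytes(2, "big")


def is_authentic(stored: bytes) -> bool:
    """Process the whole sequence, CRC bytes included; a zero remainder passes."""
    return len(stored) >= 2 and crc16(stored) == 0


if __name__ == "__main__":
    record = attach_crc(b"example test data")  # hypothetical payload
    print(is_authentic(record))                # intact record
    corrupted = bytes([record[0] ^ 0x01]) + record[1:]
    print(is_authentic(corrupted))             # single flipped bit
```

    With these parameters the CRC bytes must be appended most-significant byte first for the remainder to come out as zero. A CRC of this kind detects accidental corruption; it is not a cryptographic signature.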

  6. Pierre Robin sequence

    MedlinePlus

    Pierre Robin syndrome; Pierre Robin complex; Pierre Robin anomaly ... The exact causes of Pierre Robin sequence are unknown. It may be part of many genetic syndromes. The lower jaw develops slowly before birth, but may grow ...

  7. Genomic Sequencing in Cancer

    PubMed Central

    Tuna, Musaffe; Amos, Christopher I.

    2013-01-01

    Genomic sequencing has provided critical insights into the etiology of both simple and complex diseases. The enormous reductions in cost for whole genome sequencing have allowed this technology to gain increasing use. Whole genome analysis has impacted research of complex diseases including cancer by allowing the systematic analysis of entire genomes in a single experiment, thereby facilitating the discovery of somatic and germline mutations, and identification of the function and impact of the insertions, deletions, and structural rearrangements, including translocations and inversions, in novel disease genes. Whole-genome sequencing can be used to provide the most comprehensive characterization of the cancer genome, the complexity of which we are only beginning to understand. Hence in this review, we focus on whole-genome sequencing in cancer. PMID:23178448

  8. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 1 of 2

  9. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 2 of 2

  10. Event-sequence detector

    NASA Technical Reports Server (NTRS)

    Hanna, M. F.

    1973-01-01

    Detector consists of matrix of storage elements which are activated by coincidence of failure-voltage pulses and clock pulses. Clock frequency used for event sequence detector can be selected to provide time resolution demanded by test at hand.
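
    A software analogue of the coincidence idea can make the role of the clock frequency concrete. The sketch below is only an illustration, not the circuit in the report: it assumes one row of storage elements per monitored channel and one column per clock pulse, and all names are hypothetical.

```python
def detect_sequence(failure_pulses: dict, n_ticks: int) -> list:
    """Return channels ordered by the clock tick at which each first failed.

    failure_pulses maps a channel name to the set of clock-tick indices
    during which that channel's failure-voltage pulse is present.
    """
    # One row of storage elements per channel, one column per clock pulse.
    matrix = {ch: [False] * n_ticks for ch in failure_pulses}
    for ch, ticks in failure_pulses.items():
        for t in range(n_ticks):
            if t in ticks:  # coincidence of failure pulse and clock pulse t
                matrix[ch][t] = True
    # The first set element in each row gives that channel's failure time.
    first = {ch: row.index(True) for ch, row in matrix.items() if True in row}
    return sorted(first, key=first.get)


if __name__ == "__main__":
    pulses = {"A": {3, 4}, "B": {1}, "C": set()}  # hypothetical test data
    print(detect_sequence(pulses, n_ticks=6))
```

    Two failures that fall within the same clock period land in the same column and cannot be ordered, which is why a faster clock gives finer time resolution.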

  11. HIV Sequence Compendium 2010

    SciTech Connect

    Kuiken, Carla; Foley, Brian; Leitner, Thomas; Apetrei, Christian; Hahn, Beatrice; Mizrachi, Ilene; Mullins, James; Rambaut, Andrew; Wolinsky, Steven; Korber, Bette

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  12. Sequences in drug discovery.

    PubMed

    Khurdayan, V; Davies, S

    2005-04-01

    Sequences in Drug Discovery is a new series of distinct brief reports on breaking topics in the field of drug R&D. This month's Sequences in Drug Discovery contains the following reports: Spotlight on West Nile virus vaccines. p38alpha MAPK--a dynamic target in rheumatoid arthritis. The need for new contraceptives: targeting PDE3. Vasopeptidase inhibition with a triple mode of action. Current advances in the development of 5-HT(6) receptor antagonists.

  13. Amino acid sequence and molecular modelling of glycoprotein IIb-IIIa and fibronectin receptor iso-antagonists from Trimeresurus elegans venom.

    PubMed Central

    Scaloni, A; Di Martino, E; Miraglia, N; Pelagalli, A; Della Morte, R; Staiano, N; Pucci, P

    1996-01-01

    Low-molecular-mass Arg-Gly-Asp (RGD)-containing polypeptides were isolated from the venom of Trimeresurus elegans by a simple two-step procedure consisting of membrane filtration and reverse-phase HPLC. A combination of electrospray MS, fast-atom bombardment MS and Edman degradation allowed us to ascertain the presence in the venom of different isoforms and to determine their primary structures. The amino acid sequences resembled the structure of elegantin, the only disintegrin previously reported from the T. elegans venom [Williams, Rucinski, Holt and Niewiarowski (1990) Biochim. Biophys. Acta 1039, 81-89]. MS analyses indicated the occurrence of differential proteolytic processing at both the N-terminus and the C-terminus of the polypeptide chains. The amino acid sequence alignment of the elegantin isoforms with known components of the disintegrin family demonstrated the complete conservation of the 12 cysteine residues involved in disulphide bridges. Molecular modelling of elegantins predicted an overall folding of these molecules quite similar to that reported for the kistrin solution structure. The newly identified polypeptide isoforms strongly inhibited ADP-induced aggregation in both human and canine platelet-rich plasma but showed a different species-dependent specificity. These molecules were also able to inhibit B16-BL6 murine melanoma cell adhesion to immobilized fibronectin. The comparison of the structures and biological activities of elegantin isoforms and kistrin allowed us to highlight some structural features that, in addition to the RGD locus, might be involved in the interaction of these snake-venom polypeptides with the integrin receptors on the platelet and cell surface. PMID:8920980

  14. Purification, sequencing and structural characterization of the phospholipase A1 from the venom of the social wasp Polybia paulista (Hymenoptera, Vespidae).

    PubMed

    Santos, Lucilene D; Santos, Keity S; de Souza, Bibiana M; Arcuri, Helen A; Cunha-Neto, Edécio; Castro, Fabio Morato; Kalil, Jorge Elias; Palma, Mario S

    2007-12-01

    The biochemical and functional characterization of wasp venom toxins is an important prerequisite for the development of new tools both for the therapy of the toxic reactions due to envenomation caused by multiple stinging accidents and also for the diagnosis and therapy of allergic reactions caused by this type of venom. PLA(1) was purified from the venom of the neotropical social wasp Polybia paulista by using molecular exclusion and cation exchange chromatographies; its amino acid sequence was determined by using automated Edman degradation and compared to the sequences of other vespid venom PLA(1)'s. The enzyme exists as a 33,961.40 Da protein, which was identified as a lipase of the GX class, lipoprotein lipase superfamily, pancreatic lipases (ab20.3) homologous family and RP2 sub-group of phospholipase. P. paulista PLA(1) is 53-82% identical to the phospholipases from wasp species of the Northern Hemisphere. The use of restraint-based modeling made it possible to describe the 3-D structure of the enzyme, revealing that the molecule comprises 23% alpha-helix, 28% beta-sheet and 49% coil. The protein structure has the alpha/beta fold common to many lipases; the core consists of a tightly packed beta-sheet made up of six parallel beta-strands and one anti-parallel beta-strand, surrounded by four alpha-helices. P. paulista PLA(1) exhibits direct hemolytic action against washed red blood cells with activity similar to the Cobra cardiotoxin from Naja naja atra. In addition to this, PLA(1) was immunoreactive to specific IgE from the sera of P. paulista-sensitive patients.

  15. The host-defence skin peptide profiles of Peron's Tree Frog Litoria peronii in winter and summer. Sequence determination by electrospray mass spectrometry and activities of the peptides.

    PubMed

    Bilusich, Daniel; Jackway, Rebecca J; Musgrave, Ian F; Tyler, Michael J; Bowie, John H

    2009-09-01

    Positive and negative ion electrospray mass spectrometry together with Edman sequencing (when appropriate) has been used to sequence the host-defence peptides secreted from skin glands of the tree frog Litoria peronii. The peptide profiles are different in winter and summer. In winter, the frog produces small amounts of the known caerin 1.1 [GLLSVLGSVAKHVLPHVVPVIAEHL-NH(2)] (a wide-spectrum antibiotic) and caerin 2.1 [GLVSSIGRALGGLLADVVKSKQPA-OH], a narrow-spectrum antibiotic and an inhibitor of neuronal nitric oxide synthase. The major peptides produced throughout the year are the pGlu-containing peroniins 1.1 to 1.5 (e.g. peroniin 1.1 [pEPWLPFG-NH(2)], a smooth muscle contractor from 10(-7) M), and caerulein [pEQDY(SO(3)H)TGWMDF-NH(2)], a known and potent smooth muscle contractor from 10(-10) M. There are also some precursors to the peroniin 1 peptides, only detected in the skin secretion in summer, which are inactive and appear to be all (or part) of the spacer peroniin 1 peptides, e.g. peroniin 1.1b [SEEEKRQPWLPFG-NH(2)]. There are three members of the Litoria peronii Group of tree frogs classified in Australia, namely, L. peronii, L. rothii and L. tyleri. A comparison of the skin peptide profiles of L. peronii with those reported previously for L. rothii suggests that either these two species of tree frog are not as closely related as determined previously on morphological grounds, or that skin peptide divergence in tree frogs of this Group is more extensive than in others that have been studied. Copyright (c) 2009 John Wiley & Sons, Ltd.

  16. Sequence TTKF↓QE Defines the Site of Proteolytic Cleavage in Mhp683 Protein, a Novel Glycosaminoglycan and Cilium Adhesin of Mycoplasma hyopneumoniae*

    PubMed Central

    Bogema, Daniel R.; Scott, Nichollas E.; Padula, Matthew P.; Tacchi, Jessica L.; Raymond, Benjamin B. A.; Jenkins, Cheryl; Cordwell, Stuart J.; Minion, F. Chris; Walker, Mark J.; Djordjevic, Steven P.

    2011-01-01

    Mycoplasma hyopneumoniae colonizes the ciliated respiratory epithelium of swine, disrupting mucociliary function and inducing chronic inflammation. P97 and P102 family members are major surface proteins of M. hyopneumoniae and play key roles in colonizing cilia via interactions with glycosaminoglycans and mucin. The p102 paralog, mhp683, and homologs in strains from different geographic origins encode a 135-kDa pre-protein (P135) that is cleaved into three fragments identified here as P45(683), P48(683), and P50(683). A peptide sequence (TTKF↓QE) was identified surrounding both cleavage sites in Mhp683. N-terminal sequences of P48(683) and P50(683), determined by Edman degradation and mass spectrometry, confirmed cleavage after the phenylalanine residue. A similar proteolytic cleavage site was identified by mass spectrometry in another paralog of the P97/P102 family. Trypsin digestion and surface biotinylation studies showed that P45(683), P48(683), and P50(683) reside on the M. hyopneumoniae cell surface. Binding assays of recombinant proteins F1(683)–F5(683), spanning Mhp683, showed saturable and dose-dependent binding to biotinylated heparin that was inhibited by unlabeled heparin, fucoidan, and mucin. F1(683)–F5(683) also bound porcine epithelial cilia, and antisera to F2(683) and F5(683) significantly inhibited cilium binding by M. hyopneumoniae cells. These data suggest that P45(683), P48(683), and P50(683) each display cilium- and proteoglycan-binding sites. Mhp683 is the first characterized glycosaminoglycan-binding member of the P102 family. PMID:21969369
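
    As an illustration of the cleavage rule reported above (cleavage after the phenylalanine in TTKF↓QE), the following is a minimal Python sketch that splits a protein string at every such motif. The function name `cleave` and the toy input sequence are hypothetical and are not taken from the Mhp683 sequence; only the motif itself comes from the abstract.

```python
import re

# Motif from the abstract: cleavage occurs after F in TTKF|QE.
# The lookahead keeps "QE" at the start of the next fragment.
MOTIF = re.compile(r"TTKF(?=QE)")


def cleave(sequence):
    """Split `sequence` after the F of every TTKF|QE motif."""
    fragments = []
    start = 0
    for match in MOTIF.finditer(sequence):
        fragments.append(sequence[start:match.end()])
        start = match.end()
    fragments.append(sequence[start:])
    return fragments


# Toy sequence (hypothetical, NOT the real Mhp683 sequence) with two sites.
toy = "MAAATTKFQEGGGTTKFQELLL"
fragments = cleave(toy)
print(fragments)
```

    With two motif occurrences, the toy input yields three fragments, mirroring the three-fragment pattern described for P135.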

  17. Nanopore Sequencing with MspA

    NASA Astrophysics Data System (ADS)

    Gundlach, Jens H.

    2011-10-01

    Nanopore sequencing is the simplest concept of converting the sequence of a single DNA molecule directly into an electronic signal. We introduced the protein pore MspA, derived from Mycobacterium smegmatis, to nanopore sequencing [1]. MspA has a single, narrow (~1.2 nm) and short (<1 nm) constriction, ideal to identify single nucleotides. Compared to solid state devices, MspA is reproducible with sub-nanometer precision and is engineerable using genetic mutations. DNA moves through the pore at rates exceeding 1 nt/microsec, too fast to observe the passage of each nucleotide. However, when DNA is held with double stranded DNA sections or an avidin anchor, single nucleotides resident in MspA's constriction can be identified with highly resolved current differences. We have provided proof of principle of a nanopore sequencing method [2] in which we use DNA modified by inserting double stranded DNA-sections between every nucleotide. The double stranded sections are designed to halt translocation for long enough to sequentially read the sequence of the original DNA molecule. Prospects and developments to sequence unmodified native DNA using MspA will be discussed. [1] T.Z. Butler, et al, PNAS 105 20647 (2008) [2] I.M. Derrington, et al, PNAS 107 16060 (2010).

  18. Pairwise Sequence Alignment Library

    SciTech Connect

    Jeff Daily, PNNL

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: Reference implementations of all known vectorized sequence alignment approaches. Implementations of Smith Waterman (SW), semi-global (SG), and Needleman Wunsch (NW) sequence alignment algorithms. Implementations across all modern CPU instruction sets including AVX2 and KNC. Language interfaces for C/C++ and Python.
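
    To make the "complex data dependencies" mentioned above concrete, here is a minimal scalar Smith-Waterman sketch in Python. It is not parasail's API or its SIMD implementation; the function name and the scoring parameters (match, mismatch, linear gap penalty) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty).

    Scoring values are illustrative assumptions, not parasail defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Each cell depends on its left, upper and diagonal neighbours;
            # the left dependency (curr[j - 1]) is what makes naive
            # vectorization along a row difficult.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

    The two inputs share the substring ACGT, so under these assumed scores the best local alignment is that four-base match.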

  19. Nanopore sequencing in microgravity

    PubMed Central

    McIntyre, Alexa B R; Rizzardi, Lindsay; Yu, Angela M; Alexander, Noah; Rosen, Gail L; Botkin, Douglas J; Stahl, Sarah E; John, Kristen K; Castro-Wallace, Sarah L; McGrath, Ken; Burton, Aaron S; Feinberg, Andrew P; Mason, Christopher E

    2016-01-01

    Rapid DNA sequencing and analysis has been a long-sought goal in remote research and point-of-care medicine. In microgravity, DNA sequencing can facilitate novel astrobiological research and close monitoring of crew health, but spaceflight places stringent restrictions on the mass and volume of instruments, crew operation time, and instrument functionality. The recent emergence of portable, nanopore-based tools with streamlined sample preparation protocols finally enables DNA sequencing on missions in microgravity. As a first step toward sequencing in space and aboard the International Space Station (ISS), we tested the Oxford Nanopore Technologies MinION during a parabolic flight to understand the effects of variable gravity on the instrument and data. In a successful proof-of-principle experiment, we found that the instrument generated DNA reads over the course of the flight, including the first ever sequenced in microgravity, and additional reads measured after the flight concluded its parabolas. Here we detail modifications to the sample-loading procedures to facilitate nanopore sequencing aboard the ISS and in other microgravity environments. We also evaluate existing analysis methods and outline two new approaches, the first based on a wave-fingerprint method and the second on entropy signal mapping. Computationally light analysis methods offer the potential for in situ species identification, but are limited by the error profiles (stays, skips, and mismatches) of older nanopore data. Higher accuracies attainable with modified sample processing methods and the latest version of flow cells will further enable the use of nanopore sequencers for diagnostics and research in space. PMID:28725742

  20. Nanopore sequencing in microgravity.

    PubMed

    McIntyre, Alexa B R; Rizzardi, Lindsay; Yu, Angela M; Alexander, Noah; Rosen, Gail L; Botkin, Douglas J; Stahl, Sarah E; John, Kristen K; Castro-Wallace, Sarah L; McGrath, Ken; Burton, Aaron S; Feinberg, Andrew P; Mason, Christopher E

    2016-01-01

    Rapid DNA sequencing and analysis has been a long-sought goal in remote research and point-of-care medicine. In microgravity, DNA sequencing can facilitate novel astrobiological research and close monitoring of crew health, but spaceflight places stringent restrictions on the mass and volume of instruments, crew operation time, and instrument functionality. The recent emergence of portable, nanopore-based tools with streamlined sample preparation protocols finally enables DNA sequencing on missions in microgravity. As a first step toward sequencing in space and aboard the International Space Station (ISS), we tested the Oxford Nanopore Technologies MinION during a parabolic flight to understand the effects of variable gravity on the instrument and data. In a successful proof-of-principle experiment, we found that the instrument generated DNA reads over the course of the flight, including the first ever sequenced in microgravity, and additional reads measured after the flight concluded its parabolas. Here we detail modifications to the sample-loading procedures to facilitate nanopore sequencing aboard the ISS and in other microgravity environments. We also evaluate existing analysis methods and outline two new approaches, the first based on a wave-fingerprint method and the second on entropy signal mapping. Computationally light analysis methods offer the potential for in situ species identification, but are limited by the error profiles (stays, skips, and mismatches) of older nanopore data. Higher accuracies attainable with modified sample processing methods and the latest version of flow cells will further enable the use of nanopore sequencers for diagnostics and research in space.

  1. Program Synthesizes UML Sequence Diagrams

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse engineering function that aids in the design documentation of the target Java program. Whereas previously, the construction of sequence diagrams was a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.

  2. Sequence-controlled polymers.

    PubMed

    Lutz, Jean-François; Ouchi, Makoto; Liu, David R; Sawamoto, Mitsuo

    2013-08-09

    Sequence-controlled polymers are macromolecules in which monomer units of different chemical nature are arranged in an ordered fashion. The most prominent examples are biological and have been studied and used primarily by molecular biologists and biochemists. However, recent progress in protein- and DNA-based nanotechnologies has shown the relevance of sequence-controlled polymers to nonbiological applications, including data storage, nanoelectronics, and catalysis. In addition, synthetic polymer chemistry has provided interesting routes for preparing nonnatural sequence-controlled polymers. Although these synthetic macromolecules do not yet compare in functional scope with their natural counterparts, they open up opportunities for controlling the structure, self-assembly, and macroscopic properties of polymer materials.

  3. Sequencing the Connectome

    PubMed Central

    Zador, Anthony M.; Dubnau, Joshua; Oyibo, Hassana K.; Zhan, Huiqing; Cao, Gang; Peikon, Ian D.

    2012-01-01

    Connectivity determines the function of neural circuits. Historically, circuit mapping has usually been viewed as a problem of microscopy, but no current method can achieve high-throughput mapping of entire circuits with single neuron precision. Here we describe a novel approach to determining connectivity. We propose BOINC (“barcoding of individual neuronal connections”), a method for converting the problem of connectivity into a form that can be read out by high-throughput DNA sequencing. The appeal of using sequencing is that its scale—sequencing billions of nucleotides per day is now routine—is a natural match to the complexity of neural circuits. An inexpensive high-throughput technique for establishing circuit connectivity at single neuron resolution could transform neuroscience research. PMID:23109909

  4. The amino acid sequence of protein AA from a burro (Equus asinus).

    PubMed

    Sletten, Knut; Johnson, Kenneth H; Westermark, Per

    2003-09-01

    The primary structure of amyloid fibril protein AA of a burro has been determined by Edman degradation. The 80 amino acid residue long protein shows strong resemblance to that of other mammalian AA-proteins and differs from equine protein AA at 5 positions: Burro/horse positions 20 (Q/N), 44 (R,Q, K/K,Q), 59 (G,L/G,A), 61 (Q/E) and 65 (N/R).

  5. Method to amplify variable sequences without imposing primer sequences

    DOEpatents

    Bradbury, Andrew M.; Zeytun, Ahmet

    2006-11-14

    The present invention provides methods of amplifying target sequences without including regions flanking the target sequence in the amplified product or imposing amplification primer sequences on the amplified product. Also provided are methods of preparing a library from such amplified target sequences.

  6. A Sequence of Cylinders

    ERIC Educational Resources Information Center

    Johnson, Erica

    2006-01-01

    Hoping to develop in her students an understanding of mathematics as a way of thinking more than a way of doing, the author of this article describes how her students worked on a spatial reasoning problem stemming from an iteratively constructed sequence of cylinders. She presents an activity of making cylinders out of paper models, and for every…

  7. Lining up Arithmetic Sequences

    ERIC Educational Resources Information Center

    Bell, Carol J.

    2011-01-01

    Most future teachers are familiar with number patterns that represent an arithmetic sequence, and most are able to determine the general representation of the "n"th number in the pattern. However, when they are given a visual representation instead of the numbers in the pattern, it is not always easy for them to make the connection between the…

  8. Prenatal Whole Genome Sequencing

    PubMed Central

    Donley, Greer; Hull, Sara Chandros; Berkman, Benjamin E.

    2014-01-01

    With whole genome sequencing set to become the preferred method of prenatal screening, we need to pay more attention to the massive amount of information it will deliver to parents—and the fact that we don't yet understand what most of it means. PMID:22777977

  9. MARS: improving multiple circular sequence alignment using refined sequences.

    PubMed

    Ayad, Lorraine A K; Pissis, Solon P

    2017-01-14

    A fundamental assumption of all widely-used multiple sequence alignment techniques is that the left- and right-most positions of the input sequences are relevant to the alignment. However, the position where a sequence starts or ends can be totally arbitrary due to a number of reasons: arbitrariness in the linearisation (sequencing) of a circular molecular structure; or inconsistencies introduced into sequence databases due to different linearisation standards. These scenarios are relevant, for instance, in the process of multiple sequence alignment of mitochondrial DNA, viroid, viral or other genomes, which have a circular molecular structure. A solution for these inconsistencies would be to identify a suitable rotation (cyclic shift) for each sequence; these refined sequences may in turn lead to improved multiple sequence alignments using the preferred multiple sequence alignment program. We present MARS, a new heuristic method for improving Multiple circular sequence Alignment using Refined Sequences. MARS was implemented in the C++ programming language as a program to compute the rotations (cyclic shifts) required to best align a set of input sequences. Experimental results, using real and synthetic data, show that MARS improves the alignments, with respect to standard genetic measures and the inferred maximum-likelihood-based phylogenies, and outperforms state-of-the-art methods both in terms of accuracy and efficiency. Our results show, among others, that the average pairwise distance in the multiple sequence alignment of a dataset of widely-studied mitochondrial DNA sequences is reduced by around 5% when MARS is applied before a multiple sequence alignment is performed. Analysing multiple sequences simultaneously is fundamental in biological research and multiple sequence alignment has been found to be a popular method for this task. Conventional alignment techniques cannot be used effectively when the position where sequences start is arbitrary. 
We present
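
    The rotation (cyclic shift) idea in the abstract above can be sketched in Python as a brute-force search for the shift of one circular sequence that best matches a reference. This is only an illustration under simplifying assumptions (equal-length sequences, Hamming distance, exhaustive search); it is not the MARS heuristic, and the function names are hypothetical.

```python
def rotate(seq, k):
    """Cyclic left shift of seq by k positions."""
    if not seq:
        return seq
    k %= len(seq)
    return seq[k:] + seq[:k]


def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_rotation(reference, seq):
    """Return (k, distance) for the shift k of seq closest to reference.

    Assumes both sequences have the same length; ties go to the smallest k.
    """
    if len(reference) != len(seq):
        raise ValueError("sequences must have equal length")
    if not seq:
        return 0, 0
    candidates = ((k, hamming(reference, rotate(seq, k))) for k in range(len(seq)))
    return min(candidates, key=lambda item: (item[1], item[0]))


reference = "ACGTTGCA"
linearised_elsewhere = rotate(reference, 3)  # same circle, different start
print(best_rotation(reference, linearised_elsewhere))
```

    Once each sequence has been rotated by its best shift, the refined sequences can be passed to any conventional multiple sequence alignment program.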

  10. Multiplexed Fragaria chloroplast genome sequencing

    Treesearch

    W. Njuguna; A. Liston; R. Cronn; N.V. Bassil

    2010-01-01

    A method to sequence multiple chloroplast genomes using ultra high throughput sequencing technologies was recently described. Complete chloroplast genome sequences can resolve phylogenetic relationships at low taxonomic levels and identify informative point mutations and indels. The objective of this research was to sequence multiple Fragaria...

  11. Targeted sequencing of plant genomes

    Treesearch

    Mark D. Huynh

    2014-01-01

    Next-generation sequencing (NGS) has revolutionized the field of genetics by providing a means for fast and relatively affordable sequencing. With the advancement of NGS, whole-genome sequencing (WGS) has become more commonplace. However, sequencing an entire genome is still not cost effective or even beneficial in all cases. In studies that do not require a whole-...

  12. Compression of Multiple DNA Sequences Using Intra-Sequence and Inter-Sequence Similarities.

    PubMed

    Cheng, Kin-On; Wu, Paula; Law, Ngai-Fong; Siu, Wan-Chi

    2015-01-01

    Traditionally, intra-sequence similarity is exploited for compressing a single DNA sequence. Recently, remarkable compression performance for individual DNA sequences from the same population has been achieved by encoding each sequence's difference from a nearly identical reference sequence. Nevertheless, there is a lack of general algorithms that also allow less similar reference sequences. In this work, we extend the intra-sequence to the inter-sequence similarity in that approximate matches of subsequences are found between the DNA sequence and a set of reference sequences. Hence, a set of nearly identical DNA sequences from the same population or a set of partially similar DNA sequences like chromosome sequences and DNA sequences of related species can be compressed together. For practical compressors, the compressed size is usually influenced by the compression order of sequences. Fast search algorithms for the optimal compression order are thus developed for multiple-sequence compression. Experimental results on artificial and real datasets demonstrate that our proposed multiple-sequence compression methods with fast compression order search are able to achieve good compression performance under different levels of similarity in the multiple DNA sequences.
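
    The reference-based idea mentioned above (storing only how a sequence differs from a nearly identical reference) can be sketched in Python as follows. This is a deliberately simplified illustration that assumes equal-length sequences and substitutions only; it is not the authors' algorithm, which also handles approximate subsequence matches against several references. The function names are hypothetical.

```python
def encode_diff(reference, target):
    """List the (position, base) substitutions that turn reference into target.

    Simplifying assumption: equal lengths, no insertions or deletions.
    """
    if len(reference) != len(target):
        raise ValueError("this sketch only handles equal-length sequences")
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]


def decode_diff(reference, diffs):
    """Rebuild the target sequence from the reference and its substitutions."""
    bases = list(reference)
    for position, base in diffs:
        bases[position] = base
    return "".join(bases)


reference = "ACGTACGTAC"
target = "ACGTTCGTAA"
diffs = encode_diff(reference, target)
print(diffs)
print(decode_diff(reference, diffs) == target)
```

    The fewer positions differ from the reference, the shorter the difference list, which is why nearly identical sequences from one population compress so well this way.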

  13. High Throughput Sequencing: An Overview of Sequencing Chemistry.

    PubMed

    Ambardar, Sheetal; Gupta, Rikita; Trakroo, Deepika; Lal, Rup; Vakhlu, Jyoti

    2016-12-01

    In the present century, sequencing is to DNA science what gel electrophoresis was to it in the last century. From 1977 to 2016, three generations of sequencing technologies of various types have been developed. Second and third generation sequencing technologies, commonly referred to as next generation sequencing technology, have evolved significantly since their inception in 2004, with increases in sequencing speed and decreases in sequencing cost. GS FLX by 454 Life Sciences/Roche Diagnostics; Genome Analyzer, HiSeq, MiSeq and NextSeq by Illumina, Inc.; SOLiD by ABI; and Ion Torrent by Life Technologies are the various types of sequencing platforms available for second generation sequencing. The platforms available for third generation sequencing include the Helicos™ Genetic Analysis System by SeqLL, LLC, SMRT Sequencing by Pacific Biosciences, nanopore sequencing by Oxford Nanopore, Complete Genomics by Beijing Genomics Institute and GnuBIO by BioRad, to name a few. The present article is an overview of the principle and the sequencing chemistry of these high throughput sequencing technologies, along with a brief comparison of the various types of sequencing platforms available.

  14. Sequencing of aromatase inhibitors

    PubMed Central

    Bertelli, G

    2005-01-01

    Since the development of the third-generation aromatase inhibitors (AIs), anastrozole, letrozole and exemestane, these agents have been the subject of intensive research to determine their optimal use in advanced breast cancer. Not only have they replaced progestins in second-line therapy and challenged the role of tamoxifen in first-line, but there is also evidence for a lack of cross-resistance between the steroidal and nonsteroidal AIs, meaning that they may be used in sequence to obtain prolonged clinical benefit. Many questions remain, however, as to the best sequence of the two types of AIs and of the other available agents, including tamoxifen and fulvestrant, in different patient groups. PMID:16100523

  15. Transposon facilitated DNA sequencing

    SciTech Connect

    Berg, D.E.; Berg, C.M.; Huang, H.V.

    1990-01-01

    The purpose of this research is to investigate and develop methods that exploit the power of bacterial transposable elements for large scale DNA sequencing. Our premise is that the use of transposons to put primer binding sites randomly in target DNAs should provide access to all portions of large DNA fragments, without the inefficiencies of methods involving random subcloning and attendant repetitive sequencing, or of sequential synthesis of many oligonucleotide primers that are used to march systematically along a DNA molecule. Two unrelated bacterial transposons, Tn5 and γδ, are being used because they have both proven useful for molecular analyses, and because they differ sufficiently in mechanism and specificity of transposition to merit parallel development.

  16. HIV sequence compendium 2002

    SciTech Connect

    Kuiken, Carla; Foley, Brian; Freed, Eric; Hahn, Beatrice; Marx, Preston; McCutchan, Francine; Mellors, John; Wolinsky, Steven; Korber, Bette

    2002-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Traditionally, we present the sequence data themselves in the form of alignments: Section II, an alignment of a selection of HIV-1/SIVcpz full-length genomes (a lot of LAI-like sequences, for example, have been omitted because they are so similar that they bias the alignment); Section III, a combined HIV-1/HIV-2/SIV whole genome alignment; Sections IV–VI, amino acid alignments for HIV-1/SIV-cpz, HIV-2/SIV, and SIVagm. The HIV-2/SIV and SIVagm amino acid alignments are separate because the genetic distances between these groups are so great that presenting them in one alignment would make it very elongated because of the large number of gaps that have to be inserted. As always, tables with extensive background information gathered from the literature accompany the whole genome alignments. The collection of whole-gene sequences in the database is now large enough that we have abundant representation of most subtypes. For many subtypes, and especially for subtype B, a large number of sequences that span entire genes were not included in the printed alignments to conserve space. A more complete version of all alignments is available on our website, http://hiv-web.lanl.gov/content/hiv-db/ALIGN_CURRENT/ALIGN-INDEX.html. Importantly, all these alignments have been edited to include only one sequence per person, based on phylogenetic trees that were created for all of them, as well as on the literature. Because of the number of sequences available, we have decided to use a different selection principle this year, based on the epidemiological importance of the subtypes. Subtypes A–D and CRFs 01 and 02 are by far the most widespread variants, and for these (when available) we have included 8–10 representatives in the alignments. The other

  17. Rapid Polymer Sequencer

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor (Inventor); Brock, Mathew W. (Inventor)

    2011-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal or transverse direction at the tip, a polymer sequence is passed through the tip, and a change in an electrical current signal is measured as each polymer component passes through the tip. Each measured change in electrical current signals is compared with a database of reference signals, with each reference signal identified with a polymer component, to identify the unknown polymer component. The tip preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.
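
    The identification step described above (comparing each measured current change with a database of reference signals) can be sketched in Python as a nearest-reference lookup. The reference values, their units, and the function names below are hypothetical placeholders; the record does not give actual signal levels or a matching rule.

```python
# Hypothetical reference signals (arbitrary units), one per polymer component.
REFERENCE_SIGNALS = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def identify_component(measured_change, references=REFERENCE_SIGNALS):
    """Return the component whose reference signal is closest to the measurement."""
    return min(references, key=lambda name: abs(references[name] - measured_change))


def call_sequence(measured_changes, references=REFERENCE_SIGNALS):
    """Identify each measured current change in order and join the calls."""
    return "".join(identify_component(change, references) for change in measured_changes)


print(call_sequence([1.1, 3.8, 2.2, 2.9]))
```

    A closest-value rule is only one possible comparison; a real system would also need to account for noise and overlapping signal levels.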

  18. The Fermi blazar sequence

    NASA Astrophysics Data System (ADS)

    Ghisellini, G.; Righi, C.; Costamante, L.; Tavecchio, F.

    2017-07-01

    We revisit the blazar sequence exploiting the complete, flux-limited sample of blazars with known redshift detected by the Fermi satellite after 4 yr of operations (the 3LAC sample). We divide the sources into γ-ray luminosity bins, collect all the archival data for all blazars, and construct their spectral energy distribution (SED). We describe the average SED of blazars in the same luminosity bin through a simple phenomenological function consisting of two broken power laws connecting with a power law describing the radio emission. We do that separately for BL Lacs and for flat spectrum radio quasars (FSRQs) and also for all blazars together. The main results are: (i) FSRQs display approximately the same SED as the luminosity increases, but the relative importance of the high-energy peak increases; (ii) as a consequence, the X-ray spectra of FSRQs become harder for larger luminosities; (iii) BL Lacs indeed form a sequence: they become redder (i.e. smaller peak frequencies) with increasing luminosities, with a softer γ-ray slope and a larger dominance of the high-energy peak; (iv) for all blazars (BL Lacs+FSRQs), these properties become more prominent, as the highest luminosity bin is populated mostly by FSRQs and the lowest luminosity bin mostly by BL Lacs. This agrees with the original blazar sequence, although BL Lacs never have an average γ-ray slope as hard as found in the original sequence. (v) At high luminosities, a large fraction of FSRQs show signs of thermal emission from the accretion disc, contributing to the optical-UV (ultraviolet).

  19. Sequencing BPS spectra

    DOE PAGES

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; ...

    2016-03-02

    In this article, we provide both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N = 2 theory via the 3d/3d correspondence. In conclusion, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  20. Sequencing BPS spectra

    SciTech Connect

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; Stošić, Marko; Sułkowski, Piotr

    2016-03-02

    In this article, we provide both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N = 2 theory via the 3d/3d correspondence. In conclusion, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  1. Sequencing BPS spectra

    NASA Astrophysics Data System (ADS)

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; Stošić, Marko; Sułkowski, Piotr

    2016-03-01

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N = 2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  2. The Galaxy End Sequence

    NASA Astrophysics Data System (ADS)

    Eales, Stephen; de Vis, Pieter; Smith, Matthew W. L.; Appah, Kiran; Ciesla, Laure; Duffield, Chris; Schofield, Simon

    2017-03-01

    A common assumption is that galaxies fall in two distinct regions of a plot of specific star formation rate (SSFR) versus galaxy stellar mass: a star-forming galaxy main sequence (GMS) and a separate region of 'passive' or 'red and dead galaxies'. Starting from a volume-limited sample of nearby galaxies designed to contain most of the stellar mass in this volume, and thus representing the end-point of ≃12 billion years of galaxy evolution, we investigate the distribution of galaxies in this diagram today. We show that galaxies follow a strongly curved extended GMS with a steep negative slope at high galaxy stellar masses. There is a gradual change in the morphologies of the galaxies along this distribution, but there is no clear break between early-type and late-type galaxies. Examining the other evidence that there are two distinct populations, we argue that the 'red sequence' is the result of the colours of galaxies changing very little below a critical value of the SSFR, rather than implying a distinct population of galaxies. Herschel observations, which show at least half of early-type galaxies contain a cool interstellar medium, also imply continuity between early-type and late-type galaxies. This picture of a unitary population of galaxies requires more gradual evolutionary processes than the rapid quenching process needed to explain two distinct populations. We challenge theorists to predict quantitatively the properties of this 'Galaxy End Sequence'.

  3. Plant DNA sequencing for phylogenetic analyses: from plants to sequences.

    PubMed

    Neves, Susana S; Forrest, Laura L

    2011-01-01

    DNA sequences are important sources of data for phylogenetic analysis. Nowadays, DNA sequencing is a routine technique in molecular biology laboratories. However, there are specific questions associated with project design and sequencing of plant samples for phylogenetic analysis, which may not be familiar to researchers starting in the field. This chapter gives an overview of methods and protocols involved in the sequencing of plant samples, including general recommendations on the selection of species/taxa and DNA regions to be sequenced, and field collection of plant samples. Protocols of plant sample preparation, DNA extraction, PCR and cloning, which are critical to the success of molecular phylogenetic projects, are described in detail. Common problems of sequencing (using the Sanger method) are also addressed. Possible applications of second-generation sequencing techniques in plant phylogenetics are briefly discussed. Finally, orientation on the preparation of sequence data for phylogenetic analyses and submission to public databases is also given.

  4. A vision for ubiquitous sequencing

    PubMed Central

    Erlich, Yaniv

    2015-01-01

    Genomics has recently celebrated reaching the $1000 genome milestone, making affordable DNA sequencing a reality. With this goal successfully completed, the next goal of the sequencing revolution can be sequencing sensors—miniaturized sequencing devices that are manufactured for real-time applications and deployed in large quantities at low costs. The first part of this manuscript envisions applications that will benefit from moving the sequencers to the samples in a range of domains. In the second part, the manuscript outlines the critical barriers that need to be addressed in order to reach the goal of ubiquitous sequencing sensors. PMID:26430149

  5. A vision for ubiquitous sequencing.

    PubMed

    Erlich, Yaniv

    2015-10-01

    Genomics has recently celebrated reaching the $1000 genome milestone, making affordable DNA sequencing a reality. With this goal successfully completed, the next goal of the sequencing revolution can be sequencing sensors--miniaturized sequencing devices that are manufactured for real-time applications and deployed in large quantities at low costs. The first part of this manuscript envisions applications that will benefit from moving the sequencers to the samples in a range of domains. In the second part, the manuscript outlines the critical barriers that need to be addressed in order to reach the goal of ubiquitous sequencing sensors.

  6. High-Throughput Sequencing Technologies

    PubMed Central

    Reuter, Jason A.; Spacek, Damek; Snyder, Michael P.

    2015-01-01

    Summary The human genome sequence has profoundly altered our understanding of biology, human diversity and disease. The path from the first draft sequence to our nascent era of personal genomes and genomic medicine has been made possible only because of the extraordinary advancements in DNA sequencing technologies over the past ten years. Here, we discuss commonly used high-throughput sequencing platforms, the growing array of sequencing assays developed around them as well as the challenges facing current sequencing platforms and their clinical application. PMID:26000844

  7. Correspondence: Searching sequence space

    SciTech Connect

    Youvan, D.C.

    1995-08-01

    This correspondence debates the efficiency and application of genetic algorithms (GAs) to search protein sequence space. The important experimental point is that such sparse searches utilize physically realistic syntheses. In this regard, all GA-based technologies are very similar; they "learn" from their initial sparse search and then generate interesting new proteins within a few iterations. Which GA-based technology is best? That probably depends on the protein and the specific engineering goal. Given the fact that the field of combinatorial chemistry is still in its infancy, it is probably wise to consider all of the proven mutagenesis methods. 19 refs.

  8. DNA Sequencing apparatus

    DOEpatents

    Tabor, Stanley; Richardson, Charles C.

    1992-01-01

    An automated DNA sequencing apparatus having a reactor for providing at least two series of DNA products formed from a single primer and a DNA strand, each DNA product of a series differing in molecular weight and having a chain terminating agent at one end; separating means for separating the DNA products to form a series of bands, the intensity of substantially all nearby bands in a different series being different, band reading means for determining the position an This invention was made with government support including a grant from the U.S. Public Health Service, contract number AI-06045. The U.S. government has certain rights in the invention.

  9. Marks of Change in Sequences

    NASA Astrophysics Data System (ADS)

    Jürgensen, H.

    2011-12-01

    Given a sequence of events, how does one recognize that a change has occurred? We explore potential definitions of the concept of change in a sequence and propose that words in relativized solid codes might serve as indicators of change.

  10. Sequencing the Unrearranged Human Immunoglobin

    SciTech Connect

    Warren, Rene

    2010-06-03

    Rene Warren from Canada's Michael Smith Genome Sciences Centre discusses sequencing and finishing the IgH heavy chain locus on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  11. Spaces of Ideal Convergent Sequences

    PubMed Central

    Mursaleen, M.; Sharma, Sunil K.

    2014-01-01

    In the present paper, we introduce some sequence spaces using ideal convergence and Musielak-Orlicz function ℳ = (Mk). We also examine some topological properties of the resulting sequence spaces. PMID:24592143

  12. Next-Generation Sequencing Platforms

    NASA Astrophysics Data System (ADS)

    Mardis, Elaine R.

    2013-06-01

    Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

  13. Rapid Polymer Sequencer

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.
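    The comparison step described above (matching each measured change in current against a database of reference signals) can be sketched as a nearest-reference lookup. This is only an illustrative sketch: the function name, the reference values, and the use of a single scalar per component are assumptions made here, not details taken from the record.

```python
def identify_components(measured_changes, reference_signals):
    """Assign each measured current change to the closest reference signal.

    measured_changes: list of floats, one per polymer component passing the tip.
    reference_signals: dict mapping a component label to its reference value.
    Both the scalar representation and the values used below are hypothetical.
    """
    calls = []
    for delta in measured_changes:
        # Pick the label whose reference value lies nearest to the measurement.
        label = min(reference_signals, key=lambda name: abs(reference_signals[name] - delta))
        calls.append(label)
    return "".join(calls)


# Hypothetical reference database in arbitrary units.
REFERENCE = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}

if __name__ == "__main__":
    print(identify_components([1.1, 3.9, 2.2], REFERENCE))
```

    A real instrument would presumably compare richer signal shapes and handle noise; the scalar nearest-neighbour rule is used here only to keep the idea visible.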

  14. Molecular phylogenetics before sequences

    PubMed Central

    Ragan, Mark A; Bernard, Guillaume; Chan, Cheong Xin

    2014-01-01

    From 1971 to 1985, Carl Woese and colleagues generated oligonucleotide catalogs of 16S/18S rRNAs from more than 400 organisms. Using these incomplete and imperfect data, Carl and his colleagues developed unprecedented insights into the structure, function, and evolution of the large RNA components of the translational apparatus. They recognized a third domain of life, revealed the phylogenetic backbone of bacteria (and its limitations), delineated taxa, and explored the tempo and mode of microbial evolution. For these discoveries to have stood the test of time, oligonucleotide catalogs must carry significant phylogenetic signal; they thus bear re-examination in view of the current interest in alignment-free phylogenetics based on k-mers. Here we consider the aims, successes, and limitations of this early phase of molecular phylogenetics. We computationally generate oligonucleotide sets (e-catalogs) from 16S/18S rRNA sequences, calculate pairwise distances between them based on D2 statistics, compute distance trees, and compare their performance against alignment-based and k-mer trees. Although the catalogs themselves were superseded by full-length sequences, this stage in the development of computational molecular biology remains instructive for us today. PMID:24572375
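    As a rough illustration of the k-mer comparison mentioned above, the D2 statistic between two sequences is the sum, over all words of length k, of the product of that word's counts in each sequence. The sketch below computes D2 and turns it into a distance with a cosine-style normalization; that normalization, the function names, and the example sequences are assumptions made for illustration and are not necessarily the procedure used in the paper.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(counts_a, counts_b):
    """D2 statistic: sum over shared k-mers of the product of their counts."""
    return sum(n * counts_b[word] for word, n in counts_a.items() if word in counts_b)


def d2_distance(seq_a, seq_b, k=3):
    """Distance in [0, 1] from a cosine-normalized D2 (an assumed normalization)."""
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    denom = sqrt(d2(ca, ca) * d2(cb, cb))
    if denom == 0:
        # One sequence is shorter than k, so no k-mers can be compared.
        return 1.0
    return 1.0 - d2(ca, cb) / denom


if __name__ == "__main__":
    print(d2_distance("ACGUACGU", "ACGUACGA"))
```

    A matrix of such pairwise distances could then be passed to any distance-based tree-building method.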

  15. Towards Sequencing Cotton (Gossypium) Genomes

    USDA-ARS's Scientific Manuscript database

    Despite rapidly decreasing costs and innovative technologies, sequencing of angiosperm genomes is not yet undertaken lightly. Generating larger amounts of sequence data more quickly does not address the difficulties of sequencing and assembling complex genomes de novo. The cotton genomes represent a...

  16. Sequence Factorial and Its Applications

    ERIC Educational Resources Information Center

    Asiru, Muniru A.

    2012-01-01

    In this note, we introduce sequence factorial and use this to study generalized M-bonomial coefficients. For the sequence of natural numbers, the twin concepts of sequence factorial and generalized M-bonomial coefficients, respectively, extend the corresponding concepts of factorial of an integer and binomial coefficients. Some latent properties…
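    The abstract does not give the definitions, so the sketch below assumes the common construction: the sequence factorial of order n is the product of the first n terms of the sequence, and the generalized coefficient is the corresponding quotient of three such products. With the natural numbers this reduces to the ordinary factorial and binomial coefficient, as the abstract states; whether it matches the note's exact definition is an assumption. All function names are hypothetical.

```python
from math import prod


def sequence_factorial(terms, n):
    """Product of the first n terms of the sequence (assumed definition)."""
    return prod(terms[:n])


def generalized_coefficient(terms, n, k):
    """Sequence analogue of C(n, k): n!_s / (k!_s * (n - k)!_s), assumed definition."""
    if not 0 <= k <= n:
        return 0
    numerator = sequence_factorial(terms, n)
    denominator = sequence_factorial(terms, k) * sequence_factorial(terms, n - k)
    # Exact for the two example sequences below; other sequences may not divide evenly.
    return numerator // denominator


NATURALS = [1, 2, 3, 4, 5, 6, 7, 8]
FIBONACCI = [1, 1, 2, 3, 5, 8, 13, 21]

if __name__ == "__main__":
    print(generalized_coefficient(NATURALS, 5, 2))
    print(generalized_coefficient(FIBONACCI, 5, 2))
```

    The Fibonacci sequence is included only as a second familiar example of the same quotient.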

  17. Chromosome specific repetitive DNA sequences

    DOEpatents

    Moyzis, Robert K.; Meyne, Julianne

    1991-01-01

    A method is provided for determining specific nucleotide sequences useful in forming a probe which can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family me This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).

  18. Automated Identification of Nucleotide Sequences

    NASA Technical Reports Server (NTRS)

    Osman, Shariff; Venkateswaran, Kasthuri; Fox, George; Zhu, Dian-Hui

    2007-01-01

    STITCH is a computer program that processes raw nucleotide-sequence data to automatically remove unwanted vector information, perform reverse-complement comparison, stitch shorter sequences together to make longer ones to which the shorter ones presumably belong, and search against the user's choice of private and Internet-accessible public 16S rRNA databases. ["16S rRNA" denotes a ribosomal ribonucleic acid (rRNA) sequence that is common to all organisms.] In STITCH, a template 16S rRNA sequence is used to position forward and reverse reads. STITCH then automatically searches known 16S rRNA sequences in the user's chosen database(s) to find the sequence most similar to (the sequence that lies at the smallest edit distance from) each spliced sequence. The result of processing by STITCH is the identification of the most similar well-described bacterium. Whereas previously commercially available software for analyzing genetic sequences operates on one sequence at a time, STITCH can manipulate multiple sequences simultaneously to perform the aforementioned operations. A typical analysis of several dozen sequences (length of the order of 10^3 base pairs) by use of STITCH is completed in a few minutes, whereas such an analysis performed by use of prior software takes hours or days.
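    Two of the operations named above, reverse-complementing a read and finding the database sequence at the smallest edit distance, can be sketched in a few lines. This is not STITCH's code: the function names, the plain Levenshtein distance, and the toy reference set are assumptions made for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence over A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion from a
                current[j - 1] + 1,                    # insertion into a
                previous[j - 1] + (char_a != char_b),  # match or substitution
            ))
        previous = current
    return previous[-1]


def closest_reference(query, references):
    """Return (name, distance) for the reference nearest to query."""
    scored = ((name, edit_distance(query, seq)) for name, seq in references.items())
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    toy_db = {"ref_x": "ACGA", "ref_y": "TTTT"}  # hypothetical entries
    print(closest_reference("ACGT", toy_db))
```

    For reads of about a thousand bases this quadratic-time distance is still workable, but a production tool would likely use a faster search.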

  19. Sequencing Technologies Panel at SFAF

    SciTech Connect

    Turner, Steve; Fiske, Haley; Knight, Jim; Rhodes, Michael; Vander Horn, Peter

    2010-06-02

    From left to right: Steve Turner of Pacific Biosciences, Haley Fiske of Illumina, Jim Knight of Roche, Michael Rhodes of Life Technologies and Peter Vander Horn of Life Technologies' Single Molecule Sequencing group discuss new sequencing technologies and applications on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  20. Stepping stones in DNA sequencing

    PubMed Central

    Stranneheim, Henrik; Lundeberg, Joakim

    2012-01-01

    In recent years there have been tremendous advances in our ability to rapidly and cost-effectively sequence DNA. This has revolutionized the fields of genetics and biology, leading to a deeper understanding of the molecular events in life processes. The rapid technological advances have enormously expanded sequencing opportunities and applications, but also imposed strains and challenges on steps prior to sequencing and in the downstream process of handling and analysis of these massive amounts of sequence data. Traditionally, sequencing has been limited to small DNA fragments of approximately one thousand bases (derived from the organism's genome) due to issues in maintaining a high sequence quality and accuracy for longer read lengths. Although many technological breakthroughs have been made, currently the commercially available massively parallel sequencing methods have not been able to resolve this issue. However, recent announcements in nanopore sequencing hold the promise of removing this read-length limitation, enabling sequencing of larger intact DNA fragments. The ability to sequence longer intact DNA with high accuracy is a major stepping stone towards greatly simplifying the downstream analysis and increasing the power of sequencing compared to today. This review covers some of the technical advances in sequencing that have opened up new frontiers in genomics. PMID:22887891

  1. RIKEN integrated sequence analysis (RISA) system--384-format sequencing pipeline with 384 multicapillary sequencer.

    PubMed

    Shibata, K; Itoh, M; Aizawa, K; Nagaoka, S; Sasaki, N; Carninci, P; Konno, H; Akiyama, J; Nishi, K; Kitsunai, T; Tashiro, H; Itoh, M; Sumi, N; Ishii, Y; Nakamura, S; Hazama, M; Nishine, T; Harada, A; Yamamoto, R; Matsumoto, H; Sakaguchi, S; Ikegami, T; Kashiwagi, K; Fujiwake, S; Inoue, K; Togawa, Y

    2000-11-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3' end and 5' end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can be

  2. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    PubMed Central

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can

  3. Event sequence detector

    NASA Technical Reports Server (NTRS)

    Hanna, M. F. (Inventor)

    1973-01-01

    An event sequence detector is described with input units, each associated with a row of bistable elements arranged in an array of rows and columns. The detector also includes a shift register which is responsive to clock pulses from any of the units to sequentially provide signals on its output lines each of which is connected to the bistable elements in a corresponding column. When the event-indicating signal is received by an input unit it provides a clock pulse to the shift register to provide the signal on one of its output lines. The input unit also enables all its bistable elements so that the particular element in the column supplied with the signal from the register is driven to an event-indicating state.
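    The arrangement described above can be modelled in software as a small simulation: one row of bistable elements per input unit, one column per shift-register output, and each event setting the element in its unit's row at the column currently driven by the register. The class below is a sketch under that reading of the abstract; the class and method names, and the choice to raise an error once all columns are used, are assumptions.

```python
class EventSequenceDetector:
    """Software model of a row/column array of bistable elements."""

    def __init__(self, n_inputs, n_columns):
        # array[row][column] is True once that element is in the event-indicating state.
        self.array = [[False] * n_columns for _ in range(n_inputs)]
        self.position = 0  # column currently selected by the shift register

    def event(self, unit):
        """Record an event on input unit `unit` and clock the shift register."""
        if self.position >= len(self.array[0]):
            raise OverflowError("all shift-register columns have been used")
        self.array[unit][self.position] = True
        self.position += 1

    def order(self):
        """Read the array column by column to recover the order of events."""
        sequence = []
        for column in range(self.position):
            for unit, row in enumerate(self.array):
                if row[column]:
                    sequence.append(unit)
        return sequence


if __name__ == "__main__":
    detector = EventSequenceDetector(n_inputs=3, n_columns=4)
    for unit in (2, 0, 2):
        detector.event(unit)
    print(detector.order())
```

    Reading the columns from left to right gives the order in which the input units reported events.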

  4. Asteroid Ida Rotation Sequence

    NASA Technical Reports Server (NTRS)

    1994-01-01

    This montage of 14 images (the time order is right to left, bottom to top) shows Ida as it appeared in the field of view of Galileo's camera on August 28, 1993. Asteroid Ida rotates once every 4 hours, 39 minutes and clockwise when viewed from above the north pole; these images cover about one Ida 'day.' This sequence has been used to create a 3-D model that shows Ida to be almost croissant shaped. The earliest view (lower right) was taken from a range of 240,000 kilometers (150,000 miles), 5.4 hours before closest approach. The asteroid Ida draws its name from mythology, in which the Greek god Zeus was raised by the nymph Ida.

  5. Relay Sequence Generation Software

    NASA Technical Reports Server (NTRS)

    Gladden, Roy E.; Khanampompan, Teerapat

    2009-01-01

    Due to thermal and electromagnetic interactivity between the UHF (ultrahigh frequency) radio onboard the Mars Reconnaissance Orbiter (MRO), which performs relay sessions with the Martian landers, and the remainder of the MRO payloads, relay sessions must be integrated and de-conflicted with the MRO science plan. The MRO relay SASF/PTF (spacecraft activity sequence file/payload target file) generation software facilitates this process by generating a PTF that is needed to integrate the periods of time during which MRO supports relay activities with the rest of the MRO science plans. The software also generates the needed command products that initiate the relay sessions, some features of which are provided by the lander team, some managed by MRO internally, and some derived.

  6. Solid phase sequencing of biopolymers

    DOEpatents

    Cantor, Charles; Koster, Hubert

    2010-09-28

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include DNA or RNA in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  7. Recent Advances in Sequencing Technology

    NASA Astrophysics Data System (ADS)

    Thompson, John F.; Ozsolak, Fatih; Milos, Patrice M.

    As we celebrate the tenth anniversary of the sequencing of the first human genome, we recognize the remarkable technological innovation that now provides the ability to resequence thousands of human genomes a year. While the current methods of choice utilize amplification-based methods and the corresponding challenges of sample preparation that accompany these methods, new technologies that do not require amplification have emerged. Single-molecule sequencing methods have the potential to dramatically shape the next 10 years of technological progress driven by the continuing interest of driving the cost of whole genome sequencing below the $1000 cost threshold. Yet while whole genome sequencing remains of interest, sequencing technologies also enable new approaches for genome exploration and experimentation including direct RNA sequencing, complete transcript sequencing and real time methods for both nucleic acid and enzyme kinetics.

  8. The evolution of nanopore sequencing

    PubMed Central

    Wang, Yue; Yang, Qiuping; Wang, Zhimin

    2014-01-01

    The “$1000 Genome” project has been drawing increasing attention since its launch a decade ago. Nanopore sequencing, a third-generation technology, is believed to be one of the most promising sequencing technologies for reaching the four gold standards set for the “$1000 Genome”, while the second-generation sequencing technologies are bringing about a revolution in life sciences, particularly in genome sequencing-based personalized medicine. Both protein and solid-state nanopores have been extensively investigated for a series of issues, from detection of ionic current blockage to field-effect-transistor (FET) sensors. A newly released protein nanopore sequencer has shown encouraging potential that nanopore sequencing will ultimately fulfill the gold standards. In this review, we address advances, challenges, and possible solutions of nanopore sequencing according to these standards. PMID:25610451

  9. Making sense of deep sequencing.

    PubMed

    Goldman, D; Domschke, K

    2014-10-01

    This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of 'big data', to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered.

  10. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of a unique "electronic fingerprint" of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shifts depending on the pH, with the most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of the beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  11. Large-Scale Sequence Comparison.

    PubMed

    Lal, Devi; Verma, Mansi

    2017-01-01

    There are millions of sequences deposited in genomic databases, and it is an important task to categorize them according to their structural and functional roles. Sequence comparison is a prerequisite for proper categorization of both DNA and protein sequences, and helps in assigning a putative or hypothetical structure and function to a given sequence. There are various methods available for comparing sequences, alignment being first and foremost for sequences with a small number of base pairs as well as for large-scale genome comparison. Various tools are available for performing pairwise large sequence comparison. The best known tools either perform global alignment or generate local alignments between the two sequences. In this chapter we first provide basic information regarding sequence comparison. This is followed by the description of the PAM and BLOSUM matrices that form the basis of sequence comparison. We also give a practical overview of currently available methods such as BLAST and FASTA, followed by a description and overview of tools available for genome comparison including LAGAN, MUMmer, BLASTZ, and AVID.

  12. Explanatory chapter: next generation sequencing.

    PubMed

    Yegnasubramanian, Srinivasan

    2013-01-01

    Technological breakthroughs in sequencing technologies have driven the advancement of molecular biology and molecular genetics research. The advent of high-throughput Sanger sequencing (for information on the method, see Sanger Dideoxy Sequencing of DNA) in the mid- to late-1990s made possible the accelerated completion of the human genome project, which has since revolutionized the pace of discovery in biomedical research. Similarly, the advent of next generation sequencing is poised to revolutionize biomedical research and usher a new era of individualized, rational medicine. The term next generation sequencing refers to technologies that have enabled the massively parallel analysis of DNA sequence facilitated through the convergence of advancements in molecular biology, nucleic acid chemistry and biochemistry, computational biology, and electrical and mechanical engineering. The current next generation sequencing technologies are capable of sequencing tens to hundreds of millions of DNA templates simultaneously and generate >4 gigabases of sequence in a single day. These technologies have largely started to replace high-throughput Sanger sequencing for large-scale genomic projects, and have created significant enthusiasm for the advent of a new era of individualized medicine. Copyright © 2013 Elsevier Inc. All rights reserved.

  13. Graphene nanodevices for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Heerema, Stephanie J.; Dekker, Cees

    2016-02-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with nanopores. Owing to its unique structure and properties, graphene provides interesting opportunities for the development of a new sequencing technology. In recent years, a wide range of creative ideas for graphene sequencers have been theoretically proposed and the first experimental demonstrations have begun to appear. Here, we review the different approaches to using graphene nanodevices for DNA sequencing, which involve DNA passing through graphene nanopores, nanogaps, and nanoribbons, and the physisorption of DNA on graphene nanostructures. We discuss the advantages and problems of each of these key techniques, and provide a perspective on the use of graphene in future DNA sequencing technology.

  14. Solid phase sequencing of biopolymers

    SciTech Connect

    Cantor, Charles R.; Hubert, Koster

    2014-06-24

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Probes may be affixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  15. Towards single molecule DNA sequencing

    NASA Astrophysics Data System (ADS)

    Liu, Hao

    Single molecule DNA sequencing technology has been a hot research topic in recent decades because it holds the promise to sequence a human genome in a fast and affordable way, which will eventually make personalized medicine possible. Single molecule differentiation and DNA translocation control are the two main challenges in all single molecule DNA sequencing methods. In this thesis, I will first introduce DNA sequencing technology development and its application, and then explain the performance and limitation of prior art in detail. Following that, I will show a single molecule DNA base differentiation result obtained in recognition tunneling experiments. Furthermore, I will explain the assembly of a nanofluidic platform for single strand DNA translocation, which holds the promise of being integrated into a single molecule DNA sequencing instrument for DNA translocation control. Taken together, my dissertation research demonstrated the potential of using recognition tunneling techniques to serve as a general readout system for single molecule DNA sequencing application.

  16. Nuclear RNA Isolation and Sequencing.

    PubMed

    Dhaliwal, Navroop K; Mitchell, Jennifer A

    2016-01-01

    Most transcriptome studies involve sequencing and quantification of steady-state mRNA by isolating and sequencing poly (A) RNA. Although this type of sequencing data is informative to determine steady-state mRNA levels it does not provide information on transcriptional output and thus may not always reflect changes in transcriptional regulation of gene expression. Furthermore, sequencing poly (A) RNA may miss transcribed regions of the genome not usually modified by polyadenylation which includes many long noncoding RNAs. Here, we describe nuclear-RNA sequencing (nucRNA-seq) which investigates the transcriptional landscape through sequencing and quantification of nuclear RNAs which are both unspliced and spliced transcripts for protein-coding genes and nuclear-retained long noncoding RNAs.

  17. Turtle Graphics of Morphic Sequences

    NASA Astrophysics Data System (ADS)

    Zantema, Hans

    2016-02-01

    The simplest infinite sequences that are not ultimately periodic are pure morphic sequences: fixed points of particular morphisms mapping single symbols to strings of symbols. A basic way to visualize a sequence is by a turtle curve: for every alphabet symbol fix an angle, and then consecutively for all sequence elements draw a unit segment and turn the drawing direction by the corresponding angle. This paper investigates turtle curves of pure morphic sequences. In particular, criteria are given for turtle curves being finite (consisting of finitely many segments), and for being fractal or self-similar: it contains an up-scaled copy of itself. Also space-filling turtle curves are considered, and a turtle curve that is dense in the plane. As a particular result we give an exact relationship between the Koch curve and a turtle curve for the Thue-Morse sequence, where until now for such a result only approximations were known.
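    The turtle-curve construction described in this abstract can be sketched in a few lines of Python. This is only an illustration, not the paper's code: the function names, the prefix length of 16, and the angle assignment (0 degrees for symbol 0, 60 degrees for symbol 1) are assumptions chosen for the example.

```python
import math

def morphic_prefix(morphism, start, length):
    """Return a prefix of the fixed point of `morphism` that begins with `start`.

    Assumes morphism[start] itself begins with `start` and that the morphism
    is growing; otherwise the loop below would not terminate.
    """
    word = start
    while len(word) < length:
        word = "".join(morphism[s] for s in word)
    return word[:length]

def turtle_curve(sequence, angles):
    """Vertices of the turtle curve: for each symbol, draw a unit segment,
    then turn the heading by that symbol's angle (in degrees)."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for s in sequence:
        x += math.cos(math.radians(heading))
        y += math.sin(math.radians(heading))
        points.append((x, y))
        heading += angles[s]
    return points

# Thue-Morse morphism: 0 -> 01, 1 -> 10 (hypothetical angle choice below).
THUE_MORSE = {"0": "01", "1": "10"}
word = morphic_prefix(THUE_MORSE, "0", 16)
points = turtle_curve(word, {"0": 0.0, "1": 60.0})
```

    The returned vertex list can be passed to any plotting routine; other angle assignments give different curves.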

  18. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    SciTech Connect

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in

  19. Nonlinear analysis of biological sequences

    SciTech Connect

    Torney, D.C.; Bruno, W.; Detours, V.

    1998-11-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The main objectives of this project involved deriving new capabilities for analyzing biological sequences. The authors focused on tabulating the statistical properties exhibited by Human coding DNA sequences and on techniques of inferring the phylogenetic relationships among protein sequences related by descent.

  20. Venter wins sequencing race - twice

    SciTech Connect

    Nowak, R.

    1995-06-02

    This article discusses the end of the race to sequence the first complete genome of a free-living organism. Craig Venter of the Institute for Genomic Research unveiled the complete sequences of two bacteria: Haemophilus influenzae and Mycoplasma genitalium at the American Society of Microbiology Meeting in May 1995. Because there are many similarities in bacterial and human biochemistry, the sequences will be useful for searching for human genes.

  1. Biosensors for DNA sequence detection

    NASA Technical Reports Server (NTRS)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  2. Direct-Sequence Communication Systems

    DTIC Science & Technology

    2004-03-01

    modulation; CMF = chip-matched filter; SSG = spreading sequence generator. Delay = 0 for QPSK; delay = Tc/2 for OQPSK and MSK...balanced quaternary modulation (delay = 0 for QPSK and delay = Tc/2 for OQPSK and MSK); CMF = chip-matched filter; SSG = spreading sequence...with dual quaternary modulation; CMF = chip-matched filter; SSG = spreading sequence generator. Delay = 0 for QPSK; delay = Tc/2 for OQPSK and MSK.

  3. Biosensors for DNA sequence detection

    NASA Technical Reports Server (NTRS)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  4. Orbital-Maneuver-Sequence Optimization

    DTIC Science & Technology

    1985-12-01

    optimization computer program and applied it to the generation of optimal co-orbital attack-maneuver sequences and to the generation of optimal evasions...maneuver-sequence-optimization computer programs can be improved by a general restructuring and streamlining and the addition of various features. It is...believed that with further development and systematic testing the programs have potential for real-time generation of optimal maneuver sequences in an

  5. Fungal Genome Sequencing and Bioenergy

    SciTech Connect

    Baker, Scott E.; Thykaer, Jette; Adney, William S.; Brettin, T.; Brockman, Fred J.; D'haeseleer, Patrik; Martinez, Antonio D.; Miller, R. M.; Rokhsar, Daniel S.; Schadt, Christopher W.; Torok, Tamas; Tuskan, Gerald; Bennett, Joan W.; Berka, Randy; Briggs, Steve; Heitman, Joseph; Taylor, John; Turgeon, Barbara G.; Werner-Washburne, Maggie; Himmel, Michael E.

    2008-09-30

    To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

  6. SNMR pulse sequence phase cycling

    DOEpatents

    Walsh, David O; Grunewald, Elliot D

    2013-11-12

    Technologies applicable to SNMR pulse sequence phase cycling are disclosed, including SNMR acquisition apparatus and methods, SNMR processing apparatus and methods, and combinations thereof. SNMR acquisition may include transmitting two or more SNMR pulse sequences and applying a phase shift to a pulse in at least one of the pulse sequences, according to any of a variety of cycling techniques. SNMR processing may include combining SNMR from a plurality of pulse sequences comprising pulses of different phases, so that desired signals are preserved and undesired signals are canceled.

  7. Establishing homologies in protein sequences

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.; Barker, W. C.; Hunt, L. T.

    1983-01-01

    Computer-based statistical techniques used to determine homologies between proteins occurring in different species are reviewed. The technique is based on comparison of two protein sequences, either by relating all segments of a given length in one sequence to all segments of the second or by finding the best alignment of the two sequences. Approaches discussed include selection using printed tabulations, identification of very similar sequences, and computer searches of a database. The use of the SEARCH, RELATE, and ALIGN programs (Dayhoff, 1979) is explained; sample data are presented in graphs, diagrams, and tables and the construction of scoring matrices is considered.

  8. Establishing homologies in protein sequences

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.; Barker, W. C.; Hunt, L. T.

    1983-01-01

    Computer-based statistical techniques used to determine homologies between proteins occurring in different species are reviewed. The technique is based on comparison of two protein sequences, either by relating all segments of a given length in one sequence to all segments of the second or by finding the best alignment of the two sequences. Approaches discussed include selection using printed tabulations, identification of very similar sequences, and computer searches of a database. The use of the SEARCH, RELATE, and ALIGN programs (Dayhoff, 1979) is explained; sample data are presented in graphs, diagrams, and tables and the construction of scoring matrices is considered.

  9. Sequence independent amplification of DNA

    DOEpatents

    Bohlander, Stefan K.

    1998-01-01

    The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example, the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei.

  10. Sequence independent amplification of DNA

    DOEpatents

    Bohlander, S.K.

    1998-03-24

    The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example, the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei. 25 figs.

  11. Representations of mechanical assembly sequences

    NASA Technical Reports Server (NTRS)

    Homem De Mello, Luiz S.; Sanderson, Arthur C.

    1991-01-01

    Five types of representations for assembly sequences are reviewed: the directed graph of feasible assembly sequences, the AND/OR graph of feasible assembly sequences, the set of establishment conditions, and two types of sets of precedence relationships (precedence relationships between the establishment of one connection between parts and the establishment of another connection, and precedence relationships between the establishment of one connection and states of the assembly process). The mappings of one representation into the others are established. The correctness and completeness of these representations are established. The results presented are needed in the proof of correctness and completeness of algorithms for the generation of mechanical assembly sequences.

  12. Preferential Amplification of Pathogenic Sequences.

    PubMed

    Ge, Fang; Parker, Jayme; Chul Choi, Sang; Layer, Mark; Ross, Katherine; Jilly, Bernard; Chen, Jack

    2015-06-11

    The application of next generation sequencing (NGS) technology in the diagnosis of human pathogens is hindered by the fact that pathogenic sequences, especially viral, are often scarce in human clinical specimens. This known disproportion leads to the requirement of subsequent deep sequencing and extensive bioinformatics analysis. Here we report a method we called "Preferential Amplification of Pathogenic Sequences (PATHseq)" that can be used to greatly enrich pathogenic sequences. Using a computer program, we developed 8-, 9-, and 10-mer oligonucleotides called "non-human primers" that do not match the most abundant human transcripts, but instead selectively match transcripts of human pathogens. Instead of using random primers in the construction of cDNA libraries, the PATHseq method recruits these short non-human primers, which in turn, preferentially amplifies non-human, presumably pathogenic sequences. Using this method, we were able to enrich pathogenic sequences up to 200-fold in the final sequencing library. This method does not require prior knowledge of the pathogen or assumption of the infection; therefore, it provides a fast and sequence-independent approach for detection and identification of human viruses and other pathogens. The PATHseq method, coupled with NGS technology, can be broadly used in identification of known human pathogens and discovery of new pathogens.
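    The primer-selection idea in this abstract (short oligonucleotides that avoid abundant human transcripts but match pathogen transcripts) can be illustrated with a toy set-difference over k-mers. This is a minimal sketch under stated assumptions, not the authors' program: the function names and the example sequences are hypothetical, and the sketch ignores reverse complements, mismatches, and transcript abundance.

```python
def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def non_host_primers(host_seqs, pathogen_seqs, k):
    """k-mers present in at least one pathogen sequence and in no host sequence.

    Hypothetical helper: exact matching only, one strand only.
    """
    host = set().union(*(kmers(s, k) for s in host_seqs))
    pathogen = set().union(*(kmers(s, k) for s in pathogen_seqs))
    return sorted(pathogen - host)

# Made-up toy sequences, for illustration only.
host_transcripts = ["ACGTACGT"]
pathogen_transcripts = ["ACGTTTGA"]
primers = non_host_primers(host_transcripts, pathogen_transcripts, 4)
```

    With real data, k would be 8 to 10 as in the abstract, and the host set would be built from the most abundant human transcripts.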

  13. Automated Sequence Preprocessing in a Large-Scale Sequencing Environment

    PubMed Central

    Wendl, Michael C.; Dear, Simon; Hodgson, Dave; Hillier, LaDeana

    1998-01-01

    A software system for transforming fragments from four-color fluorescence-based gel electrophoresis experiments into assembled sequence is described. It has been developed for large-scale processing of all trace data, including shotgun and finishing reads, regardless of clone origin. Design considerations are discussed in detail, as are programming implementation and graphic tools. The importance of input validation, record tracking, and use of base quality values is emphasized. Several quality analysis metrics are proposed and applied to sample results from recently sequenced clones. Such quantities prove to be a valuable aid in evaluating modifications of sequencing protocol. The system is in full production use at both the Genome Sequencing Center and the Sanger Centre, for which combined weekly production is ∼100,000 sequencing reads per week. PMID:9750196

  14. Isolation and characterization of cDNA clones for rat ribophorin I: complete coding sequence and in vitro synthesis and insertion of the encoded product into endoplasmic reticulum membranes

    PubMed Central

    1987-01-01

    Ribophorins I and II are two transmembrane glycoproteins that are characteristic of the rough endoplasmic reticulum and are thought to be part of the apparatus that affects the co-translational translocation of polypeptides synthesized on membrane-bound polysomes. A ribophorin I cDNA clone containing a 0.6-kb insert was isolated from a rat liver lambda gt11 cDNA library by immunoscreening with specific antibodies. This cDNA was used to isolate a clone (2.3 kb) from a rat brain lambda gt11 cDNA library that contains the entire ribophorin I coding sequence. SP6 RNA transcripts of the insert in this clone directed the in vitro synthesis of a polypeptide of the expected size that was immunoprecipitated with anti-ribophorin I antibodies. When synthesized in the presence of microsomes, this polypeptide, like the translation product of the natural ribophorin I mRNA, underwent membrane insertion, signal cleavage, and co-translational glycosylation. The complete amino acid sequence of the polypeptide encoded in the cDNA insert was derived from the nucleotide sequence and found to contain a segment that corresponds to a partial amino terminal sequence of ribophorin I that was obtained by Edman degradation. This confirmed the identity of the cDNA clone and established that ribophorin I contains 583 amino acids and is synthesized with a cleavable amino terminal insertion signal of 22 residues. Analysis of the amino acid sequence of ribophorin I suggested that the polypeptide has a simple transmembrane disposition with a rather hydrophilic carboxy terminal segment of 150 amino acids exposed on the cytoplasmic face of the membrane, and a luminal domain of 414 amino acids containing three potential N-glycosylation sites. Hybridization measurements using the cloned cDNA as a probe showed that ribophorin I mRNA levels increase fourfold 15 h after partial hepatectomy, in confirmation of measurements made by in vitro translation of liver mRNA. Southern blot analysis of rat genomic

  15. VOE Accounting: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 2-year program in accounting. The guide consists of a course description; general course objectives;…

  16. DNA Sequencing Sensors: An Overview

    PubMed Central

    Garrido-Cardenas, Jose Antonio; Garcia-Maroto, Federico; Alvarez-Bermejo, Jose Antonio; Manzano-Agugliaro, Francisco

    2017-01-01

    The first sequencing of a complete genome was published forty years ago by the double Nobel Prize in Chemistry winner Frederick Sanger. That corresponded to the small sized genome of a bacteriophage, but since then there have been many complex organisms whose DNA have been sequenced. This was possible thanks to continuous advances in the fields of biochemistry and molecular genetics, but also in other areas such as nanotechnology and computing. Nowadays, sequencing sensors based on genetic material have little to do with those used by Sanger. The emergence of mass sequencing sensors, or new generation sequencing (NGS) meant a quantitative leap both in the volume of genetic material that was able to be sequenced in each trial, as well as in the time per run and its cost. One can envisage that incoming technologies, already known as fourth generation sequencing, will continue to cheapen the trials by increasing DNA reading lengths in each run. All of this would be impossible without sensors and detection systems becoming smaller and more precise. This article provides a comprehensive overview on sensors for DNA sequencing developed within the last 40 years. PMID:28335417

  17. The EMBL Nucleotide Sequence Database.

    PubMed

    Stoesser, Guenter; Baker, Wendy; van den Broek, Alexandra; Camon, Evelyn; Garcia-Pastor, Maria; Kanz, Carola; Kulikova, Tamara; Leinonen, Rasko; Lin, Quan; Lombard, Vincent; Lopez, Rodrigo; Redaschi, Nicole; Stoehr, Peter; Tuli, Mary Ann; Tzouvara, Katerina; Vaughan, Robert

    2002-01-01

    The EMBL Nucleotide Sequence Database (aka EMBL-Bank; http://www.ebi.ac.uk/embl/) incorporates, organises and distributes nucleotide sequences from all available public sources. EMBL-Bank is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis. Major contributors to the EMBL database are individual scientists and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many other specialized databases. For sequence similarity searching, a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.

  18. DNA Sequencing by Capillary Electrophoresis

    PubMed Central

    Karger, Barry L.; Guttman, Andras

    2009-01-01

    Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA sequencing methods have evolved from the labor intensive slab gel electrophoresis, through automated multicapillary electrophoresis systems using fluorophore labeling with multispectral imaging, to the “next generation” technologies of cyclic array, hybridization based, nanopore and single molecule sequencing. Deciphering the genetic blueprint and follow-up confirmatory sequencing of Homo sapiens and other genomes was only possible by the advent of modern sequencing technologies that was a result of step by step advances with a contribution of academics, medical personnel and instrument companies. While next generation sequencing is moving ahead at break-neck speed, the multicapillary electrophoretic systems played an essential role in the sequencing of the Human Genome, the foundation of the field of genomics. In this prospective, we wish to overview the role of capillary electrophoresis in DNA sequencing based in part on several of our articles in this journal. PMID:19517496

  19. Diesel Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a diesel mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  20. Chameleon sequences in neurodegenerative diseases.

    PubMed

    Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

    2016-03-25

    Chameleon sequences can adopt either an alpha helix, a beta sheet, or a coil conformation. Defining chameleon sequences in PDB (Protein Data Bank) may yield insight into defining peptides and proteins responsible in neurodegeneration. In this research, we benefitted from the large PDB and performed a sequence analysis on Chameleons, where we developed an algorithm to extract peptide segments with identical sequences, but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity less than 25%. Our data was classified to "helix to strand (HE)", "helix to coil (HC)" and "strand to coil (CE)" alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the most number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe, are the major amino acids in Chameleons. We also found that there are proteins such as Insulin Degrading Enzyme IDE and GTP-binding nuclear protein Ran (RAN) with the most number of chameleons (640 and 405 respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and a study on them can shed light on curing and preventing neurodegenerative diseases.
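    The abstract does not give the extraction algorithm, so the following is only a hedged sketch of one way to find identical peptide segments with different structures. It assumes each protein comes with a per-residue secondary-structure string using H (helix), E (strand) and C (coil), and it counts a window only when all of its residues share one state; the function name, the window length, and the toy proteins are hypothetical.

```python
from collections import defaultdict

def find_chameleons(proteins, length):
    """Map each segment of `length` residues to the structure states it adopts,
    keeping only segments seen in more than one uniform state.

    proteins: iterable of (sequence, secondary_structure) string pairs of
    equal length, with states encoded as H, E or C (an assumption here).
    """
    seen = defaultdict(set)
    for seq, ss in proteins:
        for i in range(len(seq) - length + 1):
            states = set(ss[i:i + length])
            if len(states) == 1:  # window is uniformly one state
                seen[seq[i:i + length]].add(states.pop())
    return {seg: "".join(sorted(st)) for seg, st in seen.items() if len(st) > 1}

# Made-up toy input: the segment VIGK is helical in one protein, strand in the other.
toy_proteins = [("AAVIGKL", "CHHHHHC"), ("TVIGKS", "CEEEEC")]
chameleons = find_chameleons(toy_proteins, 4)
```

    The value for each segment lists its states in sorted order, so "EH" here corresponds to the helix-to-strand (HE) class named in the abstract.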

  1. Chameleon sequences in neurodegenerative diseases

    SciTech Connect

    Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

    2016-03-25

    Chameleon sequences can adopt either an alpha helix, a beta sheet, or a coil conformation. Defining chameleon sequences in PDB (Protein Data Bank) may yield insight into defining peptides and proteins responsible in neurodegeneration. In this research, we benefitted from the large PDB and performed a sequence analysis on Chameleons, where we developed an algorithm to extract peptide segments with identical sequences, but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity less than 25%. Our data was classified to “helix to strand (HE)”, “helix to coil (HC)” and “strand to coil (CE)” alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the most number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe, are the major amino acids in Chameleons. We also found that there are proteins such as Insulin Degrading Enzyme IDE and GTP-binding nuclear protein Ran (RAN) with the most number of chameleons (640 and 405 respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and a study on them can shed light on curing and preventing neurodegenerative diseases.

  2. Auto Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for an auto mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  3. Urban Horticulture: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 4-year program in urban horticulture. The guide consists of a course description; general course…

  4. Urban Horticulture: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 4-year program in urban horticulture. The guide consists of a course description; general course…

  5. Recently published protein sequences. I.

    NASA Technical Reports Server (NTRS)

    Jukes, T. H.; Holmquist, R.

    1972-01-01

    Some polypeptide sequences that were published in the 1972 scientific literature are listed. Only selected sequences are included. The compilation has two objectives: to assemble current information between the periods when more comprehensive compilations are published, and to encourage the use of data that do not include arrangements of unsequenced peptides for 'maximum homology'.

  6. Health Occupations: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 3-year program in health occupations. The guide consists of a course description; general course…

  7. AMPLIFICATION OF RIBOSOMAL RNA SEQUENCES

    EPA Science Inventory

    This book chapter offers an overview of the use of ribosomal RNA sequences. A history of the technology traces the evolution of techniques to measure bacterial phylogenetic relationships and recent advances in obtaining rRNA sequence information. The manual also describes procedu...

  8. Assembly of shotgun sequencing data

    SciTech Connect

    Huang, Xiaoqiu

    1996-12-31

    We present a simple algorithm for construction of the DNA sequence from a set of fragments generated in a shotgun sequencing project. The algorithm is based on rigorous detection of overlaps among fragments. We report assembly results of the algorithm on two genomic data sets. 14 refs., 1 fig.
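
    As an illustration of overlap-based assembly in general, the sketch below detects exact suffix-prefix overlaps and merges fragments greedily. It is a simplified stand-in, not the algorithm of the abstract above: it assumes error-free reads on one strand, and the function names and `min_len` threshold are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest exact overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps; leave the contigs separate
        merged = frags[i] + frags[j][n:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags

# Toy reads (hypothetical).
reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]
contigs = greedy_assemble(reads)
```

    Real shotgun data contain sequencing errors and reverse-complement reads, so a practical assembler would need approximate overlap detection rather than the exact matching used here.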

  9. Commercial Art: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a commercial art vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  10. Venturia carpophila draft genome sequence

    USDA-ARS's Scientific Manuscript database

    Venturia carpophila causes peach scab, a disease that renders peach fruit unmarketable. We report a high-quality draft genome sequence (36.9 Mb) of V. carpophila from an isolate collected from a peach tree in central Georgia in the United States. The genome sequence described will be a useful resour...

  11. VOE Accounting: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 2-year program in accounting. The guide consists of a course description; general course objectives;…

  12. VOE Clerical: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 2-year program in clerical skills. The guide consists of a course description; general course…

  13. Aircraft Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for an aircraft mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and…

  14. Aircraft Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for an aircraft mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and…

  15. Auto Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for an auto mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  16. AMPLIFICATION OF RIBOSOMAL RNA SEQUENCES

    EPA Science Inventory

    This book chapter offers an overview of the use of ribosomal RNA sequences. A history of the technology traces the evolution of techniques to measure bacterial phylogenetic relationships and recent advances in obtaining rRNA sequence information. The manual also describes procedu...

  17. Diesel Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a diesel mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  18. Rapid Diagnostics of Onboard Sequences

    NASA Technical Reports Server (NTRS)

    Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

    2012-01-01

    Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators' diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the RESTful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command

  19. Recent advances in nanopore sequencing

    PubMed Central

    Maitra, Raj D.; Kim, Jungsuk; Dunbar, William B.

    2013-01-01

    The prospect of nanopores as a next-generation sequencing (NGS) platform has been a topic of growing interest and considerable government-sponsored research for more than a decade. Oxford Nanopore Technologies recently announced the first commercial nanopore sequencing devices, to be made available by the end of 2012, while other companies (Life, Roche, IBM) are also pursuing nanopore sequencing approaches. In this paper, the state of the art in nanopore sequencing is reviewed, focusing on the most recent contributions that have or promise to have NGS commercial potential. We consider also the scalability of the circuitry to support multichannel arrays of nanopores in future sequencing devices, which is critical to commercial viability. PMID:23138639

  20. Pathogenetic mechanisms of fetal akinesia deformation sequence and oligohydramnios sequence.

    PubMed

    Rodríguez, J I; Palacios, J

    1991-09-01

    This article briefly reviews the participation of fetal compression, muscular weakness, and fetal akinesia in the genesis of the anomalies found in fetal akinesia deformation sequence (FADS) and oligohydramnios sequence (OS). Both sequences share phenotypic manifestations, such as arthrogryposis, short umbilical cord, and lung hypoplasia, in relation to decreased intrauterine fetal motility. Other characteristic manifestations found in OS, such as Potter face, and redundant skin, are produced by fetal compression. On the other hand, growth retardation, craniofacial anomalies, micrognathia, long bone hypoplasia, and polyhydramnios found in FADS could be related to intrauterine muscular weakness.

  1. Image analysis for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Palaniappan, Kannappan; Huang, Thomas S.

    1991-07-01

    There is a great deal of interest in automating the process of DNA (deoxyribonucleic acid) sequencing to support the analysis of genomic DNA such as the Human and Mouse Genome projects. In one class of gel-based sequencing protocols, autoradiograph images are generated in the final step and usually require manual interpretation to reconstruct the DNA sequence represented by the image. The need to handle a large volume of sequence information necessitates automation of the manual autoradiograph reading step through image analysis in order to reduce the length of time required to obtain sequence data and reduce transcription errors. Various adaptive image enhancement, segmentation and alignment methods were applied to autoradiograph images. The methods are adaptive to the local characteristics of the image such as noise, background signal, or presence of edges. Once the two-dimensional data is converted to a set of aligned one-dimensional profiles, waveform analysis is used to determine the location of each band, which represents one nucleotide in the sequence. Different classification strategies, including a rule-based approach, are investigated to map the profile signals, augmented with the original two-dimensional image data as necessary, to textual DNA sequence information.

  2. From mapping to sequencing, post-sequencing and beyond.

    PubMed

    Sasaki, Takuji; Matsumoto, Takashi; Antonio, Baltazar A; Nagamura, Yoshiaki

    2005-01-01

    The Rice Genome Research Program (RGP) in Japan has been collaborating with the international community in elucidating a complete high-quality sequence of the rice genome. As the pioneer in large-scale analysis of the rice genome, the RGP has successfully established the fundamental tools for genome research such as a genetic map, a yeast artificial chromosome (YAC)-based physical map, a transcript map and a phage P1 artificial chromosome (PAC)/bacterial artificial chromosome (BAC) sequence-ready physical map, which serve as common resources for genome sequencing. Among the 12 rice chromosomes, the RGP is in charge of sequencing six chromosomes covering 52% of the 390 Mb total length of the genome. The contribution of the RGP to the realization of decoding the rice genome sequence with high accuracy and deciphering the genetic information in the genome will have a great impact in understanding the biology of the rice plant that provides a major food source for almost half of the world's population. A high-quality draft sequence (phase 2) was completed in December 2002. Since then, much of the finished quality sequence (phase 3) has become available in public databases. With the completion of sequencing in December 2004, it is expected that the genome sequence would facilitate innovative research in functional and applied genomics. A map-based genome sequence is indispensable for further improvement of current rice varieties and for development of novel varieties carrying agronomically important traits such as high yield potential and tolerance to both biotic and abiotic stresses. In addition to genome sequencing, various related projects have been initiated to generate valuable resources, which could serve as indispensable tools in clarifying the structure and function of the rice genome. These resources have been made available to the scientific community through the Rice Genome Resource Center (RGRC) of the National Institute of Agrobiological Sciences (NIAS) to

  3. Study Design for Sequencing Studies.

    PubMed

    Honaas, Loren A; Altman, Naomi S; Krzywinski, Martin

    2016-01-01

    Once a biochemical method has been devised to sample RNA or DNA of interest, sequencing can be used to identify the sampled molecules with high fidelity and low bias. High-throughput sequencing has therefore become the primary data acquisition method for many genomics studies and is being used more and more to address molecular biology questions. By applying principles of statistical experimental design, sequencing experiments can be made more sensitive to the effects under study as well as more biologically sound, hence more replicable.

  4. Compilation of small RNA sequences.

    PubMed

    Shumyatsky, G; Reddy, R

    1992-05-11

    This is an update containing small RNA sequences published during 1991. Approximately two hundred small RNA sequences are available in this and earlier compilations. The hard copy print out of this set will be available directly from us (inquiries should be addressed to R. Reddy). These files are also available on GenBank computer. Sequences from various sources covered in earlier compilations (see Reddy, R. Nucl. Acids Res. 16:r71; Reddy, R. and Gupta, S. Nucl Acids Res. 1990 Supplement, 18:2231 and 1991 Supplement, 19:2073) are not included in this update but are listed below.

  5. Inferring phylogenies of evolving sequences without multiple sequence alignment.

    PubMed

    Chan, Cheong Xin; Bernard, Guillaume; Poirion, Olivier; Hogan, James M; Ragan, Mark A

    2014-09-30

    Alignment-free methods, in which shared properties of sub-sequences (e.g. identity or match length) are extracted and used to compute a distance matrix, have recently been explored for phylogenetic inference. However, the scalability and robustness of these methods to key evolutionary processes remain to be investigated. Here, using simulated sequence sets of various sizes in both nucleotides and amino acids, we systematically assess the accuracy of phylogenetic inference using an alignment-free approach, based on D2 statistics, under different evolutionary scenarios. We find that compared to a multiple sequence alignment approach, D2 methods are more robust against among-site rate heterogeneity, compositional biases, genetic rearrangements and insertions/deletions, but are more sensitive to recent sequence divergence and sequence truncation. Across diverse empirical datasets, the alignment-free methods perform well for sequences sharing low divergence, at greater computation speed. Our findings provide strong evidence for the scalability and the potential use of alignment-free methods in large-scale phylogenomics.
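
    For orientation, the basic D2 statistic between two sequences is the sum, over all k-mers, of the product of their counts in each sequence. The sketch below computes only that basic form; the normalised variants and the conversion to a distance matrix used in the study are not reproduced, and the function names and the choice of k are assumptions.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2(x, y, k):
    """Basic D2 statistic: sum over shared k-mers of count_x * count_y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Counter returns 0 for k-mers absent from y, so they contribute nothing.
    return sum(n * cy[w] for w, n in cx.items())

# Toy sequences (hypothetical).
score = d2("ACGTACGT", "ACGTTT", k=2)
```

    Here the shared 2-mers AC, CG and GT each occur twice in the first sequence and once in the second, giving a score of 6.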

  6. Multiple sequence alignment based on profile alignment of intermediate sequences.

    PubMed

    Lu, Yue; Sze, Sing-Hoi

    2008-09-01

    Despite considerable efforts, it remains difficult to obtain accurate multiple sequence alignments. By using additional hits from database search of the input sequences, a few strategies have been proposed to significantly improve alignment accuracy, including the construction of profiles from the hits while performing profile alignment, the inclusion of high scoring hits into the input sequences, the use of intermediate sequence search to link distant homologs, and the use of secondary structure information. We develop an algorithm that integrates these strategies to further improve alignment accuracy by modifying the pair-Hidden Markov Model (HMM) approach in ProbCons to incorporate profiles of intermediate sequences from database search and utilize secondary structure predictions as in SPEM. We test our algorithm on a few sets of benchmark multiple alignments, including BAliBASE, HOMSTRAD, PREFAB, and SABmark, and show that it significantly outperforms MAFFT and ProbCons, which are among the best multiple alignment algorithms that do not utilize additional information, and SPEM, which is among the best multiple alignment algorithms that utilize additional hits from database search. The improvement in accuracy over SPEM can be as much as 5-10% when aligning divergent sequences. A software program that implements this approach (ISPAlign) is available at http://faculty.cs.tamu.edu/shsze/ispalign.

  7. Inferring phylogenies of evolving sequences without multiple sequence alignment

    PubMed Central

    Chan, Cheong Xin; Bernard, Guillaume; Poirion, Olivier; Hogan, James M.; Ragan, Mark A.

    2014-01-01

    Alignment-free methods, in which shared properties of sub-sequences (e.g. identity or match length) are extracted and used to compute a distance matrix, have recently been explored for phylogenetic inference. However, the scalability and robustness of these methods to key evolutionary processes remain to be investigated. Here, using simulated sequence sets of various sizes in both nucleotides and amino acids, we systematically assess the accuracy of phylogenetic inference using an alignment-free approach, based on D2 statistics, under different evolutionary scenarios. We find that compared to a multiple sequence alignment approach, D2 methods are more robust against among-site rate heterogeneity, compositional biases, genetic rearrangements and insertions/deletions, but are more sensitive to recent sequence divergence and sequence truncation. Across diverse empirical datasets, the alignment-free methods perform well for sequences sharing low divergence, at greater computation speed. Our findings provide strong evidence for the scalability and the potential use of alignment-free methods in large-scale phylogenomics. PMID:25266120

  8. Sequencing as an Item Type.

    ERIC Educational Resources Information Center

    Alderson, J. Charles; Percsich, Richard; Szabo, Gabor

    2000-01-01

    Reports on the potential problems in scoring responses to sequencing tests, the development of a computer program to overcome these difficulties, and an exploration of the value of scoring procedures. (Author/VWL)

  9. Pythagorean Triples from Harmonic Sequences.

    ERIC Educational Resources Information Center

    DiDomenico, Angelo S.; Tanner, Randy J.

    2001-01-01

    Shows how all primitive Pythagorean triples can be generated from harmonic sequences. Use inductive and deductive reasoning to explore how Pythagorean triples are connected with another area of mathematics. (KHR)

  10. Guitars, Violins, and Geometric Sequences

    ERIC Educational Resources Information Center

    Barger, Rita; Haehl, Martha

    2007-01-01

    This article describes middle school mathematics activities that relate measurement, ratios, and geometric sequences to finger positions or the placement of frets on stringed musical instruments. (Contains 2 figures and 2 tables.)
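
    One way to see the geometric sequence behind fret placement is sketched below. It assumes 12-tone equal temperament, in which each fret shortens the vibrating string by the constant ratio 2^(-1/12); the article's own activities may be framed differently, and the function name and the 650 mm scale length are hypothetical.

```python
def fret_positions(scale_length, n_frets=12):
    """Distance from the nut to each fret under 12-tone equal temperament."""
    ratio = 2 ** (-1 / 12)  # common ratio of the vibrating string lengths
    # The vibrating length after n frets is scale_length * ratio**n,
    # so the fret sits at the remaining distance from the nut.
    return [scale_length * (1 - ratio ** n) for n in range(1, n_frets + 1)]

positions = fret_positions(650.0)  # 650 mm is a hypothetical scale length
```

    Under this assumption the twelfth fret falls at the midpoint of the string, which corresponds to one octave.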

  11. The Dynamics of DNA Sequencing.

    ERIC Educational Resources Information Center

    Morvillo, Nancy

    1997-01-01

    Describes a paper-and-pencil activity that helps students understand DNA sequencing and expands student understanding of DNA structure, replication, and gel electrophoresis. Appropriate for advanced biology students who are familiar with the Sanger method. (DDR)

  12. Molecular beacon sequence design algorithm.

    PubMed

    Monroe, W Todd; Haselton, Frederick R

    2003-01-01

    A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance is based on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.

  13. Paucity of moderately repetitive sequences

    SciTech Connect

    Schmid, C.W.

    1991-01-01

    We examined clones of renatured repetitive human DNA to find novel repetitive DNAs. After eliminating known repeats, the remaining clones were subjected to sequence analysis. These clones also corresponded to known repeats, but with greater sequence diversity. This indicates that either these libraries were depleted of short interspersed repeats in construction, or these repeats are much less prevalent in the human genome than is indicated by data from Xenopus or sea urchin studies. We directly investigated the sequence composition of human DNA through traditional renaturation techniques with the goal of estimating the limits of abundance of repetitive sequence classes in human DNA. Our results sharply limit the maximum possible abundance to 1-2% of the human genome. Our estimate, minus the known repeats in this fraction, leaves about 1% (3 × 10^7 nucleotides) of the human genome for novel repetitive elements. 2 refs. (MHB)

  14. Archiving next generation sequencing data.

    PubMed

    Shumway, Martin; Cochrane, Guy; Sugawara, Hideaki

    2010-01-01

    Next generation sequencing platforms are producing biological sequencing data in unprecedented amounts. The partners of the International Nucleotide Sequencing Database Collaboration, which includes the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ), have established the Sequence Read Archive (SRA) to provide the scientific community with an archival destination for next generation data sets. The SRA is now accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://www.ddbj.nig.ac.jp/sub/trace_sra-e.html from DDBJ. Users of these resources can obtain data sets deposited in any of the three SRA instances. Links and submission instructions are provided.

  15. Evaluation of whole exome sequencing by targeted gene sequencing and Sanger sequencing.

    PubMed

    Chang, Ya-Sian; Huang, Hsien-Da; Yeh, Kun-Tu; Chang, Jan-Gowth

    2017-08-01

    Targeted gene sequencing (TGS) and whole exome sequencing (WES) are being used in clinical testing in laboratories. We compared the performances of TGS and WES using the same DNA samples. DNA was extracted from 10 endometrial tumor tissue specimens. Sequencing was performed with an Illumina HiSeq 2000. We randomly selected variants to confirm through Sanger sequencing or mutant-enriched PCR with Sanger sequencing. We found that the variants identified in both TGS and WES were true positives (47/47), regardless of the sequencing depth. Most variants found in TGS only were true positives (34/40), and most of the variants found by WES only were false positives (8/18). From these results, we suggest that the sequencing depth may not play an important role in the accuracy of NGS-based methods. After analysis, we found that WES had a sensitivity of 72.70%, specificity of 96.27%, precision of 99.44%, and accuracy of 75.03%. The results of NGS-based methods must currently be validated, especially for important reported variants, regardless of the methods used, and for the use of WES in cancers a higher false negative rate must be considered. More sensitive methods should be used to confirm the NGS results in uneven cancer tissues. Copyright © 2017 Elsevier B.V. All rights reserved.
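
    The four performance measures quoted above follow from a standard confusion matrix. The sketch below shows the usual definitions; the counts are hypothetical and do not reproduce the study's data, and the function name is an assumption.

```python
def performance(tp, fp, fn, tn):
    """Standard confusion-matrix metrics for a variant-calling comparison."""
    return {
        "sensitivity": tp / (tp + fn),   # true positives among real variants
        "specificity": tn / (tn + fp),   # true negatives among non-variants
        "precision": tp / (tp + fp),     # true positives among calls made
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = performance(tp=90, fp=5, fn=30, tn=75)
```

    With these illustrative counts, a high precision can coexist with a modest sensitivity, which is the pattern the abstract reports for WES.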

  16. Rover Sequencing and Visualization Program

    NASA Technical Reports Server (NTRS)

    Cooper, Brian; Hartman, Frank; Maxwell, Scott; Yen, Jeng; Wright, John; Balacuit, Carlos

    2005-01-01

    The Rover Sequencing and Visualization Program (RSVP) is the software tool for use in the Mars Exploration Rover (MER) mission for planning rover operations and generating command sequences for accomplishing those operations. RSVP combines three-dimensional (3D) visualization for immersive exploration of the operations area, stereoscopic image display for high-resolution examination of the downlinked imagery, and a sophisticated command-sequence editing tool for analysis and completion of the sequences. RSVP is linked with actual flight-code modules for operations rehearsal to provide feedback on the expected behavior of the rover prior to committing to a particular sequence. Playback tools allow for review of both rehearsed rover behavior and downlinked results of actual rover operations. These can be displayed simultaneously for comparison of rehearsed and actual activities for verification. The primary inputs to RSVP are downlink data products from the Operations Storage Server (OSS) and activity plans generated by the science team. The activity plans are high-level goals for the next day's activities. The downlink data products include imagery, terrain models, and telemetered engineering data on rover activities and state. The Rover Sequence Editor (RoSE) component of RSVP performs activity expansion to command sequences, command creation and editing with setting of command parameters, and viewing and management of rover resources. The HyperDrive component of RSVP performs 2D and 3D visualization of the rover's environment, graphical and animated review of rover-predicted and telemetered state, and creation and editing of command sequences related to mobility and Instrument Deployment Device (IDD) operations. Additionally, RoSE and HyperDrive together evaluate command sequences for potential violations of flight and safety rules. The products of RSVP include command sequences for uplink that are stored in the Distributed Object Manager (DOM) and predicted rover

  17. Sequenced drive for rotary valves

    DOEpatents

    Mittell, Larry C.

    1981-01-01

    A sequenced drive for rotary valves which provides the benefits of applying rotary and linear motions to the movable sealing element of the valve. The sequenced drive provides a close approximation of linear motion while engaging or disengaging the movable element with the seat minimizing wear and damage due to scrubbing action. The rotary motion of the drive swings the movable element out of the flowpath thus eliminating obstruction to flow through the valve.

  18. Structural Complexity of DNA Sequence

    PubMed Central

    Liou, Cheng-Yuan; Cheng, Wei-Chen; Tsai, Huai-Ying

    2013-01-01

    In modern bioinformatics, finding an efficient way to allocate sequence fragments with biological functions is an important issue. This paper presents a structural approach based on context-free grammars extracted from original DNA or protein sequences. This approach is radically different from all those statistical methods. Furthermore, this approach is compared with a topological entropy-based method for consistency and difference of the complexity results. PMID:23662161

  19. Overview of Sequence Data Formats.

    PubMed

    Zhang, Hongen

    2016-01-01

    A next-generation sequencing experiment can generate billions of short reads for each sample, and processing of the raw reads adds more information. Various file formats have been introduced or developed to store and manipulate this information. This chapter presents an overview of the file formats, including FASTQ, FASTA, SAM/BAM, GFF/GTF, BED, and VCF, that are commonly used in the analysis of next-generation sequencing data.
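
    As a small illustration of one of the formats listed above, the sketch below reads FASTQ records. It assumes the common layout of four lines per record (header, sequence, separator, quality) and Phred+33 quality encoding; wrapped multi-line records are not handled, and the function name and the example read are hypothetical.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a four-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Phred+33: quality score is the character code minus 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]

# Hypothetical single-read example.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
```

    Each quality character maps to one base, so the parser checks that the sequence and quality lines have equal length before yielding a record.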

  20. Mycobacterium abscessus multispacer sequence typing

    PubMed Central

    2013-01-01

    Background: The Mycobacterium abscessus group includes antibiotic-resistant, opportunistic mycobacteria that are responsible for sporadic cases and outbreaks of cutaneous, pulmonary and disseminated infections. However, because of their close genetic relationships, accurate discrimination between the various strains of these mycobacteria remains difficult. In this report, we describe the development of a multispacer sequence typing (MST) analysis for the simultaneous identification and typing of M. abscessus mycobacteria. We also compared MST with the reference multilocus sequence analysis (MLSA) typing method. Results: Based on the M. abscessus CIP104536T genome, eight intergenic spacers were selected, PCR amplified and sequenced in 21 M. abscessus isolates and analysed in 48 available M. abscessus genomes. MST and MLSA grouped 37 M. abscessus organisms into 12 and nine types, respectively; four formerly “M. bolletii” organisms and M. abscessus M139 into three and four types, respectively; and 27 formerly “M. massiliense” organisms into nine and five types, respectively. The Hunter-Gaston index was 0.912 for MST and 0.903 for MLSA. The MST-derived tree was similar to that based on MLSA and rpoB gene sequencing and yielded three main clusters, each comprising the type strain of the respective M. abscessus sub-species. Two isolates exhibited discordant MLSA- and rpoB gene sequence-derived positions, one isolate exhibited discordant MST- and rpoB gene sequence-derived positions and one isolate exhibited discordant MST- and MLSA-derived positions. MST spacer n°2 sequencing alone allowed for the accurate identification of the different isolates at the sub-species level. Conclusions: MST is a new sequencing-based approach for both identifying and genotyping M. abscessus mycobacteria that clearly differentiates formerly “M. massiliense” organisms from other M. abscessus subsp. bolletii organisms. PMID:23294800
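
    The Hunter-Gaston index cited above measures the probability that two isolates drawn at random belong to different types: D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of type j and N is the total. The sketch below applies that formula to hypothetical type counts; it does not reproduce the study's data, and the function name is an assumption.

```python
def hunter_gaston(type_counts):
    """Hunter-Gaston discriminatory index from the number of isolates per type."""
    n_total = sum(type_counts)
    if n_total < 2:
        raise ValueError("at least two isolates are required")
    same_type_pairs = sum(n * (n - 1) for n in type_counts)
    return 1 - same_type_pairs / (n_total * (n_total - 1))

# Hypothetical typing result: three types with 3, 2 and 1 isolates.
index = hunter_gaston([3, 2, 1])
```

    An index of 1 means every isolate has its own type, while an index of 0 means all isolates share a single type.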

  1. Nanogrid rolling circle DNA sequencing

    DOEpatents

    Church, George M.; Porreca, Gregory J.; Shendure, Jay; Rosenbaum, Abraham Meir

    2017-04-18

    The present invention relates to methods for sequencing a polynucleotide immobilized on an array having a plurality of specific regions each having a defined diameter size, including synthesizing a concatemer of a polynucleotide by rolling circle amplification, wherein the concatemer has a cross-sectional diameter greater than the diameter of a specific region, immobilizing the concatemer to the specific region to make an immobilized concatemer, and sequencing the immobilized concatemer.

  2. Genome Sequence of Canine Herpesvirus

    PubMed Central

    Papageorgiou, Konstantinos V.; Suárez, Nicolás M.; Wilkie, Gavin S.; McDonald, Michael; Graham, Elizabeth M.; Davison, Andrew J.

    2016-01-01

    Canine herpesvirus is a widespread alphaherpesvirus that causes a fatal haemorrhagic disease of neonatal puppies. We have used high-throughput methods to determine the genome sequences of three viral strains (0194, V777 and V1154) isolated in the United Kingdom between 1985 and 2000. The sequences are very closely related to each other. The canine herpesvirus genome is estimated to be 125 kbp in size and consists of a unique long sequence (97.5 kbp) and a unique short sequence (7.7 kbp) that are each flanked by terminal and internal inverted repeats (38 bp and 10.0 kbp, respectively). The overall nucleotide composition is 31.6% G+C, which is the lowest among the completely sequenced alphaherpesviruses. The genome contains 76 open reading frames predicted to encode functional proteins, all of which have counterparts in other alphaherpesviruses. The availability of the sequences will facilitate future research on the diagnosis and treatment of canine herpesvirus-associated disease. PMID:27213534

  3. Graphene Nanopores for Protein Sequencing

    PubMed Central

    Wilson, James; Sloman, Leila; He, Zhiren

    2016-01-01

    An inexpensive, reliable method for protein sequencing is essential to unraveling the biological mechanisms governing cellular behavior and disease. Current protein sequencing methods suffer from limitations associated with the size of proteins that can be sequenced, the time, and the cost of the sequencing procedures. Here, we report the results of all-atom molecular dynamics simulations that investigated the feasibility of using graphene nanopores for protein sequencing. We focus our study on the biologically significant phenylalanine-glycine repeat peptides (FG-nups)—parts of the nuclear pore transport machinery. Surprisingly, we found FG-nups to behave similarly to single stranded DNA: the peptides adhere to graphene and exhibit step-wise translocation when subject to a transmembrane bias or a hydrostatic pressure gradient. Reducing the peptide’s charge density or increasing the peptide’s hydrophobicity was found to decrease the translocation speed. Yet, unidirectional and stepwise translocation driven by a transmembrane bias was observed even when the ratio of charged to hydrophobic amino acids was as low as 1:8. The nanopore transport of the peptides was found to produce stepwise modulations of the nanopore ionic current correlated with the type of amino acids present in the nanopore, suggesting that protein sequencing by measuring ionic current blockades may be possible. PMID:27746710

  4. Long-range barcode labeling-sequencing

    DOEpatents

    Chen, Feng; Zhang, Tao; Singh, Kanwar K.; Pennacchio, Len A.; Froula, Jeff L.; Eng, Kevin S.

    2016-10-18

    Methods for sequencing single large DNA molecules by clonal multiple displacement amplification using barcoded primers. Sequences are binned based on barcode sequences and sequenced using a microdroplet-based method for sequencing large polynucleotide templates to enable assembly of haplotype-resolved complex genomes and metagenomes.

  5. Conservation of sequence in recombination signal sequence spacers.

    PubMed Central

    Ramsden, D A; Baetz, K; Wu, G E

    1994-01-01

    The variable domains of immunoglobulins and T cell receptors are assembled through the somatic, site specific recombination of multiple germline segments (V, D, and J segments) or V(D)J rearrangement. The recombination signal sequence (RSS) is necessary and sufficient for cell type specific targeting of the V(D)J rearrangement machinery to these germline segments. Previously, the RSS has been described as possessing both a conserved heptamer and a conserved nonamer motif. The heptamer and nonamer motifs are separated by a 'spacer' that was not thought to possess significant sequence conservation, however the length of the spacer could be either 12 +/- 1 bp or 23 +/- 1 bp long. In this report we have assembled and analyzed an extensive data base of published RSS. We have derived, through extensive consensus comparison, a more detailed description of the RSS than has previously been reported. Our analysis indicates that RSS spacers possess significant conservation of sequence, and that the conserved sequence in 12 bp spacers is similar to the conserved sequence in the first half of 23 bp spacers. PMID:8208601

  6. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    PubMed

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  7. Sequencing and comparative analysis of the gorilla MHC genomic sequence

    PubMed Central

    Wilming, Laurens G.; Hart, Elizabeth A.; Coggill, Penny C.; Horton, Roger; Gilbert, James G. R.; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L.

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC. PMID:23589541

  8. Ossification sequence heterochrony among amphibians.

    PubMed

    Harrington, Sean M; Harrison, Luke B; Sheil, Christopher A

    2013-01-01

    Heterochrony is an important mechanism in the evolution of amphibians. Although studies have centered on the relationship between size and shape and the rates of development, ossification sequence heterochrony also may have been important. Rigorous, phylogenetic methods for assessing sequence heterochrony are relatively new, and a comprehensive study of the relative timing of ossification of skeletal elements has not been used to identify instances of sequence heterochrony across Amphibia. In this study, a new version of the program Parsimov-based genetic inference (PGi) was used to identify shifts in ossification sequences across all extant orders of amphibians, for all major structural units of the skeleton. PGi identified a number of heterochronic sequence shifts in all analyses, the most interesting of which seem to be tied to differences in metamorphic patterns among major clades. Early ossification of the vomer, premaxilla, and dentary is retained by Apateon caducus and members of Gymnophiona and Urodela, which lack the strongly biphasic development seen in anurans. In contrast, bones associated with the jaws and face were identified as shifting late in the ancestor of Anura. The bones that do not shift late, and thereby occupy the earliest positions in the anuran cranial sequence, are those in regions of the skull that undergo the least restructuring throughout anuran metamorphosis. Additionally, within Anura, bones of the hind limb and pelvic girdle were also identified as shifting early in the sequence of ossification, which may be a result of functional constraints imposed by the drastic metamorphosis of most anurans. © 2013 Wiley Periodicals, Inc.

  9. Sequence Factorization with Multiple References

    PubMed Central

    Wandelt, Sebastian; Leser, Ulf

    2015-01-01

    The success of high-throughput sequencing has led to an increasing number of projects which sequence large populations of a species. Storage and analysis of sequence data is a key challenge in these projects, because of the sheer size of the datasets. Compression is one simple technology to deal with this challenge. Referential factorization and compression schemes, which store only the differences between an input sequence and a reference sequence, have gained considerable interest in this field. Highly similar sequences, e.g., human genomes, can be compressed with a compression ratio of 1,000:1 and more, up to two orders of magnitude better than with standard compression techniques. Recently, it was shown that compression against multiple references from the same species can boost the compression ratio up to 4,000:1. However, a detailed analysis of using multiple references is lacking, e.g., for main memory consumption and optimality. In this paper, we describe one key technique for referential compression against multiple references: the factorization of sequences. Based on the notion of an optimal factorization, we propose optimization heuristics and identify parameter settings which greatly influence 1) the size of the factorization, 2) the time for factorization, and 3) the required amount of main memory. We evaluate a total of 30 setups with a varying number of references on data from three different species. Our results show a wide range of factorization sizes (optimal to an overhead of up to 300%), factorization speed (0.01 MB/s to more than 600 MB/s), and main memory usage (few dozen MB to dozens of GB). Based on our evaluation, we identify the best configurations for common use cases. Our evaluation shows that multi-reference factorization is much better than single-reference factorization. PMID:26422374
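
    To illustrate what "storing only the differences" against a reference can look like, here is a naive greedy sketch with a single reference. It is not the authors' algorithm or any of their heuristics: the function names, the `min_match` threshold, and the toy sequences are assumptions made for illustration, and the quadratic search is far too slow for real genomes.

```python
def factorize(reference, sequence, min_match=4):
    """Greedy referential factorization (illustrative, O(len(ref) * len(seq))).

    Returns a list whose items are either (position, length) pairs that point
    into the reference, or single literal characters.
    """
    factors = []
    i = 0
    while i < len(sequence):
        best_pos, best_len = -1, 0
        # Try every start in the reference and keep the longest match.
        for start in range(len(reference)):
            length = 0
            while (start + length < len(reference)
                   and i + length < len(sequence)
                   and reference[start + length] == sequence[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = start, length
        if best_len >= min_match:
            factors.append((best_pos, best_len))
            i += best_len
        else:
            # Short matches are cheaper to store as a literal character.
            factors.append(sequence[i])
            i += 1
    return factors


def reconstruct(reference, factors):
    """Rebuild the original sequence from the reference and its factors."""
    parts = []
    for factor in factors:
        if isinstance(factor, tuple):
            pos, length = factor
            parts.append(reference[pos:pos + length])
        else:
            parts.append(factor)
    return "".join(parts)


reference = "ACGTACGTTTGACCA"
sequence = "ACGTACGATTGACCA"  # differs from the reference at one position
factors = factorize(reference, sequence)
print(factors)
assert reconstruct(reference, factors) == sequence
```

    With several references, one plausible extension is to search each reference in turn and record which one a factor points into; the trade-offs the abstract lists (factorization size, speed, memory) arise from how that search is indexed and bounded.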

  10. Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

    PubMed

    Tan, M K

    1991-08-01

    A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.

  11. Genetics Home Reference: isolated Pierre Robin sequence

    MedlinePlus

    ... Pierre Robin sequence is a set of abnormalities affecting the ...

  12. A Demonstration of Automated DNA Sequencing.

    ERIC Educational Resources Information Center

    Latourelle, Sandra; Seidel-Rogol, Bonnie

    1998-01-01

    Details a simulation that employs a paper-and-pencil model to demonstrate the principles behind automated DNA sequencing. Discusses the advantages of automated sequencing as well as the chemistry of automated DNA sequencing. (DDR)

  13. Statistical properties of DNA sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
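
    The detrended fluctuation analysis (DFA) mentioned in this record can be sketched in a few lines. The version below is a simplified, first-order DFA on a "DNA walk" and is meant only as an illustration: the mapping of pyrimidines to +1 and purines to -1, the non-overlapping windows, and all function names are assumptions rather than details taken from the record.

```python
import math


def dna_walk_profile(seq):
    """Cumulative DNA walk: +1 for a pyrimidine (C/T), -1 for a purine (A/G)."""
    profile, position = [], 0
    for base in seq.upper():
        position += 1 if base in "CT" else -1
        profile.append(position)
    return profile


def dfa_fluctuation(profile, n):
    """RMS residual after removing a least-squares line in each window of size n."""
    if n < 2 or n > len(profile):
        raise ValueError("window size must be between 2 and len(profile)")
    xs = range(n)
    mean_x = (n - 1) / 2
    sxx = sum((x - mean_x) ** 2 for x in xs)
    squared, used = 0.0, 0
    for start in range(0, len(profile) - n + 1, n):
        window = profile[start:start + n]
        mean_y = sum(window) / n
        slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, window)) / sxx
        intercept = mean_y - slope * mean_x
        squared += sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, window))
        used += n
    return math.sqrt(squared / used)


def dfa_exponent(profile, window_sizes):
    """Least-squares slope of log F(n) against log n.

    Assumes F(n) > 0 for every window size supplied.
    """
    pts = [(math.log(n), math.log(dfa_fluctuation(profile, n))) for n in window_sizes]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx
```

    For an uncorrelated sequence the exponent is expected to be close to 0.5; values clearly above 0.5 indicate long-range correlation of the kind the abstract describes for genes containing non-coding regions.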

  14. Sequencing Needs for Viral Diagnostics

    SciTech Connect

    Gardner, S N; Lam, M; Mulakken, N J; Torres, C L; Smith, J R; Slezak, T

    2004-01-26

    We built a system to guide decisions regarding the amount of genomic sequencing required to develop diagnostic DNA signatures, which are short sequences that are sufficient to uniquely identify a viral species. We used our existing DNA diagnostic signature prediction pipeline, which selects regions of a target species genome that are conserved among strains of the target (for reliability, to prevent false negatives) and unique relative to other species (for specificity, to avoid false positives). We performed simulations, based on existing sequence data, to assess the number of genome sequences of a target species and of close phylogenetic relatives ("near neighbors") that are required to predict diagnostic signature regions that are conserved among strains of the target species and unique relative to other bacterial and viral species. For DNA viruses such as variola (smallpox), three target genomes provide sufficient guidance for selecting species-wide signatures. Three near neighbor genomes are critical for species specificity. In contrast, most RNA viruses require four target genomes and no near neighbor genomes, since lack of conservation among strains is more limiting than uniqueness. SARS and Ebola Zaire are exceptional, as additional target genomes currently do not improve predictions, but near neighbor sequences are urgently needed. Our results also indicate that double stranded DNA viruses are more conserved among strains than are RNA viruses, since in most cases there was at least one conserved signature candidate for the DNA viruses and zero conserved signature candidates for the RNA viruses.
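
    The two selection criteria described here (conserved among target strains, unique relative to other species) can be illustrated with exact k-mer sets. This is a toy sketch under strong assumptions, not the pipeline the record refers to: it uses exact matching only, and the function names, the value of k, and the sequences are hypothetical.

```python
def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_signatures(targets, near_neighbors, k):
    """k-mers present in every target genome and absent from all near neighbors."""
    if not targets:
        raise ValueError("at least one target genome is required")
    # Conservation: keep only k-mers shared by every target strain.
    conserved = set.intersection(*(kmers(t, k) for t in targets))
    # Uniqueness: discard k-mers that occur in any near-neighbor genome.
    background = set()
    for neighbor in near_neighbors:
        background |= kmers(neighbor, k)
    return conserved - background


# Hypothetical toy genomes.
targets = ["AACCGGTT", "TACCGGTA"]
near_neighbors = ["CCGGAA"]
print(sorted(candidate_signatures(targets, near_neighbors, k=4)))
```

    Adding target genomes can only shrink the conserved set, and adding near neighbors can only shrink the unique set, which is why the number of each that has been sequenced affects how trustworthy a predicted signature is.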

  15. Sequence-dependent nucleosome positioning.

    PubMed

    Chung, Ho-Ryun; Vingron, Martin

    2009-03-13

    Eukaryotic DNA is organized into a macromolecular structure called chromatin. The basic repeating unit of chromatin is the nucleosome, which consists of two copies of each of the four core histones and DNA. The nucleosomal organization and the positions of nucleosomes have profound effects on all DNA-dependent processes. Understanding the factors that influence nucleosome positioning is therefore of general interest. Among the many determinants of nucleosome positioning, the DNA sequence has been proposed to have a major role. Here, we analyzed more than 860,000 nucleosomal DNA sequences to identify sequence features that guide the formation of nucleosomes in vivo. We found that both a periodic enrichment of AT base pairs and an out-of-phase oscillating enrichment of GC base pairs as well as the overall preference for GC base pairs are determinants of nucleosome positioning. The preference for GC pairs can be related to a lower energetic cost required for deformation of the DNA to wrap around the histones. In line with this idea, we found that only incorporation of both signal components into a sequence model for nucleosome formation results in maximal predictive performance on a genome-wide scale. In this manner, one achieves greater predictive power than published approaches. Our results confirm the hypothesis that the DNA sequence has a major role in nucleosome positioning in vivo.

  17. Thermoelectric method for sequencing DNA.

    PubMed

    Nestorova, Gergana G; Guilbeau, Eric J

    2011-05-21

    This study describes a novel thermoelectric method for DNA sequencing in a microfluidic device. The method measures the heat released when DNA polymerase inserts a deoxyribonucleoside triphosphate into a primed DNA template. The study describes the principle of operation of a laminar flow microfluidic chip with a reaction zone that contains DNA template/primer complex immobilized to the inner surface of the device's lower channel wall. A thin-film thermopile attached to the external surface of the lower channel wall measures the dynamic change in temperature that results when Klenow polymerase inserts a deoxyribonucleoside triphosphate into the DNA template. The intrinsic rejection of common-mode thermal signals by the thermopile in combination with hydrodynamic focused flow allows for the measurement of temperature changes on the order of 10^-4 K without control of ambient temperature. To demonstrate the method, we report the sequencing of a model oligonucleotide containing 12 bases. Results demonstrate that it is feasible to sequence DNA by measuring the heat released during nucleotide incorporation. This thermoelectric method may offer a new approach to DNA sequencing for personalized medicine applications. © The Royal Society of Chemistry 2011

  18. The Extrapolation of Elementary Sequences

    NASA Technical Reports Server (NTRS)

    Laird, Philip; Saul, Ronald

    1992-01-01

    We study sequence extrapolation as a stream-learning problem. Input examples are a stream of data elements of the same type (integers, strings, etc.), and the problem is to construct a hypothesis that both explains the observed sequence of examples and extrapolates the rest of the stream. A primary objective -- and one that distinguishes this work from previous extrapolation algorithms -- is that the same algorithm be able to extrapolate sequences over a variety of different types, including integers, strings, and trees. We define a generous family of constructive data types, and define as our learning bias a stream language called elementary stream descriptions. We then give an algorithm that extrapolates elementary descriptions over constructive datatypes and prove that it learns correctly. For freely-generated types, we prove a polynomial time bound on descriptions of bounded complexity. An especially interesting feature of this work is the ability to provide quantitative measures of confidence in competing hypotheses, using a Bayesian model of prediction.

  19. Sequence alignment with tandem duplication

    SciTech Connect

    Benson, G.

    1997-12-01

    Algorithm development for comparing and aligning biological sequences has, until recently, been based on the SI model of mutational events, which assumes that modification of sequences proceeds through any of the operations of substitution, insertion or deletion (the latter two collectively termed indels). While this model has worked fairly well, it has long been apparent that other mutational events occur. In this paper, we introduce a new model, the DSI model, which includes another common mutational event, tandem duplication. Tandem duplication produces tandem repeats which are common in DNA, making up perhaps 10% of the human genome. They are responsible for some human diseases and may serve a multitude of functions in DNA regulation and evolution. Using the DSI model, we develop new exact and heuristic algorithms for comparing and aligning DNA sequences when they contain tandem repeats. 30 refs., 3 figs.
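
    For context, the SI model referred to above corresponds to the classic edit-distance recurrence, sketched below with unit costs. This is the textbook dynamic program, not the DSI algorithm from the paper; the unit costs and the function name are assumptions, and tandem duplication is not modelled.

```python
def si_edit_distance(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # previous[j] holds the distance between the prefix of a seen so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1      # drop char_a
            insertion = current[j - 1] + 1  # add char_b
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# A tandem repeat copy costs three indels under the SI model.
print(si_edit_distance("ACGTT", "ACGACGTT"))
```

    The example hints at the motivation for a duplication-aware model: under SI costs, a single duplication of "ACG" is charged as three separate insertions.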

  20. Explaining the harmonic sequence paradox.

    PubMed

    Schmidt, Ulrich; Zimper, Alexander

    2012-05-01

    According to the harmonic sequence paradox, an expected utility decision maker's willingness to pay for a gamble whose expected payoffs evolve according to the harmonic series is finite if and only if his marginal utility of additional income becomes zero for rather low payoff levels. Since the assumption of zero marginal utility is implausible for finite payoff levels, expected utility theory - as well as its standard generalizations such as cumulative prospect theory - are apparently unable to explain a finite willingness to pay. This paper presents first an experimental study of the harmonic sequence paradox. Additionally, it demonstrates that the theoretical argument of the harmonic sequence paradox only applies to time-patient decision makers, whereas the paradox is easily avoided if time-impatience is introduced.

  1. Numerical classification of coding sequences

    NASA Technical Reports Server (NTRS)

    Collins, D. W.; Liu, C. C.; Jukes, T. H.

    1992-01-01

    DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
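
    The two descriptors in this record are simple to compute. The sketch below reproduces the notation shown in the abstract (for example A76C158G121T74 and (AAA)0(AAC)1 ... (TTT)0); the function names and the nine-base reading frame are hypothetical, and trailing bases that do not fill a codon are ignored by assumption.

```python
from collections import Counter
from itertools import product

# All 64 codons in alphabetical (A, C, G, T) order, from AAA to TTT.
ALL_CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def base_count_descriptor(orf):
    """Base-count abbreviation of a reading frame, e.g. 'A6C0G1T2'."""
    counts = Counter(orf.upper())
    return "".join(f"{base}{counts[base]}" for base in "ACGT")


def codon_table_descriptor(orf):
    """Codon-count abbreviation, e.g. '(AAA)1(AAC)0...(TTT)0'."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore an incomplete trailing codon
    counts = Counter(orf[i:i + 3] for i in range(0, usable, 3))
    return "".join(f"({codon}){counts[codon]}" for codon in ALL_CODONS)


orf = "ATGAAATAA"  # hypothetical reading frame: ATG AAA TAA
print(base_count_descriptor(orf))
print(codon_table_descriptor(orf)[:24])
```

    Because two entries with identical descriptors are strong candidates for being the same sequence, comparing these short strings is a cheap first pass before any full sequence comparison.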

  2. Prediction, sequences and the hippocampus

    PubMed Central

    Lisman, John; Redish, A.D.

    2009-01-01

    Recordings of rat hippocampal place cells have provided information about how the hippocampus retrieves memory sequences. One line of evidence has to do with phase precession, a process organized by theta and gamma oscillations. This precession can be interpreted as the cued prediction of the sequence of upcoming positions. In support of this interpretation, experiments in two-dimensional environments and on a cue-rich linear track demonstrate that many cells represent a position ahead of the animal and that this position is the same irrespective of which direction the rat is coming from. Other lines of investigation have demonstrated that such predictive processes also occur in the non-spatial domain and that retrieval can be internally or externally cued. The mechanism of sequence retrieval and the usefulness of this retrieval to guide behaviour are discussed. PMID:19528000

  3. Chaotic sequences for noisy environments

    NASA Astrophysics Data System (ADS)

    Carroll, T. L.; Rachford, F. J.

    2016-10-01

    There have been many attempts to apply chaotic signals to communications or radar, but one obstacle has been that there is no effective way to recover chaotic signals from noise larger than the signal. In this work, we create "pseudo-chaotic" signals by concatenating dictionary sequences generated from a chaotic attractor. Because the number of dictionary sequences is finite, these pseudo-chaotic signals are not actually chaotic, but they can still contain some of the desirable properties of chaos. Using dictionary sequences allows the pseudo-chaotic signal to be recovered from noise using a correlation detector and a Viterbi decoder, so the signal can be recovered from noise or interference that is larger than the signal itself.

  4. Analysis of DNA Sequence Variants Detected by High Throughput Sequencing

    PubMed Central

    Adams, David R; Sincan, Murat; Fajardo, Karin Fuentes; Mullikin, James C; Pierson, Tyler M; Toro, Camilo; Boerkoel, Cornelius F; Tifft, Cynthia J; Gahl, William A; Markello, Tom C

    2014-01-01

    The Undiagnosed Diseases Program at the National Institutes of Health uses High Throughput Sequencing (HTS) to diagnose rare and novel diseases. HTS techniques generate large numbers of DNA sequence variants, which must be analyzed and filtered to find candidates for disease causation. Despite the publication of an increasing number of successful exome-based projects, there has been little formal discussion of the analytic steps applied to HTS variant lists. We present the results of our experience with over 30 families for whom HTS sequencing was used in an attempt to find clinical diagnoses. For each family, exome sequence was augmented with high-density SNP-array data. We present a discussion of the theory and practical application of each analytic step and provide example data to illustrate our approach. The paper is designed to provide an analytic roadmap for variant analysis, thereby enabling a wide range of researchers and clinical genetics practitioners to perform direct analysis of HTS data for their patients and projects. PMID:22290882

  5. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  6. Fault trees and sequence dependencies

    NASA Technical Reports Server (NTRS)

    Dugan, Joanne Bechta; Boyd, Mark A.; Bavuso, Salvatore J.

    1990-01-01

    One of the frequently cited shortcomings of fault-tree models, their inability to model so-called sequence dependencies, is discussed. Several sources of such sequence dependencies are discussed, and new fault-tree gates to capture this behavior are defined. These complex behaviors can be included in present fault-tree models because they utilize a Markov solution. The utility of the new gates is demonstrated by presenting several models of the fault-tolerant parallel processor, which include both hot and cold spares.

  7. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY]; Webb, Watt W. [Ithaca, NY]; Levene, Michael [Ithaca, NY]; Turner, Stephen [Ithaca, NY]; Craighead, Harold G. [Ithaca, NY]; Foquet, Mathieu [Ithaca, NY]

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  8. Iterated sequence databank search methods.

    PubMed

    Taylor, W R; Brown, N P

    1999-06-15

    Iterated sequence databank search methods were assessed from the viewpoint of someone with the sequence of a novel gene product wishing to find distant relatives to their protein and, with the specific searches against the PDB, also hoping to find a relative of known structure. We examined three methods in detail, spanning a range from simple pattern-matching to sophisticated weighted profiles. Rather than apply these methods 'blindly' (with default parameters) to a large number of test queries, we have concentrated on the globins, so allowing a more detailed investigation of each method on different data subsets with different parameter settings. Despite its widespread use, regular-expression matching proved to be very limited, seldom extending beyond the sub-family from which the pattern was derived. To attain any generality, the patterns had to be 'stripped-down' to include only the most highly conserved parts. The QUEST program avoided these problems by introducing a more flexible (weighted) matching. On the PDB sequences this was highly effective, missing only a few globins with probes based on each sub-family or even a single representative from each sub-family. In addition, very few false-positives were encountered, and those that did match often only did so for a few cycles before being lost again. On the larger sequence collection, however, QUEST encountered problems with maintaining (or achieving) the alignment of the full globin family. psi-BLAST also recognised almost all the globins when matching against the PDB sequences, typically missing three or four of the most distantly related sequences while picking up a few false-positives. In contrast to QUEST, psi-BLAST performed very well on the larger databank, getting almost a full collection of globins although still retaining the same proportion of false-positives. SAM applied to the PDB sequences performed reasonably well with the myoglobin and hemoglobin families as probes, missing, typically

  9. Hahn Sequence Space of Modals

    PubMed Central

    Balasubramanian, T.; Zion Chella Ruth, S.

    2014-01-01

    The history of modal intervals goes back to the very first publications on the topic of interval calculus. Modal interval analysis is used in computer graphics and computer-aided design (CAD), namely, in the computation of narrow bounds on Bezier and B-spline curves. Since modal intervals are used in many fields, we introduce a new sequence space h(gI) called the Hahn sequence space of modal intervals. We give some new definitions and theorems. Some inclusion relations and some topological properties of this space are investigated. The dual spaces of this space are also computed. PMID:27382628

  10. DNA Sequencing Using Capillary Electrophoresis

    SciTech Connect

    Dr. Barry Karger

    2011-05-09

    The overall goal of this program was to develop capillary electrophoresis as the tool to be used to sequence for the first time the Human Genome. Our program was part of the Human Genome Project. In this work, we were highly successful and the replaceable polymer we developed, linear polyacrylamide, was used by the DOE sequencing lab in California to sequence a significant portion of the human genome using the MegaBACE multiple capillary array electrophoresis instrument. In this final report, we summarize our efforts and success. We began our work by separating double-stranded oligonucleotides by capillary electrophoresis using cross-linked polyacrylamide gels in fused silica capillaries. This work showed the potential of the methodology. However, preparation of such cross-linked gel capillaries was difficult with poor reproducibility, and even more important, the columns were not very stable. We improved stability by using non-cross-linked linear polyacrylamide. Here, the entangled linear chains could move when osmotic pressure (e.g. sample injection) was imposed on the polymer matrix. This relaxation of the polymer dissipated the stress in the column. Our next advance was to use significantly lower concentrations of the linear polyacrylamide so that the polymer could be automatically blown out after each run and replaced with fresh linear polymer solution. In this way, a new column was available for each analytical run. Finally, while testing many linear polymers, we selected linear polyacrylamide as the best matrix as it was the most hydrophilic polymer available. Under our DOE program, we demonstrated initially the success of the linear polyacrylamide in separating double-stranded DNA. We note that the method is used even today to assay purity of double-stranded DNA fragments. Our focus, of course, was on the separation of single-stranded DNA for sequencing purposes. In one paper, we demonstrated the success of our approach in sequencing up to 500 bases. Other

  11. Genome Sequence of Spizellomyces punctatus

    PubMed Central

    Russ, Carsten; Lang, B. Franz; Chen, Zehua; Gujja, Sharvari; Shea, Terrance; Zeng, Qiandong; Young, Sarah; Nusbaum, Chad

    2016-01-01

    Spizellomyces punctatus is a basally branching chytrid fungus that is found in the Chytridiomycota phylum. Spizellomyces species are common in soil and of importance in terrestrial ecosystems. Here, we report the genome sequence of S. punctatus, which will facilitate the study of this group of early diverging fungi. PMID:27540072

  12. Information of sequences and applications

    NASA Astrophysics Data System (ADS)

    Bonanno, Claudio; Galatolo, Stefano; Menconi, Giulia

    2002-03-01

    In this short note, we outline some results about complexity of orbits of a dynamical system, entropy and initial condition sensitivity in weakly chaotic dynamical systems. We present a technique to estimate orbit complexity by the use of data compression algorithms. We also outline how this technique has been applied by our research group to dynamical systems and to DNA sequences.
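
    The compression-based estimate mentioned above can be illustrated with a minimal sketch. It assumes the general-purpose zlib compressor as a stand-in for whatever compression algorithm the authors used, and the function name compression_complexity is hypothetical. It reports compressed bits per input symbol, so a regular sequence should score much lower than an irregular one.

```python
import random
import zlib


def compression_complexity(symbols: str) -> float:
    """Compressed size in bits per input symbol (lower means more regular).

    Assumption: zlib is only a convenient stand-in compressor; the abstract
    does not say which compression algorithm was used.
    """
    data = symbols.encode("ascii")
    if not data:
        return 0.0
    return 8 * len(zlib.compress(data, 9)) / len(data)


# A strictly periodic symbol sequence versus a pseudo-random one.
periodic = "ACGT" * 2500
rng = random.Random(0)
random_like = "".join(rng.choice("ACGT") for _ in range(10000))

periodic_score = compression_complexity(periodic)
random_score = compression_complexity(random_like)
```

    The same function can be applied to a symbolic coding of an orbit or to a DNA string; only the ordering of scores between sequences is meaningful, not the absolute values.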

  13. Ideal statistically quasi Cauchy sequences

    NASA Astrophysics Data System (ADS)

    Savas, Ekrem; Cakalli, Huseyin

    2016-08-01

    An ideal I is a family of subsets of N, the set of positive integers, that is closed under taking finite unions and subsets of its elements. A sequence (xk) of real numbers is said to be S(I)-statistically convergent to a real number L if, for each ɛ > 0 and for each δ > 0, the set { n ∈ N : (1/n) |{ k ≤ n : |xk − L| ≥ ɛ }| ≥ δ } belongs to I. We introduce S(I)-statistically ward compactness of a subset of R, the set of real numbers, and S(I)-statistically ward continuity of a real function, in the sense that a subset E of R is S(I)-statistically ward compact if any sequence of points in E has an S(I)-statistically quasi-Cauchy subsequence, and a real function is S(I)-statistically ward continuous if it preserves S(I)-statistically quasi-Cauchy sequences, where a sequence (xk) is said to be S(I)-statistically quasi-Cauchy when (Δxk) is S(I)-statistically convergent to 0. We obtain results related to S(I)-statistically ward continuity, S(I)-statistically ward compactness, Nθ-ward continuity, and slowly oscillating continuity.

  14. Fusicladium effusum draft genome sequence

    USDA-ARS's Scientific Manuscript database

    The pecan scab fungus (Fusicladium effusum [G. Winter]) is an economically important pathogen of pecan (Carya illinoinensis [Wangenh.] K. Koch), on account of its impact on yield and quality of valuable nutmeats. We describe the first draft genome sequence of F. effusum, the characteristics of annot...

  15. Polymorphism in regulatory gene sequences

    PubMed Central

    Mitchison, N A

    2001-01-01

    The extensive polymorphism revealed in non-coding gene-regulatory sequences, particularly in the immune system, suggests that this type of genetic variation is functionally and evolutionarily far more important than has been suspected, and provides a lead to new therapeutic strategies. PMID:11178274

  16. Crop Sequence Calculator, v. 3

    USDA-ARS's Scientific Manuscript database

    Producers need to know how to sequence crops to develop sustainable dynamic cropping systems that take advantage of inherent internal resources, such as crop synergism, nutrient cycling, and soil water, and capitalize on external resources, such as weather, markets, and government programs. Version ...

  17. Why Visual Sequences Come First.

    ERIC Educational Resources Information Center

    Barley, Steven D.

    Visual sequences should be the first visual literacy exercises for reasons that are physio-psychological, semantic, and curricular. In infancy, vision is undifferentiated and undetailed. The number of details a child sees increases with age. Therefore, a series of pictures, rather than one photograph which tells a whole story, is more appropriate…

  18. Single-Cell Semiconductor Sequencing

    PubMed Central

    Kohn, Andrea B.; Moroz, Tatiana P.; Barnes, Jeffrey P.; Netherton, Mandy; Moroz, Leonid L.

    2014-01-01

    RNA-seq or transcriptome analysis of individual cells and small-cell populations is essential for virtually any biomedical field. It is especially critical for developmental, aging, and cancer biology as well as neuroscience, where the enormous heterogeneity of cells presents a significant methodological and conceptual challenge. Here we present two methods that allow for fast and cost-efficient transcriptome sequencing from ultra-small amounts of tissue or even from individual cells using semiconductor sequencing technology (Ion Torrent, Life Technologies). The first method is a reduced representation sequencing which maximizes capture of RNAs and preserves transcripts’ directionality. The second, a template-switch protocol, is designed for small mammalian neurons. Both protocols, from cell/tissue isolation to final sequence data, take up to 4 days. The efficiency of these protocols has been validated with single hippocampal neurons and various invertebrate tissues including individually identified neurons within a simpler memory-forming circuit of Aplysia californica and early (1-, 2-, 4-, 8-cells) embryonic and developmental stages from basal metazoans. PMID:23929110

  19. [Gene and gene sequence patenting].

    PubMed

    Bergel, S D

    1998-01-01

    According to the author, the patenting of elements isolated or copied from the human body boils down to the issue of genes and gene sequences. He describes the current situation from the comparative law standpoint (U.S. and Spanish law mainly) and then examines the biotechnology industry's position.

  1. Exome sequencing deciphers rare diseases.

    PubMed

    Maxmen, Amy

    2011-03-04

    Two years ago, NIH's Undiagnosed Diseases Program began delivering genomics to the clinic on an unprecedented scale. Now, with 128 exomes sequenced and 39 rare diseases diagnosed, the program's success is paving the way for widespread personal genomics while pioneering new techniques for reining in the "tsunami" of genomics data.

  2. Efficient algorithms for molecular sequence analysis.

    PubMed Central

    Karlin, S; Morris, M; Ghandour, G; Leung, M Y

    1988-01-01

    Efficient (linear time) algorithms are described for identifying global molecular sequence features allowing for errors including repeats, matches between sequences, dyad symmetry pairings, and other sequence patterns. A multiple sequence alignment algorithm is also described. Specific applications are given to hepatitis B viruses and the J5-C (J, joining; C, constant) region of the immunoglobulin kappa gene. PMID:3124111

  3. Gene Discovery through Expressed Sequence Tag Sequencing in Trypanosoma cruzi

    PubMed Central

    Verdun, Ramiro E.; Di Paolo, Nelson; Urmenyi, Turan P.; Rondinelli, Edson; Frasch, Alberto C. C.; Sanchez, Daniel O.

    1998-01-01

    Analysis of expressed sequence tags (ESTs) constitutes a useful approach for gene identification that, in the case of human pathogens, might result in the identification of new targets for chemotherapy and vaccine development. As part of the Trypanosoma cruzi genome project, we have partially sequenced the 5′ ends of 1,949 clones to generate ESTs. The clones were randomly selected from a normalized CL Brener epimastigote cDNA library. A total of 14.6% of the clones were homologous to previously identified T. cruzi genes, while 18.4% had significant matches to genes from other organisms in the database. A total of 67% of the ESTs had no matches in the database, and thus, some of them might be T. cruzi-specific genes. Functional groups of those sequences with matches in the database were constructed according to their putative biological functions. The two largest categories were protein synthesis (23.3%) and cell surface molecules (10.8%). The information reported in this paper should be useful for researchers in the field to analyze genes and proteins of their own interest. PMID:9784549

  4. Program for Editing Spacecraft Command Sequences

    NASA Technical Reports Server (NTRS)

    Gladden, Roy; Waggoner, Bruce; Kordon, Mark; Hashemi, Mahnaz; Hanks, David; Salcedo, Jose

    2006-01-01

    Sequence Translator, Editor, and Expander Resource (STEER) is a computer program that facilitates construction of sequences and blocks of sequences (hereafter denoted generally as sequence products) for commanding a spacecraft. STEER also provides mechanisms for translating among various sequence product types and quickly expanding activities of a given sequence in chronological order for review and analysis of the sequence. To date, construction of sequence products has generally been done by use of such clumsy mechanisms as text-editor programs, translating among sequence product types has been challenging, and expanding sequences to time-ordered lists has involved arduous processes of converting sequence products to "real" sequences and running them through Class-A software (defined, loosely, as flight and ground software critical to a spacecraft mission). Also, heretofore, generating sequence products in standard formats has been troublesome because precise formatting and syntax are required. STEER alleviates these issues by providing a graphical user interface containing intuitive fields in which the user can enter the necessary information. The STEER expansion function provides a "quick and dirty" means of seeing how a sequence and sequence block would expand into a chronological list, without the need to use Class-A software.

  5. Extrapolation methods for vector sequences

    NASA Technical Reports Server (NTRS)

    Smith, David A.; Ford, William F.; Sidi, Avram

    1987-01-01

    This paper derives, describes, and compares five extrapolation methods for accelerating convergence of vector sequences or transforming divergent vector sequences to convergent ones. These methods are the scalar epsilon algorithm (SEA), vector epsilon algorithm (VEA), topological epsilon algorithm (TEA), minimal polynomial extrapolation (MPE), and reduced rank extrapolation (RRE). MPE and RRE are first derived and proven to give the exact solution for the right 'essential degree' k. Then, Brezinski's (1975) generalization of the Shanks-Schmidt transform is presented; the generalized form leads from systems of equations to TEA. The necessary connections are then made with SEA and VEA. The algorithms are extended to the nonlinear case by cycling, the error analysis for MPE and VEA is sketched, and the theoretical support for quadratic convergence is discussed. Strategies for practical implementation of the methods are considered.
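
    Of the five methods listed above, the scalar epsilon algorithm (SEA) is the simplest to illustrate: Wynn's epsilon recursion is applied to each component of the vector sequence independently. The following is a minimal sketch under that reading, not the paper's implementation; the function names wynn_epsilon and sea are hypothetical, and the early return on a zero difference is a simplification chosen for this example.

```python
def wynn_epsilon(s):
    """Wynn's epsilon algorithm for a scalar sequence s.

    Recursion: eps[k+1][n] = eps[k-1][n+1] + 1 / (eps[k][n+1] - eps[k][n]),
    with eps[-1] = 0 and eps[0] = s. Only even columns approximate the limit.
    Returns the last entry of the highest even column that could be built.
    """
    prev = [0.0] * (len(s) + 1)   # column eps[-1]
    curr = [float(x) for x in s]  # column eps[0]
    best = curr[-1]
    k = 0
    while len(curr) > 1:
        nxt = []
        for n in range(len(curr) - 1):
            diff = curr[n + 1] - curr[n]
            if diff == 0.0:
                # Simplification: treat a zero difference as convergence.
                return best
            nxt.append(prev[n + 1] + 1.0 / diff)
        prev, curr = curr, nxt
        k += 1
        if k % 2 == 0:
            best = curr[-1]
    return best


def sea(vectors):
    """Scalar epsilon algorithm: extrapolate each vector component separately."""
    dim = len(vectors[0])
    return [wynn_epsilon([v[i] for v in vectors]) for i in range(dim)]


# Partial sums of two geometric series (ratios 0.5 and -0.5) as a vector sequence.
iterates = [(1.0, 1.0), (1.5, 0.5), (1.75, 0.75)]
limit = sea(iterates)
```

    For these geometric partial sums, three iterates are enough for the second epsilon column to reproduce the limits 2 and 2/3 up to rounding. VEA, TEA, MPE, and RRE couple the components and are not covered by this sketch.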

  6. Triple helix purification and sequencing

    DOEpatents

    Wang, Renfeng; Smith, Lloyd M.; Tong, Xinchun E.

    1995-01-01

    Disclosed herein are methods, kits, and equipment for purifying single stranded circular DNA and then using the DNA for DNA sequencing purposes. Templates are provided with an insert having a hybridization region. An elongated oligonucleotide has two regions that are complementary to the insert and the oligo is bound to a magnetic anchor. The oligo hybridizes to the insert on two sides to form a stable triple helix complex. The anchor can then be used to drag the template out of solution using a magnet. The system can purify sequencing templates, and if desired the triple helix complex can be opened up to a double helix so that the oligonucleotide will act as a primer for further DNA synthesis.

  7. Apparatus for improved DNA sequencing

    DOEpatents

    Douthart, Richard J.; Crowell, Shannon L.

    1996-01-01

    This invention is a means for the rapid sequencing of DNA samples. More specifically, it consists of a new design direct blotting electrophoresis unit. The DNA sequence is deposited on a membrane attached to a rotating drum. Initial data compaction is facilitated by the use of a machined multi-channeled plate called a ribbon channel plate. Each channel is an isolated mini gel system much like a gel filled capillary. The system as a whole, however, is in a slab gel like format with the advantages of uniformity and easy reusability. The system can be used in different embodiments. The drum system is unique in that after deposition the drum rotates the deposited DNA into a large non-buffer open space where processing and detection can occur. The drum can also be removed in toto to special workstations for downstream processing, multiplexing and detection.

  8. Apparatus for improved DNA sequencing

    DOEpatents

    Douthart, R.J.; Crowell, S.L.

    1996-05-07

    This invention is a means for the rapid sequencing of DNA samples. More specifically, it consists of a new design direct blotting electrophoresis unit. The DNA sequence is deposited on a membrane attached to a rotating drum. Initial data compaction is facilitated by the use of a machined multi-channeled plate called a ribbon channel plate. Each channel is an isolated mini gel system much like a gel filled capillary. The system as a whole, however, is in a slab gel like format with the advantages of uniformity and easy reusability. The system can be used in different embodiments. The drum system is unique in that after deposition the drum rotates the deposited DNA into a large non-buffer open space where processing and detection can occur. The drum can also be removed in toto to special workstations for downstream processing, multiplexing and detection. 18 figs.

  9. Visual pattern image sequence coding

    NASA Technical Reports Server (NTRS)

    Silsbee, Peter; Bovik, Alan C.; Chen, Dapang

    1990-01-01

    The visual pattern image coding (VPIC) configurable digital image-coding process is capable of coding with visual fidelity comparable to the best available techniques, at compressions which (at 30-40:1) exceed all other technologies. These capabilities are associated with unprecedented coding efficiencies; coding and decoding operations are entirely linear with respect to image size and entail a complexity that is 1-2 orders of magnitude faster than any previous high-compression technique. The visual pattern image sequence coding to which attention is presently given exploits all the advantages of the static VPIC in the reduction of information from an additional, temporal dimension, to achieve unprecedented image sequence coding performance.

  10. Cassini Mission Sequence Subsystem (MSS)

    NASA Technical Reports Server (NTRS)

    Alland, Robert

    2011-01-01

    This paper describes my work with the Cassini Mission Sequence Subsystem (MSS) team during the summer of 2011. It gives some background on the motivation for this project and describes the expected benefit to the Cassini program. It then introduces the two tasks that I worked on - an automatic system auditing tool and a series of corrections to the Cassini Sequence Generator (SEQ_GEN) - and the specific objectives these tasks were to accomplish. Next, it details the approach I took to meet these objectives and the results of this approach, followed by a discussion of how the outcome of the project compares with my initial expectations. The paper concludes with a summary of my experience working on this project, lists what the next steps are, and acknowledges the help of my Cassini colleagues.

  11. Triple helix purification and sequencing

    DOEpatents

    Wang, R.; Smith, L.M.; Tong, X.E.

    1995-03-28

    Disclosed herein are methods, kits, and equipment for purifying single stranded circular DNA and then using the DNA for DNA sequencing purposes. Templates are provided with an insert having a hybridization region. An elongated oligonucleotide has two regions that are complementary to the insert and the oligo is bound to a magnetic anchor. The oligo hybridizes to the insert on two sides to form a stable triple helix complex. The anchor can then be used to drag the template out of solution using a magnet. The system can purify sequencing templates, and if desired the triple helix complex can be opened up to a double helix so that the oligonucleotide will act as a primer for further DNA synthesis. 4 figures.

  13. Genome Sequence of Mycobacteriophage Momo

    PubMed Central

    Bina, Elizabeth A.; Brahme, Indraneel S.; Hill, Amy B.; Himmelstein, Philip H.; Hunsicker, Sara M.; Ish, Amanda R.; Le, Tinh S.; Martin, Mary M.; Moscinski, Catherine N.; Shetty, Sameer A.; Swierzewski, Tomasz; Iyengar, Varun B.; Kim, Hannah; Schafer, Claire E.; Grubb, Sarah R.; Warner, Marcie H.; Bowman, Charles A.; Russell, Daniel A.; Hatfull, Graham F.

    2015-01-01

    Momo is a newly discovered phage of Mycobacterium smegmatis mc2155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages. PMID:26089415

  14. Genome Sequence of Mycobacteriophage Momo.

    PubMed

    Pope, Welkin H; Bina, Elizabeth A; Brahme, Indraneel S; Hill, Amy B; Himmelstein, Philip H; Hunsicker, Sara M; Ish, Amanda R; Le, Tinh S; Martin, Mary M; Moscinski, Catherine N; Shetty, Sameer A; Swierzewski, Tomasz; Iyengar, Varun B; Kim, Hannah; Schafer, Claire E; Grubb, Sarah R; Warner, Marcie H; Bowman, Charles A; Russell, Daniel A; Hatfull, Graham F

    2015-06-18

    Momo is a newly discovered phage of Mycobacterium smegmatis mc(2)155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages. Copyright © 2015 Pope et al.

  15. Genome sequences and great expectations

    PubMed Central

    Iliopoulos, Ioannis; Tsoka, Sophia; Andrade, Miguel A; Janssen, Paul; Audit, Benjamin; Tramontano, Anna; Valencia, Alfonso; Leroy, Christophe; Sander, Chris; Ouzounis, Christos A

    2001-01-01

    To assess how automatic function assignment will contribute to genome annotation in the next five years, we have performed an analysis of 31 available genome sequences. An emerging pattern is that function can be predicted for almost two-thirds of the 73,500 genes that were analyzed. Despite progress in computational biology, there will always be a great need for large-scale experimental determination of protein function. PMID:11178275

  16. Channel plate for DNA sequencing

    DOEpatents

    Douthart, Richard J.; Crowell, Shannon L.

    1998-01-01

    This invention is a channel plate that facilitates data compaction in DNA sequencing. The channel plate has a length, a width and a thickness, and further has a plurality of channels that are parallel. Each channel has a depth partially through the thickness of the channel plate. Additionally an interface edge permits electrical communication across an interface through a buffer to a deposition membrane surface.

  17. Channel plate for DNA sequencing

    DOEpatents

    Douthart, R.J.; Crowell, S.L.

    1998-01-13

    This invention is a channel plate that facilitates data compaction in DNA sequencing. The channel plate has a length, a width and a thickness, and further has a plurality of channels that are parallel. Each channel has a depth partially through the thickness of the channel plate. Additionally an interface edge permits electrical communication across an interface through a buffer to a deposition membrane surface. 15 figs.

  18. Scrambled Sobol Sequences via Permutation

    DTIC Science & Technology

    2009-01-01

    two as modulus LCG. The linear scramblings thus used prime moduli, but only primes with special form such as the Mersenne or a Sophie-Germain primes ...both power-of-two and prime moduli are common pseudorandom number generators. When the modulus of an LCG is a power-of- two, the implementation is cheap...algorithm can be easily modified to handle other low-discrepancy sequences represented in bases other than 2, e.g. the prime number based Halton

  19. Orthogonal-polarization multipulse sequences

    NASA Astrophysics Data System (ADS)

    Grydeland, T.; Gustavsson, B.

    2011-02-01

    It is well known that using orthogonal polarizations for subpulses in multipulse sequences can be used to reduce clutter contributions in these modes. In this paper we show that further improvements are achieved if the orthogonality is taken into account already when constructing the codes. Using orthogonal polarizations, one can use denser transmission patterns, including elementary pulses without gaps between them, patterns that have severe range ambiguities when only a single polarization is used. Furthermore, correlations are computed separately for each combination of elementary pulse polarizations. Consequently, it is possible to have longer multipulse sequences without gaps in the lag sequence, it is possible to compute the odd lags as well as the even ones, and it is permissible to have some lags multiply obtained without range ambiguity. This means that using orthogonal polarizations when creating the multipulse transmission pattern gives flexibility well beyond the single-polarization case. This flexibility can be used to design patterns suited to particular experimental requirements. Furthermore, we point out that the improvement in clutter might have a more dramatic impact than is generally realized, particularly in high-SNR situations where the improvement in speed is up to a factor of 4. Examples are given of single- and multiple-frequency patterns that are not usable if only one polarization is available. Although all incoherent scatter radars in use today, except Jicamarca, lack orthogonal polarization capabilities, designers of the next generation of radars might find the improvements described herein to be of interest.

  20. The transvaal sequence: an overview

    NASA Astrophysics Data System (ADS)

    Eriksson, P. G.; Schweitzer, J. K.; Bosch, P. J. A.; Schereiber, U. M.; Van Deventer, J. L.; Hatton, C. J.

    1993-02-01

    The 15 000 m of relatively unmetamorphosed clastic and chemical sedimentary and volcanic rocks of the 2550-2050 Ma Transvaal Sequence as preserved within the Transvaal and correlated Griqualand West basins of South Africa, and in the Kanye basin of Botswana are described. Immature clastic sedimentary and largely andesitic volcanic rocks of the Wolkberg, Godwan and Buffelsfontein Groups and the Bloempoort and Wachteenbeetje Formations probably represent rift-related sequences of Ventersdorp age. The thin sandstones of the Black Reef Formation, developed at the base of both the Kanye and Transvaal basin successions and correlated with the basal Vryburg siltstones of the Griqualand West Sequence, are considered here to be the basal unit of the Transvaal Sequence. The Black Reef fluvial deposits grade up into the epeiric marine carbonates of the Malmani Subgroup. These stromatolitic dolomites and interbedded cherts were laid down within a steepened carbonate ramp setting; transgressions from an initial Griqualand West compartment towards the northeast covered both the Kanye and Transvaal basins. Iron formations of the succeeding Penge Formation and Griqualand West correlates are envisaged as relatively shallow water shelf deposits within the carbonate platform model; siliceous breccias of the Kanye basin are interpreted as reflecting subaerial brecciation of exposed silica gels. The Duitschland Formation overlying the Penge iron formations is seen as a final, regressive clastic and chemical sedimentary deposit, laid down as the Malmani-Penge sea retreated from the Transvaal basin. The interbedded sandstones and mudstones of the unconformity-bounded Pretoria Group probably represent a combination of alluvial fan and fluviodeltaic complexes debouching into the largely lacustrine Transvaal and Kanye basins. A strong glacial influence in the lower Pretoria Group is reflected in the correlated Makganyene diamictites of the Griqualand West Sequence. Sedimentation across all three

  1. Constrained de novo sequencing of conotoxins.

    PubMed

    Bhatia, Swapnil; Kil, Yong J; Ueberheide, Beatrix; Chait, Brian T; Tayo, Lemmuel; Cruz, Lourdes; Lu, Bingwen; Yates, John R; Bern, Marshall

    2012-08-03

    De novo peptide sequencing by mass spectrometry (MS) can determine the amino acid sequence of an unknown peptide without reference to a protein database. MS-based de novo sequencing assumes special importance in focused studies of families of biologically active peptides and proteins, such as hormones, toxins, and antibodies, for which amino acid sequences may be difficult to obtain through genomic methods. These protein families often exhibit sequence homology or characteristic amino acid content; yet, current de novo sequencing approaches do not take advantage of this prior knowledge and, hence, search an unnecessarily large space of possible sequences. Here, we describe an algorithm for de novo sequencing that incorporates sequence constraints into the core graph algorithm and thereby reduces the search space by many orders of magnitude. We demonstrate our algorithm in a study of cysteine-rich toxins from two cone snail species (Conus textile and Conus stercusmuscarum) and report 13 de novo and about 60 total toxins.

  2. Constrained De Novo Sequencing of Conotoxins

    PubMed Central

    Bhatia, Swapnil; Kil, Yong J.; Ueberheide, Beatrix; Chait, Brian T.; Tayo, Lemmuel; Cruz, Lourdes; Lu, Bingwen; Yates, John R.; Bern, Marshall

    2012-01-01

    De novo peptide sequencing by mass spectrometry (MS) can determine the amino acid sequence of an unknown peptide without reference to a protein database. MS-based de novo sequencing assumes special importance in focused studies of families of biologically active peptides and proteins, such as hormones, toxins, and antibodies, for which amino acid sequences may be difficult to obtain through genomic methods. These protein families often exhibit sequence homology or characteristic amino acid content, yet current de novo sequencing approaches do not take advantage of this prior knowledge and hence search an unnecessarily large space of possible sequences. Here, we describe an algorithm for de novo sequencing that incorporates sequence constraints into the core graph algorithm, and thereby reduces the search space by many orders of magnitude. We demonstrate our algorithm in a study of cysteine-rich toxins from two cone snail species (Conus textile and Conus stercusmuscarum), and report 13 de novo and about 60 total toxins. PMID:22709442

  3. Ultrafast clustering algorithms for metagenomic sequence analysis

    PubMed Central

    Fu, Limin; Niu, Beifang; Wu, Sitao; Wooley, John

    2012-01-01

    The rapid advances of high-throughput sequencing technologies dramatically prompted metagenomic studies of microbial communities that exist at various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations and their functions and interactions. However, the massive quantity and the comprehensive complexity of these sequence data pose tremendous challenges in data analysis. These challenges include but are not limited to ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. In addition, clustering analysis also addresses the challenges in metagenomics. Thus, a large redundant data set can be represented with a small non-redundant set, where each cluster can be represented by a single entry or a consensus. Artifacts can be rapidly detected through clustering. Errors can be identified, filtered or corrected by using consensus from sequences within clusters. PMID:22772836

  4. RNA sequence analysis using covariance models.

    PubMed Central

    Eddy, S R; Durbin, R

    1994-01-01

    We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family. We call these models 'covariance models'. A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases. A model can be built automatically from an existing sequence alignment. We also describe an algorithm for learning a model and hence a consensus secondary structure from initially unaligned example sequences and no prior structural information. Models trained on unaligned tRNA examples correctly predict tRNA secondary structure and produce high-quality multiple alignments. The approach may be applied to any family of small RNA sequences. PMID:8029015

  5. Personal genome sequencing: current approaches and challenges

    PubMed Central

    Snyder, Michael; Du, Jiang; Gerstein, Mark

    2010-01-01

    The revolution in DNA sequencing technologies has now made it feasible to determine the genome sequences of many individuals; i.e., “personal genomes.” Genome sequences of cells and tissues from both normal and disease states have been determined. Using current approaches, whole human genome sequences are not typically assembled and determined de novo, but, instead, variations relative to a reference sequence are identified. We discuss the current state of personal genome sequencing, the main steps involved in determining a genome sequence (i.e., identifying single-nucleotide polymorphisms [SNPs] and structural variations [SVs], assembling new sequences, and phasing haplotypes), and the challenges and performance metrics for evaluating the accuracy of the reconstruction. Finally, we consider the possible individual and societal benefits of personal genome sequences. PMID:20194435

  6. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, Thomas G.; Chang, William I-Wei

    1997-01-01

    A method and apparatus for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence.

  7. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.

  8. Memory and learning with rapid audiovisual sequences

    PubMed Central

    Keller, Arielle S.; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193

  9. Sequencing Voyager II for the Uranus encounter

    NASA Technical Reports Server (NTRS)

    Morris, R. B.

    1986-01-01

    The process of developing the programmed sequence of events necessary for the Voyager 2 spacecraft to return desired data from its Uranus encounter is discussed. The major steps in the sequence process are reviewed, and the elements of the Mission Sequence Software are described. The design phase and the implementation phase of the sequence process are discussed, and the Computer Command Subsystem architecture is examined in detail. The software's role in constructing the sequences and converting them into onboard programs is elucidated, and the problems unique to the Uranus encounter sequences are considered.

  10. Pure perceptual-based sequence learning.

    PubMed

    Remillard, Gilbert

    2003-07-01

    Learning a sequence of target locations when the sequence is uncorrelated with a sequence of responses and target location is not the response dimension (pure perceptual-based sequence learning) was examined. Using probabilistic sequences of target locations, the author shows that such learning can be implicit, is unaffected by distance between target locations, and is mostly limited to first-order transition probabilities. Moreover, the mechanism underlying learning affords processing of information at anticipated target locations and appears to be attention based. Implications for hypotheses of implicit sequence learning are discussed.

  11. Biomolecule Sequencer: Nanopore Sequencing Technology for In-Situ Environmental Monitoring and Astrobiology

    NASA Astrophysics Data System (ADS)

    John, K. K.; Botkin, D. J.; Burton, A. S.; Castro-Wallace, S. L.; Chaput, J. D.; Dworkin, J. P.; Lupisella, M. L.; Mason, C. E.; Rubins, K. H.; Smith, D. J.; Stahl, S.; Switzer, C.

    2016-10-01

    Biomolecule Sequencer will demonstrate, for the first time, that DNA sequencing is feasible as a tool for in-situ environmental monitoring and astrobiology. A space-based sequencer could identify microbes, diseases, and help detect DNA-based life.

  12. DNA sequencing: bench to bedside and beyond

    PubMed Central

    Hutchison, Clyde A.

    2007-01-01

    Fifteen years elapsed between the discovery of the double helix (1953) and the first DNA sequencing (1968). Modern DNA sequencing began in 1977, with development of the chemical method of Maxam and Gilbert and the dideoxy method of Sanger, Nicklen and Coulson, and with the first complete DNA sequence (phage ϕX174), which demonstrated that sequence could give profound insights into genetic organization. Incremental improvements allowed sequencing of molecules >200 kb (human cytomegalovirus) leading to an avalanche of data that demanded computational analysis and spawned the field of bioinformatics. The US Human Genome Project spurred sequencing activity. By 1992 the first ‘sequencing factory’ was established, and others soon followed. The first complete cellular genome sequences, from bacteria, appeared in 1995 and other eubacterial, archaebacterial and eukaryotic genomes were soon sequenced. Competition between the public Human Genome Project and Celera Genomics produced working drafts of the human genome sequence, published in 2001, but refinement and analysis of the human genome sequence will continue for the foreseeable future. New ‘massively parallel’ sequencing methods are greatly increasing sequencing capacity, but further innovations are needed to achieve the ‘thousand dollar genome’ that many feel is prerequisite to personalized genomic medicine. These advances will also allow new approaches to a variety of problems in biology, evolution and the environment. PMID:17855400

  13. Integration of retinal image sequences

    NASA Astrophysics Data System (ADS)

    Ballerini, Lucia

    1998-10-01

    In this paper a method for noise reduction in ocular fundus image sequences is described. The eye is the only part of the human body where the capillary network can be observed along with the arterial and venous circulation using a non-invasive technique. The study of the retinal vessels is very important both for the study of the local pathology (retinal disease) and for the large amount of information it offers on systemic haemodynamics, such as hypertension, arteriosclerosis, and diabetes. In this paper a method for image integration of ocular fundus image sequences is described. The procedure can be divided into two steps: registration and fusion. First we describe an automatic alignment algorithm for registration of ocular fundus images. In order to enhance vessel structures, we used a spatially oriented bank of filters designed to match the properties of the objects of interest. To evaluate interframe misalignment we adopted a fast cross-correlation algorithm. The performance of the alignment method has been estimated by simulating shifts between image pairs and by using a cross-validation approach. Then we propose a temporal integration technique of image sequences so as to compute enhanced pictures of the overall capillary network. Image registration is combined with image enhancement by fusing subsequent frames of the same region. To evaluate the attainable results, the signal-to-noise ratio was estimated before and after integration. Experimental results on synthetic images of vessel-like structures with different kinds of Gaussian additive noise as well as on real fundus images are reported.

  14. Benchmarking short sequence mapping tools.

    PubMed

    Hatem, Ayat; Bozdağ, Doruk; Toland, Amanda E; Çatalyürek, Ümit V

    2013-06-07

    The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time consuming and demands the development of fast and accurate alignment tools. However, the current proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked while comparing the performance of a newly developed tool to the state of the art. Therefore, there is a need for an objective evaluation method that covers all the aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison. We applied our benchmarking tests on 9 well known mapping tools, namely, Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST) using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be further applied to others. The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, there is no tool that outperforms all of the others in all the tests. Therefore, the end user should clearly specify his needs in order to choose the tool that provides the best results.

  15. Benchmarking short sequence mapping tools

    PubMed Central

    2013-01-01

    Background: The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time consuming and demands the development of fast and accurate alignment tools. However, the current proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked while comparing the performance of a newly developed tool to the state of the art. Therefore, there is a need for an objective evaluation method that covers all the aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison. Results: We applied our benchmarking tests on 9 well known mapping tools, namely, Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST) using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be further applied to others. Conclusion: The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, there is no tool that outperforms all of the others in all the tests. Therefore, the end user should clearly specify his needs in order to choose the tool that provides the best results. PMID:23758764

  16. Proline-rich Sequence Recognition

    PubMed Central

    Schlundt, Andreas; Sticht, Jana; Piotukh, Kirill; Kosslick, Daniela; Jahnke, Nadin; Keller, Sandro; Schuemann, Michael; Krause, Eberhard; Freund, Christian

    2009-01-01

    The tumor maintenance protein Tsg101 has recently gained much attention because of its involvement in endosomal sorting, virus release, cytokinesis, and cancerogenesis. The ubiquitin-E2-like variant (UEV) domain of the protein interacts with proline-rich sequences of target proteins that contain P(S/T)AP amino acid motifs and weakly binds to the ubiquitin moiety of proteins committed to sorting or degradation. Here we performed peptide spot analysis and phage display to refine the peptide binding specificity of the Tsg101 UEV domain. A mass spectrometric proteomics approach that combines domain-based pulldown experiments, binding site inactivation, and stable isotope labeling by amino acids in cell culture (SILAC) was then used to delineate the relative importance of the peptide and ubiquitin binding sites. Clearly “PTAP” interactions dominate target recognition, and we identified several novel binders as for example the poly(A)-binding protein 1 (PABP1), Sec24b, NFκB2, and eIF4b. For PABP1 and eIF4b the interactions were confirmed in the context of the corresponding full-length proteins in cellular lysates. Therefore, our results strongly suggest additional roles of Tsg101 in cellular regulation of mRNA translation. Regulation of Tsg101 itself by the ubiquitin ligase TAL (Tsg101-associated ligase) is most likely conferred by a single PSAP binding motif that enables the interaction with Tsg101 UEV. Together with the results from the accompanying article (Kofler, M., Schuemann, M., Merz, C., Kosslick, D., Schlundt, A., Tannert, A., Schaefer, M., Lührmann, R., Krause, E., and Freund, C. (2009) Proline-rich sequence recognition: I. Marking GYF and WW domain assembly sites in early spliceosomal complexes. Mol. Cell. Proteomics 8, 2461–2473) on GYF and WW domain pathways our work defines major proline-rich sequence-mediated interaction networks that contribute to the modular assembly of physiologically relevant protein complexes. PMID:19542561

  17. Genetic mapping and DNA sequencing

    SciTech Connect

    Speed, T.; Waterman, M.S.

    1996-12-31

    The Human Genome Initiative has as its primary objective the characterization of the human genome. High-resolution linkage maps of genetic markers will play an important role in completing the human genome project. This is one of two volumes based on the proceedings of the 1994 IMA Summer Program on Molecular Biology and comprises Weeks 1 and 2 of the four-week program. This volume focuses on genetic mapping and DNA sequencing. Selected papers are indexed separately for inclusion in the Energy Science and Technology Database.

  18. Apollo: a sequence annotation editor.

    PubMed

    Lewis, S E; Searle, S M J; Harris, N; Gibson, M; Lyer, V; Richter, J; Wiel, C; Bayraktaroglu, L; Birney, E; Crosby, M A; Kaminker, J S; Matthews, B B; Prochnik, S E; Smithy, C D; Tupy, J L; Rubin, G M; Misra, S; Mungall, C J; Clamp, M E

    2002-01-01

    The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool to allow biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects.

  19. Differential correlation for sequencing data.

    PubMed

    Siska, Charlotte; Kechris, Katerina

    2017-01-19

    Several methods have been developed to identify differential correlation (DC) between pairs of molecular features from -omics studies. Most DC methods have only been tested with microarrays and other platforms producing continuous and Gaussian-like data. Sequencing data is in the form of counts, often modeled with a negative binomial distribution making it difficult to apply standard correlation metrics. We have developed an R package for identifying DC called Discordant which uses mixture models for correlations between features and the Expectation Maximization (EM) algorithm for fitting parameters of the mixture model. Several correlation metrics for sequencing data are provided and tested using simulations. Other extensions in the Discordant package include additional modeling for different types of differential correlation, and faster implementation, using a subsampling routine to reduce run-time and address the assumption of independence between molecular feature pairs. With simulations and breast cancer miRNA-Seq and RNA-Seq data, we find that Spearman's correlation has the best performance among the tested correlation methods for identifying differential correlation. Application of Spearman's correlation in the Discordant method demonstrated the most power in ROC curves and sensitivity/specificity plots, and improved ability to identify experimentally validated breast cancer miRNA. We also considered including additional types of differential correlation, which showed a slight reduction in power due to the additional parameters that need to be estimated, but more versatility in applications. Finally, subsampling within the EM algorithm considerably decreased run-time with negligible effect on performance. A new method and R package called Discordant is presented for identifying differential correlation with sequencing data. Based on comparisons with different correlation metrics, this study suggests Spearman's correlation is appropriate for sequencing data.
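
    As a rough illustration of the quantity discussed above, the sketch below computes Spearman's correlation for one feature pair in two sample groups and takes the difference. It is not the Discordant mixture model or its EM fitting; the function names and the count values are hypothetical and chosen only for illustration.

```python
import math


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical read counts for one feature pair in two groups of samples.
group_a_x, group_a_y = [5, 12, 30, 44, 60], [2, 9, 15, 33, 41]
group_b_x, group_b_y = [7, 15, 22, 39, 58], [40, 31, 18, 11, 3]

rho_a = spearman(group_a_x, group_a_y)
rho_b = spearman(group_b_x, group_b_y)
differential = rho_a - rho_b
print(rho_a, rho_b, differential)
```

    Because ranks depend only on ordering, the measure is unchanged by monotone transformations of the counts, which is one plausible reason a rank-based metric suits count data.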

  20. WebLogo: a sequence logo generator.

    PubMed

    Crooks, Gavin E; Hon, Gary; Chandonia, John-Marc; Brenner, Steven E

    2004-06-01

    WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Each logo consists of stacks of letters, one stack for each position in the sequence. The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. WebLogo has been enhanced recently with additional features and options, to provide a convenient and highly configurable sequence logo generator. A command line interface and the complete, open WebLogo source code are available for local installation and customization. Copyright 2004 Cold Spring Harbor Laboratory Press
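
    The stack heights described above can be sketched in a few lines. This is a minimal sketch rather than WebLogo's own code: it assumes a DNA alphabet of four symbols, uses a made-up four-sequence alignment, and omits the small-sample correction that a production logo generator would normally apply. The function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Return (stack_height_bits, {symbol: symbol_height_bits}) for one column."""
    counts = Counter(column)
    n = len(column)
    freqs = {symbol: count / n for symbol, count in counts.items()}
    # Shannon entropy of the observed symbol frequencies, in bits.
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    # Conservation is the maximum possible entropy minus the observed entropy.
    stack_height = math.log2(alphabet_size) - entropy
    # Each symbol takes a share of the stack proportional to its frequency.
    heights = {symbol: p * stack_height for symbol, p in freqs.items()}
    return stack_height, heights


def logo_heights(alignment, alphabet_size=4):
    """Compute stack and symbol heights for every column of an alignment."""
    return [column_information("".join(col), alphabet_size)
            for col in zip(*alignment)]


# Hypothetical alignment: four sequences of equal length.
alignment = ["ACGT", "ACGA", "ACTA", "AGTA"]
for position, (bits, heights) in enumerate(logo_heights(alignment), start=1):
    print(position, round(bits, 3), heights)
```

    A fully conserved column reaches the maximum of 2 bits for a four-letter alphabet, while a column split evenly between two symbols drops to 1 bit.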

  1. The Genome Sequencing Center at NCGR

    SciTech Connect

    Schilkey, Faye

    2010-06-02

    Faye Schilkey from the National Center for Genome Resources discusses NCGR's research, sequencing and analysis experience on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  2. An Assignment Sequence for Underprepared Writers.

    ERIC Educational Resources Information Center

    Nimmo, Kristi

    2000-01-01

    Presents a sequenced writing assignment on shopping to aid basic writers. Describes a writing assignment focused around online and mail-order shopping. Notes steps in preparing for the assignment, the sequence, and discusses responses to the assignments. (SC)

  3. Genome Sequences of Eight Morphologically Diverse Alphaproteobacteria

    PubMed Central

    Brown, Pamela J. B.; Kysela, David T.; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V.

    2011-01-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium. PMID:21705585

  4. Genome sequences of eight morphologically diverse Alphaproteobacteria.

    PubMed

    Brown, Pamela J B; Kysela, David T; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V

    2011-09-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium.

  5. The Art of Gymnastics: Creating Sequences.

    ERIC Educational Resources Information Center

    Rovegno, Inez

    1988-01-01

    Offering students opportunities for creating movement sequences in gymnastics allows them to understand the essence of gymnastics, have creative experiences, and learn about themselves. The process of creating sequences is described. (MT)

  6. FOGSAA: Fast Optimal Global Sequence Alignment Algorithm

    NASA Astrophysics Data System (ADS)

    Chakraborty, Angana; Bandyopadhyay, Sanghamitra

    2013-04-01

    In this article we propose a Fast Optimal Global Sequence Alignment Algorithm, FOGSAA, which aligns a pair of nucleotide/protein sequences faster than any optimal global alignment method including the widely used Needleman-Wunsch (NW) algorithm. FOGSAA is applicable for all types of sequences, with any scoring scheme, and with or without affine gap penalty. Compared to NW, FOGSAA achieves a time gain of (70-90)% for highly similar nucleotide sequences (> 80% similarity), and (54-70)% for sequences having (30-80)% similarity. For other sequences, it terminates with an approximate score. For protein sequences, the average time gain is between (25-40)%. Compared to three heuristic global alignment methods, the quality of alignment is improved by about 23%-53%. FOGSAA is, in general, suitable for aligning any two sequences defined over a finite alphabet set, where the quality of the global alignment is of supreme importance.
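
    For context, the Needleman-Wunsch baseline named above can be sketched as follows. This is the classic dynamic-programming method, not FOGSAA itself; the scoring values (match +1, mismatch -1, linear gap -2) and the function name are assumptions made for illustration, and affine gap penalties are not handled.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + substitution(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


print(needleman_wunsch("GATTACA", "GATACA"))
```

    The table has (len(a) + 1) x (len(b) + 1) cells and every cell is filled, which is the quadratic cost that faster optimal methods try to avoid for similar sequences.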

  7. An Assignment Sequence for Underprepared Writers.

    ERIC Educational Resources Information Center

    Nimmo, Kristi

    2000-01-01

    Presents a sequenced writing assignment on shopping to aid basic writers. Describes a writing assignment focused around online and mail-order shopping. Notes steps in preparing for the assignment, the sequence, and discusses responses to the assignments. (SC)

  8. Entropy analysis of substitutive sequences revisited

    NASA Astrophysics Data System (ADS)

    Karamanos, K.

    2001-11-01

    A given finite sequence of letters over a finite alphabet can always be algorithmically generated, in particular by a Turing machine. This fact is at the heart of complexity theory in the sense of Kolmogorov and Chaitin. A relevant question in this context is whether, given a statistically 'sufficiently long' sequence, there exists a deterministic finite automaton that generates it. In this paper we propose a simple criterion, based on measuring block entropies by lumping, which is satisfied by all automatic sequences. On the basis of this, one can determine that a given sequence is not automatic and obtain interesting information when the sequence is automatic. Following previous work on the Feigenbaum sequence, we give a necessary entropy-based condition valid for all automatic sequences read by lumping. Applications of these ideas to representative examples are discussed. In particular, we establish new entropic decimation schemes for the Thue-Morse, the Rudin-Shapiro and the paperfolding sequences read by lumping.
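
    A small sketch of measuring block entropy by lumping may help here. It assumes that lumping means cutting the sequence into consecutive non-overlapping blocks of length n and taking the Shannon entropy of the block frequencies, measured in bits. The function names are hypothetical, and the code illustrates the measurement only, not the paper's criterion.

```python
import math
from collections import Counter


def thue_morse(length):
    """First `length` symbols of the Thue-Morse sequence as a string of 0/1."""
    # Symbol i is the parity of the number of 1 bits in the binary form of i.
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


def block_entropy_lumping(sequence, n):
    """Shannon entropy (bits) of non-overlapping length-n blocks of `sequence`."""
    blocks = [sequence[i:i + n] for i in range(0, len(sequence) - n + 1, n)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


sequence = thue_morse(1024)
for n in (1, 2, 4, 8, 16):
    print(n, block_entropy_lumping(sequence, n))
```

    For this prefix and block lengths that are powers of two, only two distinct blocks occur, each with frequency 1/2, so the lumped entropy stays at 1 bit instead of growing with n. That constancy is the kind of behaviour an entropy-based test for automatic sequences can look for.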

  9. Block variables for deterministic aperiodic sequences

    NASA Astrophysics Data System (ADS)

    Hörnquist, Michael

    1997-10-01

    We use the concept of block variables to obtain a measure of order/disorder for some one-dimensional deterministic aperiodic sequences. For the Thue-Morse sequence, the Rudin-Shapiro sequence and the period-doubling sequence it is possible to obtain analytical expressions in the limit of infinite sequences. For the Fibonacci sequence, we present some analytical results which can be supported by numerical arguments. It turns out that the block variables show a wide range of different behaviour, some of them indicating that some of the considered sequences are more 'random' than others. However, the method does not give any definite answer to the question of which sequence is more disordered than the other and, in this sense, the results obtained are negative. We compare this with some other ways of measuring the amount of order/disorder in such systems, and there seems to be no direct correspondence between the measures.

  10. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid sequence...

  11. Movement sequencing in Huntington disease.

    PubMed

    Georgiou-Karistianis, Nellie; Long, Jeffrey D; Lourens, Spencer G; Stout, Julie C; Mills, James A; Paulsen, Jane S

    2014-08-01

    To examine longitudinal changes in movement sequencing in prodromal Huntington's disease (HD) participants (795 prodromal HD; 225 controls) from the PREDICT-HD study. Prodromal HD participants were tested over seven annual visits and were stratified into three groups (low, medium, high) based on their CAG-Age Product (CAP) score, which indicates likely increasing proximity to diagnosis. A cued movement sequence task assessed the impact of advance cueing on response initiation and execution via three levels of advance information. Compared to controls, all CAP groups showed longer initiation and movement times across all conditions at baseline, demonstrating a disease gradient for the majority of outcomes. Across all conditions, the high CAP group had the highest mean for baseline testing, but also demonstrated an increase in movement time across the study. For initiation time, the high CAP group showed the highest mean baseline time across all conditions, but also faster decreasing rates of change over time. With progress to diagnosis, participants may increasingly use compensatory strategies, as evidenced by faster initiation. However, this occurred in conjunction with slowed execution times, suggesting a decline in effectively accessing control processes required to translate movement into effective execution.

  12. Movement sequencing in Huntington disease

    PubMed Central

    GEORGIOU-KARISTIANIS, NELLIE; LONG, JEFFREY D.; LOURENS, SPENCER G.; STOUT, JULIE C.; MILLS, JAMES A.; PAULSEN, JANE S.

    2015-01-01

    Objectives To examine longitudinal changes in movement sequencing in prodromal Huntington’s disease (HD) participants (795 prodromal HD; 225 controls) from the PREDICT-HD study. Methods Prodromal HD participants were tested over seven annual visits and were stratified into three groups (low, medium, high) based on their CAG-Age Product (CAP) score, which indicates likely increasing proximity to diagnosis. A cued movement sequence task assessed the impact of advance cueing on response initiation and execution via three levels of advance information. Results Compared to controls, all CAP groups showed longer initiation and movement times across all conditions at baseline, demonstrating a disease gradient for the majority of outcomes. Across all conditions, the high CAP group had the highest mean for baseline testing, but also demonstrated an increase in movement time across the study. For initiation time, the high CAP group showed the highest mean baseline time across all conditions, but also faster decreasing rates of change over time. Conclusions With progress to diagnosis, participants may increasingly use compensatory strategies, as evidenced by faster initiation. However, this occurred in conjunction with slowed execution times, suggesting a decline in effectively accessing control processes required to translate movement into effective execution. PMID:24678867

  13. SP8 Sequencing Extinct Genomes

    PubMed Central

    Poinar, H.

    2007-01-01

    Nucleic acids, which hold clues to the evolution of various animal and hominid taxa, are weak molecules compared with other cellular debris, and thus evolutionary biologists are in essence time trapped. Fortunately, DNA and protein fragments do exist in fossil remains beyond what theoretical experimentation would suggest. Sequestering of DNA molecules in humic or Maillard-like complexes likely represents a rich source of DNA molecules from the past, which have yet to be tapped. These molecules were impossible to acquire due to the selective nature of the polymerase chain reaction. Recently, however, rapid parallel pyrosequencing techniques, such as those used in metagenomics-based research, have made it possible, in theory, to identify all short nucleotide sequences in a sample in a non-selective way, and thus represent the way forward for ancient DNA. In theory, this new technology will allow the completion of genomes of extinct animals, plants, and microbes. I will discuss the benefits and pitfalls of this metagenomics approach to ancient DNA, highlighting our recent efforts underway to sequence the woolly mammoth genome as well as other fossil remains.

  14. Data structures for DNA sequence manipulation.

    PubMed Central

    Lawrence, C B

    1986-01-01

    Two data structures designated Fragment and Construct are described. The Fragment data structure defines a continuous nucleic acid sequence from a unique genetic origin. The Construct defines a continuous sequence composed of sequences from multiple genetic origins. These data structures are manipulated by a set of software tools to simulate the construction of mosaic recombinant DNA molecules. They are also used as an interface between sequence data banks and analytical programs. PMID:3753765
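
    A minimal sketch of how the two structures described above might look in a modern language. The field names (`origin`, `sequence`, `parts`) and the methods are hypothetical; the abstract does not give the original layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Fragment:
    """A continuous nucleic acid sequence from a single genetic origin."""
    origin: str    # hypothetical label for the source, e.g. a vector name
    sequence: str


@dataclass
class Construct:
    """A continuous sequence assembled from fragments of several origins."""
    parts: List[Fragment] = field(default_factory=list)

    def append(self, fragment: Fragment) -> None:
        self.parts.append(fragment)

    @property
    def sequence(self) -> str:
        # The mosaic sequence is the fragments joined in order.
        return "".join(p.sequence for p in self.parts)

    def origins(self) -> List[str]:
        """Distinct origins in order of first appearance."""
        seen: List[str] = []
        for p in self.parts:
            if p.origin not in seen:
                seen.append(p.origin)
        return seen


if __name__ == "__main__":
    mosaic = Construct()
    mosaic.append(Fragment("vector", "GGATCC"))
    mosaic.append(Fragment("insert", "ATG"))
    print(mosaic.sequence, mosaic.origins())
```

    Keeping each Fragment immutable means a Construct can reuse the same fragment in several places without copies drifting apart.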

  15. Sequences of Rational Numbers Converging to Surds

    ERIC Educational Resources Information Center

    Fletcher, Rodney

    2010-01-01

    In this sequence 1/1, 7/5, 41/29, 239/169 and so on, Thomas notes that the sequence converges to square root of 2. By observation, the sequence of numbers in the numerator of the above sequence, have a pattern of generation which is the same as that in the denominator. That is, the next term is found by multiplying the previous term by six and…
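
    The convergence claim can be checked numerically for the four terms listed above. This sketch uses only those terms and does not assume the (truncated) generation rule; the helper name `pell_residual` is made up for this example.

```python
import math
from fractions import Fraction

# The four terms quoted in the abstract.
terms = [Fraction(1, 1), Fraction(7, 5), Fraction(41, 29), Fraction(239, 169)]


def pell_residual(frac):
    """Return p^2 - 2*q^2 for p/q; a small value means p/q is close to sqrt(2)."""
    return frac.numerator ** 2 - 2 * frac.denominator ** 2


if __name__ == "__main__":
    for t in terms:
        error = abs(float(t) - math.sqrt(2))
        print(t, pell_residual(t), error)
```

    Each listed term satisfies p^2 - 2q^2 = -1, so (p/q)^2 differs from 2 by exactly 1/q^2, which shrinks as the denominators grow.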

  16. The recurrence sequence via the Fibonacci groups

    NASA Astrophysics Data System (ADS)

    Aküzüm, Yeşim; Deveci, Ömür

    2016-04-01

    This work develops properties of the recurrence sequence defined with the aid of the relation matrix of the Fibonacci groups. The study of this sequence modulo m yields cyclic groups and semigroups from the generating matrix. Finally, we extend the sequence defined to groups and then obtain its period in the Fibonacci groups.

  17. Sequencing crop genomes: approaches and applications

    USDA-ARS's Scientific Manuscript database

    Plant genome sequencing methodology parallels the sequencing of the human genome. The first projects were slow and very expensive. BAC by BAC approaches were utilized first and whole-genome shotgun sequencing rapidly replaced that approach. So-called 'next generation' technologies such as short rea...

  18. Joint Sequence Analysis: Association and Clustering

    ERIC Educational Resources Information Center

    Piccarreta, Raffaella

    2017-01-01

    In its standard formulation, sequence analysis aims at finding typical patterns in a set of life courses represented as sequences. Recently, some proposals have been introduced to jointly analyze sequences defined on different domains (e.g., work career, partnership, and parental histories). We introduce measures to evaluate whether a set of…

  19. Automated Sequence Generation Process and Software

    NASA Technical Reports Server (NTRS)

    Gladden, Roy

    2007-01-01

    "Automated sequence generation" (autogen) signifies both a process and software used to automatically generate sequences of commands to operate various spacecraft. The autogen software comprises the autogen script plus the Activity Plan Generator (APGEN) program. APGEN can be used for planning missions and command sequences.

  20. Task-Relevant Chunking in Sequence Learning

    ERIC Educational Resources Information Center

    Perlman, Amotz; Pothos, Emmanuel M.; Edwards, Darren J.; Tzelgov, Joseph

    2010-01-01

    In the present study, we investigated possible influences on the unitization of responses. In Experiments 1, 2, 3, and 6, we found that when the same small fragment (i.e., a few consecutive responses in a sequence) was presented as part of two larger sequences, participants responded to it faster when it was part of the sequence that was presented…

  1. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels.

    PubMed

    Faircloth, Brant C; Glenn, Travis C

    2012-01-01

    Ligating adapters with unique synthetic oligonucleotide sequences (sequence tags) onto individual DNA samples before massively parallel sequencing is a popular and efficient way to obtain sequence data from many individual samples. Tag sequences should be numerous and sufficiently different to ensure sequencing, replication, and oligonucleotide synthesis errors do not cause tags to be unrecoverable or confused. However, many design approaches only protect against substitution errors during sequencing and extant tag sets contain too few tag sequences. We developed an open-source software package to validate sequence tags for conformance to two distance metrics and design sequence tags robust to indel and substitution errors. We use this software package to evaluate several commercial and non-commercial sequence tag sets, design several large sets (max(count) = 7,198) of edit metric sequence tags having different lengths and degrees of error correction, and integrate a subset of these edit metric tags to polymerase chain reaction (PCR) primers and sequencing adapters. We validate a subset of these edit metric tagged PCR primers and sequencing adapters by sequencing on several platforms and subsequent comparison to commercially available alternatives. We find that several commonly used sets of sequence tags or design methodologies used to produce sequence tags do not meet the minimum expectations of their underlying distance metric, and we find that PCR primers and sequencing adapters incorporating edit metric sequence tags designed by our software package perform as well as their commercial counterparts. We suggest that researchers evaluate sequence tags prior to use or evaluate tags that they have been using. The sequence tag sets we design improve on extant sets because they are large, valid across the set, and robust to the suite of substitution, insertion, and deletion errors affecting massively parallel sequencing workflows on all currently used platforms.
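
    To make the validation step concrete, here is a minimal sketch that checks a tag set against a minimum pairwise edit distance. It is not the authors' software package; Levenshtein distance is assumed as the edit metric, and the function names and example tags are invented for illustration.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance: fewest substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def violating_pairs(tags, min_distance):
    """Pairs of tags that are closer than the required minimum distance."""
    return [(a, b) for a, b in combinations(tags, 2)
            if edit_distance(a, b) < min_distance]


if __name__ == "__main__":
    example_tags = ["ACGT", "ACGA", "TTTT"]  # made-up tags
    print(violating_pairs(example_tags, min_distance=3))
```

    A set with no violating pairs at minimum distance 3 can, in principle, tolerate one substitution, insertion, or deletion per tag without two tags being confused.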

  2. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  3. Integer sequence discovery from small graphs

    PubMed Central

    Hoppe, Travis; Petrone, Anna

    2015-01-01

    We have exhaustively enumerated all simple, connected graphs of a finite order and have computed a selection of invariants over this set. Integer sequences were constructed from these invariants and checked against the Online Encyclopedia of Integer Sequences (OEIS). 141 new sequences were added and six sequences were extended. From the graph database, we were able to programmatically suggest relationships among the invariants. It will be shown that we can readily visualize any sequence of graphs meeting given criteria. The code has been released as an open-source framework for further analysis and the database was constructed to be extensible to invariants not considered in this work. PMID:27034526
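
    The enumeration idea can be sketched by brute force for very small orders. This is not the released framework: the canonical form below tries every vertex relabelling, which is only practical for about five vertices, and all names are chosen for this example.

```python
from itertools import combinations, permutations


def is_connected(n, edges):
    """True if a depth-first search from vertex 0 reaches all n vertices."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def canonical_form(n, edges):
    """Smallest sorted edge list over all relabellings of the vertices."""
    best = None
    for perm in permutations(range(n)):
        relabelled = tuple(sorted(tuple(sorted((perm[u], perm[v])))
                                  for u, v in edges))
        if best is None or relabelled < best:
            best = relabelled
    return best


def count_connected_graphs(n):
    """Number of simple connected graphs on n vertices, up to isomorphism."""
    possible = list(combinations(range(n), 2))
    classes = set()
    for k in range(len(possible) + 1):
        for edges in combinations(possible, k):
            if is_connected(n, edges):
                classes.add(canonical_form(n, edges))
    return len(classes)


if __name__ == "__main__":
    print([count_connected_graphs(n) for n in range(1, 5)])
```

    The resulting counts form an integer sequence of the kind the abstract describes checking against the OEIS; any other invariant computed per graph could be tallied the same way.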

  4. Exploration of sequence space for protein engineering.

    PubMed

    Gustafsson, C; Govindarajan, S; Emig, R

    2001-01-01

    The process of protein engineering is currently evolving towards a heuristic understanding of the sequence-function relationship. Improved DNA sequencing capacity, efficient protein function characterization and improved quality of data points in conjunction with well-established statistical tools from other industries are changing the protein engineering field. Algorithms capturing the heuristic sequence-function relationships will have a drastic impact on the field of protein engineering. In this review, several alternative approaches to quantitatively assess sequence space are discussed and the relatively few examples of wet-lab validation of statistical sequence-function characterization/correlation are described.

  5. An efficient method for multiple sequence alignment

    SciTech Connect

    Kim, J.; Pramanik, S.

    1994-12-31

    Multiple sequence alignment has been a useful method in the study of molecular evolution and sequence-structure relationships. This paper presents a new method for multiple sequence alignment based on the simulated annealing technique. Dynamic programming has been widely used to find an optimal alignment. However, dynamic programming has several limitations: it requires long computation times and cannot be applied with certain types of cost functions. We describe the detailed mechanisms of simulated annealing for the multiple sequence alignment problem. It is shown that simulated annealing can be an effective approach to overcoming the limitations of dynamic programming in the multiple sequence alignment problem.

  6. The 2016 Kumamoto earthquake sequence

    PubMed Central

    KATO, Aitaro; NAKAMURA, Kouji; HIYAMA, Yohei

    2016-01-01

    Beginning in April 2016, a series of shallow, moderate to large earthquakes with associated strong aftershocks struck the Kumamoto area of Kyushu, SW Japan. An Mj 7.3 mainshock occurred on 16 April 2016, close to the epicenter of an Mj 6.5 foreshock that occurred about 28 hours earlier. The intense seismicity released the accumulated elastic energy by right-lateral strike slip, mainly along two known, active faults. The mainshock rupture propagated along multiple fault segments with different geometries. The faulting style is reasonably consistent with regional deformation observed on geologic timescales and with the stress field estimated from seismic observations. One striking feature of this sequence is intense seismic activity, including a dynamically triggered earthquake in the Oita region. Following the mainshock rupture, postseismic deformation has been observed, as well as expansion of the seismicity front toward the southwest and northwest. PMID:27725474

  7. Replacement Sequence of Events Generator

    NASA Technical Reports Server (NTRS)

    Fisher, Forest; Gladden, Daniel Wenkert Roy; Khanampompan, Teerpat

    2008-01-01

    The soeWINDOW program automates the generation of an ITAR (International Traffic in Arms Regulations)-compliant sub-RSOE (Replacement Sequence of Events) by extracting a specified temporal window from an RSOE while maintaining page header information. RSOEs contain a significant amount of information that is not ITAR-compliant, yet that foreign partners need to see for command details to their instrument, as well as the surrounding commands that provide context for validation. soeWINDOW can serve as an example of how command support products can be made ITAR-compliant for future missions. This software is a Perl script intended for use in the mission operations UNIX environment. It is designed for use to support the MRO (Mars Reconnaissance Orbiter) instrument team. The tool also provides automated DOM (Distributed Object Manager) storage into the special ITAR-okay DOM collection, and can be used for creating focused RSOEs for product review by any of the MRO teams.

  8. Particle sizer and DNA sequencer

    DOEpatents

    Olivares, Jose A.; Stark, Peter C.

    2005-09-13

    An electrophoretic device separates and detects particles such as DNA fragments, proteins, and the like. The device has a capillary which is coated with a coating with a low refractive index such as Teflon® AF. A sample of particles is fluorescently labeled and injected into the capillary. The capillary is filled with an electrolyte buffer solution. An electrical field is applied across the capillary causing the particles to migrate from a first end of the capillary to a second end of the capillary. A detector light beam is then scanned along the length of the capillary to detect the location of the separated particles. The device is amenable to a high throughput system by providing additional capillaries. The device can also be used to determine the actual size of the particles and for DNA sequencing.

  9. Nanopore sequencing technology: nanopore preparations.

    PubMed

    Rhee, Minsoung; Burns, Mark A

    2007-04-01

    For the past decade, nanometer-scale pores have been developed as a powerful technique for sensing biological macromolecules. Various potential applications using these nanopores have been reported at the proof-of-principle stage, with the eventual aim of using them as an alternative to de novo DNA sequencing. Currently, there have been two general approaches to prepare nanopores for nucleic acid analysis: organic nanopores, such as alpha-hemolysin pores, are commonly used for DNA analysis, whereas synthetic solid-state nanopores have also been developed using various conventional and non-conventional fabrication techniques. In particular, synthetic nanopores with pore sizes smaller than the alpha-hemolysin pores have been prepared, primarily by electron-beam-assisted techniques: these are more robust and have better dimensional adjustability. This review will examine current methods of nanopore preparation, ranging from organic pore preparations to recent developments in synthetic nanopore fabrications.

  10. Distinguishing Proteins From Arbitrary Amino Acid Sequences

    PubMed Central

    Yau, Stephen S.-T.; Mao, Wei-Guang; Benson, Max; He, Rong Lucy

    2015-01-01

    What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe. PMID:25609314

  11. Genotator: A Workbench for Sequence Annotation

    SciTech Connect

    Harris, N.L.

    1997-05-01

    Sequencing centers such as the Human Genome Center at LBNL are producing an ever-increasing flood of genetic data. Annotation can greatly enhance the biological value of these sequences. Useful annotations include possible gene locations, homologies to known genes, and gene signals such as promoters and splice sites. Genotator is a workbench for automated sequence annotation and annotation browsing. The back end runs a series of sequence analysis tools on a DNA sequence, handling the various input and output formats required by the tools. Genotator currently runs five different gene finding programs, three homology searches, and searches for promoters, splice sites, and ORFs. The results of the analyses run by Genotator can be viewed with the interactive graphical browser. The browser displays color-coded sequence annotations on a canvas that can be scrolled and zoomed, allowing the annotated sequence to be explored at multiple levels of detail. The user can view the actual DNA sequence in a separate window; when a region is selected in the map display, it is automatically highlighted in the sequence display, and vice-versa. By displaying the output of all of the sequence analyses, Genotator provides an intuitive way to identify the significant regions (for example, probable exons) in a sequence. Users can interactively add personal annotations to label regions of interest. Additional capabilities of Genotator include primer design and pattern searching.

  12. Experimental investigation of an RNA sequence space

    NASA Technical Reports Server (NTRS)

    Lee, Youn-Hyung; Dsouza, Lisa; Fox, George E.

    1993-01-01

    Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space that is comprised only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein an experimental system which can assess whether a particular sequence is likely to have validity as a eubacterial 5S rRNA is described. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs.

  13. Experimental investigation of an RNA sequence space

    NASA Astrophysics Data System (ADS)

    Lee, Youn-Hyung; Dsouza, Lisa; Fox, George E.

    1993-12-01

    Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space that is comprised only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein an experimental system which can assess whether a particular sequence is likely to have validity as a eubacterial 5S rRNA is described. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs. This approach will allow direct study of the constraints governing RNA evolution and allow inquiry into how the last common ancestor of extant life apparently came to have very complex ribosomal RNAs that subsequently were very conserved.

  14. Experimental investigation of an RNA sequence space.

    PubMed

    Lee, Y H; Dsouza, L; Fox, G E

    1993-12-01

    Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space that is comprised only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein an experimental system which can assess whether a particular sequence is likely to have validity as a eubacterial 5S rRNA is described. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs. This approach will allow direct study of the constraints governing RNA evolution and allow inquiry into how the last common ancestor of extant life apparently came to have very complex ribosomal RNAs that subsequently were very conserved.

  16. Sequencing technologies for animal cell culture research.

    PubMed

    Kremkow, Benjamin G; Lee, Kelvin H

    2015-01-01

    Over the last 10 years, 2nd and 3rd generation sequencing technologies have made the use of genomic sequencing within the animal cell culture community increasingly commonplace. Each technology's defining characteristics are unique, including the cost, time, sequence read length, daily throughput, and occurrence of sequence errors. Given each sequencing technology's intrinsic advantages and disadvantages, the optimal technology for a given experiment depends on the particular experiment's objective. This review discusses the current characteristics of six next-generation sequencing technologies, compares the differences between them, and characterizes their relevance to the animal cell culture community. These technologies are continually improving, as evidenced by the recent achievement of the field's benchmark goal: sequencing a human genome for less than $1,000.

  17. Discovering novel sequence motifs with MEME.

    PubMed

    Bailey, Timothy L

    2002-11-01

    This unit illustrates how to use MEME to discover motifs in a group of related nucleotide or peptide sequences. A MEME motif is a sequence pattern that occurs repeatedly in one or more sequences in the input group. MEME can be used to discover novel patterns because it bases its discoveries only on the input sequences, not on any prior knowledge (such as databases of known motifs). The input to MEME is a set of unaligned sequences of the same type (peptide or nucleotide). For each motif it discovers, MEME reports the occurrences (sites), consensus sequence, and the level of conservation (information content) at each position in the pattern. MEME also produces block diagrams showing where all of the discovered motifs occur in the training set sequences. MEME's hypertext (HTML) output also contains buttons that allow for the convenient use of the motifs in other searches.
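
    The "level of conservation (information content)" reported per position can be illustrated with a small calculation. This is a sketch under simplifying assumptions (a uniform background over A, C, G, T and no small-sample correction), not MEME's own code, and the example sites are invented.

```python
import math
from collections import Counter


def information_content(sites, alphabet="ACGT"):
    """Per-column information content in bits for equal-length aligned sites.

    Assumes a uniform background distribution over `alphabet`.
    """
    width = len(sites[0])
    max_bits = math.log2(len(alphabet))
    result = []
    for col in range(width):
        counts = Counter(site[col] for site in sites)
        entropy = -sum((c / len(sites)) * math.log2(c / len(sites))
                       for c in counts.values())
        result.append(max_bits - entropy)
    return result


if __name__ == "__main__":
    example_sites = ["ACGT", "ACGA", "ACTC", "ACCG"]  # made-up motif sites
    print(information_content(example_sites))
```

    A fully conserved DNA column scores 2 bits under these assumptions, and a column where all four letters are equally frequent scores 0.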

  18. Instability in progressive multiple sequence alignment algorithms.

    PubMed

    Boyce, Kieran; Sievers, Fabian; Higgins, Desmond G

    2015-01-01

    Progressive alignment is the standard approach used to align large numbers of sequences. As with all heuristics, this involves a tradeoff between alignment accuracy and computation time. We examine this tradeoff and find that, because of a loss of information in the early steps of the approach, the alignments generated by the most common multiple sequence alignment programs are inherently unstable, and simply reversing the order of the sequences in the input file will cause a different alignment to be generated. Although this effect is more obvious with larger numbers of sequences, it can also be seen with data sets in the order of one hundred sequences. We also outline the means to determine the number of sequences in a data set beyond which the probability of instability will become more pronounced. This has major ramifications for both the designers of large-scale multiple sequence alignment algorithms, and for the users of these alignments.

  19. From sequence mapping to genome assemblies.

    PubMed

    Otto, Thomas D

    2015-01-01

    The development of "next-generation" high-throughput sequencing technologies has made it possible for many labs to undertake sequencing-based research projects that were unthinkable just a few years ago. Although the scientific applications are diverse (e.g., new genome projects, gene expression analysis, genome-wide functional screens, or epigenetics), the sequence data are usually processed in one of two ways: sequence reads are either mapped to an existing reference sequence, or they are built into a new sequence ("de novo assembly"). In this chapter, we first discuss some limitations of the mapping process and how these may be overcome through local sequence assembly. We then introduce the concept of de novo assembly and describe essential assembly improvement procedures such as scaffolding, contig ordering, gap closure, error evaluation, gene annotation transfer and ab initio gene annotation. The results are high-quality draft assemblies that will facilitate informative downstream analyses.

  20. The evolution of the Voyager mission sequence software and trends for future mission sequence software systems

    NASA Technical Reports Server (NTRS)

    Brooks, Robert N., Jr.

    1988-01-01

    The historical background of the spacecraft sequence generation process as it is represented by the Voyager mission to the outer planets is discussed. Present plans for future sequencing methods are examined, including the emphasis on cutting costs and the contrast between the centralized and distributed systems for sequencing. The use of artificial intelligence in mission sequencing is addressed.

  2. An RNA-protein contact determined by 5-bromouridine substitution, photocrosslinking and sequencing.

    PubMed Central

    Willis, M C; LeCuyer, K A; Meisenheimer, K M; Uhlenbeck, O C; Koch, T H

    1994-01-01

    An analogue of the replicase translational operator of bacteriophage R17, that contains a 5-bromouridine at position -5 (RNA 1), complexes with a dimer of the coat protein and photocrosslinks to the coat protein in high yield upon excitation at 308 nm with a xenon chloride excimer laser. Tryptic digestion of the crosslinked nucleoprotein complex followed by Edman degradation of the tryptic fragment bearing the RNA indicates crosslinking to tyrosine 85 of the coat protein. A control experiment with a Tyr 85 to Ser 85 variant coat protein showed binding but no photocrosslinking at saturating protein concentration. This is consistent with the observation from model compound studies of preferential photocrosslinking of BrU to the electron rich aromatic amino acids tryptophan, tyrosine, and histidine with 308 nm excitation. PMID:7800485

  3. Feedback shift register sequences versus uniformly distributed random sequences for correlation chromatography

    NASA Technical Reports Server (NTRS)

    Kaljurand, M.; Valentin, J. R.; Shao, M.

    1996-01-01

    Two alternative input sequences are commonly employed in correlation chromatography (CC). They are sequences derived according to the algorithm of the feedback shift register (i.e., pseudo random binary sequences (PRBS)) and sequences derived by using the uniform random binary sequences (URBS). These two sequences are compared. By applying the "cleaning" data processing technique to the correlograms that result from these sequences, we show that when the PRBS is used the S/N of the correlogram is much higher than the one resulting from using URBS.
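
    As an illustration of the feedback shift register construction named in the abstract above, the following is a minimal sketch and not the authors' implementation. The 4-bit register, the tap positions (corresponding to the primitive polynomial x^4 + x + 1), the seed, and the function names are all assumptions chosen for brevity. The second function computes the periodic autocorrelation of the sequence, a property often cited when pseudo random binary sequences are preferred for correlation measurements.

```python
def prbs(nbits=4, taps=(1, 2), seed=1):
    """Return 2**nbits - 1 bits from a Fibonacci linear feedback shift register.

    taps are 1-indexed bit positions (1 = least significant bit) whose XOR
    is fed back into the most significant bit after each right shift.
    The output is one full period only if the taps give a maximal-length
    register; the defaults here are an assumed example that does.
    """
    state = seed
    bits = []
    for _ in range((1 << nbits) - 1):
        bits.append(state & 1)                      # emit the lowest bit
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1      # XOR of the tapped bits
        state = (state >> 1) | (feedback << (nbits - 1))
    return bits


def cyclic_autocorrelation(bits, lag):
    """Periodic autocorrelation of the bit sequence mapped to +1/-1."""
    signs = [1 if b else -1 for b in bits]
    n = len(signs)
    return sum(signs[i] * signs[(i + lag) % n] for i in range(n))


sequence = prbs()
print(sequence)
print([cyclic_autocorrelation(sequence, lag) for lag in range(len(sequence))])
```

    For a maximal-length register the autocorrelation is flat at every nonzero lag, which a uniformly random binary sequence of the same length does not guarantee.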

  5. Mitogenome sequence accuracy using different elucidation methods

    PubMed Central

    Velozo Timbó, Renata; Coiti Togawa, Roberto; M. C. Costa, Marcos; A. Andow, David

    2017-01-01

    Mitogenome sequences are highly desired because they are used in several biological disciplines. Their elucidation has been facilitated through the development of massive parallel sequencing, accelerating their deposition in public databases. However, sequencing, assembly and annotation methods might induce variability in their quality, raising concerns about the accuracy of the sequences that have been deposited in public databases. In this work we show that different sequencing methods (number of species pooled in a library, insert size and platform) and assembly and annotation methods generated variable completeness and similarity of the resulting mitogenome sequences, using three species of predaceous ladybird beetles as models. The identity of the sequences varied considerably depending on the method used and ranged from 38.19 to 90.1% for Cycloneda sanguinea, 72.85 to 91.06% for Harmonia axyridis and 41.15 to 93.60% for Hippodamia convergens. Dissimilarities were frequently found in the non-coding A+T rich region, but were also common in coding regions, and were not associated with low coverage. Mitogenome completeness and sequence identity were affected by the sequencing and assembly/annotation methods, and high within-species variation was also found for other mitogenome depositions in GenBank. This indicates a need for methods to confirm sequence accuracy, and guidelines for verifying mitogenomes should be discussed and developed by the scientific community. PMID:28662089

  6. Randomness in Sequence Evolution Increases over Time

    PubMed Central

    Wang, Guangyu; Sun, Shixiang; Zhang, Zhang

    2016-01-01

    The second law of thermodynamics states that entropy, as a measure of randomness in a system, increases over time. Although studies have investigated biological sequence randomness from different aspects, it remains unknown whether sequence randomness changes over time and whether this change is consistent with the second law of thermodynamics. To capture the dynamics of randomness in molecular sequence evolution, here we detect sequence randomness based on a collection of eight statistical random tests and investigate the randomness variation of coding sequences with an application to Escherichia coli. Given that core/essential genes are more ancient than specific/non-essential genes, our results clearly show that core/essential genes are more random than specific/non-essential genes and accordingly indicate that sequence randomness indeed increases over time, consistent with the second law of thermodynamics. We further find that an increase in sequence randomness leads to increasing randomness of GC content and longer sequence length. Taken together, our study presents an important finding, for the first time, that sequence randomness increases over time, which may provide profound insights for unveiling the underlying mechanisms of molecular sequence evolution. PMID:27224236

  7. Randomness in Sequence Evolution Increases over Time.

    PubMed

    Wang, Guangyu; Sun, Shixiang; Zhang, Zhang

    2016-01-01

    The second law of thermodynamics states that entropy, as a measure of randomness in a system, increases over time. Although studies have investigated biological sequence randomness from different aspects, it remains unknown whether sequence randomness changes over time and whether this change is consistent with the second law of thermodynamics. To capture the dynamics of randomness in molecular sequence evolution, here we detect sequence randomness based on a collection of eight statistical random tests and investigate the randomness variation of coding sequences with an application to Escherichia coli. Given that core/essential genes are more ancient than specific/non-essential genes, our results clearly show that core/essential genes are more random than specific/non-essential genes and accordingly indicate that sequence randomness indeed increases over time, consistent with the second law of thermodynamics. We further find that an increase in sequence randomness leads to increasing randomness of GC content and longer sequence length. Taken together, our study presents an important finding, for the first time, that sequence randomness increases over time, which may provide profound insights for unveiling the underlying mechanisms of molecular sequence evolution.

  8. Deciphering the RNA landscape by RNAome sequencing

    PubMed Central

    Derks, Kasper WJ; Misovic, Branislav; van den Hout, Mirjam CGN; Kockx, Christel EM; Payan Gomez, Cesar; Brouwer, Rutger WW; Vrieling, Harry; Hoeijmakers, Jan HJ; van IJcken, Wilfred FJ; Pothof, Joris

    2015-01-01

    Current RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species in an unperturbed manner. We report strand-specific RNAome sequencing that determines expression of small and large RNAs from rRNA-depleted total RNA in a single sequence run. Since current analysis pipelines cannot reliably analyze small and large RNAs simultaneously, we developed TRAP, Total Rna Analysis Pipeline, a robust interface that is also compatible with existing RNA sequencing protocols. RNAome sequencing quantitatively preserved all RNA classes, allowing cross-class comparisons that facilitate the identification of relationships between different RNA classes. We demonstrate the strength of RNAome sequencing in mouse embryonic stem cells treated with cisplatin. MicroRNA and mRNA expression in RNAome sequencing significantly correlated between replicates and was in concordance with both existing RNA sequencing methods and gene expression arrays generated from the same samples. Moreover, RNAome sequencing also detected additional RNA classes such as enhancer RNAs, anti-sense RNAs, novel RNA species and numerous differentially expressed RNAs undetectable by other methods. At the level of complete RNA classes, RNAome sequencing also identified a specific global repression of the microRNA and microRNA isoform classes after cisplatin treatment whereas all other classes such as mRNAs were unchanged. These characteristics of RNAome sequencing will significantly improve expression analysis as well as studies on RNA biology not covered by existing methods. PMID:25826412

  9. Effects of an Additional Sequence of Color Stimuli on Visuomotor Sequence Learning

    PubMed Central

    Tanaka, Kanji; Watanabe, Katsumi

    2017-01-01

    Through practice, people are able to integrate a secondary sequence (e.g., a stimulus-based sequence) into a primary sequence (e.g., a response-based sequence), but it is still controversial whether the integrated sequences lead to better learning than only the primary sequence. In the present study, we aimed to investigate the effects of a sequence that integrated space and color sequences on early and late learning phases (corresponding to effector-independent and effector-dependent learning, respectively) and how the effects differed in the integrated and primary sequences in each learning phase. In the task, the participants were required to learn a sequence of button presses using trial-and-error and to perform the sequence successfully for 20 trials (m × n task). First, in the baseline task, all participants learned a non-colored sequence, in which the response button always turned red. Then, in the learning task, the participants were assigned to two groups: a colored sequence group (i.e., space and color) or a non-colored sequence group (i.e., space). In the colored sequence, the response button turned a pre-determined color and the participants were instructed to attend to the sequences of both location and color as much as they could. The results showed that the participants who performed the colored sequence acquired the correct button presses of the sequence earlier, but showed a slower mean performance time than those who performed the non-colored sequence. Moreover, the slower performance time in the colored sequence group remained in a subsequent transfer task in which the spatial configurations of the buttons were vertically mirrored from the learning task. 
These results indicated that if participants explicitly attended to both the spatial response sequence and color stimulus sequence at the same time, they could develop their spatial representations of the sequence earlier (i.e., early development of the effector-independent learning), but might

  10. Disks around Main Sequence Stars

    NASA Astrophysics Data System (ADS)

    Trauger, John

    1995-07-01

    About 30 other nearby stars have been shown (Aumann 1985, 1988; Sadakane and Nishida 1986) to emit excess infrared flux relative to that expected from their photospheres. It is believed that such emission is the rule rather than the exception and that the limited number is caused by the IRAS detection limits. We propose to observe the prototypical object Alpha Lyrae. If an optical counterpart to the infrared emission is observed, then the same analysis as that performed on the Beta Pictoris disk will be possible. If not, because of the low scattered light levels in the wings of the HST PSF, stringent limits on the albedo of the disk should be obtained. Only one circumstellar disk has been directly observed around a main sequence star. On the other hand, it is believed that disks are typical byproducts of star formation, and that these disks are the sites where planetary systems are formed. Both of these hypotheses will be tested with the observations proposed here. Firstly, the observations, if they detect the material, will constrain its spatial distribution and test the disk hypothesis. The material surrounding the target stars is presumed but not known to be distributed in a disk. There is no significant extinction towards these targets, but a shell of optically thin material can also fit the existing IRAS observations. The observations also only loosely constrain the radial distribution of the particles. Given a detection, it should be possible to distinguish a disk from

  11. [Rapid-sequence anesthesia induction].

    PubMed

    Lloréns Herrerías, J

    2003-02-01

    Rapid-sequence induction (RSI) techniques are designed to reduce the risk of aspiration in cases where risk is high. RSI is often used for surgery, particularly under emergency conditions, but is also found in procedures requiring emergency tracheal intubation inside and outside the hospital. RSI techniques have proven safe for reducing the risk of aspiration and providing good conditions for intubation in such situations. The great variety of clinical situations that can be involved means that the combination of drugs to be used should be individualized for each case. In addition to the two objectives of RSI named and the particular nature of a case, the risk of presenting unforeseen difficult intubation is yet another factor affecting choice of drugs. Precisely because of this last factor and the good results obtained with short-acting opiates, great interest has developed in recent years in RSI that does not use neuromuscular blocking agents. However, conclusive data are unavailable. Studies are often difficult to compare because of small differences in the combination of drugs, the dosing of one or more of them, the route of administration, or because the criteria used to define ideal intubation conditions are different.

  12. Polymer support for exonucleolytic sequencing.

    PubMed

    Hinz, M; Gura, S; Nitzan, B; Margel, S; Seliger, H

    2001-04-13

    Different kinds of particles were investigated for their potential use as supports for exonucleolytic sequence analysis. Composite beads composed of an unreactive polystyrene "core" and a "shell" of functionalized silica nanoparticles were found to best fulfill the various prerequisites. The biotin/streptavidin system was used for attachment of DNA to composite beads of 6 microm diameter. Applying M13 ssDNA in extremely high dilution (approximately 1 molecule versus 100 beads) with internal fluorescent labels, only a small fraction of beads was found to be associated with fluorescent entities, which likely correspond to a very small number of bound DNA molecules per particle. For better selection and transfer of DNA-containing beads into microstructures for exonuclease degradation the loading experiments were repeated with composite beads of 2.3 microm diameter. In this case a covalent bond was formed between carboxylate-functionalized beads and amino-terminated oligonucleotides, which were detected through external labelling with fluorescent nanoparticles interacting with biotinylated segments of the complementary strand.

  13. On the Origin of Sequence

    PubMed Central

    van der Gulik, Peter T. S.

    2015-01-01

    Three aspects which make planet Earth special, and which must be taken into consideration with respect to the emergence of peptides, are the mineralogical composition, the Moon which is in the same size class, and the triple environment consisting of ocean, atmosphere, and continent. GlyGly is a remarkable peptide because it stimulates peptide bond formation in the Salt-Induced Peptide Formation reaction. The role glycine and aspartic acid play in the active site of RNA polymerase is remarkable too. GlyGly might have been the original product of coded peptide synthesis because of its importance in stimulating the production of oligopeptides with a high aspartic acid content, which protected small RNA molecules by binding Mg2+ ions. The feedback loop, which is closed by having RNA molecules producing GlyGly, is proposed as the essential element fundamental to life. Having this system running, longer sequences could evolve, gradually solving the problem of error catastrophe. The basic structure of the standard genetic code (8 fourfold degenerate codon boxes and 8 split codon boxes) is an example of the way information concerning the emergence of life is frozen in the biological constitution of organisms: the structure of the code contains historical information. PMID:26580656

  14. Comparison of Next-Generation Sequencing Systems

    PubMed Central

    Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie

    2012-01-01

    With the fast development and wide application of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid the achievement of goals to decode life mysteries, make better crops, detect pathogens, and improve quality of life. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, technologies of these systems are reviewed, and first-hand data from extensive experience are summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. Finally, applications of NGS are summarized. PMID:22829749

  15. Long-range correlations in nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Peng, C.-K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Sciortino, F.; Simons, M.; Stanley, H. E.

    1992-03-01

    DNA sequences have been analysed using models, such as an n-step Markov chain, that incorporate the possibility of short-range nucleotide correlations. We propose here a method for studying the stochastic properties of nucleotide sequences by constructing a 1:1 map of the nucleotide sequence onto a walk, which we term a 'DNA walk'. We then use the mapping to provide a quantitative measure of the correlation between nucleotides over long distances along the DNA chain. Thus we uncover in the nucleotide sequence a remarkably long-range power law correlation that implies a new scale-invariant property of DNA. We find such long-range correlations in intron-containing genes and in nontranscribed regulatory DNA sequences, but not in complementary DNA sequences or intron-less genes.
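
    The 'DNA walk' mapping described in the abstract above can be sketched as follows. This is a hedged illustration only: the abstract does not state the step rule, so the rule used here (pyrimidines C and T step up, purines A and G step down) and the root-mean-square fluctuation measure are assumptions, and the function names are hypothetical.

```python
import math

# Assumed step rule: pyrimidine -> +1, purine -> -1 (not given in the abstract).
STEP = {"C": 1, "T": 1, "A": -1, "G": -1}


def dna_walk(seq):
    """Map a nucleotide string onto a one-dimensional walk starting at 0."""
    walk = [0]
    for base in seq.upper():
        walk.append(walk[-1] + STEP[base])
    return walk


def fluctuation(walk, window):
    """Root-mean-square fluctuation of the net displacement over `window` steps.

    How this quantity grows with `window` is one way to quantify correlation
    over long distances; it is used here as an assumed example measure.
    """
    disp = [walk[i + window] - walk[i] for i in range(len(walk) - window)]
    mean = sum(disp) / len(disp)
    mean_sq = sum(d * d for d in disp) / len(disp)
    return math.sqrt(max(0.0, mean_sq - mean * mean))


walk = dna_walk("CTAGGCTTAACG")
print(walk)
print([fluctuation(walk, w) for w in (1, 2, 4)])
```

    In practice the fluctuation would be evaluated over a wide range of window lengths on sequences of thousands of bases; the twelve-base string above only demonstrates the mechanics.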

  16. Evolutionarily conserved sequences on human chromosome 21

    SciTech Connect

    Frazer, Kelly A.; Sheehan, John B.; Stokowski, Renee P.; Chen, Xiyin; Hosseini, Roya; Cheng, Jan-Fang; Fodor, Stephen P.A.; Cox, David R.; Patil, Nila

    2001-09-01

    Comparison of human sequences with the DNA of other mammals is an excellent means of identifying functional elements in the human genome. Here we describe the utility of high-density oligonucleotide arrays as a rapid approach for comparing human sequences with the DNA of multiple species whose sequences are not presently available. High-density arrays representing approximately 22.5 Mb of nonrepetitive human chromosome 21 sequence were synthesized and then hybridized with mouse and dog DNA to identify sequences conserved between humans and mice (human-mouse elements) and between humans and dogs (human-dog elements). Our data show that sequence comparison of multiple species provides a powerful empiric method for identifying actively conserved elements in the human genome. A large fraction of these evolutionarily conserved elements are present in regions on chromosome 21 that do not encode known genes.

  17. Multiple sequence alignment with hierarchical clustering.

    PubMed Central

    Corpet, F

    1988-01-01

    An algorithm is presented for the multiple alignment of sequences, either proteins or nucleic acids, that is both accurate and easy to use on microcomputers. The approach is based on the conventional dynamic-programming method of pairwise alignment. Initially, a hierarchical clustering of the sequences is performed using the matrix of the pairwise alignment scores. The closest sequences are aligned creating groups of aligned sequences. Then close groups are aligned until all sequences are aligned in one group. The pairwise alignments included in the multiple alignment form a new matrix that is used to produce a hierarchical clustering. If it is different from the first one, iteration of the process can be performed. The method is illustrated by an example: a global alignment of 39 sequences of cytochrome c. PMID:2849754
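
    The clustering step described in the abstract above, in which the closest sequences are grouped first and close groups are then joined, can be sketched as follows. This is not the published algorithm: the average-linkage rule, the toy score matrix, and the function name are assumptions, and the alignment of each merged pair of groups is left out entirely.

```python
from itertools import combinations


def merge_order(names, score):
    """Return the order in which groups of sequences would be joined.

    score[i][j] is a pairwise alignment score (higher means more similar).
    Groups are compared by the average score between their members, which
    is an assumed linkage rule.
    """
    def average(a, b):
        return sum(score[i][j] for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(names))]
    merges = []
    while len(clusters) > 1:
        # Pick the two groups with the highest average pairwise score.
        a, b = max(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append(([names[i] for i in a], [names[i] for i in b]))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return merges


names = ["A", "B", "C", "D"]
score = [
    [0, 9, 2, 1],
    [9, 0, 3, 2],
    [2, 3, 0, 8],
    [1, 2, 8, 0],
]
for left, right in merge_order(names, score):
    print(left, "+", right)
```

    Each merge in the returned list corresponds to one alignment of two groups; a full implementation would perform that alignment and could recompute the score matrix from the result.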

  18. Long-range correlations in nucleotide sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Sciortino, F.; Simons, M.; Stanley, H. E.

    1992-01-01

    DNA sequences have been analysed using models, such as an n-step Markov chain, that incorporate the possibility of short-range nucleotide correlations. We propose here a method for studying the stochastic properties of nucleotide sequences by constructing a 1:1 map of the nucleotide sequence onto a walk, which we term a 'DNA walk'. We then use the mapping to provide a quantitative measure of the correlation between nucleotides over long distances along the DNA chain. Thus we uncover in the nucleotide sequence a remarkably long-range power law correlation that implies a new scale-invariant property of DNA. We find such long-range correlations in intron-containing genes and in nontranscribed regulatory DNA sequences, but not in complementary DNA sequences or intron-less genes.

  20. The genome sequence of parrot bornavirus 5.

    PubMed

    Guo, Jianhua; Tizard, Ian

    2015-12-01

    Although several new avian bornaviruses have recently been described, information on their evolution, virulence, and sequence is often limited. Here we report the complete genome sequence of parrot bornavirus 5 (PaBV-5) isolated from a case of proventricular dilatation disease in a Palm cockatoo (Probosciger aterrimus). The complete genome consists of 8842 nucleotides with distinct 5' and 3' end sequences. This virus shares nucleotide sequence identities of 69-74% with other bornaviruses in the genomic regions excluding the 5' and 3' terminal sequences. Phylogenetic analysis based on the genomic regions demonstrated that this new isolate is an isolated branch within the clade that includes the aquatic bird bornaviruses and the passerine bornaviruses. Based on phylogenetic analyses and its low nucleotide sequence identities with other bornaviruses, we support the proposal that PaBV-5 be assigned to a new bornavirus species, Psittaciform 2 bornavirus.

  1. Introduction: Paleozoic applications of sequence stratigraphy

    USGS Publications Warehouse

    Witzke, B.J.; Ludvigson, Greg A.; Day, J.

    1996-01-01

    Despite conceptual origins from studies of the Paleozoic strata of cratonic basins, sequence stratigraphy has largely been developed and applied to post-Paleozoic successions in extracratonic settings. The application of continental-margin sequence stratigraphic concepts to cratonic basinal successions is fraught with problems owing to slower rates of sediment accumulation, and consequently, a more coarsely defined temporal resolution. In addition, some important sequence stratigraphic components are rare or completely missing from cratonic areas. Common usage of genetic sequence stratigraphic terminology can coopt critical evaluation of depositional characters, and must be practiced with extreme caution in order to avoid 'model-driven' approaches to stratigraphic synthesis. The best available tests for evaluating current questions regarding the central role of eustasy in sequence stratigraphy may be through interregional and intercontinental comparisons of cratonic stratigraphic sequences.

  2. Coupled amplification and sequencing of genomic DNA.

    PubMed Central

    Ruano, G; Kidd, K K

    1991-01-01

    Addition of dideoxyribonucleotides during the exponential phase of the PCR should result in the synthesis of two complementary sequence ladders. We have explored this hypothesis to develop coupled amplification and sequencing of genomic DNA. Coupled amplification and sequencing is a biphasic method for sequencing both strands of template as they are amplified. Stage I selects and amplifies a single target from the genomic DNA sample. Stage II accomplishes the sequencing as well as additional amplification of the target using aliquots from the stage I reaction mixed with end-labeled primer and dideoxynucleotides. We have successfully applied coupled amplification and sequencing to a 300-base-pair fragment 4 kilobases upstream from HOX2B directly from human whole genomic DNA. PMID:1672768

  3. Nanopore DNA sequencing with MspA.

    PubMed

    Derrington, Ian M; Butler, Tom Z; Collins, Marcus D; Manrao, Elizabeth; Pavlenok, Mikhail; Niederweis, Michael; Gundlach, Jens H

    2010-09-14

    Nanopore sequencing has the potential to become a direct, fast, and inexpensive DNA sequencing technology. The simplest form of nanopore DNA sequencing utilizes the hypothesis that individual nucleotides of single-stranded DNA passing through a nanopore will uniquely modulate an ionic current flowing through the pore, allowing the record of the current to yield the DNA sequence. We demonstrate that the ionic current through the engineered Mycobacterium smegmatis porin A, MspA, has the ability to distinguish all four DNA nucleotides and resolve single-nucleotides in single-stranded DNA when double-stranded DNA temporarily holds the nucleotides in the pore constriction. Passing DNA with a series of double-stranded sections through MspA provides proof of principle of a simple DNA sequencing method using a nanopore. These findings highlight the importance of MspA in the future of nanopore sequencing.

  4. Nanopore DNA sequencing with MspA

    PubMed Central

    Derrington, Ian M.; Butler, Tom Z.; Collins, Marcus D.; Manrao, Elizabeth; Pavlenok, Mikhail; Niederweis, Michael; Gundlach, Jens H.

    2010-01-01

    Nanopore sequencing has the potential to become a direct, fast, and inexpensive DNA sequencing technology. The simplest form of nanopore DNA sequencing utilizes the hypothesis that individual nucleotides of single-stranded DNA passing through a nanopore will uniquely modulate an ionic current flowing through the pore, allowing the record of the current to yield the DNA sequence. We demonstrate that the ionic current through the engineered Mycobacterium smegmatis porin A, MspA, has the ability to distinguish all four DNA nucleotides and resolve single-nucleotides in single-stranded DNA when double-stranded DNA temporarily holds the nucleotides in the pore constriction. Passing DNA with a series of double-stranded sections through MspA provides proof of principle of a simple DNA sequencing method using a nanopore. These findings highlight the importance of MspA in the future of nanopore sequencing. PMID:20798343

  5. Offline consolidation in implicit sequence learning.

    PubMed

    Meier, Beat; Cock, Josephine

    2014-08-01

    The goal of this study was to investigate offline memory consolidation with regard to general motor skill learning and implicit sequence-specific learning. We trained young adults on a serial reaction time task with a retention interval of either 24 h (Experiment 1) or 1 week (Experiment 2) between two sessions. We manipulated sequence complexity (deterministic vs probabilistic) and motor responses (unimanual vs bimanual). We found no evidence of offline memory consolidation for sequence-specific learning with either interval (in the sense of no deterioration over the interval but no further improvement either). However, we did find evidence of offline enhancement of general motor skill learning with both intervals, independent of kind of sequence or kind of response. These results suggest that general motor skill learning, but not sequence-specific learning, appears to be enhanced during offline intervals in implicit sequence learning.

  6. Sequencing Intractable DNA to Close Microbial Genomes

    SciTech Connect

    Hurt, Jr., Richard Ashley; Brown, Steven D; Podar, Mircea; Palumbo, Anthony Vito; Elias, Dwayne A

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled intractable, resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  7. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  8. Sequence Compaction to Preserve Transition Frequencies

    SciTech Connect

    Pinar, Ali; Liu, C.L.

    2002-12-12

    Simulation-based power estimation is commonly used for its high accuracy despite excessive computation times. Techniques have been proposed to speed it up by compacting an input sequence while preserving its power-consumption characteristics. We propose a novel method to compact a sequence that preserves transition frequencies. We prove the problem is NP-Complete, and propose a graph model to reduce it to that of finding a heaviest weighted trail on a directed graph, along with a heuristic utilizing this model. We also propose using multiple sequences for better accuracy with even shorter sequences. Experiments showed that power dissipation can be estimated with an error of only 2.3 percent, while simulation times are reduced by 10. The proposed methods effectively preserve transition frequencies and generate solutions that are very close to optimal. Experiments also showed that multiple sequences gave more accurate results with even shorter sequences.

  9. Specific heat spectra for quasiperiodic ladder sequences

    NASA Astrophysics Data System (ADS)

    Moreira, D. A.; Albuquerque, E. L.; Bezerra, C. G.

    2006-12-01

    We performed a theoretical study of the specific heat C(T) as a function of the temperature for double-strand quasiperiodic sequences. To mimic DNA molecules, the sequences are made up from the nucleotides guanine G, adenine A, cytosine C and thymine T, arranged according to the Fibonacci and Rudin-Shapiro quasiperiodic sequences. The energy spectra are calculated using the two-dimensional Schrödinger equation, in a tight-binding approximation, with the on-site energy exhibiting long-range disorder and non-random hopping amplitudes. We compare the specific heat features of these quasiperiodic artificial sequences to the spectra considering a segment of the first sequenced human chromosome 22 (Ch22), a real genomic DNA sequence.

  10. Some properties of generalized Fibonacci sequence

    NASA Astrophysics Data System (ADS)

    Chong, Chin-Yoon; Ho, C. K.

    2015-12-01

    For all non-negative integers n and real constants a, b, p and q, the generalized Fibonacci sequence {U_n} is defined by U_{n+2} = pU_{n+1} + qU_n with the initial values U_0 = a and U_1 = b. Throughout the paper, we study some properties of the generalized Fibonacci sequence. Our results will motivate some new research problems concerning the contribution of the generalized sequence.
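
    The recurrence above can be evaluated directly. The following is a minimal sketch in Python; the function name and the `count` parameter are illustrative choices, not taken from the paper.

```python
from typing import List


def generalized_fibonacci(a: float, b: float, p: float, q: float, count: int) -> List[float]:
    """Return the first `count` terms U_0 .. U_{count-1} of the sequence
    U_{n+2} = p*U_{n+1} + q*U_n with U_0 = a and U_1 = b."""
    if count <= 0:
        return []
    terms = [a, b][:count]  # handles count == 1 as well
    while len(terms) < count:
        terms.append(p * terms[-1] + q * terms[-2])
    return terms


# With a = 0, b = 1, p = q = 1 the recurrence reduces to the classical Fibonacci numbers.
print(generalized_fibonacci(0, 1, 1, 1, 10))
```

    Other well-known sequences appear as special cases, for example the Lucas numbers (a = 2, b = 1, p = q = 1) and the Pell numbers (a = 0, b = 1, p = 2, q = 1).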

  11. FRESCO: Referential compression of highly similar sequences.

    PubMed

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios way beyond state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.
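
    The general idea of referential compression (storing only how an input differs from a known reference) can be illustrated with a toy sketch. This is not FRESCO's algorithm or file format; the greedy k-mer seeding, the entry layout and all names below are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple, Union

# An entry is either a (reference_start, length) match or a single literal character.
Entry = Union[Tuple[int, int], str]


def build_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index: Dict[str, List[int]] = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def compress(target: str, reference: str, k: int = 4) -> List[Entry]:
    """Greedily encode `target` as matches against `reference` plus literals."""
    index = build_index(reference, k)
    entries: List[Entry] = []
    i = 0
    while i < len(target):
        best_start, best_len = -1, 0
        # Try every reference position sharing the next k-mer and extend the match.
        for start in index.get(target[i:i + k], []):
            length = 0
            while (start + length < len(reference)
                   and i + length < len(target)
                   and reference[start + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_start, best_len = start, length
        if best_len >= k:
            entries.append((best_start, best_len))
            i += best_len
        else:
            entries.append(target[i])  # no usable match: emit a literal
            i += 1
    return entries


def decompress(entries: List[Entry], reference: str) -> str:
    """Rebuild the target from the entry list and the same reference."""
    parts = []
    for entry in entries:
        if isinstance(entry, str):
            parts.append(entry)
        else:
            start, length = entry
            parts.append(reference[start:start + length])
    return "".join(parts)


reference = "ACGTACGTTTGACCA"
target = "ACGTACGATTGACCA"  # differs from the reference at one position
encoded = compress(target, reference)
print(encoded)
print(decompress(encoded, reference) == target)
```

    For highly similar sequences the entry list is far shorter than the input, which is the property the abstract relies on; a practical implementation would additionally need a compact binary encoding of the entries.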

  12. Discrete sequence prediction and its applications

    NASA Technical Reports Server (NTRS)

    Laird, Philip

    1992-01-01

    Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We apply sequence prediction using a simple and practical sequence-prediction algorithm, called TDAG. The TDAG algorithm is first tested by comparing its performance with some common data compression algorithms. Then it is adapted to the detailed requirements of dynamic program optimization, with excellent results.

  13. Multiple tag labeling method for DNA sequencing

    DOEpatents

    Mathies, Richard A.; Huang, Xiaohua C.; Quesada, Mark A.

    1995-01-01

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in said lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels.

  14. Genome Sequencing and Analysis Conference IV

    SciTech Connect

    Not Available

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV held at Hilton Head, South Carolina from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  15. Nucleotide sequences encoding a thermostable alkaline protease

    DOEpatents

    Wilson, D.B.; Lao, G.

    1998-01-06

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium. 3 figs.

  16. Nucleotide sequences encoding a thermostable alkaline protease

    DOEpatents

    Wilson, David B.; Lao, Guifang

    1998-01-01

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium.

  17. Unlocking Short Read Sequencing for Metagenomics

    SciTech Connect

    Rodrigue, Sébastien; Materna, Arne C.; Timberlake, Sonia C.; Blackburn, Matthew C.; Malmstrom, Rex R.; Alm, Eric J.; Chisholm, Sallie W.; Gilbert, Jack Anthony

    2010-07-28

    We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.
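
    Joining two overlapping reads of a pair into one composite read can be sketched as follows. This is a simplified illustration that requires an exact overlap; it is not the SHERA algorithm, and the function names and the `min_overlap` parameter are assumptions.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Join a forward read and its reverse-strand mate if they overlap exactly.

    read2 is reverse-complemented so both reads are on the same strand, then the
    longest suffix of read1 that equals a prefix of the mate is used as the overlap.
    Returns None when no exact overlap of at least `min_overlap` bases exists.
    """
    mate = reverse_complement(read2)
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


# Hypothetical 30-bp insert sequenced from both ends with 20-bp reads.
fragment = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
read1 = fragment[:20]
read2 = reverse_complement(fragment[10:])
print(merge_pair(read1, read2) == fragment)
```

    Real reads contain sequencing errors, so a practical tool would score candidate overlaps (for example using base quality values) rather than demand exact identity.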

  18. EGNAS: an exhaustive DNA sequence design algorithm

    PubMed Central

    2012-01-01

    Background: The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that generates sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences) offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of “fraying” of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. Results: The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions: We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm generates larger sets of sequences than previous software under equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS. PMID:22716030
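
    Two of the constraints named above, an adjustable guanine-cytosine content and a forced G or C at both ends, are simple to check for a single candidate sequence. The sketch below is written in Python for illustration only; it is not part of the EGNAS C++ program, and the function names and default thresholds are assumptions.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in a non-empty sequence that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def passes_basic_constraints(seq: str, gc_min: float = 0.4, gc_max: float = 0.6) -> bool:
    """Check the GC-content window and the G/C requirement at both ends."""
    seq = seq.upper()
    if not seq:
        return False
    clamped = seq[0] in "GC" and seq[-1] in "GC"
    return clamped and gc_min <= gc_content(seq) <= gc_max


for candidate in ["GATTACAGGC", "AATTACAGGC", "GGGGCCCCGC"]:
    print(candidate, passes_basic_constraints(candidate))
```

    An exhaustive design tool would apply such per-sequence filters together with pairwise checks (cross hybridization, uniqueness) while enumerating candidates.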

  19. Multiple tag labeling method for DNA sequencing

    DOEpatents

    Mathies, R.A.; Huang, X.C.; Quesada, M.A.

    1995-07-25

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.

  20. Completely phased genome sequencing through chromosome sorting

    PubMed Central

    Yang, Hong; Chen, Xi; Wong, Wing Hung

    2011-01-01

    The two haploid genome sequences that a person inherits from the two parents represent the most fundamentally useful type of genetic information for the study of heritable diseases and the development of personalized medicine. Because of the difficulty in obtaining long-range phase information, current sequencing methods are unable to provide this information. Here, we introduce and show feasibility of a scalable approach capable of generating genomic sequences completely phased across the entire chromosome. PMID:21169219

  1. Searching gene and protein sequence databases.

    PubMed

    Barsalou, T; Brutlag, D L

    1991-01-01

    A large-scale effort to map and sequence the human genome is now under way. Crucial to the success of this research is a group of computer programs that analyze and compare data on molecular sequences. This article describes the classic algorithms for similarity searching and sequence alignment. Because good performance of these algorithms is critical to searching very large and growing databases, we analyze the running times of the algorithms and discuss recent improvements in this area.

  2. Fractal analysis of DNA sequence data

    SciTech Connect

    Berthelsen, C.L.

    1993-01-01

    DNA sequence databases are growing at an almost exponential rate. New analysis methods are needed to extract knowledge about the organization of nucleotides from this vast amount of data. Fractal analysis is a new scientific paradigm that has been used successfully in many domains including the biological and physical sciences. Biological growth is a nonlinear dynamic process and some have suggested that to consider fractal geometry as a biological design principle may be most productive. This research is an exploratory study of the application of fractal analysis to DNA sequence data. A simple random fractal, the random walk, is used to represent DNA sequences. The fractal dimension of these walks is then estimated using the "sandbox method". Analysis of 164 human DNA sequences compared to three types of control sequences (random, base-content matched, and dimer-content matched) reveals that long-range correlations are present in DNA that are not explained by base or dimer frequencies. The study also revealed that the fractal dimension of coding sequences was significantly lower than that of sequences that were primarily noncoding, indicating the presence of longer-range correlations in functional sequences. The multifractal spectrum is used to analyze fractals that are heterogeneous and have a different fractal dimension for subsets with different scalings. The multifractal spectrum of the random walks of twelve mitochondrial genome sequences was estimated. Eight vertebrate mtDNA sequences had uniformly lower spectra values than did four invertebrate mtDNA sequences. Thus, vertebrate mitochondria show significantly longer-range correlations than do invertebrate mitochondria. The higher multifractal spectra values for invertebrate mitochondria suggest a more random organization of the sequences. This research also includes considerable theoretical work on the effects of finite size, embedding dimension, and scaling ranges.
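
    Representing a DNA sequence as a walk can be sketched as below. The abstract does not state which mapping from bases to steps was used, so the one-dimensional purine/pyrimidine rule here (A, G step up; C, T step down) is an assumption, as are the names; the sandbox estimate of the fractal dimension is not shown.

```python
from typing import List

# Assumed step rule: purines (A, G) move +1, pyrimidines (C, T) move -1.
_STEPS = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(sequence: str) -> List[int]:
    """Return the cumulative position of the walk after each unambiguous base."""
    position = 0
    walk = []
    for base in sequence.upper():
        if base not in _STEPS:
            continue  # skip ambiguous symbols such as N
        position += _STEPS[base]
        walk.append(position)
    return walk


print(dna_walk("AGCTTGAC"))
```

    Long-range correlations of the kind reported above would show up as excursions of the walk that grow faster with sequence length than those of a shuffled control sequence.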

  3. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    PubMed Central

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by genscan (http://genes.mit.edu/GENSCAN.html). PMID:11070084

  4. The Hippocampus and Disambiguation of Overlapping Sequences

    PubMed Central

    Agster, Kara L.; Fortin, Norbert J.; Eichenbaum, Howard

    2010-01-01

    Recent models of hippocampal function emphasize its potential role in disambiguating sequences of events that compose distinct episodic memories. In this study, rats were trained to distinguish two overlapping sequences of odor choices. The capacity to disambiguate the sequences was measured by the critical odor choice after the overlapping elements of the sequences. When the sequences were presented in rapid alternation, damage to the hippocampus, produced either by infusions of the neurotoxin ibotenic acid or by radiofrequency current, produced a severe deficit, although animals with radiofrequency lesions relearned the task. When the sequences were presented spaced apart and in random order, animals with radiofrequency hippocampal lesions could perform the task. However, they failed when a memory delay was imposed before the critical choice. These findings support the hypothesis that the hippocampus is involved in representing sequences of nonspatial events, particularly when interference between the sequences is high or when animals must remember across a substantial delay preceding items in a current sequence. PMID:12097529

  5. A measurement of disorder in binary sequences

    NASA Astrophysics Data System (ADS)

    Gong, Longyan; Wang, Haihong; Cheng, Weiwen; Zhao, Shengmei

    2015-03-01

    We propose a complex quantity, A_L, to characterize the degree of disorder of L-length binary symbolic sequences. As examples, we apply it to typical random and deterministic sequences. One kind of random sequence is generated from a periodic binary sequence and the other is generated from the logistic map. The deterministic sequences are the Fibonacci and Thue-Morse sequences. In these analyzed sequences, we find that the modulus of A_L, denoted by |A_L|, is a (statistically) equivalent quantity to the Boltzmann entropy, the metric entropy, the conditional block entropy and/or other quantities, so it is a useful quantitative measure of disorder. It can serve as an index to discern which sequence is more disordered. Moreover, there is one and only one value of |A_L| for the overall disorder characteristics. It has extremely low computational cost and can easily be realized experimentally. Given all of the above, we believe that the proposed measure of disorder is a valuable complement to existing ones for symbolic sequences.

  6. Recursive sequences in first-year calculus

    NASA Astrophysics Data System (ADS)

    Krainer, Thomas

    2016-02-01

    This article provides ready-to-use supplementary material on recursive sequences for a second-semester calculus class. It equips first-year calculus students with a basic methodical procedure based on which they can conduct a rigorous convergence or divergence analysis of many simple recursive sequences on their own without the need to invoke inductive arguments as is typically required in calculus textbooks. The sequences that are accessible to this kind of analysis are predominantly (eventually) monotonic, but also certain recursive sequences that alternate around their limit point as they converge can be considered.

  7. DNA sequence from Cretaceous period bone fragments.

    PubMed

    Woodward, S R; Weyand, N J; Bunnell, M

    1994-11-18

    DNA was extracted from 80-million-year-old bone fragments found in strata of the Upper Cretaceous Blackhawk Formation in the roof of an underground coal mine in eastern Utah. This DNA was used as the template in a polymerase chain reaction that amplified and sequenced a portion of the gene encoding mitochondrial cytochrome b. These sequences differ from all other cytochrome b sequences investigated, including those in the GenBank and European Molecular Biology Laboratory databases. DNA isolated from these bone fragments and the resulting gene sequences demonstrate that small fragments of DNA may survive in bone for millions of years.

  8. Multiplexed microsatellite recovery using massively parallel sequencing

    USGS Publications Warehouse

    Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.

    2011-01-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).

  9. Choice of next-generation sequencing pipelines.

    PubMed

    Del Chierico, F; Ancora, M; Marcacci, M; Cammà, C; Putignani, L; Conti, Salvatore

    2015-01-01

    Next-generation sequencing (NGS) technologies are revolutionary tools that have enabled remarkable advances in genetics since the beginning of the twenty-first century. Because they can produce large amounts of sequence data, these tools are going to completely replace other high-throughput technologies. Moreover, the broad application of NGS protocols is advancing the genetic decoding of biological systems through studies of genome anatomy and gene mapping, coupled to transcriptome pictures. NGS pipelines such as (1) de novo genomic sequencing by mate-paired and whole-genome shotgun strategies; (2) specific gene sequencing on large bacterial communities; and (3) RNA-seq methods including whole transcriptome sequencing and Serial Analysis of Gene Expression (Sage-analysis) are fundamental in genome-wide fields like metagenomics. Recently, the availability of these advanced protocols has made it possible to overcome the usual technical issues of standard shotgun library sequencing related to mapping specificity, the detection of large structural genome variations and the bridging of sequencing gaps, and has allowed more precise gene annotation. In this chapter we discuss how to manage a successful NGS pipeline, from the planning of sequencing projects through the choice of platforms to data analysis management.

  10. Visible periodicity of strong nucleosome DNA sequences.

    PubMed

    Salih, Bilal; Tripathi, Vijay; Trifonov, Edward N

    2015-01-01

    Fifteen years ago, Lowary and Widom assembled nucleosomes on synthetic random sequence DNA molecules, selected the strongest nucleosomes and discovered that the TA dinucleotides in these strong nucleosome sequences often appear at 10-11 bases from one another or at distances which are multiples of this period. We repeated this experiment computationally, on large ensembles of natural genomic sequences, by selecting the strongest nucleosomes--i.e. those with such distances between like-named dinucleotides, multiples of 10.4 bases, the structural and sequence period of nucleosome DNA. The analysis confirmed the periodicity of TA dinucleotides in the strong nucleosomes, and revealed as well other periodic sequence elements, notably classical AA and TT dinucleotides. The matrices of DNA bendability and their simple linear forms--nucleosome positioning motifs--are calculated from the strong nucleosome DNA sequences. The motifs are in full accord with nucleosome positioning sequences derived earlier, thus confirming that the new technique, indeed, detects strong nucleosomes. Species- and isochore-specific variations of the matrices and of the positioning motifs are demonstrated. The strong nucleosome DNA sequences manifest the highest hitherto nucleosome positioning sequence signals, showing the dinucleotide periodicities in directly observable rather than in hidden form.
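
    The kind of distance analysis described above (how far apart like-named dinucleotides occur) can be sketched as a histogram of pairwise distances. This is an illustrative sketch, not the authors' procedure; the function name and the `max_distance` cutoff are assumptions.

```python
from collections import Counter


def dinucleotide_distances(seq: str, dinuc: str = "TA", max_distance: int = 40) -> Counter:
    """Count distances between all pairs of occurrences of `dinuc` in `seq`,
    keeping only distances up to `max_distance` bases."""
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]
    histogram: Counter = Counter()
    for idx, first in enumerate(positions):
        for second in positions[idx + 1:]:
            distance = second - first
            if distance > max_distance:
                break  # positions are sorted, so later ones are even farther
            histogram[distance] += 1
    return histogram


# Toy sequence with TA placed every 10 bases.
toy = "TA" + "G" * 8 + "TA" + "G" * 8 + "TA"
print(sorted(dinucleotide_distances(toy).items()))
```

    In a sequence with the periodicity reported in the abstract, such a histogram would be expected to peak near multiples of roughly 10 to 11 bases.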

  11. Genomic sequencing of Pleistocene cave bears

    SciTech Connect

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  12. Small scale sequence automation pays big dividends

    NASA Technical Reports Server (NTRS)

    Nelson, Bill

    1994-01-01

    Galileo sequence design and integration are supported by a suite of formal software tools. Sequence review, however, is largely a manual process with reviewers scanning hundreds of pages of cryptic computer printouts to verify sequence correctness. Beginning in 1990, a series of small, PC based sequence review tools evolved. Each tool performs a specific task but all have a common 'look and feel'. The narrow focus of each tool means simpler operation, and easier creation, testing, and maintenance. Benefits from these tools are (1) decreased review time by factors of 5 to 20 or more with a concomitant reduction in staffing, (2) increased review accuracy, and (3) excellent returns on time invested.

  13. Finding Sequences for over 270 Orphan Enzymes

    PubMed Central

    Shearer, Alexander G.; Altman, Tomer; Rhee, Christine D.

    2014-01-01

    Despite advances in sequencing technology, there are still significant numbers of well-characterized enzymatic activities for which there are no known associated sequences. These ‘orphan enzymes’ represent glaring holes in our biological understanding, and it is a top priority to reunite them with their coding sequences. Here we report a methodology for resolving orphan enzymes through a combination of database search and literature review. Using this method we were able to reconnect over 270 orphan enzymes with their corresponding sequence. This success points toward how we can systematically eliminate the remaining orphan enzymes and prevent the introduction of future orphan enzymes. PMID:24826896

  14. Next generation sequencing based approaches to epigenomics

    PubMed Central

    Marra, Marco A.

    2010-01-01

    Next generation sequencing has brought epigenomic studies to the forefront of current research. The power of massively parallel sequencing coupled to innovative molecular and computational techniques has allowed researchers to profile the epigenome at resolutions that were unimaginable only a few years ago. With early proof of concept studies published, the field is now moving into the next phase where the importance of method standardization and rigorous quality control are becoming paramount. In this review we will describe methodologies that have been developed to profile the epigenome using next generation sequencing platforms. We will discuss these in terms of library preparation, sequence platforms and analysis techniques. PMID:21266347

  15. Maize genome sequencing by methylation filtration.

    PubMed

    Palmer, Lance E; Rabinowicz, Pablo D; O'Shaughnessy, Andrew L; Balija, Vivekanand S; Nascimento, Lidia U; Dike, Sujit; de la Bastide, Melissa; Martienssen, Robert A; McCombie, W Richard

    2003-12-19

    Gene enrichment strategies offer an alternative to sequencing large and repetitive genomes such as that of maize. We report the generation and analysis of nearly 100,000 undermethylated (or methylation filtration) maize sequences. Comparison with the rice genome reveals that methylation filtration results in a more comprehensive representation of maize genes than those that result from expressed sequence tags or transposon insertion sites sequences. About 7% of the repetitive DNA is unmethylated and thus selected in our libraries, but potentially active transposons and unmethylated organelle genomes can be identified. Reverse transcription polymerase chain reaction can be used to finish the maize transcriptome.

  16. The Shannon information entropy of protein sequences.

    PubMed Central

    Strait, B J; Dewey, T G

    1996-01-01

    A comprehensive data base is analyzed to determine the Shannon information content of a protein sequence. This information entropy is estimated by three methods: a k-tuplet analysis, a generalized Zipf analysis, and a "Chou-Fasman gambler." The k-tuplet analysis is a "letter" analysis, based on conditional sequence probabilities. The generalized Zipf analysis demonstrates the statistical linguistic qualities of protein sequences and uses the "word" frequency to determine the Shannon entropy. The Zipf analysis and k-tuplet analysis give Shannon entropies of approximately 2.5 bits/amino acid. This entropy is much smaller than the value of 4.18 bits/amino acid obtained from the nonuniform composition of amino acids in proteins. The "Chou-Fasman" gambler is an algorithm based on the Chou-Fasman rules for protein structure. It uses both sequence and secondary structure information to guess at the number of possible amino acids that could appropriately substitute into a sequence. As in the case for the English language, the gambler algorithm gives significantly lower entropies than the k-tuplet analysis. Using these entropies, the number of most probable protein sequences can be calculated. The number of most probable protein sequences is much less than the number of possible sequences but is still much larger than the number of sequences thought to have existed throughout evolution. Implications of these results for mutagenesis experiments are discussed. PMID:8804598
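
    The composition-based figure mentioned above is the zeroth-order Shannon entropy, H = -sum(p_i * log2(p_i)), computed from amino acid frequencies alone. The sketch below shows that calculation for a single sequence; the function name is illustrative, and it does not reproduce the k-tuplet, Zipf or gambler estimates.

```python
import math
from collections import Counter


def composition_entropy(sequence: str) -> float:
    """Shannon entropy in bits per symbol from single-symbol frequencies."""
    total = len(sequence)
    if total == 0:
        return 0.0
    counts = Counter(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Twenty equally frequent amino acids give the maximum, log2(20) bits.
print(composition_entropy("ACDEFGHIKLMNPQRSTVWY"))
print(math.log2(20))
```

    A nonuniform composition lowers the value below log2(20), about 4.32 bits, which is consistent with the 4.18 bits/amino acid quoted in the abstract; the lower estimates near 2.5 bits come from methods that also account for correlations between positions.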

  17. Multiplexed microsatellite recovery using massively parallel sequencing.

    PubMed

    Jennings, T N; Knaus, B J; Mullins, T D; Haig, S M; Cronn, R C

    2011-11-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356,958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5 M (USD).
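
    Screening reads for di- or trinucleotide microsatellites can be sketched with a regular expression and a backreference. This is only an illustration: the minimum repeat count, the function name, and the exclusion of homopolymer runs are assumptions, not parameters reported in the study.

```python
import re


def find_microsatellites(read, min_repeats=5):
    """Return (start, motif, repeat_count) for di-/trinucleotide tandem repeats."""
    # A 2-3 base motif followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(read):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


read = "TTG" + "AC" * 6 + "GGT" + "CAG" * 5 + "A"
print(find_microsatellites(read))
```

    A production screen would also need to handle barcodes, low-quality bases, and repeats truncated at read ends, none of which are modelled here.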

  18. Sequence comparisons via algorithmic mutual information

    SciTech Connect

    Milosavijevic, A.

    1994-12-31

    One of the main problems in DNA and protein sequence comparisons is to decide whether observed similarity of two sequences should be explained by their relatedness or by mere presence of some shared internal structure, e.g., shared internal tandem repeats. The standard methods that are based on statistics or classical information theory can be used to discover either internal structure or mutual sequence similarity, but cannot take into account both. Consequently, currently used methods for sequence comparison employ "masking" techniques that simply eliminate sequences that exhibit internal repetitive structure prior to sequence comparisons. The "masking" approach precludes discovery of homologous sequences of moderate or low complexity, which abound at both DNA and protein levels. As a solution to this problem, we propose a general method that is based on algorithmic information theory and minimal length encoding. We show that algorithmic mutual information factors out the sequence similarity that is due to shared internal structure and thus enables discovery of truly related sequences. We extend the recently developed algorithmic significance method to show that significance depends exponentially on algorithmic mutual information.
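
    Algorithmic mutual information is not computable exactly, but its flavour can be conveyed with a general-purpose compressor. The sketch below substitutes zlib for the minimal length encoding described in the abstract, which is an assumption made purely for illustration, and estimates the shared information as C(x) + C(y) - C(xy), where C is the compressed size in bytes.

```python
import random
import zlib


def compressed_size(seq):
    """Compressed length in bytes, used as a crude proxy for description length."""
    return len(zlib.compress(seq.encode("ascii"), 9))


def approx_mutual_information(x, y):
    """Bytes saved by encoding y together with x rather than separately."""
    return compressed_size(x) + compressed_size(y) - compressed_size(x + y)


def random_dna(length, rng):
    return "".join(rng.choice("ACGT") for _ in range(length))


def mutate(seq, rate, rng):
    """Substitute a random base at roughly `rate` of the positions."""
    return "".join(rng.choice("ACGT") if rng.random() < rate else base
                   for base in seq)


rng = random.Random(0)
x = random_dna(2000, rng)
related = mutate(x, 0.02, rng)      # diverged copy of x
unrelated = random_dna(2000, rng)   # independent sequence
print(approx_mutual_information(x, related),
      approx_mutual_information(x, unrelated))
```

    A real application would use an encoding designed for DNA or protein sequences, because a byte-oriented compressor captures only part of the shared structure.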

  19. Locomotor sequence learning in visually guided walking.

    PubMed

    Choi, Julia T; Jensen, Peter; Nielsen, Jens Bo

    2016-04-01

    Voluntary limb modifications must be integrated with basic walking patterns during visually guided walking. In this study we tested whether voluntary gait modifications can become more automatic with practice. We challenged walking control by presenting visual stepping targets that instructed subjects to modify step length from one trial to the next. Our sequence learning paradigm is derived from the serial reaction-time (SRT) task that has been used in upper limb studies. Both random and ordered sequences of step lengths were used to measure sequence-specific and sequence-nonspecific learning during walking. In addition, we determined how age (i.e., healthy young adults vs. children) and biomechanical factors (i.e., walking speed) affected the rate and magnitude of locomotor sequence learning. The results showed that healthy young adults (age 24 ± 5 yr, n = 20) could learn a specific sequence of step lengths over 300 training steps. Younger children (age 6-10 yr, n = 8) had lower baseline performance, but their magnitude and rate of sequence learning were the same compared with those of older children (11-16 yr, n = 10) and healthy adults. In addition, learning capacity may be more limited at faster walking speeds. To our knowledge, this is the first study to demonstrate that spatial sequence learning can be integrated with a highly automatic task such as walking. These findings suggest that adults and children use implicit knowledge about the sequence to plan and execute leg movement during visually guided walking.

  20. Repetitive sequence environment distinguishes housekeeping genes

    PubMed Central

    Eller, C. Daniel; Regelson, Moira; Merriman, Barry; Nelson, Stan; Horvath, Steve; Marahrens, York

    2007-01-01

    Housekeeping genes are expressed across a wide variety of tissues. Since repetitive sequences have been reported to influence the expression of individual genes, we employed a novel approach to determine whether housekeeping genes can be distinguished from tissue-specific genes by their repetitive sequence context. We show that Alu elements are more highly concentrated around housekeeping genes while various longer (>400-bp) repetitive sequences ("repeats"), including Long Interspersed Nuclear Element 1 (LINE-1) elements, are excluded from these regions. We further show that isochore membership does not distinguish housekeeping genes from tissue-specific genes and that repetitive sequence environment distinguishes housekeeping genes from tissue-specific genes in every isochore. The distinct repetitive sequence environment, in combination with other previously published sequence properties of housekeeping genes, was used to develop a method of predicting housekeeping genes on the basis of DNA sequence alone. Using expression across tissue types as a measure of success, we demonstrate that repetitive sequence environment is by far the most important sequence feature identified to date for distinguishing housekeeping genes. PMID:17141428

  1. Using SEQUEST with Theoretically Complete Sequence Databases

    NASA Astrophysics Data System (ADS)

    Sadygov, Rovshan G.

    2015-11-01

    SEQUEST has long been used to identify peptides/proteins from their tandem mass spectra and protein sequence databases. The algorithm has proven hugely successful for its sensitivity and specificity in identifying peptides/proteins whose sequences are present in the protein sequence databases. Here we report a new use for the algorithm: searching a complete list of theoretically possible peptides, a form of de novo-like sequencing. We used freely available mass spectral data and determined a number of unique peptides as identified by SEQUEST. Using the masses of these peptides and a mass accuracy of 0.001 Da, we created a database of all theoretically possible peptide sequences corresponding to the precursor masses. We used our recently developed algorithm for determining all amino acid compositions corresponding to a mass interval, and used a lexicographic ordering to generate theoretical sequences from the compositions. The newly generated theoretical database was many-fold more complex than the original protein sequence database. We used SEQUEST to search and identify the best matches to the spectra from all theoretically possible peptide sequences. We found that the SEQUEST cross-correlation score ranked the correct peptide match among the top sequence matches. The results testify to the high specificity of SEQUEST when combined with high mass accuracy for intact peptides.
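
    The two steps described above, finding every amino acid composition inside a mass window and then listing the sequences for each composition in lexicographic order, can be sketched as follows. This is a brute-force illustration rather than the authors' algorithm: the function names are hypothetical, the residue masses are approximate monoisotopic values, and the alphabet is deliberately restricted to five residues to keep the search small.

```python
from itertools import permutations

# Approximate monoisotopic residue masses in Da for a small illustrative
# alphabet (a real search would use all 20 residues).
RESIDUE_MASS = {"A": 71.03711, "G": 57.02146, "P": 97.05276,
                "S": 87.03203, "V": 99.06841}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def compositions_in_window(target_mass, tol=0.001, masses=RESIDUE_MASS):
    """Enumerate residue compositions whose peptide mass is within tol of target."""
    residues = sorted(masses)
    found = []

    def extend(index, remaining, counts):
        if index == len(residues):
            if counts and abs(remaining) <= tol:
                found.append(tuple(counts))
            return
        residue = residues[index]
        n = 0
        # Try 0, 1, 2, ... copies of this residue while the mass still fits.
        while n * masses[residue] <= remaining + tol:
            chosen = counts + [(residue, n)] if n else counts
            extend(index + 1, remaining - n * masses[residue], chosen)
            n += 1

    extend(0, target_mass - WATER, [])
    return found


def sequences_from_composition(composition):
    """All distinct orderings of a composition, in lexicographic order."""
    letters = "".join(residue * n for residue, n in composition)
    return sorted({"".join(p) for p in permutations(letters)})


target = RESIDUE_MASS["G"] + RESIDUE_MASS["A"] + RESIDUE_MASS["S"] + WATER
for comp in compositions_in_window(target):
    print(comp, sequences_from_composition(comp))
```

    The number of sequences grows factorially with peptide length, which is consistent with the abstract's remark that the theoretical database was many-fold more complex than the original protein database.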

  2. Quantitative phenotyping via deep barcode sequencing.

    PubMed

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
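
    The core of a barcode-counting assay of this kind can be sketched in a few lines: tally the reads per strain barcode in a control and a treated pool, then compare normalised frequencies. The sketch assumes reads already trimmed to the barcode, exact matching, and a pseudocount of 1; the barcode table and function names are hypothetical and are not taken from the Bar-seq analysis routine.

```python
import math
from collections import Counter


def count_barcodes(reads, barcode_to_strain):
    """Tally reads per strain, ignoring reads that match no known barcode."""
    counts = Counter()
    for read in reads:
        strain = barcode_to_strain.get(read)
        if strain is not None:
            counts[strain] += 1
    return counts


def log2_depletion(control, treatment, pseudocount=1):
    """log2(control frequency / treatment frequency); positive means depleted."""
    strains = sorted(set(control) | set(treatment))
    c_total = sum(control[s] + pseudocount for s in strains)
    t_total = sum(treatment[s] + pseudocount for s in strains)
    return {
        s: math.log2(((control[s] + pseudocount) / c_total)
                     / ((treatment[s] + pseudocount) / t_total))
        for s in strains
    }


barcodes = {"ACGT": "strain1", "TTGA": "strain2"}  # hypothetical barcode table
control = count_barcodes(["ACGT"] * 4 + ["TTGA"] * 4, barcodes)
treated = count_barcodes(["ACGT"] * 1 + ["TTGA"] * 7 + ["NNNN"], barcodes)
print(log2_depletion(control, treated))
```

    Because the abstract reports that roughly 20% of barcodes and priming sequences differed from expectation, exact matching against an unrevised barcode list would silently discard reads, which is why the revised list matters.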

  3. DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment.

    PubMed

    Wright, Erik S

    2015-10-06

    Alignment of large and diverse sequence sets is a common task in biological investigations, yet there remains considerable room for improvement in alignment quality. Multiple sequence alignment programs tend to reach maximal accuracy when aligning only a few sequences, and then diminish steadily as more sequences are added. This drop in accuracy can be partly attributed to a build-up of error and ambiguity as more sequences are aligned. Most high-throughput sequence alignment algorithms do not use contextual information under the assumption that sites are independent. This study examines the extent to which local sequence context can be exploited to improve the quality of large multiple sequence alignments. Two predictors based on local sequence context were assessed: (i) single sequence secondary structure predictions, and (ii) modulation of gap costs according to the surrounding residues. The results indicate that context-based predictors have appreciable information content that can be utilized to create more accurate alignments. Furthermore, local context becomes more informative as the number of sequences increases, enabling more accurate protein alignments of large empirical benchmarks. These discoveries became the basis for DECIPHER, a new context-aware program for sequence alignment, which outperformed other programs on large sequence sets. Predicting secondary structure based on local sequence context is an efficient means of breaking the independence assumption in alignment. Since secondary structure is more conserved than primary sequence, it can be leveraged to improve the alignment of distantly related proteins. Moreover, secondary structure predictions increase in accuracy as more sequences are used in the prediction. This enables the scalable generation of large sequence alignments that maintain high accuracy even on diverse sequence sets. 
The DECIPHER R package and source code are freely available for download at DECIPHER.cee.wisc.edu and from the

  4. Compressing DNA sequence databases with coil

    PubMed Central

    White, W Timothy J; Hendy, Michael D

    2008-01-01

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work. PMID:18489794

  5. Evaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags

    PubMed Central

    Shangguan, Lingfei; Han, Jian; Kayesh, Emrul; Sun, Xin; Zhang, Changqing; Pervaiz, Tariq; Wen, Xicheng; Fang, Jinggui

    2013-01-01

    Background: With the completion of genome sequencing projects for more than 30 plant species, large volumes of genome sequences have been produced and stored in online databases. Advancements in sequencing technologies have reduced the cost and time of whole genome sequencing, enabling more and more plants to be subjected to genome sequencing. Despite this, genome sequence qualities of multiple plants have not been evaluated. Methodology/Principal Finding: Integrity and accuracy were calculated to evaluate the genome sequence quality of 32 plants. The integrity of a genome sequence is expressed as the ratio of chromosome size to genome size (or of scaffold size to genome size), which ranged from 55.31% to nearly 100%. The accuracy of a genome sequence was expressed as the ratio of matched ESTs to selected ESTs; 52.93% to 98.28% and 89.02% to 98.85% of the randomly selected clean ESTs could be mapped to chromosome and scaffold sequences, respectively. According to the integrity, accuracy and other analysis of each plant species, thirteen plant species were divided into four levels. Arabidopsis thaliana, Oryza sativa and Zea mays had the highest quality, followed by Brachypodium distachyon, Populus trichocarpa, Vitis vinifera and Glycine max, Sorghum bicolor, Solanum lycopersicum and Fragaria vesca, and Lotus japonicus, Medicago truncatula and Malus × domestica in that order. Assembling the scaffold sequences into chromosome sequences should be the primary task for the remaining nineteen species. Low GC content and repeat DNA influence genome sequence assembly. Conclusion: The quality of plant genome sequences was found to be lower than envisaged and thus the rapid development of genome sequencing projects as well as research on bioinformatics tools and the algorithms of genome sequence assembly should provide increased processing and correction of genome sequences that have already been published. PMID:23922843

  6. Transfer in Motor Sequence Learning: Effects of Practice Schedule and Sequence Context.

    PubMed

    Müssgens, Diana M; Ullén, Fredrik

    2015-01-01

    Transfer (i.e., the application of a learned skill in a novel context) is an important and desirable outcome of motor skill learning. While much research has been devoted to understanding transfer of explicit skills, the mechanisms of skill transfer after incidental learning remain poorly understood. The aim of this study was to (1) examine the effect of practice schedule on transfer and (2) investigate whether sequence-specific knowledge can transfer to an unfamiliar sequence context. We trained two groups of participants on an implicit serial response time task under a Constant (one sequence for 10 blocks) or Variable (alternating between two sequences for a total of 10 blocks) practice schedule. We evaluated response times for three types of transfer: task-general transfer to a structurally non-overlapping sequence, inter-manual transfer to a perceptually identical sequence, and sequence-specific transfer to a partially overlapping (three shared triplets) sequence. Results showed partial skill transfer to all three sequences and an advantage of Variable practice only for task-general transfer. Further, we found expression of sequence-specific knowledge for familiar sub-sequences in the overlapping sequence. These findings suggest that (1) constant practice may create interference for task-general transfer and (2) sequence-specific knowledge can transfer to a new sequential context.

  7. Transfer in Motor Sequence Learning: Effects of Practice Schedule and Sequence Context

    PubMed Central

    Müssgens, Diana M.; Ullén, Fredrik

    2015-01-01

    Transfer (i.e., the application of a learned skill in a novel context) is an important and desirable outcome of motor skill learning. While much research has been devoted to understanding transfer of explicit skills, the mechanisms of skill transfer after incidental learning remain poorly understood. The aim of this study was to (1) examine the effect of practice schedule on transfer and (2) investigate whether sequence-specific knowledge can transfer to an unfamiliar sequence context. We trained two groups of participants on an implicit serial response time task under a Constant (one sequence for 10 blocks) or Variable (alternating between two sequences for a total of 10 blocks) practice schedule. We evaluated response times for three types of transfer: task-general transfer to a structurally non-overlapping sequence, inter-manual transfer to a perceptually identical sequence, and sequence-specific transfer to a partially overlapping (three shared triplets) sequence. Results showed partial skill transfer to all three sequences and an advantage of Variable practice only for task-general transfer. Further, we found expression of sequence-specific knowledge for familiar sub-sequences in the overlapping sequence. These findings suggest that (1) constant practice may create interference for task-general transfer and (2) sequence-specific knowledge can transfer to a new sequential context. PMID:26635591

  8. Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding

    PubMed Central

    Brozynska, Marta; Furtado, Agnelo; Henry, Robert James

    2014-01-01

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis. PMID:25329378

  9. Lygus hesperus polygalacturonase Characterization and Role in Plant Damage

    USDA-ARS's Scientific Manuscript database

    The amino terminus of a Lygus hesperus salivary gland protein showing polygalacturonase (PG) activity in an SDS-PAGE activity gel assay has been sequenced via Edman degradation. The N-terminal amino acid sequence shares homology with the predicted amino acid sequence for putative L. lineolaris P...

  10. Multiplex De Novo Sequencing of Peptide Antibiotics

    NASA Astrophysics Data System (ADS)

    Mohimani, Hosein; Liu, Wei-Ting; Yang, Yu-Liang; Gaudêncio, Susana P.; Fenical, William; Dorrestein, Pieter C.; Pevzner, Pavel A.

    Proliferation of drug-resistant diseases raises the challenge of searching for new, more efficient antibiotics. Currently, some of the most effective antibiotics (e.g., Vancomycin and Daptomycin) are cyclic peptides produced by non-ribosomal biosynthetic pathways. The isolation and sequencing of cyclic peptide antibiotics, unlike the same activity with linear peptides, is time-consuming and error-prone. The dominant technique for sequencing cyclic peptides is NMR-based and requires large amounts (milligrams) of purified materials that, for most compounds, are not possible to obtain. Given these facts, there is a need for new tools to sequence cyclic NRPs using picograms of material. Since nearly all cyclic NRPs are produced along with related analogs, we develop a mass spectrometry approach for sequencing all related peptides at once (in contrast to the existing approach that analyzes individual peptides). Our results suggest that instead of attempting to isolate and NMR-sequence the most abundant compound, one should acquire spectra of many related compounds and sequence all of them simultaneously using tandem mass spectrometry. We illustrate applications of this approach by sequencing new variants of cyclic peptide antibiotics from Bacillus brevis, as well as sequencing a previously unknown family of cyclic NRPs produced by marine bacteria.

  11. Complete Genome Sequencing of Trivittatus virus

    PubMed Central

    Groseth, Allison; Vine, Veronica; Weisend, Carla; Ebihara, Hideki

    2015-01-01

    Trivittatus virus (family Bunyaviridae, genus Orthobunyavirus) represents an important genetic intermediate between the California encephalitis group, and Bwamba/Pongola and Nyando groups. Here, we report the first complete genome sequence of the prototype (Eklund) strain, isolated in 1948, which interestingly shows only few differences compared to partial sequences of modern strains. PMID:26212363

  12. Some identities of generalized Fibonacci sequence

    NASA Astrophysics Data System (ADS)

    Chong, Chin-Yoon; Cheah, C. L.; Ho, C. K.

    2014-07-01

    We introduced the generalized Fibonacci sequence $\{U_n\}$ defined by $U_0 = 0$, $U_1 = 1$, and $U_{n+2} = pU_{n+1} + qU_n$ for all $p, q \in \mathbb{Z}^+$ and for all non-negative integers $n$. In this paper, we obtained some recursive formulas of the sequence.
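
    The recurrence is easy to tabulate. The following minimal sketch (the function name is an assumption) generates the first terms for given positive integers p and q; with p = q = 1 it reduces to the ordinary Fibonacci numbers.

```python
def generalized_fibonacci(n_terms, p, q):
    """First n_terms of U with U0 = 0, U1 = 1, U(n+2) = p*U(n+1) + q*U(n)."""
    terms = [0, 1]
    while len(terms) < n_terms:
        terms.append(p * terms[-1] + q * terms[-2])
    return terms[:n_terms]


print(generalized_fibonacci(10, 1, 1))  # ordinary Fibonacci numbers
print(generalized_fibonacci(6, 2, 1))   # p = 2, q = 1 gives the Pell numbers
```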

  13. On the sum of generalized Fibonacci sequence

    NASA Astrophysics Data System (ADS)

    Chong, Chin-Yoon; Ho, C. K.

    2014-06-01

    We consider the generalized Fibonacci sequence $\{U_n\}$ defined by $U_0 = 0$, $U_1 = 1$, and $U_{n+2} = pU_{n+1} + qU_n$ for all $n \in \mathbb{Z}_0^+$ and $p, q \in \mathbb{Z}^+$. In this paper, we derived various sums of the generalized Fibonacci sequence from their recursive relations.
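
    As an illustration of how a sum follows from the recurrence, summing $U_{k+2} = pU_{k+1} + qU_k$ over $k = 0, \dots, n-1$ and collecting terms gives $(p + q - 1)\sum_{k=0}^{n} U_k = U_{n+1} + qU_n - 1$. This particular identity is offered only as an example and is not necessarily one of the sums derived in the paper; the sketch below (with a hypothetical function name) checks it numerically.

```python
def sum_identity_holds(p, q, n):
    """Check (p + q - 1) * sum(U_0..U_n) == U_(n+1) + q*U_n - 1."""
    terms = [0, 1]
    while len(terms) < n + 2:
        terms.append(p * terms[-1] + q * terms[-2])
    partial_sum = sum(terms[:n + 1])
    return (p + q - 1) * partial_sum == terms[n + 1] + q * terms[n] - 1


print(all(sum_identity_holds(p, q, n)
          for p in range(1, 5) for q in range(1, 5) for n in range(16)))
```

    For p = q = 1 the identity reduces to the familiar statement that the first n Fibonacci numbers sum to $F_{n+2} - 1$.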

  14. SEQUENCE IN LEARNING--FACT OR FICTION.

    ERIC Educational Resources Information Center

    MIEL, ALICE

    SEQUENCE IN LEARNING IS USEFUL ONLY AS IT CONTRIBUTES TO THE CONTINUITY OF A CHILD'S OVERALL DEVELOPMENT. CHILDREN MAY NOT GO THROUGH THE SAME SEQUENCE TO ARRIVE AT A SIMILAR POINT OF UNDERSTANDING. EDUCATIONAL PROGRESS IS INDICATED BY A CHILD'S GROWTH IN THE DEVELOPMENT OF STRATEGIC CONCEPTS, IN WAYS OF PROCESSING INFORMATION, AND IN WAYS OF…

  15. Concept For Generation Of Long Pseudorandom Sequences

    NASA Technical Reports Server (NTRS)

    Wang, C. C.

    1990-01-01

    Conceptual very-large-scale integrated (VLSI) digital circuit performs exponentiation in finite field. Algorithm that generates unusually long sequences of pseudorandom numbers executed by digital processor that includes such circuits. Concepts particularly advantageous for such applications as spread-spectrum communications, cryptography, and generation of ranging codes, synthetic noise, and test data, where usually desirable to make pseudorandom sequences as long as possible.

  16. A Sequence for Sentence-Combining Instruction.

    ERIC Educational Resources Information Center

    Lawlor, Joseph

    Although sentence combining practice has been shown to be an effective instructional technique for improving students' writing, scant attention has been paid to the appropriate sequence for such instruction. Studies of the natural development of oral and written language point out two general trends that should be considered in sequencing sentence…

  17. Molecular selection in a unified evolutionary sequence

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1986-01-01

    With guidance from experiments and observations that indicate internally limited phenomena, an outline of unified evolutionary sequence is inferred. Such unification is not visible for a context of random matrix and random mutation. The sequence proceeds from Big Bang through prebiotic matter, protocells, through the evolving cell via molecular and natural selection, to mind, behavior, and society.

  18. Program Helps To Optimize Assembly Sequences

    NASA Technical Reports Server (NTRS)

    Borden, Chester S.; Werntz, David G.; Loyola, Steven J.

    1992-01-01

    FAST project-management software tool designed to optimize sequence of assembly of Space Station Freedom. Assesses effects of detailed changes upon system and produces output metrics identifying preferred assembly sequences. Incorporates Space-Shuttle integration, Space-Station hardware, on-orbit operations, and governing programmatic considerations as either precedence relations or numerical data. Written in C language.

  19. Learning of Sensory Sequences in Cerebellar Patients

    ERIC Educational Resources Information Center

    Frings, Markus; Boenisch, Raoul; Gerwig, Marcus; Diener, Hans-Christoph; Timmann, Dagmar

    2004-01-01

    A possible role of the cerebellum in detecting and recognizing event sequences has been proposed. The present study sought to determine whether patients with cerebellar lesions are impaired in the acquisition and discrimination of sequences of sensory stimuli of different modalities. A group of 26 cerebellar patients and 26 controls matched for…

  20. Optimizing cancer genome sequencing and analysis

    PubMed Central

    Griffith, Malachi; Miller, Christopher A.; Griffith, Obi L.; Krysiak, Kilannin; Skidmore, Zachary L.; Ramu, Avinash; Walker, Jason R.; Dang, Ha X.; Trani, Lee; Larson, David E.; Demeter, Ryan T.; Wendl, Michael C.; McMichael, Joshua F.; Austin, Rachel E.; Magrini, Vincent; McGrath, Sean D.; Ly, Amy; Kulkarni, Shashikant; Cordes, Matthew G.; Fronick, Catrina C.; Fulton, Robert S.; Maher, Christopher A.; Ding, Li; Klco, Jeffery M.; Mardis, Elaine R.; Ley, Timothy J.; Wilson, Richard K.

    2015-01-01

    Summary: Tumors are typically sequenced to depths of 75–100× (exome) or 30–50× (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312×) whole genome sequencing (WGS) and exome capture (up to ~433×) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ~200,000 putative SNVs by sequencing them to depths of ~1,000×. Additional targeted sequencing provided over 10,000× coverage and ddPCR assays provided up to ~250,000× sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP accession id phs000159). PMID:26645048
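
    Why standard depths can miss variants in impure or clonally heterogeneous tumors can be seen from a simple binomial model. The sketch below is an illustration under stated assumptions, not the authors' analysis: it ignores sequencing error and mapping bias, treats reads as independent, and uses a hypothetical threshold of three variant-supporting reads to call a site.

```python
from math import comb


def detection_probability(depth, vaf, min_reads=3):
    """P(at least min_reads variant reads) at a site with the given depth and VAF."""
    p_too_few = sum(comb(depth, k) * vaf**k * (1 - vaf)**(depth - k)
                    for k in range(min_reads))
    return 1.0 - p_too_few


# A subclonal variant at 2% variant allele fraction, at increasing depths.
for depth in (30, 100, 300, 1000):
    print(depth, round(detection_probability(depth, 0.02), 3))
```

    Under this model a low-frequency variant is rarely sampled often enough at 30× but is sampled reliably at several hundred fold, in line with the abstract's argument for deeper sequencing.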