Science.gov

Sample records for abrf edman sequencing

  1. The ABRF Edman Sequencing Research Group 2008 Study: Investigation into Homopolymeric Amino Acid N-Terminal Sequence Tags and Their Effects on Automated Edman Degradation

    PubMed Central

    Thoma, R. S.; Smith, J. S.; Sandoval, W.; Leone, J. W.; Hunziker, P.; Hampton, B.; Linse, K. D.; Denslow, N. D.

    2009-01-01

    The Edman Sequence Research Group (ESRG) of the Association of Biomolecular Resource Facilities designs and executes interlaboratory studies investigating the use of automated Edman degradation for protein and peptide analysis. In 2008, the ESRG enlisted the help of core sequencing facilities to investigate the effects of a repeating amino acid tag at the N-terminus of a protein. Commonly, to facilitate protein purification, an affinity tag containing a polyhistidine sequence is conjugated to the N-terminus of the protein. After expression, polyhistidine-tagged protein is readily purified via chelation with an immobilized metal affinity resin. The addition of the polyhistidine tag presents unique challenges for the determination of protein identity using Edman degradation chemistry. Participating laboratories were asked to sequence one protein engineered in three configurations: with an N-terminal polyhistidine tag; with an N-terminal polyalanine tag; or with no tag. Study participants were asked to return a data file containing the uncorrected amino acid picomole yields for the first 17 cycles. Initial and repetitive yield (R.Y.) information and the amount of lag were evaluated. Information about instrumentation and sample treatment was also collected as part of the study. For this study, the majority of participating laboratories successfully called the amino acid sequence for 17 cycles for all three test proteins. In general, laboratories found it more difficult to call the sequence containing the polyhistidine tag. Lag was observed earlier and more consistently with the polyhistidine-tagged protein than the polyalanine-tagged protein. Histidine yields were significantly less than the alanine yields in the tag portion of each analysis. The polyhistidine and polyalanine protein-R.Y. calculations were found to be equivalent. These calculations showed that the nontagged portion from each protein was equivalent. The terminal histidines from the tagged portion of the protein
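    The abstract above evaluates initial and repetitive yield from per-cycle picomole yields but does not say how they were computed. A minimal sketch of one common convention follows: fit a straight line to the logarithm of yield against cycle number, so the slope gives the repetitive yield and the intercept gives the initial yield. The function name and the data are hypothetical and are not taken from the study.

```python
import math


def repetitive_yield(cycle_yields):
    """Estimate (initial_yield, repetitive_yield) from (cycle, pmol) pairs.

    Assumes yield decays geometrically with cycle number, i.e.
    ln(yield) = ln(initial) + (cycle - 1) * ln(RY),
    and fits that line by ordinary least squares.
    """
    xs = [cycle - 1 for cycle, _ in cycle_yields]
    ys = [math.log(pmol) for _, pmol in cycle_yields]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope)


# Hypothetical yields: 100 pmol at cycle 1, decaying by 5% per cycle.
data = [(cycle, 100.0 * 0.95 ** (cycle - 1)) for cycle in (2, 5, 9, 13, 17)]
initial, ry = repetitive_yield(data)
print(f"initial yield ~ {initial:.1f} pmol, repetitive yield ~ {ry:.3f}")
```

    In practice the fit is usually restricted to residues that recover well and are not repeated in the sequence, which is one reason a homopolymeric tag complicates the calculation.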

  2. Mass spectrometric and Edman sequencing of lipocortin I isolated by two-dimensional SDS/PAGE of human melanoma lysates.

    PubMed Central

    Hall, S C; Smith, D M; Masiarz, F R; Soo, V W; Tran, H M; Epstein, L B; Burlingame, A L

    1993-01-01

    We have integrated preparative two-dimensional polyacrylamide gel electrophoresis with high-performance tandem mass spectrometry and Edman degradation. By using this approach, we have isolated and identified, by partial sequencing, a human melanoma protein (34 kDa, pI 6.4) as lipocortin I. To our knowledge, this protein was not previously known to be associated with melanoma cells. The identity of the protein was confirmed by two-dimensional immunoblot analysis. High-energy collision-induced dissociation analysis revealed the sequence and acetylation of the N-terminal tryptic peptide and an acrylamide-modified cysteine in another tryptic peptide. Thus, knowledge concerning both the primary structure and covalent modifications of proteins isolated from two-dimensional gels can be obtained directly by this approach, which is applicable to a broad range of biological problems. PMID:8446611

  3. Primary structure of three cationic peptides from porcine neutrophils. Sequence determination by the combined usage of electrospray ionization mass spectrometry and Edman degradation.

    PubMed

    Mirgorodskaya, O A; Shevchenko, A A; Abdalla, K O; Chernushevich, I V; Egorov, T A; Musoliamov, A X; Kokryakov, V N; Shamova, O V

    1993-09-20

    The primary structure of three major cationic peptides from porcine neutrophils has been determined. The sequencing was performed by the combined use of electrospray ionization mass spectrometry and Edman degradation. The determined sequences unambiguously show that these peptides cannot be considered defensins.

  4. High-throughput sequencing of peptoids and peptide-peptoid hybrids by partial Edman degradation and mass spectrometry.

    PubMed

    Thakkar, Amit; Cohen, Allison S; Connolly, Michael D; Zuckermann, Ronald N; Pei, Dehua

    2009-03-09

    A method for the rapid sequence determination of peptoids [oligo(N-substituted glycines)] and peptide-peptoid hybrids selected from one-bead-one-compound combinatorial libraries has been developed. In this method, beads carrying unique peptoid (or peptide-peptoid) sequences were subjected to multiple cycles of partial Edman degradation (PED) by treatment with a 1:3 (mol/mol) mixture of phenyl isothiocyanate (PITC) and 9-fluorenylmethyl chloroformate (Fmoc-Cl) to generate a series of N-terminal truncation products for each resin-bound peptoid. After PED, the Fmoc group was removed from the N-terminus and any reacted side chains via piperidine treatment. The resulting mixture of the full-length peptoid and its truncation products was analyzed by matrix-assisted laser desorption ionization (MALDI) mass spectrometry, to reveal the sequence of the full-length peptoid. With a slight modification, the method was also effective in the sequence determination of peptide-peptoid hybrids. This rapid, high-throughput, sensitive, and inexpensive sequencing method should greatly expand the utility of combinatorial peptoid libraries in biomedical and materials research.
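    The ladder-reading step described above can be sketched as follows: each truncation product is lighter than the next longer one by the mass of one N-terminal residue, so consecutive mass differences spell out the sequence. This is only an illustration. The mass table holds ordinary amino acid residues as a stand-in (a real peptoid library would need a table of its own building-block masses), and the function names, the tolerance, and the 500.0 Da C-terminal remainder are hypothetical.

```python
# Monoisotopic residue masses (Da) for a few standard amino acids.
# Stand-in table: peptoid building blocks would need their own masses.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
    "L": 113.08406,
    "F": 147.06841,
}


def read_ladder(masses, tol=0.02):
    """Read a sequence (N- to C-terminus) from a truncation mass ladder.

    Sorts the masses from heaviest (full length) to lightest and matches
    each consecutive difference to exactly one residue within `tol` Da.
    Differences with no match or more than one match are reported as '?'.
    """
    ladder = sorted(masses, reverse=True)
    sequence = []
    for heavier, lighter in zip(ladder, ladder[1:]):
        delta = heavier - lighter
        matches = [r for r, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        sequence.append(matches[0] if len(matches) == 1 else "?")
    return "".join(sequence)


def make_ladder(sequence, c_terminal_mass):
    """Build a hypothetical ladder: full length plus every N-terminal truncation."""
    return [
        c_terminal_mass + sum(RESIDUE_MASS[r] for r in sequence[i:])
        for i in range(len(sequence) + 1)
    ]


ladder = make_ladder("FGAV", 500.0)
decoded = read_ladder(ladder)
print(decoded)
```

    Isobaric building blocks cannot be told apart by mass difference alone, which is why ambiguous steps are flagged rather than guessed.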

  5. Rapid on-membrane proteolytic cleavage for Edman sequencing and mass spectrometric identification of proteins.

    PubMed

    Pham, Victoria C; Henzel, William J; Lill, Jennie R

    2005-11-01

    A method for the rapid limited enzymatic cleavage of PVDF membrane-immobilized proteins is described. This method allows the fast characterization of PVDF blotted proteins by peptide mass fingerprinting (Henzel, W. J., Billeci, T. M., Stults, J. T., Wong, S. C., Grimley, C., Watanabe, C., Proc. Natl. Acad. Sci. USA 1993, 90, 5011-5015), LC-MS/MS, or N-terminal sequencing and has been demonstrated on a range of proteins using a full complement of proteolytic enzymes. This technique allows the generation of proteolytic fragments between 5 and 60 min (depending on the enzyme employed), which is significantly faster than previously reported on-membrane digestion methods. To date, this on-membrane rapid digestion protocol has aided in the identification and confirmation of mutation sites in over 200 recombinant proteins.
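    Peptide mass fingerprinting compares measured fragment masses with those predicted from a theoretical digest. As a minimal sketch of the prediction side, the code below applies the commonly used simplified trypsin rule (cleave after K or R unless the next residue is P). The abstract covers several enzymes and does not give its own rules, so the function name, the rule choice, and the example sequence are assumptions for illustration only.

```python
def tryptic_peptides(sequence):
    """Split a protein sequence using the simplified trypsin rule.

    Cleaves on the C-terminal side of K or R, except when the following
    residue is P. Missed cleavages are not modelled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


# Hypothetical sequence, not from the paper.
fragments = tryptic_peptides("AGKRPLKFF")
print(fragments)
```

    Other enzymes can be modelled the same way by swapping the cleavage condition, for example cleaving after E for a Glu-C style digest.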

  6. Multi-platform and cross-methodological reproducibility of transcriptome profiling by RNA-seq in the ABRF Next-Generation Sequencing Study

    PubMed Central

    Nicolet, Charles M.; Grove, Deborah; Levy, Shawn; Farmerie, William; Viale, Agnes; Wright, Chris; Schweitzer, Peter A.; Gao, Yuan; Kim, Dewey; Boland, Joe; Hicks, Belynda; Kim, Ryan; Chhangawala, Sagar; Jafari, Nadereh; Raghavachari, Nalini; Gandara, Jorge; Garcia-Reyero, Natàlia; Hendrickson, Cynthia; Roberson, David; Rosenfeld, Jeffrey; Smith, Todd; Underwood, Jason G.; Wang, May; Zumbo, Paul; Baldwin, Don A.; Grills, George S.; Mason, Christopher E.

    2014-01-01

    High-throughput RNA sequencing (RNA-seq) dramatically expands the potential for novel genomics discoveries, but the wide variety of platforms, protocols and performance has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We tested replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (polyA-selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies’ PGM and Proton, Pacific Biosciences RS and Roche’s 454). The results show high intra-platform and inter-platform concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms. These data also demonstrate that ribosomal RNA depletion can both enable effective analysis of degraded RNA samples and be readily compared to polyA-enriched fractions. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq. PMID:25150835

  7. Detection of DBD-carbamoyl amino acids in amino acid sequence and D/L configuration determination of peptides with fluorogenic Edman reagent 7-[(N,N-dimethylamino)sulfonyl]-2,1,3-benzoxadiazol-4-yl isothiocyanate.

    PubMed

    Huang, Y; Matsunaga, H; Toriba, A; Santa, T; Fukushima, T; Imai, K

    1999-06-01

    A method for amino acid sequence and D/L configuration identification of peptides by using fluorogenic Edman reagent 7-[(N,N-dimethylamino)sulfonyl]-2,1,3-benzoxadiazol-4-yl isothiocyanate (DBD-NCS) has been developed. This method was based on the Edman degradation principle with some modifications. A peptide or protein was coupled with DBD-NCS under basic conditions and then cyclized/cleaved to produce DBD-thiazolinone (TZ) derivative by BF3, a Lewis acid, which could significantly suppress the amino acid racemization. The liberated DBD-TZ amino acid was hydrolyzed to DBD-thiocarbamoyl (TC) amino acid under a weakly acidic condition and then oxidized by NaNO2/H+ to DBD-carbamoyl (CA) amino acid, which was stable and had a strong fluorescence intensity. The individual DBD-CA amino acids were separated by reversed-phase high-performance liquid chromatography (RP-HPLC) for amino acid sequencing, and their enantiomers were resolved by chiral stationary-phase HPLC for identifying their D/L configurations. By combining the two HPLC systems, the amino acid sequence and D/L configuration of peptides could be determined. This method will be useful for searching D-amino-acid-containing peptides in animals.

  8. A photothermally responsive nanoprobe for bioimaging based on Edman degradation

    NASA Astrophysics Data System (ADS)

    Liu, Yi; Wang, Zhantong; Zhang, Huimin; Lang, Lixin; Ma, Ying; He, Qianjun; Lu, Nan; Huang, Peng; Liu, Yijing; Song, Jibin; Liu, Zhibo; Gao, Shi; Ma, Qingjie; Kiesewetter, Dale O.; Chen, Xiaoyuan

    2016-05-01

    A new type of photothermally responsive nanoprobe based on Edman degradation has been synthesized and characterized. Under irradiation by an 808 nm laser, the heat generated by the gold nanorod core breaks the thiocarbamide structure and releases the fluorescent dye Cy5.5 with increased near-infrared (NIR) fluorescence under mild acidic conditions. This RGD modified nanoprobe is capable of fluorescence imaging of αvβ3 over-expressing U87MG cells in vitro and in vivo. This Edman degradation-based nanoprobe provides a novel strategy to design activatable probes for biomedical imaging and drug/gene delivery. Electronic supplementary information (ESI) available: HPLC, MS and 1H NMR spectrum. See DOI: 10.1039/c6nr01400c

  9. THE ABRF MARG MICROARRAY SURVEY 2005: TAKING THE PULSE ON THE MICROARRAY FIELD

    EPA Science Inventory

    Over the past several years microarray technology has evolved into a critical component of any discovery based program. Since 1999, the Association of Biomolecular Resource Facilities (ABRF) Microarray Research Group (MARG) has conducted biennial surveys designed to generate a pr...

  10. ABRF-PRG07: Advanced Quantitative Proteomics Study

    PubMed Central

    Falick, Arnold M.; Lane, William S.; Lilley, Kathryn S.; MacCoss, Michael J.; Phinney, Brett S.; Sherman, Nicholas E.; Weintraub, Susan T.; Witkowska, H. Ewa; Yates, Nathan A.

    2011-01-01

    A major challenge for core facilities is determining quantitative protein differences across complex biological samples. Although there are numerous techniques in the literature for relative and absolute protein quantification, the majority is nonroutine and can be challenging to carry out effectively. There are few studies comparing these technologies in terms of their reproducibility, accuracy, and precision, and no studies to date deal with performance across multiple laboratories with varied levels of expertise. Here, we describe an Association of Biomolecular Resource Facilities (ABRF) Proteomics Research Group (PRG) study based on samples composed of a complex protein mixture into which 12 known proteins were added at varying but defined ratios. All of the proteins were present at the same concentration in each of three tubes that were provided. The primary goal of this study was to allow each laboratory to evaluate its capabilities and approaches with regard to: detection and identification of proteins spiked into samples that also contain complex mixtures of background proteins and determination of relative quantities of the spiked proteins. The results returned by 43 participants were compiled by the PRG, which also collected information about the strategies used to assess overall performance and as an aid to development of optimized protocols for the methodologies used. The most accurate results were generally reported by the most experienced laboratories. Among laboratories that used the same technique, values that were closer to the expected ratio were obtained by more experienced groups. PMID:21455478

  11. THE ABRF-MARG MICROARRAY SURVEY 2004: TAKING THE PULSE OF THE MICROARRAY FIELD

    EPA Science Inventory

    Over the past several years, the field of microarrays has grown and evolved drastically. In its continued efforts to track this evolution, the ABRF-MARG has once again conducted a survey of international microarray facilities and individual microarray users. The goal of the surve...

  12. Interlaboratory Study on Differential Analysis of Protein Glycosylation by Mass Spectrometry: The ABRF Glycoprotein Research Multi-Institutional Study 2012*

    PubMed Central

    Leymarie, Nancy; Griffin, Paula J.; Jonscher, Karen; Kolarich, Daniel; Orlando, Ron; McComb, Mark; Zaia, Joseph; Aguilan, Jennifer; Alley, William R.; Altmann, Friederich; Ball, Lauren E.; Basumallick, Lipika; Bazemore-Walker, Carthene R.; Behnken, Henning; Blank, Michael A.; Brown, Kristy J.; Bunz, Svenja-Catharina; Cairo, Christopher W.; Cipollo, John F.; Daneshfar, Rambod; Desaire, Heather; Drake, Richard R.; Go, Eden P.; Goldman, Radoslav; Gruber, Clemens; Halim, Adnan; Hathout, Yetrib; Hensbergen, Paul J.; Horn, David M.; Hurum, Deanna; Jabs, Wolfgang; Larson, Göran; Ly, Mellisa; Mann, Benjamin F.; Marx, Kristina; Mechref, Yehia; Meyer, Bernd; Möginger, Uwe; Neusüβ, Christian; Nilsson, Jonas; Novotny, Milos V.; Nyalwidhe, Julius O.; Packer, Nicolle H.; Pompach, Petr; Reiz, Bela; Resemann, Anja; Rohrer, Jeffrey S.; Ruthenbeck, Alexandra; Sanda, Miloslav; Schulz, Jan Mirco; Schweiger-Hufnagel, Ulrike; Sihlbom, Carina; Song, Ehwang; Staples, Gregory O.; Suckau, Detlev; Tang, Haixu; Thaysen-Andersen, Morten; Viner, Rosa I.; An, Yanming; Valmu, Leena; Wada, Yoshinao; Watson, Megan; Windwarder, Markus; Whittal, Randy; Wuhrer, Manfred; Zhu, Yiying; Zou, Chunxia

    2013-01-01

    One of the principal goals of glycoprotein research is to correlate glycan structure and function. Such correlation is necessary in order for one to understand the mechanisms whereby glycoprotein structure elaborates the functions of myriad proteins. The accurate comparison of glycoforms and quantification of glycosites are essential steps in this direction. Mass spectrometry has emerged as a powerful analytical technique in the field of glycoprotein characterization. Its sensitivity, high dynamic range, and mass accuracy provide both quantitative and sequence/structural information. As part of the 2012 ABRF Glycoprotein Research Group study, we explored the use of mass spectrometry and ancillary methodologies to characterize the glycoforms of two sources of human prostate specific antigen (PSA). PSA is used as a tumor marker for prostate cancer, with increasing blood levels used to distinguish between normal and cancer states. The glycans on PSA are believed to be biantennary N-linked, and it has been observed that prostate cancer tissues and cell lines contain more antennae than their benign counterparts. Thus, the ability to quantify differences in glycosylation associated with cancer has the potential to positively impact the use of PSA as a biomarker. We studied standard peptide-based proteomics/glycomics methodologies, including LC-MS/MS for peptide/glycopeptide sequencing and label-free approaches for differential quantification. We performed an interlaboratory study to determine the ability of different laboratories to correctly characterize the differences between glycoforms from two different sources using mass spectrometry methods. We used clustering analysis and ancillary statistical data treatment on the data sets submitted by participating laboratories to obtain a consensus of the glycoforms and abundances. The results demonstrate the relative strengths and weaknesses of top-down glycoproteomics, bottom-up glycoproteomics, and glycomics methods. 

  13. Interlaboratory study on differential analysis of protein glycosylation by mass spectrometry: the ABRF glycoprotein research multi-institutional study 2012.

    PubMed

    Leymarie, Nancy; Griffin, Paula J; Jonscher, Karen; Kolarich, Daniel; Orlando, Ron; McComb, Mark; Zaia, Joseph; Aguilan, Jennifer; Alley, William R; Altmann, Friederich; Ball, Lauren E; Basumallick, Lipika; Bazemore-Walker, Carthene R; Behnken, Henning; Blank, Michael A; Brown, Kristy J; Bunz, Svenja-Catharina; Cairo, Christopher W; Cipollo, John F; Daneshfar, Rambod; Desaire, Heather; Drake, Richard R; Go, Eden P; Goldman, Radoslav; Gruber, Clemens; Halim, Adnan; Hathout, Yetrib; Hensbergen, Paul J; Horn, David M; Hurum, Deanna; Jabs, Wolfgang; Larson, Göran; Ly, Mellisa; Mann, Benjamin F; Marx, Kristina; Mechref, Yehia; Meyer, Bernd; Möginger, Uwe; Neusüβ, Christian; Nilsson, Jonas; Novotny, Milos V; Nyalwidhe, Julius O; Packer, Nicolle H; Pompach, Petr; Reiz, Bela; Resemann, Anja; Rohrer, Jeffrey S; Ruthenbeck, Alexandra; Sanda, Miloslav; Schulz, Jan Mirco; Schweiger-Hufnagel, Ulrike; Sihlbom, Carina; Song, Ehwang; Staples, Gregory O; Suckau, Detlev; Tang, Haixu; Thaysen-Andersen, Morten; Viner, Rosa I; An, Yanming; Valmu, Leena; Wada, Yoshinao; Watson, Megan; Windwarder, Markus; Whittal, Randy; Wuhrer, Manfred; Zhu, Yiying; Zou, Chunxia

    2013-10-01

    One of the principal goals of glycoprotein research is to correlate glycan structure and function. Such correlation is necessary in order for one to understand the mechanisms whereby glycoprotein structure elaborates the functions of myriad proteins. The accurate comparison of glycoforms and quantification of glycosites are essential steps in this direction. Mass spectrometry has emerged as a powerful analytical technique in the field of glycoprotein characterization. Its sensitivity, high dynamic range, and mass accuracy provide both quantitative and sequence/structural information. As part of the 2012 ABRF Glycoprotein Research Group study, we explored the use of mass spectrometry and ancillary methodologies to characterize the glycoforms of two sources of human prostate specific antigen (PSA). PSA is used as a tumor marker for prostate cancer, with increasing blood levels used to distinguish between normal and cancer states. The glycans on PSA are believed to be biantennary N-linked, and it has been observed that prostate cancer tissues and cell lines contain more antennae than their benign counterparts. Thus, the ability to quantify differences in glycosylation associated with cancer has the potential to positively impact the use of PSA as a biomarker. We studied standard peptide-based proteomics/glycomics methodologies, including LC-MS/MS for peptide/glycopeptide sequencing and label-free approaches for differential quantification. We performed an interlaboratory study to determine the ability of different laboratories to correctly characterize the differences between glycoforms from two different sources using mass spectrometry methods. We used clustering analysis and ancillary statistical data treatment on the data sets submitted by participating laboratories to obtain a consensus of the glycoforms and abundances. The results demonstrate the relative strengths and weaknesses of top-down glycoproteomics, bottom-up glycoproteomics, and glycomics methods.

  14. ABRF Proteome Informatics Research Group (iPRG) 2015 Study: Detection of Differentially Abundant Proteins in Label-Free Quantitative LC-MS/MS Experiments.

    PubMed

    Choi, Meena; Eren-Dogu, Zeynep F; Colangelo, Christopher; Cottrell, John; Hoopmann, Michael R; Kapp, Eugene A; Kim, Sangtae; Lam, Henry; Neubert, Thomas A; Palmblad, Magnus; Phinney, Brett S; Weintraub, Susan T; MacLean, Brendan; Vitek, Olga

    2017-02-03

    Detection of differentially abundant proteins in label-free quantitative shotgun liquid chromatography-tandem mass spectrometry (LC-MS/MS) experiments requires a series of computational steps that identify and quantify LC-MS features. It also requires statistical analyses that distinguish systematic changes in abundance between conditions from artifacts of biological and technical variation. The 2015 study of the Proteome Informatics Research Group (iPRG) of the Association of Biomolecular Resource Facilities (ABRF) aimed to evaluate the effects of the statistical analysis on the accuracy of the results. The study used LC-tandem mass spectra acquired from a controlled mixture, and made the data available to anonymous volunteer participants. The participants used methods of their choice to detect differentially abundant proteins, estimate the associated fold changes, and characterize the uncertainty of the results. The study found that multiple strategies (including the use of spectral counts versus peak intensities, and various software tools) could lead to accurate results, and that the performance was primarily determined by the analysts' expertise. This manuscript summarizes the outcome of the study, and provides representative examples of good computational and statistical practice. The data set generated as part of this study is publicly available.

  15. The 2012/2013 ABRF Proteomic Research Group Study: Assessing Longitudinal Intralaboratory Variability in Routine Peptide Liquid Chromatography Tandem Mass Spectrometry Analyses*

    PubMed Central

    Bennett, Keiryn L.; Wang, Xia; Bystrom, Cory E.; Chambers, Matthew C.; Andacht, Tracy M.; Dangott, Larry J.; Elortza, Félix; Leszyk, John; Molina, Henrik; Moritz, Robert L.; Phinney, Brett S.; Thompson, J. Will; Bunger, Maureen K.; Tabb, David L.

    2015-01-01

    Questions concerning longitudinal data quality and reproducibility of proteomic laboratories spurred the Protein Research Group of the Association of Biomolecular Resource Facilities (ABRF-PRG) to design a study to systematically assess the reproducibility of proteomic laboratories over an extended period of time. Developed as an open study, initially 64 participants were recruited from the broader mass spectrometry community to analyze provided aliquots of a six bovine protein tryptic digest mixture every month for a period of nine months. Data were uploaded to a central repository, and the operators answered an accompanying survey. Ultimately, 45 laboratories submitted a minimum of eight LC-MSMS raw data files collected in data-dependent acquisition (DDA) mode. No standard operating procedures were enforced; rather the participants were encouraged to analyze the samples according to usual practices in the laboratory. Unlike previous studies, this investigation was not designed to compare laboratories or instrument configuration, but rather to assess the temporal intralaboratory reproducibility. The outcome of the study was reassuring with 80% of the participating laboratories performing analyses at a medium to high level of reproducibility and quality over the 9-month period. For the groups that had one or more outlying experiments, the major contributing factor that correlated to the survey data was the performance of preventative maintenance prior to the LC-MSMS analyses. Thus, the Protein Research Group of the Association of Biomolecular Resource Facilities recommends that laboratories closely scrutinize the quality control data following such events. Additionally, improved quality control recording is imperative. This longitudinal study provides evidence that mass spectrometry-based proteomics is reproducible. When quality control measures are strictly adhered to, such reproducibility is comparable among many disparate groups. Data from the study are

  16. Unraveling the sequence and structure of the protein osteocalcin from a 42 ka fossil horse

    NASA Astrophysics Data System (ADS)

    Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Andrews, Philip C.; Leykam, Joseph; Stafford, Thomas W.; Kelly, Robert L.; Walker, Danny N.; Buckley, Mike; Humpula, James

    2006-04-01

    We report the first complete amino acid sequence and evidence of secondary structure for osteocalcin from a temperate fossil. The osteocalcin derives from a 42 ka equid bone excavated from Juniper Cave, Wyoming. Results were determined by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-MS) and Edman sequencing with independent confirmation of the sequence in two laboratories. The ancient sequence was compared to that of three modern taxa: horse (Equus caballus), zebra (Equus grevyi), and donkey (Equus asinus). Although there was no difference in sequence among modern taxa, MALDI-MS and Edman sequencing show that residues 48 and 49 of our modern horse are Thr, Ala rather than Pro, Val as previously reported (Carstanjen B., Wattiez, R., Armory, H., Lepage, O.M., Remy, B., 2002. Isolation and characterization of equine osteocalcin. Ann. Med. Vet. 146(1), 31-38). MALDI-MS and Edman sequencing data indicate that the osteocalcin sequence of the 42 ka fossil is similar to that of modern horse. Previously inaccessible structural attributes for ancient osteocalcin were observed. Glu 39 rather than Gln 39 is consistent with deamidation, a process known to occur during fossilization and aging. Two post-translational modifications were documented: Hyp 9 and a disulfide bridge. The latter suggests at least partial retention of secondary structure. As has been done for ancient DNA research, we recommend standards for preparation and criteria for authenticating results of ancient protein sequencing.

  17. Protein Sequencing with Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Ziady, Assem G.; Kinter, Michael

    The recent introduction of electrospray ionization techniques that are suitable for peptides and whole proteins has allowed for the design of mass spectrometric protocols that provide accurate sequence information for proteins. The advantages gained by these approaches over traditional Edman Degradation sequencing include faster analysis and femtomole, sometimes attomole, sensitivity. The ability to efficiently identify proteins has allowed investigators to conduct studies on their differential expression or modification in response to various treatments or disease states. In this chapter, we discuss the use of electrospray tandem mass spectrometry, a technique whereby protein-derived peptides are subjected to fragmentation in the gas phase, revealing sequence information for the protein. This powerful technique has been instrumental for the study of proteins and markers associated with various disorders, including heart disease, cancer, and cystic fibrosis. We use the study of protein expression in cystic fibrosis as an example.
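    The gas-phase fragmentation described above is usually interpreted through b- and y-type fragment ions, whose spacing corresponds to residue masses. The sketch below computes singly charged b and y ion m/z values for a short peptide. It is a generic illustration, not the chapter's protocol; the function name, the small mass table, and the example peptide are assumptions.

```python
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)

# Monoisotopic residue masses (Da) for a few standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
    "L": 113.08406,
    "K": 128.09496,
}


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as singly charged m/z lists.

    b_ions[i-1] holds the first i residues plus a proton;
    y_ions[i-1] holds the last i residues plus water and a proton.
    """
    masses = [RESIDUE_MASS[r] for r in peptide]
    b_ions = []
    y_ions = []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


# Hypothetical tripeptide.
b, y = fragment_ions("GAS")
print([round(m, 4) for m in b])
print([round(m, 4) for m in y])
```

    Successive b ions (or successive y ions) differ by one residue mass, which is how a sequence is read from a tandem spectrum.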

  18. Primary structure of a histidine-rich proteolytic fragment of human ceruloplasmin. I. Amino acid sequence of the cyanogen bromide peptides.

    PubMed

    Kingston, I B; Kingston, B L; Putnam, F W

    1980-04-10

    A histidine-rich fragment, Cp F5, with a molecular weight of 18,650 was isolated from human ceruloplasmin. It consists of 159 amino acids and contains a possible copper-binding site. The sequence of the first 18 NH2-terminal residues of Cp F5 was determined by automated Edman degradation. Cp F5 was cleaved by cyanogen bromide to produce nine fragments of from 2 to 63 residues. The amino acid sequence of all of the cyanogen bromide fragments was investigated using automated and manual Edman degradation, the fragments being digested with trypsin, chymotrypsin, thermolysin, staphylococcal protease, and pepsin as appropriate. The results, in conjunction with the data on the tryptic peptides reported in the accompanying paper (Kingston, I.B., Kingston, B.L., and Putnam, F.W. (1980) J. Biol. Chem. 255, 2886-2896), establish the complete amino acid sequence of Cp F5.

  19. Complete amino acid sequence and structure characterization of the taste-modifying protein, miraculin.

    PubMed

    Theerasilp, S; Hitotsuya, H; Nakajo, S; Nakaya, K; Nakamura, Y; Kurihara, Y

    1989-04-25

    The taste-modifying protein, miraculin, has the unusual property of modifying sour taste into sweet taste. The complete amino acid sequence of miraculin purified from miracle fruits by a newly developed method (Theerasilp, S., and Kurihara, Y. (1988) J. Biol. Chem. 263, 11536-11539) was determined by an automatic Edman degradation method. Miraculin was a single polypeptide with 191 amino acid residues. The calculated molecular weight based on the amino acid sequence and the carbohydrate content (13.9%) was 24,600. Asn-42 and Asn-186 were linked N-glycosidically to carbohydrate chains. High homology was found between the amino acid sequences of miraculin and soybean trypsin inhibitor.
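    As a quick consistency check on the figures above, the polypeptide mass and average residue mass can be backed out from the reported totals. The sketch assumes that the 13.9% carbohydrate content is a weight fraction of the whole glycoprotein, which the abstract does not state explicitly; the variable names are illustrative.

```python
total_mw = 24_600            # reported molecular weight, including carbohydrate
carbohydrate_fraction = 0.139  # assumed to be a weight fraction of the total
residue_count = 191          # reported number of amino acid residues

# Mass attributable to the polypeptide chain alone.
polypeptide_mass = total_mw * (1 - carbohydrate_fraction)

# Average mass per residue; typical proteins fall near 110 Da.
average_residue_mass = polypeptide_mass / residue_count

print(f"polypeptide mass ~ {polypeptide_mass:.0f} Da")
print(f"average residue mass ~ {average_residue_mass:.1f} Da")
```

    Under that assumption the average residue mass comes out close to the value typical of proteins, so the reported numbers are mutually consistent.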

  20. De novo proteomic sequencing of a monoclonal antibody raised against OX40 ligand.

    PubMed

    Pham, Victoria; Henzel, William J; Arnott, David; Hymowitz, Sarah; Sandoval, Wendy N; Truong, Bao-Tran; Lowman, Henry; Lill, Jennie R

    2006-05-01

    De novo sequencing of a full-length monoclonal antibody raised against OX40 ligand is described. Using a combination of overlapping complementary proteolytic and chemical digestions, with analysis by mass spectrometry and Edman degradation, both the heavy and light chains were fully sequenced. Particular attention was paid to those modifications that could be susceptible to degradation in the complementarity determining region and Fc region. An overview of the protocol is described, and suggestions for improvements to aid in such sequencing projects in the future are discussed.

  1. Dipeptide Sequence Determination: Analyzing Phenylthiohydantoin Amino Acids by HPLC

    NASA Astrophysics Data System (ADS)

    Barton, Janice S.; Tang, Chung-Fei; Reed, Steven S.

    2000-02-01

    Amino acid composition and sequence determination, important techniques for characterizing peptides and proteins, are essential for predicting conformation and studying sequence alignment. This experiment presents improved, fundamental methods of sequence analysis for an upper-division biochemistry laboratory. Working in pairs, students use the Edman reagent to prepare phenylthiohydantoin derivatives of amino acids for determination of the sequence of an unknown dipeptide. With a single HPLC technique, students identify both the N-terminal amino acid and the composition of the dipeptide. This method yields good precision of retention times and allows use of a broad range of amino acids as components of the dipeptide. Students learn fundamental principles and techniques of sequence analysis and HPLC.

  2. Amino acid sequence of versutoxin, a lethal neurotoxin from the venom of the funnel-web spider Atrax versutus.

    PubMed

    Brown, M R; Sheumack, D D; Tyler, M I; Howden, M E

    1988-03-01

    The complete amino acid sequence of versutoxin, a lethal neurotoxic polypeptide isolated from the venom of male and female funnel-web spiders of the species Atrax versutus, was determined. Sequencing was performed in a gas-phase protein sequencer by automated Edman degradation of the S-carboxymethylated toxin and fragments of it produced by reaction with CNBr. Versutoxin consisted of a single chain of 42 amino acid residues. It was found to have a high proportion of basic residues and of cystine. The primary structure showed marked homology with that of robustoxin, a novel neurotoxin recently isolated from the venom of another funnel-web-spider species, Atrax robustus.

  3. Amino acid sequence of versutoxin, a lethal neurotoxin from the venom of the funnel-web spider Atrax versutus.

    PubMed Central

    Brown, M R; Sheumack, D D; Tyler, M I; Howden, M E

    1988-01-01

    The complete amino acid sequence of versutoxin, a lethal neurotoxic polypeptide isolated from the venom of male and female funnel-web spiders of the species Atrax versutus, was determined. Sequencing was performed in a gas-phase protein sequencer by automated Edman degradation of the S-carboxymethylated toxin and fragments of it produced by reaction with CNBr. Versutoxin consisted of a single chain of 42 amino acid residues. It was found to have a high proportion of basic residues and of cystine. The primary structure showed marked homology with that of robustoxin, a novel neurotoxin recently isolated from the venom of another funnel-web-spider species, Atrax robustus. PMID:3355530

  4. The application of 0.1 M quadrol to the microsequence of proteins and the sequence of tryptic peptides.

    PubMed

    Brauer, A W; Margolies, M N; Haber, E

    1975-07-01

    In an effort to extend automated Edman degradation to nanomole quantities of protein, the method of sequenator analysis described by Edman and Begg (Edman, P., and Begg, G. (1967), Eur. J. Biochem. 1, 80) was modified to permit long degradations in the absence of carrier proteins. By using an aqueous 0.1 M Quadrol program with limited, combined benzene-ethyl acetate solvent extractions, as well as a change in the delivery system for heptafluorobutyric acid, it was possible to recover and identify the first 30 amino acid residues from a sequenator run on 7 nmol of myoglobin. For 3 nmol of myoglobin, 20 steps could be identified. PTH-amino acids were identified by gas-liquid chromatography and thin-layer chromatography on polyamide sheets. Without using a carrier protein in the cup to prevent mechanical losses (Niall, H. D., Jacobs, J. W., Van Rietshoten, J., and Tregear, G. W. (1974), FEBS Lett. 41, 62), the repetitive yield using this program was 93-96%. The same program has been applied successfully to peptides of 14 or more residues with or without modification by Braunitzer's reagent and to a number of larger peptides and proteins including a 216 residue segment of rabbit antibody heavy chain in which a sequence of 35 steps was accomplished on 25 nmol.
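
    As a rough illustration of why repetitive yield limits how far a run can be read, the sketch below assumes the usual simplified model in which the signal at cycle n is the initial yield multiplied by the repetitive yield raised to the power n-1. The model and the function name `expected_yield` are assumptions made for illustration and are not taken from the paper; only the 93-96% range and the 30-cycle run come from the abstract.

```python
def expected_yield(initial: float, repetitive_yield: float, cycle: int) -> float:
    """Expected signal at a 1-based cycle under a constant repetitive yield.

    Simplified model (an assumption): every cycle retains the same fraction
    of the material that was available in the previous cycle.
    """
    if cycle < 1:
        raise ValueError("cycle numbers start at 1")
    return initial * repetitive_yield ** (cycle - 1)


if __name__ == "__main__":
    # Fraction of the cycle-1 signal still expected at cycle 30
    # for the two ends of the reported 93-96% range.
    for ry in (0.93, 0.96):
        fraction = expected_yield(1.0, ry, 30)
        print(f"repetitive yield {ry:.0%}: fraction at cycle 30 = {fraction:.3f}")
```

    Under this model, the signal at cycle 30 is roughly 12% of the cycle-1 signal at a 93% repetitive yield and roughly 31% at 96%, which suggests why a few percentage points matter for long degradations.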

  5. Purification and partial sequence analysis of human T-cell growth factor.

    PubMed Central

    Robb, R J; Kutny, R M; Chowdhry, V

    1983-01-01

    A murine monoclonal antibody directed against human T-cell growth factor (TCGF) from the JURKAT cell line was used for affinity column purification of the factor. Bound TCGF was eluted nearly quantitatively at low pH, and the recovered factor appeared homogeneous by two-dimensional gel electrophoresis. The molecule is markedly hydrophobic, with a high content of leucine. A single NH2-terminal sequence of 36 residues was obtained by automated Edman degradation, further supporting the homogeneity of the material. Thus, significant quantities of purified TCGF have been prepared in a single step, making possible detailed analysis of its molecular structure and biological role. PMID:6604277

  6. Amino acid sequence of neurotoxin III of the scorpion Androctonus australis Hector.

    PubMed

    Kopeyan, C; Martinez, G; Rochat, H

    1979-03-01

    The amino acid sequence of neurotoxin III, purified from the venom of the North African scorpion Androctonus australis Hector, has been determined by Edman degradation using a liquid-phase sequencer. Carboxypeptidase A hydrolyses confirmed not only the sequence of the five last residues but also the presence of a free alpha-carboxylic group at the C-terminus. Edman degradation was conducted on one hand with the Quadrol [N,N,N',N'-tetrakis(2-hydroxypropyl)ethylene diamine] program and S-alkylated protein before or after coupling with sulfophenylisothiocyanate (the first 34 residues were thus identified), on the other hand on tryptic and chymotryptic peptides with a dimethylbenzylamine program (residues 1-23 and 31-34 were confirmed, the positions of residues 35-64 were established). Neurotoxin III was found to belong to the same group of scorpion toxins active on mammals as neurotoxin I purified from the same venom (50 homologous positions exist in the two proteins).

  7. Amino acid sequence of mouse submaxillary gland renin.

    PubMed Central

    Misono, K S; Chang, J J; Inagami, T

    1982-01-01

    The complete amino acid sequences of the heavy chain and light chain of mouse submaxillary gland renin have been determined. The heavy chain consists of 288 amino acid residues having a Mr of 31,036 calculated from the sequence. The light chain contains 48 amino acid residues with a Mr of 5,458. The sequence of the heavy chain was determined by automated Edman degradations of the cyanogen bromide peptides and tryptic peptides generated after citraconylation, as well as other peptides generated therefrom. The sequence of the light chain was derived from sequence analyses of the peptides generated by cyanogen bromide cleavage or by digestion with Staphylococcus aureus protease. The sequences in the active site regions in renin containing two catalytically essential aspartyl residues 32 and 215 were found identical with those in pepsin, chymosin, and penicillopepsin. Comparison of the amino acid sequence of renin with that of porcine pepsin indicated a 42% sequence identity of the heavy chain with the amino-terminal and middle regions and a 46% identity of the light chain with the carboxyl-terminal region of the porcine pepsin sequence. Residues identical in renin and pepsin are distributed throughout the length of the molecules, suggesting a similarity in their overall structures. PMID:6812055

  8. Amino acid sequence of myoglobin from white-tailed deer (Odocoileus virginianus).

    PubMed

    Joseph, Poulson; Suman, Surendranath P; Li, Shuting; Fontaine, Michele; Steinke, Laurey

    2012-10-01

    Our objective was to determine the primary structure of white-tailed deer myoglobin (Mb). White-tailed deer Mb was isolated from cardiac muscles employing ammonium sulfate precipitation and gel-filtration chromatography. The amino acid sequence was determined by Edman degradation. Sequence analyses of intact Mb as well as tryptic- and cyanogen bromide-peptides yielded the complete primary structure of white-tailed deer Mb, which shared 100% similarity with red deer Mb. White-tailed deer Mb consists of 153 amino acid residues and shares more than 96% sequence similarity with myoglobins from meat-producing ruminants, such as cattle, buffalo, sheep, and goat. Similar to sheep and goat myoglobins, white-tailed deer Mb contains 12 histidine residues. Proximal (position 93) and distal (position 64) histidine residues responsible for maintaining the stability of heme are conserved in white-tailed deer Mb.

  9. Active site amino acid sequence of human factor D.

    PubMed

    Davis, A E

    1980-08-01

    Factor D was isolated from human plasma by chromatography on CM-Sephadex C50, Sephadex G-75, and hydroxylapatite. Digestion of reduced, S-carboxymethylated factor D with cyanogen bromide resulted in three peptides which were isolated by chromatography on Sephadex G-75 (superfine) equilibrated in 20% formic acid. NH2-Terminal sequences were determined by automated Edman degradation with a Beckman 890C sequencer using a 0.1 M Quadrol program. The smallest peptide (CNBr III) consisted of the NH2-terminal 14 amino acids. The other two peptides had molecular weights of 17,000 (CNBr I) and 7000 (CNBr II). Overlap of the NH2-terminal sequence of factor D with the NH2-terminal sequence of CNBr I established the order of the peptides. The NH2-terminal 53 residues of factor D are somewhat more homologous with the group-specific protease of rat intestine than with other serine proteases. The NH2-terminal sequence of CNBr II revealed the active site serine of factor D. The typical serine protease active site sequence (Gly-Asp-Ser-Gly-Gly-Pro) was found at residues 12-17. The region surrounding the active site serine does not appear to be more highly homologous with any one of the other serine proteases. The structural data obtained point out the similarities between factor D and the other proteases. However, complete definition of the degree of relationship between factor D and other proteases will require determination of the remainder of the primary structure.

  10. Purification, amino acid sequence and immunological characterization of Ole e 6, a cysteine-enriched allergen from olive tree pollen.

    PubMed

    Batanero, E; Ledesma, A; Villalba, M; Rodríguez, R

    1997-06-30

    The Ole e 6 allergen from olive tree pollen has been isolated by combining gel permeation and reverse-phase chromatographies. It is a single and highly acidic (pI 4.2) polypeptide chain protein. Its NH2-terminal amino acid sequence has been determined by Edman degradation. Total RNA from the olive tree pollen was isolated, and a specific cDNA was amplified by the polymerase chain reaction using a degenerate oligonucleotide primer designed according to the NH2-terminal sequence of the protein. The nucleotide sequencing of the cDNA rendered an open reading frame encoding a 50 amino acid polypeptide chain, in which two sets of the sequential motif Cys-X3-Cys-X3-Cys are present. No sequence similarity has been found between this protein and other previously described polypeptides.
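
    To illustrate what designing a degenerate primer from an N-terminal protein sequence involves, the sketch below counts how many distinct coding sequences could encode a stretch of peptide under the standard genetic code, and picks the least degenerate window. The peptide used in the usage example is hypothetical (the abstract does not give the Ole e 6 N-terminal sequence), and the function names are illustrative assumptions, not the authors' procedure.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (61 sense codons in total).
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def primer_degeneracy(peptide: str) -> int:
    """Number of distinct coding sequences that could encode the peptide."""
    return prod(CODONS_PER_RESIDUE[aa] for aa in peptide.upper())


def least_degenerate_window(peptide: str, width: int) -> tuple[int, str, int]:
    """Return (1-based start, window, degeneracy) for the least degenerate window.

    Ties are resolved in favor of the window closest to the N-terminus.
    """
    if not 1 <= width <= len(peptide):
        raise ValueError("width must be between 1 and the peptide length")
    windows = [
        (start + 1, peptide[start:start + width])
        for start in range(len(peptide) - width + 1)
    ]
    start, window = min(windows, key=lambda item: primer_degeneracy(item[1]))
    return start, window, primer_degeneracy(window)


if __name__ == "__main__":
    hypothetical_n_terminus = "DEAKSLMWQ"  # made-up sequence, for illustration only
    print(least_degenerate_window(hypothetical_n_terminus, 4))
```

    Windows rich in Met and Trp (one codon each) give the lowest degeneracy, while Leu, Ser and Arg (six codons each) inflate it quickly.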

  11. A manual sequence method of peptides and phosphopeptides using 4-(1'-cyanoisoindolyl)phenylisothiocyanate.

    PubMed

    Shibata, Takayuki; Wainaina, Moses N; Miyoshi, Takayuki; Kabashima, Tsutomu; Kai, Masaaki

    2011-06-17

    A method for sequence analysis and identification of phosphoamino acids in peptides based on high performance liquid chromatography (HPLC) is described. The peptides were derivatized with an Edman type reagent, 4-(1'-cyanoisoindolyl)phenylisothiocyanate (CIPIC) and subsequently cleaved to generate stable and fluorescent 4-(1'-cyanoisoindolyl)phenylthiazolinone (CIP-TZ)-amino acids. Several experimental factors that affected derivatization on membranes were examined. Under the optimized conditions, the CIP-TZ derivatives of Tyr(p), Thr(p) and Ser(p) were obtained and separated from their parent amino acids with baseline resolution using an isocratic elution system. Up to the 4th residue of phosphorylated pentapeptides was successfully identified, whereas phosphoamino acid residues could not be detected by the conventional procedure using phenylisothiocyanate (PITC). The results demonstrated the potential of CIPIC as a derivatization reagent for peptide sequencing and the applicability of the method for the study and identification of phosphoamino acids in peptides.

  12. Amino acid sequence of myoglobin from emu (Dromaius novaehollandiae) skeletal muscle.

    PubMed

    Suman, S P; Joseph, P; Li, S; Beach, C M; Fontaine, M; Steinke, L

    2010-11-01

    The objective of the present study was to characterize the primary structure of emu myoglobin (Mb). Emu Mb was isolated from Iliofibularis muscle employing gel-filtration chromatography. Matrix Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry was employed to determine the exact molecular mass of emu Mb in comparison with horse Mb, and Edman degradation was utilized to characterize the amino acid sequence. The molecular mass of emu Mb was 17,380 Da and was close to those reported for ratite and poultry myoglobins. Similar to myoglobins from meat-producing livestock and birds, emu Mb has 153 amino acids. Emu Mb contains 9 histidines. Proximal and distal histidines, responsible for coordinating the oxygen-binding property of Mb, are conserved in emu. Emu Mb shared more than 90% homology with ratite and chicken myoglobins, whereas it demonstrated less than 70% sequence similarity with ruminant myoglobins.

  13. Spermatogenesis of the lizard Lacerta vivipara: histological studies and amino acid sequence of a protamine lacertine 1.

    PubMed

    Martinage, A; Depeiges, A; Wouters, D; Morel, L; Sautière, P

    1996-06-01

    The lizard Lacerta vivipara is a seasonal breeder with a well characterized reproductive cycle. A histological study of the lizard testis has been performed at different stages of spermatogenesis and the nuclear basic protein content was assessed by electrophoretic analysis. Two protamines, lacertines 1 and 2, are present in spermatozoa in April and May. We have isolated lacertine 1 and characterized a protamine with a mass of 4,963.7 Da. Amino acid sequence of this protamine (41 residues) was established from data provided by automated Edman degradation. It is characterized by a basic amino acid stretch in the N- and C-terminal regions and by a central part which only consists of 3 different intermingled amino acids. This protamine presents 62% homology with scylliorhinine Z3 from dog-fish Scylliorhinus caniculus and 58% homology with quail protamine. The reported lizard protamine sequence is the first reptilian protamine sequence available so far.

  14. Amino acid sequence of bovine heart coupling factor 6.

    PubMed Central

    Fang, J K; Jacobs, J W; Kanner, B I; Racker, E; Bradshaw, R A

    1984-01-01

    The amino acid sequence of bovine heart mitochondrial coupling factor 6 (F6) has been determined by automated Edman degradation of the whole protein and derived peptides. Preparations based on heat precipitation and ethanol extraction showed allotypic variation at three positions while material further purified by HPLC yielded only one sequence that also differed by a Phe-Thr replacement at residue 62. The mature protein contains 76 amino acids with a calculated molecular weight of 9006 and a pI of approximately 5, in good agreement with experimentally measured values. The charged amino acids are mainly clustered at the termini and in one section in the middle; these three polar segments are separated by two segments relatively rich in nonpolar residues. Chou-Fasman analysis suggests three stretches of alpha-helix coinciding with (or within) the high-charge-density sequences with a single beta-turn at the first polar-nonpolar junction. Comparison of the F6 sequence with those of other proteins did not reveal any homologous structures. PMID:6149548

  15. Recognizing Sequences of Sequences

    PubMed Central

    Kiebel, Stefan J.; von Kriegstein, Katharina; Daunizeau, Jean; Friston, Karl J.

    2009-01-01

    The brain's decoding of fast sensory streams is currently impossible to emulate, even approximately, with artificial agents. For example, robust speech recognition is relatively easy for humans but exceptionally difficult for artificial speech recognition systems. In this paper, we propose that recognition can be simplified with an internal model of how sensory input is generated, when formulated in a Bayesian framework. We show that a plausible candidate for an internal or generative model is a hierarchy of ‘stable heteroclinic channels’. This model describes continuous dynamics in the environment as a hierarchy of sequences, where slower sequences cause faster sequences. Under this model, online recognition corresponds to the dynamic decoding of causal sequences, giving a representation of the environment with predictive power on several timescales. We illustrate the ensuing decoding or recognition scheme using synthetic sequences of syllables, where syllables are sequences of phonemes and phonemes are sequences of sound-wave modulations. By presenting anomalous stimuli, we find that the resulting recognition dynamics disclose inference at multiple time scales and are reminiscent of neuronal dynamics seen in the real brain. PMID:19680429

  16. Amino acid sequence of two neurotoxins from the venom of the Egyptian black snake (Walterinnesia aegyptia).

    PubMed

    Samejima, Y; Aoki-Tomomatsu, Y; Yanagisawa, M; Mebs, D

    1997-02-01

    The venom of the Egyptian black snake Walterinnesia aegyptia contains at least three toxins, which act postsynaptically to block the neuromuscular transmission of isolated rat phrenic nerve-diaphragm and chicken biventer cervicis muscle. The complete amino acid sequence of the two toxins, W-III and W-IV, consisting of 62 amino acid residues, was elucidated by Edman degradation of fragments obtained after Staphylococcus aureus protease and prolylpeptidase digestion. Although the toxins exhibit close structural homology to other short-chain postsynaptic neurotoxins from Elapidae venoms, toxin IV is unique by having a free SH-group (cysteine) at position 16. In position 35 of W-III, which is located at the tip of the central loop, threonine is replaced by lysine, which may alter the interaction of the toxin with the acetylcholine receptor, since the toxin is seven times less lethal than toxin W-IV.

  17. Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications

    NASA Astrophysics Data System (ADS)

    Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.

    2007-12-01

    Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct Camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel (Camelus bactrianus); dromedary camel (Camelus dromedarius) and guanaco (Llama guanacoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translation modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade within Artiodactyla. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes. Our preliminary analysis of osteocalcin evolution represents an initial step towards a

  18. Sequence comparison of pepsin-resistant segments of basement-membrane collagen alpha 1(IV) chains from bovine lens capsule and mouse tumour.

    PubMed Central

    Schuppan, D; Glanville, R W; Timpl, R; Dixit, S N; Kang, A H

    1984-01-01

    The C-terminal peptic fragment P1 (about 518 amino acid residues) of bovine lens-capsule collagen alpha 1(IV) chain was cleaved with CNBr and trypsin. The peptides were purified and characterized, allowing their ordering within the P1 fragment by comparison with a corresponding section of mouse collagen alpha 1(IV) chain [Schuppan, Glanville & Timpl (1982) Eur. J. Biochem. 123, 505-512]. About 67% of the sequence of bovine collagen fragment P1 was determined by Edman degradation. Comparison with the sequence of the corresponding mouse collagen fragment P1 showed 76% identity for positions Xaa and Yaa of the triplet structures Gly-Xaa-Yaa. Invariance was found for the positions of two non-triplet interruptions and of 3-hydroxyproline residues, pointing to the functional importance of these structures. PMID:6430279

  19. Purification and N-terminal sequence of a serine proteinase-like protein (BMK-CBP) from the venom of the Chinese scorpion (Buthus martensii Karsch).

    PubMed

    Gao, Rong; Zhang, Yong; Gopalakrishnakone, Ponnampalam

    2008-08-01

    A serine proteinase-like protein was isolated from the venom of Chinese red scorpion (Buthus martensii Karsch) by a combination of gel filtration, ion-exchange and reverse-phase chromatography and named BMK-CBP. The apparent molecular weight of BMK-CBP was identified as 33 kDa by SDS-PAGE under non-reducing condition. The sequence of N-terminal 40 amino acids was obtained by Edman degradation. The sequence shows highest similarity to proteinase from insect source. When tested with commonly used substrates of proteinase, no significant hydrolytic activity was observed for BMK-CBP. The purified BMK-CBP was found to bind to the cancer cell line MCF-7 and the cell binding ability was dose-dependent.

  20. Genome Sequencing.

    PubMed

    Verma, Mansi; Kulshrestha, Samarth; Puri, Ayush

    2017-01-01

    Genome sequencing is an important step toward correlating genotypes with phenotypic characters. Sequencing technologies are important in many fields in the life sciences, including functional genomics, transcriptomics, oncology, evolutionary biology, forensic sciences, and many more. The era of sequencing has been divided into three generations. First generation sequencing involved sequencing by synthesis (Sanger sequencing) and sequencing by cleavage (Maxam-Gilbert sequencing). Sanger sequencing led to the completion of various genome sequences (including human) and provided the foundation for development of other sequencing technologies. Since then, various techniques have been developed which can overcome some of the limitations of Sanger sequencing. These techniques are collectively known as "Next-generation sequencing" (NGS), and are further classified into second and third generation technologies. Although NGS methods have many advantages in terms of speed, cost, and parallelism, the accuracy and read length of Sanger sequencing is still superior and has confined the use of NGS mainly to resequencing genomes. Consequently, there is a continuing need to develop improved real time sequencing techniques. This chapter reviews some of the options currently available and provides a generic workflow for sequencing a genome.

  1. Unconventional amino acid sequence of the sun anemone (Stoichactis helianthus) polypeptide neurotoxin

    SciTech Connect

    Kem, W.; Dunn, B.; Parten, B.; Pennington, M.; Price, D.

    1986-05-01

    A 5000 dalton polypeptide neurotoxin (Sh-NI) purified by G50 Sephadex, P-cellulose, and SP-Sephadex chromatography was homogeneous by isoelectric focusing. Sh-NI was highly toxic to crayfish (LD50 0.6 µg/kg) but without effect upon mice at 15,000 µg/kg (i.p. injection). The reduced, 3H-carboxymethylated toxin and its fragments were subjected to automatic Edman degradation and the resulting PTH-amino acids were identified by HPLC, back hydrolysis, and scintillation counting. Peptides resulting from proteolytic (clostripain, staphylococcal protease) and chemical (tryptophan) cleavage were sequenced. The sequence is: AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK. This sequence differs considerably from the homologous Anemonia and Anthopleura toxins; many of the identical residues (6 half-cystines, G9, P10, R13, G19, G29, W30) are probably critical for folding rather than receptor recognition. However, the Sh-NI sequence closely resembles Radioanthus macrodactylus neurotoxin III and R. paumotensis II. The authors propose that Sh-NI and related Radioanthus toxins act upon a different site on the sodium channel.
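
    The residue positions quoted in this abstract can be checked directly against the printed sequence. The short sketch below does so with 1-based numbering; the helper names are illustrative assumptions, and the only data used are the sequence and the positions given in the abstract.

```python
# Sequence of Sh-NI exactly as printed in the abstract.
SH_NI = "AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK"

# Conserved residues named in the abstract, keyed by 1-based position.
NAMED_RESIDUES = {9: "G", 10: "P", 13: "R", 19: "G", 29: "G", 30: "W"}


def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based position."""
    return sequence[position - 1]


def check_named_residues(sequence: str, named: dict[int, str]) -> dict[int, bool]:
    """Map each named position to whether the sequence carries the stated residue."""
    return {pos: residue_at(sequence, pos) == aa for pos, aa in named.items()}


if __name__ == "__main__":
    print("length:", len(SH_NI))
    print("half-cystines:", SH_NI.count("C"))
    print("named positions match:", check_named_residues(SH_NI, NAMED_RESIDUES))
```

    With the sequence as printed, the string has 48 residues and 6 cysteines, and all six named positions match, which is consistent with the abstract's description.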

  2. The amino acid sequence of the aspartate aminotransferase from baker's yeast (Saccharomyces cerevisiae).

    PubMed Central

    Cronin, V B; Maras, B; Barra, D; Doonan, S

    1991-01-01

    1. The single (cytosolic) aspartate aminotransferase was purified in high yield from baker's yeast (Saccharomyces cerevisiae). 2. Amino-acid-sequence analysis was carried out by digestion of the protein with trypsin and with CNBr; some of the peptides produced were further subdigested with Staphylococcus aureus V8 proteinase or with pepsin. Peptides were sequenced by the dansyl-Edman method and/or by automated gas-phase methods. The amino acid sequence obtained was complete except for a probable gap of two residues as indicated by comparison with the structures of counterpart proteins in other species. 3. The N-terminus of the enzyme is blocked. Fast-atom-bombardment m.s. was used to identify the blocking group as an acetyl one. 4. Alignment of the sequence of the enzyme with those of vertebrate cytosolic and mitochondrial aspartate aminotransferases and with the enzyme from Escherichia coli showed that about 25% of residues are conserved between these distantly related forms. 5. Experimental details and confirmatory data for the results presented here are given in a Supplementary Publication (SUP 50164, 25 pages) that has been deposited at the British Library Document Supply Centre, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1991) 273, 5. PMID:1859361

  3. Sequence analysis and location of capsid proteins within RNA 2 of strawberry latent ringspot virus.

    PubMed

    Kreiah, S; Strunk, G; Cooper, J I

    1994-09-01

    The nucleotide sequence of the RNA 2 of a strawberry isolate (H) of strawberry latent ringspot virus (SLRSV) comprised 3824 nucleotides and contained one long open reading frame with a theoretical coding capacity of 890 amino acids equivalent to a protein of 98.8K. The N-terminal amino acid sequences of virion-derived proteins were determined by Edman degradation allowing the capsid coding regions to be located and serine/glycine cleavage sites to be identified within the polyprotein. The amino acid sequence in the capsid coding region of an isolate of SLRSV from flowering cherry in New Zealand was 97% identical to that of SLRSV-H. Except in the 3' and 5' terminal non-coding sequences, computer-based alignment and comparison algorithms did not reveal any substantial homologies between RNA 2 of SLRSV-H and the equivalent genomic segments in the nepoviruses arabis mosaic, cherry leaf roll, grapevine fanleaf, raspberry ringspot, grapevine hungarian chrome mosaic, tomato blackring, tomato ringspot, tobacco ringspot, or in the comoviruses cowpea mosaic and red clover mottle. Despite the similarities in overall genome organization, data from RNA 2 remain insufficient for unambiguous positioning of SLRSV in relation to species/genera in the Comoviridae.

  4. Design, synthesis, and characterization of a protein sequencing reagent yielding amino acid derivatives with enhanced detectability by mass spectrometry.

    PubMed Central

    Aebersold, R.; Bures, E. J.; Namchuk, M.; Goghari, M. H.; Shushan, B.; Covey, T. C.

    1992-01-01

    We report the design, chemical synthesis, and structural and functional characterization of a novel reagent for protein sequence analysis by the Edman degradation, yielding amino acid derivatives rapidly detectable at high sensitivity by ion-evaporation mass spectrometry. We demonstrate that the reagent 3-[4'(ethylene-N,N,N-trimethylamino)phenyl]-2-isothiocyanate is chemically stable and shows coupling and cyclization/cleavage yields comparable to phenylisothiocyanate, the standard reagent in chemical sequence analysis, under conditions typically encountered in manual or automated sequence analysis. Amino acid derivatives generated with this reagent were detectable by ion-evaporation mass spectrometry at the subfemtomole sensitivity level at a pace of one sample per minute. Furthermore, derivatives were identified by their mass, thus permitting the rapid and highly sensitive determination of the molecular nature of modified amino acids. Derivatives of amino acids with acidic, basic, polar, or hydrophobic side chains were reproducibly detectable at comparable sensitivities. The polar nature of the reagent required covalent immobilization of polypeptides prior to automated sequence analysis. This reagent, used in automated sequence analysis, has the potential for overcoming the limitations in sensitivity, speed, and the ability to characterize modified amino acid residues inherent in the chemical sequencing methods that are currently used. PMID:1304351

  5. The complete amino acid sequence of a trypsin inhibitor from Bauhinia variegata var. candida seeds.

    PubMed

    Di Ciero, L; Oliva, M L; Torquato, R; Köhler, P; Weder, J K; Camillo Novello, J; Sampaio, C A; Oliveira, B; Marangoni, S

    1998-11-01

    Trypsin inhibitors of two varieties of Bauhinia variegata seeds have been isolated and characterized. Bauhinia variegata candida trypsin inhibitor (BvcTI) and B. variegata lilac trypsin inhibitor (BvlTI) are proteins with Mr of about 20,000 without free sulfhydryl groups. Amino acid analysis shows a high content of aspartic acid, glutamic acid, serine, and glycine, and a low content of histidine, tyrosine, methionine, and lysine in both inhibitors. Isoelectric focusing for both varieties detected three isoforms (pI 4.85, 5.00, and 5.15), which were resolved by HPLC procedure. The trypsin inhibitors show Ki values of 6.9 and 1.2 nM for BvcTI and BvlTI, respectively. The N-terminal sequences of the three trypsin inhibitor isoforms from both varieties of Bauhinia variegata and the complete amino acid sequence of B. variegata var. candida L. trypsin inhibitor isoform 3 (BvcTI-3) are presented. The sequences have been determined by automated Edman degradation of the reduced and carboxymethylated proteins of the peptides resulting from Staphylococcus aureus protease and trypsin digestion. BvcTI-3 is composed of 167 residues and has a calculated molecular mass of 18,529. Homology studies with other trypsin inhibitors show that BvcTI-3 belongs to the Kunitz family. The putative active site encompasses Arg (63)-Ile (64).

  6. Isolation and sequence of tryptic peptides from the proton-pumping ATPase of the oat plasma membrane.

    PubMed

    Schaller, G E; Sussman, M R

    1988-02-01

    In crude extracts of plant tissue, the M(r) = 100,000 proton-pumping ATPase constitutes less than 0.01% of the total cell protein. A large-scale purification procedure is described that has been used to obtain extensive protein sequence information from this enzyme. Plasma membrane vesicles enriched in ATPase activity were obtained from extracts of oat roots by routine differential and density gradient centrifugation. Following a detergent wash, the ATPase was resolved from other integral membrane proteins by size fractionation at 4 degrees C in the presence of lithium dodecyl sulfate. After carboxymethylation of cysteine residues and removal of detergent, the ATPase was digested with trypsin and resultant peptide fragments separated by reverse phase high performance liquid chromatography. Peptides were recovered with high yield and were readily sequenced by automated Edman degradation on a gas-phase sequencer. Of the eight peptides sequenced, six showed strong homology with known amino acid sequences of the fungal proton-pumping and other cation-transporting ATPases.

  7. The amino acid sequence of Ole e I, the major allergen from olive tree (Olea europaea) pollen.

    PubMed

    Villalba, M; Batanero, E; López-Otín, C; Sánchez, L M; Monsalve, R I; González de la Peña, M A; Lahoz, C; Rodríguez, R

    1993-09-15

    The complete primary structure of the major allergen from Olea europaea (olive tree) pollen, Ole e I (IUIS nomenclature), has been determined. The amino acid sequence was established by automated Edman degradation of the reduced and alkylated molecule as well as of selected fragments obtained by proteolytic digestions. Ole e I contains a single polypeptide chain of 145 amino acid residues with a calculated molecular mass of 16331 Da. No free sulfhydryl groups have been detected in the native protein. The molecule contains a putative glycosylation site. A high degree of microheterogeneity has been observed, mainly centered in the first 33% of the molecule. Comparison of Ole e I sequence with protein sequence databases showed no similarity with other known allergens. However, it has a 36% and 38% sequence identity with the putative polypeptide structures, deduced, respectively, from nucleotide sequences of genes isolated from tomato anthers and corn pollen, which have been suggested to be involved in the growing of the pollen tube. Therefore, the olive tree allergen may be a constitutive protein of the pollen involved in reproductive functions.

  8. Sequencing technologies and genome sequencing.

    PubMed

    Pareek, Chandra Shekhar; Smoczynski, Rafal; Tretyn, Andrzej

    2011-11-01

    High-throughput next-generation sequencing (HT-NGS) technologies are currently the hottest topic in human and animal genomics research; they can produce over 100 times more data than the most sophisticated capillary sequencers based on the Sanger method. With the ongoing development of high-throughput sequencing machines and the advancement of modern bioinformatics tools at an unprecedented pace, the goal of sequencing individual genomes of living organisms at a cost of $1,000 each seems realistically feasible in the near future. In the relatively short time since 2005, HT-NGS technologies have been revolutionizing human and animal genome research through the analysis of chromatin immunoprecipitation coupled to DNA microarray (ChIP-chip) or sequencing (ChIP-seq), RNA sequencing (RNA-seq), whole genome genotyping, genome-wide structural variation, de novo assembly and re-assembly of genomes, mutation detection and carrier screening, detection of inherited disorders and complex human diseases, DNA library preparation, paired ends and genomic captures, sequencing of mitochondrial genomes and personal genomics. In this review, we address the important features of HT-NGS, including first-generation DNA sequencers, the birth of HT-NGS, second-generation HT-NGS platforms, third-generation HT-NGS platforms (including the single-molecule Heliscope™, SMRT™ and RNAP sequencers, and Nanopore), the Archon Genomics X PRIZE foundation, a comparison of second- and third-generation HT-NGS platforms, and the applications, advances and future perspectives of sequencing technologies in human and animal genome research.

  9. Complete amino acid sequence of branched-chain amino acid aminotransferase (transaminase B) of Salmonella typhimurium, identification of the coenzyme-binding site and sequence comparison analysis

    SciTech Connect

    Feild, M.J.

    1988-01-01

    The complete amino acid sequence of the subunit of branched-chain amino acid aminotransferase of Salmonella typhimurium was determined by automated Edman degradation of peptide fragments generated by chemical and enzymatic digestion of S-carboxymethylated and S-pyridylethylated transaminase B. Peptide fragments of transaminase B were generated by treatment of the enzyme with trypsin, Staphylococcus aureus V8 protease, endoproteinase Lys-C, and cyanogen bromide. Protocols were developed for separation of the peptide fragments by reverse-phase high performance liquid chromatography (HPLC), ion-exchange HPLC, and SDS-urea gel electrophoresis. The enzyme subunit contains 308 amino acid residues and has a molecular weight of 33,920 daltons. The coenzyme-binding site was determined by treatment of the enzyme, containing bound pyridoxal 5-phosphate, with tritiated sodium borohydride prior to trypsin digestion. Monitoring radioactivity incorporation and peptide map comparisons with an apoenzyme tryptic digest, allowed identification of the pyridoxylated-peptide which was isolated by reverse-phase HPLC and sequenced. The coenzyme-binding site is a lysyl residue at position 159. Some peptides were further characterized by fast atom bombardment mass spectrometry.

  10. DNA Sequencing

    DOEpatents

    Tabor, Stanley; Richardson, Charles C.

    1995-04-25

    A method for sequencing a strand of DNA, including the steps of: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three deoxyribonucleoside triphosphates in different amounts, under conditions favoring primer extension to form nucleic acid fragments complementary to the DNA to be sequenced; labelling the nucleic acid fragments; separating them and determining the position of the deoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby to determine the DNA sequence.

  11. Bacteriocuprein superoxide dismutase of Photobacterium leiognathi. Isolation and sequence of the gene and evidence for a precursor form.

    PubMed

    Steinman, H M

    1987-02-05

    The gene encoding the bacteriocuprein superoxide dismutase from Photobacterium leiognathi, American Type Culture Collection strain 25521, was cloned in a pUC12 vector and sequenced. The nucleotide sequence predicted a 22-residue leader peptide amino-terminal to the known bacteriocuprein sequence. The expected precursor bacteriocuprein was directly identified in the in vitro translation products of the cloned gene by polyacrylamide gel electrophoresis and automated Edman degradation. Enzymatically active bacteriocuprein that lacked the leader peptide was identified in sonic extracts of Escherichia coli hosts containing the cloned gene. A single transcript of 580 nucleotides was observed in blots of total P. leiognathi RNA, and a unique site of transcriptional initiation was identified by primer extension analysis. P. leiognathi bacteriocuprein is the first bacteriocuprein whose gene has been isolated and sequenced and the first copper-zinc superoxide dismutase in which a leader peptide has been found. The presence of a leader peptide suggests that the bacteriocuprein is localized in the membrane or periplasm, in contrast to the eukaryotic copper-zinc superoxide dismutases, which are cytoplasmic enzymes. Such a difference in intracellular location could be important for understanding the presence and function of the uncommon, bacteriocuprein superoxide dismutase in P. leiognathi.

  12. Complete amino acid sequence of a Lolium perenne (perennial rye grass) pollen allergen, Lol p II.

    PubMed

    Ansari, A A; Shenbagamurthi, P; Marsh, D G

    1989-07-05

    The complete amino acid sequence of a Lolium perenne (rye grass) pollen allergen, Lol p II was determined by automated Edman degradation of the protein and selected fragments. Cleavage of the protein by enzymatic and chemical techniques established an unambiguous sequence for the protein. Lol p II contains 97 amino acid residues, with a calculated molecular weight of 10,882. The protein lacks cysteine and glutamine and shows no evidence of glycosylation. Theoretical predictions by Fraga's (Fraga, S. (1982) Can. J. Chem. 60, 2606-2610) and Hopp and Woods' (Hopp, T. P., and Woods, K. R. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 3824-3828) methods indicate the presence of four hydrophilic regions, which may contribute to sequential or parts of conformational B-cell epitopes. Analysis of amphipathic regions by Berzofsky's method indicates the presence of a highly amphipathic region, which may contain, or contribute to, an Ia/T-cell epitope. This latter segment of Lol p II was found to be highly homologous with an antibody-binding segment of the major rye allergen Lol p I and may explain why immune responsiveness to both the allergens is associated with HLA-DR3.

  13. Plasma-desorption mass spectrometry as an aid in protein sequence determination. Application of the method on a cuticular protein from the migratory locust (Locusta migratoria).

    PubMed Central

    Klarskov, K; Højrup, P; Andersen, S O; Roepstorff, P

    1989-01-01

    The complete amino acid sequence of a structural protein, protein 8, isolated from the pharate cuticle of the locust Locusta migratoria was determined. Protein 8 contains 148 amino acid residues and has an Mr of 15,224. By the extensive use of information obtained by plasma-desorption mass spectrometry (p.d.m.s.) it was possible to reduce the need for conventional sequence determination and to improve the reliability of the results. On the basis of the determined Mr of the intact protein all the peptides that constitute the complete sequence could be isolated from a time-course enzymic digestion. The isolated peptides were sequenced by using a combination of Edman degradation and carboxypeptidase digestion monitored by p.d.m.s. The alignment of the peptides was established from the time-course digestion and further verified by a second enzymic digestion. The primary structure of the protein consists of two hydrophilic and two hydrophobic regions. The hydrophobic regions are enriched in alanine, valine and proline and dominated by a repetitive sequence Ala-Ala-Pro-(Ala/Val). The sequence strengthens the view that the cuticle proteins belong to a unique family of structural proteins. PMID:2590176

  14. Rapid 'de novo' peptide sequencing by a combination of nanoelectrospray, isotopic labeling and a quadrupole/time-of-flight mass spectrometer.

    PubMed

    Shevchenko, A; Chernushevich, I; Ens, W; Standing, K G; Thomson, B; Wilm, M; Mann, M

    1997-01-01

    Protein microanalysis usually involves the sequencing of gel-separated proteins available in very small amounts. While mass spectrometry has become the method of choice for identifying proteins in databases, in almost all laboratories 'de novo' protein sequencing is still performed by Edman degradation. Here we show that a combination of the nanoelectrospray ion source, isotopic end labeling of peptides and a quadrupole/ time-of-flight instrument allows facile read-out of the sequences of tryptic peptides. Isotopic labeling was performed by enzymatic digestion of proteins in 1:1 16O/18O water, eliminating the need for peptide derivatization. A quadrupole/time-of-flight mass spectrometer was constructed from a triple quadrupole and an electrospray time-of-flight instrument. Tandem mass spectra of peptides were obtained with better than 50 ppm mass accuracy and resolution routinely in excess of 5000. Unique and error tolerant identification of yeast proteins as well as the sequencing of a novel protein illustrate the potential of the approach. The high data quality in tandem mass spectra and the additional information provided by the isotopic end labeling of peptides enabled automated interpretation of the spectra via simple software algorithms. The technique demonstrated here removes one of the last obstacles to routine and high throughput protein sequencing by mass spectrometry.

  15. MSLICE Sequencing

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Norris, Jeffrey S.; Morris, John R.

    2011-01-01

    MSLICE Sequencing is a graphical tool for writing sequences and integrating them into RML files, as well as for producing SCMF files for uplink. When operated in a testbed environment, it also supports uplinking these SCMF files to the testbed via Chill. This software features a free-form textual sequence editor with syntax coloring and automatic content assistance (including command and argument completion proposals), complete with types, value ranges, units, and descriptions from the command dictionary that appear as they are typed. The sequence editor also has a "field mode" that allows tabbing between arguments and displays type/range/units/description for each argument as it is edited. Color-coded error and warning annotations on problematic tokens are included, as well as indications of problems that are not visible in the current scroll range. "Quick Fix" suggestions are made for resolving problems, and all the features afforded by modern source editors are also included, such as copy/cut/paste, undo/redo, and a sophisticated find-and-replace system optionally using regular expressions. The software offers a full XML editor for RML files, which features syntax coloring, content assistance and problem annotations as above. There is a form-based "detail view" that allows structured editing of command arguments and sequence parameters when preferred. The "project view" shows the user's "workspace" as a tree of "resources" (projects, folders, and files) that can subsequently be opened in editors by double-clicking. Files can be added, deleted, dragged-dropped/copied-pasted between folders or projects, and these operations are undoable and redoable. A "problems view" contains a tabular list of all problems in the current workspace. Double-clicking on any row in the table opens an editor for the appropriate sequence, scrolling to the specific line with the problem, and highlighting the problematic characters. From there, one can invoke "quick fix" as described

  16. Insertion Sequences

    PubMed Central

    Mahillon, Jacques; Chandler, Michael

    1998-01-01

    Insertion sequences (ISs) constitute an important component of most bacterial genomes. Over 500 individual ISs have been described in the literature to date, and many more are being discovered in the ongoing prokaryotic and eukaryotic genome-sequencing projects. The last 10 years have also seen some striking advances in our understanding of the transposition process itself. Not least of these has been the development of various in vitro transposition systems for both prokaryotic and eukaryotic elements and, for several of these, a detailed understanding of the transposition process at the chemical level. This review presents a general overview of the organization and function of insertion sequences of eubacterial, archaebacterial, and eukaryotic origins with particular emphasis on bacterial elements and on different aspects of the transposition mechanism. It also attempts to provide a framework for classification of these elements by assigning them to various families or groups. A total of 443 members of the collection have been grouped in 17 families based on combinations of the following criteria: (i) similarities in genetic organization (arrangement of open reading frames); (ii) marked identities or similarities in the enzymes which mediate the transposition reactions, the recombinases/transposases (Tpases); (iii) similar features of their ends (terminal IRs); and (iv) fate of the nucleotide sequence of their target sites (generation of a direct target duplication of determined length). A brief description of the mechanism(s) involved in the mobility of individual ISs in each family and of the structure-function relationships of the individual Tpases is included where available. PMID:9729608

  17. Protein identification with N and C-terminal sequence tags in proteome projects.

    PubMed

    Wilkins, M R; Gasteiger, E; Tonella, L; Ou, K; Tyler, M; Sanchez, J C; Gooley, A A; Walsh, B J; Bairoch, A; Appel, R D; Williams, K L; Hochstrasser, D F

    1998-05-08

    Genome sequences are available for increasing numbers of organisms. The proteomes (protein complement expressed by the genome) of many such organisms are being studied with two-dimensional (2D) gel electrophoresis. Here we have investigated the application of short N-terminal and C-terminal sequence tags to the identification of proteins separated on 2D gels. The theoretical N and C termini of 15,519 proteins, representing all SWISS-PROT entries for the organisms Mycoplasma genitalium, Bacillus subtilis, Escherichia coli, Saccharomyces cerevisiae and human, were analysed. Sequence tags were found to be surprisingly specific, with N-terminal tags of four amino acid residues found to be unique for between 43% and 83% of proteins, and C-terminal tags of four amino acid residues unique for between 74% and 97% of proteins, depending on the species studied. Sequence tags of five amino acid residues were found to be even more specific. To utilise this specificity of sequence tags for protein identification, we created a world-wide web-accessible protein identification program, TagIdent (http://www.expasy.ch/www/tools.html), which matches sequence tags of up to six amino acid residues as well as estimated protein pI and mass against proteins in the SWISS-PROT database. We demonstrate the utility of this identification approach with sequence tags generated from 91 different E. coli proteins purified by 2D gel electrophoresis. Fifty-one proteins were unambiguously identified by virtue of their sequence tags and estimated pI and mass, and a further 11 proteins identified when sequence tags were combined with protein amino acid composition data. We conclude that the TagIdent identification approach is best suited to the identification of proteins from prokaryotes whose complete genome sequences are available. The approach is less well suited to proteins from eukaryotes, as many eukaryotic proteins are not amenable to sequencing via Edman degradation, and tag protein
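
    The tag-specificity measurement described in this abstract can be illustrated with a minimal Python sketch. It is not the TagIdent program: the function name `tag_uniqueness` and the toy sequences are hypothetical, and the sketch only counts how many proteins in a set carry a terminal tag that no other protein in the set shares.

```python
from collections import Counter

def tag_uniqueness(sequences, tag_len=4, terminus="N"):
    """Return the fraction of sequences whose terminal tag is unique in the set.

    Hypothetical helper for illustration; sequences shorter than tag_len
    are ignored.
    """
    usable = [s for s in sequences if len(s) >= tag_len]
    if terminus == "N":
        tags = [s[:tag_len] for s in usable]    # first tag_len residues
    else:
        tags = [s[-tag_len:] for s in usable]   # last tag_len residues
    counts = Counter(tags)
    unique = sum(1 for t in tags if counts[t] == 1)
    return unique / len(tags) if tags else 0.0

# Made-up toy sequences, not taken from SWISS-PROT.
toy = ["MKTAYIAK", "MKTALLVR", "MSEQNNTE", "MAHHHHHH"]
n_fraction = tag_uniqueness(toy, tag_len=4, terminus="N")
c_fraction = tag_uniqueness(toy, tag_len=4, terminus="C")
```

    In the toy set, two proteins share the N-terminal tag MKTA, so only half of the N-terminal tags are unique, whereas all four C-terminal tags differ.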

  18. Isolation, characterization, and cDNA sequencing of alpha-1-antiproteinase-like protein from rainbow trout seminal plasma.

    PubMed

    Mak, Monika; Mak, Paweł; Olczak, Mariusz; Szalewicz, Agata; Glogowski, Jan; Dubin, Adam; Watorek, Wiesław; Ciereszko, Andrzej

    2004-03-17

    Seminal plasma of teleost fish contains serine proteinase inhibitors related to those present in blood. These inhibitors can be bound to Q-Sepharose and sequentially eluted with a NaCl gradient. In the present study, using a two-step procedure, we purified (73-fold to homogeneity) and characterized the inhibitor eluted as the second fraction of antitrypsin activity (inhibitor II) from Q-Sepharose. The molecular weight of this inhibitor was estimated to be 56 kDa with an isoelectric point of 5.4. It effectively inhibited trypsin and chymotrypsin but was less effective against elastase. It formed SDS-stable complexes with cod and bovine trypsin. Inhibitor II appeared to be a glycoprotein. Carbohydrate content was determined to be 16%. N-terminal Edman sequencing allowed identification of the first 30 N-terminal amino acids HDGDHAGHTEDHHHHLHHIAGEAHPQHSHG and 25 amino acids within the reactive loop IMPMSLPDTIMLNRPFLLFILEDST. The N-terminal sequence did not match any known sequence; however, the sequence within the reactive loop was significantly similar to carp and mammalian alpha1-antiproteinases. Both sequences were used to construct primers and obtain a cDNA sequence from liver. The mRNA coding the protein is 1675 nt in length including a single open reading frame of 1281 nt that encodes 426 amino acid residues. Analysis of this sequence indicated the presence of putative conserved serpin domains and confirmed the similarity to carp alpha1-antiproteinase and mammalian alpha1-antiproteinase. Our results indicate that inhibitor II belongs to the serpin superfamily and is similar to alpha1-antiproteinase.

  19. Complete primary structure of a Lolium perenne (perennial rye grass) pollen allergen, Lol p III: comparison with known Lol p I and II sequences.

    PubMed

    Ansari, A A; Shenbagamurthi, P; Marsh, D G

    1989-10-17

    The complete amino acid sequence of a Lolium perenne (rye grass) pollen allergen, Lol p III, determined by the automated Edman degradation of the protein and its selected fragments, is reported in this paper. Cleavage by enzymatic and chemical techniques established unambiguously the sequence for this 97-residue protein (Mr = 10,909), which lacks cysteine and shows no evidence of glycosylation. The sequence of Lol p III is very similar to that of another L. perenne allergen, Lol p II, which was sequenced recently; of the 97 positions in the two proteins, 57 are occupied by identical amino acids (59% identity). In addition, both allergens share a similar structure with an antibody-binding fragment of a third L. perenne allergen, Lol p I. Since human antibody responsiveness to all three of these allergens is associated with HLA-DR3, and since the structure common to the three molecules shows high degrees of amphipathicity in Lol p II and III, we speculate that this common segment in the three molecules might contain or contribute to the respective Ia/T-cell sites.

  20. Production, purification, sequencing and activity spectra of mutacins D-123.1 and F-59.1

    PubMed Central

    2011-01-01

    Background: The increase in bacterial resistance to antibiotics impels the development of new anti-bacterial substances. Mutacins (bacteriocins) are small antibacterial peptides produced by Streptococcus mutans showing activity against bacterial pathogens. The objective of the study was to produce and characterise additional mutacins in order to find new useful antibacterial substances. Results: Mutacin F-59.1 was produced in liquid media by S. mutans 59.1 while production of mutacin D-123.1 by S. mutans 123.1 was obtained in semi-solid media. Mutacins were purified by hydrophobic chromatography. The amino acid sequences of the mutacins were obtained by Edman degradation and their molecular mass was determined by mass spectrometry. Mutacin F-59.1 consists of 25 amino acids, containing the YGNGV consensus sequence of pediocin-like bacteriocins with a molecular mass calculated at 2719 Da. Mutacin D-123.1 has a molecular mass (2364 Da) identical to that of mutacin I and shares its first 9 amino acids. Mutacins D-123.1 and F-59.1 have wide activity spectra inhibiting human and food-borne pathogens. The lantibiotic mutacin D-123.1 possesses a broader activity spectrum than mutacin F-59.1 against the bacterial strains tested. Conclusion: Mutacin F-59.1 is the first pediocin-like bacteriocin identified and characterised that is produced by Streptococcus mutans. Mutacin D-123.1 appears to be identical to mutacin I previously identified in different strains of S. mutans. PMID:21477375

  1. Novel proline-hydroxyproline glycopeptides from the dandelion (Taraxacum officinale Wigg.) flowers: de novo sequencing and biological activity.

    PubMed

    Astafieva, Alexandra A; Enyenihi, Atim A; Rogozhin, Eugene A; Kozlov, Sergey A; Grishin, Eugene V; Odintsova, Tatyana I; Zubarev, Roman A; Egorov, Tsezi A

    2015-09-01

    Two novel homologous peptides named ToHyp1 and ToHyp2 that show no similarity to any known proteins were isolated from Taraxacum officinale Wigg. flowers by multidimensional liquid chromatography. Amino acid and mass spectrometry analyses demonstrated that the peptides have an unusual structure: they are cysteine-free, proline-hydroxyproline-rich and post-translationally glycosylated by pentoses, with 5 carbohydrates in ToHyp2 and 10 in ToHyp1. The ToHyp2 peptide with a monoisotopic molecular mass of 4350.3 Da was completely sequenced by a combination of Edman degradation and de novo sequencing via top down multistage collision induced dissociation (CID) and higher energy dissociation (HCD) tandem mass spectrometry (MS(n)). ToHyp2 consists of 35 amino acids and contains 18 proline residues, 8 of which are hydroxylated. The peptide displays antifungal activity and inhibits growth of Gram-positive and Gram-negative bacteria. We further showed that carbohydrate moieties have no significant impact on the peptide structure, but are important for antifungal activity although not absolutely necessary. The deglycosylated ToHyp2 peptide was less active against the susceptible fungus Bipolaris sorokiniana than the native peptide. Unique structural features of the ToHyp2 peptide place it into a new family of plant defense peptides. The discovery of ToHyp peptides in T. officinale flowers expands the repertoire of molecules of plant origin with practical applications.

  2. Purification and sequencing of radish seed calmodulin antagonists phosphorylated by calcium-dependent protein kinase.

    PubMed Central

    Polya, G M; Chandra, S; Condron, R

    1993-01-01

    A family of radish (Raphanus sativus) calmodulin antagonists (RCAs) was purified from seeds by extraction, centrifugation, batch-wise elution from carboxymethyl-cellulose, and high performance liquid chromatography (HPLC) on an SP5PW cation-exchange column. This RCA fraction was further resolved into three calmodulin antagonist polypeptides (RCA1, RCA2, and RCA3) by denaturation in the presence of guanidinium HCl and mercaptoethanol and subsequent reverse-phase HPLC on a C8 column eluted with an acetonitrile gradient in the presence of 0.1% trifluoroacetic acid. The RCA preparation, RCA1, RCA2, RCA3, and other radish seed proteins are phosphorylated by wheat embryo Ca(2+)-dependent protein kinase (CDPK). The RCA preparation contains other CDPK substrates in addition to RCA1, RCA2, and RCA3. The RCA preparation, RCA1, RCA2, and RCA3 inhibit chicken gizzard calmodulin-dependent myosin light chain kinase assayed with a myosin-light chain-based synthetic peptide substrate (fifty percent inhibitory concentrations of RCA2 and RCA3 are about 7 and 2 microM, respectively). N-terminal sequencing by sequential Edman degradation of RCA1, RCA2, and RCA3 revealed sequences having a high homology with the small subunit of the storage protein napin from Brassica napus and with related proteins. The deduced amino acid sequences of RCA1, RCA2, RCA3, and RCA3' (a subform of RCA3) agree with average molecular masses from electrospray mass spectrometry of 4537, 4543, 4532, and 4560 Da, respectively. The only sites for serine phosphorylation are near or at the C termini and hence adjacent to the sites of proteolytic precursor cleavage. PMID:8278508

  3. The sequence of sequencers: The history of sequencing DNA.

    PubMed

    Heather, James M; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way.

  4. Amino acid sequence, S-S bridge arrangement and distribution in plant tissues of thionins from Viscum album.

    PubMed

    Orrù, S; Scaloni, A; Giannattasio, M; Urech, K; Pucci, P; Schaller, G

    1997-09-01

    The complete primary structure of a cytotoxic 5 kDa polypeptide, viscotoxin A1, isolated from Viscum album L., has been determined by combining classical Edman degradation methodology with advanced mass spectrometric procedures. The same integrated approach allowed correction of the sequence of viscotoxin A2 and definition of the pattern of the disulfide bridges. The arrangement of the cysteine pairing was determined as Cys3-Cys40, Cys4-Cys32 and Cys16-Cys26. The primary structure of viscotoxin A1 shares a high degree of similarity with the known viscotoxins and more generally with the plant alpha- and beta-thionins. The pattern of S-S bridges determined for viscotoxin A2 and A1 is similar to that inferred by X-ray and NMR analysis in crambin and related to that present in alpha-purothionin and beta-hordothionin, thus indicating a highly conserved organization of the S-S pairings within the entire family. This arrangement of S-S bridges describes a peculiar structural motif, indicated as 'concentric motif', which is suggested to stabilize a common structure occurring in various small proteins able to interact with cell membranes. The distribution of the new variant toxin in different mistletoe subspecies was investigated. Viscotoxin A1 is abundant in the seeds of the three European subspecies of V. album whereas it represents a minor component in the shoots.

  5. Hydrogen ion titration of 12 S rape seed protein and partial N-terminal sequence of one of its subunits.

    PubMed

    Bhushan, R; Mahesh, V K; Mallikharjun, P V

    1989-10-01

    The high molecular weight 12 S protein from rape seed was isolated in a homogeneous form and characterized. Six subunits were isolated by PAGE in the presence of SDS and 0.2 M 2-mercaptoethanol. These subunits (s1 to s6) were found in the protein in the weight ratio of 1.32:1.2:1.15:1.0:1.21:1.11. The molecular weights and first two N-terminal amino acids of the isolated subunits were 64,800 and phenylalanine, alanine (s1), 50,650 and valine, tyrosine (s2), 42,500 and phenylalanine, leucine (s3), 28,800 and threonine, glutamic acid (s4), 19,100 and cystine, isoleucine (s5) and 15,600 and alanine, phenylalanine (s6). The numbers of side chain carboxyl, imidazole and epsilon-amino groups were calculated from the hydrogen ion titrations and were in agreement with the amino acid assay. In addition, the N-terminal amino acid sequence of up to 43 residues for one subunit (s6), determined using Edman degradation, is reported.

  6. Isolation, amino acid sequence and biological characterization of an "aspartic-49" phospholipase A₂ from Bothrops (Rhinocerophis) ammodytoides venom.

    PubMed

    Clement, Herlinda; Costa de Oliveira, Vanessa; Zamudio, Fernando Z; Lago, Néstor R; Valdez-Cruz, Norma A; Bérnard Valle, Melisa; Hajos, Silvia E; Alagón, Alejandro; Possani, Lourival D; de Roodt, Adolfo R

    2012-12-01

    A phospholipase enzyme was separated by chromatography from the venom of the snake Bothrops (Rhinocerophis) ammodytoides and characterized. The experimentally determined molecular weight was 13,853.65 Da, and the full primary structure was determined by Edman degradation and mass spectrometry analysis. The enzyme contains 122 amino acid residues closely stabilized by 7 disulfide bridges with an isoelectric point of 6.13. Sequence comparison with other known secretory PLA2 shows that the isolated enzyme belongs to group II, presenting an aspartic acid residue at position 48 (numbered by convention as Asp49) of the active site, and accordingly displaying enzymatic activity. The enzyme corresponds to 3% of the total mass of the venom. The enzyme is mildly toxic to mice. The intravenous LD₅₀ of this phospholipase in CD-1 mice was around 6 μg/g of mouse body weight (more exactly 117 μg/mouse of 20 g) and the minimal mortal dose (MMD) was estimated to be close to 10 μg/g. In contrast, the LD₅₀ of the venom was circa 2 μg/g mouse body weight. Toxicological analyses of the purified enzyme were performed in vitro and in vivo using experimental animals (mice and rats). The enzyme at high doses caused pulmonary congestion, intraperitoneal bleeding, inhibition of clot retraction and muscle tissue alterations with increased creatine kinase levels.

  7. Contrasting Sequence Groups by Emerging Sequences

    NASA Astrophysics Data System (ADS)

    Deng, Kang; Zaïane, Osmar R.

    Group comparison per se is a fundamental task in many scientific endeavours but is also the basis of any classifier. Contrast sets and emerging patterns contrast between groups of categorical data. Comparing groups of sequence data is a relevant task in many applications. We define Emerging Sequences (ESs) as subsequences that are frequent in sequences of one group and less frequent in the sequences of another, and thus distinguishing or contrasting sequences of different classes. There are two challenges to distinguish sequence classes: the extraction of ESs is not trivially efficient and only exact matches of sequences are considered. In our work we address those problems by a suffix tree-based framework and a similar matching mechanism. We propose a classifier based on Emerging Sequences. Evaluating against two learning algorithms based on frequent subsequences and exact matching subsequences, the experiments on two datasets show that our model outperforms the baseline approaches by up to 20% in prediction accuracy.
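
    The definition of an Emerging Sequence above can be illustrated with a brute-force sketch. This is not the authors' suffix tree-based framework: it simply enumerates short subsequences and keeps those that are frequent in one group and rare in the other. The function names, the thresholds `min_support` and `max_neg_support`, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations

def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(symbol in it for symbol in pattern)

def support(pattern, group):
    """Fraction of sequences in group that contain pattern."""
    return sum(is_subsequence(pattern, s) for s in group) / len(group)

def candidate_patterns(group, max_len):
    """All distinct subsequences of length 1..max_len drawn from group."""
    patterns = set()
    for seq in group:
        for k in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), k):
                patterns.add("".join(seq[i] for i in idx))
    return patterns

def emerging_sequences(pos, neg, min_support=0.6, max_neg_support=0.2, max_len=3):
    """Patterns frequent in pos and infrequent in neg (hypothetical thresholds)."""
    result = {}
    for p in candidate_patterns(pos, max_len):
        sp, sn = support(p, pos), support(p, neg)
        if sp >= min_support and sn <= max_neg_support:
            result[p] = (sp, sn)
    return result

# Toy data: two small groups of symbol strings.
pos = ["ABCD", "ABD", "ACBD"]
neg = ["DCBA", "BDA", "CAB"]
es = emerging_sequences(pos, neg)
print(sorted(es))
```

    Exhaustive enumeration grows quickly with sequence length, which is the efficiency problem the abstract says the suffix tree-based approach is meant to address.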

  8. Shotgun protein sequencing.

    SciTech Connect

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternative spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.
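
    The idea of rebuilding a protein sequence from overlapping peptides can be sketched with a greedy overlap merge. The report's actual algorithm is not described in this abstract, so the code below is a generic stand-in; the function names, the `min_len` overlap threshold, and the example peptides are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(peptides, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = sorted(set(peptides))
    # Drop peptides fully contained in a longer peptide.
    frags = [p for p in frags if not any(p != q and p in q for q in frags)]
    while len(frags) > 1:
        best_k, best_a, best_b = 0, None, None
        for a in frags:
            for b in frags:
                if a != b:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_a, best_b = k, a, b
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        frags.remove(best_a)
        frags.remove(best_b)
        frags.append(best_a + best_b[best_k:])
    return frags

# Two hypothetical digests of the same protein, cut at different sites.
digest_1 = ["MKTAYIAK", "QRQISFVK", "SHFSRQ"]
digest_2 = ["MKTAYIAKQRQ", "ISFVKSHFSRQ"]
contigs = greedy_assemble(digest_1 + digest_2)
print(contigs)
```

    Greedy merging can misassemble sequences that contain repeats; it is used here only because it makes the overlap principle easy to see.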

  9. Isolation, Amino Acid Sequence and Biological Activities of Novel Long-Chain Polyamine-Associated Peptide Toxins from the Sponge Axinyssa aculeata

    PubMed Central

    Matsunaga, Satoko; Jimbo, Mitsuru; Gill, Martin B.; Lash-Van Wyhe, L. Leanne; Murata, Michio; Nonomura, Ken’ichi; Swanson, Geoffrey T.

    2012-01-01

    A novel family of functionalized peptide toxins, aculeines (ACUs), was isolated from the marine sponge Axinyssa aculeata. ACUs are polypeptides with N-terminal residues that are modified by the addition of long-chain polyamines (LCPA). Aculeines were present in the sponge extract as a complex mixture with differing polyamine chain lengths and peptide structures. ACU-A and B, which were purified in this study, share a common polypeptide chain but differ in their N-terminal residue modifications. The amino acid sequence of the polypeptide portion of ACU-A and B was deduced from 3′ and 5′ RACE, and supported by Edman degradation and mass spectral analysis of peptide fragments. ACU induced convulsions upon intracerebroventricular (i.c.v.) injection in mice, and disrupted neuronal membrane integrity in electrophysiological assays. ACU also lysed erythrocytes with a potency that differed between animal species. Here we describe the isolation, amino acid sequence, and biological activity of this new group of cytotoxic sponge peptides. PMID:21830292

  10. Complete amino acid sequence of an acidic, cardiotoxic phospholipase A2 from the venom of Ophiophagus hannah (King Cobra): a novel cobra venom enzyme with "pancreatic loop".

    PubMed

    Huang, M Z; Gopalakrishnakone, P; Chung, M C; Kini, R M

    1997-02-15

    A phospholipase A2 (OHV A-PLA2) from the venom of Ophiophagus hannah (King cobra) is an acidic protein exhibiting cardiotoxicity, myotoxicity, and antiplatelet activity. The complete amino acid sequence of OHV A-PLA2 has been determined using a combination of Edman degradation and mass spectrometric techniques. OHV A-PLA2 is composed of a single chain of 124 amino acid residues with 14 cysteines and a calculated molecular weight of 13719 Da. It contains the loop of residues (62-66) found in pancreatic PLA2s and hence belongs to class IB enzymes. This pancreatic loop is between two proline residues (Pro 59 and Pro 68) and contains several hydrophilic amino acids (Ser and Asp). This region has a high degree of conformational flexibility and is on the surface of the molecule, and hence it may be a potential protein-protein interaction site. A relatively low sequence homology is found between OHV A-PLA2 and other known cardiotoxic PLA2s, and hence a contiguous segment could not be identified as a site responsible for the cardiotoxic activity.

  11. De novo sequencing and characterization of a novel Bowman-Birk inhibitor from Lathyrus sativus L. seeds by electrospray mass spectrometry.

    PubMed

    Tamburino, Rachele; Severino, Valeria; Sandomenico, Annamaria; Ruvo, Menotti; Parente, Augusto; Chambery, Angela; Di Maro, Antimo

    2012-10-30

    Bowman-Birk serine protease inhibitors (BBIs) from legume seeds are small proteins showing a two-head structure with distinct reactive site loops, which inhibit two molecules of the same enzyme or two different proteases. Purification and characterization of new BBIs is of broad interest for understanding the basic molecular mechanisms underlying natural defence against the action of proteolytic enzymes. In this study, two novel acidic BBIs (LSI-1a and LSI-2a) were isolated from L. sativus seeds using classical biochemical techniques and characterized for their inhibitory activity. In addition, the N-terminal sequencing of LSI-1a was performed by Edman degradation up to residue 10 and the complete primary structure of the most abundant form (LSI-2a) was determined by using a combination of mass spectrometry approaches, including MALDI-TOF MS, tandem MS and Electron Transfer Dissociation coupled with Proton Transfer Reaction (ETD/PTR) top-down sequencing of N- and C-termini. Furthermore, the LSI-2a dimerization surface has also been investigated by a combination of gel filtration, electrophoretic techniques and homology modelling. Knowing the structure of small proteins inhibiting proteolytic enzymes is of general importance for understanding the defence mechanisms against degradation for their use in biological applications as well as for designing artificial inhibitors.

  12. The sequence of sequencers: The history of sequencing DNA

    PubMed Central

    Heather, James M.; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. PMID:26554401

  13. Multimodal sequence learning.

    PubMed

    Kemény, Ferenc; Meier, Beat

    2016-02-01

    While sequence learning research models complex phenomena, previous studies have mostly focused on unimodal sequences. The goal of the current experiment is to put implicit sequence learning into a multimodal context: to test whether it can operate across different modalities. We used the Task Sequence Learning paradigm to test whether sequence learning varies across modalities, and whether participants are able to learn multimodal sequences. Our results show that implicit sequence learning is very similar regardless of the source modality. However, the presence of correlated task and response sequences was required for learning to take place. The experiment provides new evidence for implicit sequence learning of abstract conceptual representations. In general, the results suggest that correlated sequences are necessary for implicit sequence learning to occur. Moreover, they show that elements from different modalities can be automatically integrated into one unitary multimodal sequence.

  14. Coordinate cytokine regulatory sequences

    DOEpatents

    Frazer, Kelly A.; Rubin, Edward M.; Loots, Gabriela G.

    2005-05-10

    The present invention provides CNS sequences that regulate the cytokine gene expression, expression cassettes and vectors comprising or lacking the CNS sequences, host cells and non-human transgenic animals comprising the CNS sequences or lacking the CNS sequences. The present invention also provides methods for identifying compounds that modulate the functions of CNS sequences as well as methods for diagnosing defects in the CNS sequences of patients.

  15. Amino acid sequence and chemical modification of a novel alpha-neurotoxin (Oh-5) from king cobra (Ophiophagus hannah) venom.

    PubMed

    Lin, S R; Leu, L F; Chang, L S; Chang, C C

    1997-04-01

    A novel alpha-neurotoxin, Oh-5, was isolated from king cobra (Ophiophagus hannah) venom and purified by successive SP-Sephadex C-25 column chromatography and reversed-phase HPLC. The complete sequence of Oh-5 was determined by Edman degradation of peptide fragments generated by endopeptidases, i.e., trypsin, Staphylococcus aureus V8 protease and lysyl endopeptidase. This novel toxin comprises 72 amino acid residues with 10 cysteines. The sequence shows 89% sequence homology with Oh-4, and 60% with Toxins a and b from the same venom. The tyrosine, tryptophan, lysine and arginine residues in Oh-5 were modified with tetranitromethane (TNM), 2-nitrophenylsulfenyl (NPS) chloride, trinitrobenzene sulfonate (TNBS), and p-hydroxyphenylglyoxal (HPG), respectively. Modification of Tyr-4 or Trp-27 did not affect the lethal toxicity at all, while the Tyr-4 and 23 nitrated derivative retained about 50% of the lethality of native toxin. Selective trinitrophenylation of Lys-51 or 69 resulted in a decrease in lethality by 29%, and 50% lethality was retained after modification of Lys-2, 51, and 69. A drastic decrease in lethality to 26% was observed when both Arg-35 and 37 were modified. The neurotoxicity was further decreased when Arg-9 was additionally modified. These results suggest that the aromatic residues, Tyr-4 and Trp-27, are not crucial for the neurotoxicity, whereas the cationic residues are involved in multipoint contact between the toxin molecule and the nicotinic acetylcholine receptor (nAChR). The residues Tyr-23 and Arg-35 and 37 in the central loop of Oh-5 seem to contribute greatly to the neurotoxicity.

  16. MRO Sequence Checking Tool

    NASA Technical Reports Server (NTRS)

    Fisher, Forest; Gladden, Roy; Khanampornpan, Teerapat

    2008-01-01

    The MRO Sequence Checking Tool program, mro_check, automates significant portions of the MRO (Mars Reconnaissance Orbiter) sequence checking procedure. Though MRO has similar checks to the ODY's (Mars Odyssey) Mega Check tool, the checks needed for MRO are unique to the MRO spacecraft. The MRO sequence checking tool automates the majority of the sequence validation procedure and check lists that are used to validate the sequences generated by MRO MPST (mission planning and sequencing team). The tool performs more than 50 different checks on the sequence. The automation varies from summarizing data about the sequence needed for visual verification of the sequence, to performing automated checks on the sequence and providing a report for each step. To allow for the addition of new checks as needed, this tool is built in a modular fashion.

  17. The Connell Sum Sequence

    NASA Astrophysics Data System (ADS)

    Bullington, Grady D.

    2007-01-01

    The Connell sum sequence refers to the partial sums of the Connell sequence. In this paper, the Connell sequence, Connell sum sequence and generalizations from Iannucci and Mills-Taylor are interpreted as sums of elements of triangles, relating them to polygonal number-stuttered arithmetic progressions. The n-th element of the Connell sum sequence is established as a sharp upper bound for the value of a gamma-labeling of a graph of size n. The limiting behavior and an explicit formula for the Connell (m,r)-sum sequence are also given.
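
    For readers unfamiliar with the objects named above, the classical Connell sequence (one odd number, then two evens, then three odds, and so on) and its partial sums can be generated as follows. This sketch covers only the basic sequence, not the (m,r) generalizations; the closed form used as a cross-check is a standard expression for the Connell sequence and is not taken from the paper.

```python
from itertools import accumulate, islice
from math import isqrt

def connell():
    """Yield the Connell sequence: 1 odd, 2 evens, 3 odds, 4 evens, ..."""
    value, run = 1, 1
    while True:
        for _ in range(run):
            yield value
            value += 2
        value -= 1  # step by 1 instead of 2 to switch parity for the next run
        run += 1

def connell_closed_form(n):
    """n-th Connell term (n >= 1): 2n - floor((1 + sqrt(8n - 7)) / 2)."""
    return 2 * n - (1 + isqrt(8 * n - 7)) // 2

terms = list(islice(connell(), 10))
sums = list(accumulate(terms))  # Connell sum sequence (partial sums)
print(terms)  # [1, 2, 4, 5, 7, 9, 10, 12, 14, 16]
print(sums)   # [1, 3, 7, 12, 19, 28, 38, 50, 64, 80]
```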

  18. Genome Sequence Databases (Overview): Sequencing and Assembly

    SciTech Connect

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (approximately 1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  19. Homology of the NH2-terminal amino acid sequences of the heavy and light chains of human monoclonal lupus autoantibodies containing the dominant 16/6 idiotype.

    PubMed Central

    Atkinson, P M; Lampman, G W; Furie, B C; Naparstek, Y; Schwartz, R S; Stollar, B D; Furie, B

    1985-01-01

    The NH2-terminal amino acid sequences have been determined by automated Edman degradation for the heavy and light chains of five monoclonal IgM anti-DNA autoantibodies that were produced by human-human hybridomas derived from lymphocytes of two patients with systemic lupus erythematosus. Four of the antibodies were closely related to the idiotype system 16/6, whereas the fifth antibody was unrelated idiotypically. The light chains of the 16/6 idiotype-positive autoantibodies (HF2-1/13b, HF2-1/17, HF2-18/2, and HF3-16/6) had identical amino acid sequences from residues 1 to 40. Their framework structures were characteristic of VKI light chains. The light chain of the 16/6 idiotype-negative autoantibody HF6-21/28 was characteristic of the VKII subgroup. The heavy chains of the 16/6 idiotype-positive autoantibodies had nearly identical amino acid sequences from residues 1 to 40. The framework structures were characteristic of the VHIII subgroup. In contrast, the GM4672 fusion partner of the hybridoma produced small quantities of an IgG with a VHI heavy chain and a VKI light chain. The heavy chains of the lupus autoantibodies and the light chains of those autoantibodies that were idiotypically related to the 16/6 system had marked sequence homology with WEA, a Waldenstrom IgM that binds to Klebsiella polysaccharides and expresses the 16/6 idiotype. These results indicate a striking homology in the amino termini of the heavy and light chains of the lupus autoantibodies studied and suggest that the V regions of the heavy and light chains of the 16/6 idiotype-positive DNA-binding lupus auto-antibodies are each encoded by a single germ line gene. PMID:3921567

  20. Automated DNA Sequencing System

    SciTech Connect

    Armstrong, G.A.; Ekkebus, C.P.; Hauser, L.J.; Kress, R.L.; Mural, R.J.

    1999-04-25

    Oak Ridge National Laboratory (ORNL) is developing a core DNA sequencing facility to support biological research endeavors at ORNL and to conduct basic sequencing automation research. This facility is novel because its development is based on existing standard biology laboratory equipment; thus, the development process is of interest to the many small laboratories trying to use automation to control costs and increase throughput. Before automation, biology laboratory personnel purified DNA, completed cycle sequencing, and prepared 96-well sample plates with commercially available hardware designed specifically for each step in the process. Following purification and thermal cycling, an automated sequencing machine was used for the sequencing. A technician handled all movement of the 96-well sample plates between machines. To automate the process, ORNL is adding a CRS Robotics A-465 arm, ABI 377 sequencing machine, automated centrifuge, automated refrigerator, and possibly an automated SpeedVac. The entire system will be integrated with one central controller that will direct each machine and the robot. The goal of this system is to completely automate the sequencing procedure from bacterial cell samples through ready-to-be-sequenced DNA and ultimately to completed sequence. The system will be flexible and will accommodate different chemistries than existing automated sequencing lines. The system will be expanded in the future to include colony picking and/or actual sequencing. This discrete event, DNA sequencing system will demonstrate that smaller sequencing labs can achieve cost-effective the laboratory grow.

  1. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-02-20

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  2. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-01-01

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  3. Sequence information signal processor

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1999-01-01

    An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
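
    The comparison the circuit performs, scoring the best-matching segments ending at each pair of elements and reporting the highest score, resembles local sequence alignment by dynamic programming. The software sketch below follows that general idea in the style of the Smith-Waterman recurrence; the abstract does not give the patent's scoring scheme, so the `match`, `mismatch` and `gap` values and the function name are assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # scores for the previous element of a
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Score of the best segment ending at this pair of elements;
            # 0 means "start a new segment here".
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

print(local_alignment_score("TTACGTT", "GGACGGG"))  # shared segment "ACG"
```

    In the hardware described above, each processor would hold one element of one sequence while the other sequence streams past, so the cells on one anti-diagonal of this table can be computed in parallel; the nested loops here compute the same cells serially.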

  4. Nonparametric Combinatorial Sequence Models

    NASA Astrophysics Data System (ADS)

    Wauthier, Fabian L.; Jordan, Michael I.; Jojic, Nebojsa

    This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology but methodologies for analyzing them are still being developed. This paper presents a nonparametric prior on sequences which allows combinatorial structures to emerge and which induces a posterior distribution over factorized sequence representations. We carry out experiments on three sequence datasets which indicate that combinatorial structures are indeed present and that combinatorial sequence models can more succinctly describe them than simpler mixture models. We conclude with an application to MHC binding prediction which highlights the utility of the posterior distribution induced by the prior. By integrating out the posterior our method compares favorably to leading binding predictors.

  5. Roles of repetitive sequences

    SciTech Connect

    Bell, G.I.

    1991-12-31

    The DNA of higher eukaryotes contains many repetitive sequences. The study of repetitive sequences is important, not only because many have important biological function, but also because they provide information on genome organization, evolution and dynamics. In this paper, I will first discuss some generic effects that repetitive sequences will have upon genome dynamics and evolution. In particular, it will be shown that repetitive sequences foster recombination among, and turnover of, the elements of a genome. I will then consider some examples of repetitive sequences, notably minisatellite sequences and telomere sequences as examples of tandem repeats, without and with respectively known function, and Alu sequences as an example of interspersed repeats. Some other examples will also be considered in less detail.

  6. DNA sequencing conference, 2

    SciTech Connect

    Cook-Deegan, R.M.; Venter, J.C.; Gilbert, W.; Mulligan, J.; Mansfield, B.K.

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  7. Schur monotone decreasing sequences

    NASA Astrophysics Data System (ADS)

    Ganikhodjaev, Rasul; Saburov, Mansoor; Saburov, Khikmat

    2013-09-01

    In this paper, we introduce Schur monotone decreasing sequences in an n-dimensional space by considering a majorization pre-order. By means of down arrow mappings, we study omega limiting points of bounded Schur monotone decreasing sequences. We provide convergence criteria for such kinds of sequences. We prove that a Cesaro mean (or an arithmetic mean) of any bounded Schur monotone decreasing sequence converges to a unique limiting point.

  8. Career Academy Course Sequences.

    ERIC Educational Resources Information Center

    Markham, Thom; Lenz, Robert

    This career academy course sequence guide is designed to give teachers a quick overview of the course sequences of well-known career academy and career pathway programs from across the country. The guide presents a variety of sample course sequences for the following academy themes: (1) arts and communication; (2) business and finance; (3)…

  9. Low autocorrelation binary sequences

    NASA Astrophysics Data System (ADS)

    Packebusch, Tom; Mertens, Stephan

    2016-04-01

    Binary sequences with minimal autocorrelations have applications in communication engineering, mathematics and computer science. In statistical physics they appear as groundstates of the Bernasconi model. Finding these sequences is a notoriously hard problem that so far can be solved only by exhaustive search. We review recent algorithms and present a new algorithm that finds optimal sequences of length N in time O(N · 1.73^N). We computed all optimal sequences for N ≤ 66 and all optimal skewsymmetric sequences for N ≤ 119.
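
    To make the search problem concrete, the sketch below finds a minimum-energy ±1 sequence for small N by naive enumeration, where the energy is the sum of squared off-peak aperiodic autocorrelations (the Bernasconi model energy). This is plain brute force with cost on the order of N² · 2^N, not the algorithm presented in the paper; the function names are illustrative.

```python
from itertools import product

def energy(s):
    """Sum over lags k = 1..N-1 of the squared aperiodic autocorrelation C_k."""
    n = len(s)
    return sum(
        sum(s[i] * s[i + k] for i in range(n - k)) ** 2
        for k in range(1, n)
    )

def best_sequence(n):
    """Exhaustively search all +/-1 sequences of length n for minimum energy."""
    best_e, best_s = None, None
    for tail in product((1, -1), repeat=n - 1):
        s = (1,) + tail  # fix the first element: a global sign flip keeps the energy
        e = energy(s)
        if best_e is None or e < best_e:
            best_e, best_s = e, s
    return best_e, best_s

for n in (5, 7, 13):
    e, s = best_sequence(n)
    print(n, e, n * n / (2 * e))  # length, minimum energy, merit factor N^2 / (2E)
```

    For these three lengths the minimum is reached by Barker sequences, whose off-peak autocorrelations all have magnitude at most 1.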

  10. HIV Sequence Databases

    PubMed Central

    Kuiken, Carla; Korber, Bette; Shafer, Robert W.

    2008-01-01

    Two important databases are often used in HIV genetic research, the HIV Sequence Database in Los Alamos, which collects all sequences and focuses on annotation and data analysis, and the HIV RT/Protease Sequence Database in Stanford, which collects sequences associated with the development of viral resistance against anti-retroviral drugs and focuses on analysis of those sequences. The types of data and services these two databases offer, the tools they provide, and the way they are set up and operated are described in detail. PMID:12875108

  11. Purification and N-terminal amino acid sequence comparisons of structural proteins from retrovirus-D/Washington and Mason-Pfizer monkey virus.

    PubMed Central

    Henderson, L E; Sowder, R; Smythers, G; Benveniste, R E; Oroszlan, S

    1985-01-01

    A new D-type retrovirus originally designated SAIDS-D/Washington and here referred to as retrovirus-D/Washington (R-D/W) was recently isolated at the University of Washington Primate Center, Seattle, Wash., from a rhesus monkey with an acquired immunodeficiency syndrome and retroperitoneal fibromatosis. To better establish the relationship of this new D-type virus to the prototype D-type virus, Mason-Pfizer monkey virus (MPMV), we have purified and compared six structural proteins from each virus. The proteins purified from each D-type retrovirus include p4, p10, p12, p14, p27, and a phosphoprotein designated pp18 for MPMV and pp20 for R-D/W. Amino acid analysis and N-terminal amino acid sequence analysis show that the p4, p12, p14, and p27 proteins of R-D/W are distinct from the homologous proteins of MPMV but that these proteins from the two different viruses share a high degree of amino acid sequence homology. The p10 proteins from the two viruses have similar amino acid compositions, and both are blocked to N-terminal Edman degradation. The phosphoproteins from the two viruses each contain phosphoserine but are different from each other in amino acid composition, molecular weight, and N-terminal amino acid sequence. The data thus show that each of the R-D/W proteins examined is distinguishable from its MPMV homolog and that a major difference between these two D-type retroviruses is found in the viral phosphoproteins. The N-terminal amino acid sequences of D-type retroviral proteins were used to search for sequence homologies between D-type and other retroviral amino acid sequences. An unexpected amino acid sequence homology was found between R-D/W pp20 (a gag protein) and a 28-residue segment of the env precursor polyprotein of Rous sarcoma virus. The N-terminal amino acid sequences of the D-type major gag protein (p27) and the nucleic acid-binding protein (p14) show only limited amino acid sequence homology to functionally homologous proteins of C

  12. Computer assisted multiplex sequencing

    SciTech Connect

    Church, G.M.

    1992-08-01

    The objectives of this project are automation and optimization of multiplex sequencing. This year we have integrated direct transfer electrophoresis, automated multiplex hybridizations and automated film reading and applied this toward sequencing of three contiguous E. coli cosmids. Primers for the directed dideoxy sequence walking and sequence confirmation steps were synthesized with a 15 base tag complementary to an alkaline phosphatase conjugate. A higher throughput synthesis device is well along in testing as are new automated hybridization devices. We have developed software for automatically annotating ORFs and databases of precise termini of proteins and RNA.

  13. Amino acid sequence and posttranslational modifications of human factor VIIa from plasma and transfected baby hamster kidney cells

    SciTech Connect

    Thim, L.; Bjoern, S.; Christensen, M.; Nicolaisen, E.M.; Lund-Hansen, T.; Pedersen, A.H.; Hedner, U.

    1988-10-04

    Blood coagulation factor VII is a vitamin K dependent glycoprotein which in its activated form, factor VIIa, participates in the coagulation process by activating factor X and/or factor IX in the presence of Ca²⁺ and tissue factor. Three types of potential posttranslational modifications exist in the human factor VIIa molecule, namely, 10 γ-carboxylated, N-terminally located glutamic acid residues, 1 β-hydroxylated aspartic acid residue, and 2 N-glycosylated asparagine residues. In the present study, the amino acid sequence and posttranslational modifications of recombinant factor VIIa as purified from the culture medium of a transfected baby hamster kidney cell line have been compared to human plasma factor VIIa. By use of HPLC, amino acid analysis, peptide mapping, and automated Edman degradation, the protein backbone of recombinant factor VIIa was found to be identical with human factor VIIa. Asparagine residues 145 and 322 were found to be fully N-glycosylated in human plasma factor VIIa. In the recombinant factor VIIa, asparagine residue 322 was fully glycosylated whereas asparagine residue 145 was only partially (approximately 66%) glycosylated. Besides minor differences in the sialic acid and fucose contents, the overall carbohydrate compositions were nearly identical in recombinant factor VIIa and human plasma factor VIIa. These results show that factor VIIa as produced in the transfected baby hamster kidney cells is very similar to human plasma factor VIIa and that this cell line thus might represent an alternative source for human factor VIIa.

  14. cDNA and deduced amino acid sequence of human pulmonary surfactant-associated proteolipid SPL(Phe)

    SciTech Connect

    Glasser, S.W.; Korfhagen, T.R.; Weaver, T.; Pilot-Matias, T.; Fox, J.L.; Whitsett, J.A.

    1987-06-01

    Hydrophobic surfactant-associated protein of Mᵣ 6000-14,000 was isolated from ether/ethanol or chloroform/methanol extracts of mammalian pulmonary surfactant. Automated Edman degradation in a gas-phase sequencer showed the major N-terminus of the human low molecular weight protein to be Phe-Pro-Ile-Pro-Leu-Pro-Tyr-Cys-Trp-Leu-Cys-Arg-Ala-Leu-. Because of the N-terminal phenylalanine, the surfactant protein was designated SPL(Phe). Antiserum generated against hydrophobic surfactant protein(s) from bovine pulmonary surfactant recognized protein of Mᵣ 6000-14,000 in immunoblot analysis and was used to screen a λgt11 expression library constructed from adult human lung poly(A)⁺ RNA. This resulted in identification of a 1.4-kilobase cDNA clone that was shown to encode the N-terminus of the surfactant polypeptide SPL(Phe) (Phe-Pro-Ile-Pro-Leu-Pro-) within an open reading frame for a larger protein. Expression of a fused β-galactosidase-SPL(Phe) gene in Escherichia coli yielded an immunoreactive Mᵣ 34,000 fusion peptide. Hybrid-arrested translation with the cDNA and immunoprecipitation of [³⁵S]methionine-labeled in vitro translation products of human poly(A)⁺ RNA with a surfactant polyclonal antibody resulted in identification of a Mᵣ 40,000 precursor protein. Blot hybridization analysis of electrophoretically fractionated RNA from human lung detected a 2.0-kilobase RNA that was more abundant in adult lung than in fetal lung. These proteins, and specifically SPL(Phe), may therefore be useful for synthesis of replacement surfactants for treatment of hyaline membrane disease in newborn infants or of other surfactant-deficient states.

  15. Sequences, Series, and Mathematica.

    ERIC Educational Resources Information Center

    Mathews, John H.

    1992-01-01

    Describes how the computer algebra system Mathematica can be used to enhance the teaching of the topics of sequences and series. Examines its capabilities to find exact, approximate, and graphically generated approximate solutions to problems from these topics and to understand proofs about sequences. (MDH)

  16. Can sequence determine function?

    PubMed Central

    Gerlt, John A; Babbitt, Patricia C

    2000-01-01

    The functional annotation of proteins identified in genome sequencing projects is based on similarities to homologs in the databases. As a result of the possible strategies for divergent evolution, homologous enzymes frequently do not catalyze the same reaction, and we conclude that assignment of function from sequence information alone should be viewed with some skepticism. PMID:11178260

  17. Lichenase and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-15

    The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-β-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.

  18. Biotools: Patenting DNA sequences

    SciTech Connect

    Yablonsky, M.D.; Hone, W.J.

    1995-07-01

    The decision, known as In re Deuel², rejects the PTO's interpretation of a previous decision of the Federal Circuit and makes it more likely that a "nucleic acid of a particular sequence", commonly known as a gene sequence, may be patentable. 15 refs.

  19. Sequences for Student Investigation

    ERIC Educational Resources Information Center

    Barton, Jeffrey; Feil, David; Lartigue, David; Mullins, Bernadette

    2004-01-01

    We describe two classes of sequences that give rise to accessible problems for undergraduate research. These problems may be understood with virtually no prerequisites and are well suited for computer-aided investigation. The first sequence is a variation of one introduced by Stephen Wolfram in connection with his study of cellular automata. The…

  20. Cosmetology: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a cosmetology vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  1. Agriculture: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 3-year program in agriculture. The guide consists of a course description; general course objectives;…

  2. Sequencing the maize genome.

    PubMed

    Martienssen, Robert A; Rabinowicz, Pablo D; O'Shaughnessy, Andrew; McCombie, W Richard

    2004-04-01

    Sequencing of complex genomes can be accomplished by enriching shotgun libraries for genes. In maize, gene-enrichment by copy-number normalization (high C(0)t) and methylation filtration (MF) have been used to generate up to two-fold coverage of the gene-space with less than 1 million sequencing reads. Simulations using sequenced bacterial artificial chromosome (BAC) clones predict that 5x coverage of gene-rich regions, accompanied by less than 1x coverage of subclones from BAC contigs, will generate high-quality mapped sequence that meets the needs of geneticists while accommodating unusually high levels of structural polymorphism. By sequencing several inbred strains, we propose a strategy for capturing this polymorphism to investigate hybrid vigor or heterosis.

  3. Sequence History Update Tool

    NASA Technical Reports Server (NTRS)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

    The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there are also considerable time and effort savings. With the Sequence History Update Tool, what previously took minutes is now done in less than 30 seconds, and the tool now provides a more accurate archival record of the sequence commanding for MRO.

  4. Next-Generation Sequencing.

    PubMed

    Le Gallo, Matthieu; Lozy, Fred; Bell, Daphne W

    2017-01-01

    Endometrial cancers are the most frequently diagnosed gynecological malignancy and were expected to be the seventh leading cause of cancer death among American women in 2015. The majority of endometrial cancers are of serous or endometrioid histology. Most human tumors, including endometrial tumors, are driven by the acquisition of pathogenic mutations in cancer genes. Thus, the identification of somatic mutations within tumor genomes is an entry point toward cancer gene discovery. However, efforts to pinpoint somatic mutations in human cancers have, until recently, relied on high-throughput sequencing of single genes or gene families using Sanger sequencing. Although this approach has been fruitful, the cost and throughput of Sanger sequencing generally prohibit systematic sequencing of the ~22,000 genes that make up the exome. The recent development of next-generation sequencing technologies changed this paradigm by providing the capability to rapidly sequence exomes, transcriptomes, and genomes at relatively low cost. Remarkably, the application of this technology to catalog the mutational landscapes of endometrial tumor exomes, transcriptomes, and genomes has revealed, for the first time, that serous and endometrioid endometrial cancers can be classified into four distinct molecular subgroups. In this chapter, we review the characteristic genomic features of each subgroup and discuss the known and putative cancer genes that have emerged from next-generation sequencing of endometrial carcinomas.

  5. HIV Sequence Compendium 2015

    SciTech Connect

    Foley, Brian Thomas; Leitner, Thomas Kenneth; Apetrei, Cristian; Hahn, Beatrice; Mizrachi, Ilene; Mullins, James; Rambaut, Andrew; Wolinsky, Steven; Korber, Bette Tina Marie

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  6. Personalized Course Sequence Recommendations

    NASA Astrophysics Data System (ADS)

    Xu, Jie; Xing, Tianwei; van der Schaar, Mihaela

    2016-10-01

    Given the variability in student learning it is becoming increasingly important to tailor courses as well as course sequences to student needs. This paper presents a systematic methodology for offering personalized course sequence recommendations to students. First, a forward-search backward-induction algorithm is developed that can optimally select course sequences to decrease the time required for a student to graduate. The algorithm accounts for prerequisite requirements (typically present in higher level education) and course availability. Second, using the tools of multi-armed bandits, an algorithm is developed that can optimally recommend a course sequence that both reduces the time to graduate while also increasing the overall GPA of the student. The algorithm dynamically learns how students with different contextual backgrounds perform for given course sequences and then recommends an optimal course sequence for new students. Using real-world student data from the UCLA Mechanical and Aerospace Engineering department, we illustrate how the proposed algorithms outperform other methods that do not include student contextual information when making course sequence recommendations.

  7. Amino acid sequence and molecular modelling of glycoprotein IIb-IIIa and fibronectin receptor iso-antagonists from Trimeresurus elegans venom.

    PubMed Central

    Scaloni, A; Di Martino, E; Miraglia, N; Pelagalli, A; Della Morte, R; Staiano, N; Pucci, P

    1996-01-01

    Low-molecular-mass Arg-Gly-Asp (RGD)-containing polypeptides were isolated from the venom of Trimeresurus elegans by a simple two-step procedure consisting of membrane filtration and reverse-phase HPLC. A combination of electrospray MS, fast-atom bombardment MS and Edman degradation allowed us to ascertain the presence in the venom of different isoforms and to determine their primary structures. The amino acid sequences resembled the structure of elegantin, the only disintegrin previously reported from the T. elegans venom [Williams, Rucinski, Holt and Niewiarowski (1990) Biochim. Biophys. Acta 1039, 81-89]. MS analyses indicated the occurrence of differential proteolytic processing at both the N-terminus and the C-terminus of the polypeptide chains. The amino acid sequence alignment of the elegantin isoforms with known components of the disintegrin family demonstrated the complete conservation of the 12 cysteine residues involved in disulphide bridges. Molecular modelling of elegantins predicted an overall folding of these molecules quite similar to that reported for the kistrin solution structure. The newly identified polypeptide isoforms strongly inhibited ADP-induced aggregation in both human and canine platelet-rich plasma but showed a different species-dependent specificity. These molecules were also able to inhibit B16-BL6 murine melanoma cell adhesion to immobilized fibronectin. The comparison of the structures and biological activities of elegantin isoforms and kistrin allowed us to highlight some structural features that, in addition to the RGD locus, might be involved in the interaction of these snake-venom polypeptides with the integrin receptors on the platelet and cell surface. PMID:8920980

  8. Sequence TTKF↓QE Defines the Site of Proteolytic Cleavage in Mhp683 Protein, a Novel Glycosaminoglycan and Cilium Adhesin of Mycoplasma hyopneumoniae*

    PubMed Central

    Bogema, Daniel R.; Scott, Nichollas E.; Padula, Matthew P.; Tacchi, Jessica L.; Raymond, Benjamin B. A.; Jenkins, Cheryl; Cordwell, Stuart J.; Minion, F. Chris; Walker, Mark J.; Djordjevic, Steven P.

    2011-01-01

    Mycoplasma hyopneumoniae colonizes the ciliated respiratory epithelium of swine, disrupting mucociliary function and inducing chronic inflammation. P97 and P102 family members are major surface proteins of M. hyopneumoniae and play key roles in colonizing cilia via interactions with glycosaminoglycans and mucin. The p102 paralog, mhp683, and homologs in strains from different geographic origins encode a 135-kDa pre-protein (P135) that is cleaved into three fragments identified here as P45₆₈₃, P48₆₈₃, and P50₆₈₃. A peptide sequence (TTKF↓QE) was identified surrounding both cleavage sites in Mhp683. N-terminal sequences of P48₆₈₃ and P50₆₈₃, determined by Edman degradation and mass spectrometry, confirmed cleavage after the phenylalanine residue. A similar proteolytic cleavage site was identified by mass spectrometry in another paralog of the P97/P102 family. Trypsin digestion and surface biotinylation studies showed that P45₆₈₃, P48₆₈₃, and P50₆₈₃ reside on the M. hyopneumoniae cell surface. Binding assays of recombinant proteins F1₆₈₃–F5₆₈₃, spanning Mhp683, showed saturable and dose-dependent binding to biotinylated heparin that was inhibited by unlabeled heparin, fucoidan, and mucin. F1₆₈₃–F5₆₈₃ also bound porcine epithelial cilia, and antisera to F2₆₈₃ and F5₆₈₃ significantly inhibited cilium binding by M. hyopneumoniae cells. These data suggest that P45₆₈₃, P48₆₈₃, and P50₆₈₃ each display cilium- and proteoglycan-binding sites. Mhp683 is the first characterized glycosaminoglycan-binding member of the P102 family. PMID:21969369

  9. Phylogenetic Trees From Sequences

    NASA Astrophysics Data System (ADS)

    Ryvkin, Paul; Wang, Li-San

    In this chapter, we review important concepts and approaches for phylogeny reconstruction from sequence data. We first cover some basic definitions and properties of phylogenetics, and briefly explain how scientists model sequence evolution and measure sequence divergence. We then discuss three major approaches for phylogenetic reconstruction: distance-based phylogenetic reconstruction, maximum parsimony, and maximum likelihood. In the third part of the chapter, we review how multiple phylogenies are compared by consensus methods and how to assess confidence using bootstrapping. At the end of the chapter are two sections that list popular software packages and additional reading.
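
    As an illustration of what "measuring sequence divergence" can look like in practice, the following minimal sketch computes the uncorrected p-distance between two aligned sequences and applies the Jukes-Cantor (JC69) correction, one standard model of nucleotide substitution. The abstract does not say which models the chapter covers, so the choice of JC69, the function names, and the assumption of gap-free, equal-length DNA sequences are all illustrative.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned sites at which the two sequences differ.

    Assumes (for illustration) that the sequences are already aligned,
    have equal length, and contain no gap characters.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differences / len(seq_a)


def jukes_cantor(p: float) -> float:
    """JC69-corrected distance: d = -(3/4) * ln(1 - (4/3) * p).

    The correction accounts for multiple substitutions at the same site
    and is undefined once p reaches 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor(p))
```

    A matrix of such pairwise distances is the usual input to distance-based reconstruction methods; the corrected distance is always at least as large as the raw p-distance.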

  10. Automatic Command Sequence Generation

    NASA Technical Reports Server (NTRS)

    Fisher, Forest; Gladden, Roy; Khanampompan, Teerapat

    2007-01-01

    Automatic Sequence Generator (Autogen) Version 3.0 software automatically generates command sequences for the Mars Reconnaissance Orbiter (MRO) and several other JPL spacecraft operated by the multi-mission support team. Autogen uses standard JPL sequencing tools like APGEN, ASP, SEQGEN, and the DOM database to automate the generation of uplink command products, Spacecraft Command Message Format (SCMF) files, and the corresponding ground command products, DSN Keywords Files (DKF). Autogen supports all the major multi-mission mission phases including the cruise, aerobraking, mapping/science, and relay mission phases. Autogen is a Perl script, which functions within the mission operations UNIX environment. It consists of two parts: a set of model files and the autogen Perl script. Autogen encodes the behaviors of the system into a model and encodes algorithms for context-sensitive customizations of the modeled behaviors. The model includes knowledge of different mission phases and how the resultant command products must differ for these phases. The executable software portion of Autogen automates the setup and use of APGEN for constructing a spacecraft activity sequence file (SASF). The setup includes file retrieval through the DOM (Distributed Object Manager), an object database used to store project files. This step retrieves all the needed input files for generating the command products. Depending on the mission phase, Autogen also uses the ASP (Automated Sequence Processor) and SEQGEN to generate the command product sent to the spacecraft. Autogen also provides the means for customizing sequences through the use of configuration files. By automating the majority of the sequencing generation process, Autogen eliminates many sequence generation errors commonly introduced by manually constructing spacecraft command sequences. Through the layering of commands into the sequence by a series of scheduling algorithms, users are able to rapidly and reliably construct the

  11. Toward nanoscale genome sequencing.

    PubMed

    Ryan, Declan; Rahimi, Maryam; Lund, John; Mehta, Ranjana; Parviz, Babak A

    2007-09-01

    This article reports on the state-of-the-art technologies that sequence DNA using miniaturized devices. The article considers the miniaturization of existing technologies for sequencing DNA and the opportunities for cost reduction that 'on-chip' devices can deliver. The ability to construct nano-scale structures and perform measurements using novel nano-scale effects has provided new opportunities to identify nucleotides directly using physical, and not chemical, methods. The challenges that these technologies need to overcome to provide a US$1000-genome sequencing technology are also presented.

  12. DNA sequences encoding osteoinductive products

    SciTech Connect

    Wang, E.A.; Wozney, J.M.; Rosen, V.

    1991-05-07

    This patent describes an isolated DNA sequence encoding an osteoinductive protein, the DNA sequence comprising a coding sequence. It comprises: nucleotide No.1 through nucleotide No.387, nucleotide No.356 through nucleotide No.1543, nucleotide No.402 through nucleotide No.1626, naturally occurring allelic sequences and equivalent degenerative codon sequences and sequences which hybridize to any of the sequences under stringent hybridization conditions; and encode a protein characterized by the ability to induce the formation of bone and/or cartilage.

  13. Compact rotary sequencer

    NASA Technical Reports Server (NTRS)

    Appleberry, W. T.

    1980-01-01

    Rotary sequencer is assembled from conventional planetary differential gearset and latching mechanism utilizing inputs and outputs which are coaxial. Applications include automated production-line equipment in home appliances and in vehicles.

  14. Authentication of byte sequences

    SciTech Connect

    Stearns, S.D.

    1991-06-01

    Algorithms for the authentication of byte sequences are described. The algorithms are designed to authenticate data in the Storage, Retrieval, Analysis, and Display (SRAD) Test Data Archive of the Radiation Effects and Testing Directorate (9100) at Sandia National Laboratories, and may be used in similar situations where authentication of stored data is required. The algorithms use a well-known error detection method called the Cyclic Redundancy Check (CRC). When a byte sequence is authenticated and stored, CRC bytes are generated and attached to the end of the sequence. When the authenticated data is retrieved, the authentication check consists of processing the entire sequence, including the CRC bytes, and checking for a remainder of zero. The error detection properties of the CRC are extensive and result in a reliable authentication of SRAD data.
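
    The store-and-verify scheme described above can be sketched in a few lines. The abstract does not name the CRC width or generator polynomial used for the SRAD archive, so this sketch assumes a 16-bit CRC with polynomial 0x1021, zero initial value, and no final XOR (the variant often called CRC-16/XMODEM); the function names are hypothetical. With this variant, appending the CRC bytes most-significant byte first makes the CRC of the whole stored sequence equal zero, which is the "remainder of zero" check.

```python
def crc16(data: bytes, poly: int = 0x1021) -> int:
    """Bitwise 16-bit CRC (assumed variant: init 0, no reflection, no final XOR)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8                      # bring the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:                  # top bit set: shift and subtract the polynomial
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def attach_crc(data: bytes) -> bytes:
    """Return the data with its two CRC bytes appended, most-significant byte first."""
    return data + crc16(data).to_bytes(2, "big")


def is_authentic(stored: bytes) -> bool:
    """Process the entire stored sequence, CRC bytes included, and check for a zero remainder."""
    return len(stored) >= 2 and crc16(stored) == 0


if __name__ == "__main__":
    record = attach_crc(b"test data record")
    print(is_authentic(record))               # intact record
    damaged = bytes([record[0] ^ 0x01]) + record[1:]
    print(is_authentic(damaged))              # single flipped bit
```

    A CRC detects accidental corruption; it is not a cryptographic signature, so this kind of check guards stored data against errors rather than against deliberate tampering.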

  15. Advances in sequence analysis.

    PubMed

    Califano, A

    2001-06-01

    In its early days, the entire field of computational biology revolved almost entirely around biological sequence analysis. Over the past few years, however, a number of new non-sequence-based areas of investigation have become mainstream, from the analysis of gene expression data from microarrays, to whole-genome association discovery, and to the reverse engineering of gene regulatory pathways. Nonetheless, with the completion of private and public efforts to map the human genome, as well as those of other organisms, sequence data continue to be a veritable mother lode of valuable biological information that can be mined in a variety of contexts. Furthermore, the integration of sequence data with a variety of alternative information is providing valuable and fundamentally new insight into biological processes, as well as an array of new computational methodologies for the analysis of biological data.

  16. HIV Sequence Compendium 2010

    SciTech Connect

    Kuiken, Carla; Foley, Brian; Leitner, Thomas; Apetrei, Christian; Hahn, Beatrice; Mizrachi, Ilene; Mullins, James; Rambaut, Andrew; Wolinsky, Steven; Korber, Bette

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  17. Pairwise Sequence Alignment Library

    SciTech Connect

    Jeff Daily, PNNL

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features reference implementations of all known vectorized sequence alignment approaches; implementations of the Smith-Waterman (SW), semi-global (SG), and Needleman-Wunsch (NW) sequence alignment algorithms; implementations across all modern CPU instruction sets, including AVX2 and KNC; and language interfaces for C/C++ and Python.
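
    To make the data dependencies mentioned above concrete, here is a minimal scalar sketch of Smith-Waterman local alignment scoring in plain Python. It does not use parasail or its API; the function name, the linear gap penalty, and the scoring values are assumptions chosen for illustration only. Each cell depends on its upper, left, and upper-left neighbours, which is the dependency pattern that makes vectorization non-trivial.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Only two rows of the dynamic-programming table are kept, since each
    cell needs just the previous row and the current row so far.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]                                # first column is always zero
        for j, ch_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            up = prev[j] + gap                    # gap in b
            left = curr[j - 1] + gap              # gap in a
            score = max(0, diagonal, up, left)    # local alignment never goes below zero
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

    Dropping the clamp at zero and initializing the first row and column with cumulative gap penalties turns the same recurrence into Needleman-Wunsch global alignment.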

  18. Program Synthesizes UML Sequence Diagrams

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse engineering function that aids in the design documentation of the target Java program. Whereas previously, the construction of sequence diagrams was a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.

  19. The amino acid sequence of protein AA from a burro (Equus asinus).

    PubMed

    Sletten, Knut; Johnson, Kenneth H; Westermark, Per

    2003-09-01

    The primary structure of amyloid fibril protein AA of a burro has been determined by Edman degradation. The 80 amino acid residue long protein shows strong resemblance to that of other mammalian AA-proteins and differs from equine protein AA at 5 positions: Burro/horse positions 20 (Q/N), 44 (R,Q, K/K,Q), 59 (G,L/G,A), 61 (Q/E) and 65 (N/R).

  20. Sequencing the Connectome

    PubMed Central

    Zador, Anthony M.; Dubnau, Joshua; Oyibo, Hassana K.; Zhan, Huiqing; Cao, Gang; Peikon, Ian D.

    2012-01-01

    Connectivity determines the function of neural circuits. Historically, circuit mapping has usually been viewed as a problem of microscopy, but no current method can achieve high-throughput mapping of entire circuits with single neuron precision. Here we describe a novel approach to determining connectivity. We propose BOINC (“barcoding of individual neuronal connections”), a method for converting the problem of connectivity into a form that can be read out by high-throughput DNA sequencing. The appeal of using sequencing is that its scale—sequencing billions of nucleotides per day is now routine—is a natural match to the complexity of neural circuits. An inexpensive high-throughput technique for establishing circuit connectivity at single neuron resolution could transform neuroscience research. PMID:23109909

  1. Method to amplify variable sequences without imposing primer sequences

    DOEpatents

    Bradbury, Andrew M.; Zeytun, Ahmet

    2006-11-14

    The present invention provides methods of amplifying target sequences without including regions flanking the target sequence in the amplified product or imposing amplification primer sequences on the amplified product. Also provided are methods of preparing a library from such amplified target sequences.

  2. Lining up Arithmetic Sequences

    ERIC Educational Resources Information Center

    Bell, Carol J.

    2011-01-01

    Most future teachers are familiar with number patterns that represent an arithmetic sequence, and most are able to determine the general representation of the "n"th number in the pattern. However, when they are given a visual representation instead of the numbers in the pattern, it is not always easy for them to make the connection between the…

  3. Prenatal Whole Genome Sequencing

    PubMed Central

    Donley, Greer; Hull, Sara Chandros; Berkman, Benjamin E.

    2014-01-01

    With whole genome sequencing set to become the preferred method of prenatal screening, we need to pay more attention to the massive amount of information it will deliver to parents—and the fact that we don't yet understand what most of it means. PMID:22777977

  4. A Sequence of Cylinders

    ERIC Educational Resources Information Center

    Johnson, Erica

    2006-01-01

    Hoping to develop in her students an understanding of mathematics as a way of thinking more than a way of doing, the author of this article describes how her students worked on a spatial reasoning problem stemming from an iteratively constructed sequence of cylinders. She presents an activity of making cylinders out of paper models, and for every…

  5. High Throughput Sequencing: An Overview of Sequencing Chemistry.

    PubMed

    Ambardar, Sheetal; Gupta, Rikita; Trakroo, Deepika; Lal, Rup; Vakhlu, Jyoti

    2016-12-01

    In the present century, sequencing is to DNA science what gel electrophoresis was in the last century. From 1977 to 2016, three generations of sequencing technologies of various types have been developed. Second- and third-generation sequencing technologies, commonly referred to as next-generation sequencing technology, have evolved significantly since their inception in 2004, with increases in sequencing speed and decreases in sequencing cost. GS FLX by 454 Life Sciences/Roche Diagnostics; Genome Analyzer, HiSeq, MiSeq and NextSeq by Illumina, Inc.; SOLiD by ABI; and Ion Torrent by Life Technologies are the various sequencing platforms available for second-generation sequencing. The platforms available for third-generation sequencing include the Helicos™ Genetic Analysis System by SeqLL, LLC, SMRT Sequencing by Pacific Biosciences, nanopore sequencing by Oxford Nanopore, Complete Genomics by Beijing Genomics Institute and GnuBIO by BioRad, to name a few. The present article is an overview of the principles and sequencing chemistry of these high-throughput sequencing technologies, along with a brief comparison of the various types of sequencing platforms available.

  6. Sequencing of aromatase inhibitors

    PubMed Central

    Bertelli, G

    2005-01-01

    Since the development of the third-generation aromatase inhibitors (AIs), anastrozole, letrozole and exemestane, these agents have been the subject of intensive research to determine their optimal use in advanced breast cancer. Not only have they replaced progestins in second-line therapy and challenged the role of tamoxifen in first-line, but there is also evidence for a lack of cross-resistance between the steroidal and nonsteroidal AIs, meaning that they may be used in sequence to obtain prolonged clinical benefit. Many questions remain, however, as to the best sequence of the two types of AIs and of the other available agents, including tamoxifen and fulvestrant, in different patient groups. PMID:16100523

  7. Transposon facilitated DNA sequencing

    SciTech Connect

    Berg, D.E.; Berg, C.M.; Huang, H.V.

    1990-01-01

    The purpose of this research is to investigate and develop methods that exploit the power of bacterial transposable elements for large-scale DNA sequencing. Our premise is that the use of transposons to put primer binding sites randomly in target DNAs should provide access to all portions of large DNA fragments, without the inefficiencies of methods involving random subcloning and attendant repetitive sequencing, or of sequential synthesis of many oligonucleotide primers that are used to march systematically along a DNA molecule. Two unrelated bacterial transposons, Tn5 and γδ, are being used because they have both proven useful for molecular analyses, and because they differ sufficiently in mechanism and specificity of transposition to merit parallel development.

  8. HIV sequence compendium 2002

    SciTech Connect

    Kuiken, Carla; Foley, Brian; Freed, Eric; Hahn, Beatrice; Marx, Preston; McCutchan, Francine; Mellors, John; Wolinsky, Steven; Korber, Bette

    2002-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Traditionally, we present the sequence data themselves in the form of alignments: Section II, an alignment of a selection of HIV-1/SIVcpz full-length genomes (a lot of LAI-like sequences, for example, have been omitted because they are so similar that they bias the alignment); Section III, a combined HIV-1/HIV-2/SIV whole genome alignment; Sections IV–VI, amino acid alignments for HIV-1/SIV-cpz, HIV-2/SIV, and SIVagm. The HIV-2/SIV and SIVagm amino acid alignments are separate because the genetic distances between these groups are so great that presenting them in one alignment would make it very elongated because of the large number of gaps that have to be inserted. As always, tables with extensive background information gathered from the literature accompany the whole genome alignments. The collection of whole-gene sequences in the database is now large enough that we have abundant representation of most subtypes. For many subtypes, and especially for subtype B, a large number of sequences that span entire genes were not included in the printed alignments to conserve space. A more complete version of all alignments is available on our website, http://hiv-web.lanl.gov/content/hiv-db/ALIGN_CURRENT/ALIGN-INDEX.html. Importantly, all these alignments have been edited to include only one sequence per person, based on phylogenetic trees that were created for all of them, as well as on the literature. Because of the number of sequences available, we have decided to use a different selection principle this year, based on the epidemiological importance of the subtypes. Subtypes A–D and CRFs 01 and 02 are by far the most widespread variants, and for these (when available) we have included 8–10 representatives in the alignments. The other

  9. Rapid Polymer Sequencer

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor (Inventor); Brock, Mathew W. (Inventor)

    2011-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal or transverse direction at the tip, a polymer sequence is passed through the tip, and a change in an electrical current signal is measured as each polymer component passes through the tip. Each measured change in electrical current signals is compared with a database of reference signals, with each reference signal identified with a polymer component, to identify the unknown polymer component. The tip preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.

  10. Malaria Genome Sequencing Project

    DTIC Science & Technology

    2004-01-01

    spectrometry identified authentic peptides corresponding to proteins encoded by 2,391 of the genes, including...16 P-type ATPases. An Nramp divalent cation transporter was identified which may be...numbers seen in S. cerevisiae, S. pombe or Caenorhabditis elegans...used for sequencing were not available. Although large-insert yeast artificial chromo...and Caenorhabditis elegans were nearing completion. Two

  11. The Galaxy End Sequence

    NASA Astrophysics Data System (ADS)

    Eales, Stephen; de Vis, Pieter; Smith, Matthew W. L.; Appah, Kiran; Ciesla, Laure; Duffield, Chris; Schofield, Simon

    2017-03-01

    A common assumption is that galaxies fall in two distinct regions of a plot of specific star formation rate (SSFR) versus galaxy stellar mass: a star-forming galaxy main sequence (GMS) and a separate region of 'passive' or 'red and dead galaxies'. Starting from a volume-limited sample of nearby galaxies designed to contain most of the stellar mass in this volume, and thus representing the end-point of ≃12 billion years of galaxy evolution, we investigate the distribution of galaxies in this diagram today. We show that galaxies follow a strongly curved extended GMS with a steep negative slope at high galaxy stellar masses. There is a gradual change in the morphologies of the galaxies along this distribution, but there is no clear break between early-type and late-type galaxies. Examining the other evidence that there are two distinct populations, we argue that the 'red sequence' is the result of the colours of galaxies changing very little below a critical value of the SSFR, rather than implying a distinct population of galaxies. Herschel observations, which show at least half of early-type galaxies contain a cool interstellar medium, also imply continuity between early-type and late-type galaxies. This picture of a unitary population of galaxies requires more gradual evolutionary processes than the rapid quenching process needed to explain two distinct populations. We challenge theorists to predict quantitatively the properties of this 'Galaxy End Sequence'.

  12. Sequencing BPS spectra

    NASA Astrophysics Data System (ADS)

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; Stošić, Marko; Sułkowski, Piotr

    2016-03-01

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N=2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  13. Sequencing BPS spectra

    SciTech Connect

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; Stošić, Marko; Sułkowski, Piotr

    2016-03-02

    In this article, we provide both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N = 2 theory via the 3d/3d correspondence. In conclusion, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  14. Sequencing BPS spectra

    DOE PAGES

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; ...

    2016-03-02

    In this article, we provide both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N = 2 theory via the 3d/3d correspondence. In conclusion, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  15. Plant DNA sequencing for phylogenetic analyses: from plants to sequences.

    PubMed

    Neves, Susana S; Forrest, Laura L

    2011-01-01

    DNA sequences are important sources of data for phylogenetic analysis. Nowadays, DNA sequencing is a routine technique in molecular biology laboratories. However, there are specific questions associated with project design and sequencing of plant samples for phylogenetic analysis, which may not be familiar to researchers starting in the field. This chapter gives an overview of methods and protocols involved in the sequencing of plant samples, including general recommendations on the selection of species/taxa and DNA regions to be sequenced, and field collection of plant samples. Protocols of plant sample preparation, DNA extraction, PCR and cloning, which are critical to the success of molecular phylogenetic projects, are described in detail. Common problems of sequencing (using the Sanger method) are also addressed. Possible applications of second-generation sequencing techniques in plant phylogenetics are briefly discussed. Finally, orientation on the preparation of sequence data for phylogenetic analyses and submission to public databases is also given.

  16. A vision for ubiquitous sequencing.

    PubMed

    Erlich, Yaniv

    2015-10-01

    Genomics has recently celebrated reaching the $1000 genome milestone, making affordable DNA sequencing a reality. With this goal successfully completed, the next goal of the sequencing revolution can be sequencing sensors--miniaturized sequencing devices that are manufactured for real-time applications and deployed in large quantities at low costs. The first part of this manuscript envisions applications that will benefit from moving the sequencers to the samples in a range of domains. In the second part, the manuscript outlines the critical barriers that need to be addressed in order to reach the goal of ubiquitous sequencing sensors.

  17. High-Throughput Sequencing Technologies

    PubMed Central

    Reuter, Jason A.; Spacek, Damek; Snyder, Michael P.

    2015-01-01

    The human genome sequence has profoundly altered our understanding of biology, human diversity and disease. The path from the first draft sequence to our nascent era of personal genomes and genomic medicine has been made possible only because of the extraordinary advancements in DNA sequencing technologies over the past ten years. Here, we discuss commonly used high-throughput sequencing platforms, the growing array of sequencing assays developed around them as well as the challenges facing current sequencing platforms and their clinical application. PMID:26000844

  18. A vision for ubiquitous sequencing

    PubMed Central

    Erlich, Yaniv

    2015-01-01

    Genomics has recently celebrated reaching the $1000 genome milestone, making affordable DNA sequencing a reality. With this goal successfully completed, the next goal of the sequencing revolution can be sequencing sensors—miniaturized sequencing devices that are manufactured for real-time applications and deployed in large quantities at low costs. The first part of this manuscript envisions applications that will benefit from moving the sequencers to the samples in a range of domains. In the second part, the manuscript outlines the critical barriers that need to be addressed in order to reach the goal of ubiquitous sequencing sensors. PMID:26430149

  19. Correspondence: Searching sequence space

    SciTech Connect

    Youvan, D.C.

    1995-08-01

    This correspondence debates the efficiency and application of genetic algorithms (GAs) to search protein sequence space. The important experimental point is that such sparse searches utilize physically realistic syntheses. In this regard, all GA-based technologies are very similar; they "learn" from their initial sparse search and then generate interesting new proteins within a few iterations. Which GA-based technology is best? That probably depends on the protein and the specific engineering goal. Given the fact that the field of combinatorial chemistry is still in its infancy, it is probably wise to consider all of the proven mutagenesis methods. 19 refs.

  20. DNA Sequencing apparatus

    DOEpatents

    Tabor, Stanley; Richardson, Charles C.

    1992-01-01

    An automated DNA sequencing apparatus having a reactor for providing at least two series of DNA products formed from a single primer and a DNA strand, each DNA product of a series differing in molecular weight and having a chain terminating agent at one end; separating means for separating the DNA products to form a series of bands, the intensity of substantially all nearby bands in a different series being different, band reading means for determining the position an This invention was made with government support including a grant from the U.S. Public Health Service, contract number AI-06045. The U.S. government has certain rights in the invention.

  1. Marks of Change in Sequences

    NASA Astrophysics Data System (ADS)

    Jürgensen, H.

    2011-12-01

    Given a sequence of events, how does one recognize that a change has occurred? We explore potential definitions of the concept of change in a sequence and propose that words in relativized solid codes might serve as indicators of change.

  2. Spaces of Ideal Convergent Sequences

    PubMed Central

    Mursaleen, M.; Sharma, Sunil K.

    2014-01-01

    In the present paper, we introduce some sequence spaces using ideal convergence and Musielak-Orlicz function ℳ = (M_k). We also examine some topological properties of the resulting sequence spaces. PMID:24592143

  3. Sequencing the Unrearranged Human Immunoglobin

    SciTech Connect

    Warren, Rene

    2010-06-03

    Rene Warren from Canada's Michael Smith Genome Sciences Centre discusses sequencing and finishing the IgH heavy chain locus on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  4. Next-Generation Sequencing Platforms

    NASA Astrophysics Data System (ADS)

    Mardis, Elaine R.

    2013-06-01

    Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

  5. Molecular phylogenetics before sequences

    PubMed Central

    Ragan, Mark A; Bernard, Guillaume; Chan, Cheong Xin

    2014-01-01

    From 1971 to 1985, Carl Woese and colleagues generated oligonucleotide catalogs of 16S/18S rRNAs from more than 400 organisms. Using these incomplete and imperfect data, Carl and his colleagues developed unprecedented insights into the structure, function, and evolution of the large RNA components of the translational apparatus. They recognized a third domain of life, revealed the phylogenetic backbone of bacteria (and its limitations), delineated taxa, and explored the tempo and mode of microbial evolution. For these discoveries to have stood the test of time, oligonucleotide catalogs must carry significant phylogenetic signal; they thus bear re-examination in view of the current interest in alignment-free phylogenetics based on k-mers. Here we consider the aims, successes, and limitations of this early phase of molecular phylogenetics. We computationally generate oligonucleotide sets (e-catalogs) from 16S/18S rRNA sequences, calculate pairwise distances between them based on D2 statistics, compute distance trees, and compare their performance against alignment-based and k-mer trees. Although the catalogs themselves were superseded by full-length sequences, this stage in the development of computational molecular biology remains instructive for us today. PMID:24572375
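
    As a rough illustration of the alignment-free comparison mentioned in the abstract above, the D2 statistic between two sequences is commonly defined as the sum, over all k-mers, of the product of that k-mer's counts in each sequence. The sketch below assumes that definition; the function names and the normalised distance are illustrative choices and are not taken from the paper, whose exact distance formula may differ.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """D2 statistic: sum over shared k-mers of the product of their counts."""
    a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(n * b[w] for w, n in a.items() if w in b)


def d2_distance(seq_a, seq_b, k):
    """Assumed normalisation: 1 - D2(a, b) / sqrt(D2(a, a) * D2(b, b))."""
    denom = (d2(seq_a, seq_a, k) * d2(seq_b, seq_b, k)) ** 0.5
    return 1.0 - d2(seq_a, seq_b, k) / denom if denom else 1.0
```

    A matrix of such pairwise distances could then be passed to any distance-based tree-building method.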

  6. Rapid Polymer Sequencer

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.

  7. Towards Sequencing Cotton (Gossypium) Genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Despite rapidly decreasing costs and innovative technologies, sequencing of angiosperm genomes is not yet undertaken lightly. Generating larger amounts of sequence data more quickly does not address the difficulties of sequencing and assembling complex genomes de novo. The cotton genomes represent a...

  8. Sequencing Technologies Panel at SFAF

    SciTech Connect

    Turner, Steve; Fiske, Haley; Knight, Jim; Rhodes, Michael; Vander Horn, Peter

    2010-06-02

    From left to right: Steve Turner of Pacific Biosciences, Haley Fiske of Illumina, Jim Knight of Roche, Michael Rhodes of Life Technologies and Peter Vander Horn of Life Technologies' Single Molecule Sequencing group discuss new sequencing technologies and applications on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  9. Automated Identification of Nucleotide Sequences

    NASA Technical Reports Server (NTRS)

    Osman, Shariff; Venkateswaran, Kasthuri; Fox, George; Zhu, Dian-Hui

    2007-01-01

    STITCH is a computer program that processes raw nucleotide-sequence data to automatically remove unwanted vector information, perform reverse-complement comparison, stitch shorter sequences together to make longer ones to which the shorter ones presumably belong, and search against the user's choice of private and Internet-accessible public 16S rRNA databases. ["16S rRNA" denotes a ribosomal ribonucleic acid (rRNA) sequence that is common to all organisms.] In STITCH, a template 16S rRNA sequence is used to position forward and reverse reads. STITCH then automatically searches known 16S rRNA sequences in the user's chosen database(s) to find the sequence most similar to (the sequence that lies at the smallest edit distance from) each spliced sequence. The result of processing by STITCH is the identification of the most similar well-described bacterium. Whereas previously commercially available software for analyzing genetic sequences operates on one sequence at a time, STITCH can manipulate multiple sequences simultaneously to perform the aforementioned operations. A typical analysis of several dozen sequences (length of the order of 10^3 base pairs) by use of STITCH is completed in a few minutes, whereas such an analysis performed by use of prior software takes hours or days.
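
    Two of the operations named in the abstract above, reverse-complementing a read and finding the database sequence at the smallest edit distance, can be sketched as follows. This is not STITCH's code: the function names, the dictionary-based reference database, and the use of plain Levenshtein distance are assumptions made for illustration only.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def closest_reference(query, references):
    """references maps a (hypothetical) name to a sequence; returns (name, distance)."""
    return min(((name, edit_distance(query, ref))
                for name, ref in references.items()),
               key=lambda pair: pair[1])
```

    For reads of about a thousand bases this quadratic-time distance is workable for a handful of references; searching a large database would need an indexed or heuristic method instead.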

  10. Sequence Factorial and Its Applications

    ERIC Educational Resources Information Center

    Asiru, Muniru A.

    2012-01-01

    In this note, we introduce sequence factorial and use this to study generalized M-bonomial coefficients. For the sequence of natural numbers, the twin concepts of sequence factorial and generalized M-bonomial coefficients, respectively, extend the corresponding concepts of factorial of an integer and binomial coefficients. Some latent properties…
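
    The abstract above does not spell out its definitions, so the following sketch assumes the natural ones: the sequence factorial of the n-th term is the product of the first n terms of the sequence, and the generalized coefficient is that factorial divided by the factorials for k and n - k. Under these assumptions the sequence of natural numbers recovers the ordinary factorial and binomial coefficients, and the Fibonacci sequence gives the Fibonomial coefficients. The function names are illustrative.

```python
from fractions import Fraction
from math import comb, prod


def sequence_factorial(seq, n):
    """Product of the first n terms of seq (the empty product is 1 when n = 0)."""
    return prod(seq[:n])


def generalized_coefficient(seq, n, k):
    """Assumed definition: a_n! / (a_k! * a_(n-k)!), kept exact with Fraction."""
    return Fraction(sequence_factorial(seq, n),
                    sequence_factorial(seq, k) * sequence_factorial(seq, n - k))


naturals = list(range(1, 11))
fibonacci = [1, 1, 2, 3, 5, 8]

# Natural numbers reproduce the binomial coefficient C(6, 2).
assert generalized_coefficient(naturals, 6, 2) == comb(6, 2)
```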

  11. Chromosome specific repetitive DNA sequences

    DOEpatents

    Moyzis, Robert K.; Meyne, Julianne

    1991-01-01

    A method is provided for determining specific nucleotide sequences useful in forming a probe which can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family me This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).

  12. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    PubMed Central

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can

  13. Event sequence detector

    NASA Technical Reports Server (NTRS)

    Hanna, M. F. (Inventor)

    1973-01-01

    An event sequence detector is described with input units, each associated with a row of bistable elements arranged in an array of rows and columns. The detector also includes a shift register which is responsive to clock pulses from any of the units to sequentially provide signals on its output lines each of which is connected to the bistable elements in a corresponding column. When the event-indicating signal is received by an input unit it provides a clock pulse to the shift register to provide the signal on one of its output lines. The input unit also enables all its bistable elements so that the particular element in the column supplied with the signal from the register is driven to an event-indicating state.
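
    The behaviour described above can be modelled in software as a grid of flags, one row per input unit and one column per shift-register output line. The sketch below is a simplified model, not the patented circuit: it assumes the register advances by one line on each event before the element is set, and the class and method names are hypothetical.

```python
class EventSequenceDetector:
    """Toy model: rows are input units, columns are shift-register outputs."""

    def __init__(self, n_units, n_columns):
        self.grid = [[False] * n_columns for _ in range(n_units)]
        self.position = -1  # the shift register has not been clocked yet

    def event(self, unit):
        """An event at `unit` clocks the register and sets that unit's element."""
        if self.position + 1 >= len(self.grid[0]):
            raise OverflowError("shift register has no free output line")
        self.position += 1                      # clock pulse advances the register
        self.grid[unit][self.position] = True   # element in the active column is set

    def order(self):
        """Read the grid column by column to recover the order of events."""
        result = []
        for col in range(self.position + 1):
            for unit, row in enumerate(self.grid):
                if row[col]:
                    result.append(unit)
        return result
```

    Because each event occupies its own column, the column index of a set element gives the position of that event in the overall sequence.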

  14. Asteroid Ida Rotation Sequence

    NASA Technical Reports Server (NTRS)

    1994-01-01

    This montage of 14 images (the time order is right to left, bottom to top) shows Ida as it appeared in the field of view of Galileo's camera on August 28, 1993. Asteroid Ida rotates once every 4 hours, 39 minutes and clockwise when viewed from above the north pole; these images cover about one Ida 'day.' This sequence has been used to create a 3-D model that shows Ida to be almost croissant shaped. The earliest view (lower right) was taken from a range of 240,000 kilometers (150,000 miles), 5.4 hours before closest approach. The asteroid Ida draws its name from mythology, in which the Greek god Zeus was raised by the nymph Ida.

  15. Relay Sequence Generation Software

    NASA Technical Reports Server (NTRS)

    Gladden, Roy E.; Khanampompan, Teerapat

    2009-01-01

    Because of thermal and electromagnetic interactions between the UHF (ultrahigh frequency) radio onboard the Mars Reconnaissance Orbiter (MRO), which performs relay sessions with the Martian landers, and the remainder of the MRO payloads, relay sessions must be integrated and de-conflicted with the MRO science plan. The MRO relay SASF/PTF (spacecraft activity sequence file/payload target file) generation software facilitates this process by generating a PTF that is needed to integrate the periods of time during which MRO supports relay activities with the rest of the MRO science plans. The software also generates the command products that initiate the relay sessions; some features of these products are provided by the lander team, some are managed by MRO internally, and some are derived.

  16. The evolution of nanopore sequencing

    PubMed Central

    Wang, Yue; Yang, Qiuping; Wang, Zhimin

    2014-01-01

    The “$1000 Genome” project has been drawing increasing attention since its launch a decade ago. Nanopore sequencing, the third-generation, is believed to be one of the most promising sequencing technologies to reach four gold standards set for the “$1000 Genome” while the second-generation sequencing technologies are bringing about a revolution in life sciences, particularly in genome sequencing-based personalized medicine. Both protein and solid-state nanopores have been extensively investigated for a series of issues, from detection of ionic current blockage to field-effect-transistor (FET) sensors. A newly released protein nanopore sequencer has shown encouraging potential that nanopore sequencing will ultimately fulfill the gold standards. In this review, we address advances, challenges, and possible solutions of nanopore sequencing according to these standards. PMID:25610451

  17. Solid phase sequencing of biopolymers

    DOEpatents

    Cantor, Charles; Koster, Hubert

    2010-09-28

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include DNA or RNA in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  18. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique "electronic fingerprint" of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shifts depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  19. Large-Scale Sequence Comparison.

    PubMed

    Lal, Devi; Verma, Mansi

    2017-01-01

    There are millions of sequences deposited in genomic databases, and it is an important task to categorize them according to their structural and functional roles. Sequence comparison is a prerequisite for proper categorization of both DNA and protein sequences, and helps in assigning a putative or hypothetical structure and function to a given sequence. There are various methods available for comparing sequences, alignment being first and foremost for sequences with a small number of base pairs as well as for large-scale genome comparison. Various tools are available for performing pairwise large sequence comparison. The best known tools either perform global alignment or generate local alignments between the two sequences. In this chapter we first provide basic information regarding sequence comparison. This is followed by the description of the PAM and BLOSUM matrices that form the basis of sequence comparison. We also give a practical overview of currently available methods such as BLAST and FASTA, followed by a description and overview of tools available for genome comparison including LAGAN, MUMmer, BLASTZ, and AVID.

  20. Explanatory chapter: next generation sequencing.

    PubMed

    Yegnasubramanian, Srinivasan

    2013-01-01

    Technological breakthroughs in sequencing technologies have driven the advancement of molecular biology and molecular genetics research. The advent of high-throughput Sanger sequencing (for information on the method, see Sanger Dideoxy Sequencing of DNA) in the mid- to late-1990s made possible the accelerated completion of the human genome project, which has since revolutionized the pace of discovery in biomedical research. Similarly, the advent of next generation sequencing is poised to revolutionize biomedical research and usher a new era of individualized, rational medicine. The term next generation sequencing refers to technologies that have enabled the massively parallel analysis of DNA sequence facilitated through the convergence of advancements in molecular biology, nucleic acid chemistry and biochemistry, computational biology, and electrical and mechanical engineering. The current next generation sequencing technologies are capable of sequencing tens to hundreds of millions of DNA templates simultaneously and generate >4 gigabases of sequence in a single day. These technologies have largely started to replace high-throughput Sanger sequencing for large-scale genomic projects, and have created significant enthusiasm for the advent of a new era of individualized medicine.

  1. Solid phase sequencing of biopolymers

    SciTech Connect

    Cantor, Charles R.; Hubert, Koster

    2014-06-24

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Probes may be affixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  2. Nuclear RNA Isolation and Sequencing.

    PubMed

    Dhaliwal, Navroop K; Mitchell, Jennifer A

    2016-01-01

    Most transcriptome studies involve sequencing and quantification of steady-state mRNA by isolating and sequencing poly (A) RNA. Although this type of sequencing data is informative to determine steady-state mRNA levels it does not provide information on transcriptional output and thus may not always reflect changes in transcriptional regulation of gene expression. Furthermore, sequencing poly (A) RNA may miss transcribed regions of the genome not usually modified by polyadenylation which includes many long noncoding RNAs. Here, we describe nuclear-RNA sequencing (nucRNA-seq) which investigates the transcriptional landscape through sequencing and quantification of nuclear RNAs which are both unspliced and spliced transcripts for protein-coding genes and nuclear-retained long noncoding RNAs.

  3. Graphene nanodevices for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Heerema, Stephanie J.; Dekker, Cees

    2016-02-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with nanopores. Owing to its unique structure and properties, graphene provides interesting opportunities for the development of a new sequencing technology. In recent years, a wide range of creative ideas for graphene sequencers have been theoretically proposed and the first experimental demonstrations have begun to appear. Here, we review the different approaches to using graphene nanodevices for DNA sequencing, which involve DNA passing through graphene nanopores, nanogaps, and nanoribbons, and the physisorption of DNA on graphene nanostructures. We discuss the advantages and problems of each of these key techniques, and provide a perspective on the use of graphene in future DNA sequencing technology.

  4. Turtle Graphics of Morphic Sequences

    NASA Astrophysics Data System (ADS)

    Zantema, Hans

    2016-02-01

    The simplest infinite sequences that are not ultimately periodic are pure morphic sequences: fixed points of particular morphisms mapping single symbols to strings of symbols. A basic way to visualize a sequence is by a turtle curve: for every alphabet symbol fix an angle, and then consecutively for all sequence elements draw a unit segment and turn the drawing direction by the corresponding angle. This paper investigates turtle curves of pure morphic sequences. In particular, criteria are given for turtle curves being finite (consisting of finitely many segments), and for being fractal or self-similar: it contains an up-scaled copy of itself. Also space-filling turtle curves are considered, and a turtle curve that is dense in the plane. As a particular result we give an exact relationship between the Koch curve and a turtle curve for the Thue-Morse sequence, where until now for such a result only approximations were known.
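
    The turtle-curve construction described in this abstract (draw a unit segment for each sequence element, then turn by the angle fixed for that symbol) can be sketched as follows. The function names are hypothetical, and the angles passed in are up to the caller; nothing here reproduces the specific angles the paper uses for its Koch-curve result.

```python
import math


def thue_morse(n):
    """First n elements of the Thue-Morse sequence (parity of the 1-bits of i)."""
    return [bin(i).count("1") % 2 for i in range(n)]


def turtle_curve(symbols, angles):
    """Return the list of points visited by the turtle.

    For each symbol the turtle draws a unit segment in the current
    heading, then turns by angles[symbol] (radians).
    """
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for s in symbols:
        x += math.cos(heading)
        y += math.sin(heading)
        points.append((x, y))
        heading += angles[s]
    return points
```

    For example, turtle_curve(thue_morse(64), {0: math.pi / 3, 1: math.pi}) yields 65 points that can be passed to any plotting library; those two angles are an arbitrary illustrative choice.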

  5. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    SciTech Connect

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in

  6. Venter wins sequencing race - twice

    SciTech Connect

    Nowak, R.

    1995-06-02

    This article discusses the end of the race to sequence the first complete genome of a free-living organism. Craig Venter of the Institute for Genomic Research unveiled the complete sequences of two bacteria: Haemophilus influenzae and Mycoplasma genitalium at the American Society of Microbiology Meeting in May 1995. Because there are many similarities in bacterial and human biochemistry, the sequences will be useful for searching for human genes.

  7. Fungal Genome Sequencing and Bioenergy

    SciTech Connect

    Baker, Scott E.; Thykaer, Jette; Adney, William S.; Brettin, T.; Brockman, Fred J.; D'haeseleer, Patrik; Martinez, Antonio D.; Miller, R. M.; Rokhsar, Daniel S.; Schadt, Christopher W.; Torok, Tamas; Tuskan, Gerald; Bennett, Joan W.; Berka, Randy; Briggs, Steve; Heitman, Joseph; Taylor, John; Turgeon, Barbara G.; Werner-Washburne, Maggie; Himmel, Michael E.

    2008-09-30

    To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

  8. Orbital-Maneuver-Sequence Optimization

    DTIC Science & Technology

    1985-12-01

    optimization computer program and applied it to the generation of optimal co-orbital attack-maneuver sequences and to the generation of optimal evasions...maneuver-sequence-optimization computer programs can be improved by a general restructuring and streamlining and the addition of various features. It is...believed that with further development and systematic testing the programs have potential for real-time generation of optimal maneuver sequences in an

  9. Direct-Sequence Communication Systems

    DTIC Science & Technology

    2004-03-01

    modulation; CMF = chip-matched filter; SSG = spreading sequence generator. Delay = 0 for QPSK; delay = Tc/2 for OQPSK and MSK...balanced quaternary modulation (delay = 0 for QPSK and delay = Tc/2 for OQPSK and MSK); CMF = chip-matched filter; SSG = spreading sequence...with dual quaternary modulation; CMF = chip-matched filter; SSG = spreading sequence generator. Delay = 0 for QPSK; delay = Tc/2 for OQPSK and MSK.

  10. Biosensors for DNA sequence detection

    NASA Technical Reports Server (NTRS)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  11. Preferential Amplification of Pathogenic Sequences.

    PubMed

    Ge, Fang; Parker, Jayme; Chul Choi, Sang; Layer, Mark; Ross, Katherine; Jilly, Bernard; Chen, Jack

    2015-06-11

    The application of next generation sequencing (NGS) technology in the diagnosis of human pathogens is hindered by the fact that pathogenic sequences, especially viral, are often scarce in human clinical specimens. This known disproportion leads to the requirement of subsequent deep sequencing and extensive bioinformatics analysis. Here we report a method we called "Preferential Amplification of Pathogenic Sequences (PATHseq)" that can be used to greatly enrich pathogenic sequences. Using a computer program, we developed 8-, 9-, and 10-mer oligonucleotides called "non-human primers" that do not match the most abundant human transcripts, but instead selectively match transcripts of human pathogens. Instead of using random primers in the construction of cDNA libraries, the PATHseq method recruits these short non-human primers, which in turn, preferentially amplifies non-human, presumably pathogenic sequences. Using this method, we were able to enrich pathogenic sequences up to 200-fold in the final sequencing library. This method does not require prior knowledge of the pathogen or assumption of the infection; therefore, it provides a fast and sequence-independent approach for detection and identification of human viruses and other pathogens. The PATHseq method, coupled with NGS technology, can be broadly used in identification of known human pathogens and discovery of new pathogens.
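
    The primer-selection idea in this abstract (short oligonucleotides that match pathogen transcripts but not abundant human transcripts) can be sketched as a set difference over k-mers. This is a simplified illustration, not the authors' program: the function names are hypothetical, reverse complements and mismatch tolerance are ignored, and the toy sequences in the usage note are made up.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def non_host_primers(host_seqs, pathogen_seqs, k):
    """k-mers present in at least one pathogen sequence and in no host sequence.

    Simplifying assumptions: exact matching only, forward strand only.
    """
    host = set().union(*(kmers(s, k) for s in host_seqs))
    pathogen = set().union(*(kmers(s, k) for s in pathogen_seqs))
    return sorted(pathogen - host)
```

    With the toy inputs non_host_primers(["ACGTACGT"], ["ACGTTTGA"], 4), the shared 4-mer ACGT is excluded and the remaining pathogen-only 4-mers are returned. The abstract's 8-, 9-, and 10-mers correspond to calling the function with k set to 8, 9, and 10.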

  12. SNMR pulse sequence phase cycling

    DOEpatents

    Walsh, David O; Grunewald, Elliot D

    2013-11-12

    Technologies applicable to SNMR pulse sequence phase cycling are disclosed, including SNMR acquisition apparatus and methods, SNMR processing apparatus and methods, and combinations thereof. SNMR acquisition may include transmitting two or more SNMR pulse sequences and applying a phase shift to a pulse in at least one of the pulse sequences, according to any of a variety of cycling techniques. SNMR processing may include combining SNMR from a plurality of pulse sequences comprising pulses of different phases, so that desired signals are preserved and undesired signals are canceled.

  13. Sequence independent amplification of DNA

    DOEpatents

    Bohlander, Stefan K.

    1998-01-01

    The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei.

  14. Sequence independent amplification of DNA

    DOEpatents

    Bohlander, S.K.

    1998-03-24

    The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example, the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei. 25 figs.

  15. Establishing homologies in protein sequences

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.; Barker, W. C.; Hunt, L. T.

    1983-01-01

    Computer-based statistical techniques used to determine homologies between proteins occurring in different species are reviewed. The technique is based on comparison of two protein sequences, either by relating all segments of a given length in one sequence to all segments of the second or by finding the best alignment of the two sequences. Approaches discussed include selection using printed tabulations, identification of very similar sequences, and computer searches of a database. The use of the SEARCH, RELATE, and ALIGN programs (Dayhoff, 1979) is explained; sample data are presented in graphs, diagrams, and tables and the construction of scoring matrices is considered.

  16. Representations of mechanical assembly sequences

    NASA Technical Reports Server (NTRS)

    Homem De Mello, Luiz S.; Sanderson, Arthur C.

    1991-01-01

    Five types of representations for assembly sequences are reviewed: the directed graph of feasible assembly sequences, the AND/OR graph of feasible assembly sequences, the set of establishment conditions, and two types of sets of precedence relationships (precedence relationships between the establishment of one connection between parts and the establishment of another connection, and precedence relationships between the establishment of one connection and states of the assembly process). The mappings of one representation into the others are established. The correctness and completeness of these representations are established. The results presented are needed in the proof of correctness and completeness of algorithms for the generation of mechanical assembly sequences.

  17. Automated Sequence Preprocessing in a Large-Scale Sequencing Environment

    PubMed Central

    Wendl, Michael C.; Dear, Simon; Hodgson, Dave; Hillier, LaDeana

    1998-01-01

    A software system for transforming fragments from four-color fluorescence-based gel electrophoresis experiments into assembled sequence is described. It has been developed for large-scale processing of all trace data, including shotgun and finishing reads, regardless of clone origin. Design considerations are discussed in detail, as are programming implementation and graphic tools. The importance of input validation, record tracking, and use of base quality values is emphasized. Several quality analysis metrics are proposed and applied to sample results from recently sequenced clones. Such quantities prove to be a valuable aid in evaluating modifications of sequencing protocol. The system is in full production use at both the Genome Sequencing Center and the Sanger Centre, for which combined production is ∼100,000 sequencing reads per week. PMID:9750196

  18. Venturia carpophila draft genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Venturia carpophila causes peach scab, a disease that renders peach fruit unmarketable. We report a high-quality draft genome sequence (36.9 Mb) of V. carpophila from an isolate collected from a peach tree in central Georgia in the United States. The genome sequence described will be a useful resour...

  19. VOE Accounting: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 2-year program in accounting. The guide consists of a course description; general course objectives;…

  20. Aircraft Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for an aircraft mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and…

  1. AMPLIFICATION OF RIBOSOMAL RNA SEQUENCES

    EPA Science Inventory

    This book chapter offers an overview of the use of ribosomal RNA sequences. A history of the technology traces the evolution of techniques to measure bacterial phylogenetic relationships and recent advances in obtaining rRNA sequence information. The manual also describes procedu...

  2. Urban Horticulture: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 4-year program in urban horticulture. The guide consists of a course description; general course…

  3. Auto Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for an auto mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  4. DNA Sequencing Sensors: An Overview

    PubMed Central

    Garrido-Cardenas, Jose Antonio; Garcia-Maroto, Federico; Alvarez-Bermejo, Jose Antonio; Manzano-Agugliaro, Francisco

    2017-01-01

    The first sequencing of a complete genome was published forty years ago by the double Nobel Prize in Chemistry winner Frederick Sanger. That corresponded to the small sized genome of a bacteriophage, but since then there have been many complex organisms whose DNA have been sequenced. This was possible thanks to continuous advances in the fields of biochemistry and molecular genetics, but also in other areas such as nanotechnology and computing. Nowadays, sequencing sensors based on genetic material have little to do with those used by Sanger. The emergence of mass sequencing sensors, or new generation sequencing (NGS) meant a quantitative leap both in the volume of genetic material that was able to be sequenced in each trial, as well as in the time per run and its cost. One can envisage that incoming technologies, already known as fourth generation sequencing, will continue to cheapen the trials by increasing DNA reading lengths in each run. All of this would be impossible without sensors and detection systems becoming smaller and more precise. This article provides a comprehensive overview on sensors for DNA sequencing developed within the last 40 years. PMID:28335417

  5. Chameleon sequences in neurodegenerative diseases.

    PubMed

    Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

    2016-03-25

    Chameleon sequences can adopt an alpha helix, a beta sheet, or a coil conformation. Defining chameleon sequences in the PDB (Protein Data Bank) may yield insight into defining the peptides and proteins responsible for neurodegeneration. In this research, we benefitted from the large PDB and performed a sequence analysis on chameleons, for which we developed an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with less than 25% identity. Our data were classified into "helix to strand (HE)", "helix to coil (HC)" and "strand to coil (CE)" alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in chameleons. We also found that proteins such as Insulin Degrading Enzyme (IDE) and GTP-binding nuclear protein Ran (RAN) have the largest number of chameleons (640 and 405 respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and a study of them can shed light on curing and preventing neurodegenerative diseases.
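
    The extraction step described in this abstract (peptide segments with identical sequences but different structures) can be sketched as below. The abstract does not specify the algorithm, so this is only one plausible reading: the function name, the one-letter-per-residue secondary-structure strings (H, E, C), and the requirement that a window be uniformly structured are all assumptions.

```python
from collections import defaultdict


def find_chameleons(proteins, k):
    """Map each length-k peptide to its structural states if it has more than one.

    proteins: iterable of (sequence, ss) pairs, where ss gives one
    secondary-structure letter per residue (assumed H, E, or C).
    Only windows whose residues all share one state are counted
    (an assumption made for this sketch).
    """
    seen = defaultdict(set)
    for seq, ss in proteins:
        for i in range(len(seq) - k + 1):
            window_ss = ss[i:i + k]
            if len(set(window_ss)) == 1:
                seen[seq[i:i + k]].add(window_ss[0])
    return {pep: states for pep, states in seen.items() if len(states) > 1}
```

    A peptide mapped to {"H", "E"} would fall in the abstract's HE class, {"H", "C"} in HC, and {"E", "C"} in CE.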

  6. Diesel Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a diesel mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  7. Rapid Diagnostics of Onboard Sequences

    NASA Technical Reports Server (NTRS)

    Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

    2012-01-01

    Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators' diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the RESTful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database, while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command
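
    The hash-indexed logging described in this abstract can be sketched in a few lines. This is an illustration only, not the MSL server's design: the class and method names are hypothetical, an in-memory dictionary stands in for the MySQL database, and SHA-256 is an assumed choice because the abstract does not name the hash function.

```python
import hashlib


class SequenceLog:
    """Toy audit log that indexes each generated binary by a content hash."""

    def __init__(self):
        self._records = {}  # stand-in for a database table

    def log(self, scmf_bytes, rml_text, dictionary_name):
        """Store the binary together with the inputs that produced it."""
        key = hashlib.sha256(scmf_bytes).hexdigest()  # assumed hash function
        self._records[key] = {
            "scmf": scmf_bytes,
            "rml": rml_text,
            "dictionary": dictionary_name,
        }
        return key

    def lookup(self, key):
        """Recover the logged record for a hash, or None if unknown."""
        return self._records.get(key)
```

    Because the key is derived from the binary content, a hash reported alongside an executed command identifies exactly one logged version of a sequence, which is the ambiguity the abstract says operators faced with named sequences.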

  8. Recent advances in nanopore sequencing

    PubMed Central

    Maitra, Raj D.; Kim, Jungsuk; Dunbar, William B.

    2013-01-01

    The prospect of nanopores as a next-generation sequencing (NGS) platform has been a topic of growing interest and considerable government-sponsored research for more than a decade. Oxford Nanopore Technologies recently announced the first commercial nanopore sequencing devices, to be made available by the end of 2012, while other companies (Life, Roche, IBM) are also pursuing nanopore sequencing approaches. In this paper, the state of the art in nanopore sequencing is reviewed, focusing on the most recent contributions that have or promise to have NGS commercial potential. We consider also the scalability of the circuitry to support multichannel arrays of nanopores in future sequencing devices, which is critical to commercial viability. PMID:23138639

  9. Pathogenetic mechanisms of fetal akinesia deformation sequence and oligohydramnios sequence.

    PubMed

    Rodríguez, J I; Palacios, J

    1991-09-01

    This article briefly reviews the participation of fetal compression, muscular weakness, and fetal akinesia in the genesis of the anomalies found in fetal akinesia deformation sequence (FADS) and oligohydramnios sequence (OS). Both sequences share phenotypic manifestations, such as arthrogryposis, short umbilical cord, and lung hypoplasia, in relation to decreased intrauterine fetal motility. Other characteristic manifestations found in OS, such as Potter face, and redundant skin, are produced by fetal compression. On the other hand, growth retardation, craniofacial anomalies, micrognathia, long bone hypoplasia, and polyhydramnios found in FADS could be related to intrauterine muscular weakness.

  10. Image analysis for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Palaniappan, Kannappan; Huang, Thomas S.

    1991-07-01

    There is a great deal of interest in automating the process of DNA (deoxyribonucleic acid) sequencing to support the analysis of genomic DNA such as the Human and Mouse Genome projects. In one class of gel-based sequencing protocols, autoradiograph images are generated in the final step and usually require manual interpretation to reconstruct the DNA sequence represented by the image. The need to handle a large volume of sequence information necessitates automation of the manual autoradiograph reading step through image analysis in order to reduce the length of time required to obtain sequence data and reduce transcription errors. Various adaptive image enhancement, segmentation and alignment methods were applied to autoradiograph images. The methods are adaptive to the local characteristics of the image such as noise, background signal, or presence of edges. Once the two-dimensional data is converted to a set of aligned one-dimensional profiles, waveform analysis is used to determine the location of each band, which represents one nucleotide in the sequence. Different classification strategies including a rule-based approach are investigated to map the profile signals, augmented with the original two-dimensional image data as necessary, to textual DNA sequence information.
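
    The band-location step on a one-dimensional profile can be illustrated with a simple thresholded local-maximum search. The abstract does not describe its waveform analysis in detail, so this is a stand-in technique: the function name and the threshold parameter are assumptions, and no adaptive enhancement is attempted.

```python
def find_bands(profile, threshold):
    """Indices of local maxima in a 1-D intensity profile at or above threshold.

    A flat-topped peak is reported once, at its first sample.
    """
    return [
        i
        for i in range(1, len(profile) - 1)
        if profile[i] >= threshold
        and profile[i] > profile[i - 1]
        and profile[i] >= profile[i + 1]
    ]
```

    Each returned index would correspond to one candidate band, and hence one nucleotide, in the lane that the profile was taken from.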

  11. Ossification sequence heterochrony among amphibians.

    PubMed

    Harrington, Sean M; Harrison, Luke B; Sheil, Christopher A

    2013-01-01

    Heterochrony is an important mechanism in the evolution of amphibians. Although studies have centered on the relationship between size and shape and the rates of development, ossification sequence heterochrony also may have been important. Rigorous, phylogenetic methods for assessing sequence heterochrony are relatively new, and a comprehensive study of the relative timing of ossification of skeletal elements has not been used to identify instances of sequence heterochrony across Amphibia. In this study, a new version of the program Parsimov-based genetic inference (PGi) was used to identify shifts in ossification sequences across all extant orders of amphibians, for all major structural units of the skeleton. PGi identified a number of heterochronic sequence shifts in all analyses, the most interesting of which seem to be tied to differences in metamorphic patterns among major clades. Early ossification of the vomer, premaxilla, and dentary is retained by Apateon caducus and members of Gymnophiona and Urodela, which lack the strongly biphasic development seen in anurans. In contrast, bones associated with the jaws and face were identified as shifting late in the ancestor of Anura. The bones that do not shift late, and thereby occupy the earliest positions in the anuran cranial sequence, are those in regions of the skull that undergo the least restructuring throughout anuran metamorphosis. Additionally, within Anura, bones of the hind limb and pelvic girdle were also identified as shifting early in the sequence of ossification, which may be a result of functional constraints imposed by the drastic metamorphosis of most anurans.

  12. From mapping to sequencing, post-sequencing and beyond.

    PubMed

    Sasaki, Takuji; Matsumoto, Takashi; Antonio, Baltazar A; Nagamura, Yoshiaki

    2005-01-01

    The Rice Genome Research Program (RGP) in Japan has been collaborating with the international community in elucidating a complete high-quality sequence of the rice genome. As the pioneer in large-scale analysis of the rice genome, the RGP has successfully established the fundamental tools for genome research such as a genetic map, a yeast artificial chromosome (YAC)-based physical map, a transcript map and a phage P1 artificial chromosome (PAC)/bacterial artificial chromosome (BAC) sequence-ready physical map, which serve as common resources for genome sequencing. Among the 12 rice chromosomes, the RGP is in charge of sequencing six chromosomes covering 52% of the 390 Mb total length of the genome. The contribution of the RGP to the realization of decoding the rice genome sequence with high accuracy and deciphering the genetic information in the genome will have a great impact in understanding the biology of the rice plant that provides a major food source for almost half of the world's population. A high-quality draft sequence (phase 2) was completed in December 2002. Since then, much of the finished quality sequence (phase 3) has become available in public databases. With the completion of sequencing in December 2004, it is expected that the genome sequence would facilitate innovative research in functional and applied genomics. A map-based genome sequence is indispensable for further improvement of current rice varieties and for development of novel varieties carrying agronomically important traits such as high yield potential and tolerance to both biotic and abiotic stresses. In addition to genome sequencing, various related projects have been initiated to generate valuable resources, which could serve as indispensable tools in clarifying the structure and function of the rice genome. 
    These resources have been made available to the scientific community through the Rice Genome Resource Center (RGRC) of the National Institute of Agrobiological Sciences (NIAS) to

  13. Paucity of moderately repetitive sequences

    SciTech Connect

    Schmid, C.W.

    1991-01-01

    We examined clones of renatured repetitive human DNA to find novel repetitive DNAs. After eliminating known repeats, the remaining clones were subjected to sequence analysis. These clones also corresponded to known repeats, but with greater sequence diversity. This indicates that either these libraries were depleted of short interspersed repeats in construction, or these repeats are much less prevalent in the human genome than is indicated by data from Xenopus or sea urchin studies. We directly investigated the sequence composition of human DNA through traditional renaturation techniques with the goal of estimating the limits of abundance of repetitive sequence classes in human DNA. Our results sharply limit the maximum possible abundance to 1-2% of the human genome. Our estimate, minus the known repeats in this fraction, leaves about 1% (3 × 10^7 nucleotides) of the human genome for novel repetitive elements. 2 refs. (MHB)

  14. Molecular beacon sequence design algorithm.

    PubMed

    Monroe, W Todd; Haselton, Frederick R

    2003-01-01

    A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance is based on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.

  15. Sequencing as an Item Type.

    ERIC Educational Resources Information Center

    Alderson, J. Charles; Percsich, Richard; Szabo, Gabor

    2000-01-01

    Reports on the potential problems in scoring responses to sequencing tests, the development of a computer program to overcome these difficulties, and an exploration of the value of scoring procedures. (Author/VWL)

  16. Guitars, Violins, and Geometric Sequences

    ERIC Educational Resources Information Center

    Barger, Rita; Haehl, Martha

    2007-01-01

    This article describes middle school mathematics activities that relate measurement, ratios, and geometric sequences to finger positions or the placement of frets on stringed musical instruments. (Contains 2 figures and 2 tables.)
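
    The link between fret placement and geometric sequences can be sketched in a few lines of Python. This is only an illustration under an assumption the record does not state: twelve-tone equal temperament, in which each fret shortens the vibrating string by the constant ratio 2^(-1/12). The function name and the 650 mm scale length are hypothetical.

```python
def fret_distances(scale_length, n_frets=12):
    """Distance from the nut to each fret, assuming equal temperament.

    The vibrating lengths scale_length * ratio**n form a geometric
    sequence with common ratio 2**(-1/12) (an assumption of this sketch).
    """
    ratio = 2 ** (-1 / 12)
    return [scale_length * (1 - ratio ** n) for n in range(1, n_frets + 1)]


if __name__ == "__main__":
    for fret, distance in enumerate(fret_distances(650.0), start=1):
        print(f"fret {fret:2d}: {distance:7.2f} mm from the nut")
```

    Under this assumption the twelfth fret falls at half the scale length, which is the octave.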

  17. Pythagorean Triples from Harmonic Sequences.

    ERIC Educational Resources Information Center

    DiDomenico, Angelo S.; Tanner, Randy J.

    2001-01-01

    Shows how all primitive Pythagorean triples can be generated from harmonic sequences. Use inductive and deductive reasoning to explore how Pythagorean triples are connected with another area of mathematics. (KHR)

  18. The Dynamics of DNA Sequencing.

    ERIC Educational Resources Information Center

    Morvillo, Nancy

    1997-01-01

    Describes a paper-and-pencil activity that helps students understand DNA sequencing and expands student understanding of DNA structure, replication, and gel electrophoresis. Appropriate for advanced biology students who are familiar with the Sanger method. (DDR)

  19. Rover Sequencing and Visualization Program

    NASA Technical Reports Server (NTRS)

    Cooper, Brian; Hartman, Frank; Maxwell, Scott; Yen, Jeng; Wright, John; Balacuit, Carlos

    2005-01-01

    The Rover Sequencing and Visualization Program (RSVP) is the software tool for use in the Mars Exploration Rover (MER) mission for planning rover operations and generating command sequences for accomplishing those operations. RSVP combines three-dimensional (3D) visualization for immersive exploration of the operations area, stereoscopic image display for high-resolution examination of the downlinked imagery, and a sophisticated command-sequence editing tool for analysis and completion of the sequences. RSVP is linked with actual flight-code modules for operations rehearsal to provide feedback on the expected behavior of the rover prior to committing to a particular sequence. Playback tools allow for review of both rehearsed rover behavior and downlinked results of actual rover operations. These can be displayed simultaneously for comparison of rehearsed and actual activities for verification. The primary inputs to RSVP are downlink data products from the Operations Storage Server (OSS) and activity plans generated by the science team. The activity plans are high-level goals for the next day's activities. The downlink data products include imagery, terrain models, and telemetered engineering data on rover activities and state. The Rover Sequence Editor (RoSE) component of RSVP performs activity expansion to command sequences, command creation and editing with setting of command parameters, and viewing and management of rover resources. The HyperDrive component of RSVP performs 2D and 3D visualization of the rover's environment, graphical and animated review of rover-predicted and telemetered state, and creation and editing of command sequences related to mobility and Instrument Deployment Device (IDD) operations. Additionally, RoSE and HyperDrive together evaluate command sequences for potential violations of flight and safety rules.
The products of RSVP include command sequences for uplink that are stored in the Distributed Object Manager (DOM) and predicted rover

  20. Sequenced drive for rotary valves

    DOEpatents

    Mittell, Larry C.

    1981-01-01

    A sequenced drive for rotary valves which provides the benefits of applying rotary and linear motions to the movable sealing element of the valve. The sequenced drive provides a close approximation of linear motion while engaging or disengaging the movable element with the seat minimizing wear and damage due to scrubbing action. The rotary motion of the drive swings the movable element out of the flowpath thus eliminating obstruction to flow through the valve.

  1. Structural Complexity of DNA Sequence

    PubMed Central

    Liou, Cheng-Yuan; Cheng, Wei-Chen; Tsai, Huai-Ying

    2013-01-01

    In modern bioinformatics, finding an efficient way to allocate sequence fragments with biological functions is an important issue. This paper presents a structural approach based on context-free grammars extracted from original DNA or protein sequences. This approach is radically different from all those statistical methods. Furthermore, this approach is compared with a topological entropy-based method for consistency and difference of the complexity results. PMID:23662161

  2. Mycobacterium abscessus multispacer sequence typing

    PubMed Central

    2013-01-01

    Background: The Mycobacterium abscessus group includes antibiotic-resistant, opportunistic mycobacteria that are responsible for sporadic cases and outbreaks of cutaneous, pulmonary and disseminated infections. However, because of their close genetic relationships, accurate discrimination between the various strains of these mycobacteria remains difficult. In this report, we describe the development of a multispacer sequence typing (MST) analysis for the simultaneous identification and typing of M. abscessus mycobacteria. We also compared MST with the reference multilocus sequence analysis (MLSA) typing method. Results: Based on the M. abscessus CIP104536T genome, eight intergenic spacers were selected, PCR amplified and sequenced in 21 M. abscessus isolates and analysed in 48 available M. abscessus genomes. MST and MLSA grouped 37 M. abscessus organisms into 12 and nine types, respectively; four formerly “M. bolletii” organisms and M. abscessus M139 into three and four types, respectively; and 27 formerly “M. massiliense” organisms into nine and five types, respectively. The Hunter-Gaston index was 0.912 for MST and 0.903 for MLSA. The MST-derived tree was similar to that based on MLSA and rpoB gene sequencing and yielded three main clusters, each comprising the type strain of the respective M. abscessus sub-species. Two isolates exhibited discordant MLSA- and rpoB gene sequence-derived positions, one isolate exhibited discordant MST- and rpoB gene sequence-derived positions and one isolate exhibited discordant MST- and MLSA-derived positions. MST spacer n°2 sequencing alone allowed for the accurate identification of the different isolates at the sub-species level. Conclusions: MST is a new sequencing-based approach for both identifying and genotyping M. abscessus mycobacteria that clearly differentiates formerly “M. massiliense” organisms from other M. abscessus subsp. bolletii organisms. PMID:23294800
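
    The Hunter-Gaston index reported in this record is the probability that two isolates drawn at random, without replacement, belong to different types. A minimal Python sketch of the standard formula, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), is given here for orientation; the function name and the toy type labels are illustrative assumptions, not data from the study.

```python
from collections import Counter


def hunter_gaston_index(type_labels):
    """Discriminatory index D for a list with one type label per isolate."""
    n_total = len(type_labels)
    if n_total < 2:
        raise ValueError("need at least two isolates")
    # Ordered pairs of distinct isolates that share a type.
    same_type_pairs = sum(n * (n - 1) for n in Counter(type_labels).values())
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


if __name__ == "__main__":
    # Hypothetical typing result: six isolates assigned to three types.
    print(hunter_gaston_index(["T1", "T1", "T2", "T2", "T2", "T3"]))
```

    A value near 1 means the typing scheme separates almost every pair of isolates; a value of 0 means every isolate received the same type.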

  3. Long-range barcode labeling-sequencing

    SciTech Connect

    Chen, Feng; Zhang, Tao; Singh, Kanwar K.; Pennacchio, Len A.; Froula, Jeff L.; Eng, Kevin S.

    2016-10-18

    Methods for sequencing single large DNA molecules by clonal multiple displacement amplification using barcoded primers. Sequences are binned based on barcode sequences and sequenced using a microdroplet-based method for sequencing large polynucleotide templates to enable assembly of haplotype-resolved complex genomes and metagenomes.

  4. Conservation of sequence in recombination signal sequence spacers.

    PubMed Central

    Ramsden, D A; Baetz, K; Wu, G E

    1994-01-01

    The variable domains of immunoglobulins and T cell receptors are assembled through the somatic, site-specific recombination of multiple germline segments (V, D, and J segments), or V(D)J rearrangement. The recombination signal sequence (RSS) is necessary and sufficient for cell-type-specific targeting of the V(D)J rearrangement machinery to these germline segments. Previously, the RSS has been described as possessing both a conserved heptamer and a conserved nonamer motif. The heptamer and nonamer motifs are separated by a 'spacer' that was not thought to possess significant sequence conservation; only its length, either 12 +/- 1 bp or 23 +/- 1 bp, was considered constrained. In this report we have assembled and analyzed an extensive database of published RSS. We have derived, through extensive consensus comparison, a more detailed description of the RSS than has previously been reported. Our analysis indicates that RSS spacers possess significant conservation of sequence, and that the conserved sequence in 12 bp spacers is similar to the conserved sequence in the first half of 23 bp spacers. PMID:8208601

  5. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    PubMed

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  6. Sequence Factorization with Multiple References

    PubMed Central

    Wandelt, Sebastian; Leser, Ulf

    2015-01-01

    The success of high-throughput sequencing has led to an increasing number of projects which sequence large populations of a species. Storage and analysis of sequence data is a key challenge in these projects, because of the sheer size of the datasets. Compression is one simple technology to deal with this challenge. Referential factorization and compression schemes, which store only the differences between an input sequence and a reference sequence, have attracted much interest in this field. Highly similar sequences, e.g., human genomes, can be compressed with a compression ratio of 1,000:1 and more, up to two orders of magnitude better than with standard compression techniques. Recently, it was shown that compression against multiple references from the same species can boost the compression ratio up to 4,000:1. However, a detailed analysis of using multiple references is lacking, e.g., for main memory consumption and optimality. In this paper, we describe one key technique for referential compression against multiple references: the factorization of sequences. Based on the notion of an optimal factorization, we propose optimization heuristics and identify parameter settings which greatly influence 1) the size of the factorization, 2) the time for factorization, and 3) the required amount of main memory. We evaluate a total of 30 setups with a varying number of references on data from three different species. Our results show a wide range of factorization sizes (optimal to an overhead of up to 300%), factorization speed (0.01 MB/s to more than 600 MB/s), and main memory usage (few dozen MB to dozens of GB). Based on our evaluation, we identify the best configurations for common use cases. Our evaluation shows that multi-reference factorization is much better than single-reference factorization. PMID:26422374
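
    To make the idea of referential factorization concrete, the sketch below greedily replaces stretches of an input sequence with pointers into one or more references. It is a naive quadratic-time illustration, not the authors' algorithm or heuristics; the function names, the factor encoding (reference index, position, length) and the `min_len` threshold are all assumptions of this example.

```python
def longest_match(seq, start, reference):
    """Return (position, length) of the longest prefix of seq[start:] found in reference."""
    best_pos, best_len = -1, 0
    for pos in range(len(reference)):
        length = 0
        while (start + length < len(seq) and pos + length < len(reference)
               and seq[start + length] == reference[pos + length]):
            length += 1
        if length > best_len:
            best_pos, best_len = pos, length
    return best_pos, best_len


def factorize(seq, references, min_len=2):
    """Greedy factorization: pointers (ref_index, pos, length) or literal characters."""
    factors, i = [], 0
    while i < len(seq):
        best = (-1, -1, 0)
        for ref_index, reference in enumerate(references):
            pos, length = longest_match(seq, i, reference)
            if length > best[2]:
                best = (ref_index, pos, length)
        if best[2] >= min_len:
            factors.append(best)
            i += best[2]
        else:
            factors.append(seq[i])  # too short to be worth a pointer
            i += 1
    return factors


def reconstruct(factors, references):
    """Invert factorize(): expand pointers back into sequence text."""
    parts = []
    for factor in factors:
        if isinstance(factor, tuple):
            ref_index, pos, length = factor
            parts.append(references[ref_index][pos:pos + length])
        else:
            parts.append(factor)
    return "".join(parts)
```

    With the toy input "ACGTACGATTGACA", the single reference "ACGTACGTTTGACA" needs three factors, while adding a second reference "GGGATTGACA" lets the same input be covered by two pointers, which is the effect the abstract attributes to multi-reference factorization.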

  7. Sequencing Needs for Viral Diagnostics

    SciTech Connect

    Gardner, S N; Lam, M; Mulakken, N J; Torres, C L; Smith, J R; Slezak, T

    2004-01-26

    We built a system to guide decisions regarding the amount of genomic sequencing required to develop diagnostic DNA signatures, which are short sequences that are sufficient to uniquely identify a viral species. We used our existing DNA diagnostic signature prediction pipeline, which selects regions of a target species genome that are conserved among strains of the target (for reliability, to prevent false negatives) and unique relative to other species (for specificity, to avoid false positives). We performed simulations, based on existing sequence data, to assess the number of genome sequences of a target species and of close phylogenetic relatives ("near neighbors") that are required to predict diagnostic signature regions that are conserved among strains of the target species and unique relative to other bacterial and viral species. For DNA viruses such as variola (smallpox), three target genomes provide sufficient guidance for selecting species-wide signatures. Three near neighbor genomes are critical for species specificity. In contrast, most RNA viruses require four target genomes and no near neighbor genomes, since lack of conservation among strains is more limiting than uniqueness. SARS and Ebola Zaire are exceptional, as additional target genomes currently do not improve predictions, but near neighbor sequences are urgently needed. Our results also indicate that double stranded DNA viruses are more conserved among strains than are RNA viruses, since in most cases there was at least one conserved signature candidate for the DNA viruses and zero conserved signature candidates for the RNA viruses.
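
    The two selection criteria, conservation among target strains and uniqueness relative to other species, can be illustrated with exact k-mer sets. This is a deliberately simplified sketch and not the pipeline described in the record: the function names, the use of exact k-mers and the toy sequences are assumptions.

```python
def kmers(sequence, k):
    """All distinct substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def signature_candidates(target_genomes, near_neighbor_genomes, k):
    """k-mers present in every target genome and absent from every near neighbor."""
    if not target_genomes:
        raise ValueError("at least one target genome is required")
    conserved = set.intersection(*(kmers(g, k) for g in target_genomes))
    background = set().union(*(kmers(g, k) for g in near_neighbor_genomes))
    return conserved - background


if __name__ == "__main__":
    targets = ["AACGTTGCA", "TTACGTTGA"]   # hypothetical strains of one species
    neighbors = ["GGCGTTGGG"]              # hypothetical near neighbor
    print(signature_candidates(targets, neighbors, k=5))
```

    Adding target genomes can only shrink the conserved set, and adding near neighbors can only shrink the unique set, which is why the number of available genomes of each kind limits the candidates that survive.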

  8. Statistical properties of DNA sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
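
    For readers unfamiliar with detrended fluctuation analysis, the following pure-Python sketch shows the basic first-order procedure: integrate the series, split it into windows, remove a least-squares line from each window, and read the scaling exponent from the slope of log F(n) against log n. The mapping of bases to steps (+1 for pyrimidines, -1 for purines) and all function names are assumptions of this sketch rather than details given in the record.

```python
import math


def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dna_to_steps(sequence):
    """Assumed walk rule: pyrimidine (C, T) = +1, purine (A, G) = -1."""
    return [1 if base in "CT" else -1 for base in sequence.upper()]


def dfa_fluctuation(steps, window):
    """Root-mean-square residual F(n) after linear detrending in each window."""
    mean = sum(steps) / len(steps)
    profile, total = [], 0.0
    for value in steps:
        total += value - mean
        profile.append(total)
    n_windows = len(profile) // window
    xs = list(range(window))
    squared = 0.0
    for w in range(n_windows):
        segment = profile[w * window:(w + 1) * window]
        slope, intercept = linear_fit(xs, segment)
        squared += sum((y - (slope * x + intercept)) ** 2
                       for x, y in zip(xs, segment))
    return math.sqrt(squared / (n_windows * window))


def dfa_exponent(steps, windows=(8, 16, 32, 64, 128)):
    """Slope of log F(n) versus log n over the given window sizes."""
    log_n = [math.log(n) for n in windows]
    log_f = [math.log(dfa_fluctuation(steps, n)) for n in windows]
    slope, _ = linear_fit(log_n, log_f)
    return slope
```

    An exponent close to 0.5 indicates an uncorrelated sequence, while values clearly above 0.5 indicate long-range correlation of the kind the abstract describes for genes containing non-coding regions.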

  9. Sequence-dependent nucleosome positioning.

    PubMed

    Chung, Ho-Ryun; Vingron, Martin

    2009-03-13

    Eukaryotic DNA is organized into a macromolecular structure called chromatin. The basic repeating unit of chromatin is the nucleosome, which consists of two copies of each of the four core histones and DNA. The nucleosomal organization and the positions of nucleosomes have profound effects on all DNA-dependent processes. Understanding the factors that influence nucleosome positioning is therefore of general interest. Among the many determinants of nucleosome positioning, the DNA sequence has been proposed to have a major role. Here, we analyzed more than 860,000 nucleosomal DNA sequences to identify sequence features that guide the formation of nucleosomes in vivo. We found that both a periodic enrichment of AT base pairs and an out-of-phase oscillating enrichment of GC base pairs as well as the overall preference for GC base pairs are determinants of nucleosome positioning. The preference for GC pairs can be related to a lower energetic cost required for deformation of the DNA to wrap around the histones. In line with this idea, we found that only incorporation of both signal components into a sequence model for nucleosome formation results in maximal predictive performance on a genome-wide scale. In this manner, one achieves greater predictive power than published approaches. Our results confirm the hypothesis that the DNA sequence has a major role in nucleosome positioning in vivo.

  10. The Extrapolation of Elementary Sequences

    NASA Technical Reports Server (NTRS)

    Laird, Philip; Saul, Ronald

    1992-01-01

    We study sequence extrapolation as a stream-learning problem. Input examples are a stream of data elements of the same type (integers, strings, etc.), and the problem is to construct a hypothesis that both explains the observed sequence of examples and extrapolates the rest of the stream. A primary objective -- and one that distinguishes this work from previous extrapolation algorithms -- is that the same algorithm be able to extrapolate sequences over a variety of different types, including integers, strings, and trees. We define a generous family of constructive data types, and define as our learning bias a stream language called elementary stream descriptions. We then give an algorithm that extrapolates elementary descriptions over constructive datatypes and prove that it learns correctly. For freely-generated types, we prove a polynomial time bound on descriptions of bounded complexity. An especially interesting feature of this work is the ability to provide quantitative measures of confidence in competing hypotheses, using a Bayesian model of prediction.

  11. Numerical classification of coding sequences

    NASA Technical Reports Server (NTRS)

    Collins, D. W.; Liu, C. C.; Jukes, T. H.

    1992-01-01

    DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
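
    The two numerical designations described above are straightforward to compute. The sketch below follows the formats shown in the abstract (for example A76C158G121T74 and (AAA)0(AAC)1 ...); the function names and the short test sequence are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def base_count_descriptor(sequence):
    """Base count in the form A<n>C<n>G<n>T<n>."""
    counts = Counter(sequence.upper())
    return "".join(f"{base}{counts[base]}" for base in "ACGT")


def codon_table_descriptor(sequence):
    """Codon usage in the form (AAA)<n>(AAC)<n>...(TTT)<n>, read in frame from position 0."""
    sequence = sequence.upper()
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    counts = Counter(sequence[i:i + 3] for i in range(0, usable, 3))
    all_codons = ["".join(triplet) for triplet in product("ACGT", repeat=3)]
    return "".join(f"({codon}){counts[codon]}" for codon in all_codons)


if __name__ == "__main__":
    orf = "ATGAAACCCTAA"  # hypothetical reading frame
    print(base_count_descriptor(orf))
    print(codon_table_descriptor(orf))
```

    Because both descriptors are plain strings, redundant database entries for the same reading frame can be detected by string equality without aligning the sequences.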

  12. Explaining the harmonic sequence paradox.

    PubMed

    Schmidt, Ulrich; Zimper, Alexander

    2012-05-01

    According to the harmonic sequence paradox, an expected utility decision maker's willingness to pay for a gamble whose expected payoffs evolve according to the harmonic series is finite if and only if his marginal utility of additional income becomes zero for rather low payoff levels. Since the assumption of zero marginal utility is implausible for finite payoff levels, expected utility theory - as well as its standard generalizations such as cumulative prospect theory - are apparently unable to explain a finite willingness to pay. This paper presents first an experimental study of the harmonic sequence paradox. Additionally, it demonstrates that the theoretical argument of the harmonic sequence paradox only applies to time-patient decision makers, whereas the paradox is easily avoided if time-impatience is introduced.

  13. Prediction, sequences and the hippocampus

    PubMed Central

    Lisman, John; Redish, A.D.

    2009-01-01

    Recordings of rat hippocampal place cells have provided information about how the hippocampus retrieves memory sequences. One line of evidence has to do with phase precession, a process organized by theta and gamma oscillations. This precession can be interpreted as the cued prediction of the sequence of upcoming positions. In support of this interpretation, experiments in two-dimensional environments and on a cue-rich linear track demonstrate that many cells represent a position ahead of the animal and that this position is the same irrespective of which direction the rat is coming from. Other lines of investigation have demonstrated that such predictive processes also occur in the non-spatial domain and that retrieval can be internally or externally cued. The mechanism of sequence retrieval and the usefulness of this retrieval to guide behaviour are discussed. PMID:19528000

  14. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  15. High speed nucleic acid sequencing

    SciTech Connect

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  16. Fault trees and sequence dependencies

    NASA Technical Reports Server (NTRS)

    Dugan, Joanne Bechta; Boyd, Mark A.; Bavuso, Salvatore J.

    1990-01-01

    One of the frequently cited shortcomings of fault-tree models, their inability to model so-called sequence dependencies, is discussed. Several sources of such sequence dependencies are discussed, and new fault-tree gates to capture this behavior are defined. These complex behaviors can be included in present fault-tree models because they utilize a Markov solution. The utility of the new gates is demonstrated by presenting several models of the fault-tolerant parallel processor, which include both hot and cold spares.

  17. Hahn Sequence Space of Modals

    PubMed Central

    Balasubramanian, T.; Zion Chella Ruth, S.

    2014-01-01

    The history of modal intervals goes back to the very first publications on the topic of interval calculus. Modal interval analysis is used in computer graphics and computer-aided design (CAD), namely in the computation of narrow bounds on Bezier and B-spline curves. Since modal intervals are used in many fields, we introduce a new sequence space h(gI) called the Hahn sequence space of modal intervals. We give some new definitions and theorems. Some inclusion relations and topological properties of this space are investigated. The dual spaces of this space are also computed. PMID:27382628

  18. DNA Sequencing Using Capillary Electrophoresis

    SciTech Connect

    Dr. Barry Karger

    2011-05-09

    The overall goal of this program was to develop capillary electrophoresis as the tool to be used to sequence for the first time the Human Genome. Our program was part of the Human Genome Project. In this work, we were highly successful and the replaceable polymer we developed, linear polyacrylamide, was used by the DOE sequencing lab in California to sequence a significant portion of the human genome using the MegaBase multiple capillary array electrophoresis instrument. In this final report, we summarize our efforts and success. We began our work by using capillary electrophoresis to separate double-stranded oligonucleotides on cross-linked polyacrylamide gels in fused silica capillaries. This work showed the potential of the methodology. However, preparation of such cross-linked gel capillaries was difficult, with poor reproducibility, and even more important, the columns were not very stable. We improved stability by using non-cross-linked linear polyacrylamide. Here, the entangled linear chains could move when osmotic pressure (e.g. sample injection) was imposed on the polymer matrix. This relaxation of the polymer dissipated the stress in the column. Our next advance was to use significantly lower concentrations of the linear polyacrylamide so that the polymer could be automatically blown out after each run and replaced with fresh linear polymer solution. In this way, a new column was available for each analytical run. Finally, while testing many linear polymers, we selected linear polyacrylamide as the best matrix as it was the most hydrophilic polymer available. Under our DOE program, we demonstrated initially the success of the linear polyacrylamide in separating double-stranded DNA. We note that the method is used even today to assay purity of double-stranded DNA fragments. Our focus, of course, was on the separation of single-stranded DNA for sequencing purposes. In one paper, we demonstrated the success of our approach in sequencing up to 500 bases. Other

  19. Information of sequences and applications

    NASA Astrophysics Data System (ADS)

    Bonanno, Claudio; Galatolo, Stefano; Menconi, Giulia

    2002-03-01

    In this short note, we outline some results about complexity of orbits of a dynamical system, entropy and initial condition sensitivity in weakly chaotic dynamical systems. We present a technique to estimate orbit complexity by the use of data compression algorithms. We also outline how this technique has been applied by our research group to dynamical systems and to DNA sequences.
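
    The idea of estimating complexity with a data compression algorithm can be tried with any general-purpose compressor. The sketch below uses zlib from the Python standard library as a stand-in; the record does not say which compressor the authors used, and the function name and test strings are assumptions of this example.

```python
import random
import zlib


def compressed_bits_per_symbol(sequence):
    """Upper-bound estimate of information content: compressed size in bits per symbol."""
    data = sequence.encode("ascii")
    if not data:
        raise ValueError("sequence must not be empty")
    return 8 * len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    periodic = "ACGT" * 500                                      # highly regular
    shuffled = "".join(rng.choice("ACGT") for _ in range(2000))  # no structure
    print("periodic:", compressed_bits_per_symbol(periodic))
    print("shuffled:", compressed_bits_per_symbol(shuffled))
```

    A regular sequence compresses to far fewer bits per symbol than a structureless one of the same length, so the compressed size serves as a practical proxy for orbit or sequence complexity.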

  20. Why Visual Sequences Come First.

    ERIC Educational Resources Information Center

    Barley, Steven D.

    Visual sequences should be the first visual literacy exercises for reasons that are physio-psychological, semantic, and curricular. In infancy, vision is undifferentiated and undetailed. The number of details a child sees increases with age. Therefore, a series of pictures, rather than one photograph which tells a whole story, is more appropriate…

  1. Ideal statistically quasi Cauchy sequences

    NASA Astrophysics Data System (ADS)

    Savas, Ekrem; Cakalli, Huseyin

    2016-08-01

    An ideal I is a family of subsets of N, the set of positive integers, that is closed under taking finite unions and subsets of its elements. A sequence (x_k) of real numbers is said to be S(I)-statistically convergent to a real number L if, for each ε > 0 and each δ > 0, the set { n ∈ N : (1/n) |{ k ≤ n : |x_k - L| ≥ ε }| ≥ δ } belongs to I. We introduce S(I)-statistically ward compactness of a subset of R, the set of real numbers, and S(I)-statistically ward continuity of a real function, in the following senses: a subset E of R is S(I)-statistically ward compact if any sequence of points in E has an S(I)-statistically quasi-Cauchy subsequence, and a real function is S(I)-statistically ward continuous if it preserves S(I)-statistically quasi-Cauchy sequences, where a sequence (x_k) is said to be S(I)-statistically quasi-Cauchy when (Δx_k) is S(I)-statistically convergent to 0. We obtain results related to S(I)-statistically ward continuity, S(I)-statistically ward compactness, N_θ-ward continuity, and slowly oscillating continuity.

  2. Single-Cell Semiconductor Sequencing

    PubMed Central

    Kohn, Andrea B.; Moroz, Tatiana P.; Barnes, Jeffrey P.; Netherton, Mandy; Moroz, Leonid L.

    2014-01-01

    RNA-seq or transcriptome analysis of individual cells and small-cell populations is essential for virtually any biomedical field. It is especially critical for developmental, aging, and cancer biology as well as neuroscience, where the enormous heterogeneity of cells presents a significant methodological and conceptual challenge. Here we present two methods that allow for fast and cost-efficient transcriptome sequencing from ultra-small amounts of tissue or even from individual cells using semiconductor sequencing technology (Ion Torrent, Life Technologies). The first method is a reduced representation sequencing which maximizes capture of RNAs and preserves transcripts’ directionality. The second, a template-switch protocol, is designed for small mammalian neurons. Both protocols, from cell/tissue isolation to final sequence data, take up to 4 days. The efficiency of these protocols has been validated with single hippocampal neurons and various invertebrate tissues including individually identified neurons within a simpler memory-forming circuit of Aplysia californica and early (1-, 2-, 4-, 8-cells) embryonic and developmental stages from basal metazoans. PMID:23929110

  3. [Gene and gene sequence patenting].

    PubMed

    Bergel, S D

    1998-01-01

    According to the author, the patenting of elements isolated or copied from the human body boils down to the issue of genes and gene sequences. He describes the current situation from the comparative law standpoint (U.S. and Spanish law mainly) and then examines the biotechnology industry's position.

  4. Genome Sequence of Spizellomyces punctatus

    PubMed Central

    Russ, Carsten; Lang, B. Franz; Chen, Zehua; Gujja, Sharvari; Shea, Terrance; Zeng, Qiandong; Young, Sarah; Nusbaum, Chad

    2016-01-01

    Spizellomyces punctatus is a basally branching chytrid fungus that is found in the Chytridiomycota phylum. Spizellomyces species are common in soil and of importance in terrestrial ecosystems. Here, we report the genome sequence of S. punctatus, which will facilitate the study of this group of early diverging fungi. PMID:27540072

  5. Crop Sequence Calculator, v. 3

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Producers need to know how to sequence crops to develop sustainable dynamic cropping systems that take advantage of inherent internal resources, such as crop synergism, nutrient cycling, and soil water, and capitalize on external resources, such as weather, markets, and government programs. Version ...

  6. Fusicladium effusum draft genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The pecan scab fungus (Fusicladium effusum [G. Winter]) is an economically important pathogen of pecan (Carya illinoinensis [Wangenh.] K. Koch), on account of its impact on yield and quality of valuable nutmeats. We describe the first draft genome sequence of F. effusum, the characteristics of annot...

  7. Exome sequencing deciphers rare diseases.

    PubMed

    Maxmen, Amy

    2011-03-04

    Two years ago, NIH's Undiagnosed Diseases Program began delivering genomics to the clinic on an unprecedented scale. Now, with 128 exomes sequenced and 39 rare diseases diagnosed, the program's success is paving the way for widespread personal genomics while pioneering new techniques for reining in the "tsunami" of genomics data.

  8. Gene Discovery through Expressed Sequence Tag Sequencing in Trypanosoma cruzi

    PubMed Central

    Verdun, Ramiro E.; Di Paolo, Nelson; Urmenyi, Turan P.; Rondinelli, Edson; Frasch, Alberto C. C.; Sanchez, Daniel O.

    1998-01-01

    Analysis of expressed sequence tags (ESTs) constitutes a useful approach for gene identification that, in the case of human pathogens, might result in the identification of new targets for chemotherapy and vaccine development. As part of the Trypanosoma cruzi genome project, we have partially sequenced the 5′ ends of 1,949 clones to generate ESTs. The clones were randomly selected from a normalized CL Brener epimastigote cDNA library. A total of 14.6% of the clones were homologous to previously identified T. cruzi genes, while 18.4% had significant matches to genes from other organisms in the database. A total of 67% of the ESTs had no matches in the database, and thus, some of them might be T. cruzi-specific genes. Functional groups of those sequences with matches in the database were constructed according to their putative biological functions. The two largest categories were protein synthesis (23.3%) and cell surface molecules (10.8%). The information reported in this paper should be useful for researchers in the field to analyze genes and proteins of their own interest. PMID:9784549

  9. Program for Editing Spacecraft Command Sequences

    NASA Technical Reports Server (NTRS)

    Gladden, Roy; Waggoner, Bruce; Kordon, Mark; Hashemi, Mahnaz; Hanks, David; Salcedo, Jose

    2006-01-01

    Sequence Translator, Editor, and Expander Resource (STEER) is a computer program that facilitates construction of sequences and blocks of sequences (hereafter denoted generally as sequence products) for commanding a spacecraft. STEER also provides mechanisms for translating among various sequence product types and quickly expanding activities of a given sequence in chronological order for review and analysis of the sequence. To date, construction of sequence products has generally been done by use of such clumsy mechanisms as text-editor programs, translating among sequence product types has been challenging, and expanding sequences to time-ordered lists has involved arduous processes of converting sequence products to "real" sequences and running them through Class-A software (defined, loosely, as flight and ground software critical to a spacecraft mission). Also, heretofore, generating sequence products in standard formats has been troublesome because precise formatting and syntax are required. STEER alleviates these issues by providing a graphical user interface containing intuitive fields in which the user can enter the necessary information. The STEER expansion function provides a "quick and dirty" means of seeing how a sequence and sequence block would expand into a chronological list, without the need to use Class-A software.

  10. Cassini Mission Sequence Subsystem (MSS)

    NASA Technical Reports Server (NTRS)

    Alland, Robert

    2011-01-01

    This paper describes my work with the Cassini Mission Sequence Subsystem (MSS) team during the summer of 2011. It gives some background on the motivation for this project and describes the expected benefit to the Cassini program. It then introduces the two tasks that I worked on - an automatic system auditing tool and a series of corrections to the Cassini Sequence Generator (SEQ_GEN) - and the specific objectives these tasks were to accomplish. Next, it details the approach I took to meet these objectives and the results of this approach, followed by a discussion of how the outcome of the project compares with my initial expectations. The paper concludes with a summary of my experience working on this project, lists what the next steps are, and acknowledges the help of my Cassini colleagues.

  11. Triple helix purification and sequencing

    DOEpatents

    Wang, Renfeng; Smith, Lloyd M.; Tong, Xinchun E.

    1995-01-01

    Disclosed herein are methods, kits, and equipment for purifying single stranded circular DNA and then using the DNA for DNA sequencing purposes. Templates are provided with an insert having a hybridization region. An elongated oligonucleotide has two regions that are complementary to the insert and the oligo is bound to a magnetic anchor. The oligo hybridizes to the insert on two sides to form a stable triple helix complex. The anchor can then be used to drag the template out of solution using a magnet. The system can purify sequencing templates, and if desired the triple helix complex can be opened up to a double helix so that the oligonucleotide will act as a primer for further DNA synthesis.

  12. Triple helix purification and sequencing

    DOEpatents

    Wang, R.; Smith, L.M.; Tong, X.E.

    1995-03-28

    Disclosed herein are methods, kits, and equipment for purifying single stranded circular DNA and then using the DNA for DNA sequencing purposes. Templates are provided with an insert having a hybridization region. An elongated oligonucleotide has two regions that are complementary to the insert and the oligo is bound to a magnetic anchor. The oligo hybridizes to the insert on two sides to form a stable triple helix complex. The anchor can then be used to drag the template out of solution using a magnet. The system can purify sequencing templates, and if desired the triple helix complex can be opened up to a double helix so that the oligonucleotide will act as a primer for further DNA synthesis. 4 figures.

  13. Extrapolation methods for vector sequences

    NASA Technical Reports Server (NTRS)

    Smith, David A.; Ford, William F.; Sidi, Avram

    1987-01-01

    This paper derives, describes, and compares five extrapolation methods for accelerating convergence of vector sequences or transforming divergent vector sequences to convergent ones. These methods are the scalar epsilon algorithm (SEA), vector epsilon algorithm (VEA), topological epsilon algorithm (TEA), minimal polynomial extrapolation (MPE), and reduced rank extrapolation (RRE). MPE and RRE are first derived and proven to give the exact solution for the right 'essential degree' k. Then, Brezinski's (1975) generalization of the Shanks-Schmidt transform is presented; the generalized form leads from systems of equations to TEA. The necessary connections are then made with SEA and VEA. The algorithms are extended to the nonlinear case by cycling, the error analysis for MPE and VEA is sketched, and the theoretical support for quadratic convergence is discussed. Strategies for practical implementation of the methods are considered.
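
    To make the idea of convergence acceleration concrete, the sketch below applies Aitken's delta-squared process, which is the first-order scalar case of the Shanks transform. It is not one of the five vector methods compared in the paper; it is swapped in here only because it fits in a few lines. The function names and the test series are illustrative assumptions.

```python
def aitken_delta_squared(s):
    """Apply one Aitken delta-squared step to a list of scalar iterates."""
    accelerated = []
    for a, b, c in zip(s, s[1:], s[2:]):
        second_diff = (c - b) - (b - a)
        if second_diff == 0:
            # Consecutive differences are equal; keep the latest iterate.
            accelerated.append(c)
        else:
            accelerated.append(c - (c - b) ** 2 / second_diff)
    return accelerated


def partial_sums(terms):
    """Return the running sums of an iterable of numbers."""
    total, out = 0.0, []
    for t in terms:
        total += t
        out.append(total)
    return out


# Partial sums of the Leibniz series 4 * (1 - 1/3 + 1/5 - ...), which
# converge slowly to pi.
leibniz = partial_sums(4.0 * (-1) ** k / (2 * k + 1) for k in range(10))
print(leibniz[-1], aitken_delta_squared(leibniz)[-1])
```

    For a sequence whose error is exactly geometric, a single step recovers the limit, which loosely parallels the abstract's statement that MPE and RRE give the exact solution for the right "essential degree" k.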

  14. Apparatus for improved DNA sequencing

    DOEpatents

    Douthart, Richard J.; Crowell, Shannon L.

    1996-01-01

    This invention is a means for the rapid sequencing of DNA samples. More specifically, it consists of a new design direct blotting electrophoresis unit. The DNA sequence is deposited on a membrane attached to a rotating drum. Initial data compaction is facilitated by the use of a machined multi-channeled plate called a ribbon channel plate. Each channel is an isolated mini gel system much like a gel filled capillary. The system as a whole, however, is in a slab gel like format with the advantages of uniformity and easy reusability. The system can be used in different embodiments. The drum system is unique in that after deposition the drum rotates the deposited DNA into a large non-buffer open space where processing and detection can occur. The drum can also be removed in toto to special workstations for downstream processing, multiplexing and detection.

  15. Apparatus for improved DNA sequencing

    DOEpatents

    Douthart, R.J.; Crowell, S.L.

    1996-05-07

    This invention is a means for the rapid sequencing of DNA samples. More specifically, it consists of a new design direct blotting electrophoresis unit. The DNA sequence is deposited on a membrane attached to a rotating drum. Initial data compaction is facilitated by the use of a machined multi-channeled plate called a ribbon channel plate. Each channel is an isolated mini gel system much like a gel filled capillary. The system as a whole, however, is in a slab gel like format with the advantages of uniformity and easy reusability. The system can be used in different embodiments. The drum system is unique in that after deposition the drum rotates the deposited DNA into a large non-buffer open space where processing and detection can occur. The drum can also be removed in toto to special workstations for downstream processing, multiplexing and detection. 18 figs.

  16. Channel plate for DNA sequencing

    DOEpatents

    Douthart, Richard J.; Crowell, Shannon L.

    1998-01-01

    This invention is a channel plate that facilitates data compaction in DNA sequencing. The channel plate has a length, a width and a thickness, and further has a plurality of channels that are parallel. Each channel has a depth partially through the thickness of the channel plate. Additionally an interface edge permits electrical communication across an interface through a buffer to a deposition membrane surface.

  17. Genome Sequence of Mycobacteriophage Momo.

    PubMed

    Pope, Welkin H; Bina, Elizabeth A; Brahme, Indraneel S; Hill, Amy B; Himmelstein, Philip H; Hunsicker, Sara M; Ish, Amanda R; Le, Tinh S; Martin, Mary M; Moscinski, Catherine N; Shetty, Sameer A; Swierzewski, Tomasz; Iyengar, Varun B; Kim, Hannah; Schafer, Claire E; Grubb, Sarah R; Warner, Marcie H; Bowman, Charles A; Russell, Daniel A; Hatfull, Graham F

    2015-06-18

    Momo is a newly discovered phage of Mycobacterium smegmatis mc(2)155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages.

  18. Genome Sequence of Mycobacteriophage Momo

    PubMed Central

    Bina, Elizabeth A.; Brahme, Indraneel S.; Hill, Amy B.; Himmelstein, Philip H.; Hunsicker, Sara M.; Ish, Amanda R.; Le, Tinh S.; Martin, Mary M.; Moscinski, Catherine N.; Shetty, Sameer A.; Swierzewski, Tomasz; Iyengar, Varun B.; Kim, Hannah; Schafer, Claire E.; Grubb, Sarah R.; Warner, Marcie H.; Bowman, Charles A.; Russell, Daniel A.; Hatfull, Graham F.

    2015-01-01

    Momo is a newly discovered phage of Mycobacterium smegmatis mc²155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages. PMID:26089415

  19. Channel plate for DNA sequencing

    DOEpatents

    Douthart, R.J.; Crowell, S.L.

    1998-01-13

    This invention is a channel plate that facilitates data compaction in DNA sequencing. The channel plate has a length, a width and a thickness, and further has a plurality of channels that are parallel. Each channel has a depth partially through the thickness of the channel plate. Additionally an interface edge permits electrical communication across an interface through a buffer to a deposition membrane surface. 15 figs.

  20. The Transvaal Sequence: an overview

    NASA Astrophysics Data System (ADS)

    Eriksson, P. G.; Schweitzer, J. K.; Bosch, P. J. A.; Schereiber, U. M.; Van Deventer, J. L.; Hatton, C. J.

    1993-02-01

    The 15 000 m of relatively unmetamorphosed clastic and chemical sedimentary and volcanic rocks of the 2550-2050 Ma Transvaal Sequence as preserved within the Transvaal and correlated Griqualand West basins of South Africa, and in the Kanye basin of Botswana are described. Immature clastic sedimentary and largely andesitic volcanic rocks of the Wolkberg, Godwan and Buffelsfontein Groups and the Bloempoort and Wachteenbeetje Formations probably represent rift-related sequences of Ventersdorp age. The thin sandstones of the Black Reef Formation, developed at the base of both the Kanye and Transvaal basin successions and correlated with the basal Vryburg siltstones of the Griqualand West Sequence, are considered here to be the basal unit of the Transvaal Sequence. The Black Reef fluvial deposits grade up into the epeiric marine carbonates of the Malmani Subgroup. These stromatolitic dolomites and interbedded cherts were laid down within a steepened carbonate ramp setting; transgressions from an initial Griqualand West compartment towards the northeast covered both the Kanye and Transvaal basins. Iron formations of the succeeding Penge Formation and Griqualand West correlates are envisaged as relatively shallow water shelf deposits within the carbonate platform model; siliceous breccias of the Kanye basin are interpreted as reflecting subaerial brecciation of exposed silica gels. The Duitschland Formation overlying the Penge iron formations is seen as a final, regressive clastic and chemical sedimentary deposit laid down as the Malmani-Penge sea retreated from the Transvaal basin. The interbedded sandstones and mudstones of the unconformity-bounded Pretoria Group probably represent a combination of alluvial fan and fluviodeltaic complexes debouching into the largely lacustrine Transvaal and Kanye basins. A strong glacial influence in the lower Pretoria Group is reflected in the correlated Makganyene diamictites of the Griqualand West Sequence. Sedimentation across all three

  1. RNA sequence analysis using covariance models.

    PubMed Central

    Eddy, S R; Durbin, R

    1994-01-01

    We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family. We call these models 'covariance models'. A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases. A model can be built automatically from an existing sequence alignment. We also describe an algorithm for learning a model and hence a consensus secondary structure from initially unaligned example sequences and no prior structural information. Models trained on unaligned tRNA examples correctly predict tRNA secondary structure and produce high-quality multiple alignments. The approach may be applied to any family of small RNA sequences. PMID:8029015

  2. Personal genome sequencing: current approaches and challenges

    PubMed Central

    Snyder, Michael; Du, Jiang; Gerstein, Mark

    2010-01-01

    The revolution in DNA sequencing technologies has now made it feasible to determine the genome sequences of many individuals; i.e., “personal genomes.” Genome sequences of cells and tissues from both normal and disease states have been determined. Using current approaches, whole human genome sequences are not typically assembled and determined de novo, but, instead, variations relative to a reference sequence are identified. We discuss the current state of personal genome sequencing, the main steps involved in determining a genome sequence (i.e., identifying single-nucleotide polymorphisms [SNPs] and structural variations [SVs], assembling new sequences, and phasing haplotypes), and the challenges and performance metrics for evaluating the accuracy of the reconstruction. Finally, we consider the possible individual and societal benefits of personal genome sequences. PMID:20194435

  3. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, Thomas G.; Chang, William I-Wei

    1997-01-01

    A method and apparatus are described for comparing biological sequences from a known source of sequences with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provides an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence.

  4. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provides an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.
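
    The block-division step can be pictured with the sketch below. The abstract does not give block sizes, so the choice here (blocks of length 2m that start every m symbols, where m is a hypothetical minimum alignment length) is an assumption. It was picked because it guarantees that every window of m consecutive symbols falls entirely inside at least one block. The function name is likewise illustrative.

```python
def overlapping_blocks(query, min_len):
    """Split query into blocks of length 2 * min_len that start every min_len
    symbols, so each window of min_len symbols lies inside at least one block."""
    if len(query) <= min_len:
        return [(0, query)]
    starts = range(0, len(query) - min_len + 1, min_len)
    return [(s, query[s:s + 2 * min_len]) for s in starts]


# Print each block with its start offset in the query.
for start, block in overlapping_blocks("ABCDEFGHIJ", 3):
    print(start, block)
```

    The final block may be shorter than 2m when the query length is not a multiple of m.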

  5. Memory and learning with rapid audiovisual sequences

    PubMed Central

    Keller, Arielle S.; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193

  6. Sequencing Voyager II for the Uranus encounter

    NASA Technical Reports Server (NTRS)

    Morris, R. B.

    1986-01-01

    The process of developing the programmed sequence of events necessary for the Voyager 2 spacecraft to return desired data from its Uranus encounter is discussed. The major steps in the sequence process are reviewed, and the elements of the Mission Sequence Software are described. The design phase and the implementation phase of the sequence process are discussed, and the Computer Command Subsystem architecture is examined in detail. The software's role in constructing the sequences and converting them into onboard programs is elucidated, and the problems unique to the Uranus encounter sequences are considered.

  7. Biomolecule Sequencer: Nanopore Sequencing Technology for In-Situ Environmental Monitoring and Astrobiology

    NASA Astrophysics Data System (ADS)

    John, K. K.; Botkin, D. J.; Burton, A. S.; Castro-Wallace, S. L.; Chaput, J. D.; Dworkin, J. P.; Lupisella, M. L.; Mason, C. E.; Rubins, K. H.; Smith, D. J.; Stahl, S.; Switzer, C.

    2016-10-01

    Biomolecule Sequencer will demonstrate, for the first time, that DNA sequencing is feasible as a tool for in-situ environmental monitoring and astrobiology. A space-based sequencer could identify microbes, diseases, and help detect DNA-based life.

  8. DNA sequencing: bench to bedside and beyond

    PubMed Central

    Hutchison, Clyde A.

    2007-01-01

    Fifteen years elapsed between the discovery of the double helix (1953) and the first DNA sequencing (1968). Modern DNA sequencing began in 1977, with development of the chemical method of Maxam and Gilbert and the dideoxy method of Sanger, Nicklen and Coulson, and with the first complete DNA sequence (phage ϕX174), which demonstrated that sequence could give profound insights into genetic organization. Incremental improvements allowed sequencing of molecules >200 kb (human cytomegalovirus) leading to an avalanche of data that demanded computational analysis and spawned the field of bioinformatics. The US Human Genome Project spurred sequencing activity. By 1992 the first ‘sequencing factory’ was established, and others soon followed. The first complete cellular genome sequences, from bacteria, appeared in 1995 and other eubacterial, archaebacterial and eukaryotic genomes were soon sequenced. Competition between the public Human Genome Project and Celera Genomics produced working drafts of the human genome sequence, published in 2001, but refinement and analysis of the human genome sequence will continue for the foreseeable future. New ‘massively parallel’ sequencing methods are greatly increasing sequencing capacity, but further innovations are needed to achieve the ‘thousand dollar genome’ that many feel is prerequisite to personalized genomic medicine. These advances will also allow new approaches to a variety of problems in biology, evolution and the environment. PMID:17855400

  9. Integration of retinal image sequences

    NASA Astrophysics Data System (ADS)

    Ballerini, Lucia

    1998-10-01

    In this paper a method for noise reduction in ocular fundus image sequences is described. The eye is the only part of the human body where the capillary network can be observed along with the arterial and venous circulation using a noninvasive technique. The study of the retinal vessels is very important both for the study of the local pathology (retinal disease) and for the large amount of information it offers on systemic haemodynamics, such as hypertension, arteriosclerosis, and diabetes. In this paper a method for image integration of ocular fundus image sequences is described. The procedure can be divided into two steps: registration and fusion. First we describe an automatic alignment algorithm for registration of ocular fundus images. In order to enhance vessel structures, we used a spatially oriented bank of filters designed to match the properties of the objects of interest. To evaluate interframe misalignment we adopted a fast cross-correlation algorithm. The performance of the alignment method has been estimated by simulating shifts between image pairs and by using a cross-validation approach. Then we propose a temporal integration technique for image sequences so as to compute enhanced pictures of the overall capillary network. Image registration is combined with image enhancement by fusing subsequent frames of the same region. To evaluate the attainable results, the signal-to-noise ratio was estimated before and after integration. Experimental results on synthetic images of vessel-like structures with different kinds of additive Gaussian noise as well as on real fundus images are reported.
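
    The two steps, registration and fusion, can be sketched in one dimension as follows. This is a brute-force illustration under simplifying assumptions (integer circular shifts, a 1-D signal standing in for an image row) and is not the paper's fast cross-correlation algorithm or filter bank. All names are hypothetical.

```python
def estimate_shift(reference, frame, max_shift):
    """Return the circular shift s in [-max_shift, max_shift] that maximizes
    the cross-correlation sum(reference[i] * frame[(i + s) % n])."""
    n = len(reference)

    def correlation(s):
        return sum(reference[i] * frame[(i + s) % n] for i in range(n))

    return max(range(-max_shift, max_shift + 1), key=correlation)


def align(frame, shift):
    """Undo a circular shift so the frame lines up with the reference."""
    n = len(frame)
    return [frame[(i + shift) % n] for i in range(n)]


def fuse(frames):
    """Average already-aligned frames sample by sample."""
    return [sum(samples) / len(frames) for samples in zip(*frames)]


reference = [0, 0, 1, 3, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 3, 1, 0]  # the reference moved two samples to the right
s = estimate_shift(reference, shifted, max_shift=3)
print(s, fuse([reference, align(shifted, s)]))
```

    When the noise in each aligned frame is independent, zero-mean, and of equal variance, averaging N frames lowers the noise standard deviation by a factor of √N. This is the usual motivation for comparing the signal-to-noise ratio before and after integration.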

  10. Proline-rich Sequence Recognition

    PubMed Central

    Schlundt, Andreas; Sticht, Jana; Piotukh, Kirill; Kosslick, Daniela; Jahnke, Nadin; Keller, Sandro; Schuemann, Michael; Krause, Eberhard; Freund, Christian

    2009-01-01

    The tumor maintenance protein Tsg101 has recently gained much attention because of its involvement in endosomal sorting, virus release, cytokinesis, and cancerogenesis. The ubiquitin-E2-like variant (UEV) domain of the protein interacts with proline-rich sequences of target proteins that contain P(S/T)AP amino acid motifs and weakly binds to the ubiquitin moiety of proteins committed to sorting or degradation. Here we performed peptide spot analysis and phage display to refine the peptide binding specificity of the Tsg101 UEV domain. A mass spectrometric proteomics approach that combines domain-based pulldown experiments, binding site inactivation, and stable isotope labeling by amino acids in cell culture (SILAC) was then used to delineate the relative importance of the peptide and ubiquitin binding sites. Clearly, “PTAP” interactions dominate target recognition, and we identified several novel binders, for example the poly(A)-binding protein 1 (PABP1), Sec24b, NFκB2, and eIF4b. For PABP1 and eIF4b the interactions were confirmed in the context of the corresponding full-length proteins in cellular lysates. Therefore, our results strongly suggest additional roles of Tsg101 in cellular regulation of mRNA translation. Regulation of Tsg101 itself by the ubiquitin ligase TAL (Tsg101-associated ligase) is most likely conferred by a single PSAP binding motif that enables the interaction with Tsg101 UEV. Together with the results from the accompanying article (Kofler, M., Schuemann, M., Merz, C., Kosslick, D., Schlundt, A., Tannert, A., Schaefer, M., Lührmann, R., Krause, E., and Freund, C. (2009) Proline-rich sequence recognition: I. Marking GYF and WW domain assembly sites in early spliceosomal complexes. Mol. Cell. Proteomics 8, 2461–2473) on GYF and WW domain pathways, our work defines major proline-rich sequence-mediated interaction networks that contribute to the modular assembly of physiologically relevant protein complexes. PMID:19542561

  11. Apollo: a sequence annotation editor.

    PubMed

    Lewis, S E; Searle, S M J; Harris, N; Gibson, M; Lyer, V; Richter, J; Wiel, C; Bayraktaroglu, L; Birney, E; Crosby, M A; Kaminker, J S; Matthews, B B; Prochnik, S E; Smithy, C D; Tupy, J L; Rubin, G M; Misra, S; Mungall, C J; Clamp, M E

    2002-01-01

    The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool to allow biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects.

  12. "First generation" automated DNA sequencing technology.

    PubMed

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines.

  13. The Art of Gymnastics: Creating Sequences.

    ERIC Educational Resources Information Center

    Rovegno, Inez

    1988-01-01

    Offering students opportunities for creating movement sequences in gymnastics allows them to understand the essence of gymnastics, have creative experiences, and learn about themselves. The process of creating sequences is described. (MT)

  14. Genome Sequences of Eight Morphologically Diverse Alphaproteobacteria

    PubMed Central

    Brown, Pamela J. B.; Kysela, David T.; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V.

    2011-01-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium. PMID:21705585

  15. Genome sequences of eight morphologically diverse Alphaproteobacteria.

    PubMed

    Brown, Pamela J B; Kysela, David T; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V

    2011-09-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium.

  16. FOGSAA: Fast Optimal Global Sequence Alignment Algorithm

    NASA Astrophysics Data System (ADS)

    Chakraborty, Angana; Bandyopadhyay, Sanghamitra

    2013-04-01

    In this article we propose a Fast Optimal Global Sequence Alignment Algorithm, FOGSAA, which aligns a pair of nucleotide/protein sequences faster than any optimal global alignment method including the widely used Needleman-Wunsch (NW) algorithm. FOGSAA is applicable for all types of sequences, with any scoring scheme, and with or without affine gap penalty. Compared to NW, FOGSAA achieves a time gain of (70-90)% for highly similar nucleotide sequences (> 80% similarity), and (54-70)% for sequences having (30-80)% similarity. For other sequences, it terminates with an approximate score. For protein sequences, the average time gain is between (25-40)%. Compared to three heuristic global alignment methods, the quality of alignment is improved by about 23%-53%. FOGSAA is, in general, suitable for aligning any two sequences defined over a finite alphabet set, where the quality of the global alignment is of supreme importance.
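
    For orientation, the Needleman-Wunsch baseline that FOGSAA is compared against can be sketched as below. This is not FOGSAA itself; it is a minimal dynamic-programming sketch with a linear (non-affine) gap penalty, and the function name `nw_score` and the scoring values are assumptions chosen for illustration.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b (linear gap penalty).

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]
```

    The table fill costs time proportional to the product of the two sequence lengths, which is the cost that faster optimal methods aim to reduce.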

  17. Entropy analysis of substitutive sequences revisited

    NASA Astrophysics Data System (ADS)

    Karamanos, K.

    2001-11-01

    A given finite sequence of letters over a finite alphabet can always be algorithmically generated, in particular by a Turing machine. This fact is at the heart of complexity theory in the sense of Kolmogorov and Chaitin. A relevant question in this context is whether, given a statistically 'sufficiently long' sequence, there exists a deterministic finite automaton that generates it. In this paper we propose a simple criterion, based on measuring block entropies by lumping, which is satisfied by all automatic sequences. On the basis of this, one can determine that a given sequence is not automatic and obtain interesting information when the sequence is automatic. Following previous work on the Feigenbaum sequence, we give a necessary entropy-based condition valid for all automatic sequences read by lumping. Applications of these ideas to representative examples are discussed. In particular, we establish new entropic decimation schemes for the Thue-Morse, the Rudin-Shapiro and the paperfolding sequences read by lumping.
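
    The idea of measuring block entropies "by lumping" (reading the sequence in consecutive, non-overlapping blocks) can be illustrated with a small sketch. The function names are hypothetical, and the code only computes the Shannon entropy of lumped blocks for a Thue-Morse prefix; it does not reproduce the paper's criterion.

```python
from collections import Counter
from math import log2

def thue_morse(n):
    """First n terms: term i is the parity of the number of 1 bits in i."""
    return [bin(i).count("1") % 2 for i in range(n)]

def block_entropy_lumping(seq, block_len):
    """Shannon entropy (in bits) of non-overlapping blocks of length block_len."""
    n_blocks = len(seq) // block_len
    blocks = [tuple(seq[k * block_len:(k + 1) * block_len])
              for k in range(n_blocks)]
    counts = Counter(blocks)
    return -sum((c / n_blocks) * log2(c / n_blocks) for c in counts.values())
```

    For the first 1024 Thue-Morse terms, blocks of length 2 and of length 4 read by lumping each take only two distinct values with equal frequency, so the entropy is 1 bit in both cases.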

  18. Block variables for deterministic aperiodic sequences

    NASA Astrophysics Data System (ADS)

    Hörnquist, Michael

    1997-10-01

    We use the concept of block variables to obtain a measure of order/disorder for some one-dimensional deterministic aperiodic sequences. For the Thue-Morse sequence, the Rudin-Shapiro sequence and the period-doubling sequence it is possible to obtain analytical expressions in the limit of infinite sequences. For the Fibonacci sequence, we present some analytical results which can be supported by numerical arguments. It turns out that the block variables show a wide range of different behaviour, some of them indicating that some of the considered sequences are more 'random' than others. However, the method does not give any definite answer to the question of which sequence is more disordered than the other and, in this sense, the results obtained are negative. We compare this with some other ways of measuring the amount of order/disorder in such systems, and there seems to be no direct correspondence between the measures.

  19. The Genome Sequencing Center at NCGR

    SciTech Connect

    Schilkey, Faye

    2010-06-02

    Faye Schilkey from the National Center for Genome Resources discusses NCGR's research, sequencing and analysis experience on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  20. An Assignment Sequence for Underprepared Writers.

    ERIC Educational Resources Information Center

    Nimmo, Kristi

    2000-01-01

    Presents a sequenced writing assignment on shopping to aid basic writers. Describes a writing assignment focused around online and mail-order shopping. Notes steps in preparing for the assignment, the sequence, and discusses responses to the assignments. (SC)

  1. Data structures for DNA sequence manipulation.

    PubMed Central

    Lawrence, C B

    1986-01-01

    Two data structures designated Fragment and Construct are described. The Fragment data structure defines a continuous nucleic acid sequence from a unique genetic origin. The Construct defines a continuous sequence composed of sequences from multiple genetic origins. These data structures are manipulated by a set of software tools to simulate the construction of mosaic recombinant DNA molecules. They are also used as an interface between sequence data banks and analytical programs. PMID:3753765
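
    The distinction between the two structures can be sketched as follows. This is only an illustration of the description above; the field names (`origin`, `sequence`, `parts`) and the `ligate` method are assumptions, not the paper's actual design.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Fragment:
    """A continuous nucleic acid sequence from a single genetic origin."""
    origin: str    # hypothetical label for the source, e.g. a plasmid name
    sequence: str

@dataclass
class Construct:
    """A continuous sequence composed of fragments from multiple origins."""
    parts: list = field(default_factory=list)

    def ligate(self, fragment):
        """Append a fragment to the end of the construct; returns self for chaining."""
        self.parts.append(fragment)
        return self

    @property
    def sequence(self):
        """The full mosaic sequence, in fragment order."""
        return "".join(p.sequence for p in self.parts)

    @property
    def origins(self):
        """The genetic origin of each part, in order."""
        return [p.origin for p in self.parts]
```

    Keeping the origin on each part means a mosaic molecule can still be traced back to its sources after assembly.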

  2. Movement sequencing in Huntington disease

    PubMed Central

    GEORGIOU-KARISTIANIS, NELLIE; LONG, JEFFREY D.; LOURENS, SPENCER G.; STOUT, JULIE C.; MILLS, JAMES A.; PAULSEN, JANE S.

    2015-01-01

    Objectives: To examine longitudinal changes in movement sequencing in prodromal Huntington’s disease (HD) participants (795 prodromal HD; 225 controls) from the PREDICT-HD study. Methods: Prodromal HD participants were tested over seven annual visits and were stratified into three groups (low, medium, high) based on their CAG-Age Product (CAP) score, which indicates likely increasing proximity to diagnosis. A cued movement sequence task assessed the impact of advance cueing on response initiation and execution via three levels of advance information. Results: Compared to controls, all CAP groups showed longer initiation and movement times across all conditions at baseline, demonstrating a disease gradient for the majority of outcomes. Across all conditions, the high CAP group had the highest mean for baseline testing, but also demonstrated an increase in movement time across the study. For initiation time, the high CAP group showed the highest mean baseline time across all conditions, but also faster decreasing rates of change over time. Conclusions: With progress to diagnosis, participants may increasingly use compensatory strategies, as evidenced by faster initiation. However, this occurred in conjunction with slowed execution times, suggesting a decline in effectively accessing control processes required to translate movement into effective execution. PMID:24678867

  3. Automated Sequence Generation Process and Software

    NASA Technical Reports Server (NTRS)

    Gladden, Roy

    2007-01-01

    "Automated sequence generation" (autogen) signifies both a process and software used to automatically generate sequences of commands to operate various spacecraft. The autogen software comprises the autogen script plus the Activity Plan Generator (APGEN) program. APGEN can be used for planning missions and command sequences.

  4. The recurrence sequence via the Fibonacci groups

    NASA Astrophysics Data System (ADS)

    Aküzüm, Yeşim; Deveci, Ömür

    2016-04-01

    This work develops properties of the recurrence sequence defined with the aid of the relation matrix of the Fibonacci groups. The study of this sequence modulo m yields cyclic groups and semigroups from the generating matrix. Finally, we extend the sequence to groups and then obtain its period in the Fibonacci groups.

  5. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels.

    PubMed

    Faircloth, Brant C; Glenn, Travis C

    2012-01-01

    Ligating adapters with unique synthetic oligonucleotide sequences (sequence tags) onto individual DNA samples before massively parallel sequencing is a popular and efficient way to obtain sequence data from many individual samples. Tag sequences should be numerous and sufficiently different to ensure sequencing, replication, and oligonucleotide synthesis errors do not cause tags to be unrecoverable or confused. However, many design approaches only protect against substitution errors during sequencing and extant tag sets contain too few tag sequences. We developed an open-source software package to validate sequence tags for conformance to two distance metrics and design sequence tags robust to indel and substitution errors. We use this software package to evaluate several commercial and non-commercial sequence tag sets, design several large sets (max(count) = 7,198) of edit metric sequence tags having different lengths and degrees of error correction, and integrate a subset of these edit metric tags to polymerase chain reaction (PCR) primers and sequencing adapters. We validate a subset of these edit metric tagged PCR primers and sequencing adapters by sequencing on several platforms and subsequent comparison to commercially available alternatives. We find that several commonly used sets of sequence tags or design methodologies used to produce sequence tags do not meet the minimum expectations of their underlying distance metric, and we find that PCR primers and sequencing adapters incorporating edit metric sequence tags designed by our software package perform as well as their commercial counterparts. We suggest that researchers evaluate sequence tags prior to use or evaluate tags that they have been using. The sequence tag sets we design improve on extant sets because they are large, valid across the set, and robust to the suite of substitution, insertion, and deletion errors affecting massively parallel sequencing workflows on all currently used platforms.
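
    The point about indels can be made concrete with a small sketch that checks a tag set against the Levenshtein (edit) distance. This is not the authors' software package; the function names and the example tags are hypothetical.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance: minimum number of substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]

def min_pairwise_distance(tags):
    """Smallest edit distance between any two tags in the set."""
    return min(edit_distance(a, b) for a, b in combinations(tags, 2))
```

    For example, the tags ACGT and CGTA differ at all four positions, yet `edit_distance("ACGT", "CGTA")` is 2 (one deletion plus one insertion), so a set validated only against substitutions can overstate how well separated its tags are.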

  6. Exploration of sequence space for protein engineering.

    PubMed

    Gustafsson, C; Govindarajan, S; Emig, R

    2001-01-01

    The process of protein engineering is currently evolving towards a heuristic understanding of the sequence-function relationship. Improved DNA sequencing capacity, efficient protein function characterization and improved quality of data points in conjunction with well-established statistical tools from other industries are changing the protein engineering field. Algorithms capturing the heuristic sequence-function relationships will have a drastic impact on the field of protein engineering. In this review, several alternative approaches to quantitatively assess sequence space are discussed and the relatively few examples of wet-lab validation of statistical sequence-function characterization/correlation are described.

  7. An efficient method for multiple sequence alignment

    SciTech Connect

    Kim, J.; Pramanik, S.

    1994-12-31

    Multiple sequence alignment has been a useful method in the study of molecular evolution and sequence-structure relationships. This paper presents a new method for multiple sequence alignment based on the simulated annealing technique. Dynamic programming has been widely used to find an optimal alignment. However, it has several limitations: it requires long computation time and cannot be applied with certain types of cost functions. We describe the detailed mechanisms of simulated annealing for the multiple sequence alignment problem. It is shown that simulated annealing can be an effective approach to overcome the limitations of dynamic programming in the multiple sequence alignment problem.

  8. Integer sequence discovery from small graphs

    PubMed Central

    Hoppe, Travis; Petrone, Anna

    2015-01-01

    We have exhaustively enumerated all simple, connected graphs of a finite order and have computed a selection of invariants over this set. Integer sequences were constructed from these invariants and checked against the Online Encyclopedia of Integer Sequences (OEIS). 141 new sequences were added and six sequences were extended. From the graph database, we were able to programmatically suggest relationships among the invariants. It will be shown that we can readily visualize any sequence of graphs meeting given criteria. The code has been released as an open-source framework for further analysis and the database was constructed to be extensible to invariants not considered in this work. PMID:27034526

  9. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  10. Replacement Sequence of Events Generator

    NASA Technical Reports Server (NTRS)

    Fisher, Forest; Gladden, Daniel Wenkert Roy; Khanampompan, Teerpat

    2008-01-01

    The soeWINDOW program automates the generation of an ITAR (International Traffic in Arms Regulations)-compliant sub-RSOE (Replacement Sequence of Events) by extracting a specified temporal window from an RSOE while maintaining page header information. RSOEs contain a significant amount of information that is not ITAR-compliant, yet that foreign partners need to see for command details to their instrument, as well as the surrounding commands that provide context for validation. soeWINDOW can serve as an example of how command support products can be made ITAR-compliant for future missions. This software is a Perl script intended for use in the mission operations UNIX environment. It is designed for use to support the MRO (Mars Reconnaissance Orbiter) instrument team. The tool also provides automated DOM (Distributed Object Manager) storage into the special ITAR-okay DOM collection, and can be used for creating focused RSOEs for product review by any of the MRO teams.

  11. The 2016 Kumamoto earthquake sequence

    PubMed Central

    KATO, Aitaro; NAKAMURA, Kouji; HIYAMA, Yohei

    2016-01-01

    Beginning in April 2016, a series of shallow, moderate to large earthquakes with associated strong aftershocks struck the Kumamoto area of Kyushu, SW Japan. An Mj 7.3 mainshock occurred on 16 April 2016, close to the epicenter of an Mj 6.5 foreshock that occurred about 28 hours earlier. The intense seismicity released the accumulated elastic energy by right-lateral strike slip, mainly along two known, active faults. The mainshock rupture propagated along multiple fault segments with different geometries. The faulting style is reasonably consistent with regional deformation observed on geologic timescales and with the stress field estimated from seismic observations. One striking feature of this sequence is intense seismic activity, including a dynamically triggered earthquake in the Oita region. Following the mainshock rupture, postseismic deformation has been observed, as well as expansion of the seismicity front toward the southwest and northwest. PMID:27725474

  12. Particle sizer and DNA sequencer

    DOEpatents

    Olivares, Jose A.; Stark, Peter C.

    2005-09-13

    An electrophoretic device separates and detects particles such as DNA fragments, proteins, and the like. The device has a capillary which is coated with a coating with a low refractive index such as Teflon® AF. A sample of particles is fluorescently labeled and injected into the capillary. The capillary is filled with an electrolyte buffer solution. An electrical field is applied across the capillary causing the particles to migrate from a first end of the capillary to a second end of the capillary. A detector light beam is then scanned along the length of the capillary to detect the location of the separated particles. The device is amenable to a high throughput system by providing additional capillaries. The device can also be used to determine the actual size of the particles and for DNA sequencing.

  13. Experimental investigation of an RNA sequence space

    NASA Technical Reports Server (NTRS)

    Lee, Youn-Hyung; Dsouza, Lisa; Fox, George E.

    1993-01-01

    Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space that is comprised only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein an experimental system which can assess whether a particular sequence is likely to have validity as a eubacterial 5S rRNA is described. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs.

  14. Distinguishing Proteins From Arbitrary Amino Acid Sequences

    PubMed Central

    Yau, Stephen S.-T.; Mao, Wei-Guang; Benson, Max; He, Rong Lucy

    2015-01-01

    What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe. PMID:25609314

  15. Genotator: A Workbench for Sequence Annotation

    SciTech Connect

    Harris, N.L.

    1997-05-01

    Sequencing centers such as the Human Genome Center at LBNL are producing an ever-increasing flood of genetic data. Annotation can greatly enhance the biological value of these sequences. Useful annotations include possible gene locations, homologies to known genes, and gene signals such as promoters and splice sites. Genotator is a workbench for automated sequence annotation and annotation browsing. The back end runs a series of sequence analysis tools on a DNA sequence, handling the various input and output formats required by the tools. Genotator currently runs five different gene finding programs, three homology searches, and searches for promoters, splice sites, and ORFs. The results of the analyses run by Genotator can be viewed with the interactive graphical browser. The browser displays color-coded sequence annotations on a canvas that can be scrolled and zoomed, allowing the annotated sequence to be explored at multiple levels of detail. The user can view the actual DNA sequence in a separate window; when a region is selected in the map display, it is automatically highlighted in the sequence display, and vice-versa. By displaying the output of all of the sequence analyses, Genotator provides an intuitive way to identify the significant regions (for example, probable exons) in a sequence. Users can interactively add personal annotations to label regions of interest. Additional capabilities of Genotator include primer design and pattern searching.

  16. The evolution of the Voyager mission sequence software and trends for future mission sequence software systems

    NASA Technical Reports Server (NTRS)

    Brooks, Robert N., Jr.

    1988-01-01

    The historical background of the spacecraft sequence generation process as it is represented by the Voyager mission to the outer planets is discussed. Present plans for future sequencing methods are examined, including the emphasis on cutting costs and the contrast between the centralized and distributed systems for sequencing. The use of artificial intelligence in mission sequencing is addressed.

  17. Feedback shift register sequences versus uniformly distributed random sequences for correlation chromatography

    NASA Technical Reports Server (NTRS)

    Kaljurand, M.; Valentin, J. R.; Shao, M.

    1996-01-01

    Two alternative input sequences are commonly employed in correlation chromatography (CC). They are sequences derived according to the algorithm of the feedback shift register (i.e., pseudo random binary sequences (PRBS)) and sequences derived by using the uniform random binary sequences (URBS). These two sequences are compared. By applying the "cleaning" data processing technique to the correlograms that result from these sequences, we show that when the PRBS is used the S/N of the correlogram is much higher than the one resulting from using URBS.
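
    A feedback shift register sequence of the kind mentioned above can be sketched as follows. The 4-bit register, the tap positions, and the function names are assumptions chosen for a small example (the taps implement the recurrence a[n+4] = a[n] XOR a[n+1]); the paper's actual sequences are not given in the abstract.

```python
def prbs(taps, seed, length):
    """Bits from a Fibonacci-style feedback shift register.

    Each step outputs the oldest state bit and appends the XOR of the
    tapped state bits as the newest bit.
    """
    state = list(seed)
    out = []
    for _ in range(length):
        out.append(state[0])
        new_bit = 0
        for t in taps:
            new_bit ^= state[t]
        state = state[1:] + [new_bit]
    return out

def circular_autocorr(bits, shift):
    """Circular autocorrelation of the sequence mapped to +1/-1."""
    x = [1 if b else -1 for b in bits]
    n = len(x)
    return sum(x[i] * x[(i + shift) % n] for i in range(n))
```

    With `taps=(0, 1)` and `seed=[1, 0, 0, 0]` the output repeats with period 15, and its circular autocorrelation is 15 at zero shift and -1 at every other shift. A uniformly random binary sequence has this flat off-peak profile only on average, which is one way to see why the two inputs can produce correlograms of different quality.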

  18. Randomness in Sequence Evolution Increases over Time

    PubMed Central

    Wang, Guangyu; Sun, Shixiang; Zhang, Zhang

    2016-01-01

    The second law of thermodynamics states that entropy, as a measure of randomness in a system, increases over time. Although studies have investigated biological sequence randomness from different aspects, it remains unknown whether sequence randomness changes over time and whether this change is consistent with the second law of thermodynamics. To capture the dynamics of randomness in molecular sequence evolution, here we detect sequence randomness based on a collection of eight statistical randomness tests and investigate the randomness variation of coding sequences with an application to Escherichia coli. Given that core/essential genes are more ancient than specific/non-essential genes, our results clearly show that core/essential genes are more random than specific/non-essential genes and accordingly indicate that sequence randomness indeed increases over time, consistent with the second law of thermodynamics. We further find that an increase in sequence randomness leads to increasing randomness of GC content and longer sequence length. Taken together, our study presents an important finding, for the first time, that sequence randomness increases over time, which may provide profound insights for unveiling the underlying mechanisms of molecular sequence evolution. PMID:27224236

  19. Deciphering the RNA landscape by RNAome sequencing

    PubMed Central

    Derks, Kasper WJ; Misovic, Branislav; van den Hout, Mirjam CGN; Kockx, Christel EM; Payan Gomez, Cesar; Brouwer, Rutger WW; Vrieling, Harry; Hoeijmakers, Jan HJ; van IJcken, Wilfred FJ; Pothof, Joris

    2015-01-01

    Current RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species in an unperturbed manner. We report strand-specific RNAome sequencing that determines expression of small and large RNAs from rRNA-depleted total RNA in a single sequence run. Since current analysis pipelines cannot reliably analyze small and large RNAs simultaneously, we developed TRAP, Total Rna Analysis Pipeline, a robust interface that is also compatible with existing RNA sequencing protocols. RNAome sequencing quantitatively preserved all RNA classes, allowing cross-class comparisons that facilitate the identification of relationships between different RNA classes. We demonstrate the strength of RNAome sequencing in mouse embryonic stem cells treated with cisplatin. MicroRNA and mRNA expression in RNAome sequencing significantly correlated between replicates and was in concordance with both existing RNA sequencing methods and gene expression arrays generated from the same samples. Moreover, RNAome sequencing also detected additional RNA classes such as enhancer RNAs, anti-sense RNAs, novel RNA species and numerous differentially expressed RNAs undetectable by other methods. At the level of complete RNA classes, RNAome sequencing also identified a specific global repression of the microRNA and microRNA isoform classes after cisplatin treatment whereas all other classes such as mRNAs were unchanged. These characteristics of RNAome sequencing will significantly improve expression analysis as well as studies on RNA biology not covered by existing methods. PMID:25826412

  20. On the Origin of Sequence

    PubMed Central

    van der Gulik, Peter T. S.

    2015-01-01

    Three aspects which make planet Earth special, and which must be taken into consideration with respect to the emergence of peptides, are the mineralogical composition, the Moon which is in the same size class, and the triple environment consisting of ocean, atmosphere, and continent. GlyGly is a remarkable peptide because it stimulates peptide bond formation in the Salt-Induced Peptide Formation reaction. The role glycine and aspartic acid play in the active site of RNA polymerase is remarkable too. GlyGly might have been the original product of coded peptide synthesis because of its importance in stimulating the production of oligopeptides with a high aspartic acid content, which protected small RNA molecules by binding Mg2+ ions. The feedback loop, which is closed by having RNA molecules producing GlyGly, is proposed as the essential element fundamental to life. Having this system running, longer sequences could evolve, gradually solving the problem of error catastrophe. The basic structure of the standard genetic code (8 fourfold degenerate codon boxes and 8 split codon boxes) is an example of the way information concerning the emergence of life is frozen in the biological constitution of organisms: the structure of the code contains historical information. PMID:26580656

  1. [Rapid-sequence anesthesia induction].

    PubMed

    Lloréns Herrerías, J

    2003-02-01

    Rapid-sequence induction (RSI) techniques are designed to reduce the risk of aspiration in cases where risk is high. RSI is often used for surgery, particularly under emergency conditions, but is also used in procedures requiring emergency tracheal intubation inside and outside the hospital. RSI techniques have proven safe for reducing the risk of aspiration and providing good conditions for intubation in such situations. The great variety of clinical situations that can be involved means that the combination of drugs to be used should be individualized for each case. In addition to the two objectives of RSI named and the particular nature of a case, the risk of presenting unforeseen difficult intubation is yet another factor affecting choice of drugs. Precisely because of this last factor and the good results obtained with short-acting opiates, great interest has developed in recent years in RSI that does not use neuromuscular blocking agents. However, conclusive data are unavailable. Studies are often difficult to compare because of small differences in the combination of drugs, the dosing of one or more of them, the route of administration, or because the criteria used to define ideal intubation conditions are different.

  2. Evolutionarily conserved sequences on human chromosome 21

    SciTech Connect

    Frazer, Kelly A.; Sheehan, John B.; Stokowski, Renee P.; Chen, Xiyin; Hosseini, Roya; Cheng, Jan-Fang; Fodor, Stephen P.A.; Cox, David R.; Patil, Nila

    2001-09-01

    Comparison of human sequences with the DNA of other mammals is an excellent means of identifying functional elements in the human genome. Here we describe the utility of high-density oligonucleotide arrays as a rapid approach for comparing human sequences with the DNA of multiple species whose sequences are not presently available. High-density arrays representing approximately 22.5 Mb of nonrepetitive human chromosome 21 sequence were synthesized and then hybridized with mouse and dog DNA to identify sequences conserved between humans and mice (human-mouse elements) and between humans and dogs (human-dog elements). Our data show that sequence comparison of multiple species provides a powerful empiric method for identifying actively conserved elements in the human genome. A large fraction of these evolutionarily conserved elements are present in regions on chromosome 21 that do not encode known genes.

  3. Offline consolidation in implicit sequence learning.

    PubMed

    Meier, Beat; Cock, Josephine

    2014-08-01

    The goal of this study was to investigate offline memory consolidation with regard to general motor skill learning and implicit sequence-specific learning. We trained young adults on a serial reaction time task with a retention interval of either 24 h (Experiment 1) or 1 week (Experiment 2) between two sessions. We manipulated sequence complexity (deterministic vs probabilistic) and motor responses (unimanual vs bimanual). We found no evidence of offline memory consolidation for sequence-specific learning with either interval (in the sense of no deterioration over the interval but no further improvement either). However, we did find evidence of offline enhancement of general motor skill learning with both intervals, independent of kind of sequence or kind of response. These results suggest that general motor skill learning, but not sequence-specific learning, appears to be enhanced during offline intervals in implicit sequence learning.

  4. Long-range correlations in nucleotide sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Sciortino, F.; Simons, M.; Stanley, H. E.

    1992-01-01

    DNA sequences have been analysed using models, such as an n-step Markov chain, that incorporate the possibility of short-range nucleotide correlations. We propose here a method for studying the stochastic properties of nucleotide sequences by constructing a 1:1 map of the nucleotide sequence onto a walk, which we term a 'DNA walk'. We then use the mapping to provide a quantitative measure of the correlation between nucleotides over long distances along the DNA chain. Thus we uncover in the nucleotide sequence a remarkably long-range power law correlation that implies a new scale-invariant property of DNA. We find such long-range correlations in intron-containing genes and in nontranscribed regulatory DNA sequences, but not in complementary DNA sequences or intron-less genes.
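
    The mapping of a nucleotide sequence onto a walk can be sketched as follows. The abstract does not state the step rule, so this sketch assumes a purine/pyrimidine rule (step up for C or T, step down for A or G); the function name is hypothetical.

```python
from itertools import accumulate

def dna_walk(seq):
    """Cumulative displacement of a 'DNA walk'.

    Assumed rule: +1 for a pyrimidine (C, T), -1 for a purine (A, G).
    """
    steps = []
    for base in seq.upper():
        if base in ("C", "T"):
            steps.append(1)
        elif base in ("A", "G"):
            steps.append(-1)
        else:
            raise ValueError(f"unexpected base: {base!r}")
    # Running sum of the steps gives the walker's position after each nucleotide.
    return list(accumulate(steps))
```

    For example, `dna_walk("CTAG")` gives `[1, 2, 1, 0]`. How the fluctuations of this displacement grow with window length is what distinguishes long-range correlated sequences from uncorrelated ones.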

  5. Comparison of Next-Generation Sequencing Systems

    PubMed Central

    Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie

    2012-01-01

    With fast development and wide applications of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid the achievement of goals to decode life mysteries, make better crops, detect pathogens, and improve quality of life. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, technologies of these systems are reviewed, and first-hand data from extensive experience is summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. Finally, applications of NGS are summarized. PMID:22829749

  6. Multiple sequence alignment with hierarchical clustering.

    PubMed Central

    Corpet, F

    1988-01-01

    An algorithm is presented for the multiple alignment of sequences, either proteins or nucleic acids, that is both accurate and easy to use on microcomputers. The approach is based on the conventional dynamic-programming method of pairwise alignment. Initially, a hierarchical clustering of the sequences is performed using the matrix of the pairwise alignment scores. The closest sequences are aligned creating groups of aligned sequences. Then close groups are aligned until all sequences are aligned in one group. The pairwise alignments included in the multiple alignment form a new matrix that is used to produce a hierarchical clustering. If it is different from the first one, iteration of the process can be performed. The method is illustrated by an example: a global alignment of 39 sequences of cytochrome c. PMID:2849754

  7. Sequencing Intractable DNA to Close Microbial Genomes

    SciTech Connect

    Hurt, Jr., Richard Ashley; Brown, Steven D; Podar, Mircea; Palumbo, Anthony Vito; Elias, Dwayne A

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled intractable, resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  8. The genome sequence of parrot bornavirus 5.

    PubMed

    Guo, Jianhua; Tizard, Ian

    2015-12-01

    Although several new avian bornaviruses have recently been described, information on their evolution, virulence, and sequence is often limited. Here we report the complete genome sequence of parrot bornavirus 5 (PaBV-5) isolated from a case of proventricular dilatation disease in a Palm cockatoo (Probosciger aterrimus). The complete genome consists of 8842 nucleotides with distinct 5' and 3' end sequences. This virus shares nucleotide sequence identities of 69-74% with other bornaviruses in the genomic regions excluding the 5' and 3' terminal sequences. Phylogenetic analysis based on the genomic regions demonstrated this new isolate is an isolated branch within the clade that includes the aquatic bird bornaviruses and the passerine bornaviruses. Based on phylogenetic analyses and its low nucleotide sequence identities with other bornaviruses, we support the proposal that PaBV-5 be assigned to a new bornavirus species: Psittaciform 2 bornavirus.

  9. Specific heat spectra for quasiperiodic ladder sequences

    NASA Astrophysics Data System (ADS)

    Moreira, D. A.; Albuquerque, E. L.; Bezerra, C. G.

    2006-12-01

    We performed a theoretical study of the specific heat C(T) as a function of the temperature for double-strand quasiperiodic sequences. To mimic DNA molecules, the sequences are made up from the nucleotides guanine G, adenine A, cytosine C and thymine T, arranged according to the Fibonacci and Rudin-Shapiro quasiperiodic sequences. The energy spectra are calculated using the two-dimensional Schrödinger equation, in a tight-binding approximation, with the on-site energy exhibiting long-range disorder and non-random hopping amplitudes. We compare the specific heat features of these quasiperiodic artificial sequences to the spectra considering a segment of the first sequenced human chromosome 22 (Ch22), a real genomic DNA sequence.

  10. Long-range correlations in nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Peng, C.-K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Sciortino, F.; Simons, M.; Stanley, H. E.

    1992-03-01

    DNA sequences have been analysed using models, such as an n-step Markov chain, that incorporate the possibility of short-range nucleotide correlations. We propose here a method for studying the stochastic properties of nucleotide sequences by constructing a 1:1 map of the nucleotide sequence onto a walk, which we term a 'DNA walk'. We then use the mapping to provide a quantitative measure of the correlation between nucleotides over long distances along the DNA chain. Thus we uncover in the nucleotide sequence a remarkably long-range power law correlation that implies a new scale-invariant property of DNA. We find such long-range correlations in intron-containing genes and in nontranscribed regulatory DNA sequences, but not in complementary DNA sequences or intron-less genes.

  11. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  12. Sequence Compaction to Preserve Transition Frequencies

    SciTech Connect

    Pinar, Ali; Liu, C.L.

    2002-12-12

    Simulation-based power estimation is commonly used for its high accuracy despite excessive computation times. Techniques have been proposed to speed it up by compacting an input sequence while preserving its power-consumption characteristics. We propose a novel method to compact a sequence that preserves transition frequencies. We prove the problem is NP-Complete, and propose a graph model to reduce it to that of finding a heaviest weighted trail on a directed graph, along with a heuristic utilizing this model. We also propose using multiple sequences for better accuracy with even shorter sequences. Experiments showed that power dissipation can be estimated with an error of only 2.3 percent, while simulation times are reduced by 10. The proposed methods effectively preserve transition frequencies and generate solutions that are very close to optimal. Experiments also showed that multiple sequences granted more accurate results with even shorter sequences.
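
    To make "preserving transition frequencies" concrete, the sketch below counts how often each pair of consecutive input vectors occurs and measures how far a compacted sequence drifts from the original. It does not implement the compaction heuristic from the report; the function names are hypothetical.

```python
from collections import Counter

def transition_frequencies(seq):
    """Relative frequency of each (current, next) pair of consecutive elements."""
    pairs = Counter(zip(seq, seq[1:]))
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}

def max_frequency_error(original, compacted):
    """Largest absolute difference in transition frequency between two sequences."""
    f = transition_frequencies(original)
    g = transition_frequencies(compacted)
    keys = set(f) | set(g)
    return max((abs(f.get(k, 0.0) - g.get(k, 0.0)) for k in keys), default=0.0)
```

    A compacted sequence with a small max_frequency_error exercises each transition in about the same proportion as the full input, which is the property the abstract says the method preserves.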

  13. Preparing DNA libraries for multiplexed paired-end deep sequencing for Illumina GA sequencers.

    PubMed

    Son, Mike S; Taylor, Ronald K

    2011-02-01

    Whole-genome sequencing, also known as deep sequencing, is becoming a more affordable and efficient way to identify SNP mutations, deletions, and insertions in DNA sequences across several different strains. Two major obstacles preventing the widespread use of deep sequencers are the costs involved in services used to prepare DNA libraries for sequencing and the overall accuracy of the sequencing data. This unit describes the preparation of DNA libraries for multiplexed paired-end sequencing using the Illumina GA series sequencer. Self-preparation of DNA libraries can help reduce overall expenses, especially if optimization is required for the different samples, and use of the Illumina GA Sequencer can improve the quality of the data.

  14. Discrete sequence prediction and its applications

    NASA Technical Reports Server (NTRS)

    Laird, Philip

    1992-01-01

    Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We apply sequence prediction using a simple and practical sequence-prediction algorithm, called TDAG. The TDAG algorithm is first tested by comparing its performance with some common data compression algorithms. Then it is adapted to the detailed requirements of dynamic program optimization, with excellent results.

  15. Fractal analysis of DNA sequence data

    SciTech Connect

    Berthelsen, C.L.

    1993-01-01

    DNA sequence databases are growing at an almost exponential rate. New analysis methods are needed to extract knowledge about the organization of nucleotides from this vast amount of data. Fractal analysis is a new scientific paradigm that has been used successfully in many domains including the biological and physical sciences. Biological growth is a nonlinear dynamic process and some have suggested that to consider fractal geometry as a biological design principle may be most productive. This research is an exploratory study of the application of fractal analysis to DNA sequence data. A simple random fractal, the random walk, is used to represent DNA sequences. The fractal dimension of these walks is then estimated using the 'sandbox method'. Analysis of 164 human DNA sequences compared to three types of control sequences (random, base-content matched, and dimer-content matched) reveals that long-range correlations are present in DNA that are not explained by base or dimer frequencies. The study also revealed that the fractal dimension of coding sequences was significantly lower than that of sequences that were primarily noncoding, indicating the presence of longer-range correlations in functional sequences. The multifractal spectrum is used to analyze fractals that are heterogeneous and have a different fractal dimension for subsets with different scalings. The multifractal spectrum of the random walks of twelve mitochondrial genome sequences was estimated. Eight vertebrate mtDNA sequences had uniformly lower spectra values than did four invertebrate mtDNA sequences. Thus, vertebrate mitochondria show significantly longer-range correlations than invertebrate mitochondria. The higher multifractal spectra values for invertebrate mitochondria suggest a more random organization of the sequences. This research also includes considerable theoretical work on the effects of finite size, embedding dimension, and scaling ranges.

  16. Multiple tag labeling method for DNA sequencing

    DOEpatents

    Mathies, R.A.; Huang, X.C.; Quesada, M.A.

    1995-07-25

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.

  17. FRESCO: Referential compression of highly similar sequences.

    PubMed

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios way beyond state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.
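
    The idea of storing only differences against a reference can be illustrated with a naive greedy encoder. This is a toy sketch, not FRESCO's algorithm or file format: the entry layout, the min_match parameter and all function names are assumptions, and the repeated str.find calls make it far slower than an indexed implementation.

```python
def longest_match(reference, target, start, min_match):
    """Longest prefix of target[start:] (at least min_match long) found in reference."""
    best_pos, best_len = -1, 0
    length = min_match
    while start + length <= len(target):
        pos = reference.find(target[start:start + length])
        if pos == -1:
            break
        best_pos, best_len = pos, length
        length += 1
    return best_pos, best_len

def compress(reference, target, min_match=4):
    """Encode target as ('M', ref_pos, length) matches and ('L', text) literals."""
    entries, literal, i = [], [], 0
    while i < len(target):
        pos, length = longest_match(reference, target, i, min_match)
        if length:
            if literal:  # flush pending unmatched characters first
                entries.append(("L", "".join(literal)))
                literal = []
            entries.append(("M", pos, length))
            i += length
        else:
            literal.append(target[i])
            i += 1
    if literal:
        entries.append(("L", "".join(literal)))
    return entries

def decompress(reference, entries):
    """Rebuild the target from the reference and the entry list."""
    parts = []
    for entry in entries:
        if entry[0] == "M":
            _, pos, length = entry
            parts.append(reference[pos:pos + length])
        else:
            parts.append(entry[1])
    return "".join(parts)
```

    When the target differs from the reference by a single substitution, the encoding collapses to two long matches around one literal character, which is why highly similar sequences compress so well under this scheme.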

  18. Multiple tag labeling method for DNA sequencing

    DOEpatents

    Mathies, Richard A.; Huang, Xiaohua C.; Quesada, Mark A.

    1995-01-01

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in said lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radio-isotope labels.

  19. EGNAS: an exhaustive DNA sequence design algorithm

    PubMed Central

    2012-01-01

    Background: The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that allows the generation of sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences) offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of “fraying” of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. Results: The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions: We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm allows the generation of larger sets of sequences than previous software under equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS. PMID:22716030

  20. Nucleotide sequences encoding a thermostable alkaline protease

    DOEpatents

    Wilson, David B.; Lao, Guifang

    1998-01-01

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium.

  1. Nucleotide sequences encoding a thermostable alkaline protease

    DOEpatents

    Wilson, D.B.; Lao, G.

    1998-01-06

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium. 3 figs.

  2. Some properties of generalized Fibonacci sequence

    NASA Astrophysics Data System (ADS)

    Chong, Chin-Yoon; Ho, C. K.

    2015-12-01

    For all non-negative integers n and real constants a, b, p and q, the generalized Fibonacci sequence {U_n} is defined by U_{n+2} = pU_{n+1} + qU_n with the initial values U_0 = a and U_1 = b. Throughout the paper, we study some properties of the generalized Fibonacci sequence. Our results will motivate some new research problems concerning the contribution of the generalized sequence.
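
    The recurrence can be evaluated directly. The sketch below follows the definition in the abstract, with the next term equal to p times the latest term plus q times the one before it, starting from a and b; the function name generalized_fibonacci and the count parameter are illustrative choices.

```python
def generalized_fibonacci(a, b, p, q, count):
    """Return the first `count` terms of U, where U_0 = a, U_1 = b
    and each later term is p * (previous term) + q * (term before that)."""
    terms = []
    u, v = a, b  # u is the current term, v the one after it
    for _ in range(count):
        terms.append(u)
        u, v = v, p * v + q * u
    return terms
```

    With a = 0, b = 1, p = q = 1 this gives the ordinary Fibonacci numbers; a = 2, b = 1 gives the Lucas numbers, and p = 2, q = 1 with a = 0, b = 1 gives the Pell numbers.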

  3. Searching gene and protein sequence databases.

    PubMed

    Barsalou, T; Brutlag, D L

    1991-01-01

    A large-scale effort to map and sequence the human genome is now under way. Crucial to the success of this research is a group of computer programs that analyze and compare data on molecular sequences. This article describes the classic algorithms for similarity searching and sequence alignment. Because good performance of these algorithms is critical to searching very large and growing databases, we analyze the running times of the algorithms and discuss recent improvements in this area.

  4. Completely phased genome sequencing through chromosome sorting

    PubMed Central

    Yang, Hong; Chen, Xi; Wong, Wing Hung

    2011-01-01

    The two haploid genome sequences that a person inherits from the two parents represent the most fundamentally useful type of genetic information for the study of heritable diseases and the development of personalized medicine. Because of the difficulty in obtaining long-range phase information, current sequencing methods are unable to provide this information. Here, we introduce and show feasibility of a scalable approach capable of generating genomic sequences completely phased across the entire chromosome. PMID:21169219

  5. Unlocking Short Read Sequencing for Metagenomics

    SciTech Connect

    Rodrigue, Sébastien; Materna, Arne C.; Timberlake, Sonia C.; Blackburn, Matthew C.; Malmstrom, Rex R.; Alm, Eric J.; Chisholm, Sallie W.; Gilbert, Jack Anthony

    2010-07-28

    We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.
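
    Joining an overlapping read pair can be sketched as follows. This is not the SHERA algorithm, which the abstract does not describe in detail; it is a simplified illustration that ignores quality scores, and the names merge_pair, min_overlap and max_mismatch_rate are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse-complement a DNA string over the alphabet A, C, G, T, N."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(forward, reverse, min_overlap=10, max_mismatch_rate=0.1):
    """Join a read pair whose ends overlap; return None if no acceptable overlap exists."""
    mate = reverse_complement(reverse)  # bring the mate onto the forward strand
    # Try the longest candidate overlap first, then progressively shorter ones.
    for overlap in range(min(len(forward), len(mate)), min_overlap - 1, -1):
        tail, head = forward[-overlap:], mate[:overlap]
        mismatches = sum(x != y for x, y in zip(tail, head))
        if mismatches <= max_mismatch_rate * overlap:
            return forward + mate[overlap:]
    return None
```

    A practical tool would also use the per-base quality scores to resolve disagreements inside the overlap instead of simply keeping the forward read's bases.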

  6. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    PubMed Central

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by genscan (http://genes.mit.edu/GENSCAN.html). PMID:11070084

  7. Lygus hesperus polygalacturonase Characterization and Role in Plant Damage

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The amino terminus of a Lygus hesperus salivary gland protein revealing polygalacturonase (PG) activity in an SDS-PAGE activity gel assay has been sequenced via Edman degradation. The N-terminal amino acid sequence shares homology with the predicted amino acid sequence for putative L. lineolaris P...

  8. Visible periodicity of strong nucleosome DNA sequences.

    PubMed

    Salih, Bilal; Tripathi, Vijay; Trifonov, Edward N

    2015-01-01

    Fifteen years ago, Lowary and Widom assembled nucleosomes on synthetic random sequence DNA molecules, selected the strongest nucleosomes and discovered that the TA dinucleotides in these strong nucleosome sequences often appear at 10-11 bases from one another or at distances which are multiples of this period. We repeated this experiment computationally, on large ensembles of natural genomic sequences, by selecting the strongest nucleosomes--i.e. those with such distances between like-named dinucleotides, multiples of 10.4 bases, the structural and sequence period of nucleosome DNA. The analysis confirmed the periodicity of TA dinucleotides in the strong nucleosomes, and revealed as well other periodic sequence elements, notably classical AA and TT dinucleotides. The matrices of DNA bendability and their simple linear forms--nucleosome positioning motifs--are calculated from the strong nucleosome DNA sequences. The motifs are in full accord with nucleosome positioning sequences derived earlier, thus confirming that the new technique, indeed, detects strong nucleosomes. Species- and isochore-specific variations of the matrices and of the positioning motifs are demonstrated. The strong nucleosome DNA sequences manifest the highest hitherto nucleosome positioning sequence signals, showing the dinucleotide periodicities in directly observable rather than in hidden form.
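
    A simple way to look for such periodicity is to tabulate the distances between occurrences of a dinucleotide. The sketch below is only an illustration of that counting step, not the authors' selection procedure; the function names and the max_distance cutoff are assumptions.

```python
from collections import Counter

def dinucleotide_positions(seq, dinucleotide="TA"):
    """Start positions of every occurrence of the dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]

def distance_histogram(seq, dinucleotide="TA", max_distance=60):
    """Count distances between all pairs of occurrences up to max_distance apart."""
    positions = dinucleotide_positions(seq, dinucleotide)
    counts = Counter()
    for idx, start in enumerate(positions):
        for end in positions[idx + 1:]:
            if end - start > max_distance:
                break  # positions are sorted, so later ones are farther still
            counts[end - start] += 1
    return counts
```

    In a sequence with the periodicity described above, the histogram would be expected to peak near distances of 10 to 11 bases and their multiples.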

  9. Maize genome sequencing by methylation filtration.

    PubMed

    Palmer, Lance E; Rabinowicz, Pablo D; O'Shaughnessy, Andrew L; Balija, Vivekanand S; Nascimento, Lidia U; Dike, Sujit; de la Bastide, Melissa; Martienssen, Robert A; McCombie, W Richard

    2003-12-19

    Gene enrichment strategies offer an alternative to sequencing large and repetitive genomes such as that of maize. We report the generation and analysis of nearly 100,000 undermethylated (or methylation filtration) maize sequences. Comparison with the rice genome reveals that methylation filtration results in a more comprehensive representation of maize genes than those that result from expressed sequence tags or transposon insertion sites sequences. About 7% of the repetitive DNA is unmethylated and thus selected in our libraries, but potentially active transposons and unmethylated organelle genomes can be identified. Reverse transcription polymerase chain reaction can be used to finish the maize transcriptome.

  10. Next generation sequencing based approaches to epigenomics

    PubMed Central

    Marra, Marco A.

    2010-01-01

    Next generation sequencing has brought epigenomic studies to the forefront of current research. The power of massively parallel sequencing coupled to innovative molecular and computational techniques has allowed researchers to profile the epigenome at resolutions that were unimaginable only a few years ago. With early proof of concept studies published, the field is now moving into the next phase where the importance of method standardization and rigorous quality control are becoming paramount. In this review we will describe methodologies that have been developed to profile the epigenome using next generation sequencing platforms. We will discuss these in terms of library preparation, sequence platforms and analysis techniques. PMID:21266347

  11. Locomotor sequence learning in visually guided walking.

    PubMed

    Choi, Julia T; Jensen, Peter; Nielsen, Jens Bo

    2016-04-01

    Voluntary limb modifications must be integrated with basic walking patterns during visually guided walking. In this study we tested whether voluntary gait modifications can become more automatic with practice. We challenged walking control by presenting visual stepping targets that instructed subjects to modify step length from one trial to the next. Our sequence learning paradigm is derived from the serial reaction-time (SRT) task that has been used in upper limb studies. Both random and ordered sequences of step lengths were used to measure sequence-specific and sequence-nonspecific learning during walking. In addition, we determined how age (i.e., healthy young adults vs. children) and biomechanical factors (i.e., walking speed) affected the rate and magnitude of locomotor sequence learning. The results showed that healthy young adults (age 24 ± 5 yr, n = 20) could learn a specific sequence of step lengths over 300 training steps. Younger children (age 6-10 yr, n = 8) had lower baseline performance, but their magnitude and rate of sequence learning were the same compared with those of older children (11-16 yr, n = 10) and healthy adults. In addition, learning capacity may be more limited at faster walking speeds. To our knowledge, this is the first study to demonstrate that spatial sequence learning can be integrated with a highly automatic task such as walking. These findings suggest that adults and children use implicit knowledge about the sequence to plan and execute leg movement during visually guided walking.

  12. Sequence comparisons via algorithmic mutual information

    SciTech Connect

    Milosavijevic, A.

    1994-12-31

    One of the main problems in DNA and protein sequence comparisons is to decide whether observed similarity of two sequences should be explained by their relatedness or by mere presence of some shared internal structure, e.g., shared internal tandem repeats. The standard methods that are based on statistics or classical information theory can be used to discover either internal structure or mutual sequence similarity, but cannot take into account both. Consequently, currently used methods for sequence comparison employ "masking" techniques that simply eliminate sequences that exhibit internal repetitive structure prior to sequence comparisons. The "masking" approach precludes discovery of homologous sequences of moderate or low complexity, which abound at both DNA and protein levels. As a solution to this problem, we propose a general method that is based on algorithmic information theory and minimal length encoding. We show that algorithmic mutual information factors out the sequence similarity that is due to shared internal structure and thus enables discovery of truly related sequences. We extend the recently developed algorithmic significance method to show that significance depends exponentially on algorithmic mutual information.

  13. Recursive sequences in first-year calculus

    NASA Astrophysics Data System (ADS)

    Krainer, Thomas

    2016-02-01

    This article provides ready-to-use supplementary material on recursive sequences for a second-semester calculus class. It equips first-year calculus students with a basic methodical procedure based on which they can conduct a rigorous convergence or divergence analysis of many simple recursive sequences on their own without the need to invoke inductive arguments as is typically required in calculus textbooks. The sequences that are accessible to this kind of analysis are predominantly (eventually) monotonic, but also certain recursive sequences that alternate around their limit point as they converge can be considered.

  14. Genomic sequencing of Pleistocene cave bears

    SciTech Connect

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  15. The Hippocampus and Disambiguation of Overlapping Sequences

    PubMed Central

    Agster, Kara L.; Fortin, Norbert J.; Eichenbaum, Howard

    2010-01-01

    Recent models of hippocampal function emphasize its potential role in disambiguating sequences of events that compose distinct episodic memories. In this study, rats were trained to distinguish two overlapping sequences of odor choices. The capacity to disambiguate the sequences was measured by the critical odor choice after the overlapping elements of the sequences. When the sequences were presented in rapid alternation, damage to the hippocampus, produced either by infusions of the neurotoxin ibotenic acid or by radiofrequency current, produced a severe deficit, although animals with radiofrequency lesions relearned the task. When the sequences were presented spaced apart and in random order, animals with radiofrequency hippocampal lesions could perform the task. However, they failed when a memory delay was imposed before the critical choice. These findings support the hypothesis that the hippocampus is involved in representing sequences of nonspatial events, particularly when interference between the sequences is high or when animals must remember across a substantial delay preceding items in a current sequence. PMID:12097529

  16. Multiplexed microsatellite recovery using massively parallel sequencing.

    PubMed

    Jennings, T N; Knaus, B J; Mullins, T D; Haig, S M; Cronn, R C

    2011-11-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356,958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5 M (USD).
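
    To make the screening step concrete, the sketch below scans a read for di- or trinucleotide microsatellites with a regular expression. It is a minimal illustration only: the function name, the minimum of five repeats, and the example read are assumptions, and the study's actual screening pipeline is not described in this abstract.

```python
import re


def find_microsatellites(read: str, min_repeats: int = 5):
    """Return sorted (start, unit, repeat_count) for di-/trinucleotide repeats."""
    hits = []
    for unit_len in (2, 3):
        # A unit of unit_len bases followed by at least min_repeats - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(read):
            unit = match.group(1)
            if len(set(unit)) == 1:  # skip homopolymer runs such as AAAA...
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical read containing an (AC)6 and a (CAG)5 repeat.
example_read = "GGT" + "AC" * 6 + "TTC" + "CAG" * 5 + "T"
print(find_microsatellites(example_read))
```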

  17. Multiplexed microsatellite recovery using massively parallel sequencing

    USGS Publications Warehouse

    Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.

    2011-01-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).

  18. Choice of next-generation sequencing pipelines.

    PubMed

    Del Chierico, F; Ancora, M; Marcacci, M; Cammà, C; Putignani, L; Conti, Salvatore

    2015-01-01

    Next-generation sequencing (NGS) technologies are revolutionary tools that have enabled remarkable advances in genetics since the beginning of the twenty-first century. Because they can produce large amounts of sequence data, these tools are set to completely replace other high-throughput technologies. Moreover, the broad application of NGS protocols is advancing the genetic decoding of biological systems through studies of genome anatomy and gene mapping, coupled with transcriptome pictures. NGS pipelines such as (1) de novo genomic sequencing by mate-paired and whole-genome shotgun strategies; (2) specific gene sequencing on large bacterial communities; and (3) RNA-seq methods, including whole-transcriptome sequencing and Serial Analysis of Gene Expression (SAGE), are fundamental in genome-wide fields like metagenomics. Recently, the availability of these advanced protocols has made it possible to overcome the usual technical sequencing issues related to mapping specificity over standard shotgun library sequencing, the detection of large structural genome variations, and the bridging of sequencing gaps, as well as to achieve more precise gene annotation. In this chapter we discuss how to manage a successful NGS pipeline, from the planning of sequencing projects through the choice of platforms to data analysis management.

  19. DNA sequence from Cretaceous period bone fragments.

    PubMed

    Woodward, S R; Weyand, N J; Bunnell, M

    1994-11-18

    DNA was extracted from 80-million-year-old bone fragments found in strata of the Upper Cretaceous Blackhawk Formation in the roof of an underground coal mine in eastern Utah. This DNA was used as the template in a polymerase chain reaction that amplified and sequenced a portion of the gene encoding mitochondrial cytochrome b. These sequences differ from all other cytochrome b sequences investigated, including those in the GenBank and European Molecular Biology Laboratory databases. DNA isolated from these bone fragments and the resulting gene sequences demonstrate that small fragments of DNA may survive in bone for millions of years.

  20. A measurement of disorder in binary sequences

    NASA Astrophysics Data System (ADS)

    Gong, Longyan; Wang, Haihong; Cheng, Weiwen; Zhao, Shengmei

    2015-03-01

    We propose a complex quantity, A_L, to characterize the degree of disorder of L-length binary symbolic sequences. As examples, we apply it to typical random and deterministic sequences. One kind of random sequence is generated from a periodic binary sequence and the other is generated from the logistic map. The deterministic sequences are the Fibonacci and Thue-Morse sequences. In these analyzed sequences, we find that the modulus of A_L, denoted by |A_L|, is a (statistically) equivalent quantity to the Boltzmann entropy, the metric entropy, the conditional block entropy and/or other quantities, so it is a useful quantitative measure of disorder. It can serve as a fruitful index to discern which sequence is more disordered. Moreover, there is one and only one value of |A_L| for the overall disorder characteristics. It requires extremely low computational cost and can be easily realized experimentally. Taken together, these properties lead us to believe that the proposed measure of disorder is a valuable complement to existing ones for symbolic sequences.
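
    The abstract does not define A_L, so the sketch below does not attempt to compute it. It only generates the two deterministic sequences mentioned (Thue-Morse and Fibonacci) and evaluates a plain Shannon block entropy, one of the conventional comparison quantities of the kind listed above. All function names are illustrative.

```python
from collections import Counter
from math import log2


def thue_morse(n: int) -> str:
    """First n symbols: symbol i is the parity of the number of 1-bits in i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))


def fibonacci_word(n: int) -> str:
    """First n symbols of the fixed point of the substitution 0 -> 01, 1 -> 0."""
    word = "0"
    while len(word) < n:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:n]


def block_entropy(seq: str, k: int) -> float:
    """Shannon entropy (bits) of the overlapping length-k blocks of seq."""
    blocks = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(blocks.values())
    return -sum(c / total * log2(c / total) for c in blocks.values())


for name, seq in (("Thue-Morse", thue_morse(1024)),
                  ("Fibonacci", fibonacci_word(1024)),
                  ("periodic", "01" * 512)):
    print(name, [round(block_entropy(seq, k), 3) for k in (1, 2, 4)])
```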

  1. Using SEQUEST with Theoretically Complete Sequence Databases

    NASA Astrophysics Data System (ADS)

    Sadygov, Rovshan G.

    2015-11-01

    SEQUEST has long been used to identify peptides/proteins from their tandem mass spectra and protein sequence databases. The algorithm has proven to be hugely successful for its sensitivity and specificity in identifying peptides/proteins, the sequences of which are present in the protein sequence databases. In this work, we report on work that attempts a new use for the algorithm by applying it to search a complete list of theoretically possible peptides, a de novo-like sequencing. We used freely available mass spectral data and determined a number of unique peptides as identified by SEQUEST. Using masses of these peptides and the mass accuracy of 0.001 Da, we have created a database of all theoretically possible peptide sequences corresponding to the precursor masses. We used our recently developed algorithm for determining all amino acid compositions corresponding to a mass interval, and used a lexicographic ordering to generate theoretical sequences from the compositions. The newly generated theoretical database was many-fold more complex than the original protein sequence database. We used SEQUEST to search and identify the best matches to the spectra from all theoretically possible peptide sequences. We found that SEQUEST cross-correlation score ranked the correct peptide match among the top sequence matches. The results testify to the high specificity of SEQUEST when combined with the high mass accuracy for intact peptides.
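
    The database construction described above can be sketched in two steps: enumerate the amino acid compositions whose mass falls within the tolerance of a precursor mass, then list the orderings of each composition lexicographically. The code below is a toy version under stated assumptions. It uses only five residues with commonly tabulated monoisotopic masses, a brute-force search rather than the authors' algorithm, and hypothetical function names.

```python
from itertools import permutations

# Monoisotopic residue masses (Da) for a small illustrative subset of residues.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "P": 97.05276, "V": 99.06841}
WATER = 18.01056  # added once per intact peptide


def compositions(target: float, tol: float = 0.001):
    """Enumerate residue multisets whose peptide mass is within tol of target."""
    residues = sorted(RESIDUE_MASS)
    found = []

    def extend(first: int, chosen: list, mass: float):
        if abs(mass - target) <= tol:
            found.append("".join(chosen))
        for i in range(first, len(residues)):  # non-decreasing order: no duplicates
            new_mass = mass + RESIDUE_MASS[residues[i]]
            if new_mass <= target + tol:
                extend(i, chosen + [residues[i]], new_mass)

    extend(0, [], WATER)
    return found


def sequences(composition: str):
    """All distinct orderings of a composition, in lexicographic order."""
    return sorted({"".join(p) for p in permutations(composition)})


target_mass = WATER + sum(RESIDUE_MASS[r] for r in "GAS")
for comp in compositions(target_mass):
    print(comp, sequences(comp))
```

    Enumerating permutations grows factorially with peptide length, which is consistent with the abstract's remark that the theoretical database was many-fold more complex than the original protein database.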

  2. Quantitative phenotyping via deep barcode sequencing.

    PubMed

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.

  3. Compressing DNA sequence databases with coil

    PubMed Central

    White, W Timothy J; Hendy, Michael D

    2008-01-01

    Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work. PMID:18489794

  4. Transfer in Motor Sequence Learning: Effects of Practice Schedule and Sequence Context

    PubMed Central

    Müssgens, Diana M.; Ullén, Fredrik

    2015-01-01

    Transfer (i.e., the application of a learned skill in a novel context) is an important and desirable outcome of motor skill learning. While much research has been devoted to understanding transfer of explicit skills, the mechanisms of skill transfer after incidental learning remain poorly understood. The aim of this study was to (1) examine the effect of practice schedule on transfer and (2) investigate whether sequence-specific knowledge can transfer to an unfamiliar sequence context. We trained two groups of participants on an implicit serial response time task under a Constant (one sequence for 10 blocks) or Variable (alternating between two sequences for a total of 10 blocks) practice schedule. We evaluated response times for three types of transfer: task-general transfer to a structurally non-overlapping sequence, inter-manual transfer to a perceptually identical sequence, and sequence-specific transfer to a partially overlapping (three shared triplets) sequence. Results showed partial skill transfer to all three sequences and an advantage of Variable practice only for task-general transfer. Further, we found expression of sequence-specific knowledge for familiar sub-sequences in the overlapping sequence. These findings suggest that (1) constant practice may create interference for task-general transfer and (2) sequence-specific knowledge can transfer to a new sequential context. PMID:26635591

  5. Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding

    PubMed Central

    Brozynska, Marta; Furtado, Agnelo; Henry, Robert James

    2014-01-01

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis. PMID:25329378

  6. VOE Computer Programming: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 3-year program in computer programming. The guide consists of a course description; general course…

  7. From Arithmetic Sequences to Linear Equations

    ERIC Educational Resources Information Center

    Matsuura, Ryota; Harless, Patrick

    2012-01-01

    The first part of the article focuses on deriving the essential properties of arithmetic sequences by appealing to students' sense making and reasoning. The second part describes how to guide students to translate their knowledge of arithmetic sequences into an understanding of linear equations. Ryota Matsuura originally wrote these lessons for…

  8. Wolbachia Sequence Typing in Butterflies Using Pyrosequencing.

    PubMed

    Choi, Sungmi; Shin, Su-Kyoung; Jeong, Gilsang; Yi, Hana

    2015-09-01

    Wolbachia is an obligate symbiotic bacterium that is ubiquitous in arthropods, with 25-70% of insect species estimated to be infected. Wolbachia species can interact with their insect hosts in a mutualistic or parasitic manner. Sequence types (ST) of Wolbachia are determined by multilocus sequence typing (MLST) of housekeeping genes. However, there are some limitations to MLST with respect to the generation of clone libraries and the Sanger sequencing method when a host is infected with multiple STs of Wolbachia. To assess the feasibility of massive parallel sequencing, also known as next-generation sequencing, we used pyrosequencing for sequence typing of Wolbachia in butterflies. We collected three species of butterflies (Eurema hecabe, Eurema laeta, and Tongeia fischeri) common to Korea and screened them for Wolbachia STs. We found that T. fischeri was infected with a single ST of Wolbachia, ST41. In contrast, E. hecabe and E. laeta were each infected with two STs of Wolbachia, ST41 and ST40. Our results clearly demonstrate that pyrosequencing-based MLST has a higher sensitivity than cloning and Sanger sequencing methods for the detection of minor alleles. Considering the high prevalence of infection with multiple Wolbachia STs, next-generation sequencing with improved analysis would assist with scaling up approaches to Wolbachia MLST.

  9. Draft Genome Sequences of Elizabethkingia meningoseptica

    PubMed Central

    Matyi, Stephanie A.; Hoyt, Peter R.; Hosoyama, Akira; Yamazoe, Atsushi; Fujita, Nobuyuki

    2013-01-01

    Elizabethkingia meningoseptica is ubiquitous in nature, exhibits a multiple-antibiotic resistance phenotype, and causes rare opportunistic infections. We now report two draft genome sequences of E. meningoseptica type strains that were sequenced independently in two laboratories. PMID:23846266

  10. Molecular selection in a unified evolutionary sequence

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1986-01-01

    With guidance from experiments and observations that indicate internally limited phenomena, an outline of unified evolutionary sequence is inferred. Such unification is not visible for a context of random matrix and random mutation. The sequence proceeds from Big Bang through prebiotic matter, protocells, through the evolving cell via molecular and natural selection, to mind, behavior, and society.

  11. Whole-Genome Sequencing: Manual Library Preparation.

    PubMed

    Mardis, Elaine; McCombie, W Richard

    2017-01-03

    This protocol describes a manual approach for the preparation of genomic DNA libraries suitable for Illumina sequencing. Genomic DNA fragments produced by shearing by sonication are ligated to adaptors and amplified by polymerase chain reaction (PCR). The amplified DNA, separated by size and gel-purified, is suitable for use as template in whole-genome sequencing.

  12. SEQUENCE IN LEARNING--FACT OR FICTION.

    ERIC Educational Resources Information Center

    MIEL, ALICE

    SEQUENCE IN LEARNING IS USEFUL ONLY AS IT CONTRIBUTES TO THE CONTINUITY OF A CHILD'S OVERALL DEVELOPMENT. CHILDREN MAY NOT GO THROUGH THE SAME SEQUENCE TO ARRIVE AT A SIMILAR POINT OF UNDERSTANDING. EDUCATIONAL PROGRESS IS INDICATED BY A CHILD'S GROWTH IN THE DEVELOPMENT OF STRATEGIC CONCEPTS, IN WAYS OF PROCESSING INFORMATION, AND IN WAYS OF…

  13. Parallel Computation of Multiple Biological Sequence Comparisons

    DTIC Science & Technology

    1989-07-01

    Stearothermophilus 408 Bacillus Megaterium 411 Bacillus Brevis 354 Pseudomonas Fluorescens 375 Salmonella Typhi 377 Escherichia Coli 282 Saccharomyces Octosporus...This included implied secondary structure and conservation of pairs of nucleotides that are complementary. The first four sequences are all Bacillus ...need to obtain sequences of ribonuclease P RNA from additional species to provide a more 13 Length Name 401 Bacillus Subtilis 417 Bacillus

  14. Understanding Sequence: Electrical Instrumentation, Millwright, Pipefitter.

    ERIC Educational Resources Information Center

    Atkinson, Rhonda; And Others

    Developed as part of the ABCs of Construction National Workplace Literacy Project, this instructional module contains instructional materials designed to help students understand the concept of sequencing and develop basic sequencing skills. The module begins with a unit in which instructional materials dealing with the construction industry are…

  15. Regular Pentagons and the Fibonacci Sequence.

    ERIC Educational Resources Information Center

    French, Doug

    1989-01-01

    Illustrates how to draw a regular pentagon. Shows the sequence of a succession of regular pentagons formed by extending the sides. Calculates the general formula of the Lucas and Fibonacci sequences. Presents a regular icosahedron as an example of the golden ratio. (YP)
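
    The abstract does not reproduce the article's derivation, so the following is only a sketch of the standard closed form (Binet's formula) for the two sequences, checked against the defining recurrence. With phi = (1 + sqrt(5))/2 and psi = (1 - sqrt(5))/2, F_n = (phi^n - psi^n)/sqrt(5) and L_n = phi^n + psi^n. The function names are illustrative.

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2  # golden ratio
PSI = (1 - sqrt(5)) / 2  # its conjugate root of x^2 = x + 1


def fib_closed(n: int) -> int:
    """Fibonacci number F_n from the closed form (floating point, small n)."""
    return round((PHI**n - PSI**n) / sqrt(5))


def lucas_closed(n: int) -> int:
    """Lucas number L_n from the closed form (floating point, small n)."""
    return round(PHI**n + PSI**n)


def by_recurrence(a: int, b: int, n: int) -> int:
    """n-th term of x_{k+2} = x_{k+1} + x_k starting from x_0 = a, x_1 = b."""
    for _ in range(n):
        a, b = b, a + b
    return a


print([fib_closed(n) for n in range(10)])
print([lucas_closed(n) for n in range(10)])
```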

  16. On Generalized Difference Hahn Sequence Spaces

    PubMed Central

    Raj, Kuldip; Kiliçman, Adem

    2014-01-01

    We construct some generalized difference Hahn sequence spaces by means of a sequence of modulus functions. The topological properties and some inclusion relations of spaces h p(ℱ, u, Δr) are investigated. Also we compute the dual of these spaces, and some matrix transformations are characterized. PMID:25025085

  17. Genome Sequence of Lactobacillus rhamnosus ATCC 8530

    PubMed Central

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R.

    2012-01-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences. PMID:22247527

  18. Multiplex De Novo Sequencing of Peptide Antibiotics

    NASA Astrophysics Data System (ADS)

    Mohimani, Hosein; Liu, Wei-Ting; Yang, Yu-Liang; Gaudêncio, Susana P.; Fenical, William; Dorrestein, Pieter C.; Pevzner, Pavel A.

    Proliferation of drug-resistant diseases raises the challenge of searching for new, more efficient antibiotics. Currently, some of the most effective antibiotics (e.g., Vancomycin and Daptomycin) are cyclic peptides produced by non-ribosomal biosynthetic pathways. The isolation and sequencing of cyclic peptide antibiotics, unlike the same activity with linear peptides, is time-consuming and error-prone. The dominant technique for sequencing cyclic peptides is NMR-based and requires large amounts (milligrams) of purified materials that, for most compounds, are not possible to obtain. Given these facts, there is a need for new tools to sequence cyclic NRPs using picograms of material. Since nearly all cyclic NRPs are produced along with related analogs, we develop a mass spectrometry approach for sequencing all related peptides at once (in contrast to the existing approach that analyzes individual peptides). Our results suggest that instead of attempting to isolate and NMR-sequence the most abundant compound, one should acquire spectra of many related compounds and sequence all of them simultaneously using tandem mass spectrometry. We illustrate applications of this approach by sequencing new variants of cyclic peptide antibiotics from Bacillus brevis, as well as sequencing a previously unknown family of cyclic NRPs produced by marine bacteria.

  19. Towards a reference pecan genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cost of generating DNA sequence data has declined dramatically over the previous 15 years as a result of the Human Genome Project and the potential applications of genome sequencing for human medicine. This cost reduction has generated renewed interest among crop breeding scientists in applying...

  20. Convergence of a Linear Recursive Sequence

    ERIC Educational Resources Information Center

    Tay, E. G.; Toh, T. L.; Dong, F. M.; Lee, T. Y.

    2004-01-01

    A necessary and sufficient condition is found for a linear recursive sequence to be convergent, no matter what initial values are given. Its limit is also obtained when the sequence is convergent. Methods from various areas of mathematics are used to obtain the results.
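
    The abstract does not state the condition itself, so the sketch below only illustrates the phenomenon on one example chosen here: x_{n+2} = (x_{n+1} + x_n)/2. Its characteristic equation r^2 = r/2 + 1/2 has roots 1 and -1/2, so x_n = A + B(-1/2)^n and the sequence converges to A = (x_0 + 2 x_1)/3 for any initial values. The function name is illustrative.

```python
def iterate_mean_recurrence(x0: float, x1: float, steps: int) -> float:
    """Apply x_{n+2} = (x_{n+1} + x_n) / 2 for `steps` steps; return the last term."""
    prev, cur = x0, x1
    for _ in range(steps):
        prev, cur = cur, (cur + prev) / 2
    return cur


x0, x1 = 1.0, 4.0
predicted_limit = (x0 + 2 * x1) / 3  # from x_n = A + B * (-1/2)**n
print(iterate_mean_recurrence(x0, x1, 80), predicted_limit)
```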

  1. Optimizing cancer genome sequencing and analysis

    PubMed Central

    Griffith, Malachi; Miller, Christopher A.; Griffith, Obi L.; Krysiak, Kilannin; Skidmore, Zachary L.; Ramu, Avinash; Walker, Jason R.; Dang, Ha X.; Trani, Lee; Larson, David E.; Demeter, Ryan T.; Wendl, Michael C.; McMichael, Joshua F.; Austin, Rachel E.; Magrini, Vincent; McGrath, Sean D.; Ly, Amy; Kulkarni, Shashikant; Cordes, Matthew G.; Fronick, Catrina C.; Fulton, Robert S.; Maher, Christopher A.; Ding, Li; Klco, Jeffery M.; Mardis, Elaine R.; Ley, Timothy J.; Wilson, Richard K.

    2015-01-01

    Summary Tumors are typically sequenced to depths of 75–100× (exome) or 30–50× (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312×) whole genome sequencing (WGS) and exome capture (up to ~433×) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ~200,000 putative SNVs by sequencing them to depths of ~1,000×. Additional targeted sequencing provided over 10,000× coverage and ddPCR assays provided up to ~250,000× sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP accession id phs000159). PMID:26645048
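
    One way to see why conventional depths can fall short for impure or clonally heterogeneous tumors is a simple binomial model: at depth d, a variant present at allele fraction f is detected only if enough reads carry it. The sketch below is an idealized illustration and not the authors' analysis. It ignores sequencing error and mapping bias, and the function name, the 2% allele fraction, and the three-read threshold are assumptions.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_reads: int) -> float:
    """P(at least min_reads variant reads) under a Binomial(depth, vaf) model."""
    p_fewer = sum(comb(depth, i) * vaf**i * (1 - vaf)**(depth - i)
                  for i in range(min_reads))
    return 1.0 - p_fewer


# Probability of seeing >= 3 supporting reads for a variant at 2% allele fraction.
for depth in (30, 100, 300, 1000):
    print(depth, round(detection_probability(depth, 0.02, 3), 3))
```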

  2. Marketing and Distributive Education: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 2-year program in marketing and distributive education. The guide consists of a course description;…

  3. Program Helps To Optimize Assembly Sequences

    NASA Technical Reports Server (NTRS)

    Borden, Chester S.; Werntz, David G.; Loyola, Steven J.

    1992-01-01

    FAST project-management software tool designed to optimize sequence of assembly of Space Station Freedom. Assesses effects of detailed changes upon system and produces output metrics identifying preferred assembly sequences. Incorporates Space-Shuttle integration, Space-Station hardware, on-orbit operations, and governing programmatic considerations as either precedence relations or numerical data. Written in C language.

  4. Learning of Sensory Sequences in Cerebellar Patients

    ERIC Educational Resources Information Center

    Frings, Markus; Boenisch, Raoul; Gerwig, Marcus; Diener, Hans-Christoph; Timmann, Dagmar

    2004-01-01

    A possible role of the cerebellum in detecting and recognizing event sequences has been proposed. The present study sought to determine whether patients with cerebellar lesions are impaired in the acquisition and discrimination of sequences of sensory stimuli of different modalities. A group of 26 cerebellar patients and 26 controls matched for…

  5. Archaebacterial rhodopsin sequences: Implications for evolution

    NASA Technical Reports Server (NTRS)

    Lanyi, J. K.

    1991-01-01

    It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, three protein sequences show extensive similarities, indicating strong selective pressures.

  6. Complete Genome Sequencing of Trivittatus virus

    PubMed Central

    Groseth, Allison; Vine, Veronica; Weisend, Carla; Ebihara, Hideki

    2015-01-01

    Trivittatus virus (family Bunyaviridae, genus Orthobunyavirus) represents an important genetic intermediate between the California encephalitis group and the Bwamba/Pongola and Nyando groups. Here, we report the first complete genome sequence of the prototype (Eklund) strain, isolated in 1948, which interestingly shows only a few differences compared to partial sequences of modern strains. PMID:26212363

  7. Multilocus Sequence Typing Tool for Cyclospora cayetanensis

    PubMed Central

    Guo, Yaqiong; Roellig, Dawn M.; Li, Na; Tang, Kevin; Frace, Michael; Ortega, Ynes; Arrowood, Michael J.; Qvarnstrom, Yvonne; Wang, Lin; Moss, Delynn M.; Zhang, Longxian; Xiao, Lihua

    2016-01-01

    Because the lack of typing tools for Cyclospora cayetanensis has hampered outbreak investigations, we sequenced its genome and developed a genotyping tool. We observed 2 to 10 geographically segregated sequence types at each of 5 selected loci. This new tool could be useful for case linkage and infection/contamination source tracking. PMID:27433881

  8. Sequencing for the cream of the crop

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In this invited commentary, we discuss how next-generation sequencing methods are beginning to find their way into plant genetics, promising substantial improvements in crop yields over the coming decades. Next-generation sequencing facilitates the construction of high-resolution variation maps, whi...

  9. Complete Genome Sequence of Lleida Bat Lyssavirus

    PubMed Central

    Marston, Denise A.; Ellis, Richard J.; Wise, Emma L.; Aréchiga-Ceballos, Nidia; Freuling, Conrad M.; Banyard, Ashley C.; McElhinney, Lorraine M.; de Lamballerie, Xavier; Müller, Thomas; Echevarría, Juan E.

    2017-01-01

    All lyssaviruses (family Rhabdoviridae) cause the disease rabies, an acute progressive encephalitis for which, once symptoms occur, there is no effective cure. Using next-generation sequencing, the full-genome sequence for a novel lyssavirus, Lleida bat lyssavirus (LLEBV), from the original brain of a common bent-winged bat has been confirmed. PMID:28082487

  10. Sequencing Events: Exploring Art and Art Jobs.

    ERIC Educational Resources Information Center

    Stephens, Pamela Geiger; Shaddix, Robin K.

    2000-01-01

    Presents an activity for upper-elementary students that correlates the actions of archaeologists, patrons, and artists with the sequencing of events in a logical order. Features ancient Egyptian art images. Discusses the preparation of materials, motivation, a pre-writing activity, and writing a story in sequence. (CMK)

  11. Using Conventional Sequences in L2 French

    ERIC Educational Resources Information Center

    Forsberg, Fanny

    2010-01-01

    By means of a phraseological identification method, this study provides a general description of the use of conventional sequences (CSs) in interviews at four different levels of spoken L2 French as well as in interviews with native speakers. Use of conventional sequences is studied with regard to overall quantity, category distribution and type…

  12. Sequence Factorial of "g"-Gonal Numbers

    ERIC Educational Resources Information Center

    Asiru, Muniru A.

    2013-01-01

    The gamma function, which has the property to interpolate the factorial whenever the argument is an integer, is a special case (the case "g" = 2) of the general term of the sequence factorial of "g"-gonal numbers. In relation to this special case, a formula for calculating the general term of the sequence factorial of any…
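
    The special case mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard closed form for the n-th g-gonal number, P_g(n) = ((g - 2)n^2 - (g - 4)n)/2, and takes "sequence factorial" to mean the product of the first n terms of the sequence. The function names are hypothetical, and the paper's general (interpolating) formula, which is truncated above, is not reproduced here.

```python
from math import factorial, prod


def gonal(g: int, n: int) -> int:
    """n-th g-gonal number via the standard closed form (an assumption here)."""
    # (g-2)*n*n - (g-4)*n equals (g-2)*n*(n-1) + 2*n, which is always even.
    return ((g - 2) * n * n - (g - 4) * n) // 2


def sequence_factorial(g: int, n: int) -> int:
    """Product of the first n g-gonal numbers (the empty product is 1)."""
    return prod(gonal(g, k) for k in range(1, n + 1))


def matches_ordinary_factorial(limit: int) -> bool:
    """Check that g = 2 reproduces n! for n = 0 .. limit-1."""
    return all(sequence_factorial(2, n) == factorial(n) for n in range(limit))
```

    For g = 2 the k-th term is k itself, so `sequence_factorial(2, n)` is the ordinary factorial; g = 3 and g = 4 give the triangular and square numbers, respectively.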

  13. Some identities of generalized Fibonacci sequence

    NASA Astrophysics Data System (ADS)

    Chong, Chin-Yoon; Cheah, C. L.; Ho, C. K.

    2014-07-01

    We introduced the generalized Fibonacci sequence $\{U_n\}$ defined by $U_0 = 0$, $U_1 = 1$, and $U_{n+2} = pU_{n+1} + qU_n$ for all $p, q \in \mathbb{Z}^+$ and for all non-negative integers $n$. In this paper, we obtained some recursive formulas of the sequence.
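
    The recurrence in this abstract can be evaluated directly. The following is a minimal sketch; the function name and the input check are illustrative choices, not taken from the paper.

```python
from typing import List


def generalized_fibonacci(p: int, q: int, count: int) -> List[int]:
    """Return U_0 .. U_{count-1} for U_0 = 0, U_1 = 1, U_{n+2} = p*U_{n+1} + q*U_n."""
    if p < 1 or q < 1:
        raise ValueError("p and q must be positive integers")
    terms = [0, 1]
    while len(terms) < count:
        # Apply the two-term recurrence to the last two computed values.
        terms.append(p * terms[-1] + q * terms[-2])
    return terms[:count]
```

    With p = q = 1 this yields the ordinary Fibonacci numbers 0, 1, 1, 2, 3, 5, 8, 13, and with p = 2, q = 1 the Pell numbers 0, 1, 2, 5, 12, 29.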

  14. On the sum of generalized Fibonacci sequence

    NASA Astrophysics Data System (ADS)

    Chong, Chin-Yoon; Ho, C. K.

    2014-06-01

    We consider the generalized Fibonacci sequence $\{U_n\}$ defined by $U_0 = 0$, $U_1 = 1$, and $U_{n+2} = pU_{n+1} + qU_n$ for all $n \in \mathbb{Z}_0^+$ and $p, q \in \mathbb{Z}^+$. In this paper, we derived various sums of the generalized Fibonacci sequence from their recursive relations.
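
    As an illustration of how a sum can follow from the recursive relation, summing the recurrence over the first n indices gives $(p + q - 1)\sum_{k=0}^{n} U_k = U_{n+1} + qU_n - 1$. This identity is derived here for illustration and is not quoted from the paper, whose specific results are not listed in the abstract. The sketch below, with hypothetical function names, checks it numerically in integer arithmetic.

```python
from typing import List


def u_terms(p: int, q: int, count: int) -> List[int]:
    """U_0 .. U_{count-1} for U_0 = 0, U_1 = 1, U_{n+2} = p*U_{n+1} + q*U_n."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(p * terms[-1] + q * terms[-2])
    return terms[:count]


def sum_identity_holds(p: int, q: int, n: int) -> bool:
    """Check (p + q - 1) * (U_0 + ... + U_n) == U_{n+1} + q*U_n - 1."""
    u = u_terms(p, q, n + 2)  # need terms up to U_{n+1}
    return (p + q - 1) * sum(u[: n + 1]) == u[n + 1] + q * u[n] - 1
```

    For p = q = 1 the identity reduces to the familiar Fibonacci sum F_0 + ... + F_n = F_{n+2} - 1.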

  15. Clustering metagenomic sequences with interpolated Markov models

    PubMed Central

    2010-01-01

    Background Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects. Results We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and supervised learning method Phymm called PHYSCIMM that performs better when evolutionarily close training genomes are available. Conclusions SCIMM and PHYSCIMM are highly accurate methods to cluster metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHYSCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHYSCIMM are available open source from http://www.cbcb.umd.edu/software/scimm. PMID:21044341

  16. What's Next? Judging Sequences of Binary Events

    ERIC Educational Resources Information Center

    Oskarsson, An T.; Van Boven, Leaf; McClelland, Gary H.; Hastie, Reid

    2009-01-01

    The authors review research on judgments of random and nonrandom sequences involving binary events with a focus on studies documenting gambler's fallacy and hot hand beliefs. The domains of judgment include random devices, births, lotteries, sports performances, stock prices, and others. After discussing existing theories of sequence judgments,…

  17. A Statistical Approach for Ambiguous Sequence Mappings

    Technology Transfer Automated Retrieval System (TEKTRAN)

    When attempting to map RNA sequences to a reference genome, high percentages of short sequence reads are often assigned to multiple genomic locations. One approach to handling these “ambiguous mappings” has been to discard them. This results in a loss of data, which can sometimes be as much as 45% o...

  18. Nonspatial Sequence Coding in CA1 Neurons

    PubMed Central

    Allen, Timothy A.; Salz, Daniel M.; McKenzie, Sam

    2016-01-01

    The hippocampus is critical to the memory for sequences of events, a defining feature of episodic memory. However, the fundamental neuronal mechanisms underlying this capacity remain elusive. While considerable research indicates hippocampal neurons can represent sequences of locations, direct evidence of coding for the memory of sequential relationships among nonspatial events remains lacking. To address this important issue, we recorded neural activity in CA1 as rats performed a hippocampus-dependent sequence-memory task. Briefly, the task involves the presentation of repeated sequences of odors at a single port and requires rats to identify each item as “in sequence” or “out of sequence”. We report that, while the animals' location and behavior remained constant, hippocampal activity differed depending on the temporal context of items—in this case, whether they were presented in or out of sequence. Some neurons showed this effect across items or sequence positions (general sequence cells), while others exhibited selectivity for specific conjunctions of item and sequence position information (conjunctive sequence cells) or for specific probe types (probe-specific sequence cells). We also found that the temporal context of individual trials could be accurately decoded from the activity of neuronal ensembles, that sequence coding at the single-cell and ensemble level was linked to sequence memory performance, and that slow-gamma oscillations (20–40 Hz) were more strongly modulated by temporal context and performance than theta oscillations (4–12 Hz). These findings provide compelling evidence that sequence coding extends beyond the domain of spatial trajectories and is thus a fundamental function of the hippocampus. SIGNIFICANCE STATEMENT The ability to remember the order of life events depends on the hippocampus, but the underlying neural mechanisms remain poorly understood. Here we addressed this issue by recording neural activity in hippocampal

  19. Cancer genome-sequencing study design.

    PubMed

    Mwenifumbo, Jill C; Marra, Marco A

    2013-05-01

    Discoveries from cancer genome sequencing have the potential to translate into advances in cancer prevention, diagnostics, prognostics, treatment and basic biology. Given the diversity of downstream applications, cancer genome-sequencing studies need to be designed to best fulfil specific aims. Knowledge of second-generation cancer genome-sequencing study design also facilitates assessment of the validity and importance of the rapidly growing number of published studies. In this Review, we focus on the practical application of second-generation sequencing technology (also known as next-generation sequencing) to cancer genomics and discuss how aspects of study design and methodological considerations - such as the size and composition of the discovery cohort - can be tailored to serve specific research aims.

  20. Variant Calling From Next Generation Sequence Data.

    PubMed

    Hansen, Nancy F

    2016-01-01

    The use of next generation nucleotide sequencing to discover and genotype small sequence variants has led to numerous insights into the molecular causes of various diseases. This chapter describes the use of freely available software to align next generation sequencing reads to a reference and then to use the resulting alignments to call, annotate, view, and filter small sequence variants. The suggested variant calling workflow includes read alignment with novoalign, the removal of polymerase chain reaction duplicate sequences with samtools or bamUtils, and the detection of variants with Freebayes or bam2mpg software. ANNOVAR is then used to annotate the predicted variants using gene models, population frequencies, and predicted mutation severity, producing variant files which can be viewed and filtered with the variant display tool VarSifter.

  1. Efficient Graph Sequence Mining Using Reverse Search

    NASA Astrophysics Data System (ADS)

    Inokuchi, Akihiro; Ikuta, Hiroaki; Washio, Takashi

    The mining of frequent subgraphs from labeled graph data has been studied extensively. Furthermore, much attention has recently been paid to frequent pattern mining from graph sequences. A method, called GTRACE, has been proposed to mine frequent patterns from graph sequences under the assumption that changes in graphs are gradual. Although GTRACE mines the frequent patterns efficiently, it still needs substantial computation time to mine the patterns from graph sequences containing large graphs and long sequences. In this paper, we propose a new version of GTRACE that permits efficient mining of frequent patterns based on the principle of a reverse search. The underlying concept of the reverse search is a general scheme for designing efficient algorithms for hard enumeration problems. Our performance study shows that the proposed method is efficient and scalable for mining both long and large graph sequence patterns and is several orders of magnitude faster than the original GTRACE.

  2. Primer design for large scale sequencing.

    PubMed

    Haas, S; Vingron, M; Poustka, A; Wiemann, S

    1998-06-15

    We have developed PRIDE, a primer design program that automatically designs primers in single contigs or whole sequencing projects to extend the already known sequence and to double strand single-stranded regions. The program is fully integrated into the Staden package (GAP4) and accessible with a graphical user interface. PRIDE uses a fuzzy logic-based system to calculate primer qualities. The computational performance of PRIDE is enhanced by using suffix trees to store the huge amount of data being produced. A test set of 110 sequencing primers and 11 PCR primer pairs has been designed on genomic templates, cDNAs and sequences containing repetitive elements to analyze PRIDE's success rate. The high performance of PRIDE, combined with its minimal requirement of user interaction and its fast algorithm, makes this program useful for the large scale design of primers, especially in large sequencing projects.

  3. Disease gene identification strategies for exome sequencing

    PubMed Central

    Gilissen, Christian; Hoischen, Alexander; Brunner, Han G; Veltman, Joris A

    2012-01-01

    Next generation sequencing can be used to search for Mendelian disease genes in an unbiased manner by sequencing the entire protein-coding sequence, known as the exome, or even the entire human genome. Identifying the pathogenic mutation amongst thousands to millions of genomic variants is a major challenge, and novel variant prioritization strategies are required. The choice of these strategies depends on the availability of well-phenotyped patients and family members, the mode of inheritance, the severity of the disease and its population frequency. In this review, we discuss the current strategies for Mendelian disease gene identification by exome resequencing. We conclude that exome strategies are successful and identify new Mendelian disease genes in approximately 60% of the projects. Improvements in bioinformatics as well as in sequencing technology will likely increase the success rate even further. Exome sequencing is likely to become the most commonly used tool for Mendelian disease gene identification for the coming years. PMID:22258526

  4. EST processing: from trace to sequence.

    PubMed

    Schmid, Ralf; Blaxter, Mark

    2009-01-01

    A common task in EST projects is the conversion of sequence chromatograms originating from gel-based or capillary sequencers into annotated sequence objects. Here we describe the usage of a software pipeline (available from http://www.nematodes.org/bioinformatics/), which has been developed to make the most of EST datasets. This modular software solution is targeted toward small- to medium-sized EST projects and comprises a series of Perl scripts. The software design is based on our experience during EST projects for parasitic nematodes and other species. The trace2dbest module processes sequence trace files and prepares the text files necessary for the submission of the sequences to the public repository dbEST. PartiGene provides facilities for clustering and assembling the ESTs into putative gene objects or unigenes and organizes the data in a relational database. Additional tools are available for annotation and for making the data accessible via the World Wide Web.

  5. Alignment method for spectrograms of DNA sequences.

    PubMed

    Bucur, Anca; van Leeuwen, Jasper; Dimitrova, Nevenka; Mittal, Chetan

    2010-01-01

    DNA spectrograms express the periodicities of each of the four nucleotides A, T, C, and G in one or several genomic sequences to be analyzed. DNA spectral analysis can be applied to systematically investigate DNA patterns, which may correspond to relevant biological features. As opposed to looking at nucleotide sequences, spectrogram analysis may detect structural characteristics in very long sequences that are not identifiable by sequence alignment. Alignment of DNA spectrograms can be used to facilitate analysis of very long sequences or entire genomes at different resolutions. Standard clustering algorithms have been used in spectral analysis to find strong patterns in spectra. However, as they use a global distance metric, these algorithms can only detect strong patterns coexisting in several frequencies. In this paper, we propose a new method and several algorithms for aligning spectra suitable for efficient spectral analysis and allowing for the easy detection of strong patterns in both single frequencies and multiple frequencies.
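
    The idea of a per-nucleotide periodicity map can be sketched as follows. This assumes one common formulation (a binary indicator sequence per nucleotide, followed by a windowed discrete Fourier transform) and is not the authors' alignment method; the function names, window size, and step are illustrative.

```python
import cmath
from typing import Dict, List

NUCLEOTIDES = "ATCG"


def dft_power(signal: List[int], k: int) -> float:
    """Squared magnitude of the k-th DFT coefficient of a real sequence."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
    return abs(coeff) ** 2


def spectrogram(seq: str, window: int, step: int) -> List[Dict[str, List[float]]]:
    """For each window, the power at k = 1 .. window//2 for each nucleotide."""
    frames = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window].upper()
        frame = {}
        for base in NUCLEOTIDES:
            # Indicator sequence: 1 where this nucleotide occurs, else 0.
            indicator = [1 if c == base else 0 for c in chunk]
            frame[base] = [dft_power(indicator, k) for k in range(1, window // 2 + 1)]
        frames.append(frame)
    return frames
```

    For a sequence with exact period 3, such as "ATG" repeated, the power for A, T, and G concentrates at k = window/3, which is the kind of single-frequency pattern a spectrogram makes visible.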

  6. Genomic sequencing of Pleistocene cave bears.

    PubMed

    Noonan, James P; Hofreiter, Michael; Smith, Doug; Priest, James R; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J Chris; Pääbo, Svante; Rubin, Edward M

    2005-07-22

    Despite the greater information content of genomic DNA, ancient DNA studies have largely been limited to the amplification of mitochondrial sequences. Here we describe metagenomic libraries constructed with unamplified DNA extracted from skeletal remains of two 40,000-year-old extinct cave bears. Analysis of approximately 1 megabase of sequence from each library showed that despite significant microbial contamination, 5.8 and 1.1% of clones contained cave bear inserts, yielding 26,861 base pairs of cave bear genome sequence. Comparison of cave bear and modern bear sequences revealed the evolutionary relationship of these lineages. The metagenomic approach used here establishes the feasibility of ancient DNA genome sequencing programs.

  7. Sequencing and comparing whole mitochondrial genomes of animals

    SciTech Connect

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  8. Snake Genome Sequencing: Results and Future Prospects

    PubMed Central

    Kerkkamp, Harald M. I.; Kini, R. Manjunatha; Pospelov, Alexey S.; Vonk, Freek J.; Henkel, Christiaan V.; Richardson, Michael K.

    2016-01-01

    Snake genome sequencing is in its infancy—very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression. PMID:27916957

  9. Optimal assembly for high throughput shotgun sequencing

    PubMed Central

    2013-01-01

    We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier works, we design a de Bruijn graph-based assembly algorithm which can achieve very close to the lower bound for repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization. PMID:23902516
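
    To make the link between repeats and reconstruction concrete, here is a minimal de Bruijn graph sketch. It is a textbook-style illustration under simplifying assumptions (error-free reads, a single strand, a known start node) and is not the assembly algorithm of the paper; all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph: Dict[str, List[str]] = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:  # each distinct k-mer contributes one edge
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def walk_unambiguous(graph: Dict[str, List[str]], start: str) -> str:
    """Extend from `start` while the current node has exactly one successor."""
    sequence, node, visited = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in visited:  # stop on a cycle rather than loop forever
            break
        visited.add(node)
        sequence += node[-1]
    return sequence
```

    With the reads "ATGGCG", "GGCGTG", and "CGTGCA", k = 4 reconstructs "ATGGCGTGCA" from the start node "ATG", whereas k = 3 stops at "ATG" because the repeated 2-mer "TG" has two possible successors. This is the sense in which repeat structure constrains the k-mer (and hence read) length needed for reconstruction.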

  10. Primer design for large scale sequencing.

    PubMed Central

    Haas, S; Vingron, M; Poustka, A; Wiemann, S

    1998-01-01

    We have developed PRIDE, a primer design program that automatically designs primers in single contigs or whole sequencing projects to extend the already known sequence and to double strand single-stranded regions. The program is fully integrated into the Staden package (GAP4) and accessible with a graphical user interface. PRIDE uses a fuzzy logic-based system to calculate primer qualities. The computational performance of PRIDE is enhanced by using suffix trees to store the huge amount of data being produced. A test set of 110 sequencing primers and 11 PCR primer pairs has been designed on genomic templates, cDNAs and sequences containing repetitive elements to analyze PRIDE's success rate. The high performance of PRIDE, combined with its minimal requirement of user interaction and its fast algorithm, makes this program useful for the large scale design of primers, especially in large sequencing projects. PMID:9611248

  11. Strategies for complete plastid genome sequencing.

    PubMed

    Twyford, Alex D; Ness, Rob W

    2016-10-28

    Plastid sequencing is an essential tool in the study of plant evolution. This high-copy organelle is one of the most technically accessible regions of the genome, and its sequence conservation makes it a valuable region for comparative genome evolution, phylogenetic analysis and population studies. Here, we discuss recent innovations and approaches for de novo plastid assembly that harness genomic tools. We focus on technical developments including low-cost sequence library preparation approaches for genome skimming, enrichment via hybrid baits and methylation-sensitive capture, sequence platforms with higher read outputs and longer read lengths, and automated tools for assembly. These developments allow for a much more streamlined assembly than via conventional short-range PCR. Although newer methods make complete plastid sequencing possible for any land plant or green alga, there are still challenges for producing finished plastomes particularly from herbarium material or from structurally divergent plastids such as those of parasitic plants.

  12. Nanopore sequencing detects structural variants in cancer.

    PubMed

    Norris, Alexis L; Workman, Rachael E; Fan, Yunfan; Eshleman, James R; Timp, Winston

    2016-01-01

    Despite advances in sequencing, structural variants (SVs) remain difficult to reliably detect due to the short read length (<300 bp) of 2nd generation sequencing. Not only do the reads (or paired-end reads) need to straddle a breakpoint, but repetitive elements often lead to ambiguities in the alignment of short reads. We propose to use the long reads (up to 20 kb) possible with 3rd generation sequencing, specifically nanopore sequencing on the MinION. Nanopore sequencing relies on a similar concept to a Coulter counter, reading the DNA sequence from the change in electrical current resulting from a DNA strand being forced through a nanometer-sized pore embedded in a membrane. Though nanopore sequencing currently has a relatively high mismatch rate that precludes base substitution and small frameshift mutation detection, its accuracy is sufficient for SV detection because of its long reads. In fact, long reads in some cases may improve SV detection efficiency. We have tested nanopore sequencing to detect a series of well-characterized SVs, including large deletions, inversions, and translocations that inactivate the CDKN2A/p16 and SMAD4/DPC4 tumor suppressor genes in pancreatic cancer. Using PCR amplicon mixes, we have demonstrated that nanopore sequencing can detect large deletions, translocations and inversions at dilutions as low as 1:100, with as few as 500 reads per sample. Given the speed, small footprint, and low capital cost, nanopore sequencing could become the ideal tool for the low-level detection of cancer-associated SVs needed for molecular relapse, early detection, or therapeutic monitoring.

  13. Non-linear sequencing and cognizant failure

    NASA Astrophysics Data System (ADS)

    Gat, Erann

    1999-01-01

    Spacecraft are traditionally commanded using linear sequences of time-based commands. Linear sequences work fairly well, but they are difficult and expensive to generate, and are usually not capable of responding to contingencies. Any anomalous behavior while executing a linear sequence generally results in the spacecraft entering a safe mode. Critical sequences like orbit insertions which must be able to respond to faults without going into safe mode are particularly difficult to design and verify. The effort needed to generate command sequences can be reduced by extending the vocabulary of sequences to include more sophisticated control constructs. The simplest extensions are conditionals and loops. Adding these constructs would make a sequencing language look more or less like a traditional programming language or scripting language, and would come with all the difficulties associated with such a language. In particular, verifying the correctness of a sequence would be tantamount to verifying the correctness of a program, which is undecidable in general. We describe an extended vocabulary for non-linear sequencing based on the architectural notion of cognizant failure. A cognizant failure architecture is divided into components whose contract is to either achieve (or maintain) a certain condition, or report that they have failed to do so. Cognizant failure is an easier condition to verify than correctness, but it can provide high confidence in the safety of the spacecraft. Because cognizant failure inherently implies some kind of representation of the intent of an action, the system can respond to contingencies in more robust and general ways. We will describe an implemented non-linear sequencing system that is being flown on the NASA New Millennium Deep Space 1 Mission as part of the Remote Agent Experiment.
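
    The contract described in this abstract (achieve a condition, or report failure to do so) can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the Remote Agent implementation: the class and function names are hypothetical, and the fallback policy (try alternatives in order) is just one possible response to a reported failure.

```python
from dataclasses import dataclass
from typing import Callable, List


class CognizantFailure(Exception):
    """Raised when a component knows it did not achieve its condition."""


@dataclass
class Component:
    name: str
    action: Callable[[], None]
    achieved: Callable[[], bool]  # checks the intended condition afterwards

    def run(self) -> None:
        self.action()
        if not self.achieved():
            # The component cannot guarantee success, but it can guarantee
            # that failure is detected and reported.
            raise CognizantFailure(self.name)


def achieve(primary: Component, fallbacks: List[Component]) -> str:
    """Try the primary method, then each fallback; report failure if all fail."""
    for component in [primary, *fallbacks]:
        try:
            component.run()
            return component.name
        except CognizantFailure:
            continue
    raise CognizantFailure("all methods failed")
```

    Because each component carries an explicit check of its intended condition, the caller can respond to a reported failure by trying another method instead of dropping into a safe mode, and the property to verify is only that failures are always reported.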

  14. Reading biological processes from nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Murugan, Anand

    Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. 
    Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical

  15. De novo sequencing of unique sequence tags for discovery of post-translational modifications of proteins

    SciTech Connect

    Shen, Yufeng; Tolic, Nikola; Hixson, Kim K.; Purvine, Samuel O.; Anderson, Gordon A.; Smith, Richard D.

    2008-10-15

    De novo sequencing holds promise for discovering protein post-translational modifications; however, such approaches are still in their infancy and are not widely applied in proteomics practice because of their limited reliability. In this work, we describe a de novo sequencing approach for the discovery of protein modifications through identification of UStags (Anal. Chem. 2008, 80, 1871-1882). The de novo information was obtained from Fourier-transform tandem mass spectrometry for peptides and polypeptides in a yeast lysate, and the de novo sequences obtained were filtered to define a more limited set of UStags. The DNA-predicted database protein sequences were then compared to the UStags, and the differences observed across or in the UStags (i.e., the UStags’ prefix and suffix sequences and the UStags themselves) were used to infer the possible sequence modifications. With this de novo-UStag approach, we uncovered some unexpected variances of yeast protein sequences due to amino acid mutations and/or multiple modifications to the predicted protein sequences. Random matching of the de novo sequences to the predicted sequences was examined using two random (false) databases, and false discovery rates of ~3% were estimated for the de novo-UStag approach. The factors affecting the reliability (e.g., existence of de novo sequencing noise residues and redundant sequences) and the sensitivity are described. The de novo-UStag approach complements the UStag method previously reported by enabling discovery of new protein modifications.

  16. De novo assembly of a bell pepper endornavirus genome sequence using RNA sequencing data.

    PubMed

    Jo, Yeonhwa; Choi, Hoseng; Cho, Won Kyong

    2015-03-19

    The genus Endornavirus is a double-stranded RNA virus that infects a wide range of hosts. In this study, we report on the de novo assembly of a bell pepper endornavirus genome sequence by RNA sequencing (RNA-Seq). Our result demonstrates the successful application of RNA-Seq to obtain a complete viral genome sequence from the transcriptome data.

  17. Complete genome sequence of southern tomato virus identified from China using next generation sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China was elucidated using small RNA deep sequencing. The identified STV_CN12 shares 99% sequence identity with other isolates from Mexico, France, Spain, and the U.S. This is the first report ...

  18. Microfluidic droplet enrichment for targeted sequencing

    PubMed Central

    Eastburn, Dennis J.; Huang, Yong; Pellegrino, Maurizio; Sciambi, Adam; Ptáček, Louis J.; Abate, Adam R.

    2015-01-01

    Targeted sequence enrichment enables better identification of genetic variation by providing increased sequencing coverage for genomic regions of interest. Here, we report the development of a new target enrichment technology that is highly differentiated from other approaches currently in use. Our method, MESA (Microfluidic droplet Enrichment for Sequence Analysis), isolates genomic DNA fragments in microfluidic droplets and performs TaqMan PCR reactions to identify droplets containing a desired target sequence. The TaqMan positive droplets are subsequently recovered via dielectrophoretic sorting, and the TaqMan amplicons are removed enzymatically prior to sequencing. We demonstrated the utility of this approach by generating an average 31.6-fold sequence enrichment across 250 kb of targeted genomic DNA from five unique genomic loci. Significantly, this enrichment enabled a more comprehensive identification of genetic polymorphisms within the targeted loci. MESA requires low amounts of input DNA, minimal prior locus sequence information and enriches the target region without PCR bias or artifacts. These features make it well suited for the study of genetic variation in a number of research and diagnostic applications. PMID:25873629

  19. Fungal genome sequencing: basic biology to biotechnology.

    PubMed

    Sharma, Krishna Kant

    2016-08-01

    The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast, Saccharomyces cerevisiae, with a small genome size, unicellular growth, and rich history of genetic and molecular analyses was a milestone of early genomics in the 1990s. The subsequent completion of fission yeast, Schizosaccharomyces pombe and genetic model, Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. In due course of time, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research into medical science, agriculture science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics have higher potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides the road map for basic and applied research.

  20. CATEGORIZATION OF EVENT SEQUENCES FOR LICENSE APPLICATION

    SciTech Connect

    G.E. Ragan; P. Mecheret; D. Dexheimer

    2005-04-14

    The purposes of this analysis are: (1) Categorize (as Category 1, Category 2, or Beyond Category 2) internal event sequences that may occur before permanent closure of the repository at Yucca Mountain. (2) Categorize external event sequences that may occur before permanent closure of the repository at Yucca Mountain. This includes examining DBGM-1 seismic classifications and upgrading to DBGM-2, if appropriate, to ensure Beyond Category 2 categorization. (3) State the design and operational requirements that are invoked to make the categorization assignments valid. (4) Indicate the amount of material put at risk by Category 1 and Category 2 event sequences. (5) Estimate frequencies of Category 1 event sequences at the maximum capacity and receipt rate of the repository. (6) Distinguish occurrences associated with normal operations from event sequences. It is beyond the scope of the analysis to propose design requirements that may be required to control radiological exposure associated with normal operations. (7) Provide a convenient compilation of the results of the analysis in tabular form. The results of this analysis are used as inputs to the consequence analyses in an iterative design process that is depicted in Figure 1. Categorization of event sequences for permanent retrieval of waste from the repository is beyond the scope of this analysis. Cleanup activities that take place after an event sequence and other responses to abnormal events are also beyond the scope of the analysis.

  1. Value of a newly sequenced bacterial genome

    PubMed Central

    Barbosa, Eudes GV; Aburjaile, Flavia F; Ramos, Rommel TJ; Carneiro, Adriana R; Le Loir, Yves; Baumbach, Jan; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

    2014-01-01

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the “scientific value” of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information. PMID:24921006

  2. Gelada vocal sequences follow Menzerath's linguistic law.

    PubMed

    Gustison, Morgan L; Semple, Stuart; Ferrer-I-Cancho, Ramon; Bergman, Thore J

    2016-05-10

    Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath's law is a linguistic law that states that the larger the construct, the smaller the size of its constituents. Here, we present what is, to our knowledge, the first evidence that Menzerath's law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size and most call types are abbreviated in larger sequences. We also find that intercall intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath's law reflects compression, the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language.

  3. Identification of ancient remains through genomic sequencing

    PubMed Central

    Blow, Matthew J.; Zhang, Tao; Woyke, Tanja; Speller, Camilla F.; Krivoshapkin, Andrei; Yang, Dongya Y.; Derevianko, Anatoly; Rubin, Edward M.

    2008-01-01

    Studies of ancient DNA have been hindered by the preciousness of remains, the small quantities of undamaged DNA accessible, and the limitations associated with conventional PCR amplification. In these studies, we developed and applied a genomewide adapter-mediated emulsion PCR amplification protocol for ancient mammalian samples estimated to be between 45,000 and 69,000 yr old. Using 454 Life Sciences (Roche) and Illumina sequencing (formerly Solexa sequencing) technologies, we examined over 100 megabases of DNA from amplified extracts, revealing unbiased sequence coverage with substantial amounts of nonredundant nuclear sequences from the sample sources and negligible levels of human contamination. We consistently recorded over 500-fold increases, such that nanogram quantities of starting material could be amplified to microgram quantities. Application of our protocol to a 50,000-yr-old uncharacterized bone sample that was unsuccessful in mitochondrial PCR provided sufficient nuclear sequences for comparison with extant mammals and subsequent phylogenetic classification of the remains. The combined use of emulsion PCR amplification and high-throughput sequencing allows for the generation of large quantities of DNA sequence data from ancient remains. Using such techniques, even small amounts of ancient remains with low levels of endogenous DNA preservation may yield substantial quantities of nuclear DNA, enabling novel applications of ancient DNA genomics to the investigation of extinct phyla. PMID:18426903

  4. From deep sequencing to actual clones.

    PubMed

    D'Angelo, Sara; Kumar, Sandeep; Naranjo, Leslie; Ferrara, Fortunato; Kiss, Csaba; Bradbury, Andrew R M

    2014-10-01

    The application of deep sequencing to in vitro display technologies has been invaluable for the straightforward analysis of enriched clones. After sequencing in vitro selected populations, clones are binned into identical or similar groups and ordered by abundance, allowing identification of those that are most enriched. However, the greatest strength of deep sequencing is also its greatest weakness: clones are easily identified by their DNA sequences, but are not physically available for testing without a laborious multistep process involving several rounds of polymerase chain reaction (PCR), assembly and cloning. Here, using the isolation of antibody genes from a phage and yeast display selection as an example, we show the power of a rapid and simple inverse PCR-based method to easily isolate clones identified by deep sequencing. Once primers have been received, clone isolation can be carried out in a single day, rather than two days. Furthermore, the reduced number of PCRs required will reduce PCR mutations correspondingly. We have observed a 100% success rate in amplifying clones with an abundance as low as 0.5% in a polyclonal population. This approach allows us to obtain full-length clones even when an incomplete sequence is available, and greatly simplifies the subcloning process. Moreover, rarer but functional clones missed by traditional screening can be easily isolated using this method, and the approach can be extended to any selected library (scFv, cDNA, libraries based on scaffold proteins) where a unique sequence signature for the desired clones of interest is available.

  5. Exploration of Noncoding Sequences in Metagenomes

    PubMed Central

    Tobar-Tosse, Fabián; Rodríguez, Adrián C.; Vélez, Patricia E.; Zambrano, María M.; Moreno, Pedro A.

    2013-01-01

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories are the most common methods for association between functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlate with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment. PMID:23536879
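
    Two of the sequence patterns named above, (G+C) content and Trinucleotide Usage, can be illustrated with a minimal Python sketch. The function names, the use of overlapping three-base windows, and the skipping of ambiguous bases are assumptions made for illustration; the record does not state how the authors computed these values.

```python
from collections import Counter

BASES = set("ACGT")

def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    counted = [b for b in seq.upper() if b in BASES]
    if not counted:
        return 0.0
    return sum(b in "GC" for b in counted) / len(counted)

def trinucleotide_usage(seq: str) -> dict:
    """Relative frequency of each overlapping trinucleotide.

    Assumption: windows slide one base at a time, and any window
    containing an ambiguous base (e.g. N) is skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= BASES
    )
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}
```

    Because neither function depends on ORF prediction, the same calls can be applied to coding and noncoding regions alike, which is the kind of comparison the abstract describes.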

  6. Robust temporal alignment of multimodal cardiac sequences

    NASA Astrophysics Data System (ADS)

    Perissinotto, Andrea; Queirós, Sandro; Morais, Pedro; Baptista, Maria J.; Monaghan, Mark; Rodrigues, Nuno F.; D'hooge, Jan; Vilaça, João L.; Barbosa, Daniel

    2015-03-01

    Given the dynamic nature of cardiac function, correct temporal alignment of pre-operative models and intraoperative images is crucial for augmented reality in cardiac image-guided interventions. As such, the current study focuses on the development of an image-based strategy for temporal alignment of multimodal cardiac imaging sequences, such as cine Magnetic Resonance Imaging (MRI) or 3D Ultrasound (US). First, we derive a robust, modality-independent signal from the image sequences, estimated by computing the normalized cross-correlation between each frame in the temporal sequence and the end-diastolic frame. This signal is a surrogate for the left-ventricle (LV) volume curve over time, whose variation indicates different temporal landmarks of the cardiac cycle. We then perform the temporal alignment of these surrogate signals derived from MRI and US sequences of the same patient through Dynamic Time Warping (DTW), allowing both sequences to be synchronized. The proposed framework was evaluated in 98 patients who had undergone both 3D+t MRI and US scans. The end-systolic frame could be accurately estimated as the minimum of the image-derived surrogate signal, presenting a relative error of 1.6 +/- 1.9% and 4.0 +/- 4.2% for the MRI and US sequences, respectively, thus supporting its association with key temporal instants of the cardiac cycle. The use of DTW reduces the desynchronization of the cardiac events in MRI and US sequences, allowing multimodal cardiac imaging sequences to be temporally aligned. Overall, a generic, fast and accurate method for temporal synchronization of MRI and US sequences of the same patient was introduced. This approach could be straightforwardly used for the correct temporal alignment of pre-operative MRI information and intra-operative US images.
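
    The two steps described above (a normalized cross-correlation signal against the end-diastolic frame, then Dynamic Time Warping between the two signals) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: frames are assumed to be flattened lists of pixel intensities, the end-diastolic frame is assumed to sit at index 0, and the DTW is the textbook dynamic-programming form with an absolute-difference cost.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equally sized, flattened frames."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0

def surrogate_signal(frames, ref_index=0):
    """NCC of every frame against the reference (assumed end-diastolic) frame."""
    ref = frames[ref_index]
    return [ncc(frame, ref) for frame in frames]

def end_systolic_index(signal):
    """The abstract estimates end-systole as the minimum of the signal."""
    return min(range(len(signal)), key=signal.__getitem__)

def dtw(s, t):
    """Textbook DTW; returns (total cost, list of aligned index pairs)."""
    n, m = len(s), len(t)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(s[i - 1] - t[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which samples were matched.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path
```

    In use, one would compute `surrogate_signal` separately for the MRI and US sequences and pass the two signals to `dtw`; each pair in the returned path maps a frame of one sequence to a frame of the other.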

  7. Robot Sequencing and Visualization Program (RSVP)

    NASA Technical Reports Server (NTRS)

    Cooper, Brian K.; Maxwell, Scott A.; Hartman, Frank R.; Wright, John R.; Yen, Jeng; Toole, Nicholas T.; Gorjian, Zareh; Morrison, Jack C

    2013-01-01

    The Robot Sequencing and Visualization Program (RSVP) is being used in the Mars Science Laboratory (MSL) mission for downlink data visualization and command sequence generation. RSVP reads and writes downlink data products from the operations data server (ODS) and writes uplink data products to the ODS. The primary users of RSVP are members of the Rover Planner team (part of the Integrated Planning and Execution Team (IPE)), who use it to perform traversability/articulation analyses, take activity plan input from the Science and Mission Planning teams, and create a set of rover sequences to be sent to the rover every sol. The primary inputs to RSVP are downlink data products and activity plans in the ODS database. The primary outputs are command sequences to be placed in the ODS for further processing prior to uplink to each rover. RSVP is composed of two main subsystems. The first, called the Robot Sequence Editor (RoSE), understands the MSL activity and command dictionaries and takes care of converting incoming activity level inputs into command sequences. The Rover Planners use the RoSE component of RSVP to put together command sequences and to view and manage command level resources like time, power, temperature, etc. (via a transparent realtime connection to SEQGEN). The second component of RSVP is called HyperDrive, a set of high-fidelity computer graphics displays of the Martian surface in 3D and in stereo. The Rover Planners can explore the environment around the rover, create commands related to motion of all kinds, and see the simulated result of those commands via its underlying tight coupling with flight navigation, motor, and arm software. This software is the evolutionary replacement for the Rover Sequencing and Visualization software used to create command sequences (and visualize the Martian surface) for the Mars Exploration Rover mission.

  8. Effects of the Ion PGM™ Hi-Q™ sequencing chemistry on sequence data quality.

    PubMed

    Churchill, Jennifer D; King, Jonathan L; Chakraborty, Ranajit; Budowle, Bruce

    2016-09-01

    Massively parallel sequencing (MPS) offers substantial improvements over current forensic DNA typing methodologies such as increased resolution, scalability, and throughput. The Ion PGM™ is a promising MPS platform for analysis of forensic biological evidence. The system employs a sequencing-by-synthesis chemistry on a semiconductor chip that measures a pH change due to the release of hydrogen ions as nucleotides are incorporated into the growing DNA strands. However, implementation of MPS into forensic laboratories requires a robust chemistry. Ion Torrent's Hi-Q™ Sequencing Chemistry was evaluated to determine if it could improve on the quality of the generated sequence data in association with selected genetic marker targets. The whole mitochondrial genome and the HID-Ion STR 10-plex panel were sequenced on the Ion PGM™ system with the Ion PGM™ Sequencing 400 Kit and the Ion PGM™ Hi-Q™ Sequencing Kit. Concordance, coverage, strand balance, noise, and deletion ratios were assessed in evaluating the performance of the Ion PGM™ Hi-Q™ Sequencing Kit. The results indicate that reliable, accurate data are generated and that sequencing through homopolymeric regions can be improved with the use of Ion Torrent's Hi-Q™ Sequencing Chemistry. Overall, the quality of the generated sequencing data supports the potential for use of the Ion PGM™ in forensic genetic laboratories.

  9. Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius.

    PubMed

    Al-Swailem, Abdulaziz M; Shehata, Maher M; Abu-Duhier, Faisel M; Al-Yamani, Essam J; Al-Busadah, Khalid A; Al-Arawi, Mohammed S; Al-Khider, Ali Y; Al-Muhaimeed, Abdullah N; Al-Qahtani, Fahad H; Manee, Manee M; Al-Shomrani, Badr M; Al-Qhtani, Saad M; Al-Harthi, Amer S; Akdemir, Kadir C; Inan, Mehmet S; Otu, Hasan H

    2010-05-19

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and approximately 40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism.

  10. Statistical properties of filtered pseudorandom digital sequences formed from the sum of maximum-length sequences

    NASA Technical Reports Server (NTRS)

    Wallace, G. R.; Weathers, G. D.; Graf, E. R.

    1973-01-01

    The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
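
    The construction of a hybrid-sum sequence, the modulo-two sum of several maximum-length sequences, can be sketched as follows. The shift-register recurrence, the function names, and the example polynomials (x^3 + x + 1 and x^4 + x + 1) are illustrative assumptions; the record does not say which component polynomials the report analyzed, and the filtering and quality-factor steps are not reproduced here.

```python
def m_sequence(taps, n, length=None, seed=None):
    """Bits of a linear-feedback shift-register sequence.

    Implements a[k+n] = XOR of a[k+t] for t in taps, which corresponds to
    the characteristic polynomial x^n + sum(x^t for t in taps). When that
    polynomial is primitive, the period is the maximum 2**n - 1.
    """
    state = list(seed) if seed else [1] + [0] * (n - 1)
    length = length or (2 ** n - 1)
    out = []
    for _ in range(length):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out

def hybrid_sum(components, length):
    """Modulo-two (XOR) sum of several sequences given as (taps, n) pairs."""
    seqs = [m_sequence(taps, n, length) for taps, n in components]
    return [sum(bits) % 2 for bits in zip(*seqs)]

# Example: x^3 + x + 1 (period 7) combined with x^4 + x + 1 (period 15).
example = hybrid_sum([((0, 1), 3), ((0, 1), 4)], length=210)
```

    With these two components the periods 7 and 15 are coprime, so the combined sequence repeats every 105 bits, which is longer than either component alone.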

  11. Mixture of experts models to exploit global sequence similarity on biomolecular sequence labeling

    PubMed Central

    Caragea, Cornelia; Sinapov, Jivko; Dobbs, Drena; Honavar, Vasant

    2009-01-01

    Background Identification of functionally important sites in biomolecular sequences has broad applications ranging from rational drug design to the analysis of metabolic and signal transduction networks. Experimental determination of such sites lags far behind the number of known biomolecular sequences. Hence, there is a need to develop reliable computational methods for identifying functionally important sites from biomolecular sequences. Results We present a mixture of experts approach to biomolecular sequence labeling that takes into account the global similarity between biomolecular sequences. Our approach combines unsupervised and supervised learning techniques. Given a set of sequences and a similarity measure defined on pairs of sequences, we learn a mixture of experts model by using spectral clustering to learn the hierarchical structure of the model and by using bayesian techniques to combine the predictions of the experts. We evaluate our approach on two biomolecular sequence labeling problems: RNA-protein and DNA-protein interface prediction problems. The results of our experiments show that global sequence similarity can be exploited to improve the performance of classifiers trained to label biomolecular sequence data. Conclusion The mixture of experts model helps improve the performance of machine learning methods for identifying functionally important sites in biomolecular sequences. PMID:19426452

  12. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665

  13. Update on Rover Sequencing and Visualization Program

    NASA Technical Reports Server (NTRS)

    Cooper, Brian; Hartman, Frank; Maxwell, Scott; Yen, Jeng; Wright, John; Balacuit, Carlos

    2005-01-01

    The Rover Sequencing and Visualization Program (RSVP) has been updated. RSVP was reported in Rover Sequencing and Visualization Program (NPO-30845), NASA Tech Briefs, Vol. 29, No. 4 (April 2005), page 38. To recapitulate: The Rover Sequencing and Visualization Program (RSVP) is the software tool to be used in the Mars Exploration Rover (MER) mission for planning rover operations and generating command sequences for accomplishing those operations. RSVP combines three-dimensional (3D) visualization for immersive exploration of the operations area, stereoscopic image display for high-resolution examination of the downlinked imagery, and a sophisticated command-sequence editing tool for analysis and completion of the sequences. RSVP is linked with actual flight code modules for operations rehearsal to provide feedback on the expected behavior of the rover prior to committing to a particular sequence. Playback tools allow for review of both rehearsed rover behavior and downlinked results of actual rover operations. These can be displayed simultaneously for comparison of rehearsed and actual activities for verification. The primary inputs to RSVP are downlink data products from the Operations Storage Server (OSS) and activity plans generated by the science team. The activity plans are high-level goals for the next day's activities. The downlink data products include imagery, terrain models, and telemetered engineering data on rover activities and state. The Rover Sequence Editor (RoSE) component of RSVP performs activity expansion to command sequences, command creation and editing with setting of command parameters, and viewing and management of rover resources. The HyperDrive component of RSVP performs 2D and 3D visualization of the rover's environment, graphical and animated review of rover predicted and telemetered state, and creation and editing of command sequences related to mobility and Instrument Deployment Device (robotic arm) operations.
Additionally, RoSE and

  14. Initial retrieval sequence and blending strategy

    SciTech Connect

    Pemwell, D.L.; Grenard, C.E.

    1996-09-01

    This report documents the initial retrieval sequence and the methodology used to select it. Waste retrieval, storage, pretreatment and vitrification were modeled for candidate single-shell tank retrieval sequences. Performance of the sequences was measured by a set of metrics (for example, high-level waste glass volume, relative risk and schedule). Computer models were used to evaluate estimated glass volumes, process rates, retrieval dates, and blending strategy effects. The models were based on estimates of component inventories and concentrations, sludge wash factors and timing, retrieval annex limitations, etc.

  15. Mapping Replication Origin Sequences in Eukaryotic Chromosomes

    PubMed Central

    Fu, Haiqing; Besnard, Emilie; Desprat, Romain; Ryan, Michael; Kahli, Malik; Lemaitre, Jean-Marc; Aladjem, Mirit I.

    2014-01-01

    Recent advances in genome sequencing technology have led towards the complete mapping of DNA replication initiation sites in the human genome. This thorough origin mapping facilitates the understanding of the relationship between replication initiation events, transcription and chromatin modifications and allows the characterization of consensus sequences of potential replication origins. This unit provides a detailed protocol for isolation and sequence analyses of nascent DNA strands. Two variations of the protocol based on non-overlapping assumptions are described below, addressing potential bias issues for whole genome analyses. PMID:25447077

  16. New Stopping Criteria for Segmenting DNA Sequences

    SciTech Connect

    Li, Wentian

    2001-06-18

    We propose a solution on the stopping criterion in segmenting inhomogeneous DNA sequences with complex statistical patterns. This new stopping criterion is based on Bayesian information criterion in the model selection framework. When this criterion is applied to telomere of S. cerevisiae and the complete sequence of E. coli, borders of biologically meaningful units were identified, and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences.

  17. New Stopping Criteria for Segmenting DNA Sequences

    NASA Astrophysics Data System (ADS)

    Li, Wentian

    2001-06-01

    We propose a solution on the stopping criterion in segmenting inhomogeneous DNA sequences with complex statistical patterns. This new stopping criterion is based on Bayesian information criterion in the model selection framework. When this criterion is applied to telomere of S. cerevisiae and the complete sequence of E. coli, borders of biologically meaningful units were identified, and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences.
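
    The general idea of a Bayesian-information-criterion stopping test for segmentation can be sketched as below. This is a generic BIC model comparison written for illustration, not the author's exact formulation: it assumes an independent multinomial base model per segment, and the count of four extra parameters (three base frequencies for the added segment plus the cut position) is an assumption of this sketch.

```python
import math
from collections import Counter

def log_likelihood(seq):
    """Maximum log-likelihood of seq under an i.i.d. multinomial base model."""
    n = len(seq)
    return sum(c * math.log(c / n) for c in Counter(seq).values())

def best_split(seq):
    """Return (gain, cut) for the cut that most improves the log-likelihood."""
    base = log_likelihood(seq)
    best = (0.0, None)
    for cut in range(1, len(seq)):
        gain = log_likelihood(seq[:cut]) + log_likelihood(seq[cut:]) - base
        if gain > best[0]:
            best = (gain, cut)
    return best

def accept_split(seq, extra_params=4):
    """Cut position if the two-segment model wins under BIC, else None.

    BIC = -2 * log-likelihood + (number of parameters) * ln(N), so the
    split is kept only when 2 * gain exceeds extra_params * ln(N).
    """
    gain, cut = best_split(seq)
    if cut is None:
        return None
    return cut if 2 * gain > extra_params * math.log(len(seq)) else None
```

    Applied recursively to each accepted segment, the test stops splitting once no cut improves the likelihood by more than the penalty. The quadratic scan in `best_split` is kept for readability and would need cumulative counts to handle genome-scale input.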

  18. Scale-PC shielding analysis sequences

    SciTech Connect

    Bowman, S.M.

    1996-05-01

    The SCALE computational system is a modular code system for analyses of nuclear fuel facility and package designs. With the release of SCALE-PC Version 4.3, the radiation shielding analysis community now has the capability to execute the SCALE shielding analysis sequences contained in the control modules SAS1, SAS2, SAS3, and SAS4 on an MS-DOS personal computer (PC). In addition, SCALE-PC includes two new sequences, QADS and ORIGEN-ARP. The capabilities of each sequence are presented, along with example applications.

  19. Method for sequencing DNA base pairs

    DOEpatents

    Sessler, Andrew M.; Dawson, John

    1993-01-01

    The base pairs of a DNA structure are sequenced with the use of a scanning tunneling microscope (STM). The DNA structure is scanned by the STM probe tip, and, as it is being scanned, the DNA structure is separately subjected to a sequence of infrared radiation from four different sources, each source being selected to preferentially excite one of the four different bases in the DNA structure. Each particular base being scanned is subjected to such sequence of infrared radiation from the four different sources as that particular base is being scanned. The DNA structure as a whole is separately imaged for each subjection thereof to radiation from one only of each source.

  20. Iterative method for generating correlated binary sequences

    NASA Astrophysics Data System (ADS)

    Usatenko, O. V.; Melnik, S. S.; Apostolov, S. S.; Makarov, N. M.; Krokhin, A. A.

    2014-11-01

    We propose an efficient iterative method for generating random correlated binary sequences with a prescribed correlation function. The method is based on consecutive linear modulations of an initially uncorrelated sequence into a correlated one. Each step of modulation increases the correlations until the desired level has been reached. The robustness and efficiency of the proposed algorithm are tested by generating sequences with inverse power-law correlations. The substantial increase in the strength of correlation in the iterative method with respect to single-step filtering generation is shown for all studied correlation functions. Our results can be used for design of disordered superlattices, waveguides, and surfaces with selective transport properties.

  1. Model of evolution of molecular sequences

    NASA Astrophysics Data System (ADS)

    Luo, Liaofu; Tsai, Lu; Lee, Weijiang

    1990-05-01

    A simplified model of the evolution of molecular sequences is proposed. An ensemble of strings is considered that consists of two letters and undergoes random point mutations and natural selections. A set of evolution equations is deduced. From the solution it is found that the first-order (second-order) informational parameters (redundancies) D1 decrease (D2 increase) in the course of evolution. Furthermore, the statistical correlations of the letters (bases) in the sequences are investigated in detail and the short-distance correlation is demonstrated. These results give a preliminary explanation of some physical aspects in the evolution of nucleic acid sequences.

  2. Finding important sites in protein sequences

    PubMed Central

    Bickel, Peter J.; Kechris, Katherina J.; Spector, Philip C.; Wedemayer, Gary J.; Glazer, Alexander N.

    2002-01-01

    By using sequence information from an aligned protein family, a procedure is exhibited for finding sites that may be functionally or structurally critical to the protein. Features based on sequence conservation within subfamilies in the alignment and associations between sites are used to select the sites. The sites are subject to statistical evaluation correcting for phylogenetic bias in the collection of sequences. This method is applied to two families: the phycobiliproteins, light-harvesting proteins in cyanobacteria, red algae, and cryptomonads, and the globins that function in oxygen storage and transport. The sites identified by the procedure are located in key structural positions and merit further experimental study. PMID:12417758

  3. The complete amino acid sequence of prochymosin.

    PubMed Central

    Foltmann, B; Pedersen, V B; Jacobsen, H; Kauffman, D; Wybrandt, G

    1977-01-01

    The total sequence of 365 amino acid residues in bovine prochymosin is presented. Alignment with the amino acid sequence of porcine pepsinogen shows that 204 amino acid residues are common to the two zymogens. Further comparison and alignment with the amino acid sequence of penicillopepsin shows that 66 residues are located at identical positions in all three proteases. The three enzymes belong to a large group of proteases with two aspartate residues in the active center. This group forms a family derived from one common ancestor. PMID:329280

  4. Next-generation sequencing discoveries in lymphoma.

    PubMed

    Slack, Graham W; Gascoyne, Randy D

    2013-03-01

    Since the mapping of the human genome and the advent of next-generation sequencing technology, thorough examination of the cancer genome has become a reality. Over the last few years several studies have used next-generation sequencing technology to investigate the genetic landscape of Hodgkin and non-Hodgkin lymphomas, identifying novel genetic mutations and gene rearrangements that have shed new light on the underlying tumor biology in these diseases as well as identifying possible targets for directed therapy. This review covers the major discoveries in lymphoma using next-generation sequencing technology.

  5. Nanopore-CMOS Interfaces for DNA Sequencing.

    PubMed

    Magierowski, Sebastian; Huang, Yiyun; Wang, Chengjie; Ghafar-Zadeh, Ebrahim

    2016-08-06

    DNA sequencers based on nanopore sensors present an opportunity for a significant break from the template-based incumbents of the last forty years. Key advantages ushered by nanopore technology include a simplified chemistry and the ability to interface to CMOS technology. The latter opportunity offers substantial promise for improvement in sequencing speed, size and cost. This paper reviews existing and emerging means of interfacing nanopores to CMOS technology with an emphasis on massively-arrayed structures. It presents this in the context of incumbent DNA sequencing techniques, reviews and quantifies nanopore characteristics and models and presents CMOS circuit methods for the amplification of low-current nanopore signals in such interfaces.

  6. Real-Time DNA Sequencing in the Antarctic Dry Valleys Using the Oxford Nanopore Sequencer

    PubMed Central

    Johnson, Sarah S.; Zaikova, Elena; Goerlitz, David S.; Bai, Yu; Tighe, Scott W.

    2017-01-01

    The ability to sequence DNA outside of the laboratory setting has enabled novel research questions to be addressed in the field in diverse areas, ranging from environmental microbiology to viral epidemics. Here, we demonstrate the application of offline DNA sequencing of environmental samples using a hand-held nanopore sequencer in a remote field location: the McMurdo Dry Valleys, Antarctica. Sequencing was performed using a MK1B MinION sequencer from Oxford Nanopore Technologies (ONT; Oxford, United Kingdom) that was equipped with software to operate without internet connectivity. One-direction (1D) genomic libraries were prepared using portable field techniques on DNA isolated from desiccated microbial mats. By adequately insulating the sequencer and laptop, it was possible to run the sequencing protocol for up to 2½ h under arduous conditions. PMID:28337073

  7. New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing

    PubMed Central

    Song, Kai; Ren, Jie; Reinert, Gesine; Deng, Minghua

    2014-01-01

    With the development of next-generation sequencing (NGS) technologies, a large amount of short read data has been generated. Assembly of these short reads can be challenging for genomes and metagenomes without template sequences, making alignment-based genome sequence comparison difficult. In addition, sequence reads from NGS can come from different regions of various genomes and they may not be alignable. Sequence signature-based methods for genome comparison based on the frequencies of word patterns in genomes and metagenomes can potentially be useful for the analysis of short reads data from NGS. Here we review the recent development of alignment-free genome and metagenome comparison based on the frequencies of word patterns with emphasis on the dissimilarity measures between sequences, the statistical power of these measures when two sequences are related and the applications of these measures to NGS data. PMID:24064230
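
    Illustrative sketch (not from the record): a minimal word-frequency comparison of two sequences, assuming plain overlapping k-mer counts. The D2 statistic is computed here as the inner product of the two count vectors; the cosine dissimilarity is one simple normalisation chosen for illustration and is not necessarily one of the measures surveyed in the review. All function names are hypothetical.

```python
import math
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2(seq_a, seq_b, k):
    """Inner product of the two k-mer count vectors."""
    a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(a[w] * b[w] for w in a.keys() & b.keys())

def cosine_dissimilarity(seq_a, seq_b, k):
    """1 minus the cosine similarity of the k-mer count vectors."""
    a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("both sequences must be at least k long")
    return 1.0 - d2(seq_a, seq_b, k) / (norm_a * norm_b)
```

    No alignment is computed at any point, which is why measures of this family remain usable on short reads that cannot be aligned.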

  8. DNA Sequence Determination by Hybridization: A Strategy for Efficient Large-Scale Sequencing

    NASA Astrophysics Data System (ADS)

    Drmanac, R.; Drmanac, S.; Strezoska, Z.; Paunesku, T.; Labat, I.; Zeremski, M.; Snoddy, J.; Funkhouser, W. K.; Koop, B.; Hood, L.; Crkvenjakov, R.

    1993-06-01

    The concept of sequencing by hybridization (SBH) makes use of an array of all possible n-nucleotide oligomers (n-mers) to identify n-mers present in an unknown DNA sequence. Computational approaches can then be used to assemble the complete sequence. As a validation of this concept, the sequences of three DNA fragments, 343 base pairs in length, were determined with octamer oligonucleotides. Possible applications of SBH include physical mapping (ordering) of overlapping DNA clones, sequence checking, DNA fingerprinting comparisons of normal and disease-causing genes, and the identification of DNA fragments with particular sequence motifs in complementary DNA and genomic libraries. The SBH techniques may accelerate the mapping and sequencing phases of the human genome project.
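
    Illustrative sketch (not from the record): the assembly step of SBH under idealised assumptions, namely an error-free array that reports every n-mer of the target together with its multiplicity. A real array reports presence only and is subject to hybridization errors. The depth-first search is a simple stand-in chosen for brevity, not the computational approach used by the authors, and the function names are hypothetical.

```python
def spectrum(seq, k):
    """Sorted list of all overlapping k-mers of seq (an idealised array readout)."""
    return sorted(seq[i:i + k] for i in range(len(seq) - k + 1))

def reconstruct(kmers):
    """Find an ordering in which successive k-mers overlap by k-1 bases.

    Returns one sequence consistent with the spectrum, or None if there is none.
    """
    def extend(path, remaining):
        if not remaining:
            return path
        for i, km in enumerate(remaining):
            if path[-1][1:] == km[:-1]:
                result = extend(path + [km], remaining[:i] + remaining[i + 1:])
                if result:
                    return result
        return None

    for i, start in enumerate(kmers):
        path = extend([start], kmers[:i] + kmers[i + 1:])
        if path:
            # First k-mer in full, then the last base of each following k-mer.
            return path[0] + "".join(km[-1] for km in path[1:])
    return None
```

    A spectrum does not always determine the sequence uniquely: repeated (k-1)-mers can admit several valid orderings, so the search returns one consistent reconstruction rather than a guaranteed original.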

  9. Pittosporum cryptic virus 1: genome sequence completion using next-generation sequencing.

    PubMed

    Elbeaino, Toufic; Kubaa, Raied Abou; Tuzlali, Hasan Tuna; Digiaro, Michele

    2016-07-01

    Next-generation sequencing (NGS) was applied to dsRNAs extracted from an Italian pittosporum plant infected with pittosporum cryptic virus 1 (PiCV1). NGS allowed assembly of the full genome sequence of PiCV1, comprising dsRNA1 (1.9 kbp) and dsRNA2 (1.5 kbp), which encode the RNA-dependent RNA polymerase and capsid protein genes, respectively. Phylogenetic and sequence analyses confirmed that PiCV1 is a new member of the genus Deltapartitivirus, family Partitiviridae. From the same plant, NGS also permitted assembly of the complete genome sequence of eggplant mottled dwarf virus (EMDV), which shared 86 % to 98 % nucleotide sequence identity with complete and partial sequences (ca 6750 nt) of other known EMDV isolates with sequences available in the GenBank database.

  10. Real-Time DNA Sequencing in the Antarctic Dry Valleys Using the Oxford Nanopore Sequencer.

    PubMed

    Johnson, Sarah S; Zaikova, Elena; Goerlitz, David S; Bai, Yu; Tighe, Scott W

    2017-04-01

    The ability to sequence DNA outside of the laboratory setting has enabled novel research questions to be addressed in the field in diverse areas, ranging from environmental microbiology to viral epidemics. Here, we demonstrate the application of offline DNA sequencing of environmental samples using a hand-held nanopore sequencer in a remote field location: the McMurdo Dry Valleys, Antarctica. Sequencing was performed using a MK1B MinION sequencer from Oxford Nanopore Technologies (ONT; Oxford, United Kingdom) that was equipped with software to operate without internet connectivity. One-direction (1D) genomic libraries were prepared using portable field techniques on DNA isolated from desiccated microbial mats. By adequately insulating the sequencer and laptop, it was possible to run the sequencing protocol for up to 2½ h under arduous conditions.

  11. Coupling sequencing by hybridization (SBH) with gel sequencing for an inexpensive analysis of genes and genomes

    SciTech Connect

    Drmanac, S.; Labat, I.; Hauser, B.; Drmanac, R.

    1996-11-01

    The speed and cost of DNA sequencing are bottlenecks in the analysis of genes and genomes. Sequencing by hybridization (SBH) is a versatile method with several applications which can accelerate DNA screening, mapping and sequencing. Requirements, achievements and problems in the development of the SBH format 1 (DNA samples arrayed) are presented and schemes for its synergetic coupling with gel sequencing techniques are discussed. It appears that with one hybridization machine with 24 boxes and four ABI gel sequencers, 100-300 Mb of DNA sequence can be determined per year. Various genetic studies based on computer assisted analysis of large collections of partial or complete DNA sequences ("sequenetics") may be achieved in this century.

  12. DNA sequence determination by hybridization: A strategy for efficient large-scale sequencing

    SciTech Connect

    Drmanac, R.; Drmanac, S.; Strezoska, Z.; Paunesku, T.; Labat, I.; Zeremski, M.; Snoddy, J.; Crkvenjakov, R.; Funkhouser, W.K.; Koop, B.; Hood, L.

    1993-06-11

    The concept of sequencing by hybridization (SBH) makes use of an array of all possible n-nucleotide oligomers (n-mers) to identify n-mers present in an unknown DNA sequence. Computational approaches can then be used to assemble the complete sequence. As a validation of this concept, the sequences of three DNA fragments, 343 base pairs in length, were determined with octamer oligonucleotides. Possible applications of SBH include physical mapping (ordering) of overlapping DNA clones, sequence checking, DNA fingerprinting comparisons of normal and disease-causing genes, and the identification of DNA fragments with particular sequence motifs in complementary DNA and genomic libraries. The SBH techniques may accelerate the mapping and sequencing phases of the human genome project. 22 refs., 3 figs.

  13. Deep sequencing increases hepatitis C virus phylogenetic cluster detection compared to Sanger sequencing.

    PubMed

    Montoya, Vincent; Olmstead, Andrea; Tang, Patrick; Cook, Darrel; Janjua, Naveed; Grebely, Jason; Jacka, Brendan; Poon, Art F Y; Krajden, Mel

    2016-09-01

    Effective surveillance and treatment strategies are required to control the hepatitis C virus (HCV) epidemic. Phylogenetic analyses are powerful tools for reconstructing the evolutionary history of viral outbreaks and identifying transmission clusters. These studies often rely on Sanger sequencing which typically generates a single consensus sequence for each infected individual. For rapidly mutating viruses such as HCV, consensus sequencing underestimates the complexity of the viral quasispecies population and could therefore generate different phylogenetic tree topologies. Although deep sequencing provides a more detailed quasispecies characterization, in-depth phylogenetic analyses are challenging due to dataset complexity and computational limitations. Here, we apply deep sequencing to a characterized population to assess its ability to identify phylogenetic clusters compared with consensus Sanger sequencing. For deep sequencing, a sample-specific threshold determined by the 50th percentile of the patristic distance distribution for all variants within each individual was used to identify clusters. Among seven patristic distance thresholds tested for the Sanger sequence phylogeny ranging from 0.005-0.06, a threshold of 0.03 was found to provide the maximum balance between positive agreement (samples in a cluster) and negative agreement (samples not in a cluster) relative to the deep sequencing dataset. From 77 HCV seroconverters, 10 individuals were identified in phylogenetic clusters using both methods. Deep sequencing analysis identified an additional 4 individuals and excluded 8 other individuals relative to Sanger sequencing. The application of this deep sequencing approach could be a more effective tool to understand onward HCV transmission dynamics compared with Sanger sequencing, since the incorporation of minority sequence variants improves the discrimination of phylogenetically linked clusters.

  14. Hippocampal theta sequences reflect current goals.

    PubMed

    Wikenheiser, Andrew M; Redish, A David

    2015-02-01

    Hippocampal information processing is discretized by oscillations, and the ensemble activity of place cells is organized into temporal sequences bounded by theta cycles. Theta sequences represent time-compressed trajectories through space. Their forward-directed nature makes them an intuitive candidate mechanism for planning future trajectories, but their connection to goal-directed behavior remains unclear. As rats performed a value-guided decision-making task, the extent to which theta sequences projected ahead of the animal's current location varied on a moment-by-moment basis depending on the rat's goals. Look-ahead extended farther on journeys to distant goals than on journeys to more proximal goals and was predictive of the animal's destination. On arrival at goals, however, look-ahead was similar regardless of where the animal began its journey from. Together, these results provide evidence that hippocampal theta sequences contain information related to goals or intentions, pointing toward a potential spatial basis for planning.

  15. Prompting Sequences in Teaching Independent Living Skills.

    ERIC Educational Resources Information Center

    Walls, Richard T.; And Others

    1981-01-01

    The effects of three prompting sequences on the acquisition of independent living skills with 14 mild and moderately mentally retarded vocational rehabilitation clients (16 to 50 years old) are examined. (Author)

  16. Extracting biological knowledge from DNA sequences

    SciTech Connect

    De La Vega, F.M.; Thieffry, D. |; Collado-Vides, J.

    1996-12-31

    This session describes the elucidation of information from DNA sequences and what challenges computational biologists face in their task of summarizing and deciphering the human genome. Techniques discussed include methods from statistics, information theory, artificial intelligence and linguistics. 1 ref.

  17. The genome sequence of Drosophila melanogaster.

    PubMed

    Adams, M D; Celniker, S E; Holt, R A; Evans, C A; Gocayne, J D; Amanatides, P G; Scherer, S E; Li, P W; Hoskins, R A; Galle, R F; George, R A; Lewis, S E; Richards, S; Ashburner, M; Henderson, S N; Sutton, G G; Wortman, J R; Yandell, M D; Zhang, Q; Chen, L X; Brandon, R C; Rogers, Y H; Blazej, R G; Champe, M; Pfeiffer, B D; Wan, K H; Doyle, C; Baxter, E G; Helt, G; Nelson, C R; Gabor, G L; Abril, J F; Agbayani, A; An, H J; Andrews-Pfannkoch, C; Baldwin, D; Ballew, R M; Basu, A; Baxendale, J; Bayraktaroglu, L; Beasley, E M; Beeson, K Y; Benos, P V; Berman, B P; Bhandari, D; Bolshakov, S; Borkova, D; Botchan, M R; Bouck, J; Brokstein, P; Brottier, P; Burtis, K C; Busam, D A; Butler, H; Cadieu, E; Center, A; Chandra, I; Cherry, J M; Cawley, S; Dahlke, C; Davenport, L B; Davies, P; de Pablos, B; Delcher, A; Deng, Z; Mays, A D; Dew, I; Dietz, S M; Dodson, K; Doup, L E; Downes, M; Dugan-Rocha, S; Dunkov, B C; Dunn, P; Durbin, K J; Evangelista, C C; Ferraz, C; Ferriera, S; Fleischmann, W; Fosler, C; Gabrielian, A E; Garg, N S; Gelbart, W M; Glasser, K; Glodek, A; Gong, F; Gorrell, J H; Gu, Z; Guan, P; Harris, M; Harris, N L; Harvey, D; Heiman, T J; Hernandez, J R; Houck, J; Hostin, D; Houston, K A; Howland, T J; Wei, M H; Ibegwam, C; Jalali, M; Kalush, F; Karpen, G H; Ke, Z; Kennison, J A; Ketchum, K A; Kimmel, B E; Kodira, C D; Kraft, C; Kravitz, S; Kulp, D; Lai, Z; Lasko, P; Lei, Y; Levitsky, A A; Li, J; Li, Z; Liang, Y; Lin, X; Liu, X; Mattei, B; McIntosh, T C; McLeod, M P; McPherson, D; Merkulov, G; Milshina, N V; Mobarry, C; Morris, J; Moshrefi, A; Mount, S M; Moy, M; Murphy, B; Murphy, L; Muzny, D M; Nelson, D L; Nelson, D R; Nelson, K A; Nixon, K; Nusskern, D R; Pacleb, J M; Palazzolo, M; Pittman, G S; Pan, S; Pollard, J; Puri, V; Reese, M G; Reinert, K; Remington, K; Saunders, R D; Scheeler, F; Shen, H; Shue, B C; Sidén-Kiamos, I; Simpson, M; Skupski, M P; Smith, T; Spier, E; Spradling, A C; Stapleton, M; Strong, R; Sun, E; Svirskas, R; Tector, C; Turner, R; 
Venter, E; Wang, A H; Wang, X; Wang, Z Y; Wassarman, D A; Weinstock, G M; Weissenbach, J; Williams, S M; Woodage, T; Worley, K C; Wu, D; Yang, S; Yao, Q A; Ye, J; Yeh, R F; Zaveri, J S; Zhan, M; Zhang, G; Zhao, Q; Zheng, L; Zheng, X H; Zhong, F N; Zhong, W; Zhou, X; Zhu, S; Zhu, X; Smith, H O; Gibbs, R A; Myers, E W; Rubin, G M; Venter, J C

    2000-03-24

    The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

  18. Nanopore DNA sequencing using kinetic proofreading

    NASA Astrophysics Data System (ADS)

    Ling, Xinsheng

    We propose a method of DNA sequencing by combining the physical method of nanopore electrical measurements and Southern's sequencing-by-hybridization. The new key ingredient, essential to both lowering the costs and increasing the precision, is an asymmetric nanopore sandwich device capable of measuring the DNA hybridization probe twice separated by a designed waiting time. Those incorrect probes appearing only once in nanopore ionic current traces are discriminated from the correct ones that appear twice. This method of discrimination is similar to the principle of kinetic proofreading proposed by Hopfield and Ninio in gene transcription and translation processes. An error analysis of this nanopore kinetic proofreading (nKP) technique for DNA sequencing is carried out in comparison with the most precise 3' dideoxy termination method developed by Sanger.

  19. Sequencing Information Management System (SIMS). Final report

    SciTech Connect

    Fields, C.

    1996-02-15

    A feasibility study to develop a requirements analysis and functional specification for a data management system for large-scale DNA sequencing laboratories resulted in a functional specification for a Sequencing Information Management System (SIMS). This document reports the results of this feasibility study, and includes a functional specification for a SIMS relational schema. The SIMS is an integrated information management system that supports data acquisition, management, analysis, and distribution for DNA sequencing laboratories. The SIMS provides ad hoc query access to information on the sequencing process and its results, and partially automates the transfer of data between laboratory instruments, analysis programs, technical personnel, and managers. The SIMS user interfaces are designed for use by laboratory technicians, laboratory managers, and scientists. The SIMS is designed to run in a heterogeneous, multiplatform environment in a client/server mode. The SIMS communicates with external computational and data resources via the internet.

  20. ARB: a software environment for sequence data

    PubMed Central

    Ludwig, Wolfgang; Strunk, Oliver; Westram, Ralf; Richter, Lothar; Meier, Harald; Yadhukumar; Buchner, Arno; Lai, Tina; Steppi, Susanne; Jobb, Gangolf; Förster, Wolfram; Brettske, Igor; Gerber, Stefan; Ginhart, Anton W.; Gross, Oliver; Grumann, Silke; Hermann, Stefan; Jost, Ralf; König, Andreas; Liss, Thomas; Lüßmann, Ralph; May, Michael; Nonhoff, Björn; Reichel, Boris; Strehlow, Robert; Stamatakis, Alexandros; Stuckmann, Norbert; Vilbig, Alexander; Lenke, Michael; Ludwig, Thomas; Bode, Arndt; Schleifer, Karl-Heinz

    2004-01-01

    The ARB (from Latin arbor, tree) project was initiated almost 10 years ago. The ARB program package comprises a variety of directly interacting software tools for sequence database maintenance and analysis which are controlled by a common graphical user interface. Although it was initially designed for ribosomal RNA data, it can be used for any nucleic and amino acid sequence data as well. A central database contains processed (aligned) primary structure data. Any additional descriptive data can be stored in database fields assigned to the individual sequences or linked via local or worldwide networks. A phylogenetic tree visualized in the main window can be used for data access and visualization. The package comprises additional tools for data import and export, sequence alignment, primary and secondary structure editing, profile and filter calculation, phylogenetic analyses, specific hybridization probe design and evaluation and other components for data analysis. Currently, the package is used by numerous working groups worldwide. PMID:14985472

  1. Neutrality tests for sequences with missing data.

    PubMed

    Ferretti, Luca; Raineri, Emanuele; Ramos-Onsins, Sebastian

    2012-08-01

    Missing data are common in DNA sequences obtained through high-throughput sequencing. Furthermore, samples of low quality or problems in the experimental protocol often cause a loss of data even with traditional sequencing technologies. Here we propose modified estimators of variability and neutrality tests that can be naturally applied to sequences with missing data, without the need to remove bases or individuals from the analysis. Modified statistics include the Watterson estimator θW, Tajima's D, Fay and Wu's H, and HKA. We develop a general framework to take missing data into account in frequency spectrum-based neutrality tests and we derive the exact expression for the variance of these statistics under the neutral model. The neutrality tests proposed here can also be used as summary statistics to describe the information contained in other classes of data like DNA microarrays.
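
    Illustrative sketch (not from the record): for orientation, the standard complete-data Watterson estimator is θW = S / a_n, where S is the number of segregating sites among n aligned sequences and a_n is the sum of 1/i for i = 1 to n-1. The missing-data modifications derived in the paper are not reproduced here, and the function names are hypothetical.

```python
def harmonic(n):
    """a_n = 1 + 1/2 + ... + 1/(n-1), the normalising constant for n sequences."""
    return sum(1.0 / i for i in range(1, n))

def segregating_sites(alignment):
    """Number of alignment columns that contain more than one distinct base."""
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)

def watterson_theta(alignment):
    """Complete-data Watterson estimator; assumes equal-length rows, no gaps or Ns."""
    n = len(alignment)
    if n < 2:
        raise ValueError("at least two sequences are required")
    return segregating_sites(alignment) / harmonic(n)
```

    With missing bases the effective sample size varies from site to site, which is the situation the modified estimators in the record are designed to handle without discarding sites or individuals.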

  2. CLaMS: Classifier for Metagenomic Sequences

    SciTech Connect

    Pati, Amrita

    2010-12-01

    CLaMS ("Classifier for Metagenomic Sequences") is a Java application for binning assembled metagenomes using user-specified training sequence sets and other user-specified initial parameters. Since CLaMS analyzes and matches sequence composition-based genomic signatures, it is much faster than binning tools that rely on alignments to homologs; CLaMS can bin ~20,000 sequences in 3 minutes on a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM. CLaMS is meant to be a desktop application for biologists and can be run on any machine under any operating system on which the Java Runtime Environment is enabled. CLaMS is freely available in both GUI-based and command-line based forms.

  3. The genome sequence of Drosophila melanogaster.

    SciTech Connect

    2000-03-24

    The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

  4. Quadruplex DNA: sequence, topology and structure

    PubMed Central

    Burge, Sarah; Parkinson, Gary N.; Hazel, Pascale; Todd, Alan K.; Neidle, Stephen

    2006-01-01

    G-quadruplexes are higher-order DNA and RNA structures formed from G-rich sequences that are built around tetrads of hydrogen-bonded guanine bases. Potential quadruplex sequences have been identified in G-rich eukaryotic telomeres, and more recently in non-telomeric genomic DNA, e.g. in nuclease-hypersensitive promoter regions. The natural role and biological validation of these structures is starting to be explored, and there is particular interest in them as targets for therapeutic intervention. This survey focuses on the folding and structural features of quadruplexes formed from telomeric and non-telomeric DNA sequences, and examines fundamental aspects of topology and the emerging relationships with sequence. Emphasis is placed on information from the high-resolution methods of X-ray crystallography and NMR, and their scope and current limitations are discussed. Such information, together with biological insights, will be important for the discovery of drugs targeting quadruplexes from particular genes. PMID:17012276
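
    Illustrative sketch (not from the record): potential quadruplex sequences are often screened with a simple pattern of four runs of at least three guanines separated by loops of one to seven bases. That pattern is a commonly used heuristic assumed here for illustration; the abstract does not state which rule was used, and a match only flags a candidate, not a confirmed structure. The names below are hypothetical.

```python
import re

# Four G-runs of length >= 3 separated by loops of 1-7 bases (assumed heuristic).
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

def find_putative_quadruplexes(seq):
    """Return (start, matched_text) for each non-overlapping candidate on one strand."""
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]
```

    Scanning the reverse complement as well would be needed to cover the opposite strand, where the motif appears as runs of cytosine.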

  5. 32 CFR 179.7 - Sequencing.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... and social factors. (3) Economic factors, including economic considerations pertaining to... addressed before an MRS that presents a lesser relative risk. Other factors, however, may warrant consideration when determining the sequencing for specific MRSs. In evaluating other factors in...

  6. 32 CFR 179.7 - Sequencing.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... and social factors. (3) Economic factors, including economic considerations pertaining to... addressed before an MRS that presents a lesser relative risk. Other factors, however, may warrant consideration when determining the sequencing for specific MRSs. In evaluating other factors in...

  7. V838 Monocerotis Dissolve Sequence of Epochs

    NASA Video Gallery

    A dissolve sequence of eight images taken by Hubble's Advanced Camera for Surveys shows a CAT-scan-like probe of the three-dimensional structure of the shells of dust surrounding the aging star V83...

  8. Expressed sequence tags analysis of Blattella germanica.

    PubMed

    Chung, Hyang Suk; Yu, Tai Hyun; Kim, Bong Jin; Kim, Sun Mi; Kim, Joo Yeong; Yu, Hak Sun; Jeong, Hae Jin; Ock, Mee Sun

    2005-12-01

    Four hundred and sixty five randomly selected clones from a cDNA library of Blattella germanica were partially sequenced and searched using BLAST as a means of analyzing the transcribed sequences of its genome. A total of 363 expressed sequence tags (ESTs) were generated from 465 clones after editing and trimming the vector and ambiguous sequences. About 42% (154/363) of these clones showed significant homology with other data base registered genes. These new B. germanica genes constituted a broad range of transcripts distributed among ribosomal proteins, energy metabolism, allergens, proteases, protease inhibitors, enzymes, translation, cell signaling pathways, and proteins of unknown function. Eighty clones were not well-matched by database searches, and these represent new B. germanica-specific ESTs. Some genes which drew our attention are discussed. The information obtained increases our understanding of the B. germanica genome.

  9. Expressed sequence tags analysis of Blattella germanica

    PubMed Central

    Chung, Hyang Suk; Yu, Tai Hyun; Kim, Bong Jin; Kim, Sun Mi; Kim, Joo Yeong; Yu, Hak Sun; Jeong, Hae Jin

    2005-01-01

    Four hundred and sixty five randomly selected clones from a cDNA library of Blattella germanica were partially sequenced and searched using BLAST as a means of analyzing the transcribed sequences of its genome. A total of 363 expressed sequence tags (ESTs) were generated from 465 clones after editing and trimming the vector and ambiguous sequences. About 42% (154/363) of these clones showed significant homology with other data base registered genes. These new B. germanica genes constituted a broad range of transcripts distributed among ribosomal proteins, energy metabolism, allergens, proteases, protease inhibitors, enzymes, translation, cell signaling pathways, and proteins of unknown function. Eighty clones were not well-matched by database searches, and these represent new B. germanica-specific ESTs. Some genes which drew our attention are discussed. The information obtained increases our understanding of the B. germanica genome. PMID:16340304

  10. Sequence finishing and mapping of Drosophila melanogaster heterochromatin

    SciTech Connect

    Hoskins, Roger A.; Carlson, Joseph W.; Kennedy, Cameron; Acevedo, David; Evans-Holm, Martha; Frise, Erwin; Wan, Kenneth H.; Park, Soo; Mendez-Lago, Maria; Rossi, Fabrizio; Villasante, Alfredo; Dimitri, Patrizio; Karpen, Gary H.; Celniker, Susan E.

    2007-06-15

    Genome sequences for most metazoans are incomplete due to the presence of repeated DNA in the pericentromeric heterochromatin. The heterochromatic regions of D. melanogaster contain 20 Mb of sequence amenable to mapping, sequence assembly and finishing. Here we describe the generation of 15 Mb of finished or improved heterochromatic sequence using available clone resources and assembly and mapping methods. We also constructed a BAC-based physical map that spans approximately 13 Mb of the pericentromeric heterochromatin, and a cytogenetic map that positions approximately 11 Mb of BAC contigs and sequence scaffolds in specific chromosomal locations. The integrated sequence assembly and maps greatly improve our understanding of the structure and composition of this poorly understood fraction of a metazoan genome and provide a framework for functional analyses.

  11. Fibonacci Sequence and Supramolecular Structure of DNA.

    PubMed

    Shabalkin, I P; Grigor'eva, E Yu; Gudkova, M V; Shabalkin, P I

    2016-05-01

    We propose a new model of supramolecular DNA structure. As in our previously developed model of primary DNA structure [11-15], the 3D structure of the DNA molecule is assembled according to a mathematical rule known as the Fibonacci sequence. Unlike the primary DNA structure, the supramolecular 3D structure is assembled from complex moieties, including a regular tetrahedron and a regular octahedron, that consist of monomers, the elements of the primary DNA structure. The moieties of the supramolecular DNA structure, which form fragments of a regular spatial lattice, are bound via linker (joint) sequences of the DNA chain. The lattice perceives and transmits information signals over a considerable distance without acoustic aberrations. Linker sequences expand the conformational space between lattice segments, allowing them to slide relative to each other under the action of external forces. In this case, sliding is provided by stretching of the stacked linker sequences.

  12. "X"-tending the Fibonacci Sequence.

    ERIC Educational Resources Information Center

    Moran, Glenn T.

    2002-01-01

    Outlines a lesson on the Fibonacci and Lucas sequences that captures student interest by presenting the opportunity for computation practice, mental mathematics, and proof for algebra students. Discusses an extension for solving simultaneous equations. (YDS)

  13. Simultaneous sensorimotor adaptation and sequence learning.

    PubMed

    Overduin, Simon A; Richardson, Andrew G; Bizzi, Emilio; Press, Daniel Z

    2008-01-01

    Sensorimotor adaptation and sequence learning have often been treated as distinct forms of motor learning. But frequently the motor system must acquire both types of experience simultaneously. Here, we investigated the interaction of these two forms of motor learning by having subjects adapt to predictable forces imposed by a robotic manipulandum while simultaneously reaching to an implicit sequence of targets. We show that adaptation to novel dynamics and learning of a sequence of movements can occur simultaneously and without significant interference or facilitation. When both conditions were presented simultaneously to subjects, their trajectory error and reaction time decreased to the same extent as those of subjects who experienced the force field or sequence independently.

  14. Genetics Home Reference: isolated Pierre Robin sequence

    MedlinePlus

    ... a set of abnormalities affecting the head and face, consisting of a small lower jaw ( micrognathia ), a tongue that is placed further back than normal (glossoptosis), and blockage (obstruction) of the airways. Most people with Pierre Robin sequence are also ...

  15. Compilation of DNA sequences of Escherichia coli

    PubMed Central

    Kröger, Manfred

    1989-01-01

    We have compiled the DNA sequence data for E. coli K12 available from the GENBANK and EMBO databases and, over a period of several years, independently from the literature. We have introduced all available genetic map data and have arranged the sequences accordingly. As far as possible, overlaps were deleted, and a total of 940,449 individual bp was found to have been determined by the beginning of 1989. This corresponds to 19.92% of the entire E. coli chromosome, which consists of about 4,720 kbp. This number may actually be higher by an extra 2% derived from the sequence of lysogenic bacteriophage lambda and the various insertion sequences. This compilation may become available in machine-readable form from one of the international databanks in the future. PMID:2654890

  16. The Value of DNA Sequencing - TCGA

    Cancer.gov

    DNA sequencing: what it tells us about DNA changes in cancer, how looking across many tumors will help to identify meaningful changes and potential drug targets, and how genomics is changing the way we think about cancer.

  17. Quality control of ion torrent sequencing library.

    PubMed

    Pop, Laura-Ancuţa; Puscas, Emil; Pileczki, Valentina; Cojocneanu-Petric, Roxana; Braicu, Cornelia; Achimas-Cadariu, Patriciu; Berindan-Neagoe, Ioana

    2014-01-01

    Next-generation sequencing (NGS) is an important method for gathering large amounts of sequencing data for different types of applications regarding the diagnosis and response to treatment of different diseases. An important step in the NGS process is the quality control of sequencing libraries, which can influence the yield and efficiency of the sequencing run. This study evaluated two different methods for library quality control, Agilent Bioanalyzer and qPCR, and showed that both methods can be used. However, as is the case with any analytical method, they have their limitations. The Agilent Bioanalyzer quantifies only the high quality libraries, but it underestimates their concentration, while qPCR also quantifies lower quality libraries, but it overestimates their concentration.

  18. Female-specific DNA sequences in geese.

    PubMed

    Huang, M C; Lin, W C; Horng, Y M; Rouvier, R; Huang, C W

    2003-07-01

    1. The OPAE random primers (Operon Technologies, Inc., CA) were used for random amplified polymorphic DNA (RAPD) fingerprinting in Chinese, White Roman and Landaise geese. One of these primers, OPAE-06, produced a 938-bp sex-specific fragment in all females and in no males of Chinese geese only. 2. A novel female-specific DNA sequence in Chinese goose was cloned and sequenced. Two primers, CGSex-F and CGSex-R, were designed in order to amplify a 912-bp sex-specific polymerase chain reaction (PCR) fragment on genomic DNA from female geese. 3. It was shown that a simple and effective PCR-based sexing technique could be used in the three goose breeds studied. 4. Nucleotide sequencing of the sex-specific fragments in White Roman and Landaise geese was performed and sequence differences were observed among these three breeds.

  19. Improved polynomial remainder sequences for Ore polynomials.

    PubMed

    Jaroschek, Maximilian

    2013-11-01

    Polynomial remainder sequences contain the intermediate results of the Euclidean algorithm when applied to (non-)commutative polynomials. The running time of the algorithm is dependent on the size of the coefficients of the remainders. Different ways have been studied to make these as small as possible. The subresultant sequence of two polynomials is a polynomial remainder sequence in which the size of the coefficients is optimal in the generic case, but when taking the input from applications, the coefficients are often larger than necessary. We generalize two improvements of the subresultant sequence to Ore polynomials and derive a new bound for the minimal coefficient size. Our approach also yields a new proof for the results in the commutative case, providing a new point of view on the origin of the extraneous factors of the coefficients.

  20. Using mobile sequencers in an academic classroom

    PubMed Central

    Zaaijer, Sophie; Erlich, Yaniv

    2016-01-01

    The advent of mobile DNA sequencers has made it possible to generate DNA sequencing data outside of laboratories and genome centers. Here, we report our experience of using the MinION, a mobile sequencer, in a 13-week academic course for undergraduate and graduate students. The course consisted of theoretical sessions that presented fundamental topics in genomics and several applied hackathon sessions. In these hackathons, the students used MinION sequencers to generate and analyze their own data and gain hands-on experience in the topics discussed in the theoretical classes. The manuscript describes the structure of our class, the educational material, and the lessons we learned in the process. We hope that the knowledge and material presented here will provide the community with useful tools to help educate future generations of genome scientists. DOI: http://dx.doi.org/10.7554/eLife.14258.001 PMID:27054412

  1. Ideal shrinking and expansion of discrete sequences

    NASA Technical Reports Server (NTRS)

    Watson, Andrew B.

    1986-01-01

    Ideal methods are described for shrinking or expanding a discrete sequence, image, or image sequence. The methods are ideal in the sense that they preserve the frequency spectrum of the input up to the Nyquist limit of the input or output, whichever is smaller. Fast implementations that make use of the discrete Fourier transform or the discrete Hartley transform are described. The techniques lead to a new multiresolution image pyramid.
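
    The idea of resampling while preserving the spectrum up to the smaller Nyquist limit can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the function names `dft` and `resample` are hypothetical, a naive O(N^2) transform stands in for a fast one, and for simplicity the Nyquist bin itself is dropped when the shorter length is even.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (unnormalized in both directions)."""
    n = len(x)
    sign = 1 if inverse else -1
    return [
        sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def resample(x, m):
    """Shrink or expand x to m samples by truncating or zero-padding its spectrum.

    Assumption of this sketch: only frequencies strictly below the Nyquist
    limit of the shorter sequence are kept.
    """
    n = len(x)
    spectrum = dft(x)
    half = (min(n, m) - 1) // 2          # highest frequency index kept on each side
    out = [0j] * m
    out[0] = spectrum[0]                 # DC term
    for k in range(1, half + 1):
        out[k] = spectrum[k]             # positive frequencies
        out[m - k] = spectrum[n - k]     # negative frequencies
    # Dividing the unnormalized inverse by n (not m) preserves sample amplitudes.
    return [value.real / n for value in dft(out, inverse=True)]


# One cosine cycle sampled at 8 points, expanded to 16 points and shrunk back.
coarse = [math.cos(2 * math.pi * j / 8) for j in range(8)]
fine = resample(coarse, 16)
back = resample(fine, 8)
```

    For a band-limited input such as this single cosine cycle, the expanded samples fall on the same cosine, and shrinking back recovers the original sequence up to rounding error.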

  2. Aspects of coverage in medical DNA sequencing

    PubMed Central

    Wendl, Michael C; Wilson, Richard K

    2008-01-01

    Background DNA sequencing is now emerging as an important component in biomedical studies of diseases like cancer. Short-read, highly parallel sequencing instruments are expected to be used heavily for such projects, but many design specifications have yet to be conclusively established. Perhaps the most fundamental of these is the redundancy required to detect sequence variations, which bears directly upon genomic coverage and the consequent resolving power for discerning somatic mutations. Results We address the medical sequencing coverage problem via an extension of the standard mathematical theory of haploid coverage. The expected diploid multi-fold coverage, as well as its generalization for aneuploidy are derived and these expressions can be readily evaluated for any project. The resulting theory is used as a scaling law to calibrate performance to that of standard BAC sequencing at 8× to 10× redundancy, i.e. for expected coverages that exceed 99% of the unique sequence. A differential strategy is formalized for tumor/normal studies wherein tumor samples are sequenced more deeply than normal ones. In particular, both tumor alleles should be detected at least twice, while both normal alleles are detected at least once. Our theory predicts these requirements can be met for tumor and normal redundancies of approximately 26× and 21×, respectively. We explain why these values do not differ by a factor of 2, as might intuitively be expected. Future technology developments should prompt even deeper sequencing of tumors, but the 21× value for normal samples is essentially a constant. Conclusion Given the assumptions of standard coverage theory, our model gives pragmatic estimates for required redundancy. The differential strategy should be an efficient means of identifying potential somatic mutations for further study. PMID:18485222
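
    As a point of reference for the calibration to 8× to 10× redundancy, the standard haploid coverage theory that the abstract extends can be sketched as below. The sketch assumes the usual Poisson approximation for read depth at a position; it does not reproduce the paper's diploid or aneuploid expressions, and the function name is hypothetical.

```python
import math


def prob_covered_at_least(redundancy, k=1):
    """P(a position is covered at least k times) when depth ~ Poisson(redundancy).

    This is the classical haploid approximation, not the diploid extension
    derived in the paper.
    """
    below = sum(
        math.exp(-redundancy) * redundancy ** i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below


# Expected fraction of unique sequence covered at least once.
for rho in (8, 10):
    print(rho, prob_covered_at_least(rho))
```

    Under this approximation the fraction covered at least once is 1 - exp(-redundancy), which exceeds 99% at both 8× and 10×, in line with the calibration point quoted in the abstract.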

  3. Insertion sequence elements in Lactococcus garvieae.

    PubMed

    Eraclio, Giovanni; Ricci, Giovanni; Fortina, Maria Grazia

    2015-01-25

    Insertion sequences are the simplest intracellular Mobile Genetic Elements, which can occur in very high numbers in prokaryotic genomes, where they play an important evolutionary role by promoting genome plasticity. As such, studies of the diversity and distribution of insertion sequences in genomes not yet investigated can improve knowledge of a bacterial species and help identify new transposable elements. The present work describes the occurrence of insertion sequences in Lactococcus garvieae, an opportunistic emerging zoonotic and human pathogen, also associated with different food matrices. To date, no insertion elements have been described for L. garvieae in the IS element database. The analysis of the twelve published L. garvieae genomes identified 15 distinct insertion sequences that are members of the IS3, IS982, IS6, IS21 and IS256 families, including five new elements. Most of the insertion sequences in L. garvieae show substantial homology to the Lactococcus lactis elements, suggesting the movement of IS between these two phylogenetically closely related species. ISLL6 elements belonging to the IS3 family were most abundant, with several copies distributed in 9 of the 12 genomes analyzed. An alignment analysis of two complete genomes carrying multiple copies of this insertion sequence indicates a possible involvement of ISLL6 in chromosomal rearrangement.

  4. PCR Techniques in Next-Generation Sequencing.

    PubMed

    Goswami, Rashmi S

    2016-01-01

    With the advent of next-generation sequencing and its prolific use in the clinical realm, it would appear that techniques such as PCR would not be in high demand. This is not the case however, as PCR techniques play an important role in the success of NGS technology. Although NGS has rapidly become an important part of clinical molecular diagnostics, whole genome sequencing is still difficult to implement in a clinical laboratory due to high costs of sequencing, as well as issues surrounding data processing, analysis, and data storage, which can reduce efficiency and increase turnaround times. As a result, targeted sequencing is often used in clinical diagnostics, due to its increased efficiency. PCR techniques play an integral role in targeted NGS sequencing, allowing for the generation of multiple NGS libraries and the sequencing of multiple targeted regions simultaneously. We will outline the methods we employ in PCR amplification of targeted genomic regions for cancer mutation hotspots using the Ampliseq Cancer Hotspot v2 panel (Life Technologies, Carlsbad, CA).

  5. Recoupling pulse sequences with constant phase increments

    NASA Astrophysics Data System (ADS)

    Khaneja, Navin; Kumar, Ashutosh

    2016-10-01

    The paper studies a family of recoupling pulse sequences in magic angle spinning (MAS) solid state NMR that are characterized by constant phase increments at regular intervals. These pulse sequences can be employed for both homonuclear and heteronuclear recoupling experiments and are robust to dispersion in chemical shifts and rf-inhomogeneity. The homonuclear pulse sequence consists of a building block (2π)_{ϕ_p}, where ϕ_p = p(n - 1)π/n, n is the number of blocks in a rotor period, and p = 0, 1, 2, … . The pulse sequence repeats itself every rotor period when n is odd and every two rotor periods when n is even. The heteronuclear recoupling pulse sequence consists of building blocks (2π)_{ϕ_{1p}} and (2π)_{ϕ_{2p}} on channels I and S, where ϕ_{1p} = p(2n - 3)π/(2n), ϕ_{2p} = p(2n - 1)π/(2n), and n is the number of blocks in a rotor period. The recoupling pulse sequences mix the z magnetization. Experimental quantification of this method is shown for 13Cα-13CO homonuclear recoupling in a sample of glycine and 15N-13Cα heteronuclear recoupling in alanine. Application of this method is demonstrated on a sample of the tripeptide N-formyl-[U-13C,15N]-Met-Leu-Phe-OH (MLF).

  6. Long duration dust storm sequences on Mars

    NASA Astrophysics Data System (ADS)

    Wang, H.

    2012-12-01

    The Mars Global Surveyor (MGS) Mars Orbiter Camera (MOC) and Mars Reconnaissance Orbiter (MRO) Mars Color Imager (MARCI) Mars daily global maps have revealed new characteristics of long duration dust storm sequences. These dust storm sequences have long histories of more than a week, travel long distances out of their origination region, and influence large areas in different regions of the planet. During the Ls = 180 - 360 season, except for global dust storms, which involve multiple remote dust lifting centers and generally expand explosively from the southern hemisphere northward, long-lived dust storm sequences usually travel southward through the Acidalia-Chryse, Utopia-Isidis or Arcadia-Amazonis channels with subsequent dust lifting along the way. Sometimes they penetrate remarkably deep into the southern high latitudes, producing fantastic displays of dust bands. During the rest of the year, long duration dust storm sequences usually originate from the Argyre/Solis, Hellas/Noachis, or Cimmeria/Sirenum area and travel northward toward the southern low latitudes. Each route exhibits its own peculiar characteristics. We will present our results on these long duration dust storm sequences, summarized from the complete archive of MGS MOC daily global maps and two years of MRO MARCI daily global maps. The systematic, nearly global daily coverage of these maps makes it feasible to reconstruct the history of long duration dust storm sequences in detail.

  7. Polynomials Generated by the Fibonacci Sequence

    NASA Astrophysics Data System (ADS)

    Garth, David; Mills, Donald; Mitchell, Patrick

    2007-06-01

    The Fibonacci sequence's initial terms are F_0=0 and F_1=1, with F_n=F_{n-1}+F_{n-2} for n>=2. We define the polynomial sequence p by setting p_0(x)=1 and p_{n}(x)=x*p_{n-1}(x)+F_{n+1} for n>=1, with p_{n}(x)= sum_{k=0}^{n} F_{k+1}x^{n-k}. We call p_n(x) the Fibonacci-coefficient polynomial (FCP) of order n. The FCP sequence is distinct from the well-known Fibonacci polynomial sequence. We answer several questions regarding these polynomials. Specifically, we show that each even-degree FCP has no real zeros, while each odd-degree FCP has a unique, and (for degree at least 3) irrational, real zero. Further, we show that this sequence of unique real zeros converges monotonically to the negative of the golden ratio. Using Rouche's theorem, we prove that the zeros of the FCP's approach the golden ratio in modulus. We also prove a general result that gives the Mahler measures of an infinite subsequence of the FCP sequence whose coefficients are reduced modulo an integer m>=2. We then apply this to the case that m=L_n, the nth Lucas number, showing that the Mahler measure of the subsequence is phi^{n-1}, where phi=(1+sqrt 5)/2.
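
    The definition above can be explored numerically with a short sketch such as the following. The helper names are hypothetical, and the bracket [-2, 0] used for bisection is an assumption of this illustration (p_n(0) = F_{n+1} > 0, and p_n(-2) is negative for the odd orders tried here); it is not taken from the paper.

```python
import math


def fibonacci(n):
    """Return F_n with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fcp_coefficients(n):
    """Coefficients of p_n(x), highest degree first: F_1, F_2, ..., F_{n+1}."""
    return [fibonacci(k + 1) for k in range(n + 1)]


def evaluate(coeffs, x):
    """Evaluate a polynomial by Horner's rule (coefficients highest degree first)."""
    value = 0.0
    for c in coeffs:
        value = value * x + c
    return value


def real_zero(n, lo=-2.0, hi=0.0, iterations=200):
    """Bisection for the real zero of an odd-order FCP, assumed bracketed in [lo, hi]."""
    coeffs = fcp_coefficients(n)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if evaluate(coeffs, lo) * evaluate(coeffs, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


phi = (1 + math.sqrt(5)) / 2
zeros = [real_zero(n) for n in (1, 3, 5, 7, 9, 101)]
print(zeros, -phi)
```

    The printed zeros decrease with the order and stay above -phi, which is consistent with the monotone convergence to the negative of the golden ratio stated in the abstract; the approach is slow, so even order 101 is still visibly short of the limit.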

  8. Sequence composition and genome organization of maize

    PubMed Central

    Messing, Joachim; Bharti, Arvind K.; Karlowski, Wojciech M.; Gundlach, Heidrun; Kim, Hye Ran; Yu, Yeisoo; Wei, Fusheng; Fuks, Galina; Soderlund, Carol A.; Mayer, Klaus F. X.; Wing, Rod A.

    2004-01-01

    Zea mays L. ssp. mays, or corn, one of the most important crops and a model for plant genetics, has a genome ≈80% the size of the human genome. To gain global insight into the organization of its genome, we have sequenced the ends of large insert clones, yielding a cumulative length of one-eighth of the genome with a DNA sequence read every 6.2 kb, thereby describing a large percentage of the genes and transposable elements of maize in an unbiased approach. Based on the accumulative 307 Mb of sequence, repeat sequences occupy 58% and genic regions occupy 7.5%. A conservative estimate predicts ≈59,000 genes, which is higher than in any other organism sequenced so far. Because the sequences are derived from bacterial artificial chromosome clones, which are ordered in overlapping bins, tagged genes are also ordered along continuous chromosomal segments. Based on this positional information, roughly one-third of the genes appear to consist of tandemly arrayed gene families. Although the ancestor of maize arose by tetraploidization, fewer than half of the genes appear to be present in two orthologous copies, indicating that the maize genome has undergone significant gene loss since the duplication event. PMID:15388850

  9. Sequence Alignment to Predict Across Species Susceptibility ...

    EPA Pesticide Factsheets

    Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target, so it is amenable to variable degrees of protein characterization, depending on available information about the chemical/protein interaction and the molecular target itself. To allow for flexibility in the analysis, a layered strategy was adopted for the tool. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of candidate orthologs), the second level evaluates sequence similarity within selected domains (e.g., ligand-binding domain, DNA binding domain), and the third level of analysis compares individual amino acid residue positions identified as being of importance for protein conformation and/or ligand binding upon chemical perturbation. Each level of the SeqAPASS analysis provides increasing evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further ev

  10. Sequence diversity of wheat mosaic virus isolates.

    PubMed

    Stewart, Lucy R

    2016-02-02

    Wheat mosaic virus (WMoV), transmitted by eriophyid wheat curl mites (Aceria tosichella) is the causal agent of High Plains disease in wheat and maize. WMoV and other members of the genus Emaravirus evaded thorough molecular characterization for many years due to the experimental challenges of mite transmission and manipulating multisegmented negative sense RNA genomes. Recently, the complete genome sequence of a Nebraska isolate of WMoV revealed eight segments, plus a variant sequence of the nucleocapsid protein-encoding segment. Here, near-complete and partial consensus sequences of five more WMoV isolates are reported and compared to the Nebraska isolate: an Ohio maize isolate (GG1), a Kansas barley isolate (KS7), and three Ohio wheat isolates (H1, K1, W1). Results show two distinct groups of WMoV isolates: Ohio wheat isolate RNA segments had 84% or lower nucleotide sequence identity to the NE isolate, whereas GG1 and KS7 had 98% or higher nucleotide sequence identity to the NE isolate. Knowledge of the sequence variability of WMoV isolates is a step toward understanding virus biology, and potentially explaining observed biological variation.

  11. Multi-pathway sequences for MR thermometry

    PubMed Central

    Madore, Bruno; Panych, Lawrence P.; Mei, Chang-Sheng; Yuan, Jing; Chu, Renxin

    2011-01-01

    MR-based thermometry is a valuable adjunct to thermal ablation therapies as it helps to determine when lethal doses are reached at the target and whether surrounding tissues are safe from damage. When the targeted lesion is mobile, MR data can further be used for motion-tracking purposes. The present work introduces pulse sequence modifications that enable significant improvements both in terms of temperature-to-noise-ratio (TNR) properties and target-tracking abilities. Instead of sampling a single magnetization pathway as in typical MR thermometry sequences, the pulse-sequence design introduced here involves sampling at least one additional pathway. Image reconstruction changes associated with the proposed sampling scheme are also described. The method was implemented on two commonly used MR thermometry sequences: the gradient-echo and the interleaved echo-planar imaging (EPI) sequences. Data from the extra pathway enabled TNR improvements by up to 35%, without increasing scan time. Potentially of greater significance is that the sampled pathways featured very different contrast for blood vessels, facilitating their detection and use as internal landmarks for tracking purposes. Through improved TNR and lesion-tracking abilities, the proposed pulse-sequence design may facilitate the use of MR-monitored thermal ablations as an effective treatment option even in mobile organs such as the liver and kidneys. PMID:21394774

  12. A Unified Theoretical Framework for Cognitive Sequencing

    PubMed Central

    Savalia, Tejas; Shukla, Anuj; Bapi, Raju S.

    2016-01-01

    The capacity to sequence information is central to human performance. Sequencing ability forms the foundation stone for higher order cognition related to language and goal-directed planning. Information related to the order of items, their timing, chunking and hierarchical organization are important aspects in sequencing. Past research on sequencing has emphasized two distinct and independent dichotomies: implicit vs. explicit and goal-directed vs. habits. We propose a theoretical framework unifying these two streams. Our proposal relies on the brain's ability to implicitly extract statistical regularities from the stream of stimuli and, with attentional engagement, to organize sequences explicitly and hierarchically. Similarly, sequences that need to be assembled purposively to accomplish a goal require engagement of attentional processes. With repetition, these goal-directed plans become habits with concomitant disengagement of attention. Thus, attention and awareness play a crucial role in the implicit-to-explicit transition as well as in how goal-directed plans become automatic habits. Cortico-subcortical loops (basal ganglia-frontal cortex and hippocampus-frontal cortex loops) mediate the transition process. We show how the computational principles of model-free and model-based learning paradigms, along with a pivotal role for attention and awareness, offer a unifying framework for these two dichotomies. Based on this framework, we make testable predictions related to the potential influence of response-to-stimulus interval (RSI) on developing awareness in implicit learning tasks. PMID:27917146

  13. A neurocomputational model of automatic sequence production.

    PubMed

    Helie, Sebastien; Roeder, Jessica L; Vucovich, Lauren; Rünger, Dennis; Ashby, F Gregory

    2015-07-01

    Most behaviors unfold in time and include a sequence of submovements or cognitive activities. In addition, most behaviors are automatic and repeated daily throughout life. Yet, relatively little is known about the neurobiology of automatic sequence production. Past research suggests a gradual transfer from the associative striatum to the sensorimotor striatum, but a number of more recent studies challenge this role of the BG in automatic sequence production. In this article, we propose a new neurocomputational model of automatic sequence production in which the main role of the BG is to train cortical-cortical connections within the premotor areas that are responsible for automatic sequence production. The new model is used to simulate four different data sets from human and nonhuman animals, including (1) behavioral data (e.g., RTs), (2) electrophysiology data (e.g., single-neuron recordings), (3) macrostructure data (e.g., TMS), and (4) neurological circuit data (e.g., inactivation studies). We conclude with a comparison of the new model with existing models of automatic sequence production and discuss a possible new role for the BG in automaticity and its implication for Parkinson's disease.

  14. Sequence determinants of human microsatellite variability

    PubMed Central

    2009-01-01

    Background Microsatellite loci are frequently used in genomic studies of DNA sequence repeats and in population studies of genetic variability. To investigate the effect of sequence properties of microsatellites on their level of variability we have analyzed genotypes at 627 microsatellite loci in 1,048 worldwide individuals from the HGDP-CEPH cell line panel together with the DNA sequences of these microsatellites in the human RefSeq database. Results Calibrating PCR fragment lengths in individual genotypes by using the RefSeq sequence enabled us to infer repeat number in the HGDP-CEPH dataset and to calculate the mean number of repeats (as opposed to the mean PCR fragment length), under the assumption that differences in PCR fragment length reflect differences in the numbers of repeats in the embedded repeat sequences. We find the mean and maximum numbers of repeats across individuals to be positively correlated with heterozygosity. The size and composition of the repeat unit of a microsatellite are also important factors in predicting heterozygosity, with tetra-nucleotide repeat units high in G/C content leading to higher heterozygosity. Finally, we find that microsatellites containing more separate sets of repeated motifs generally have higher heterozygosity. Conclusions These results suggest that sequence properties of microsatellites have a significant impact in determining the features of human microsatellite variability. PMID:20015383
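
    The calibration step described in the Results can be illustrated with a small sketch. It relies on the same assumption the abstract states, namely that differences in PCR fragment length reflect differences in repeat number; the function names and all numeric values are hypothetical and are not data from the study.

```python
def infer_repeat_number(fragment_length, ref_fragment_length, ref_repeat_number, unit_size):
    """Repeat count implied by a PCR fragment length, calibrated to a reference.

    Assumes every base-pair difference from the reference fragment is due to
    whole or partial repeat units of the given size.
    """
    return ref_repeat_number + (fragment_length - ref_fragment_length) / unit_size


def mean_repeat_number(fragment_lengths, ref_fragment_length, ref_repeat_number, unit_size):
    """Mean inferred repeat number across observed alleles."""
    repeats = [
        infer_repeat_number(length, ref_fragment_length, ref_repeat_number, unit_size)
        for length in fragment_lengths
    ]
    return sum(repeats) / len(repeats)


# Hypothetical tetranucleotide locus: the reference fragment is 200 bp with 10 repeats.
observed = [196, 200, 204, 212]
print(mean_repeat_number(observed, ref_fragment_length=200, ref_repeat_number=10, unit_size=4))
```

    Working in repeat numbers rather than raw fragment lengths makes loci with different flanking-sequence lengths comparable, which is the reason the abstract gives for the calibration.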

  15. Reporting Differences Between Spacecraft Sequence Files

    NASA Technical Reports Server (NTRS)

    Khanampompan, Teerapat; Gladden, Roy E.; Fisher, Forest W.

    2010-01-01

    A suite of computer programs, called seq diff suite, reports differences between the products of other computer programs involved in the generation of sequences of commands for spacecraft. These products consist of files of several types: replacement sequence of events (RSOE), DSN keyword file [DKF (wherein DSN signifies Deep Space Network)], spacecraft activities sequence file (SASF), spacecraft sequence file (SSF), and station allocation file (SAF). These products can include line numbers, request identifications, and other pieces of information that are not relevant when generating command sequence products, though these fields can result in the appearance of many changes to the files, particularly when using the UNIX diff command to inspect file differences. The outputs of prior software tools for reporting differences between such products include differences in these non-relevant pieces of information. In contrast, seq diff suite removes the fields containing the irrelevant pieces of information before processing to extract differences, so that only relevant differences are reported. Thus, seq diff suite is especially useful for reporting changes between successive versions of the various products and, in particular, for flagging differences in fields relevant to the sequence command generation and review process.
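
    The general approach of removing irrelevant fields before differencing can be sketched as follows. This is only an illustration of the idea: the comma-separated key=value record layout, the field names, and the function names are all hypothetical and do not reflect the actual RSOE, DKF, SASF, SSF, or SAF formats or the internals of seq diff suite.

```python
import difflib

# Hypothetical names of fields that should not count as differences.
IRRELEVANT_FIELDS = {"line", "request_id"}


def strip_irrelevant(record):
    """Drop irrelevant key=value fields from a comma-separated record."""
    kept = [
        field.strip()
        for field in record.split(",")
        if field.split("=", 1)[0].strip() not in IRRELEVANT_FIELDS
    ]
    return ",".join(kept)


def relevant_diff(old_records, new_records):
    """Return only the added and removed lines after irrelevant fields are stripped."""
    old = [strip_irrelevant(r) for r in old_records]
    new = [strip_irrelevant(r) for r in new_records]
    diff = list(difflib.unified_diff(old, new, lineterm="", n=0))
    # Skip the two file-header lines and the "@@" hunk markers.
    return [line for line in diff[2:] if not line.startswith("@@")]


old_version = [
    "line=1,request_id=A7,cmd=POWER_ON,time=100",
    "line=2,request_id=A8,cmd=SLEW,time=200",
]
new_version = [
    "line=10,request_id=B1,cmd=POWER_ON,time=100",
    "line=11,request_id=B2,cmd=SLEW,time=250",
]
print(relevant_diff(old_version, new_version))
```

    A plain line-by-line diff of these two toy versions would flag both records because the line numbers and request identifiers changed; after stripping, only the record whose time field changed is reported.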

  16. Sequencing proteins with transverse ionic transport

    NASA Astrophysics Data System (ADS)

    Boynton, Paul; di Ventra, Massimiliano

    2015-03-01

    De novo protein sequencing is essential for understanding cellular processes that govern the function of living organisms. By obtaining the order of the amino acids that compose a given protein, one can determine both its secondary and tertiary structures through protein structure prediction, which is used to create models for protein aggregation diseases such as Alzheimer's Disease. Mass spectrometry is the current technique of choice for de novo sequencing, but because some amino acids have the same mass, the sequence cannot be completely determined in many cases. In this paper we propose a new technique for de novo protein sequencing that involves translocating a polypeptide through a synthetic nanochannel and measuring the ionic current of each amino acid through an intersecting perpendicular nanochannel, similar to that proposed for DNA sequencing. Indeed, we find that the distribution of ionic currents for each of the 20 proteinogenic amino acids encoded by eukaryotic genes is statistically distinct, showing this technique's potential for de novo protein sequencing.

  17. Sequence walkers: a graphical method to display how binding proteins interact with DNA or RNA sequences.

    PubMed Central

    Schneider, T D

    1997-01-01

    A graphical method is presented for displaying how binding proteins and other macromolecules interact with individual bases of nucleotide sequences. Characters representing the sequence are either oriented normally and placed above a line indicating favorable contact, or upside-down and placed below the line indicating unfavorable contact. The positive or negative height of each letter shows the contribution of that base to the average sequence conservation of the binding site, as represented by a sequence logo. These sequence 'walkers' can be stepped along raw sequence data to visually search for binding sites. Many walkers, for the same or different proteins, can be simultaneously placed next to a sequence to create a quantitative map of a complex genetic region. One can alter the sequence to quantitatively engineer binding sites. Database anomalies can be visualized by placing a walker at the recorded positions of a binding molecule and by comparing this to locations found by scanning the nearby sequences. The sequence can also be altered to predict whether a change is a polymorphism or a mutation for the recognizer being modeled. PMID:9336476

  18. Generating matrix and sums of Fibonacci and Pell sequences

    NASA Astrophysics Data System (ADS)

    Ho, C. K.; Woon, H. S.; Chong, Chin-Yoon

    2014-07-01

    In this paper, we study the Fibonacci sequence and the Pell sequence and develop generating matrices for them. First, we prove two results on the even sum of the Fibonacci sequence and the Pell sequence, using the generating matrix approach. We then deduce the odd sums, some identities, and recursive formulas for these two sequences.
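
    The generating matrix approach can be sketched in Python as follows. The matrices [[1, 1], [1, 0]] and [[2, 1], [1, 0]] are the standard generating matrices for the Fibonacci and Pell sequences. The function names are illustrative, "even sum" is assumed here to mean the sum of the even-indexed terms, and the two closed forms checked at the end are well-known identities that are not necessarily the exact results proved in the paper.

```python
def mat_mul(a, b):
    """Multiply two 2x2 integer matrices."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def mat_pow(m, n):
    """Raise a 2x2 matrix to the n-th power by repeated squaring."""
    result = [[1, 0], [0, 1]]  # identity matrix
    while n > 0:
        if n & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        n >>= 1
    return result


FIB = [[1, 1], [1, 0]]   # FIB**n  == [[F(n+1), F(n)], [F(n), F(n-1)]]
PELL = [[2, 1], [1, 0]]  # PELL**n == [[P(n+1), P(n)], [P(n), P(n-1)]]


def term(matrix, n):
    """n-th term of the sequence generated by `matrix` (off-diagonal entry)."""
    return mat_pow(matrix, n)[0][1]


def even_sum(matrix, n):
    """Sum of the even-indexed terms with indices 2, 4, ..., 2n."""
    return sum(term(matrix, 2 * k) for k in range(1, n + 1))


# Check two standard closed forms against the direct sums:
#   F(2) + F(4) + ... + F(2n) = F(2n+1) - 1
#   P(2) + P(4) + ... + P(2n) = (P(2n+1) - 1) / 2
for n in range(1, 10):
    assert even_sum(FIB, n) == term(FIB, 2 * n + 1) - 1
    assert 2 * even_sum(PELL, n) == term(PELL, 2 * n + 1) - 1
```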

  19. Advances in clinical next-generation sequencing: target enrichment and sequencing technologies.

    PubMed

    Ballester, Leomar Y; Luthra, Rajyalakshmi; Kanagal-Shamanna, Rashmi; Singh, Rajesh R

    2016-01-01

    The huge parallel sequencing capabilities of next generation sequencing technologies have made them the tools of choice to characterize genomic aberrations for research and diagnostic purposes. For clinical applications, screening the whole genome or exome is challenging owing to the large genomic area to be sequenced, associated costs, complexity of data, and lack of known clinical significance of all genes. Consequently, routine screening involves limited markers with established clinical relevance. This process, referred to as targeted genome sequencing, requires selective enrichment of the genomic areas comprising these markers via one of several primer or probe-based enrichment strategies, followed by sequencing of the enriched genomic areas. Here, the authors review current target enrichment approaches and next generation sequencing platforms, focusing on the underlying principles, capabilities, and limitations of each technology along with validation and implementation for clinical testing.

  20. Sequencing strategy for the whole mitochondrial genome resulting in high quality sequences

    PubMed Central

    Fendt, Liane; Zimmermann, Bettina; Daniaux, Martin; Parson, Walther

    2009-01-01

    Background It has been demonstrated that a reliable and fail-safe sequencing strategy is mandatory for high-quality analysis of mitochondrial (mt) DNA, as the sequencing and base-calling process is prone to error. Here, we present a high-quality, reliable and easy-to-handle manual procedure for the sequencing of full mt genomes that is also appropriate for laboratories where fully automated processes are not available. Results We amplified whole mitochondrial genomes as two overlapping PCR fragments, each comprising about 8500 bases. We developed a set of 96 primers that can be applied to a (manual) 96 well-based technology, which resulted in at least double-strand sequence coverage of the entire coding region (codR). Conclusion This elaborated sequencing strategy is straightforward and allows for an unambiguous sequence analysis and interpretation including sometimes challenging phenomena such as point and length heteroplasmy that are relevant for the investigation of forensic and clinical samples. PMID:19331681

  1. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene

    PubMed Central

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the ‘CCCGCC’ motif in the GFP coding sequence. PMID:27193250

  2. An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm

    PubMed Central

    Kumar, Manish

    2015-01-01

    One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). The basic problem in multiple sequence alignment is to determine the most biologically plausible alignments of protein or DNA sequences. In this paper, an alignment method using a genetic algorithm for multiple sequence alignment has been proposed. Two different genetic operators, mainly crossover and mutation, were defined and implemented with the proposed method in order to track the population evolution and the quality of the aligned sequences. The proposed method is assessed with a protein benchmark dataset, e.g., BAliBASE, by comparing the obtained results to those obtained with other alignment algorithms, e.g., SAGA, RBT-GA, PRRP, HMMT, SB-PIMA, CLUSTALX, CLUSTAL W, DIALIGN and PILEUP8 etc. Experiments on a wide range of data have shown that the proposed algorithm is much better (in terms of score) than previously proposed algorithms in its ability to achieve high alignment quality. PMID:27065770

  3. Microbial Contamination in Next Generation Sequencing: Implications for Sequence-Based Analysis of Clinical Samples

    PubMed Central

    Strong, Michael J.; Xu, Guorong; Morici, Lisa; Splinter Bon-Durant, Sandra; Baddoo, Melody; Lin, Zhen; Fewell, Claire; Taylor, Christopher M.; Flemington, Erik K.

    2014-01-01

    The high level of accuracy and sensitivity of next generation sequencing for quantifying genetic material across organismal boundaries gives it tremendous potential for pathogen discovery and diagnosis in human disease. Despite this promise, substantial bacterial contamination is routinely found in existing human-derived RNA-seq datasets that likely arises from environmental sources. This raises the need for stringent sequencing and analysis protocols for studies investigating sequence-based microbial signatures in clinical samples. PMID:25412476

  4. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    PubMed Central

    2011-01-01

    Background Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/. PMID:21385349

  5. Anomaly Detection for Discrete Sequences: A Survey

    SciTech Connect

    Chandola, Varun; Banerjee, Arindam; Kumar, Vipin

    2012-01-01

    This survey attempts to provide a comprehensive and structured overview of the existing research for the problem of detecting anomalies in discrete/symbolic sequences. The objective is to provide a global understanding of the sequence anomaly detection problem and how existing techniques relate to each other. The key contribution of this survey is the classification of the existing research into three distinct categories, based on the problem formulation that they are trying to solve. These problem formulations are: 1) identifying anomalous sequences with respect to a database of normal sequences; 2) identifying an anomalous subsequence within a long sequence; and 3) identifying a pattern in a sequence whose frequency of occurrence is anomalous. We show how each of these problem formulations is characteristically distinct from each other and discuss their relevance in various application domains. We review techniques from many disparate and disconnected application domains that address each of these formulations. Within each problem formulation, we group techniques into categories based on the nature of the underlying algorithm. For each category, we provide a basic anomaly detection technique, and show how the existing techniques are variants of the basic technique. This approach shows how different techniques within a category are related or different from each other. Our categorization reveals new variants and combinations that have not been investigated before for anomaly detection. We also provide a discussion of relative strengths and weaknesses of different techniques. We show how techniques developed for one problem formulation can be adapted to solve a different formulation, thereby providing several novel adaptations to solve the different problem formulations. We also highlight the applicability of the techniques that handle discrete sequences to other related areas such as online anomaly detection and time series anomaly detection.
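
    As an illustration of the first problem formulation (scoring a test sequence against a database of normal sequences), the following Python sketch uses a simple fixed-length window lookup. It is one elementary window-based scheme chosen for illustration, not necessarily the basic technique the survey presents, and the function names and toy sequences are assumptions.

```python
def windows(seq, k):
    """All contiguous length-k windows of a sequence, as tuples."""
    return [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def train(normal_sequences, k):
    """Collect every length-k window observed in the normal database."""
    seen = set()
    for seq in normal_sequences:
        seen.update(windows(seq, k))
    return seen


def anomaly_score(seq, seen, k):
    """Fraction of the test sequence's windows never seen in normal data."""
    test_windows = windows(seq, k)
    if not test_windows:
        return 0.0  # sequence shorter than k: nothing to score
    unseen = sum(1 for w in test_windows if w not in seen)
    return unseen / len(test_windows)


K = 3
normal_db = ["abcabcabc", "bcabcabca"]
model = train(normal_db, K)

normal_score = anomaly_score("abcabcabc", model, K)  # every window is known
odd_score = anomaly_score("abcxbcabc", model, K)     # the 'x' breaks three windows
print(normal_score, odd_score)
```

    A threshold on the score would turn this into a classifier; how to choose the window length and the threshold is outside what the abstract specifies.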

  6. Application of sequence stratigraphy to modern sediments

    SciTech Connect

    Suter, J.R.; Boyd, R.; Penland, S.

    1988-02-01

    The concept of sequence stratigraphy provides a genetically linked depositional history of sea level and sedimentary processes which has been effectively used to interpret ancient deposits. Few efforts have been made to apply sequence stratigraphy to modern sediments, primarily due to differences in scale and the million(s)-year time span required to develop the lower-order depositional sequences upon which the concept is based. An extensive high-resolution seismic and vibracore database compiled over the last six years on the Louisiana continental shelf, coupled with published information, allows an application of sequence stratigraphy to the Mississippi River delta (MRD) system. Each of the major elements of the model has a counterpart in the MRD sequence. Eustatic fall beginning some 27,000 years ago resulted in fluvial downcutting and subaerial exposure of the continental shelf, creating incised valleys, a Type 1 unconformity, and a lowstand wedge. The incised valleys filled as sea level began rising some 18,000 years ago, until a series of backstepping shelf-phase deltas were deposited during relative stillstands, onlapping the Type 1 unconformity as a transgressive systems tract. Once sea level reached its current position about 3000 years ago, continuing deltaic deposition initiated the highstand systems tract, which has reached the shelf margin in the form of the Balize delta complex. Major depocenters like the MRD are able to form deposits analogous to lower-order sequences in time spans several orders of magnitude less than thought necessary, raising questions regarding the driving mechanism and time scale of the sequence stratigraphy approach.

  7. Sequence conservation on the Y chromosome

    SciTech Connect

    Gibson, L.H.; Yang-Feng, L.; Lau, C.

    1994-09-01

    The Y chromosome is present in all mammals and is considered to be essential to sex determination. Despite intense genomic research, only a few genes have been identified and mapped to this chromosome in humans. Several of them, such as SRY and ZFY, have been demonstrated to be conserved and Y-located in other mammals. In order to address the issue of sequence conservation on the Y chromosome, we performed fluorescence in situ hybridization (FISH) with DNA from a human Y cosmid library as a probe to study the Y chromosomes from other mammalian species. Total DNA from 3,000-4,500 cosmid pools was labeled with biotinylated-dUTP and hybridized to metaphase chromosomes. For human and primate preparations, human cot1 DNA was included in the hybridization mixture to suppress the hybridization from repeat sequences. FISH signals were detected on the Y chromosomes of human, gorilla, orangutan and baboon (Old World monkey) and were absent on those of squirrel monkey (New World monkey), Indian muntjac, wood lemming, Chinese hamster, rat and mouse. Since sequence analysis suggested that specific genes, e.g. SRY and ZFY, are conserved between these two groups, the lack of detectable hybridization in the latter group implies either that conservation of the human Y sequences is limited to the Y chromosomes of the great apes and Old World monkeys, or that the size of the syntenic segment is too small to be detected under the resolution of FISH, or that homologous sequences have undergone considerable divergence. Further studies with reduced hybridization stringency are currently being conducted. Our results provide some clues as to Y-sequence conservation across species and demonstrate the limitations of FISH across species with total DNA sequences from a particular chromosome.

  8. Two Hybrid Algorithms for Multiple Sequence Alignment

    NASA Astrophysics Data System (ADS)

    Naznin, Farhana; Sarker, Ruhul; Essam, Daryl

    2010-01-01

    In order to design life-saving drugs, such as cancer drugs, the design of Protein or DNA structures has to be accurate. These structures depend on Multiple Sequence Alignment (MSA). MSA is used to find the accurate structure of Protein and DNA sequences from existing approximately correct sequences. To overcome the overly greedy nature of the well-known global progressive alignment method for multiple sequence alignment, we have proposed two different algorithms in this paper; one uses an iterative approach with a progressive alignment method (PAMIM) and the second uses a genetic algorithm with a progressive alignment method (PAMGA). Both of our methods start with a "kmer" distance table to generate a single guide tree. In the iterative approach, we have introduced two new techniques: the first is to generate guide trees with randomly selected sequences and the second is to shuffle the sequences inside that tree. The output of the tree is a multiple sequence alignment which has been evaluated by the Sum of Pairs Method (SPM) considering the real value data from PAM250. In our second GA approach, these two techniques are used to generate an initial population and also two different approaches of genetic operators are implemented in crossovers and mutation. To test the performance of our two algorithms, we have compared these with the existing well-known methods: T-Coffee, MUSCLE, MAFFT and ProbCons, using BAliBASE benchmarks. The experimental results show that the first algorithm works well for some situations, where other existing methods face difficulties in obtaining better solutions. The proposed second method works well compared to the existing methods for all situations and it shows better performance over the first one.
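
    The sum-of-pairs evaluation mentioned in this abstract can be sketched in Python as below. The paper scores residue pairs with PAM250 values; to stay self-contained, this sketch substitutes a toy scheme (match +1, mismatch -1, residue against gap -2, gap against gap 0), so the numbers are illustrative only and the function names are assumptions.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Toy pairwise score; a real run would look up PAM250 values instead."""
    if a == "-" and b == "-":
        return 0  # two gaps aligned with each other contribute nothing
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment):
    """Sum pair_score over every pair of rows in every alignment column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    return sum(
        pair_score(a, b)
        for column in zip(*alignment)
        for a, b in combinations(column, 2)
    )


# Two candidate alignments of three toy sequences; the higher score wins.
candidate_1 = ["AC-T", "ACGT", "A-GT"]
candidate_2 = ["ACGT", "ACGT", "A-GT"]
print(sum_of_pairs(candidate_1), sum_of_pairs(candidate_2))
```

    In an iterative or genetic search, a function of this kind would serve as the objective used to compare candidate alignments.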

  9. Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction.

    PubMed

    Laehnemann, David; Borkhardt, Arndt; McHardy, Alice Carolyn

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance on which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here.

  10. Detecting and Analyzing DNA Sequencing Errors: Toward a Higher Quality of the Bacillus subtilis Genome Sequence

    PubMed Central

    Médigue, Claudine; Rose, Matthias; Viari, Alain; Danchin, Antoine

    1999-01-01

    During the determination of a DNA sequence, the introduction of artifactual frameshifts and/or in-frame stop codons in putative genes can lead to misprediction of gene products. Detection of such errors with a method based on protein similarity matching is only possible when related sequences are available in databases. Here, we present a method to detect frameshift errors in DNA sequences that is based on the intrinsic properties of the coding sequences. It combines the results of two analyses, the search for translational initiation/termination sites and the prediction of coding regions. This method was used to screen the complete Bacillus subtilis genome sequence and the regions flanking putative errors were resequenced for verification. This procedure allowed us to correct the sequence and to analyze in detail the nature of the errors. Interestingly, in several cases in-frame termination codons or frameshifts were not sequencing errors but confirmed to be present in the chromosome, indicating that the genes are either nonfunctional (pseudogenes) or subject to regulatory processes such as programmed translational frameshifts. The method can be used for checking the quality of the sequences produced by any prokaryotic genome sequencing project. PMID:10568751

  11. Sequence variation of 22 autosomal STR loci detected by next generation sequencing.

    PubMed

    Gettings, Katherine Butler; Kiesler, Kevin M; Faith, Seth A; Montano, Elizabeth; Baker, Christine H; Young, Brian A; Guerrieri, Richard A; Vallone, Peter M

    2016-03-01

    Sequencing short tandem repeat (STR) loci allows for determination of repeat motif variations within the STR (or entire PCR amplicon) which cannot be ascertained by size-based PCR fragment analysis. Sanger sequencing has been used in research laboratories to further characterize STR loci, but is impractical for routine forensic use due to the laborious nature of the procedure in general and additional steps required to separate heterozygous alleles. Recent advances in library preparation methods enable high-throughput next generation sequencing (NGS) and technological improvements in sequencing chemistries now offer sufficient read lengths to encompass STR alleles. Herein, we present sequencing results from 183 DNA samples, including African American, Caucasian, and Hispanic individuals, at 22 autosomal forensic STR loci using an assay designed for NGS. The resulting dataset has been used to perform population genetic analyses of allelic diversity by length compared to sequence, and exemplifies which loci are likely to achieve the greatest gains in discrimination via sequencing. Within this data set, six loci demonstrate greater than double the number of alleles obtained by sequence compared to the number of alleles obtained by length: D12S391, D2S1338, D21S11, D8S1179, vWA, and D3S1358. As expected, repeat region sequences which had not previously been reported in forensic literature were identified.

  12. Mixed sequence reader: a program for analyzing DNA sequences with heterozygous base calling.

    PubMed

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; and (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.

  13. Megabase sequencing of human genome by ordered-shotgun-sequencing (OSS) strategy

    NASA Astrophysics Data System (ADS)

    Chen, Ellson Y.

    1997-05-01

    So far we have used the OSS strategy to sequence over 2 megabases of DNA in large-insert clones from regions of human X chromosomes with different characteristic levels of GC content. The method starts by randomly fragmenting a BAC, YAC or PAC to 8-12 kb pieces and subcloning those into lambda phage. Insert-ends of these clones are sequenced and overlapped to create a partial map. Complete sequencing is then done on a minimal tiling path of selected subclones, recursively focusing on those at the edges of contigs to facilitate mergers of clones across the entire target. To reduce manual labor, PCR processes have been adapted to prepare sequencing templates throughout the entire operation. The streamlined process can thus lend itself to further automation. The OSS approach is suitable for large-scale genomic sequencing, providing considerable flexibility in the choice of subclones or regions for more or less intensive sequencing. For example, subclones containing contaminating host cell DNA or cloning vector can be recognized and ignored with minimal sequencing effort; regions overlapping a neighboring clone already sequenced need not be redone; and segments containing tandem repeats or long repetitive sequences can be spotted early on and targeted for additional attention.

  14. Does protein relatedness require sequence matching? Alignment via networks in sequence space.

    PubMed

    Frenkel, Zakharia M

    2008-10-01

    To establish the possible function of a newly discovered protein, alignment of its sequence with other known sequences is required. When the similarity is marginal, the function remains uncertain. A principally new approach is suggested: to use networks in the protein sequence space. The functionality of the protein is firmly established via networks forming chains of consecutive pair-wise matching fragments. The distant relatives are, thus, considered as relatives, though in some cases there is not even a sequence match between the ends of the chain, while the entire chain belongs to the same functional and structural network.

  15. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid...

  16. Sequence variation of 22 autosomal STR loci detected by next generation sequencing

    PubMed Central

    Gettings, Katherine Butler; Kiesler, Kevin M.; Faith, Seth A.; Montano, Elizabeth; Baker, Christine H.; Young, Brian A.; Guerrieri, Richard A.; Vallone, Peter M.

    2016-01-01

    Sequencing short tandem repeat (STR) loci allows for determination of repeat motif variations within the STR (or entire PCR amplicon) which cannot be ascertained by size-based PCR fragment analysis. Sanger sequencing has been used in research laboratories to further characterize STR loci, but is impractical for routine forensic use due to the laborious nature of the procedure in general and additional steps required to separate heterozygous alleles. Recent advances in library preparation methods enable high-throughput next generation sequencing (NGS) and technological improvements in sequencing chemistries now offer sufficient read lengths to encompass STR alleles. Herein, we present sequencing results from 183 DNA samples, including African American, Caucasian, and Hispanic individuals, at 22 autosomal forensic STR loci using an assay designed for NGS. The resulting dataset has been used to perform population genetic analyses of allelic diversity by length compared to sequence, and exemplifies which loci are likely to achieve the greatest gains in discrimination via sequencing. Within this data set, six loci demonstrate greater than double the number of alleles obtained by sequence compared to the number of alleles obtained by length: D12S391, D2S1338, D21S11, D8S1179, vWA, and D3S1358. As expected, repeat region sequences which had not previously been reported in forensic literature were identified. PMID:26701720

  17. Fibonacci-like sequences and generalized Pascal's triangles

    NASA Astrophysics Data System (ADS)

    Vincenzi, G.; Siani, S.

    2014-05-01

    The properties pertaining to diagonals of generalized Pascal's triangles are studied. Combinatorial relationships between Fibonacci-like sequences and the Fibonacci sequence itself are determined, using the sequence of diagonals of the generalized Pascal's triangle.
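
    For the classical (non-generalized) Pascal's triangle, the link between diagonals and the Fibonacci sequence is a standard fact: summing the binomial coefficients along the n-th shallow diagonal gives F(n+1). The Python sketch below checks only this classical case. The generalized triangles studied in the paper are not defined in the abstract, so they are not modeled here, and the function names are illustrative.

```python
from math import comb


def diagonal_sum(n):
    """Sum of C(n-k, k) along the n-th shallow diagonal of Pascal's triangle."""
    return sum(comb(n - k, k) for k in range(n // 2 + 1))


def fibonacci(n):
    """n-th Fibonacci number with F(0) = 0 and F(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# The diagonal sums reproduce the Fibonacci sequence shifted by one index.
for n in range(20):
    assert diagonal_sum(n) == fibonacci(n + 1)
```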

  18. Direct sequencing of the human microbiome readily reveals community differences

    PubMed Central

    2010-01-01

    Culture-independent studies of human microbiota by direct genomic sequencing reveal quite distinct differences among communities, indicating that improved sequencing capacity can be most wisely utilized to study more samples, rather than more sequences per sample. PMID:20441597

  19. Developmental contributions to motor sequence learning.

    PubMed

    Savion-Lemieux, Tal; Bailey, Jennifer A; Penhune, Virginia B

    2009-05-01

    Little is known about how children acquire new motor sequences. In particular, it is not clear if the same learning progression observed in adults is also present in childhood nor whether motor skills are acquired in a similar fashion across development. In the present study we used the multi-finger sequencing task (MFST), a variant of the serial reaction time (SRT) task, to study motor sequence learning, across two consecutive days, in three cross-sectional samples of children aged 6, 8, and 10 years, and a control sample of adults. In the MFST, participants reproduced 10-element sequences of key presses on an electronic keyboard, using four fingers of the right hand. Each block of practice included 10 intermixed trials of a Repeated (REP) sequence and four trials of Random (RAN) sequences. Performance was assessed by examining changes in accuracy, a component of the task that requires the association of the visual stimulus with the motor response, and response synchronization, a component that requires fine-grained sensorimotor integration and timing. Additionally, participants completed Recognition and Recall tests, to assess explicit knowledge of the repeated sequence. Overall, results showed a developmental progression in motor sequence learning within and across days of practice. Interestingly, the two behavioral measures showed different developmental trajectories. For accuracy, differences were greatest for the two youngest groups early in learning, and these groups also showed the greatest rate of improvement. However, by the end of Day 2, only the 6-year-olds still lagged behind all other groups. For response synchronization, all child groups differed from adults early in learning, but both child and adult groups showed similar rates of improvement across blocks of practice. By the end of Day 2, 10-year-olds reached adult levels of performance, whereas 6- and 8-year-olds did not. 
    Taken together, the dissociation observed with our two behavioral measures

  20. Therapeutic vaccination reduces HIV sequence variability.

    PubMed

    Hoffmann, Dieter; Seebach, Judith; Cosma, Antonio; Goebel, Frank D; Strimmer, Korbinian; Schätzl, Hermann M; Erfle, Volker

    2008-02-01

    With HIV persisting lifelong in infected persons, therapeutic vaccination is a novel alternative concept to control virus replication. Even though CD8 and CD4 cell responses to such immunizations have been demonstrated, their effects on virus replication are still unclear. In view of this fact, we studied the impact of a therapeutic vaccination with HIV nef delivered by a recombinant modified vaccinia Ankara vector on viral diversity. We investigated HIV sequences derived from chronically infected persons before and after therapeutic vaccination. Before immunization the mean +/- se pairwise variability of patient-derived Nef protein sequences was 0.1527 +/- 0.0041. After vaccination the respective value was 0.1249 +/- 0.0042, resulting in a significant (P<0.0001) difference between the two time points. The genes vif and 5'gag tested in parallel and nef sequences in control persons yielded a constant amino acid sequence variation. The data presented suggest that Nef immunization induced a selective pressure, limiting HIV sequence variability. To our knowledge this is the first report directly linking therapeutic HIV vaccination to decreasing diversity in patient-derived virus isolates.

  1. California foreshock sequences suggest aseismic triggering process

    NASA Astrophysics Data System (ADS)

    Chen, Xiaowei; Shearer, Peter M.

    2013-06-01

    Foreshocks are one of the few well-documented precursors to large earthquakes; therefore, understanding their nature is very important for earthquake prediction and hazard mitigation. However, the triggering role of foreshocks is not yet clear. It is possible that foreshocks are a self-triggering cascade of events that simply happen to trigger an unusually large aftershock; alternatively, foreshocks might originate from an external aseismic process that ultimately triggers the mainshock. In the former case, the foreshocks will have limited utility for forecasting. The latter case has been observed for several individual large earthquakes; however, it remains unclear how common it is and how to distinguish foreshock sequences from other seismicity clusters that do not lead to large earthquakes. Here we analyze foreshocks of three M>7 mainshocks in southern California. These foreshock sequences appear similar to earthquake swarms, in that they do not start with their largest events and they exhibit spatial migration of seismicity. Analysis of source spectra shows that all three foreshock sequences feature lower average stress drops and depletion of high-frequency energy compared with the aftershocks of their corresponding mainshocks. Using a longer-term stress-drop catalog, we find that the average stress drop of the Landers and Hector Mine foreshock sequences is comparable to nearby swarms. Our observations suggest that these foreshock sequences are manifestations of aseismic transients occurring close to the mainshock hypocenters, possibly related to localized fault zone complexity, which have promoted the occurrence of both the foreshocks and the eventual mainshock.

  2. Inferring ethnicity from mitochondrial DNA sequence

    PubMed Central

    2011-01-01

    Background: The assignment of DNA samples to coarse population groups can be a useful but difficult task. One such example is the inference of coarse ethnic groupings for forensic applications. Ethnicity plays an important role in forensic investigation and can be inferred with the help of genetic markers. Being maternally inherited, present in high copy number, and robustly persistent in degraded samples, mitochondrial DNA may be useful for inferring coarse ethnicity. In this study, we compare the performance of methods for inferring ethnicity from the sequence of the hypervariable region of the mitochondrial genome. Results: We present the results of comprehensive experiments conducted on datasets extracted from the mtDNA population database, showing that ethnicity inference based on support vector machines (SVM) achieves an overall accuracy of 80-90%, consistently outperforming nearest neighbor and discriminant analysis methods previously proposed in the literature. We also evaluate methods of handling missing data and characterize the most informative segments of the hypervariable region of the mitochondrial genome. Conclusions: Support vector machines can be used to infer coarse ethnicity from a small region of mitochondrial DNA sequence with surprisingly high accuracy. In the presence of missing data, utilizing only the regions common to the training sequences and a test sequence proves to be the best strategy. Given these results, SVM algorithms are likely to also be useful in other DNA sequence classification applications. PMID:21554759

  3. Intranuclear Anchoring of Repetitive DNA Sequences

    PubMed Central

    Weipoltshammer, Klara; Schöfer, Christian; Almeder, Marlene; Philimonenko, Vlada V.; Frei, Klemens; Wachtler, Franz; Hozák, Pavel

    1999-01-01

    Centromeres, telomeres, and ribosomal gene clusters consist of repetitive DNA sequences. To assess their contributions to the spatial organization of the interphase genome, their interactions with the nucleoskeleton were examined in quiescent and activated human lymphocytes. The nucleoskeletons were prepared using “physiological” conditions. The resulting structures were probed for specific DNA sequences of centromeres, telomeres, and ribosomal genes by in situ hybridization; the electroeluted DNA fractions were examined by blot hybridization. In both nonstimulated and stimulated lymphocytes, centromeric alpha-satellite repeats were almost exclusively found in the eluted fraction, while telomeric sequences remained attached to the nucleoskeleton. Ribosomal genes showed a transcription-dependent attachment pattern: in unstimulated lymphocytes, transcriptionally inactive ribosomal genes located outside the nucleolus were eluted completely. When comparing transcription unit and intergenic spacer, significantly more of the intergenic spacer was removed. In activated lymphocytes, considerable but similar amounts of both rDNA fragments were eluted. The results demonstrate that: (a) the various repetitive DNA sequences differ significantly in their intranuclear anchoring, (b) telomeric rather than centromeric DNA sequences form stable attachments to the nucleoskeleton, and (c) different attachment mechanisms might be responsible for the interaction of ribosomal genes with the nucleoskeleton. PMID:10613900

  4. Sequence analysis by iterated maps, a review.

    PubMed

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
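
    The iterated map behind the Chaos Game Representation named in this abstract can be sketched in a few lines. The sketch below is illustrative only: the function name `cgr_points` and the assignment of A, C, G, T to the corners of the unit square are assumptions, not details taken from the review.

```python
def cgr_points(seq):
    """Return the Chaos Game Representation trajectory of a DNA string.

    Each step moves the current point halfway toward the corner
    assigned to the next base: x_{i+1} = (x_i + corner) / 2.
    """
    # Assumed corner layout; other layouts are equally valid.
    corners = {"A": (0.0, 0.0), "C": (0.0, 1.0),
               "G": (1.0, 1.0), "T": (1.0, 0.0)}
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        cx, cy = corners[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cell(point, k):
    """Index of the sub-square of side 2**-k that contains the point."""
    return (int(point[0] * 2 ** k), int(point[1] * 2 ** k))


# Two sequences that share the suffix "GT" end in the same 1/4-sized cell.
print(cell(cgr_points("ACGT")[-1], 2), cell(cgr_points("TTGT")[-1], 2))
```

    Sequences that share a suffix of length k end in the same sub-square of side 2^-k, which is one way to read the scale-free property the abstract refers to.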

  5. Entropy estimation of very short symbolic sequences

    NASA Astrophysics Data System (ADS)

    Lesne, Annick; Blanc, Jean-Luc; Pezard, Laurent

    2009-04-01

    While entropy per unit time is a meaningful index to quantify the dynamic features of experimental time series, its estimation is often hampered in practice by the finite length of the data. We here investigate the performance of entropy estimation procedures, relying either on block entropies or Lempel-Ziv complexity, when only very short symbolic sequences are available. Heuristic analytical arguments point at the influence of temporal correlations on the bias and statistical fluctuations, and put forward a reduced effective sequence length suitable for error estimation. Numerical studies are conducted using, as benchmarks, the wealth of different dynamic regimes generated by the family of logistic maps and stochastic evolutions generated by a Markov chain of tunable correlation time. Practical guidelines and validity criteria are proposed. For instance, block entropy leads to a dramatic overestimation for sequences of low entropy, whereas it outperforms Lempel-Ziv complexity at high entropy. As a general result, the quality of entropy estimation is sensitive to the sequence temporal correlation hence self-consistently depends on the entropy value itself, thus promoting a two-step procedure. Lempel-Ziv complexity is to be preferred in the first step and remains the best estimator for highly correlated sequences.
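
    The two families of estimators compared in this abstract can be illustrated with a minimal sketch. It assumes the plain plug-in block entropy (word frequencies over overlapping windows) and the Lempel-Ziv 1976 phrase count; the function names are hypothetical, and the exact estimators and normalizations used in the paper are not given in the abstract.

```python
from collections import Counter
from math import log2


def block_entropy(s, n):
    """Plug-in Shannon entropy (bits) of overlapping words of length n."""
    words = [s[i:i + n] for i in range(len(s) - n + 1)]
    total = len(words)
    counts = Counter(words)
    return -sum(c / total * log2(c / total) for c in counts.values())


def lz76_complexity(s):
    """Number of phrases in the Lempel-Ziv (1976) parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs starting before i
        # (the earlier occurrence may overlap the current phrase).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


seq = "0001101001000101"
print(block_entropy(seq, 1), lz76_complexity(seq))
```

    A difference of successive block entropies, `block_entropy(s, n + 1) - block_entropy(s, n)`, gives a simple entropy-rate estimate; for short sequences the number of distinct words quickly exceeds what the data can sample, which is the finite-length problem the abstract addresses.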

  6. Entropy estimation of very short symbolic sequences.

    PubMed

    Lesne, Annick; Blanc, Jean-Luc; Pezard, Laurent

    2009-04-01

    While entropy per unit time is a meaningful index to quantify the dynamic features of experimental time series, its estimation is often hampered in practice by the finite length of the data. We here investigate the performance of entropy estimation procedures, relying either on block entropies or Lempel-Ziv complexity, when only very short symbolic sequences are available. Heuristic analytical arguments point at the influence of temporal correlations on the bias and statistical fluctuations, and put forward a reduced effective sequence length suitable for error estimation. Numerical studies are conducted using, as benchmarks, the wealth of different dynamic regimes generated by the family of logistic maps and stochastic evolutions generated by a Markov chain of tunable correlation time. Practical guidelines and validity criteria are proposed. For instance, block entropy leads to a dramatic overestimation for sequences of low entropy, whereas it outperforms Lempel-Ziv complexity at high entropy. As a general result, the quality of entropy estimation is sensitive to the sequence temporal correlation hence self-consistently depends on the entropy value itself, thus promoting a two-step procedure. Lempel-Ziv complexity is to be preferred in the first step and remains the best estimator for highly correlated sequences.

  7. Human Genome Sequencing in Health and Disease

    PubMed Central

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  8. Application of sequence stratigraphy to modern sediments

    SciTech Connect

    Suter, J.R.; Boyd, R.; Penland, S.

    1988-01-01

    The concept of sequence stratigraphy provides a genetically linked depositional history of sea level and sedimentary processes which has been effectively used to interpret ancient deposits. Few efforts have been made to apply sequence stratigraphy to modern sediments, primarily due to differences in scale and the million(s)-year time span required to develop the lower-order depositional sequences upon which the concept is based. An extensive high-resolution seismic and vibracore database compiled over the last six years on the Louisiana continental shelf, coupled with published information, allows an application of sequence stratigraphy to the Mississippi River delta (MRD) system. Each of the major elements of the model has a counterpart in the MRD sequence. Eustatic fall beginning some 27,000 years ago resulted in fluvial downcutting and subaerial exposure of the continental shelf, creating incised valleys, a Type 1 unconformity, and a lowstand wedge. The incised valleys filled as sea level began rising some 18,000 years ago, until a series of backstepping shelf-phase deltas were deposited during relative stillstands, onlapping the Type 1 unconformity as a transgressive systems tract. Once sea level reached its current position about 3,000 years ago, continuing deltaic deposition initiated the highstand systems tract, which has reached the shelf margin in the form of the Balize delta complex.

  9. Imaging of DNA sequences with chemiluminescence.

    PubMed Central

    Tizard, R; Cate, R L; Ramachandran, K L; Wysk, M; Voyta, J C; Murphy, O J; Bronstein, I

    1990-01-01

    We have coupled a chemiluminescent detection method that uses an alkaline phosphatase label to the genomic DNA sequencing protocol of Church and Gilbert [Church, G. M. & Gilbert, W. (1984) Proc. Natl. Acad. Sci. USA 81, 1991-1995]. Images of sequence ladders are obtained on x-ray film with exposure times of less than 30 min, as compared to 40 h required for a similar exposure with a 32P-labeled oligomer. Chemically cleaved DNA from a sequencing gel is transferred to a nylon membrane, and specific sequence ladders are selected by hybridization to DNA oligonucleotides labeled with alkaline phosphatase or with biotin, leading directly or indirectly to deposition of enzyme. If a biotinylated probe is used, an incubation with avidin-alkaline phosphatase conjugate follows. The membrane is soaked in the chemiluminescent substrate (AMPPD) and is exposed to film. Dephosphorylation of AMPPD leads in a two-step pathway to a highly localized emission of visible light. The demonstrated shorter exposure times may improve the efficiency of a serial reprobing strategy such as the multiplex sequencing approach of Church and Kieffer-Higgins [Church, G. M. & Kieffer-Higgins, S. (1988) Science 240, 185-188]. PMID:2191292

  10. RSAT 2015: Regulatory Sequence Analysis Tools

    PubMed Central

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-01-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  11. RSAT 2015: Regulatory Sequence Analysis Tools.

    PubMed

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/.

  12. Directed forgetting benefits motor sequence encoding.

    PubMed

    Tempel, Tobias; Frings, Christian

    2016-04-01

    Two experiments investigated directed forgetting of newly learned motor sequences. Concurrently with the list method of directed forgetting, participants successively learned two lists of motor sequences. Each sequence consisted of four consecutive finger movements. After a short distractor task, a recall test was given. Both experiments compared a forget group that was instructed to forget list-1 items with a remember group not receiving a forget instruction. We found that the instruction to forget list 1 enhanced recall of subsequently learned motor sequences. This benefit of directed forgetting occurred independently of costs for list 1. A mediation analysis showed that the encoding accuracy of list 2 was a mediator of the recall benefit, that is, the more accurate execution of motor sequences of list 2 after receiving a forget instruction for list 1 accounted for better recall of list 2. Thus, the adaptation of the list method to motor action provided more direct evidence on the effect of directed forgetting on subsequent learning. The results corroborate the assumption of a reset of encoding as a consequence of directed forgetting.

  13. Human genome sequencing in health and disease.

    PubMed

    Gonzaga-Jauregui, Claudia; Lupski, James R; Gibbs, Richard A

    2012-01-01

    Following the "finished," euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges.

  14. Detection of aneuploidies by paralogous sequence quantification

    PubMed Central

    Deutsch, S; Choudhury, U; Merla, G; Howald, C; Sylvan, A; Antonarakis, S

    2004-01-01

    Background: Chromosomal aneuploidies are a common cause of congenital disorders associated with cognitive impairment and multiple dysmorphic features. Pre-natal diagnosis of aneuploidies is most commonly performed by the karyotyping of fetal cells obtained by amniocentesis or chorionic villus sampling, but this method is labour intensive and requires about 14 days to complete. Methods: We have developed a PCR-based method for the detection of targeted chromosome number abnormalities termed paralogous sequence quantification (PSQ), based on the use of paralogous genes. Paralogous sequences have a high degree of sequence identity, but accumulate nucleotide substitutions in a locus-specific manner. These sequence differences, which we term paralogous sequence mismatches (PSMs), can be quantified using pyrosequencing technology, to estimate the relative dosage between different chromosomes. We designed 10 assays for the detection of trisomies of chromosomes 13, 18, and 21 and sex chromosome aneuploidies. Results: We evaluated the performance of this method on 175 DNAs, highly enriched for abnormal samples. A correct and unambiguous diagnosis was given for 119 out of 120 aneuploid samples as well as for all the controls. One sample which gave an intermediate value for the chromosome 13 assays could not be diagnosed. Conclusions: Our data suggest that PSQ is a robust, easy to interpret, and easy to set up method for the diagnosis of common aneuploidies, and can be performed in less than 48 h, representing a competitive alternative for widespread use in diagnostic laboratories. PMID:15591276
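
    The dosage reasoning behind PSQ can be illustrated with simple arithmetic. The sketch below assumes an idealized assay in which both paralogs amplify equally and the reference paralog sits on a chromosome present in two copies; the function name is hypothetical, and the calibration and decision thresholds of the published assays are not described in the abstract.

```python
def expected_target_fraction(target_copies, reference_copies=2):
    """Expected share of the mismatch signal from the target-chromosome paralog.

    Assumes equal amplification of both paralogs (an idealization).
    """
    return target_copies / (target_copies + reference_copies)


for label, copies in (("monosomy", 1), ("disomy", 2), ("trisomy", 3)):
    print(f"{label}: {expected_target_fraction(copies):.3f}")
```

    Under these assumptions a normal sample gives a fraction of 0.5 and a trisomic sample 0.6, so the method depends on resolving a shift of about ten percentage points at each paralogous sequence mismatch.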

  15. Sequence analysis of the AAA protein family.

    PubMed Central

    Beyer, A.

    1997-01-01

    The AAA protein family, a recently recognized group of Walker-type ATPases, has been subjected to an extensive sequence analysis. Multiple sequence alignments revealed the existence of a region of sequence similarity, the so-called AAA cassette. The borders of this cassette were localized and within it, three boxes of a high degree of conservation were identified. Two of these boxes could be assigned to substantial parts of the ATP binding site (namely, to Walker motifs A and B); the third may be a portion of the catalytic center. Phylogenetic trees were calculated to obtain insights into the evolutionary history of the family. Subfamilies with varying degrees of intra-relatedness could be discriminated; these relationships are also supported by analysis of sequences outside the canonical AAA boxes: within the cassette are regions that are strongly conserved within each subfamily, whereas little or even no similarity between different subfamilies can be observed. These regions are well suited to define fingerprints for subfamilies. A secondary structure prediction utilizing all available sequence information was performed and the result was fitted to the general 3D structure of a Walker A/GTPase. The agreement was unexpectedly high and strongly supports the conclusion that the AAA family belongs to the Walker superfamily of A/GTPases. PMID:9336829

  16. Down Syndrome: A search for expressed sequences

    SciTech Connect

    Pritchard, M.; Fuentes, J.J.; Bosch, A.

    1994-09-01

    Down Syndrome (DS) is a major cause of congenital heart disease and mental retardation. The most common anomaly is an extra copy of human chromosome 21 (HC21); however, chromosomal studies in rare patients with partial trisomy 21 have defined a minimal region for DS, including human chromosome 21 bands q22.2-q22.3. The study of genes in this chromosomal region will allow the elucidation of the biochemical and molecular bases for several of the distinct phenotypic traits of the syndrome. This information is the key to the design of therapeutic, pharmacological and genetic tools to counter the effects of three copies of chromosome 21 in the cells of DS patients. Towards this goal, we aim to build a transcriptional map of this region and then characterize any genes isolated. We are using two methods to isolate expressed sequences: (1) Alu-splice consensus PCR and (2) cDNA hybridization selection. As starting material, we use YACs (CEPH/Genethon) from the specified region and cosmid minilibraries constructed from these YACs. Products are subcloned, sequenced and analyzed in the sequence databases. Several homologies with reported expressed sequences have been found and will be discussed. The HC21 origin of these putative expressed sequences is determined and they are then used to initially screen a human fetal brain full-length cDNA library. We have isolated several cDNAs and these are now being analyzed.

  17. Small Peptide Recognition Sequence for Intracellular Sorting

    PubMed Central

    Pandey, Kailash N.

    2010-01-01

    Increasing evidence indicates that complex arrays of short signals and recognition peptide sequences ensure accurate trafficking and distribution of transmembrane receptors and/or proteins and their ligands into intracellular compartments. Internalization and subsequent trafficking of cell-surface receptors into the cell interior is mediated by specific short-sequence peptide signals within the cytoplasmic domains of these receptor proteins. The short signals usually consist of small linear amino acid sequences, which are recognized by adaptor coat proteins along the endocytic and sorting pathways. In recent years, much has been learned about the function and mechanisms of endocytic pathways responsible for the trafficking and molecular sorting of membrane receptors and their ligands into intracellular compartments; however, the significance and scope of the short sequence motifs in these cellular events are not well understood. Here a particular emphasis has been given to the functions of short-sequence signal motifs responsible for the itinerary and destination of membrane receptors and proteins moving into subcellular compartments. PMID:20817434

  18. Progressive multiple sequence alignments from triplets

    PubMed Central

    Kruspe, Matthias; Stadler, Peter F

    2007-01-01

    Background: The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors. Research: Here we present a modified variant of progressive sequence alignments that addresses both issues. Instead of pairwise alignments we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by step-wisely replacing triples by pairs instead of combining pairs to singletons. To this end the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures. Conclusion: The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably on both protein sequences and nucleic acid sequences to other progressive alignment tools. In the latter case one can easily include scoring terms that consider secondary structure features. Overall, the quality of resulting alignments in general exceeds that of clustalw or other multiple alignment tools even though our software does not include heuristics for context dependent (mis)match scores. PMID:17631683
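
    The exact three-way dynamic programming step mentioned in this abstract can be sketched as follows. This is a minimal score-only illustration under assumed sum-of-pairs scoring with a linear gap penalty; the function names and score values are hypothetical and are not the scoring scheme of aln3nn, which the abstract does not specify.

```python
from itertools import product


def sp_column(a, b, c, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score of one alignment column ('-' marks a gap)."""
    score = 0
    for x, y in ((a, b), (a, c), (b, c)):
        if x == "-" and y == "-":
            continue  # a gap paired with a gap costs nothing
        if x == "-" or y == "-":
            score += gap
        else:
            score += match if x == y else mismatch
    return score


def align3_score(s, t, u):
    """Optimal sum-of-pairs score of a global three-way alignment."""
    n, m, k = len(s), len(t), len(u)
    neg_inf = float("-inf")
    table = [[[neg_inf] * (k + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    table[0][0][0] = 0
    # Seven possible columns: each sequence either advances or takes a gap,
    # excluding the column made of gaps only.
    moves = [mv for mv in product((0, 1), repeat=3) if any(mv)]
    for i, j, l in product(range(n + 1), range(m + 1), range(k + 1)):
        for di, dj, dl in moves:
            pi, pj, pl = i - di, j - dj, l - dl
            if pi < 0 or pj < 0 or pl < 0:
                continue
            a = s[pi] if di else "-"
            b = t[pj] if dj else "-"
            c = u[pl] if dl else "-"
            cand = table[pi][pj][pl] + sp_column(a, b, c)
            if cand > table[i][j][l]:
                table[i][j][l] = cand
    return table[n][m][k]


print(align3_score("AC", "AC", "AC"))
```

    The table has (n+1)(m+1)(k+1) cells with seven predecessors each, so the cost grows with the product of the three lengths; this is the price of aligning triples exactly rather than pairs.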

  19. Beyond reasonable doubt: evolution from DNA sequences.

    PubMed

    White, W Timothy J; Zhong, Bojian; Penny, David

    2013-01-01

    We demonstrate quantitatively that, as predicted by evolutionary theory, sequences of homologous proteins from different species converge as we go further and further back in time. The converse, a non-evolutionary model, can be expressed as probabilities, and the test works for chloroplast, nuclear and mitochondrial sequences, as well as for sequences that diverged at different time depths. Even on our conservative test, the probability that chance could produce the observed levels of ancestral convergence for just one of the eight datasets of 51 proteins is ≈1×10⁻¹⁹ and combined over 8 datasets is ≈1×10⁻¹³². By comparison, there are about 10⁸⁰ protons in the universe, hence the probability that the sequences could have been produced by a process involving unrelated ancestral sequences is about 10⁵⁰ lower than picking, among all protons, the same proton at random twice in a row. A non-evolutionary control model shows no convergence, and only a small number of parameters are required to account for the observations. It is time that researchers insisted that doubters put up testable alternatives to evolution.

  20. Mitochondrial COII sequences and modern human origins.

    PubMed

    Ruvolo, M; Zehr, S; von Dornum, M; Pan, D; Chang, B; Lin, J

    1993-11-01

    The aim of this study is to measure human mitochondrial sequence variability in the relatively slowly evolving mitochondrial gene cytochrome oxidase subunit II (COII) and to estimate when the human common ancestral mitochondrial type existed. New COII gene sequences were determined for five humans (Homo sapiens), including some of the most mitochondrially divergent humans known; for two pygmy chimpanzees (Pan paniscus); and for a common chimpanzee (P. troglodytes). COII sequences were analyzed with those from another relatively slowly evolving mitochondrial region (ND4-5). From class 1 (third codon position) sequence data, a relative divergence date for the human mitochondrial ancestor is estimated as 1/27th of the human-chimpanzee divergence time. If it is assumed that humans and chimpanzees diverged 6 Mya, this places a human mitochondrial ancestor at 222,000 years, significantly different from 1 Myr (the presumed time of an H. erectus emergence from Africa). The mean coalescent time estimated from all 1,580 sites of combined mitochondrial data, when a 6-Mya human-chimpanzee divergence is assumed, is 298,000 years, with 95% confidence interval of 129,000-536,000 years. Neither estimate is compatible with a 1-Myr-old human mitochondrial ancestor. The mitochondrial DNA sequence data from COII and ND4-5 regions therefore do not support this multiregional hypothesis for the emergence of modern humans.
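
    The first date estimate in this abstract follows from scaling the assumed human-chimpanzee split by the relative divergence. The sketch below reproduces only that arithmetic; the function name is hypothetical, and the coalescent estimate of 298,000 years comes from a separate analysis that is not reproduced here.

```python
def ancestor_age_years(split_mya, relative_divergence):
    """Scale an assumed split time (in millions of years) by a relative divergence."""
    return split_mya * 1_000_000 * relative_divergence


age = ancestor_age_years(6, 1 / 27)
print(round(age))      # age in years
print(round(age, -3))  # rounded to the nearest thousand, as in the abstract
```

    Because the estimate is a simple product, it scales linearly with the assumed split time: a different human-chimpanzee divergence date would shift the ancestor date by the same proportion.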