Science.gov

Sample records for abrf edman sequencing

  1. N-Terminal Amino Acid Sequence Determination of Proteins by N-Terminal Dimethyl Labeling: Pitfalls and Advantages When Compared with Edman Degradation Sequence Analysis.

    PubMed

    Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P; Marians, Kenneth J; Erdjument-Bromage, Hediye

    2016-07-01

    In recent history, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015, looking into bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins with the aim to determine Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach on protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; results confirm that there is agreement in the identity of the Nt amino acid sequence between these 2 methods. PMID:27006647

  2. N-Terminal Amino Acid Sequence Determination of Proteins by N-Terminal Dimethyl Labeling: Pitfalls and Advantages When Compared with Edman Degradation Sequence Analysis

    PubMed Central

    Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P.; Marians, Kenneth J.

    2016-01-01

    In recent history, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015, looking into bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins with the aim to determine Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach on protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; results confirm that there is agreement in the identity of the Nt amino acid sequence between these 2 methods. PMID:27006647

  3. Multi-platform and cross-methodological reproducibility of transcriptome profiling by RNA-seq in the ABRF Next-Generation Sequencing Study

    PubMed Central

    Nicolet, Charles M.; Grove, Deborah; Levy, Shawn; Farmerie, William; Viale, Agnes; Wright, Chris; Schweitzer, Peter A.; Gao, Yuan; Kim, Dewey; Boland, Joe; Hicks, Belynda; Kim, Ryan; Chhangawala, Sagar; Jafari, Nadereh; Raghavachari, Nalini; Gandara, Jorge; Garcia-Reyero, Natàlia; Hendrickson, Cynthia; Roberson, David; Rosenfeld, Jeffrey; Smith, Todd; Underwood, Jason G.; Wang, May; Zumbo, Paul; Baldwin, Don A.; Grills, George S.; Mason, Christopher E.

    2014-01-01

    High-throughput RNA sequencing (RNA-seq) dramatically expands the potential for novel genomics discoveries, but the wide variety of platforms, protocols and performance has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We tested replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (polyA-selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies’ PGM and Proton, Pacific Biosciences RS and Roche’s 454). The results show high intra-platform and inter-platform concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms. These data also demonstrate that ribosomal RNA depletion can both enable effective analysis of degraded RNA samples and be readily compared to polyA-enriched fractions. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq. PMID:25150835

  4. Broad coverage identification of multiple proteolytic cleavage site sequences in complex high molecular weight proteins using quantitative proteomics as a complement to edman sequencing.

    PubMed

    Doucet, Alain; Overall, Christopher M

    2011-05-01

    Proteolytic processing modifies the pleiotropic functions of many large, complex, and modular proteins and can generate cleavage products with new biological activity. The identification of exact proteolytic cleavage sites in the extracellular matrix laminins, fibronectin, and other extracellular matrix proteins is not only important for understanding protein turnover but is needed for the identification of new bioactive cleavage products. Several such products have recently been recognized that are suggested to play important cellular regulatory roles in processes including angiogenesis. However, identifying multiple cleavage sites in extracellular matrix proteins and other large proteins is challenging because N-terminal Edman sequencing of multiple and often closely spaced cleavage fragments on SDS-PAGE gels is difficult, which limits throughput and coverage. We developed a new liquid chromatography-mass spectrometry approach we call amino-terminal oriented mass spectrometry of substrates (ATOMS) for the N-terminal identification of protein cleavage fragments in solution. ATOMS uses efficient and low-cost isotopic dimethylation labeling of original and proteolytically generated N termini of protein cleavage fragments, followed by quantitative tandem mass spectrometry analysis. Being a peptide-centric approach, ATOMS is not dependent on the SDS-PAGE resolution limits for protein fragments of similar mass. We demonstrate that ATOMS reliably identifies multiple proteolytic sites per reaction in complex proteins. Fifty-five neutrophil elastase cleavage sites were identified in laminin-1 and fibronectin-1, with 34 more identified by matrix metalloproteinase cleavage. Hence, our degradomics approach offers a complementary alternative to Edman sequencing with broad applicability in identifying N termini such as cleavage sites in complex high molecular weight extracellular matrix proteins after in vitro cleavage assays. ATOMS can therefore be useful in

  5. THE ABRF MARG MICROARRAY SURVEY 2005: TAKING THE PULSE ON THE MICROARRAY FIELD

    EPA Science Inventory

    Over the past several years microarray technology has evolved into a critical component of any discovery based program. Since 1999, the Association of Biomolecular Resource Facilities (ABRF) Microarray Research Group (MARG) has conducted biennial surveys designed to generate a pr...

  6. ABRF-PRG07: Advanced Quantitative Proteomics Study

    PubMed Central

    Falick, Arnold M.; Lane, William S.; Lilley, Kathryn S.; MacCoss, Michael J.; Phinney, Brett S.; Sherman, Nicholas E.; Weintraub, Susan T.; Witkowska, H. Ewa; Yates, Nathan A.

    2011-01-01

    A major challenge for core facilities is determining quantitative protein differences across complex biological samples. Although there are numerous techniques in the literature for relative and absolute protein quantification, the majority is nonroutine and can be challenging to carry out effectively. There are few studies comparing these technologies in terms of their reproducibility, accuracy, and precision, and no studies to date deal with performance across multiple laboratories with varied levels of expertise. Here, we describe an Association of Biomolecular Resource Facilities (ABRF) Proteomics Research Group (PRG) study based on samples composed of a complex protein mixture into which 12 known proteins were added at varying but defined ratios. All of the proteins were present at the same concentration in each of three tubes that were provided. The primary goal of this study was to allow each laboratory to evaluate its capabilities and approaches with regard to: detection and identification of proteins spiked into samples that also contain complex mixtures of background proteins and determination of relative quantities of the spiked proteins. The results returned by 43 participants were compiled by the PRG, which also collected information about the strategies used to assess overall performance and as an aid to development of optimized protocols for the methodologies used. The most accurate results were generally reported by the most experienced laboratories. Among laboratories that used the same technique, values that were closer to the expected ratio were obtained by more experienced groups. PMID:21455478

  7. A photothermally responsive nanoprobe for bioimaging based on Edman degradation.

    PubMed

    Liu, Yi; Wang, Zhantong; Zhang, Huimin; Lang, Lixin; Ma, Ying; He, Qianjun; Lu, Nan; Huang, Peng; Liu, Yijing; Song, Jibin; Liu, Zhibo; Gao, Shi; Ma, Qingjie; Kiesewetter, Dale O; Chen, Xiaoyuan

    2016-05-19

    A new type of photothermally responsive nanoprobe based on Edman degradation has been synthesized and characterized. Under irradiation by an 808 nm laser, the heat generated by the gold nanorod core breaks the thiocarbamide structure and releases the fluorescent dye Cy5.5 with increased near-infrared (NIR) fluorescence under mildly acidic conditions. This RGD-modified nanoprobe is capable of fluorescence imaging of αvβ3-overexpressing U87MG cells in vitro and in vivo. This Edman degradation-based nanoprobe provides a novel strategy to design activatable probes for biomedical imaging and drug/gene delivery. PMID:27149392

  8. A photothermally responsive nanoprobe for bioimaging based on Edman degradation

    NASA Astrophysics Data System (ADS)

    Liu, Yi; Wang, Zhantong; Zhang, Huimin; Lang, Lixin; Ma, Ying; He, Qianjun; Lu, Nan; Huang, Peng; Liu, Yijing; Song, Jibin; Liu, Zhibo; Gao, Shi; Ma, Qingjie; Kiesewetter, Dale O.; Chen, Xiaoyuan

    2016-05-01

    A new type of photothermally responsive nanoprobe based on Edman degradation has been synthesized and characterized. Under irradiation by an 808 nm laser, the heat generated by the gold nanorod core breaks the thiocarbamide structure and releases the fluorescent dye Cy5.5 with increased near-infrared (NIR) fluorescence under mildly acidic conditions. This RGD-modified nanoprobe is capable of fluorescence imaging of αvβ3-overexpressing U87MG cells in vitro and in vivo. This Edman degradation-based nanoprobe provides a novel strategy to design activatable probes for biomedical imaging and drug/gene delivery. Electronic supplementary information (ESI) available: HPLC, MS and ¹H NMR spectra. See DOI: 10.1039/c6nr01400c

  9. THE ABRF-MARG MICROARRAY SURVEY 2004: TAKING THE PULSE OF THE MICROARRAY FIELD

    EPA Science Inventory

    Over the past several years, the field of microarrays has grown and evolved drastically. In its continued efforts to track this evolution, the ABRF-MARG has once again conducted a survey of international microarray facilities and individual microarray users. The goal of the surve...

  10. Monitoring of environmental cancer initiators through hemoglobin adducts by a modified Edman degradation method

    SciTech Connect

    Toernqvist, M.M.; Mowrer, J.; Jensen, S.; Ehrenberg, L.

    1986-04-01

    Tissue doses of cancer initiators/mutagens are suitably monitored through hemoglobin adducts formed in vivo, but the use of this method has been hampered by a lack of sufficiently simple and fast procedures. It was previously observed that when the N-terminal amino acid in hemoglobin, valine, is alkylated, it is cleaved off by the Edman sequencing reagent, phenyl isothiocyanate, in the neutral-alkaline coupling medium, as opposed to the acidic medium required by normal amino acids. Based on this principle, conditions for a functioning procedure for gas chromatography/mass spectrometry (GC/MS) determination of N-terminal alkylvalines in hemoglobin were worked out. Derivatizing the protein in formamide solution with pentafluorophenyl isothiocyanate, using a ²H-alkylated protein as internal standard, and applying on-column injection during analysis permit reproducible determination of hydroxyethylvaline and other adducts down into the dose range where cancer risks may be considered acceptably low.

  11. New method of peptide cleavage based on Edman degradation.

    PubMed

    Bąchor, Remigiusz; Kluczyk, Alicja; Stefanowicz, Piotr; Szewczuk, Zbigniew

    2013-08-01

    A straightforward cleavage method for N-acylated peptides based on phenylthiohydantoin (PTH) formation is presented. The procedure can be applied to acid-stable resins, such as TentaGel HL-NH2. We designed a cleavable linker that consists of a lysine residue with the α-amino group blocked by Boc, whereas the ε-amino group is used for peptide synthesis. After peptide assembly is complete, the side-chain protecting groups are removed using trifluoroacetic acid, which also liberates the α-amino group of the lysine in the linker. Subsequent reaction with phenyl isothiocyanate followed by acidolysis efficiently releases the peptide from the resin as a stable PTH derivative. Furthermore, a fixed-charge tag in the form of a 2-(4-aza-1-azoniabicyclo[2.2.2]octylammonium)acetyl group increases ionization efficiency and lowers the detection limit, allowing ESI-MS/MS sequencing of peptides in the subfemtomolar range. The proposed strategy is compatible with standard conditions for one-bead-one-compound peptide library synthesis. The applicability of the strategy in combinatorial chemistry was confirmed using a small training library of α-chymotrypsin substrates. PMID:23690169

  12. Can Edman degradation be used for quantification? Isotope-dilution liquid chromatography-electrospray ionization tandem mass spectrometry and the long-term stability of 20 phenylthiohydantoin-amino acids.

    PubMed

    Satoh, Ryo; Goto, Takaaki; Lee, Seon Hwa; Oe, Tomoyuki

    2013-10-01

    Edman degradation is a well-known method for obtaining amino acid (AA) sequences from a peptide by means of sequential reactions that release the N-terminal AAs from the peptide as phenylthiohydantoin (PTH) derivatives. Because of unexpected losses during the reaction and handling, there are few reports of the use of this reaction for quantification. This manuscript describes the development of isotope-dilution liquid chromatography-electrospray ionization tandem mass spectrometry for 20 PTH-AA derivatives, and long-term stability testing of PTH-AAs to ensure quantitative quality in the reaction. The 20 corresponding [¹³C₆]-PTH-AAs were prepared by a one-pot reaction involving a mixture of [¹³C₆]-Edman reagent and 20 AAs. Good linearity was observed for standard curves for the PTH-AAs, using the corresponding [¹³C₆]-PTH-AAs as internal standards (1-100 pmol per injection, r² = 0.989-1.000). Serum albumin (human), pepsin (porcine stomach mucosa), α-casein (bovine milk), ribonuclease A (bovine), lysozyme (chicken egg white), and insulin (bovine) subjected to Edman degradation were examined as model proteins and peptides for N-terminal AA analysis. The results of the impurity test were satisfactory. Yield from the entire reaction with human serum albumin was estimated to be at least 75%, indicating great potential for absolute quantification of proteins without protein standards. PMID:23545858

  13. Determination of the covalent structure of an N- and C-terminally blocked glycoprotein from endocuticle of Locusta migratoria. Combined use of plasma desorption mass spectrometry and Edman degradation to study post-translationally modified proteins.

    PubMed

    Talbo, G; Højrup, P; Rahbek-Nielsen, H; Andersen, S O; Roepstorff, P

    1991-01-30

    The complete structure of a protein isolated from the endocuticle of sexually mature locusts, Locusta migratoria, has been determined by a combination of automatic Edman degradation and plasma desorption mass spectrometry. The protein is extensively post-translationally modified. The N-terminal residue is 5-oxoproline (pyroglutamic acid) and the C-terminal proline residue is amidated. Furthermore, the protein is glycosylated by a single N-acetylgalactosamine residue at one, two or three threonines. The N-terminal sequence was obtained by analysing the N-acetylated N,O-permethylated derivative using plasma desorption mass spectrometry. The position and type of carbohydrate were determined by combining an HPLC-based carbohydrate analysis with the peak pattern of the phenylthiohydantoin derivative in automatic sequencing and with mass information on peptides. The protein has pronounced similarity to cuticular proteins from larvae of diptera and lepidoptera, but only slight resemblance to the previously sequenced locust exocuticular proteins. This indicates a similarity between soft larval cuticles and locust endocuticle, a similarity which may extend to their mechanical properties. PMID:1997327

  14. The 2012/2013 ABRF Proteomic Research Group Study: Assessing Longitudinal Intralaboratory Variability in Routine Peptide Liquid Chromatography Tandem Mass Spectrometry Analyses.

    PubMed

    Bennett, Keiryn L; Wang, Xia; Bystrom, Cory E; Chambers, Matthew C; Andacht, Tracy M; Dangott, Larry J; Elortza, Félix; Leszyk, John; Molina, Henrik; Moritz, Robert L; Phinney, Brett S; Thompson, J Will; Bunger, Maureen K; Tabb, David L

    2015-12-01

    Questions concerning longitudinal data quality and reproducibility of proteomic laboratories spurred the Protein Research Group of the Association of Biomolecular Resource Facilities (ABRF-PRG) to design a study to systematically assess the reproducibility of proteomic laboratories over an extended period of time. In this open study, 64 participants were initially recruited from the broader mass spectrometry community to analyze provided aliquots of a tryptic digest of a mixture of six bovine proteins every month for nine months. Data were uploaded to a central repository, and the operators answered an accompanying survey. Ultimately, 45 laboratories submitted a minimum of eight LC-MSMS raw data files collected in data-dependent acquisition (DDA) mode. No standard operating procedures were enforced; rather, the participants were encouraged to analyze the samples according to usual practices in their laboratory. Unlike previous studies, this investigation was not designed to compare laboratories or instrument configurations, but rather to assess temporal intralaboratory reproducibility. The outcome of the study was reassuring, with 80% of the participating laboratories performing analyses at a medium to high level of reproducibility and quality over the 9-month period. For the groups that had one or more outlying experiments, the major contributing factor that correlated with the survey data was the performance of preventative maintenance prior to the LC-MSMS analyses. Thus, the Protein Research Group of the Association of Biomolecular Resource Facilities recommends that laboratories closely scrutinize the quality control data following such events. Additionally, improved quality control recording is imperative. This longitudinal study provides evidence that mass spectrometry-based proteomics is reproducible. When quality control measures are strictly adhered to, such reproducibility is comparable among many disparate groups. Data from the study are

  15. Unraveling the sequence and structure of the protein osteocalcin from a 42 ka fossil horse

    NASA Astrophysics Data System (ADS)

    Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Andrews, Philip C.; Leykam, Joseph; Stafford, Thomas W.; Kelly, Robert L.; Walker, Danny N.; Buckley, Mike; Humpula, James

    2006-04-01

    We report the first complete amino acid sequence and evidence of secondary structure for osteocalcin from a temperate fossil. The osteocalcin derives from a 42 ka equid bone excavated from Juniper Cave, Wyoming. Results were determined by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-MS) and Edman sequencing, with independent confirmation of the sequence in two laboratories. The ancient sequence was compared to that of three modern taxa: horse (Equus caballus), zebra (Equus grevyi), and donkey (Equus asinus). Although there was no difference in sequence among modern taxa, MALDI-MS and Edman sequencing show that residues 48 and 49 of our modern horse are Thr, Ala rather than Pro, Val as previously reported (Carstanjen, B., Wattiez, R., Armory, H., Lepage, O.M., Remy, B., 2002. Isolation and characterization of equine osteocalcin. Ann. Med. Vet. 146(1), 31-38). MALDI-MS and Edman sequencing data indicate that the osteocalcin sequence of the 42 ka fossil is similar to that of modern horse. Previously inaccessible structural attributes for ancient osteocalcin were observed. Glu 39 rather than Gln 39 is consistent with deamidation, a process known to occur during fossilization and aging. Two post-translational modifications were documented: Hyp 9 and a disulfide bridge. The latter suggests at least partial retention of secondary structure. As has been done for ancient DNA research, we recommend standards for preparation and criteria for authenticating results of ancient protein sequencing.

  16. Primary structure of a histidine-rich proteolytic fragment of human ceruloplasmin. I. Amino acid sequence of the cyanogen bromide peptides.

    PubMed

    Kingston, I B; Kingston, B L; Putnam, F W

    1980-04-10

    A histidine-rich fragment, Cp F5, with a molecular weight of 18,650 was isolated from human ceruloplasmin. It consists of 159 amino acids and contains a possible copper-binding site. The sequence of the first 18 NH2-terminal residues of Cp F5 was determined by automated Edman degradation. Cp F5 was cleaved by cyanogen bromide to produce nine fragments of 2 to 63 residues. The amino acid sequence of all of the cyanogen bromide fragments was investigated using automated and manual Edman degradation, the fragments being digested with trypsin, chymotrypsin, thermolysin, staphylococcal protease, and pepsin as appropriate. The results, in conjunction with the data on the tryptic peptides reported in the accompanying paper (Kingston, I.B., Kingston, B.L., and Putnam, F.W. (1980) J. Biol. Chem. 255, 2886-2896), establish the complete amino acid sequence of Cp F5. PMID:6987229

  17. Protein Sequencing with Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Ziady, Assem G.; Kinter, Michael

    The recent introduction of electrospray ionization techniques that are suitable for peptides and whole proteins has allowed for the design of mass spectrometric protocols that provide accurate sequence information for proteins. The advantages gained by these approaches over traditional Edman Degradation sequencing include faster analysis and femtomole, sometimes attomole, sensitivity. The ability to efficiently identify proteins has allowed investigators to conduct studies on their differential expression or modification in response to various treatments or disease states. In this chapter, we discuss the use of electrospray tandem mass spectrometry, a technique whereby protein-derived peptides are subjected to fragmentation in the gas phase, revealing sequence information for the protein. This powerful technique has been instrumental for the study of proteins and markers associated with various disorders, including heart disease, cancer, and cystic fibrosis. We use the study of protein expression in cystic fibrosis as an example.

  18. Characterization of a benzyladenine binding-site peptide isolated from a wheat cytokinin-binding protein: Sequence analysis and identification of a single affinity-labeled histidine residue by mass spectrometry

    SciTech Connect

    Brinegar, A.C.; Cooper, G.; Stevens, A.; Hauer, C.R.; Shabanowitz, J.; Hunt, D.F.; Fox, J.E.

    1988-08-01

    A wheat embryo cytokinin-binding protein was covalently modified with the radiolabeled photoaffinity ligand 2-azido-N⁶-(¹⁴C)benzyladenine. A single labeled peptide was obtained after proteolytic digestion and isolation by reversed-phase and anion-exchange HPLC. Sequencing by classical Edman degradation identified 11 of the 12 residues but failed to identify the labeled amino acid. Analysis of 10 pmol of the peptide by laser photodissociation Fourier-transform mass spectrometry independently confirmed the Edman data and also demonstrated that the histidine residue nearest the C terminus (His-8) was modified by the reagent in the sequence Ala-Phe-Leu-Gln-Pro-Ser-His-His-Asp-Ala-Asp-Glu.

  19. Studies on the high-sulphur proteins of reduced Merino wool. Amino acid sequence of protein SCMKB-IIIB4

    PubMed Central

    Swart, L. S.; Haylett, T.

    1971-01-01

    The complete amino acid sequence of protein SCMKB-IIIB4 is presented. It is closely related to the sequence of protein SCMKB-IIIB3 (Haylett, Swart & Parris, 1971), differing in only four positions. The peptic and thermolysin peptides of protein SCMKB-IIIB4 were analysed by the dansyl–Edman method (Gray, 1967) and by tritium-labelling of C-terminal residues (Matsuo, Fujimoto & Tatsuno, 1966). This protein is the third member of a group of high-sulphur wool proteins with a molecular weight of about 11,400. It consists of 98 residues and has acetylalanine and carboxymethylcysteine as N- and C-terminal residues respectively. PMID:4942536

  20. Complete amino acid sequence of the myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani.

    PubMed

    Jones, B N; Wang, C C; Dwulet, F E; Lehman, L D; Meuth, J L; Bogardt, R A; Gurd, F R

    1979-04-25

    The complete amino acid sequence of the major component myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani, was determined by the automated Edman degradation of several large peptides obtained by specific cleavage of the protein. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. By subjecting four of these peptides and the apomyoglobin to automated Edman degradation, over 80% of the primary structure of the protein was obtained. The remainder of the covalent structure was determined by the sequence analysis of peptides that resulted from further digestion of the central cyanogen bromide fragment. This fragment was cleaved at its glutamyl residues with staphylococcal protease and its lysyl residues with trypsin. The action of trypsin was restricted to the lysyl residues by chemical modification of the single arginyl residue of the fragment with 1,2-cyclohexanedione. The primary structure of this myoglobin proved to be identical with that from the Atlantic bottlenosed dolphin and Pacific common dolphin but differs from the myoglobins of the killer whale and pilot whale at two positions. The above sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea. PMID:454657

  1. Protein chemotaxonomy. XIII. Amino acid sequence of ferredoxin from Panax ginseng.

    PubMed

    Mino, Yoshiki

    2006-08-01

    The complete amino acid sequence of [2Fe-2S] ferredoxin from Panax ginseng (Araliaceae) has been determined by automated Edman degradation of the entire S-carboxymethylcysteinyl protein and of the peptides obtained by enzymatic digestion. This ferredoxin has a unique amino acid sequence, which includes an insertion of Tyr at the 3rd position from the amino-terminus and a deletion of two amino acid residues at the carboxyl terminus. This ferredoxin had 18 differences in its amino acid sequence compared to that of Petroselinum sativum (Umbelliferae). In contrast, 23-33 differences were observed compared to other dicotyledonous plants. This suggests that Panax ginseng is related taxonomically to umbelliferous plants. PMID:16880642

  2. Complete amino acid sequence and structure characterization of the taste-modifying protein, miraculin.

    PubMed

    Theerasilp, S; Hitotsuya, H; Nakajo, S; Nakaya, K; Nakamura, Y; Kurihara, Y

    1989-04-25

    The taste-modifying protein, miraculin, has the unusual property of modifying sour taste into sweet taste. The complete amino acid sequence of miraculin purified from miracle fruits by a newly developed method (Theerasilp, S., and Kurihara, Y. (1988) J. Biol. Chem. 263, 11536-11539) was determined by an automatic Edman degradation method. Miraculin was a single polypeptide with 191 amino acid residues. The calculated molecular weight based on the amino acid sequence and the carbohydrate content (13.9%) was 24,600. Asn-42 and Asn-186 were linked N-glycosidically to carbohydrate chains. High homology was found between the amino acid sequences of miraculin and soybean trypsin inhibitor. PMID:2708331

  3. Biosynthesis of riboflavin: cloning, sequencing, mapping, and expression of the gene coding for GTP cyclohydrolase II in Escherichia coli.

    PubMed Central

    Richter, G; Ritz, H; Katzenmeier, G; Volk, R; Kohnle, A; Lottspeich, F; Allendorf, D; Bacher, A

    1993-01-01

    GTP cyclohydrolase II catalyzes the first committed step in the biosynthesis of riboflavin. The gene coding for this enzyme in Escherichia coli has been cloned by marker rescue. Sequencing indicated an open reading frame of 588 bp coding for a 21.8-kDa peptide of 196 amino acids. The gene was mapped to a position at 28.2 min on the E. coli chromosome and is identical with ribA. GTP cyclohydrolase II was overexpressed in a recombinant strain carrying a plasmid with the cloned gene. The enzyme was purified to homogeneity from the recombinant strain. The N-terminal sequence determined by Edman degradation was identical to the predicted sequence. The sequence is homologous to the 3' part of the central open reading frame in the riboflavin operon of Bacillus subtilis. PMID:8320220

  4. Amino acid sequence of bovine heart coupling factor 6.

    PubMed Central

    Fang, J K; Jacobs, J W; Kanner, B I; Racker, E; Bradshaw, R A

    1984-01-01

    The amino acid sequence of bovine heart mitochondrial coupling factor 6 (F6) has been determined by automated Edman degradation of the whole protein and derived peptides. Preparations based on heat precipitation and ethanol extraction showed allotypic variation at three positions while material further purified by HPLC yielded only one sequence that also differed by a Phe-Thr replacement at residue 62. The mature protein contains 76 amino acids with a calculated molecular weight of 9006 and a pI of approximately equal to 5, in good agreement with experimentally measured values. The charged amino acids are mainly clustered at the termini and in one section in the middle; these three polar segments are separated by two segments relatively rich in nonpolar residues. Chou-Fasman analysis suggests three stretches of alpha-helix coinciding (or within) the high-charge-density sequences with a single beta-turn at the first polar-nonpolar junction. Comparison of the F6 sequence with those of other proteins did not reveal any homologous structures. PMID:6149548

  5. Purification and amino acid sequence of aminopeptidase P from pig kidney.

    PubMed

    Vergas Romero, C; Neudorfer, I; Mann, K; Schäfer, W

    1995-04-01

    Aminopeptidase P from kidney cortex was purified in high yield (recovery ≥ 20%) by a series of column chromatographic steps after solubilization of the membrane-bound glycoprotein with n-butanol. A coupled enzymic assay, using Gly-Pro-Pro-NH-Nap as substrate and dipeptidyl-peptidase IV as auxiliary enzyme, was used to monitor the purification. The purification procedure yielded two forms of aminopeptidase P differing in their carbohydrate composition (glycoforms). Both enzyme preparations were homogeneous as assessed by SDS/PAGE with silver staining and by isoelectric focusing. Both forms possessed the same substrate specificity, catalysed the same reaction, and consisted of identical protein chains. The amino acid sequence, determined by Edman degradation and mass spectrometry, comprises 623 amino acids. Six N-glycosylation sites, all located in the N-terminal half of the protein, were characterized. PMID:7744038

  6. Studies on monotreme proteins. VII. Amino acid sequence of myoglobin from the platypus, Ornithorhynchus anatinus.

    PubMed

    Fisher, W K; Thompson, E O

    1976-03-01

    Myoglobin isolated from skeletal muscle of the platypus contains 153 amino acid residues. The complete amino acid sequence has been determined following cleavage with cyanogen bromide and further digestion of the four fragments with trypsin, chymotrypsin, pepsin and thermolysin. Sequences of the purified peptides were determined by the dansyl-Edman procedure. The amino acid sequence showed 25 differences from human myoglobin and 24 from kangaroo myoglobin. Amino acid sequences in myoglobins are more conserved than sequences in the alpha- and beta-globin chains, and platypus myoglobin shows a similar number of variations in sequence to kangaroo myoglobin when compared with myoglobin of other species. The divergence of the platypus from other mammals was dated to 102 ± 31 million years ago, based on the number of amino acid differences between species and allowing for mutations during the evolutionary period. This estimate differs widely from the estimate given by similar treatment of the alpha- and beta-chain sequences, and a constant rate of mutation of globin chains is not supported. PMID:962722
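The divergence estimate rests on counting amino acid differences and correcting them for substitutions that were later overwritten. The abstract does not say which correction the authors used; the sketch below assumes the common Poisson correction, d = -ln(1 - p), where p is the fraction of differing sites, and further assumes that all 153 positions align without gaps. Turning d into years would additionally need a calibrated substitution rate, which is not given here.

```python
import math


def p_distance(differences: int, sites: int) -> float:
    """Observed proportion of aligned positions that differ."""
    return differences / sites


def poisson_distance(differences: int, sites: int) -> float:
    """Poisson-corrected substitutions per site, d = -ln(1 - p).

    This correction is an illustrative assumption, not necessarily the
    authors' method; it accounts for sites that changed more than once,
    so d is always at least p.
    """
    p = p_distance(differences, sites)
    if p >= 1.0:
        raise ValueError("sequences are saturated; correction undefined")
    return -math.log(1.0 - p)


SITES = 153  # platypus myoglobin length, assumed fully alignable
for species, diffs in [("human", 25), ("kangaroo", 24)]:
    print(f"{species}: p = {p_distance(diffs, SITES):.3f}, "
          f"d = {poisson_distance(diffs, SITES):.3f}")
```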

  7. Two distinct ferredoxins from Rhodobacter capsulatus: complete amino acid sequences and molecular evolution.

    PubMed

    Saeki, K; Suetsugu, Y; Yao, Y; Horio, T; Marrs, B L; Matsubara, H

    1990-09-01

    Two distinct ferredoxins were purified from Rhodobacter capsulatus SB1003. Their complete amino acid sequences were determined by a combination of protease digestion, BrCN cleavage and Edman degradation. Ferredoxins I and II were composed of 64 and 111 amino acids, respectively, with molecular weights of 6,728 and 12,549 excluding iron and sulfur atoms. Both contained two Cys clusters in their amino acid sequences. The first cluster of ferredoxin I and the second cluster of ferredoxin II had a sequence, CxxCxxCxxxCP, in common with the ferredoxins found in Clostridia. The second cluster of ferredoxin I had a sequence, CxxCxxxxxxxxCxxxCM, with extra amino acids between the second and third Cys, which has been reported for other photosynthetic bacterial ferredoxins and putative ferredoxins (nif-gene products) from nitrogen-fixing bacteria, and with a unique occurrence of Met. The first cluster of ferredoxin II had a CxxCxxxxCxxxCP sequence, with two additional amino acids between the second and third Cys, a characteristic feature of Azotobacter [3Fe-4S][4Fe-4S] ferredoxin. Ferredoxin II was also similar to Azotobacter-type ferredoxins in having a carboxyl (C-) terminal sequence that is extended compared with the common Clostridium type. The evolutionary relationship of these two ferredoxins and a putative third one, recently found to be encoded in the nifENXQ region of this bacterium [Moreno-Vivian et al. (1989) J. Bacteriol. 171, 2591-2598], is discussed. PMID:2277040
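The Cys-cluster notation above (C for cysteine, x for any single residue, other letters literal) maps directly onto a regular expression, and counting the x positions between cysteines makes the "extra" and "two additional" residues explicit. A minimal sketch; the demo sequence is made up for illustration and is not an R. capsulatus ferredoxin.

```python
import re

# Motifs quoted in the abstract (C = Cys, x = any residue, other letters literal).
MOTIFS = {
    "Clostridium-type": "CxxCxxCxxxCP",
    "ferredoxin I, cluster 2": "CxxCxxxxxxxxCxxxCM",
    "ferredoxin II, cluster 1": "CxxCxxxxCxxxCP",
}


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Turn each 'x' into a one-residue wildcard and escape everything else."""
    return re.compile("".join("." if ch == "x" else re.escape(ch) for ch in motif))


def spacer_lengths(motif: str) -> list[int]:
    """Count the 'x' positions between consecutive Cys residues."""
    return [len(gap) for gap in motif.split("C")[1:-1]]


for name, motif in MOTIFS.items():
    print(f"{name}: spacers {spacer_lengths(motif)}")

# Hypothetical sequence containing one Clostridium-type cluster.
demo = "AYKIACDKCGSCSAVCPVEA"
match = motif_to_regex(MOTIFS["Clostridium-type"]).search(demo)
print("match at", match.span() if match else None)
```

Laid out this way, the ferredoxin II cluster differs from the Clostridium type only in its second spacer (4 versus 2 residues), which is the two-residue insertion the abstract describes.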

  8. Sequence of the phosphothreonyl regulatory site peptide from inactive maize leaf pyruvate, orthophosphate dikinase

    SciTech Connect

    Roeske, C.A.; Kutny, R.M.; Budde, R.J.A.; Chollet, R.

    1988-05-15

    The regulatory site peptide sequence of phosphorylated inactive pyruvate, orthophosphate dikinase from maize leaf tissue was determined by automated Edman degradation analysis of ³²P-labeled peptides purified by reversed-phase high performance liquid chromatography. The overlapping phosphopeptides were products of a digestion of the [β-³²P]ADP-inactivated dikinase with either trypsin or Pronase E. The sequence is Thr-Glu-Arg-Gly-Gly-Met-Thr(P)-Ser-His-Ala-Ala-Val-Val-Ala-Arg. The phosphothreonine residue, which appeared as either an anomalous proline or an unidentifiable phenylthiohydantoin derivative during sequencing, was verified by two-dimensional phosphoamino acid analysis of the phosphopeptides and by resequencing the tryptic peptide after dephosphorylation with exogenous alkaline phosphatase. This sequence, starting at position 4, is completely homologous to the previously published sequence of the tryptic dodecapeptide harboring the catalytically essential (phospho)histidyl residue in the active-site domain of the dikinase from the nonphotosynthetic bacterium Bacteroides symbiosus. These comparative results indicate that the regulatory phosphothreonine causing complete inactivation of maize leaf dikinase is separated from the critical active-site (phospho)histidine by just one intervening residue in the primary sequence.
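Because the phosphopeptide sequence is given in full, its positional claims (a segment from position 4 onward that matches a dodecapeptide, and a single residue between Thr(P) and His) can be checked by simple indexing. A minimal sketch using 1-based positions; it only counts residues and says nothing about the homology itself.

```python
PEPTIDE = "Thr-Glu-Arg-Gly-Gly-Met-Thr(P)-Ser-His-Ala-Ala-Val-Val-Ala-Arg"
residues = PEPTIDE.split("-")

# 1-based positions, as in the abstract.
phospho_pos = residues.index("Thr(P)") + 1
his_pos = residues.index("His") + 1
intervening = his_pos - phospho_pos - 1

# Segment from position 4 to the C-terminus, the part compared in the
# abstract with the bacterial tryptic dodecapeptide.
segment = residues[3:]

print(f"{len(residues)} residues; Thr(P) at {phospho_pos}, His at {his_pos}")
print(f"residues between them: {intervening}")
print(f"positions 4-{len(residues)}: {len(segment)} residues")
```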

  9. Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications

    NASA Astrophysics Data System (ADS)

    Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.

    2007-12-01

    Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico, and from three bones of extant camelids, bactrian camel (Camelus bactrianus), dromedary camel (Camelus dromedarius) and guanaco (Lama guanicoe), for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translational modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes. Our preliminary analysis of osteocalcin evolution represents an initial step towards a

  10. A unique charged tyrosine-containing member of the adipokinetic hormone/red-pigment-concentrating hormone peptide family isolated and sequenced from two beetle species.

    PubMed

    Gäde, G

    1991-05-01

    An identical neuropeptide was isolated from the corpora cardiaca of two beetle species, Melolontha melolontha and Geotrupes stercorosus. Its primary structure was determined by pulsed-liquid-phase sequencing employing Edman chemistry after enzymically deblocking the N-terminal pyroglutamate residue. The C-terminus was also blocked, as indicated by the lack of digestion when the peptide was incubated with carboxypeptidase A. The sequence of this peptide, which is designated Mem-CC, is pGlu-Leu-Asn-Tyr-Ser-Pro-Asp-Trp-NH2. It is a new member of the adipokinetic hormone/red-pigment-concentrating hormone (AKH/RPCH) family of peptides with two unusual structural features: it is charged and contains a tyrosine residue at position 4, where all other family members have a phenylalanine residue. Structure-activity studies in the migratory locust (Locusta migratoria) and the American cockroach (Periplaneta americana) revealed that the peptide was poorly active, owing to its structural uniqueness. PMID:2039445

  11. Purification and N-terminal sequence of a serine proteinase-like protein (BMK-CBP) from the venom of the Chinese scorpion (Buthus martensii Karsch).

    PubMed

    Gao, Rong; Zhang, Yong; Gopalakrishnakone, Ponnampalam

    2008-08-01

    A serine proteinase-like protein was isolated from the venom of the Chinese red scorpion (Buthus martensii Karsch) by a combination of gel filtration, ion-exchange and reverse-phase chromatography and named BMK-CBP. The apparent molecular weight of BMK-CBP was estimated at 33 kDa by SDS-PAGE under non-reducing conditions. The sequence of the N-terminal 40 amino acids was obtained by Edman degradation. The sequence shows the highest similarity to proteinases of insect origin. When tested with commonly used proteinase substrates, BMK-CBP showed no significant hydrolytic activity. The purified BMK-CBP was found to bind to the cancer cell line MCF-7, and the binding was dose-dependent. PMID:18625260

  12. The primary structure of the hemoglobin of Malayan sun bear (Helarctos malayanus, Carnivora) and structural comparison to other hemoglobin sequences.

    PubMed

    Hofmann, O; Braunitzer, G; Göltenboth, R

    1987-05-01

    The complete primary structure of the alpha- and beta-chains of the hemoglobin of Malayan Sun Bear (Helarctos malayanus) is presented. After cleavage of the heme-protein link and chain separation by RP-HPLC, amino-acid sequences were determined by Edman degradation in liquid- and gas-phase sequenators. An interesting result of this work is the demonstration that the hemoglobin of Malayan Sun Bear is identical to the hemoglobins of Polar Bear (Ursus maritimus) and Asiatic Black Bear (Ursus tibetanus). The paper gives an updated table of identical hemoglobin chains from different species. This paper may be considered a compilation of work on the genetic relationship of Pandas. PMID:3620104

  13. Unconventional amino acid sequence of the sun anemone (Stoichactis helianthus) polypeptide neurotoxin

    SciTech Connect

    Kem, W.; Dunn, B.; Parten, B.; Pennington, M.; Price, D.

    1986-05-01

    A 5000-dalton polypeptide neurotoxin (Sh-NI), purified by G50 Sephadex, P-cellulose, and SP-Sephadex chromatography, was homogeneous by isoelectric focusing. Sh-NI was highly toxic to crayfish (LD50 0.6 µg/kg) but had no effect on mice at 15,000 µg/kg (i.p. injection). The reduced, ³H-carboxymethylated toxin and its fragments were subjected to automatic Edman degradation, and the resulting PTH-amino acids were identified by HPLC, back hydrolysis, and scintillation counting. Peptides resulting from proteolytic (clostripain, staphylococcal protease) and chemical (at tryptophan) cleavage were sequenced. The sequence is: AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK. This sequence differs considerably from the homologous Anemonia and Anthopleura toxins; many of the identical residues (6 half-cystines, G9, P10, R13, G19, G29, W30) are probably critical for folding rather than receptor recognition. However, the Sh-NI sequence closely resembles Radianthus macrodactylus neurotoxin III and R. paumotensis II. The authors propose that Sh-NI and related Radianthus toxins act upon a different site on the sodium channel.
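The one-letter sequence and the residue labels quoted for Sh-NI (G9, P10 and so on) can be cross-checked mechanically. This sketch assumes 1-based numbering from the N-terminal Ala, which is consistent with every label in the abstract; it also locates the six half-cystines.

```python
SH_NI = "AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK"

# Residues the abstract lists as shared with the Anemonia/Anthopleura toxins,
# written as one-letter code + 1-based position.
SHARED = ["G9", "P10", "R13", "G19", "G29", "W30"]


def residue_at(seq: str, position: int) -> str:
    """Residue at a 1-based position (N-terminal Ala = 1)."""
    return seq[position - 1]


mismatches = [label for label in SHARED
              if residue_at(SH_NI, int(label[1:])) != label[0]]
cys_positions = [i for i, aa in enumerate(SH_NI, start=1) if aa == "C"]

print(f"{len(SH_NI)} residues")
print("half-cystines at", cys_positions)
print("mismatched labels:", mismatches or "none")
```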

  14. CSTX-9, a toxic peptide from the spider Cupiennius salei: amino acid sequence, disulphide bridge pattern and comparison with other spider toxins containing the cystine knot structure.

    PubMed

    Schalle, J; Kämpfer, U; Schürch, S; Kuhn-Nentwig, L; Haeberli, S; Nentwig, W

    2001-09-01

    CSTX-9 (68 residues, 7530.9 Da) is one of the most abundant toxic polypeptides in the venom of the wandering spider Cupiennius salei. The amino acid sequence was determined by Edman degradation using reduced and alkylated CSTX-9 and peptides generated by cleavages with endoproteinase Asp-N and trypsin, respectively. Sequence comparison with CSTX-1, the most abundant and the most toxic polypeptide in the crude spider venom, revealed a high degree of similarity (53% identity). By means of limited proteolysis with immobilised trypsin and RP-HPLC, the cystine-containing peptides of CSTX-9 were isolated and the disulphide bridges were assigned by amino acid analysis, Edman degradation and nanospray tandem mass spectrometry. The four disulphide bonds present in CSTX-9 are arranged in the following pattern: 1-4, 2-5, 3-8 and 6-7 (Cys6-Cys21, Cys13-Cys30, Cys20-Cys48, Cys32-Cys46). Sequence comparison of CSTX-1 with CSTX-9 clearly indicates the same disulphide bridge pattern, which is also found in other spider polypeptide toxins, e.g. agatoxins (omega-AGA-IVA, omega-AGA-IVB, mu-AGA-I and mu-AGA-VI) from Agelenopsis aperta, SNX-325 from Segestria florentina and curtatoxins (CT-I, CT-II and CT-III) from Hololena curta. CSTX-1/CSTX-9 belong to the family of ion channel toxins containing the inhibitor cystine knot structural motif. CSTX-9, lacking the lysine-rich C-terminal tail of CSTX-1, exhibits a ninefold lower toxicity to Drosophila melanogaster than CSTX-1. This is in accordance with previous observations of CSTX-2a and CSTX-2b, two truncated forms of CSTX-1 which, like CSTX-9, also lack the C-terminal lysine-rich tail. PMID:11693532
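The ordinal notation (1-4, 2-5, 3-8, 6-7) counts cysteines from the N-terminus, while the parenthesized pairs give residue numbers. Sorting the eight residue numbers from the four listed bonds and mapping the ordinals back onto them reproduces the stated pairs. This is only a consistency check of the notation, assuming all eight cysteines are bonded (as four disulfides imply), not a method for assigning disulfides.

```python
# Disulfide pairs (residue numbers) as listed in the abstract.
LISTED_BONDS = [(6, 21), (13, 30), (20, 48), (32, 46)]
# Same connectivity by cysteine ordinal, counted from the N-terminus.
ORDINAL_PATTERN = [(1, 4), (2, 5), (3, 8), (6, 7)]

# Residue numbers of all eight cysteines, in N- to C-terminal order.
cys_residues = sorted(r for pair in LISTED_BONDS for r in pair)


def ordinal_to_residue(pattern, cys_positions):
    """Map 1-based cysteine ordinals onto residue numbers."""
    return [(cys_positions[a - 1], cys_positions[b - 1]) for a, b in pattern]


mapped = ordinal_to_residue(ORDINAL_PATTERN, cys_residues)
print("cysteines at", cys_residues)
print("pattern maps to", ", ".join(f"Cys{a}-Cys{b}" for a, b in mapped))
print("consistent:", mapped == LISTED_BONDS)
```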

  15. Repetitive Sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Repetitive sequences, or repeats, account for a substantial portion of the eukaryotic genomes. These sequences include very different types of DNA with respect to mode of origin, function, structure, and genomic distribution. Two large families of repetitive sequences can be readily recognized, ta...

  16. Amino acid sequence of mouse nidogen, a multidomain basement membrane protein with binding activity for laminin, collagen IV and cells.

    PubMed Central

    Mann, K; Deutzmann, R; Aumailley, M; Timpl, R; Raimondi, L; Yamada, Y; Pan, T C; Conway, D; Chu, M L

    1989-01-01

    The whole amino acid sequence of nidogen was deduced from cDNA clones isolated from expression libraries and confirmed to approximately 50% by Edman degradation of peptides. The protein consists of some 1217 amino acid residues and a 28-residue signal peptide. The data support a previously proposed dumb-bell model of nidogen by demonstrating a large N-terminal globular domain (641 residues), five EGF-like repeats constituting the rod-like domain (248 residues) and a smaller C-terminal globule (328 residues). Two more EGF-like repeats interrupt the N-terminal and terminate the C-terminal sequences. Weak sequence homologies (25%) were detected between some regions of nidogen, the LDL receptor, thyroglobulin and the EGF precursor. Nidogen contains two consensus sequences for tyrosine sulfation and for asparagine beta-hydroxylation, two N-linked carbohydrate acceptor sites and, within one of the EGF-like repeats, an Arg-Gly-Asp sequence. The latter was shown to be functional in cell attachment to nidogen. Binding sites for laminin and collagen IV are present on the C-terminal globule but not yet precisely localized. PMID:2496973
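The three domain sizes add up exactly to the 1217-residue total. The sketch below assumes, as the wording "and a 28-residue signal peptide" suggests, that the 1217 residues exclude the signal peptide, so the primary translation product would be 1245 residues long.

```python
# Domain sizes (residues) reported for mouse nidogen.
DOMAINS = {
    "N-terminal globule": 641,
    "rod-like domain (EGF-like repeats)": 248,
    "C-terminal globule": 328,
}
SIGNAL_PEPTIDE = 28

mature_length = sum(DOMAINS.values())
# Assumption: the 1217-residue figure excludes the signal peptide.
precursor_length = mature_length + SIGNAL_PEPTIDE
print(f"domains sum to {mature_length} residues; "
      f"with signal peptide: {precursor_length}")
```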

  17. Identification and mass spectrometric sequence studies of fragments of L-asparaginase produced during freeze/thaw cycling.

    PubMed

    Jameel, F; Mauri, F; Bogner, R

    1998-01-01

    L-Asparaginase isolated from Erwinia chrysanthemi was found to lose activity upon exposure to consecutive freeze/thaw cycles. The causes of this loss of activity were investigated using multiple techniques. SEC with UV, RI and light-scattering detectors, together with SDS-PAGE, indicated that the L-asparaginase molecule fragments upon repeated freezing and thawing. Following up on this information, mass spectrometry was used to identify the fragments as small peptides of molecular weight 615 Da, 1424 Da and 1665 Da. Automated Edman sequencing of the frozen and thawed mixture confirmed the presence of fragments and contributed some sequence information. Mass spectral data and sequence studies of these fragments, in conjunction with the known sequence of the molecule, placed all the fragments within the last 28 C-terminal amino acids. A study of this region using the published three-dimensional X-ray crystallographic structure of L-asparaginase revealed that the C-terminal region is exposed and can interact with water. The IBI MacVector program "Protein Tool Box" predicted that this region is hydrophilic, has a high surface probability and a strong tendency to interact with water. Both tendencies suggest a potential for bond stress during freeze/thaw cycling. This region is not part of the catalytic core of the enzyme, but fragmentation in this area may result in unfolding and denaturation of the monomer, followed by aggregation into large, insoluble entities and loss of enzymatic activity. PMID:9691674

  18. Complete amino acid sequence of the medium-chain S-acyl fatty acid synthetase thio ester hydrolase from rat mammary gland

    SciTech Connect

    Randhawa, Z.I.; Smith, S.

    1987-03-10

    The complete amino acid sequence of the medium-chain S-acyl fatty acid synthetase thio ester hydrolase (thioesterase II) from rat mammary gland is presented. Most of the sequence was derived by analysis of [¹⁴C]-labelled peptide fragments produced by cleavage at methionyl, glutamyl, lysyl, arginyl, and tryptophanyl residues. A small section of the sequence was deduced from a previously analyzed cDNA clone. The protein consists of 260 residues and has a blocked amino-terminal methionine and a calculated Mr of 29,212. The carboxy-terminal sequence, verified by Edman degradation of the carboxy-terminal cyanogen bromide fragment and carboxypeptidase Y digestion of the intact thioesterase II, terminates with a serine residue and lacks three additional residues predicted by the cDNA sequence. The native enzyme contains three cysteine residues but no disulfide bridges. The active site serine residue is located at position 101. The rat mammary gland thioesterase II exhibits approximately 40% homology with a thioesterase from mallard uropygial gland, the sequence of which was recently determined by cDNA analysis. Thus the two enzymes may share similar structural features and a common evolutionary origin. The location of the active site in these thioesterases differs from that of other serine active site esterases; indeed, the enzymes do not exhibit any significant homology with other serine esterases, suggesting that they may constitute a separate new family of serine active site enzymes.

  19. DNA Sequencing

    DOEpatents

    Tabor, Stanley; Richardson, Charles C.

    1995-04-25

    A method for sequencing a strand of DNA, including the steps of: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three dideoxyribonucleoside triphosphates in different amounts, under conditions favoring primer extension to form nucleic acid fragments complementary to the DNA to be sequenced; labelling the nucleic acid fragments; separating them; and determining the position of the dideoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby determining the DNA sequence.
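The method reads the sequence from label intensities: because each chain-terminating nucleotide is supplied in a different amount, the fragments ending at each base carry a characteristic intensity, so a single lane can be read by ordering bands by length. The toy model below is a sketch under loudly stated assumptions: it uses all four terminators (the abstract requires at least three), hypothetical relative levels of 8, 4, 2 and 1, and band intensities directly proportional to the amount supplied. Real intensities also depend on the polymerase and sequence context, which the sketch ignores. It reads the newly synthesized strand, i.e. the complement of the template.

```python
# Hypothetical relative amounts of the four chain terminators; the abstract
# only states that they are supplied "in different amounts".
LEVELS = {"A": 8.0, "C": 4.0, "G": 2.0, "T": 1.0}


def simulate_lane(synthesized: str) -> list[tuple[int, float]]:
    """One band per position of the newly synthesized strand.

    Simplification: a fragment of length n ends in the terminator for base n,
    and its label intensity is proportional to that terminator's amount.
    """
    return [(n, LEVELS[base]) for n, base in enumerate(synthesized, start=1)]


def read_lane(bands: list[tuple[int, float]]) -> str:
    """Order bands by fragment length and call each from its intensity."""
    calls = []
    for _, intensity in sorted(bands):
        # The terminator whose level is nearest to the observed intensity wins.
        calls.append(min(LEVELS, key=lambda b: abs(LEVELS[b] - intensity)))
    return "".join(calls)


lane = simulate_lane("GATTACA")
# Scramble band order and perturb intensities by +/-10% to mimic a readout.
noisy = [(n, i * (1.1 if n % 2 else 0.9)) for n, i in reversed(lane)]
print(read_lane(noisy))  # → GATTACA
```

The levels are spaced by factors of two so that a ±10% error cannot move a band to a neighbouring level; closer spacing would make calls ambiguous.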

  20. Sequence analyses of two neuropeptides of the AKH/RPCH-family from the lubber grasshopper, Romalea microptera.

    PubMed

    Gäde, G; Hilbich, C; Beyreuther, K; Rinehart, K L

    1988-01-01

    Two neuropeptides with adipokinetic activity in Locusta migratoria and hypertrehalosaemic activity in Periplaneta americana were purified by high-performance liquid chromatography from the corpus cardiacum of the lubber grasshopper, Romalea microptera. The sequences of both peptides, designated Ro I and Ro II, were determined by gas-phase sequencing employing Edman degradation after the N-terminal pyroglutamate residue was enzymatically deblocked, as well as by fast atom bombardment mass spectrometry. Ro I was found to be a decapeptide with the primary structure: pGlu-Val-Asn-Phe-Thr-Pro-Asn-Trp-Gly-Thr-NH2, whereas Ro II is an octapeptide with the structure: pGlu-Val-Asn-Phe-Ser-Thr-Gly-Trp-NH2. Ro II is identical with AKH-G isolated from the cricket Gryllus bimaculatus. Synthetic materials having the assigned structures were found to be chromatographically, mass spectrometrically, and biologically indistinguishable from the natural peptides, confirming the sequences and establishing the Romalea peptides as members of the AKH/RPCH-family of peptides. PMID:3226948
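The two assigned structures can be compared residue by residue. The sketch below treats the "-NH2" suffix as a mark of C-terminal amidation rather than a residue; it confirms the decapeptide and octapeptide lengths and shows that Ro I and Ro II share the N-terminal pGlu-Val-Asn-Phe and both carry Trp at position 8.

```python
RO_I = "pGlu-Val-Asn-Phe-Thr-Pro-Asn-Trp-Gly-Thr-NH2"
RO_II = "pGlu-Val-Asn-Phe-Ser-Thr-Gly-Trp-NH2"


def residues(structure: str) -> list[str]:
    """Split a three-letter structure into residues.

    The trailing 'NH2' marks C-terminal amidation and is not a residue.
    """
    parts = structure.split("-")
    return parts[:-1] if parts[-1] == "NH2" else parts


def shared_n_terminus(a: list[str], b: list[str]) -> list[str]:
    """Longest run of identical residues starting at position 1."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


ro1, ro2 = residues(RO_I), residues(RO_II)
print(f"Ro I: {len(ro1)} residues, Ro II: {len(ro2)} residues")
print("shared N-terminus:", "-".join(shared_n_terminus(ro1, ro2)))
print("position 8:", ro1[7], ro2[7])
```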

  1. Characterization and amino acid sequence of a fatty acid-binding protein from human heart.

    PubMed

    Offner, G D; Brecher, P; Sawlivich, W B; Costello, C E; Troxler, R F

    1988-05-15

    The complete amino acid sequence of a fatty acid-binding protein from human heart was determined by automated Edman degradation of CNBr, BNPS-skatole [3'-bromo-3-methyl-2-(2-nitrobenzenesulphenyl)indolenine], hydroxylamine, Staphylococcus aureus V8 proteinase, tryptic and chymotryptic peptides, and by digestion of the protein with carboxypeptidase A. The sequence of the blocked N-terminal tryptic peptide from citraconylated protein was determined by collisionally induced decomposition mass spectrometry. The protein contains 132 amino acid residues, is enriched with respect to threonine and lysine, lacks cysteine, has an acetylated valine residue at the N-terminus, and has an Mr of 14768 and an isoelectric point of 5.25. This protein contains two short internal repeated sequences from residues 48-54 and from residues 114-119 located within regions of predicted beta-structure and decreasing hydrophobicity. These short repeats are contained within two longer repeated regions from residues 48-60 and residues 114-125, which display 62% sequence similarity. These regions could accommodate the charged and uncharged moieties of long-chain fatty acids and may represent fatty acid-binding domains consistent with the finding that human heart fatty acid-binding protein binds 2 mol of oleate or palmitate/mol of protein. Detailed evidence for the amino acid sequences of the peptides has been deposited as Supplementary Publication SUP 50143 (23 pages) at the British Library Lending Division, Boston Spa, Yorkshire LS23 7BQ, U.K., from whom copies may be obtained as indicated in Biochem. J. (1988) 249, 5. PMID:3421901

  2. [Sequencing babies?].

    PubMed

    Jordan, Bertrand

    2015-10-01

    An extension of newborn screening to genome sequencing is now feasible but raises a number of scientific, organisational and ethical issues. This is being explored in discussions and in several funded trials, in order to maximize benefits and avoid some identified risks. As some companies are already offering such a service, this is quite an urgent matter. PMID:26481033

  3. MSLICE Sequencing

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Norris, Jeffrey S.; Morris, John R.

    2011-01-01

    MSLICE Sequencing is a graphical tool for writing sequences and integrating them into RML files, as well as for producing SCMF files for uplink. When operated in a testbed environment, it also supports uplinking these SCMF files to the testbed via Chill. This software features a free-form textual sequence editor with syntax coloring and automatic content assistance (including command and argument completion proposals), complete with types, value ranges, units, and descriptions from the command dictionary that appear as they are typed. The sequence editor also has a "field mode" that allows tabbing between arguments and displays type/range/units/description for each argument as it is edited. Color-coded error and warning annotations on problematic tokens are included, as well as indications of problems that are not visible in the current scroll range. "Quick Fix" suggestions are made for resolving problems, and all the features afforded by modern source editors are also included, such as copy/cut/paste, undo/redo, and a sophisticated find-and-replace system optionally using regular expressions. The software offers a full XML editor for RML files, which features syntax coloring, content assistance and problem annotations as above. There is a form-based "detail view" that allows structured editing of command arguments and sequence parameters when preferred. The "project view" shows the user's "workspace" as a tree of "resources" (projects, folders, and files) that can subsequently be opened in editors by double-clicking. Files can be added, deleted, dragged-dropped/copied-pasted between folders or projects, and these operations are undoable and redoable. A "problems view" contains a tabular list of all problems in the current workspace. Double-clicking on any row in the table opens an editor for the appropriate sequence, scrolling to the specific line with the problem, and highlighting the problematic characters. From there, one can invoke "quick fix" as described

  4. Nucleotide sequence and expression of the capsid protein gene of feline calicivirus.

    PubMed Central

    Neill, J D; Reardon, I M; Heinrikson, R L

    1991-01-01

    The sequence of the 3'-terminal 2,486 bases of the feline calicivirus (FCV) genome was determined. This region of the FCV genome, from which the 2.4-kb subgenomic RNA is derived, contained two open reading frames. The larger open reading frame, found in the 5' end of the subgenomic mRNA, contained 2,004 bases encoding a polypeptide of 73,467 Da. The smaller open reading frame, encoded in the 3' end of the mRNA, was composed of 318 bases, encoding a polypeptide of 12,185 Da. The AUG initiation codon of the second open reading frame overlapped the UGA termination codon of the first, with the sequence AUGA. The nucleotide sequence of the region containing this overlap resembles the -1 frameshift sequences of the retroviruses. The 5' end of the 2.4-kb subgenomic RNA was mapped by primer extension analysis. There were two apparent transcription initiation points, both of which were 5' to the AUG initiation codon of the large open reading frame. Transcription from these sites yielded RNA transcripts with 5' nontranslated leader regions of 17 and 18 bases. The total length of the 2.4-kb subgenomic RNA was 2,375 bases (from the 5'-most start site) excluding the poly(A) tail. Edman degradation of the purified capsid protein of FCV showed that the capsid protein was encoded by the large open reading frame. Western immunoblot analysis of FCV-infected cells using a feline anti-FCV antiserum demonstrated that translation of the capsid protein was detectable at 3 h postinfection and continued to accumulate until 8 h postinfection, the last time examined. PMID:1716692
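Two pieces of arithmetic sit behind this abstract: converting ORF lengths into codon counts, and reading the frame relationship from the AUGA overlap. In AUGA, the start codon of the second ORF begins one base 5' of the first ORF's stop codon, placing the second ORF in the -1 frame relative to the first, consistent with the comparison to retroviral -1 frameshift sequences. The abstract does not say whether the reported lengths include the stop codon, so the sketch reports both readings.

```python
def codon_count(n_bases: int) -> int:
    """Codons in a reading frame; the length must be a multiple of 3."""
    if n_bases % 3:
        raise ValueError(f"{n_bases} bases is not a whole number of codons")
    return n_bases // 3


# ORF lengths from the abstract; whether they include the stop codon is
# not stated, so both residue counts are shown.
for name, length in [("capsid ORF", 2004), ("3'-terminal ORF", 318)]:
    n = codon_count(length)
    print(f"{name}: {n} codons -> {n} residues if the stop codon is excluded "
          f"from the length, {n - 1} if it is included")

# The overlap: the AUG of ORF2 shares bases with the UGA of ORF1.
overlap = "AUGA"
start_offset = overlap.find("AUG")  # first base of ORF2's start codon
stop_offset = overlap.find("UGA")   # first base of ORF1's stop codon
frame_shift = start_offset - stop_offset
print(f"ORF2 starts {abs(frame_shift)} base(s) 5' of ORF1's stop codon "
      f"(frame offset {frame_shift:+d})")
```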

  5. Insertion Sequences

    PubMed Central

    Mahillon, Jacques; Chandler, Michael

    1998-01-01

    Insertion sequences (ISs) constitute an important component of most bacterial genomes. Over 500 individual ISs have been described in the literature to date, and many more are being discovered in the ongoing prokaryotic and eukaryotic genome-sequencing projects. The last 10 years have also seen some striking advances in our understanding of the transposition process itself. Not least of these has been the development of various in vitro transposition systems for both prokaryotic and eukaryotic elements and, for several of these, a detailed understanding of the transposition process at the chemical level. This review presents a general overview of the organization and function of insertion sequences of eubacterial, archaebacterial, and eukaryotic origins with particular emphasis on bacterial elements and on different aspects of the transposition mechanism. It also attempts to provide a framework for classification of these elements by assigning them to various families or groups. A total of 443 members of the collection have been grouped in 17 families based on combinations of the following criteria: (i) similarities in genetic organization (arrangement of open reading frames); (ii) marked identities or similarities in the enzymes which mediate the transposition reactions, the recombinases/transposases (Tpases); (iii) similar features of their ends (terminal IRs); and (iv) fate of the nucleotide sequence of their target sites (generation of a direct target duplication of determined length). A brief description of the mechanism(s) involved in the mobility of individual ISs in each family and of the structure-function relationships of the individual Tpases is included where available. PMID:9729608

  6. Novel proline-hydroxyproline glycopeptides from the dandelion (Taraxacum officinale Wigg.) flowers: de novo sequencing and biological activity.

    PubMed

    Astafieva, Alexandra A; Enyenihi, Atim A; Rogozhin, Eugene A; Kozlov, Sergey A; Grishin, Eugene V; Odintsova, Tatyana I; Zubarev, Roman A; Egorov, Tsezi A

    2015-09-01

    Two novel homologous peptides named ToHyp1 and ToHyp2 that show no similarity to any known proteins were isolated from Taraxacum officinale Wigg. flowers by multidimensional liquid chromatography. Amino acid and mass spectrometry analyses demonstrated that the peptides have an unusual structure: they are cysteine-free, proline-hydroxyproline-rich and post-translationally glycosylated by pentoses, with 5 carbohydrates in ToHyp2 and 10 in ToHyp1. The ToHyp2 peptide, with a monoisotopic molecular mass of 4350.3 Da, was completely sequenced by a combination of Edman degradation and de novo sequencing via top-down multistage collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD) tandem mass spectrometry (MSⁿ). ToHyp2 consists of 35 amino acids and contains 18 proline residues, 8 of which are hydroxylated. The peptide displays antifungal activity and inhibits growth of Gram-positive and Gram-negative bacteria. We further showed that the carbohydrate moieties have no significant impact on the peptide structure but are important, although not absolutely necessary, for antifungal activity. The deglycosylated ToHyp2 peptide was less active against the susceptible fungus Bipolaris sorokiniana than the native peptide. Unique structural features of the ToHyp2 peptide place it into a new family of plant defense peptides. The discovery of ToHyp peptides in T. officinale flowers expands the repertoire of molecules of plant origin with practical applications. PMID:26259198

  7. Isolation and complete amino acid sequence of two fibrinolytic proteinases from the toxic Saturnid caterpillar Lonomia achelous.

    PubMed

    Amarant, T; Burkhart, W; LeVine, H; Arocha-Pinango, C L; Parikh, I

    1991-08-30

    The major toxic and fibrinolytic activity of the saliva and hemolymph of the larval form of Lonomia achelous was purified to homogeneity by a combination of metal chelate and affinity chromatography. Two apparent isozymes, Achelase I (213 amino acids, pIcalc = 10.55) and Achelase II (214 amino acids, pIcalc = 8.51), were sequenced by automated Edman degradation, and their C-termini were confirmed by Fourier-transform mass spectrometry. The calculated molecular weights (22,473 and 22,727) agree well with the Mr estimate of 24,000 from SDS-PAGE. No carbohydrate was detected during sequencing. The enzymes degraded all three chains of fibrin (α > β ≫ γ), yielding a fragmentation pattern indistinguishable from that produced by trypsin. The chromogenic peptides S-2222 (Factor Xa and trypsin), S-2251 (plasmin), S-2302 (kallikrein) and S-2444 (urokinase) were substrates, whereas S-2288 (a substrate for a broad range of serine proteinases, including thrombin) was not hydrolyzed. Among a range of inhibitors, Hg2+, aminophenylmercuriacetate, leupeptin, antipain and E-64, but not N-ethylmaleimide or iodoacetate, abolished the activity of the purified isozymes against S-2444. Phenylmethylsulfonyl fluoride, soybean trypsin inhibitor and aprotinin were less effective. The presence of the classic catalytic triad (histidine-41, aspartate-86 and serine-189) suggests that Achelases I and II may be serine proteinases, but with a potentially free cysteine-185 that could react with thiol proteinase-directed reagents. PMID:1911844

  8. Production, purification, sequencing and activity spectra of mutacins D-123.1 and F-59.1

    PubMed Central

    2011-01-01

    Background The increase in bacterial resistance to antibiotics drives the development of new antibacterial substances. Mutacins (bacteriocins) are small antibacterial peptides produced by Streptococcus mutans that are active against bacterial pathogens. The objective of the study was to produce and characterise additional mutacins in order to find new useful antibacterial substances. Results Mutacin F-59.1 was produced in liquid media by S. mutans 59.1, while production of mutacin D-123.1 by S. mutans 123.1 was obtained in semi-solid media. Mutacins were purified by hydrophobic chromatography. The amino acid sequences of the mutacins were obtained by Edman degradation and their molecular masses were determined by mass spectrometry. Mutacin F-59.1 consists of 25 amino acids, contains the YGNGV consensus sequence of pediocin-like bacteriocins, and has a calculated molecular mass of 2719 Da. Mutacin D-123.1 has the same molecular mass as mutacin I (2364 Da) and the same first 9 amino acids. Mutacins D-123.1 and F-59.1 have wide activity spectra, inhibiting human and food-borne pathogens. The lantibiotic mutacin D-123.1 possesses a broader activity spectrum than mutacin F-59.1 against the bacterial strains tested. Conclusion Mutacin F-59.1 is the first pediocin-like bacteriocin identified and characterised that is produced by Streptococcus mutans. Mutacin D-123.1 appears to be identical to mutacin I, previously identified in different strains of S. mutans. PMID:21477375

  9. ABRF-MIRG benchmark study: molecular interactions in a three-component system.

    PubMed

    Yamniuk, Aaron P; Edavettal, Suzanne C; Bergqvist, Simon; Yadav, Satya P; Doyle, Michael L; Calabrese, Kelly; Parsons, James F; Eisenstein, Edward

    2012-09-01

    Protein-protein interactions identified through high-throughput proteomics efforts continue to advance our understanding of the protein interactome. In addition to highly specific protein-protein interactions, it is becoming increasingly common for yeast two-hybrid, pull-down assays, and other proteomics techniques to identify multiple protein ligands that bind to the same target protein. A resulting challenge is to accurately characterize the assembly of these multiprotein complexes and the competition among multiple protein ligands for a given target. The Association of Biomolecular Resource Facilities-Molecular Interactions Research Group recently conducted a benchmark study to assess participants' ability to correctly describe the interactions between two protein ligands and their target protein using primarily biosensor technologies, such as surface plasmon resonance. Participants were provided with microgram quantities of three proteins (A, B, and C) and asked to determine whether a ternary A-B-C complex can form or whether protein-B and protein-C bind competitively to protein-A. This article summarizes the experimental approaches taken by participants to characterize the molecular interactions, the interpretation of the data, and the results obtained using different biosensor instruments. PMID:22942790

  10. Purification and sequencing of radish seed calmodulin antagonists phosphorylated by calcium-dependent protein kinase.

    PubMed Central

    Polya, G M; Chandra, S; Condron, R

    1993-01-01

    A family of radish (Raphanus sativus) calmodulin antagonists (RCAs) was purified from seeds by extraction, centrifugation, batch-wise elution from carboxymethyl-cellulose, and high performance liquid chromatography (HPLC) on an SP5PW cation-exchange column. This RCA fraction was further resolved into three calmodulin antagonist polypeptides (RCA1, RCA2, and RCA3) by denaturation in the presence of guanidinium HCl and mercaptoethanol and subsequent reverse-phase HPLC on a C8 column eluted with an acetonitrile gradient in the presence of 0.1% trifluoroacetic acid. The RCA preparation, RCA1, RCA2, RCA3, and other radish seed proteins are phosphorylated by wheat embryo Ca(2+)-dependent protein kinase (CDPK). The RCA preparation contains other CDPK substrates in addition to RCA1, RCA2, and RCA3. The RCA preparation, RCA1, RCA2, and RCA3 inhibit chicken gizzard calmodulin-dependent myosin light chain kinase assayed with a myosin-light chain-based synthetic peptide substrate (fifty percent inhibitory concentrations of RCA2 and RCA3 are about 7 and 2 microM, respectively). N-terminal sequencing by sequential Edman degradation of RCA1, RCA2, and RCA3 revealed sequences with high homology to the small subunit of the storage protein napin from Brassica napus and to related proteins. The deduced amino acid sequences of RCA1, RCA2, RCA3, and RCA3' (a subform of RCA3) agree with average molecular masses from electrospray mass spectrometry of 4537, 4543, 4532, and 4560 Da, respectively. The only sites for serine phosphorylation are near or at the C termini and hence adjacent to the sites of proteolytic precursor cleavage. PMID:8278508

  11. Reconstructing of a Sequence Using Similar Sequences

    Energy Science and Technology Software Center (ESTSC)

    1995-11-28

    SIMSEQ reconstructs sequences from oligos. Similar known sequences are used as a reference. At present, simulated data are being used to develop the algorithm. SIMSEQ generates an initial random sequence, then generates a second sequence that is 60 to 90 percent similar to the first. Next, the second sequence is chopped into its appropriate oligos. All possible sequences are reconstructed to determine the most similar. Those with the highest similarity are printed as output.
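
    The SIMSEQ steps above can be sketched in Python. This is a hedged illustration, not the ESTSC code: it assumes DNA sequences, substitution-only divergence, "oligos" meaning every overlapping k-mer of the chopped sequence, and positional identity as the similarity measure; all function names are hypothetical. Exhaustive enumeration of reconstructions grows quickly, so the sketch is only practical for short sequences.

```python
import random
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA sequences


def random_sequence(n, rng):
    """Step 1: an initial random sequence of length n."""
    return "".join(rng.choice(ALPHABET) for _ in range(n))


def mutate(seq, similarity, rng):
    """Step 2: copy seq, substituting a (1 - similarity) fraction of positions."""
    out = list(seq)
    for i in rng.sample(range(len(out)), round(len(out) * (1 - similarity))):
        out[i] = rng.choice([b for b in ALPHABET if b != out[i]])
    return "".join(out)


def oligos(seq, k):
    """Step 3: chop a sequence into its overlapping k-mers (a multiset)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def reconstructions(spectrum, k, length):
    """Step 4: every sequence of the given length whose k-mers are exactly `spectrum`."""
    results = set()

    def extend(seq, remaining):
        if len(seq) == length:
            if not +remaining:  # unary + drops zero counts: all oligos used
                results.add(seq)
            return
        for base in ALPHABET:
            kmer = seq[-(k - 1):] + base
            if remaining[kmer] > 0:
                remaining[kmer] -= 1
                extend(seq + base, remaining)
                remaining[kmer] += 1

    for start in list(spectrum):
        remaining = spectrum.copy()
        remaining[start] -= 1
        extend(start, remaining)
    return results


def identity(a, b):
    """Fraction of aligned positions that agree (no gaps)."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def simseq(n=12, k=4, seed=0):
    """Step 5: keep the reconstructions most similar to the reference."""
    rng = random.Random(seed)
    reference = random_sequence(n, rng)
    target = mutate(reference, rng.uniform(0.6, 0.9), rng)
    candidates = reconstructions(oligos(target, k), k, n)
    best = max(identity(c, reference) for c in candidates)
    return reference, target, sorted(c for c in candidates if identity(c, reference) == best)
```

    Several different sequences can share one oligo spectrum, which is why a similar reference is needed to choose among them.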

  12. The amino-acid sequence of the glucose/mannose-specific lectin isolated from Parkia platycephala seeds reveals three tandemly arranged jacalin-related domains.

    PubMed

    Mann, K; Farias, C M; Del Sol, F G; Santos, C F; Grangeiro, T B; Nagano, C S; Cavada, B S; Calvete, J J

    2001-08-01

    A mannose/glucose-specific lectin was isolated from seeds of Parkia platycephala, a member of Mimosoideae, the most primitive subfamily of Leguminosae plants. The molecular mass of the purified lectin determined by mass spectrometry was 47 946 ± 6 Da (by electrospray ionization) and 47 951 ± 9 Da (by matrix-assisted laser-desorption ionization). The apparent molecular mass of the lectin in solutions of pH 4.5-8.5, determined by analytical ultracentrifugation equilibrium sedimentation, was 94 ± 3 kDa, showing that the protein behaved as a non-pH-dependent dimer. The amino-acid sequence of the Parkia lectin was determined by Edman degradation of overlapping peptides. This is the first report of the primary structure of a Mimosoideae lectin. The protein contained a blocked N-terminus and a single, nonglycosylated polypeptide chain composed of three tandemly arranged homologous domains. Each of these domains shares sequence similarity with jacalin-related lectin monomers from Asteraceae, Convolvulaceae, Moraceae, Musaceae, Gramineae, and Fagaceae plant families. Based on this homology, we predict that each Parkia lectin repeat may display a beta prism fold similar to that observed in the crystal structure of the lectin from Helianthus tuberosus. The P. platycephala lectin also shows sequence similarity with stress- and pathogen-upregulated defence genes of a number of different plants, suggesting a common ancestry for jacalin-related lectins and inducible defence proteins. PMID:11502201

  13. Sequence and peptide-binding motif for a variant of HLA-A*0214 (A*02142) in an HIV-1-resistant individual from the Nairobi Sex Worker cohort.

    PubMed

    Luscher, M A; MacDonald, K S; Bwayo, J J; Plummer, F A; Barber, B H

    2001-02-01

    As part of the ongoing study of natural HIV-1 resistance in the women of the Nairobi Sex Workers' study, we have examined a resistance-associated HLA class I allele at the molecular level. Typing by polymerase chain reaction using sequence-specific primers determined that this molecule is closely related to HLA-A*0214, one of a family of HLA-A2 supertype alleles that correlate with HIV-1 resistance in this population. Direct nucleotide sequencing shows that this molecule differs from A*0214 by a silent nucleotide substitution. We therefore propose to designate it HLA-A*02142. We have determined the peptide-binding motif of HLA-A*0214/02142 by peptide elution and bulk Edman degradative sequencing. The resulting motif, X-[Q,V]-X-X-X-K-X-X-[V,L], includes lysine as an anchor at position 6. The data complement available information on the peptide-binding characteristics of this molecule, and will be of use in identifying antigenic peptides from HIV-1 and other pathogens. PMID:11261925
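
    The reported motif can be written as a regular expression for scanning candidate 9-mer peptides. This is a simplified sketch: it treats the anchor residues as strict requirements, whereas a binding motif describes preferences, and the example sequences below are made up for illustration.

```python
import re

# X-[Q,V]-X-X-X-K-X-X-[V,L]: any residue, Q/V at 2, K at 6, V/L at the C-terminal position 9.
MOTIF = re.compile(r"[A-Z][QV][A-Z]{3}K[A-Z]{2}[VL]")


def matches_motif(peptide: str) -> bool:
    """True if a 9-mer fits the motif exactly."""
    return len(peptide) == 9 and MOTIF.fullmatch(peptide) is not None


def scan(protein: str):
    """Yield (0-based start, 9-mer) for every window of a protein that fits the motif."""
    for i in range(len(protein) - 8):
        window = protein[i:i + 9]
        if MOTIF.fullmatch(window):
            yield i, window


print(list(scan("GGAQLMNKGAVGG")))  # [(2, 'AQLMNKGAV')]
```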

  14. X-ray sequence and crystal structure of luffaculin 1, a novel type 1 ribosome-inactivating protein

    PubMed Central

    Hou, Xiaomin; Chen, Minghuang; Chen, Liqing; Meehan, Edward J; Xie, Jieming; Huang, Mingdong

    2007-01-01

    Background Protein sequence can be obtained through Edman degradation, mass spectrometry, or cDNA sequencing. High resolution X-ray crystallography can also be used to derive protein sequence information, but has difficulty distinguishing the Asp/Asn, Glu/Gln, and Val/Thr pairs. Luffaculin 1 is a new type 1 ribosome-inactivating protein (RIP) isolated from the seeds of Luffa acutangula. Besides rRNA N-glycosidase activity, luffaculin 1 also inhibits tumor cell proliferation and induces tumor cell differentiation. Results The crystal structure of luffaculin 1 was determined at 1.4 Å resolution. Its amino-acid sequence was derived from this high resolution structure using the following criteria: 1) high resolution electron density; 2) comparison of electron density between two molecules that exist in the same crystal; 3) evaluation of the chemical environment of residues to resolve the sequence assignment ambiguity in the residue pairs Glu/Gln, Asp/Asn, and Val/Thr; 4) comparison with sequences of homologous proteins. Using criteria 1 and 2, 66% of the residues could be assigned. Adding criterion 3 raised this to 86%, suggesting that chemical environment evaluation is effective in resolving residue ambiguity. In total, 94% of the luffaculin 1 sequence was assigned with high confidence using this improved X-ray sequencing strategy. Two N-acetylglucosamine moieties, linked respectively to the residues Asn77 and Asn84, can be identified in the structure. Residues Tyr70, Tyr110, Glu159 and Arg162 define the active site of luffaculin 1 as an RNA N-glycosidase. Conclusion The X-ray sequencing method can be effective in deriving sequence information for proteins. Evaluating the chemical environment of residues is a useful way to resolve the assignment ambiguity in Glu/Gln, Asp/Asn, and Val/Thr pairs. The sequence and the crystal structure confirm that luffaculin 1 is a new

  15. High performance liquid chromatography purification and amino acid sequence of toxins from the muscarinic fraction of Tityus discrepans scorpion venom.

    PubMed

    D'Suze, G; Corona, F; Possani, L D; Sevcik, C

    1996-05-01

    Tityus discrepans venom was fractionated by gel filtration on Sephadex G-50 column. The peptides in fraction II from Sephadex were further purified by high performance liquid chromatography, through a C4 reverse-phase column. Lethality of purified peptides was determined by injection into mice and crabs, and their effects were verified electrophysiologically on frog (Hyla crepitans) sartorius neuromuscular junction. Toxins having retention times between 39.6 and 40.7 min depolarized the muscle membrane and caused acetylcholine release at the endplate. The toxin eluted at 42.67 min increased the frequency of miniature endplate potentials without depolarizing muscle fibres. The four most active toxins were reduced, carboxymethylated and sequenced by automatic Edman degradation and named TdII-1 to II-4. Toxin gamma from Tityus serrulatus venom and the toxins from T. discrepans venom were found to be structurally distinct. TdII-1 to II-4 lack the pancreatic effects of T. serrulatus' toxin gamma; yet, the five toxins act on Na+ channels. PMID:8783453

  16. Purification and complete amino acid sequence of a new type of sweet protein taste-modifying activity, curculin.

    PubMed

    Yamashita, H; Theerasilp, S; Aiuchi, T; Nakaya, K; Nakamura, Y; Kurihara, Y

    1990-09-15

    A new taste-modifying protein named curculin was extracted with 0.5 M NaCl from the fruits of Curculigo latifolia and purified by ammonium sulfate fractionation, CM-Sepharose ion-exchange chromatography, and gel filtration. Purified curculin thus obtained gave a single band with an Mr of 12,000 on sodium dodecyl sulfate-polyacrylamide gel electrophoresis in the presence of 8 M urea. The molecular weight determined by low-angle laser light scattering was 27,800. These results suggest that native curculin is a dimer of a 12,000-Da polypeptide. The complete amino acid sequence of curculin was determined by automatic Edman degradation. Curculin consists of 114 residues. Curculin itself elicits a sweet taste. After curculin, water elicits a sweet taste, and sour substances induce a stronger sense of sweetness. No protein with both sweet-tasting and taste-modifying activities had previously been found. There are five sets of tripeptides common to miraculin (a taste-modifying protein), six sets of tripeptides common to thaumatin (a sweet protein), and two sets of tripeptides common to monellin (a sweet protein). Anti-miraculin serum was not immunologically reactive with curculin. The mechanism of the taste-modifying action of curculin is discussed. PMID:2394746

  17. Information contained in the amino acid sequence of the alpha1(I)-chain of collagen and its consequences upon the formation of the triple helix, of fibrils and crosslinks.

    PubMed

    Fietzek, P P; Kühn, K

    1975-09-30

    The molecule of type I collagen from skin consists of two alpha1(I)-chains and one alpha2-chain. The sequence of the entire alpha1-chain, comprising 1052 residues, is summarily presented and discussed. Apart from the 279 residues of alpha1(I)-CB8, whose sequence has been established for rat skin collagen, all sequences have been determined for calf skin collagen. To facilitate sequence analysis, the alpha1-chain was cleaved into defined fragments by cyanogen bromide or hydroxylamine or by limited collagenase digestion. Most of the sequence was established by automated stepwise Edman degradation. The alpha1-chain contains two basically different types of sequence: the triple helical region of 1011 amino acid residues, in which every third position is occupied by glycine, and the N- and C-terminal regions, which do not display this type of regularity. Both of these non-triple helical regions carry oxidizable lysine or hydroxylysine residues as functional sites for intermolecular crosslink formation. Implications of the amino acid sequence for the stability of the triple helix and the fibril, as well as for the formation of crosslinks, are discussed. Evaluation of the sequence in connection with electron microscopical investigations yielded the parameters of the axial arrangement of the molecules within the fibrils. Axial stagger of the molecules by a distance D = 670 Å (233 amino acid residues) results in maximal interaction of polar sequence regions of adjacent molecules and similarly of regions of hydrophobic residues. Ordered aggregation of molecules into fibrils is, therefore, regulated by electrostatic and hydrophobic forces. Possible loci of intermolecular crosslinks between the alpha1-chains of adjacent molecules may be deduced from the dimensions of the axial aggregation of molecules. PMID:171554
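
    The stagger quoted above implies a rise per residue h. Assuming, for illustration only, that this rise is constant along the whole 1011-residue triple-helical region, the helix length can be expressed in units of D:

```latex
h = \frac{D}{233} = \frac{670\ \text{\AA}}{233} \approx 2.88\ \text{\AA per residue},
\qquad
\frac{L_{\text{helix}}}{D} \approx \frac{1011\,h}{D} = \frac{1011}{233} \approx 4.34
```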

  18. Shotgun protein sequencing.

    SciTech Connect

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    This SAND report describes a novel experimental and computational technique, based on multiple enzymatic digestion of a protein or protein mixture, that reconstructs protein sequences from the sequences of overlapping peptides. The approach, analogous to shotgun sequencing of DNA, is intended for sequencing alternatively spliced proteins, identifying post-translational modifications, and sequencing genetically engineered proteins.
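
    The core reconstruction step, merging overlapping peptides into a longer sequence, can be sketched with a greedy overlap assembler. This is a plainly substituted, simplified technique rather than the report's method: it assumes error-free peptide sequences, exact suffix/prefix overlaps of at least `min_len` residues, and no isobaric ambiguities such as Ile/Leu. The function names and the example peptides are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b, or 0 if below min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(peptides, min_len=3):
    """Greedily merge the pair of fragments with the longest overlap until none is left."""
    frags = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    while True:
        # discard fragments already contained in another fragment
        frags = [f for f in frags if not any(f != g and f in g for g in frags)]
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return frags  # a single contig, or no usable overlaps remain
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)] + [merged]


# Made-up peptides from two hypothetical digests of one protein.
print(greedy_assemble(["MKTAYIAK", "AYIAKQRQIS", "QISFVKSHF", "VKSHFSRQ"]))
# ['MKTAYIAKQRQISFVKSHFSRQ']
```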

  19. The sequence of sequencers: The history of sequencing DNA

    PubMed Central

    Heather, James M.; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. PMID:26554401

  20. Whole Genome Sequencing

    MedlinePlus

    ... What is whole genome sequencing? Whole genome sequencing is the mapping out ...

  1. Coordinate cytokine regulatory sequences

    DOEpatents

    Frazer, Kelly A.; Rubin, Edward M.; Loots, Gabriela G.

    2005-05-10

    The present invention provides CNS sequences that regulate cytokine gene expression; expression cassettes and vectors comprising or lacking the CNS sequences; and host cells and non-human transgenic animals comprising or lacking the CNS sequences. The present invention also provides methods for identifying compounds that modulate the functions of CNS sequences, as well as methods for diagnosing defects in the CNS sequences of patients.

  2. Science sequence design

    NASA Technical Reports Server (NTRS)

    Koskela, P. E.; Bollman, W. E.; Freeman, J. E.; Helton, M. R.; Reichert, R. J.; Travers, E. S.; Zawacki, S. J.

    1973-01-01

    The activities of the following members of the Navigation Team are recorded: the Science Sequence Design Group, responsible for preparing the final science sequence designs; the Advanced Sequence Planning Group, responsible for sequence planning; and the Science Recommendation Team (SRT) representatives, responsible for conducting the necessary sequence design interfaces with the teams during the mission. The interface task included science support in both advance planning and daily operations. Science sequences designed during the mission are also discussed.

  3. Isolation and sequencing of an active-site peptide from Rhodospirillum rubrum ribulosebisphosphate carboxylase/oxygenase after affinity labeling with 2-((Bromoacetyl)amino)pentitol 1,5-bisphosphate

    SciTech Connect

    Fraij, B.; Hartman, F.C.

    1983-01-01

    2-((Bromoacetyl)amino)pentitol 1,5-bisphosphate was reported to be a highly selective affinity label for ribulosebisphosphate carboxylase/oxygenase from Rhodospirillum rubrum. The enzyme has now been inactivated with a ¹⁴C-labeled reagent in order to identify the target residue at the sequence level. Subsequent to inactivation, the enzyme was carboxymethylated with iodoacetate and then digested with trypsin. The only radioactive peptide in the digest was obtained at a high degree of purity by successive chromatography on DEAE-cellulose, SP-Sephadex, and Sephadex G-25. On the basis of amino acid analysis of the purified peptide, the derivatized residue was a methionyl sulfonium salt. Automated Edman degradation confirmed the purity of the labeled peptide and established its sequence as Leu-Gln-Gly-Ala-Ser-Gly-Ile-His-Thr-Gly-Thr-Met-Gly-Phe-Gly-Lys-Met-Glu-Gly-Glu-Ser-Ser-Asp-Arg. Cleavage of this peptide with cyanogen bromide showed that the reagent moiety was covalently attached to the second methionyl residue. Sequence homology with the carboxylase/oxygenase from spinach indicates that the lysyl residue immediately preceding the alkylated methionine corresponds to Lys-334, a residue previously implicated at the active site. 31 references, 4 figures, 3 tables.
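
    The positions involved can be checked with an idealized in silico cyanogen bromide digest of the reported peptide. This is a sketch under a stated assumption: it cuts on the C-terminal side of every Met, whereas a chemically modified Met may not be cleaved the same way, so observed fragments can differ from this prediction. The function name `cnbr_fragments` is hypothetical.

```python
PEPTIDE = ("Leu-Gln-Gly-Ala-Ser-Gly-Ile-His-Thr-Gly-Thr-Met-Gly-Phe-Gly-Lys-"
           "Met-Glu-Gly-Glu-Ser-Ser-Asp-Arg").split("-")


def cnbr_fragments(residues):
    """Idealized CNBr digest: cut on the C-terminal side of every Met residue."""
    fragments, current = [], []
    for residue in residues:
        current.append(residue)
        if residue == "Met":
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)
    return fragments


met_positions = [i + 1 for i, r in enumerate(PEPTIDE) if r == "Met"]  # 1-based
print(met_positions)                              # [12, 17]
print([len(f) for f in cnbr_fragments(PEPTIDE)])  # [12, 5, 7]
print(PEPTIDE[met_positions[1] - 2])              # Lys, the residue before the second Met
```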

  4. The generalized quaternion sequence

    NASA Astrophysics Data System (ADS)

    Deveci, Ömür

    2016-04-01

    In this work, we define a recurrence sequence using the relation matrix of the generalized quaternion group and then obtain miscellaneous properties of this sequence. We also obtain the cyclic groups and the semigroups produced by the generating matrix of the sequence when read modulo m. Furthermore, we study this sequence modulo m and derive the relationship between the orders of the cyclic groups obtained and the periods of the sequence.
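
    The link between the order of a generating matrix modulo m and the period of the sequence it generates can be sketched generically. The abstract does not give the quaternion relation matrix, so this sketch substitutes the Fibonacci matrix as a stand-in (its period modulo m is the Pisano period). It assumes the matrix is invertible modulo m; the function names are hypothetical.

```python
def mat_mult(a, b, m):
    """Multiply square matrices a and b, reducing every entry modulo m."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) % m for j in range(n)]
            for i in range(n)]


def period_mod(matrix, m, limit=100_000):
    """Smallest p > 0 with matrix**p == I (mod m); None if none is found up to limit.

    Assumes the matrix is invertible modulo m; otherwise its powers are only
    eventually periodic and never return to the identity.
    """
    n = len(matrix)
    identity = [[(1 if i == j else 0) % m for j in range(n)] for i in range(n)]
    power = [[x % m for x in row] for row in matrix]
    for p in range(1, limit + 1):
        if power == identity:
            return p
        power = mat_mult(power, matrix, m)
    return None


FIBONACCI = [[1, 1], [1, 0]]  # stand-in generating matrix, not the quaternion one
print([period_mod(FIBONACCI, m) for m in (2, 3, 10)])  # [3, 8, 60]
```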

  5. Structural characterization of blotting membranes and the influence of membrane parameters for electroblotting and subsequent amino acid sequence analysis of proteins.

    PubMed

    Eckerskorn, C; Lottspeich, F

    1993-09-01

    Various blotting membranes were evaluated and correlated with the efficiency of electroblotting and the performance in the sequencing process. Structural parameters including specific surface area, pore size distribution, pore volumes, and permeabilities of different solvents lead to discrimination of the membranes relative to their accessible surfaces and membrane densities. Protein binding capacities as well as protein recoveries in electroblotting correlate with the specific surface areas. Almost quantitative retention of proteins during electroblotting from gels was obtained for membranes with a high specific surface area and narrow pores (Trans-Blot, Immobilon PSQ, Fluorotrans), whereas membranes with a relatively low specific surface area (Immobilon P, Glassybond) showed reduced recoveries of between 10-20% for the tested proteins. Initial yields and repetitive yields were compared for radioiodinated standard proteins that have been either electroblotted or loaded by direct adsorption. The results showed that the different permeabilities for solutions of the Edman chemistry have a major influence on initial yields. The glass fiber-based membranes with an extremely low flow restriction produce consistently high initial yields independent of the application mode of the protein (spotted or electroblotted) or the application of the membranes into the cartridge (discs or small pieces). In contrast, the polymeric membranes showed decreasing initial yields with increasing membrane density for spotted and electroblotted proteins. Yields varied considerably when the membranes were applied as discs into the cartridge. This effect could be minimized by cutting the membranes into pieces as small as possible, as demonstrated for electroblotted proteins. PMID:8223390
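
    Initial and repetitive yields are commonly estimated by fitting the amount recovered at each Edman cycle to an exponential decay, amount_n = I0 · R^(n-1). A minimal sketch, assuming that model and a least-squares fit on the logarithm; the loaded amount and the per-cycle data are made up, and `fit_yields` is a hypothetical name.

```python
import math


def fit_yields(cycles, amounts):
    """Fit amount_n = I0 * R**(n - 1) by least squares on ln(amount).

    Returns (I0, R): the extrapolated cycle-1 amount and the repetitive yield.
    """
    xs = [c - 1 for c in cycles]
    ys = [math.log(a) for a in amounts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope)


# Hypothetical run: 100 pmol loaded, amounts (pmol) generated from I0 = 60, R = 0.93.
loaded = 100.0
cycles = list(range(1, 11))
amounts = [60.0 * 0.93 ** (c - 1) for c in cycles]
i0, r = fit_yields(cycles, amounts)
print(f"initial yield {100 * i0 / loaded:.0f}%, repetitive yield {100 * r:.0f}%")
```

    Real sequencer data are noisy and some residues are recovered poorly, so in practice only well-recovered cycles would be used in such a fit.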

  6. Identification of Tuber borchii Vittad. mycelium proteins separated by two-dimensional polyacrylamide gel electrophoresis using amino acid analysis and sequence tagging.

    PubMed

    Vallorani, L; Bernardini, F; Sacconi, C; Pierleoni, R; Pieretti, B; Piccoli, G; Buffalini, M; Stocchi, V

    2000-11-01

    This paper reports the first results in the proteome analysis of Tuber borchii Vittad. mycelium, an ectomycorrhizal fungus poorly defined genetically, but known for its generation of edible fruit bodies known as white truffles. Employing isoelectric focusing on immobilized pH gradients, followed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis, we obtained an electropherogram presenting over 800 spots within the window of isoelectric points (pI) 3.5-9 and a molecular mass of 10-200 kDa. Different reducing agents were tested in the sample preparation buffers, and the standard lysis buffer plus 2% w/v polyvinylpolypyrrolidone allowed the best solubilization and resolution of the proteins. The T. borchii proteins separated in micropreparative gels were electroblotted onto polyvinylidene difluoride membranes and visualized by Coomassie staining. Twenty-three proteins were excised and analyzed by the combination of amino acid and N-terminal analysis. One protein was identified by matching its amino acid composition, estimated isoelectric point and molecular mass against the SWISS-PROT and EMBL databases. Four spots were successfully tagged by Edman microsequencing but no homologous sequences were found in databases. PMID:11271490

  7. Genome Sequence Databases (Overview): Sequencing and Assembly

    SciTech Connect

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile, the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different lengths (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in the analysis of microbial communities (metagenomes) of different complexities raises new problems and brings new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Although automated (draft) assembly can cover up to 98% of the genome, in most cases it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents a genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
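
    As a worked example of what these error rates mean, take a genome of 5 Mb (the size is a hypothetical choice for illustration):

```latex
E_{\text{draft}} = \frac{5\times10^{6}\ \text{bp}}{2000\ \text{bp}} = 2500\ \text{errors},
\qquad
E_{\text{finished}} \approx \frac{5\times10^{6}\ \text{bp}}{10{,}000\ \text{bp}} = 500\ \text{errors}
```

    so finishing lowers the expected number of consensus errors roughly fivefold.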

  8. Automated DNA Sequencing System

    SciTech Connect

    Armstrong, G.A.; Ekkebus, C.P.; Hauser, L.J.; Kress, R.L.; Mural, R.J.

    1999-04-25

    Oak Ridge National Laboratory (ORNL) is developing a core DNA sequencing facility to support biological research endeavors at ORNL and to conduct basic sequencing automation research. This facility is novel because its development is based on existing standard biology laboratory equipment; thus, the development process is of interest to the many small laboratories trying to use automation to control costs and increase throughput. Before automation, biology laboratory personnel purified DNA, completed cycle sequencing, and prepared 96-well sample plates with commercially available hardware designed specifically for each step in the process. Following purification and thermal cycling, an automated sequencing machine was used for the sequencing. A technician handled all movement of the 96-well sample plates between machines. To automate the process, ORNL is adding a CRS Robotics A-465 arm, an ABI 377 sequencing machine, an automated centrifuge, an automated refrigerator, and possibly an automated SpeedVac. The entire system will be integrated with one central controller that will direct each machine and the robot. The goal of this system is to completely automate the sequencing procedure, from bacterial cell samples through ready-to-be-sequenced DNA and ultimately to completed sequence. The system will be flexible and will accommodate chemistries different from those of existing automated sequencing lines. The system will be expanded in the future to include colony picking and/or actual sequencing. This discrete-event DNA sequencing system will demonstrate that smaller sequencing labs can achieve cost-effective automation that grows with the laboratory.

  9. Purification and amino acid sequence of a highly insecticidal toxin from the venom of the brazilian spider Phoneutria nigriventer which inhibits NMDA-evoked currents in rat hippocampal neurones.

    PubMed

    de Figueiredo, S G; de Lima, M E; Nascimento Cordeiro, M; Diniz, C R; Patten, D; Halliwell, R F; Gilroy, J; Richardson, M

    2001-01-01

    A new insecticidal toxin, Tx4(5-5), was isolated from the fraction PhTx4 of the venom of the spider Phoneutria nigriventer by reverse phase high performance liquid chromatography (HPLC) and anion exchange HPLC. The complete amino acid sequence determined by automated Edman degradation showed that Tx4(5-5) is a single chain polypeptide composed of 47 amino acid residues, including 10 cysteines, with a calculated molecular mass of 5175 Da. Tx4(5-5) shares 64% sequence identity with Tx4(6-1), another insecticidal toxin from the same venom. Tx4(5-5) was highly toxic to house fly (Musca domestica), cockroach (Periplaneta americana) and cricket (Acheta domesticus), producing neurotoxic effects (knock-down, trembling with uncoordinated movements) at doses as low as 50 ng/g (house fly), 250 ng/g (cockroach) and 150 ng/g (cricket). In contrast, intracerebroventricular injections (30 microg) into mice induced no behavioural effects. Preliminary electrophysiological studies carried out on whole-cell voltage-clamped rat hippocampal neurones indicated that Tx4(5-5) (at 1 microM) reversibly inhibited the N-methyl-D-aspartate subtype of ionotropic glutamate receptor, while having little or no effect on kainate-, alpha-amino-3-hydroxy-5-methyl-4-isoxazole-propionic acid- or gamma-aminobutyric acid-activated currents. PMID:10978749

  10. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-01-01

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  11. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-02-20

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  12. Sequence information signal processor

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapiller, Timothy J.

    1999-01-01

    An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
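    The comparison described above is, in effect, a local-alignment (Smith–Waterman-style) recurrence laid out so that each processor owns one element of the first sequence while the second sequence streams past. A minimal sequential sketch in Python follows. The scoring values (match, mismatch, linear gap penalty) are hypothetical, since the record does not state a scoring scheme, and an ordinary loop stands in for the circuit's parallel data flow.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score of a and b (Smith-Waterman recurrence).

    Index i plays the role of one 'processor' that stores a[i-1] and sees
    the elements of b one after another, as in the linear array described
    above. Scoring parameters are illustrative assumptions.
    """
    n = len(a)
    # prev[i]: best score of a segment ending at a[i-1] and the previous b element
    prev = [0] * (n + 1)
    best = 0
    for bj in b:                      # elements of b stream through the array
        cur = [0] * (n + 1)
        for i in range(1, n + 1):     # processor i holds a[i-1]
            s = match if a[i - 1] == bj else mismatch
            cur[i] = max(0,
                         prev[i - 1] + s,   # extend the diagonal (pair a[i-1] with bj)
                         prev[i] + gap,     # bj aligned against a gap
                         cur[i - 1] + gap)  # a[i-1] aligned against a gap (value from neighbour)
            best = max(best, cur[i])        # track the highest scoring parameter seen
        prev = cur
    return best


print(local_alignment_score("TTACGTT", "ACG"))
```

    Passing `cur[i - 1]` to processor i mirrors the record's description of each processor handing its scoring parameter to the next processor in the array.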

  13. DNA sequencing conference, 2

    SciTech Connect

    Cook-Deegan, R.M.; Venter, J.C.; Gilbert, W.; Mulligan, J.; Mansfield, B.K.

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several methods were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  14. Roles of repetitive sequences

    SciTech Connect

    Bell, G.I.

    1991-12-31

    The DNA of higher eukaryotes contains many repetitive sequences. The study of repetitive sequences is important, not only because many have important biological functions, but also because they provide information on genome organization, evolution and dynamics. In this paper, I will first discuss some generic effects that repetitive sequences will have upon genome dynamics and evolution. In particular, it will be shown that repetitive sequences foster recombination among, and turnover of, the elements of a genome. I will then consider some examples of repetitive sequences, notably minisatellite sequences and telomere sequences as examples of tandem repeats without and with known function, respectively, and Alu sequences as an example of interspersed repeats. Some other examples will also be considered in less detail.

  15. Career Academy Course Sequences.

    ERIC Educational Resources Information Center

    Markham, Thom; Lenz, Robert

    This career academy course sequence guide is designed to give teachers a quick overview of the course sequences of well-known career academy and career pathway programs from across the country. The guide presents a variety of sample course sequences for the following academy themes: (1) arts and communication; (2) business and finance; (3)…

  16. T. cacao Transcriptome Sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    To complement the T. cacao genome sequencing initiative and to build a reference set of expressed genes for functional studies, a broad and state-of-the-art approach to transcriptome sequencing is underway. Using newly optimized methods, transcriptome sequencing libraries were prepared from RNA of o...

  17. Enhanced virome sequencing using targeted sequence capture

    PubMed Central

    Wylie, Todd N.; Wylie, Kristine M.; Herter, Brandi N.; Storch, Gregory A.

    2015-01-01

    Metagenomic shotgun sequencing (MSS) is an important tool for characterizing viral populations. It is culture independent, requires no a priori knowledge of the viruses in the sample, and may provide useful genomic information. However, MSS can lack sensitivity and may yield insufficient data for detailed analysis. We have created a targeted sequence capture panel, ViroCap, designed to enrich nucleic acid from DNA and RNA viruses from 34 families that infect vertebrate hosts. A computational approach condensed ∼1 billion bp of viral reference sequence into <200 million bp of unique, representative sequence suitable for targeted sequence capture. We compared the effectiveness of detecting viruses in standard MSS versus MSS following targeted sequence capture. First, we analyzed two sets of samples, one derived from samples submitted to a diagnostic virology laboratory and one derived from samples collected in a study of fever in children. We detected 14 and 18 viruses in the two sets, comprising 19 genera from 10 families, with dramatic enhancement of genome representation following capture enrichment. The median fold-increases in percentage viral reads post-capture were 674 and 296. Median breadth of coverage increased from 2.1% to 83.2% post-capture in the first set and from 2.0% to 75.6% in the second set. Next, we analyzed samples containing a set of diverse anellovirus sequences and demonstrated that ViroCap could be used to detect viral sequences with up to 58% variation from the references used to select capture probes. ViroCap substantially enhances MSS for a comprehensive set of viruses and has utility for research and clinical applications. PMID:26395152

  18. Low autocorrelation binary sequences

    NASA Astrophysics Data System (ADS)

    Packebusch, Tom; Mertens, Stephan

    2016-04-01

    Binary sequences with minimal autocorrelations have applications in communication engineering, mathematics and computer science. In statistical physics they appear as ground states of the Bernasconi model. Finding these sequences is a notoriously hard problem that so far can be solved only by exhaustive search. We review recent algorithms and present a new algorithm that finds optimal sequences of length N in time O(N · 1.73^N). We computed all optimal sequences for N ≤ 66 and all optimal skew-symmetric sequences for N ≤ 119.
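    For orientation, the quantity usually minimized in this problem is the energy E = Σ_{k=1}^{N-1} C_k², where C_k = Σ_i s_i s_{i+k} is the aperiodic autocorrelation of a ±1 sequence at lag k; the merit factor is F = N²/(2E). The sketch below assumes those standard definitions and uses plain brute force over all 2^N sequences, which is only practical for small N. It is not the O(N · 1.73^N) algorithm described in the abstract.

```python
from itertools import product


def energy(s):
    """Sum of squared aperiodic autocorrelations C_k for lags k = 1..N-1."""
    n = len(s)
    return sum(sum(s[i] * s[i + k] for i in range(n - k)) ** 2
               for k in range(1, n))


def merit_factor(s):
    """Merit factor N^2 / (2E); assumes len(s) >= 2, so that E >= 1."""
    n = len(s)
    return n * n / (2 * energy(s))


def min_energy(n):
    """Brute-force minimum energy over all 2**n sequences of +1/-1."""
    return min(energy(s) for s in product((1, -1), repeat=n))


print([min_energy(n) for n in range(3, 6)])
```

    Exhaustive enumeration costs O(N² · 2^N) here; the point of the published algorithm is to prune that search space far more aggressively.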

  19. Uncorrectable sequences and telecommand

    NASA Technical Reports Server (NTRS)

    Ekroot, Laura; McEliece, R.; Dolinar, S.; Swanson, L.

    1993-01-01

    The purpose of a tail sequence for command link transmission units is to fail to decode, so that the command decoder will begin searching for the start of the next unit. A tail sequence used by several missions and recommended for this purpose by the Consultative Committee on Space Data Standards is analyzed. A single channel error can cause the sequence to decode. An alternative sequence requiring at least two channel errors before it can possibly decode is presented. (No sequence requiring more than two channel errors before it can possibly decode exists for this code.)

  20. Indexing Similar DNA Sequences

    NASA Astrophysics Data System (ADS)

    Huang, Songbo; Lam, T. W.; Sung, W. K.; Tam, S. L.; Yiu, S. M.

    To study the genetic variations of a species, one basic operation is to search for occurrences of patterns in a large number of very similar genomic sequences. Building an indexing data structure on the concatenation of all sequences may require a lot of memory. In this paper, we propose a new scheme to index highly similar sequences by taking advantage of the similarity among the sequences. To store r sequences with k common segments, our index requires only O(n + N log N) bits of memory, where n is the total length of the common segments and N is the total length of the distinct regions in all texts. The total length of all sequences is rn + N, and any scheme to store these sequences requires Ω(n + N) bits. Searching for a pattern P of length m takes O(m + m log N + m log(rk) psc(P) + occ log n) time, where psc(P) is the number of prefixes of P that appear as a suffix of some common segment and occ is the number of occurrences of P in all sequences. In practice, rk ≤ N, and psc(P) is usually a small constant. We have implemented our solution and evaluated it using real DNA sequences. The experiments show that the memory requirement of our solution is much less than that required by a BWT built on the concatenation of all sequences. Compared with the other existing solution (RLCSA), we use less memory and achieve faster search times.

  1. Unlocking hidden genomic sequence

    PubMed Central

    Keith, Jonathan M.; Cochran, Duncan A. E.; Lala, Gita H.; Adams, Peter; Bryant, Darryn; Mitchelson, Keith R.

    2004-01-01

    Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs. PMID:14973330

  2. Amino acid sequence and posttranslational modifications of human factor VIIa from plasma and transfected baby hamster kidney cells

    SciTech Connect

    Thim, L.; Bjoern, S.; Christensen, M.; Nicolaisen, E.M.; Lund-Hansen, T.; Pedersen, A.H.; Hedner, U.

    1988-10-04

    Blood coagulation factor VII is a vitamin K-dependent glycoprotein which, in its activated form, factor VIIa, participates in the coagulation process by activating factor X and/or factor IX in the presence of Ca²⁺ and tissue factor. Three types of potential posttranslational modifications exist in the human factor VIIa molecule, namely, 10 γ-carboxylated, N-terminally located glutamic acid residues, 1 β-hydroxylated aspartic acid residue, and 2 N-glycosylated asparagine residues. In the present study, the amino acid sequence and posttranslational modifications of recombinant factor VIIa purified from the culture medium of a transfected baby hamster kidney cell line have been compared to those of human plasma factor VIIa. By use of HPLC, amino acid analysis, peptide mapping, and automated Edman degradation, the protein backbone of recombinant factor VIIa was found to be identical with that of human factor VIIa. Asparagine residues 145 and 322 were found to be fully N-glycosylated in human plasma factor VIIa. In the recombinant factor VIIa, asparagine residue 322 was fully glycosylated whereas asparagine residue 145 was only partially (approximately 66%) glycosylated. Besides minor differences in the sialic acid and fucose contents, the overall carbohydrate compositions were nearly identical in recombinant factor VIIa and human plasma factor VIIa. These results show that factor VIIa as produced in the transfected baby hamster kidney cells is very similar to human plasma factor VIIa and that this cell line thus might represent an alternative source for human factor VIIa.

  3. cDNA and deduced amino acid sequence of human pulmonary surfactant-associated proteolipid SPL(Phe)

    SciTech Connect

    Glasser, S.W.; Korfhagen, T.R.; Weaver, T.; Pilot-Matias, T.; Fox, J.L.; Whitsett, J.A.

    1987-06-01

    Hydrophobic surfactant-associated protein of Mr 6000–14,000 was isolated from either ether/ethanol or chloroform/methanol extracts of mammalian pulmonary surfactant. Automated Edman degradation in a gas-phase sequencer showed the major N-terminus of the human low-molecular-weight protein to be Phe-Pro-Ile-Pro-Leu-Pro-Tyr-Cys-Trp-Leu-Cys-Arg-Ala-Leu-. Because of the N-terminal phenylalanine, the surfactant protein was designated SPL(Phe). Antiserum generated against hydrophobic surfactant protein(s) from bovine pulmonary surfactant recognized protein of Mr 6000–14,000 in immunoblot analysis and was used to screen a λgt11 expression library constructed from adult human lung poly(A)+ RNA. This resulted in identification of a 1.4-kilobase cDNA clone that was shown to encode the N-terminus of the surfactant polypeptide SPL(Phe) (Phe-Pro-Ile-Pro-Leu-Pro-) within an open reading frame for a larger protein. Expression of a fused β-galactosidase–SPL(Phe) gene in Escherichia coli yielded an immunoreactive Mr 34,000 fusion peptide. Hybrid-arrested translation with the cDNA and immunoprecipitation of [³⁵S]methionine-labeled in vitro translation products of human poly(A)+ RNA with a surfactant polyclonal antibody resulted in identification of an Mr 40,000 precursor protein. Blot hybridization analysis of electrophoretically fractionated RNA from human lung detected a 2.0-kilobase RNA that was more abundant in adult lung than in fetal lung. These proteins, and specifically SPL(Phe), may therefore be useful for synthesis of replacement surfactants for treatment of hyaline membrane disease in newborn infants or of other surfactant-deficient states.

  4. Multiplexed Fragaria Chloroplast Genome Sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A method to sequence multiple chloroplast genomes that uses the sequencing depth of ultra high throughput sequencing technologies was recently described. Sequencing complete chloroplast genomes can resolve phylogenetic relationships at low taxonomic levels and identify point mutations and indels tha...

  5. The ABRF Metabolomics Research Group 2013 Study: Investigation of Spiked Compound Differences in a Human Plasma Matrix

    PubMed Central

    Asara, John M.; Wang, Yiwen; Neubert, Thomas A.; Tolstikov, Vladimir; Turck, Chris W.

    2015-01-01

    Metabolomics is an emerging field that involves qualitative and quantitative measurements of small molecule metabolites in a biological system. These measurements can be useful for developing biomarkers for diagnosis, prognosis, or predicting response to therapy. Currently, a wide variety of metabolomics approaches, including nontargeted and targeted profiling, are used across laboratories on a routine basis. A diverse set of analytical platforms, such as NMR, gas chromatography-mass spectrometry, Orbitrap mass spectrometry, and time-of-flight mass spectrometry, which use various chromatographic and ionization techniques, are used for resolution, detection, identification, and quantitation of metabolites from various biological matrices. However, few attempts have been made to standardize experimental methodologies or comparative analyses across different laboratories. The Metabolomics Research Group of the Association of Biomolecular Resource Facilities organized a “round-robin” interlaboratory study, wherein human plasma samples were spiked with different amounts of metabolite standards in 2 groups of biologic samples (A and B). The goal was a study that resembles a typical metabolomics analysis. Here, we report our efforts and discuss challenges that create bottlenecks for the field. Finally, we discuss benchmarks that could be used by laboratories to compare their methodologies. PMID:26290656

  6. Lichenase and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-15

    The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-β-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.

  7. M&m Sequences

    ERIC Educational Resources Information Center

    Schultz, Harris S.; Shiflett, Ray C.

    2005-01-01

    Consider a sequence recursively formed as follows: start with three real numbers, and then, once k terms are known, choose the (k+1)st so that the mean of all k+1 terms equals the median of the first k. The authors conjecture that every such sequence eventually becomes stable. This article presents results related to their conjecture.
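    The recursion can be made explicit: if the mean of k + 1 terms must equal the median m of the first k terms, whose sum is S, then the new term is x = (k + 1)·m − S. A minimal Python sketch follows. Using exact rational arithmetic via `fractions.Fraction` is a choice made here to avoid floating-point drift; the article itself does not prescribe an implementation.

```python
from fractions import Fraction
from statistics import median


def mm_sequence(a, b, c, steps):
    """Extend the seed (a, b, c) by `steps` terms of the M&m recursion.

    Each new term x satisfies mean(first k+1 terms) == median(first k terms),
    i.e. x = (k + 1) * median(first k) - sum(first k).
    """
    seq = [Fraction(a), Fraction(b), Fraction(c)]
    for _ in range(steps):
        k = len(seq)
        seq.append((k + 1) * median(seq) - sum(seq))
    return seq


print(mm_sequence(0, 1, 2, 3))
```

    Starting from 0, 1, 2, the fourth term is 4·1 − 3 = 1, and every later term is also 1, so this particular sequence stabilizes immediately; other seeds can take many steps before settling.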

  8. Cosmetology: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a cosmetology vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  9. Twin Mitochondrial Sequence Analysis.

    PubMed

    Bouhlal, Yosr; Martinez, Selena; Gong, Henry; Dumas, Kevin; Shieh, Joseph T C

    2013-09-01

    When applying genome-wide sequencing technologies to disease investigation, it is increasingly important to resolve sequence variation in regions of the genome that may have homologous sequences. The human mitochondrial genome challenges interpretation given the potential for heteroplasmy, somatic variation, and homologous nuclear mitochondrial sequences (numts). Identical twins share the same mitochondrial DNA (mtDNA) from early life, but whether the mitochondrial sequence remains similar is unclear. We compared an adult monozygotic twin pair using high-throughput sequencing and evaluated variants with primer extension and mitochondrial pre-enrichment. Thirty-seven variants were shared between the twin individuals, and the variants were verified on the original genomic DNA. These studies support highly identical genetic sequence in this case. Certain low-level variant calls were of high quality and homology to the mitochondrial DNA, and they were further evaluated. When we assessed calls in pre-enriched mitochondrial DNA templates, we found that these may represent numts, which can be differentiated from mtDNA variation. We conclude that twin identity extends to mitochondrial DNA, and it is critical to differentiate between numts and mtDNA in genome sequencing, particularly since significant heteroplasmy could influence genome interpretation. Further studies on mtDNA and numts will aid in understanding how variation occurs and persists. PMID:24040623

  10. Sequences for Student Investigation

    ERIC Educational Resources Information Center

    Barton, Jeffrey; Feil, David; Lartigue, David; Mullins, Bernadette

    2004-01-01

    We describe two classes of sequences that give rise to accessible problems for undergraduate research. These problems may be understood with virtually no prerequisites and are well suited for computer-aided investigation. The first sequence is a variation of one introduced by Stephen Wolfram in connection with his study of cellular automata. The…

  11. Agriculture: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 3-year program in agriculture. The guide consists of a course description; general course objectives;…

  12. Protein sequence databases.

    PubMed

    Apweiler, Rolf; Bairoch, Amos; Wu, Cathy H

    2004-02-01

    A variety of protein sequence databases exist, ranging from simple sequence repositories, which store data with little or no manual intervention in the creation of the records, to expertly curated universal databases that cover all species and in which the original sequence data are enhanced by the manual addition of further information in each sequence record. As the focus of researchers moves from the genome to the proteins encoded by it, these databases will play an even more important role as central comprehensive resources of protein information. Several of the leading protein sequence databases are discussed here, with special emphasis on the databases now provided by the Universal Protein Knowledgebase (UniProt) consortium. PMID:15036160

  13. Sequence History Update Tool

    NASA Technical Reports Server (NTRS)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

    The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable saving in time and effort. With the Sequence History Update Tool, what previously took minutes is now done in less than 30 seconds, and the tool provides a more accurate archival record of the sequence commanding for MRO.

  14. Nucleosome dynamics: Sequence matters.

    PubMed

    Eslami-Mossallam, Behrouz; Schiessel, Helmut; van Noort, John

    2016-06-01

    About three quarters of all eukaryotic DNA is wrapped around protein cylinders, forming nucleosomes. Even though the histone proteins that make up the core of nucleosomes are highly conserved in evolution, nucleosomes can be very different from each other due to posttranslational modifications of the histones. Another crucial factor in making nucleosomes unique has so far been underappreciated: the sequence of their DNA. This review provides an overview of the experimental and theoretical progress that increasingly points to the importance of the nucleosomal base pair sequence. Specifically, we discuss the role of the underlying base pair sequence in nucleosome positioning, sliding, breathing, force-induced unwrapping, dissociation and partial assembly and also how the sequence can influence higher-order structures. A new view emerges: the physical properties of nucleosomes, especially their dynamical properties, are determined to a large extent by the mechanical properties of their DNA, which in turn depend on DNA sequence. PMID:26896338

  15. Amino acid sequence and molecular modelling of glycoprotein IIb-IIIa and fibronectin receptor iso-antagonists from Trimeresurus elegans venom.

    PubMed Central

    Scaloni, A; Di Martino, E; Miraglia, N; Pelagalli, A; Della Morte, R; Staiano, N; Pucci, P

    1996-01-01

    Low-molecular-mass Arg-Gly-Asp (RGD)-containing polypeptides were isolated from the venom of Trimeresurus elegans by a simple two-step procedure consisting of membrane filtration and reverse-phase HPLC. A combination of electrospray MS, fast-atom bombardment MS and Edman degradation allowed us to ascertain the presence in the venom of different isoforms and to determine their primary structures. The amino acid sequences resembled the structure of elegantin, the only disintegrin previously reported from the T. elegans venom [Williams, Rucinski, Holt and Niewiarowski (1990) Biochim. Biophys. Acta 1039, 81-89]. MS analyses indicated the occurrence of differential proteolytic processing at both the N-terminus and the C-terminus of the polypeptide chains. The amino acid sequence alignment of the elegantin isoforms with known components of the disintegrin family demonstrated the complete conservation of the 12 cysteine residues involved in disulphide bridges. Molecular modelling of elegantins predicted an overall folding of these molecules quite similar to that reported for the kistrin solution structure. The newly identified polypeptide isoforms strongly inhibited ADP-induced aggregation in both human and canine platelet-rich plasma but showed a different species-dependent specificity. These molecules were also able to inhibit B16-BL6 murine melanoma cell adhesion to immobilized fibronectin. The comparison of the structures and biological activities of elegantin isoforms and kistrin allowed us to highlight some structural features that, in addition to the RGD locus, might be involved in the interaction of these snake-venom polypeptides with the integrin receptors on the platelet and cell surface. PMID:8920980

  16. HIV Sequence Compendium 2015

    SciTech Connect

    Foley, Brian Thomas; Leitner, Thomas Kenneth; Apetrei, Cristian; Hahn, Beatrice; Mizrachi, Ilene; Mullins, James; Rambaut, Andrew; Wolinsky, Steven; Korber, Bette Tina Marie

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  17. Automated DNA sequencing.

    PubMed

    Wallis, Yvonne; Morrell, Natalie

    2011-01-01

    Fluorescent cycle sequencing of PCR products is a multistage process and several methodologies are available to perform each stage. This chapter will describe the more commonly utilised dye-terminator cycle sequencing approach using BigDye® terminator chemistry (Applied Biosystems) ready for analysis on a 3730 DNA genetic analyzer. Even though DNA sequencing is one of the most common and robust techniques performed in molecular laboratories it may not always produce desirable results. The causes of the most common problems will also be discussed in this chapter. PMID:20938839

  18. Automatic Command Sequence Generation

    NASA Technical Reports Server (NTRS)

    Fisher, Forest; Gladden, Roy; Khanampompan, Teerapat

    2007-01-01

    Automatic Sequence Generator (Autogen) Version 3.0 software automatically generates command sequences for the Mars Reconnaissance Orbiter (MRO) and several other JPL spacecraft operated by the multi-mission support team. Autogen uses standard JPL sequencing tools like APGEN, ASP, SEQGEN, and the DOM database to automate the generation of uplink command products, Spacecraft Command Message Format (SCMF) files, and the corresponding ground command products, DSN Keywords Files (DKF). Autogen supports all the major multi-mission mission phases including the cruise, aerobraking, mapping/science, and relay mission phases. Autogen is a Perl script, which functions within the mission operations UNIX environment. It consists of two parts: a set of model files and the autogen Perl script. Autogen encodes the behaviors of the system into a model and encodes algorithms for context-sensitive customizations of the modeled behaviors. The model includes knowledge of different mission phases and how the resultant command products must differ for these phases. The executable software portion of Autogen automates the setup and use of APGEN for constructing a spacecraft activity sequence file (SASF). The setup includes file retrieval through the DOM (Distributed Object Manager), an object database used to store project files. This step retrieves all the needed input files for generating the command products. Depending on the mission phase, Autogen also uses the ASP (Automated Sequence Processor) and SEQGEN to generate the command product sent to the spacecraft. Autogen also provides the means for customizing sequences through the use of configuration files. By automating the majority of the sequence generation process, Autogen eliminates many sequence generation errors commonly introduced by manually constructing spacecraft command sequences.
Through the layering of commands into the sequence by a series of scheduling algorithms, users are able to rapidly and reliably construct the

  19. Compact rotary sequencer

    NASA Technical Reports Server (NTRS)

    Appleberry, W. T.

    1980-01-01

    Rotary sequencer is assembled from conventional planetary differential gearset and latching mechanism utilizing inputs and outputs which are coaxial. Applications include automated production-line equipment in home appliances and in vehicles.

  20. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 1 of 2

  1. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 2 of 2

  2. Authentication of byte sequences

    SciTech Connect

    Stearns, S.D.

    1991-06-01

    Algorithms for the authentication of byte sequences are described. The algorithms are designed to authenticate data in the Storage, Retrieval, Analysis, and Display (SRAD) Test Data Archive of the Radiation Effects and Testing Directorate (9100) at Sandia National Laboratories, and may be used in similar situations where authentication of stored data is required. The algorithms use a well-known error detection method called the Cyclic Redundancy Check (CRC). When a byte sequence is authenticated and stored, CRC bytes are generated and attached to the end of the sequence. When the authenticated data is retrieved, the authentication check consists of processing the entire sequence, including the CRC bytes, and checking for a remainder of zero. The error detection properties of the CRC are extensive and result in a reliable authentication of SRAD data.
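    The append-then-check-for-zero-remainder scheme can be sketched in a few lines. The record does not name the CRC polynomial or parameters used for the SRAD archive, so the sketch below assumes the CCITT polynomial 0x1021 with a zero initial value and no final XOR (the parameter set commonly called CRC-16/XMODEM). With those parameters, appending the CRC most-significant byte first makes the CRC of the whole stored record exactly zero. The helper names `seal` and `is_authentic` are hypothetical.

```python
def crc16(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-16, polynomial 0x1021, MSB first, no final XOR (assumed parameters)."""
    for byte in data:
        crc ^= byte << 8                  # bring the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:              # top bit set: shift and subtract (XOR) the polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def seal(data: bytes) -> bytes:
    """Store step: append the two CRC bytes, most significant byte first."""
    return data + crc16(data).to_bytes(2, "big")


def is_authentic(record: bytes) -> bool:
    """Retrieve step: process data plus CRC bytes and check for a zero remainder."""
    return crc16(record) == 0


record = seal(b"SRAD test record")
print(is_authentic(record))
```

    Because the check runs over the data and the appended CRC bytes together, retrieval needs no separate comparison against a stored checksum: any nonzero remainder flags the record.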

  3. SINGLE CELL GENOME SEQUENCING

    PubMed Central

    Yilmaz, Suzan; Singh, Anup K.

    2011-01-01

    Whole genome amplification and next-generation sequencing of single cells has become a powerful approach for studying uncultivated microorganisms that represent 90–99 % of all environmental microbes. Single cell sequencing enables not only the identification of microbes but also linking of functions to species, a feat not achievable by metagenomic techniques. Moreover, it allows the analysis of low abundance species that may be missed in community-based analyses. It has also proved very useful in complementing metagenomics in the assembly and binning of single genomes. With the advent of drastically cheaper and higher throughput sequencing technologies, it is expected that single cell sequencing will become a standard tool in studying the genome and transcriptome of microbial communities. PMID:22154471

  4. HIV Sequence Compendium 2010

    SciTech Connect

    Kuiken, Carla; Foley, Brian; Leitner, Thomas; Apetrei, Christian; Hahn, Beatrice; Mizrachi, Ilene; Mullins, James; Rambaut, Andrew; Wolinsky, Steven; Korber, Bette

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (groups O and N, and SIVcpz). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  5. Burst diaphragm sequence valve

    NASA Astrophysics Data System (ADS)

    Wisneskie, Bradley D.; Hyman, Sheldon; Hallum, Charles E.

    1991-11-01

    A burst diaphragm sequence valve is described that effectively combines the structure of a burst diaphragm with that of an ordinary swing check valve, with the pivot of the ordinary swing check valve replaced by an integral flexural hinge. The sequence valve provides a way to burn solid-propellant hot-gas generators sequentially when they exit into a common gas manifold, thereby enabling gas-powered devices to operate for longer than the duration of a single gas generator burn.

  6. Pairwise Sequence Alignment Library

    Energy Science and Technology Software Center (ESTSC)

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific computing. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: reference implementations of all known vectorized sequence alignment approaches; implementations of the Smith-Waterman (SW), semi-global (SG), and Needleman-Wunsch (NW) sequence alignment algorithms; implementations across all modern CPU instruction sets, including AVX2 and KNC; and language interfaces for C/C++ and Python.
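    For orientation, here is a plain scalar Smith-Waterman scorer in Python with a linear gap penalty. The scoring values are illustrative assumptions, and none of this is parasail's API or its SIMD code. Each cell reads its diagonal, upper and left neighbours; the left neighbour (`cur[j - 1]`) is a dependency within the same row, the kind of data dependency that makes vectorizing this loop nontrivial.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b (linear gap penalty, scalar reference)."""
    prev = [0] * (len(b) + 1)  # row i-1 of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)  # column 0 stays 0 for local alignment
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap       # a[i-1] aligned to a gap
            left = cur[j - 1] + gap  # b[j-1] aligned to a gap (same-row dependency)
            cur[j] = max(0, diag, up, left)  # local alignment: scores floor at zero
            best = max(best, cur[j])
        prev = cur
    return best
```

    Only the previous row is kept, so memory grows with `len(b)` rather than with the full matrix; recovering the aligned residues themselves would require keeping traceback information.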

  7. Pairwise Sequence Alignment Library

    SciTech Connect

    Jeff Daily, PNNL

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific computing. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: reference implementations of all known vectorized sequence alignment approaches; implementations of the Smith-Waterman (SW), semi-global (SG), and Needleman-Wunsch (NW) sequence alignment algorithms; implementations across all modern CPU instruction sets, including AVX2 and KNC; and language interfaces for C/C++ and Python.
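    Needleman-Wunsch (global) alignment differs from local alignment mainly in that scores are not floored at zero and the first row and column carry accumulated gap penalties, so the alignment must span both sequences end to end; semi-global alignment relaxes this by not penalizing gaps at the ends. A scalar Python sketch with illustrative scoring follows (not parasail's API). Roughly speaking, a scan-based SIMD formulation first computes each row without the left-neighbour term and then resolves that in-row dependency with a parallel prefix (scan) operation.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of a and b (linear gap penalty, scalar reference)."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: prefixes of b aligned to gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)  # column 0: a[:i] aligned to gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # cur[j - 1] is the in-row dependency a scan-based method resolves separately
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]
```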

  8. Nanopore Sequencing with MspA

    NASA Astrophysics Data System (ADS)

    Gundlach, Jens H.

    2011-10-01

    Nanopore sequencing is the simplest concept for converting the sequence of a single DNA molecule directly into an electronic signal. We introduced the protein pore MspA, derived from Mycobacterium smegmatis, to nanopore sequencing [1]. MspA has a single, narrow (~1.2 nm) and short (<1 nm) constriction, ideal for identifying single nucleotides. Compared with solid-state devices, MspA is reproducible with sub-nanometer precision and can be engineered using genetic mutations. DNA moves through the pore at rates exceeding 1 nt/µs, too fast to observe the passage of each nucleotide. However, when DNA is held with double-stranded DNA sections or an avidin anchor, single nucleotides resident in MspA's constriction can be identified by highly resolved current differences. We have provided proof of principle of a nanopore sequencing method [2] in which we use DNA modified by inserting double-stranded DNA sections between every nucleotide. The double-stranded sections are designed to halt translocation long enough to read the sequence of the original DNA molecule sequentially. Prospects and developments for sequencing unmodified native DNA using MspA will be discussed. [1] T.Z. Butler, et al., PNAS 105, 20647 (2008). [2] I.M. Derrington, et al., PNAS 107, 16060 (2010).

  9. Program Synthesizes UML Sequence Diagrams

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse-engineering function that aids in the design documentation of the target Java program. Previously, the construction of sequence diagrams was a tedious manual process; Rational Sequence generates UML sequence diagrams automatically from the running Java code.
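    The abstract does not say how Rational Sequence instruments the JVM, so this Python sketch starts from an already recorded call trace. Both the trace format (a list of hypothetical `(caller, callee, method)` tuples in call order) and the choice of PlantUML text as the output notation are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical trace record: (caller, callee, method), in the order the calls occurred.
CallEvent = Tuple[str, str, str]


def to_plantuml(events: Iterable[CallEvent]) -> str:
    """Render a recorded call trace as PlantUML sequence-diagram text."""
    participants: List[str] = []
    messages: List[str] = []
    for caller, callee, method in events:
        for name in (caller, callee):
            if name not in participants:
                participants.append(name)  # lifelines appear in first-seen order
        messages.append(f"{caller} -> {callee}: {method}()")
    lines = ["@startuml"]
    lines += [f"participant {p}" for p in participants]
    lines += messages
    lines.append("@enduml")
    return "\n".join(lines)
```

    Declaring participants up front fixes the left-to-right order of lifelines; the messages then follow in call order, which is what a sequence diagram encodes.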

  10. Controlled processing during sequencing

    PubMed Central

    Thothathiri, Malathi; Rattinger, Michelle

    2015-01-01

    Longstanding evidence has identified a role for the frontal cortex in sequencing within both linguistic and non-linguistic domains. More recently, neuropsychological studies have suggested a specific role for the left premotor-prefrontal junction (BA 44/6) in selection between competing alternatives during sequencing. In this study, we used neuroimaging with healthy adults to confirm and extend knowledge about the neural correlates of sequencing. Participants reproduced visually presented sequences of syllables and words using manual button presses. Items in the sequence were presented either consecutively or concurrently. Concurrent presentation is known to trigger the planning of multiple responses, which might compete with one another. Therefore, we hypothesized that regions involved in controlled processing would show greater recruitment during the concurrent than the consecutive condition. Whole-brain analysis showed concurrent > consecutive activation in sensory, motor and somatosensory cortices and notably also in rostral-dorsal anterior cingulate cortex. Region of interest analyses showed increased activation within left BA 44/6 and correlation between this region’s activation and behavioral response times. Functional connectivity analysis revealed increased connectivity between left BA 44/6 and the posterior lobe of the cerebellum during the concurrent than the consecutive condition. These results corroborate recent evidence and demonstrate the involvement of BA 44/6 and other control regions when ordering co-activated representations. PMID:26578941

  11. DNA sequencing: chemical methods

    SciTech Connect

    Ambrose, B.J.B.; Pless, R.C.

    1987-01-01

    Limited base-specific or base-selective cleavage of a defined DNA fragment yields polynucleotide products, the length of which correlates with the positions of the particular base (or bases) in the original fragment. Sverdlov and co-workers recognized the possibility of using this principle for the determination of DNA sequences. In 1977 a fully elaborated method was introduced based on this principle, which allowed routine analysis of DNA sequences over distances greater than 100 nucleotide units from a defined, radiolabeled terminus. Six procedures for partial cleavage were described. Simultaneous parallel resolution of an appropriate set of partial cleavage mixtures by polyacrylamide gel electrophoresis, followed by visualization of the radioactive bands by autoradiography, allows the deduction of nucleotide sequence.
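    The read-out step can be illustrated with a small Python sketch. It assumes the classic four-lane set of partial cleavages (G, A+G, C+T, C); this is an assumption, since the abstract only says that six procedures were described. Each band's length is treated as the position of the cleaved base counted from the labeled end; the exact off-by-one convention does not change the read order. A band in G is read as G, one in A+G only as A, one in C as C, and one in C+T only as T.

```python
from typing import Dict, Set


def read_ladder(lanes: Dict[str, Set[int]]) -> str:
    """Read a sequence from band lengths in the G, A+G, C+T and C lanes, nearest the label first."""
    positions = sorted(set().union(*lanes.values()))  # every band, shortest first
    sequence = []
    for p in positions:
        if p in lanes["G"]:
            sequence.append("G")  # G lane (the A+G lane also cleaves here)
        elif p in lanes["A+G"]:
            sequence.append("A")  # A+G lane only
        elif p in lanes["C"]:
            sequence.append("C")  # C lane (the C+T lane also cleaves here)
        else:
            sequence.append("T")  # C+T lane only
    return "".join(sequence)
```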

  12. Sequencing the Connectome

    PubMed Central

    Zador, Anthony M.; Dubnau, Joshua; Oyibo, Hassana K.; Zhan, Huiqing; Cao, Gang; Peikon, Ian D.

    2012-01-01

    Connectivity determines the function of neural circuits. Historically, circuit mapping has usually been viewed as a problem of microscopy, but no current method can achieve high-throughput mapping of entire circuits with single neuron precision. Here we describe a novel approach to determining connectivity. We propose BOINC (“barcoding of individual neuronal connections”), a method for converting the problem of connectivity into a form that can be read out by high-throughput DNA sequencing. The appeal of using sequencing is that its scale—sequencing billions of nucleotides per day is now routine—is a natural match to the complexity of neural circuits. An inexpensive high-throughput technique for establishing circuit connectivity at single neuron resolution could transform neuroscience research. PMID:23109909
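    The abstract does not describe how barcodes are joined or read out, so the following Python sketch simply assumes that sequencing yields hypothetical `(presynaptic, postsynaptic)` barcode pairs and turns them into a binary connectivity matrix. The `min_reads` threshold, meant to discard pairs supported by too few reads, is likewise an illustrative assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def connectivity_matrix(
    pairs: Iterable[Tuple[str, str]], min_reads: int = 1
) -> Tuple[List[str], List[List[int]]]:
    """Build a binary pre -> post connectivity matrix from barcode-pair reads."""
    counts = Counter(pairs)  # number of reads supporting each (pre, post) pair
    neurons = sorted({bc for pair in counts for bc in pair})
    index = {bc: k for k, bc in enumerate(neurons)}
    matrix = [[0] * len(neurons) for _ in neurons]
    for (pre, post), n in counts.items():
        if n >= min_reads:
            matrix[index[pre]][index[post]] = 1  # row = presynaptic, column = postsynaptic
    return neurons, matrix
```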

  13. Definition of Mycobacterium tuberculosis culture filtrate proteins by two-dimensional polyacrylamide gel electrophoresis, N-terminal amino acid sequencing, and electrospray mass spectrometry.

    PubMed Central

    Sonnenberg, M G; Belisle, J T

    1997-01-01

    A number of the culture filtrate proteins secreted by Mycobacterium tuberculosis are known to contribute to the immunology of tuberculosis and to possess enzymatic activities associated with pathogenicity. However, a complete analysis of the protein composition of this fraction has been lacking. By using two-dimensional polyacrylamide gel electrophoresis, detailed maps of the culture filtrate proteins of M. tuberculosis H37Rv were generated. In total, 205 protein spots were observed. The coupling of this electrophoretic technique with Western blot analysis allowed the identification and mapping of 32 proteins. Further molecular characterization of abundant proteins within this fraction was achieved by N-terminal amino acid sequencing and liquid chromatography-mass spectrometry. Eighteen proteins were subjected to N-group analysis; of these, only 10 could be sequenced by Edman degradation. Among the most interesting were a novel 52-kDa protein demonstrating significant homology to an alpha-hydroxysteroid dehydrogenase of Eubacterium sp. strain VPI 12708, a 25-kDa protein corresponding to open reading frame 28 of the M. tuberculosis cosmid MTCY1A11, and a 31-kDa protein exhibiting an amino acid sequence identical to that of antigen 85A and 85B. This latter product migrated with an isoelectric point between those of antigen 85A and 85C but did not react with the antibody specific for this complex, suggesting that there is a fourth member of the antigen 85 complex. Novel N-terminal amino acid sequences were obtained for three additional culture filtrate proteins; however, these did not yield significant homology to known protein sequences. A protein cluster of 85 to 88 kDa, recognized by the monoclonal antibodies IT-57 and IT-42 and known to react with sera from a large proportion of tuberculosis patients, was refractory to N-group analysis. Nevertheless, mass spectrometry of peptides obtained from one member of this complex identified it as the M. tuberculosis Kat

  14. Molecular characterization of the body site-specific human epidermal cytokeratin 9: cDNA cloning, amino acid sequence, and tissue specificity of gene expression.

    PubMed

    Langbein, L; Heid, H W; Moll, I; Franke, W W

    1993-12-01

    Differentiation of human plantar and palmar epidermis is characterized by the suprabasal synthesis of a major special intermediate-sized filament (IF) protein, the type I (acidic) cytokeratin 9 (CK 9). Using partial amino acid (aa) sequence information obtained by direct Edman sequencing of peptides resulting from proteolytic digestion of purified CK 9, we synthesized several redundant primers by 'back-translation'. Amplification by polymerase chain reaction (PCR) of cDNAs obtained by reverse transcription of mRNAs from human foot sole epidermis, including 5'-primer extension, resulted in multiple overlapping cDNA clones, from which the complete cDNA (2353 bp) could be constructed. This cDNA encoded the CK 9 polypeptide with a calculated molecular weight of 61,987 and an isoelectric point at about pH 5.0. The aa sequence deduced from cDNA was verified in several parts by comparison with the peptide sequences and showed the typical structure of type I CKs, with a head (153 aa), an alpha-helical coiled-coil-forming rod (306 aa), and a tail (163 aa) domain. The protein displayed the highest homology to human CK 10, not only in the highly conserved rod domain but also in large parts of the head and the tail domains. On the other hand, the aa sequence revealed some remarkable differences from CK 10 and other CKs, even in the most conserved segments of the rod domain. The nuclease digestion pattern seen on Southern blot analysis of human genomic DNA indicated the existence of a unique CK 9 gene. Using CK 9-specific riboprobes for hybridization on Northern blots of RNAs from various epithelia, an mRNA of about 2.4 kb in length could be identified only in foot sole epidermis, and a weaker cross-hybridization signal was seen in RNA from bovine heel pad epidermis at about 2.0 kb. A large number of tissues and cell cultures were examined by PCR of mRNA-derived cDNAs, using CK 9-specific primers. But even with this very sensitive signal amplification, only palmar
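    A minimal Python sketch of 'back-translation': each residue of a peptide is replaced by the IUPAC ambiguity codes covering all of its codons in the standard genetic code, and the primer's degeneracy is the product of the number of bases allowed at each position. This is a generic illustration, not the authors' primer design. For six-codon residues (Leu, Ser, Arg), a single ambiguity code per position over-covers; Leu, for example, becomes YTN, a code that also matches the Phe codons TTT and TTC. This is one reason primer regions are usually chosen to avoid such residues.

```python
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide ambiguity codes, keyed by the set of bases each one stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def back_translate(peptide: str) -> Tuple[str, int]:
    """Return a degenerate DNA primer for `peptide` and its degeneracy (distinct sequences encoded)."""
    primer, degeneracy = [], 1
    for aa in peptide.upper():
        codons = [codon for codon, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa!r}")
        for pos in range(3):
            bases = frozenset(codon[pos] for codon in codons)  # bases seen at this codon position
            primer.append(IUPAC[bases])
            degeneracy *= len(bases)
    return "".join(primer), degeneracy
```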

  15. Method to amplify variable sequences without imposing primer sequences

    DOEpatents

    Bradbury, Andrew M.; Zeytun, Ahmet

    2006-11-14

    The present invention provides methods of amplifying target sequences without including regions flanking the target sequence in the amplified product or imposing amplification primer sequences on the amplified product. Also provided are methods of preparing a library from such amplified target sequences.

  16. Ranking and Sequencing Model

    Energy Science and Technology Software Center (ESTSC)

    2009-08-13

    This database application (commonly called the Supermodel) provides a repository for managing critical facility/project information. It allows the user to subjectively and objectively assess key criteria, quantify project risks, develop ROM cost estimates, and determine facility/project end states, ultimately performing risk-based modeling to rank facilities/projects by risk and to sequence project schedules. It provides an optimized, recommended sequencing/scheduling of these projects that maximizes the S&M cost savings for performing closure projects, benefiting all stakeholders.

  17. Microchips for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Mastrangelo, Carlos H.; Palaniappan, S.; Man, Piu Francis; Burns, Mark A.; Burke, David T.

    1999-08-01

    Genetic information is vital for understanding the features and responses of an organism. In humans, genetic errors are linked to the development of major diseases such as cancer and diabetes. To exploit this information fully, it is necessary to develop miniature sequencing assays that are rapid and inexpensive. In this paper we show how this could be attained with microfluidic chips that contain integrated assays. To date, simple silicon/glass chips aimed at sequencing have been realized, but these chips are not yet practical. Some of the solutions used to bring these devices closer to commercial applications are discussed.

  18. The Compliment Sequence.

    ERIC Educational Resources Information Center

    Sims, Anntarie L.

    1989-01-01

    Describes and examines 150 tape-recorded compliment sequences. Reports that the course and outcome of compliments and compliment responses are affected by: (1) the way a compliment is worded; (2) the type of statement that precedes or follows the compliment; and (3) the status and sex of the compliment participants. (RAE)

  19. A Sequence of Cylinders

    ERIC Educational Resources Information Center

    Johnson, Erica

    2006-01-01

    Hoping to develop in her students an understanding of mathematics as a way of thinking more than a way of doing, the author of this article describes how her students worked on a spatial reasoning problem stemming from an iteratively constructed sequence of cylinders. She presents an activity of making cylinders out of paper models, and for every…

  20. Transposon facilitated DNA sequencing

    SciTech Connect

    Berg, D.E.; Berg, C.M.; Huang, H.V.

    1990-01-01

    The purpose of this research is to investigate and develop methods that exploit the power of bacterial transposable elements for large-scale DNA sequencing. Our premise is that the use of transposons to put primer binding sites randomly in target DNAs should provide access to all portions of large DNA fragments, without the inefficiencies of methods involving random subcloning and attendant repetitive sequencing, or of sequential synthesis of many oligonucleotide primers that are used to match systematically along a DNA molecule. Two unrelated bacterial transposons, Tn5 and γδ, are being used because they have both proven useful for molecular analyses, and because they differ sufficiently in mechanism and specificity of transposition to merit parallel development.

  1. RNA Sequencing in Schizophrenia

    PubMed Central

    Li, Xin; Teng, Shaolei

    2015-01-01

    Schizophrenia (SCZ) is a serious psychiatric disorder that affects 1% of the general population and places a heavy burden worldwide. The underlying genetic mechanism of SCZ remains unknown, but studies indicate that the disease is associated with a global gene expression disturbance across many genes. Next-generation sequencing, particularly RNA sequencing (RNA-Seq), provides a powerful genome-scale technology to investigate the pathological processes of SCZ. RNA-Seq has been used to analyze gene expression and to identify novel splice isoforms and rare transcripts associated with SCZ. This paper provides an overview of the genetics of SCZ, the advantages of RNA-Seq for transcriptome analysis, the accomplishments of RNA-Seq in SCZ cohorts, and the applications of induced pluripotent stem cells and RNA-Seq in SCZ research. PMID:27053919

  2. Rapid Polymer Sequencer

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor (Inventor); Brock, Mathew W. (Inventor)

    2011-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal or transverse direction at the tip, a polymer sequence is passed through the tip, and a change in an electrical current signal is measured as each polymer component passes through the tip. Each measured change in electrical current signals is compared with a database of reference signals, with each reference signal identified with a polymer component, to identify the unknown polymer component. The tip preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.
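    The comparison step can be sketched in Python as nearest-reference classification: each measured change in current is assigned to the component whose reference signal is closest. The reference values below are hypothetical placeholders, not measured data; in practice each entry would come from calibration.

```python
from typing import Dict, List

# Hypothetical reference signals: mean current change (pA) per polymer component.
# Placeholder values for illustration only, not measured data.
REFERENCE_SIGNALS: Dict[str, float] = {"A": -12.0, "C": -18.0, "G": -9.0, "T": -22.0}


def identify(changes: List[float], reference: Dict[str, float] = REFERENCE_SIGNALS) -> str:
    """Label each measured current change with the component whose reference signal is closest."""
    calls = []
    for delta in changes:
        best = min(reference, key=lambda comp: abs(reference[comp] - delta))
        calls.append(best)
    return "".join(calls)
```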

  3. HIV sequence compendium 2002

    SciTech Connect

    Kuiken, Carla; Foley, Brian; Freed, Eric; Hahn, Beatrice; Marx, Preston; McCutchan, Francine; Mellors, John; Wolinsky, Steven; Korber, Bette

    2002-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Traditionally, we present the sequence data themselves in the form of alignments: Section II, an alignment of a selection of HIV-1/SIVcpz full-length genomes (a lot of LAI-like sequences, for example, have been omitted because they are so similar that they bias the alignment); Section III, a combined HIV-1/HIV-2/SIV whole genome alignment; Sections IV–VI, amino acid alignments for HIV-1/SIV-cpz, HIV-2/SIV, and SIVagm. The HIV-2/SIV and SIVagm amino acid alignments are separate because the genetic distances between these groups are so great that presenting them in one alignment would make it very elongated because of the large number of gaps that have to be inserted. As always, tables with extensive background information gathered from the literature accompany the whole genome alignments. The collection of whole-gene sequences in the database is now large enough that we have abundant representation of most subtypes. For many subtypes, and especially for subtype B, a large number of sequences that span entire genes were not included in the printed alignments to conserve space. A more complete version of all alignments is available on our website, http://hiv-web.lanl.gov/content/hiv-db/ALIGN_CURRENT/ALIGN-INDEX.html. Importantly, all these alignments have been edited to include only one sequence per person, based on phylogenetic trees that were created for all of them, as well as on the literature. Because of the number of sequences available, we have decided to use a different selection principle this year, based on the epidemiological importance of the subtypes. Subtypes A–D and CRFs 01 and 02 are by far the most widespread variants, and for these (when available) we have included 8–10 representatives in the alignments. The other

  4. Sequencing BPS spectra

    NASA Astrophysics Data System (ADS)

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; Stošić, Marko; Sułkowski, Piotr

    2016-03-01

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using the (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N=2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  5. Multiview image sequence enhancement

    NASA Astrophysics Data System (ADS)

    Jovanov, Ljubomir; Luong, Hiêp; Ružic, Tijana; Philips, Wilfried

    2015-03-01

    Realistic visualization is crucial for more intuitive representation of complex data, medical imaging, simulation and entertainment systems. Multiview autostereoscopic displays are a great step towards achieving a completely immersive user experience. However, providing high-quality content for this type of display is still a great challenge. Due to the different characteristics/settings of the cameras in the multiview setup and varying photometric characteristics of the objects in the scene, the same object may have a different appearance in the sequences acquired by the different cameras. In practice, images representing views recorded using different cameras have different local noise, color and sharpness characteristics. View synthesis algorithms introduce artifacts due to errors in disparity estimation, bad occlusion handling, or erroneous warping function estimation. If the input multiview images are not of sufficient quality and have mismatching color and sharpness characteristics, these artifacts may become even more disturbing. The main goal of our method is to simultaneously perform multiview image sequence denoising, color correction and sharpness improvement in slightly blurred regions. Results show that the proposed method significantly reduces the amount of artifacts in multiview video sequences, resulting in a better visual experience.

  6. Sequence logos: a new way to display consensus sequences.

    PubMed Central

    Schneider, T D; Stephens, R M

    1990-01-01

    A graphical method is presented for displaying the patterns in a set of aligned sequences. The characters representing the sequence are stacked on top of each other for each position in the aligned sequences. The height of each letter is made proportional to its frequency, and the letters are sorted so the most common one is on top. The height of the entire stack is then adjusted to signify the information content of the sequences at that position. From these 'sequence logos', one can determine not only the consensus sequence but also the relative frequency of bases and the information content (measured in bits) at every position in a site or sequence. The logo displays both significant residues and subtle sequence patterns. PMID:2172928
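    The stack heights can be computed as follows in Python. For an alphabet of size N, the information at a position is log2(N) minus the Shannon entropy of the observed letter frequencies, and each letter's height is its frequency times that value. This sketch omits any small-sample correction, so it overestimates information when only a few sequences are aligned.

```python
import math
from collections import Counter
from typing import Dict, List


def logo_column(column: str, alphabet: str = "ACGT") -> Dict[str, float]:
    """Letter heights (bits) for one aligned position, ordered bottom to top."""
    counts = Counter(column)
    n = len(column)
    freqs = {s: counts[s] / n for s in alphabet if counts[s]}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    info = math.log2(len(alphabet)) - entropy  # total stack height, no small-sample correction
    # ascending frequency, so the most common letter ends up on top of the stack
    return {s: f * info for s, f in sorted(freqs.items(), key=lambda kv: kv[1])}


def logo(aligned: List[str]) -> List[Dict[str, float]]:
    """Compute letter heights for every column of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [logo_column("".join(col)) for col in zip(*aligned)]
```

    A fully conserved DNA position gives a single letter 2 bits tall, while a position where A, C, G and T are equally frequent gives a stack of height zero.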

  7. A vision for ubiquitous sequencing

    PubMed Central

    Erlich, Yaniv

    2015-01-01

    Genomics has recently celebrated reaching the $1000 genome milestone, making affordable DNA sequencing a reality. With this goal successfully completed, the next goal of the sequencing revolution can be sequencing sensors—miniaturized sequencing devices that are manufactured for real-time applications and deployed in large quantities at low costs. The first part of this manuscript envisions applications that will benefit from moving the sequencers to the samples in a range of domains. In the second part, the manuscript outlines the critical barriers that need to be addressed in order to reach the goal of ubiquitous sequencing sensors. PMID:26430149

  8. High-Throughput Sequencing Technologies

    PubMed Central

    Reuter, Jason A.; Spacek, Damek; Snyder, Michael P.

    2015-01-01

    The human genome sequence has profoundly altered our understanding of biology, human diversity and disease. The path from the first draft sequence to our nascent era of personal genomes and genomic medicine has been made possible only because of the extraordinary advancements in DNA sequencing technologies over the past ten years. Here, we discuss commonly used high-throughput sequencing platforms, the growing array of sequencing assays developed around them as well as the challenges facing current sequencing platforms and their clinical application. PMID:26000844

  9. Sequencing the Unrearranged Human Immunoglobulin

    SciTech Connect

    Warren, Rene

    2010-06-03

    Rene Warren from Canada's Michael Smith Genome Sciences Centre discusses sequencing and finishing the IgH heavy chain locus on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  10. Next-Generation Sequencing Platforms

    NASA Astrophysics Data System (ADS)

    Mardis, Elaine R.

    2013-06-01

    Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

  11. Music Sequencing and Printing Software.

    ERIC Educational Resources Information Center

    Kassner, Kirk

    2000-01-01

    States that sequencing and printing software eliminates the barriers to students composing music. Describes "Master Tracks Pro," a sequencing program, and "Rhapsody," a printing program. Includes a lesson plan for setting pentatonic music to a poem. (CMK)

  12. Sequence repeats and protein structure

    NASA Astrophysics Data System (ADS)

    Hoang, Trinh X.; Trovato, Antonio; Seno, Flavio; Banavar, Jayanth R.; Maritan, Amos

    2012-11-01

    Repeats are frequently found in known protein sequences. The level of sequence conservation in tandem repeats correlates with their propensities to be intrinsically disordered. We employ a coarse-grained model of a protein with a two-letter amino acid alphabet, hydrophobic (H) and polar (P), to examine the sequence-structure relationship in the realm of repeated sequences. A fraction of repeated sequences comprises a distinct class of bad folders, whose folding temperatures are much lower than those of random sequences. Imperfection in sequence repetition improves the folding properties of the bad folders while deteriorating those of the good folders. Our results may explain why nature has utilized repeated sequences for their versatility and especially to design functional proteins that are intrinsically unstructured at physiological temperatures.
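    To make the two-letter model concrete, here is a Python sketch of the energy of an HP chain on a 2D square lattice, using the common convention of -1 for each pair of H residues that are lattice neighbours but not chain neighbours. The abstract does not specify the lattice or the energy function of the authors' coarse-grained model, so both are assumptions here.

```python
from typing import List, Tuple


def hp_energy(sequence: str, coords: List[Tuple[int, int]]) -> int:
    """Energy of an HP chain on the 2D square lattice: -1 per non-bonded H-H contact."""
    if len(sequence) != len(coords):
        raise ValueError("need one lattice site per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must occupy neighbouring sites")
    site = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for nb in ((x + 1, y), (x, y + 1)):  # +x and +y only, so each contact is counted once
            j = site.get(nb)
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy
```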

  13. Marks of Change in Sequences

    NASA Astrophysics Data System (ADS)

    Jürgensen, H.

    2011-12-01

    Given a sequence of events, how does one recognize that a change has occurred? We explore potential definitions of the concept of change in a sequence and propose that words in relativized solid codes might serve as indicators of change.

  14. Cement sequence stratigraphy in carbonates

    SciTech Connect

    Braithwaite, C.J.R.

    1993-03-01

    Conventional paragenesis analysis commonly fails to describe the subtleties of regional variation found in carbonate cement sequences. Application of the concept of sequence stratigraphy to diagenetic studies allows grouping of the products of depositional (precipitation or crystallization) events into sequences. The boundaries to these sequences are surfaces reflecting erosion (dissolution), non-deposition (renucleation), compaction or imposed fracture events. The pattern of chemical changes within a diagenetic sequence provides a fingerprint which allows that sequence to be recognized among others. Diagenetic sequences may be grouped into temporal series to provide an overall view of the diagenesis of the unit which takes account of local unconformities. The belief that lateral stratigraphical equivalence can serve as a proxy for diagenetic time equivalence is evidently mistaken; diagenetic boundaries may cross stratigraphic boundaries. The application of sequence stratigraphy concepts provides a valuable tool for interpreting regional diagenesis in carbonates and potentially offers insight into the pathways of hydrocarbon or ore fluid migration.

  15. DNA Sequencing apparatus

    DOEpatents

    Tabor, Stanley; Richardson, Charles C.

    1992-01-01

    An automated DNA sequencing apparatus having a reactor for providing at least two series of DNA products formed from a single primer and a DNA strand, each DNA product of a series differing in molecular weight and having a chain terminating agent at one end; separating means for separating the DNA products to form a series of bands, the intensity of substantially all nearby bands in a different series being different; band reading means for determining the position an… This invention was made with government support including a grant from the U.S. Public Health Service, contract number AI-06045. The U.S. government has certain rights in the invention.

  16. Correspondence: Searching sequence space

    SciTech Connect

    Youvan, D.C.

    1995-08-01

    This correspondence debates the efficiency and application of genetic algorithms (GAs) to search protein sequence space. The important experimental point is that such sparse searches utilize physically realistic syntheses. In this regard, all GA-based technologies are very similar; they "learn" from their initial sparse search and then generate interesting new proteins within a few iterations. Which GA-based technology is best? That probably depends on the protein and the specific engineering goal. Given the fact that the field of combinatorial chemistry is still in its infancy, it is probably wise to consider all of the proven mutagenesis methods. 19 refs.

  17. Nucleotide sequences 1986/1987

    SciTech Connect

    Not Available

    1987-01-01

    These eight volumes are the third annual published compendium of nucleic acid sequences included in the European Molecular Biology Laboratory Nucleotide Sequence Data Library and the GenBank Genetic Sequences Data Bank. Each volume surveys one or more subdivisions of the database. The volume subtitles are: Primates; Rodents; Other Vertebrates and Invertebrates, Plants and Organelles, Bacteria and Bacteriophage, Viruses, Structural RNA, Synthetic and Unannotated Sequences, and Database Directory and Master Indices.

  18. Sequencing Technologies Panel at SFAF

    SciTech Connect

    Turner, Steve; Fiske, Haley; Knight, Jim; Rhodes, Michael; Vander Horn, Peter

    2010-06-02

    From left to right: Steve Turner of Pacific Biosciences, Haley Fiske of Illumina, Jim Knight of Roche, Michael Rhodes of Life Technologies and Peter Vander Horn of Life Technologies' Single Molecule Sequencing group discuss new sequencing technologies and applications on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  19. Sequence Factorial and Its Applications

    ERIC Educational Resources Information Center

    Asiru, Muniru A.

    2012-01-01

    In this note, we introduce sequence factorial and use this to study generalized M-bonomial coefficients. For the sequence of natural numbers, the twin concepts of sequence factorial and generalized M-bonomial coefficients, respectively, extend the corresponding concepts of factorial of an integer and binomial coefficients. Some latent properties…

  20. Automated Identification of Nucleotide Sequences

    NASA Technical Reports Server (NTRS)

    Osman, Shariff; Venkateswaran, Kasthuri; Fox, George; Zhu, Dian-Hui

    2007-01-01

    STITCH is a computer program that processes raw nucleotide-sequence data to automatically remove unwanted vector information, perform reverse-complement comparison, stitch shorter sequences together to make longer ones to which the shorter ones presumably belong, and search against the user's choice of private and Internet-accessible public 16S rRNA databases. ["16S rRNA" denotes a ribosomal ribonucleic acid (rRNA) sequence that is common to all organisms.] In STITCH, a template 16S rRNA sequence is used to position forward and reverse reads. STITCH then automatically searches known 16S rRNA sequences in the user's chosen database(s) to find the sequence most similar to (the sequence that lies at the smallest edit distance from) each spliced sequence. The result of processing by STITCH is the identification of the most similar well-described bacterium. Whereas previously available commercial software for analyzing genetic sequences operates on one sequence at a time, STITCH can manipulate multiple sequences simultaneously to perform the aforementioned operations. A typical analysis of several dozen sequences (each on the order of 10³ base pairs long) by use of STITCH is completed in a few minutes, whereas such an analysis performed by use of prior software takes hours or days.
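
    STITCH's own code is not described in this record. The following is a minimal sketch, under stated assumptions, of two operations the abstract names: reverse-complement comparison and finding the reference at the smallest edit distance (Levenshtein distance is assumed here). The helper names and the toy reference sequences are hypothetical; a real 16S rRNA search would use a curated database and an aligner rather than exhaustive comparison.

```python
from typing import Dict, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(_COMPLEMENT)[::-1]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def closest_reference(read: str, references: Dict[str, str]) -> Tuple[str, int, str]:
    """Return (name, distance, orientation) of the nearest reference.

    Both the read and its reverse complement are compared, because a read
    may come from either strand.
    """
    if not references:
        raise ValueError("no reference sequences supplied")
    candidates = []
    for name, ref in references.items():
        candidates.append((edit_distance(read, ref), name, "forward"))
        candidates.append((edit_distance(reverse_complement(read), ref), name, "reverse"))
    distance, name, orientation = min(candidates)
    return name, distance, orientation


# Toy sequences, hypothetical and far shorter than real 16S rRNA genes.
references = {"refA": "AAACCCGGT", "refB": "GGGGGGGGG"}
print(closest_reference("ACCGGGTTT", references))  # ('refA', 0, 'reverse')
```

    The quadratic dynamic-programming distance is fine for a sketch; for thousands of ~1,500-base references a banded or heuristic alignment would be the practical choice.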

  1. Chromosome specific repetitive DNA sequences

    DOEpatents

    Moyzis, Robert K.; Meyne, Julianne

    1991-01-01

    A method is provided for determining specific nucleotide sequences useful in forming a probe which can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong, and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family me… This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).

  2. Natural protein sequences are more intrinsically disordered than random sequences.

    PubMed

    Yu, Jia-Feng; Cao, Zanxia; Yang, Yuedong; Wang, Chun-Ling; Su, Zhen-Dong; Zhao, Ya-Wei; Wang, Ji-Hua; Zhou, Yaoqi

    2016-08-01

    Most natural protein sequences have resulted from millions or even billions of years of evolution. How they differ from random sequences is not fully understood. Previous computational and experimental studies of random proteins generated from noncoding regions yielded inconclusive results due to species-dependent codon biases and GC contents. Here, we approach this problem by investigating 10,000 sequences randomized at the amino acid level. Using well-established predictors for protein intrinsic disorder, we found that natural sequences have more long disordered regions than random sequences, even when random and natural sequences have the same overall composition of amino acid residues. We also showed that random sequences are as structured as natural sequences according to contents and length distributions of predicted secondary structure, although the structures from random sequences may be in a molten globule-like state, according to molecular dynamics simulations. The bias of natural sequences toward more intrinsic disorder suggests that natural sequences are created and evolved to avoid protein aggregation and increase functional diversity. PMID:26801222
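
    The abstract does not give the exact randomization procedure. One simple way to obtain random sequences with exactly the same overall amino acid composition as a natural sequence is to shuffle its residues. A minimal sketch, assuming shuffling is an acceptable stand-in; the function name and the example peptide are hypothetical:

```python
import random
from collections import Counter
from typing import List


def composition_preserving_shuffles(sequence: str, n: int, seed: int = 0) -> List[str]:
    """Return n random permutations of `sequence`.

    Shuffling keeps the amino acid composition identical to the input
    while destroying the natural residue order.
    """
    rng = random.Random(seed)  # seeded so the decoys are reproducible
    residues = list(sequence)
    shuffled = []
    for _ in range(n):
        rng.shuffle(residues)
        shuffled.append("".join(residues))
    return shuffled


natural = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example peptide, not from the study
decoys = composition_preserving_shuffles(natural, 5)
assert all(Counter(d) == Counter(natural) for d in decoys)
```

    Shuffling matches composition per sequence; drawing residues independently from a background frequency table would instead match composition only on average.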

  3. Sequence Maneuverer: tool for sequence extraction from genomes

    PubMed Central

    Yasmin, Tayyaba; Rehman, Inayat Ur; Ansari, Adnan Ahmad; Liaqat, Khurrum; Khan, Muhammad Irfan

    2012-01-01

    The availability of genomic sequences of many organisms has opened new challenges, particularly in genome analysis. Sequence extraction is a vital step, and many tools have been developed for it. These tools are publicly available but have limitations with respect to sequence extraction, the length of the sequence that can be extracted, organism specificity, and the lack of a user-friendly interface. We have developed a Java-based software package with three modules that can be used independently or sequentially. The tool efficiently extracts sequences from large datasets in a few simple steps. It can extract multiple sequences of any desired length from the genome of any organism. The results were cross-checked against published data. Availability URL 1: http://ww3.comsats.edu.pk/bio/ResearchProjects.aspx URL 2: http://ww3.comsats.edu.pk/bio/SequenceManeuverer.aspx PMID:23275734

  4. Rapid Polymer Sequencer

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.
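
    The identification step (comparing each measured change in current with a database of reference signals) amounts to a nearest-reference lookup. A minimal sketch, assuming each component is characterised by a single scalar reference value and matched by absolute difference; the function name is hypothetical and the reference values below are placeholders, not measurements from the patent.

```python
from typing import List, Mapping, Sequence


def identify_components(measured: Sequence[float],
                        references: Mapping[str, float]) -> List[str]:
    """Label each measured current change with the closest reference component."""
    if not references:
        raise ValueError("reference database is empty")
    labels = []
    for delta in measured:
        # Nearest reference signal by absolute difference.
        labels.append(min(references, key=lambda name: abs(references[name] - delta)))
    return labels


# Placeholder reference values in arbitrary units, for illustration only.
reference_db = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}
print(identify_components([1.1, 3.9, 2.2], reference_db))  # ['A', 'T', 'C']
```

    Real current traces are noisy and multi-featured, so a practical classifier would compare signal shapes or distributions rather than single numbers.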

  5. The EMBL Nucleotide Sequence Database.

    PubMed

    Stoesser, G; Tuli, M A; Lopez, R; Sterk, P

    1999-01-01

    The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl.html) constitutes Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications. While automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO), the preferred submission tool for individual submitters is Webin (WWW). Through all stages, dataflow is monitored by EBI biologists communicating with the sequencing groups. In collaboration with DDBJ and GenBank the database is produced, maintained and distributed at the European Bioinformatics Institute (EBI). Database releases are produced quarterly and are distributed on CD-ROM. Network services allow access to the most up-to-date data collection via Internet and World Wide Web interface. EBI's Sequence Retrieval System (SRS) is a Network Browser for Databanks in Molecular Biology, integrating and linking the main nucleotide and protein databases, plus many specialised databases. For sequence similarity searching a variety of tools (e.g. Blitz, FASTA, BLAST, etc.) are available for external users to compare their own sequences against the most currently available data in the EMBL Nucleotide Sequence Database and SWISS-PROT. PMID:9847133

  6. Sequence analysis on microcomputers.

    PubMed

    Cannon, G C

    1987-10-01

    Overall, each of the program packages performed its tasks satisfactorily. For analyses with a well-defined answer, such as a search for a restriction site, there were few significant differences between the program sets. However, for tasks in which a degree of flexibility is desirable, such as homology or similarity determinations and database searches, DNASTAR consistently afforded the user more options in conducting the required analysis than did the other two packages. For laboratories where sequence analysis is not a major effort and the expense of a full sequence analysis workstation cannot be justified, MicroGenie and IBI-Pustell offer a satisfactory alternative. MicroGenie is a polished program system. Many may find its user interface more "user friendly" than the standard menu-driven interfaces. Its system of filing sequences under individual passwords facilitates use by more than one person. MicroGenie uses a hardware device for software protection that occupies a card slot in the computer on which it is used. Although I am sympathetic to the problem of software piracy, I feel that a less drastic solution is in order for a program likely to be sharing limited computer space with other software packages. The IBI-Pustell package performs the required analysis functions as accurately and quickly as MicroGenie, but it lacks MicroGenie's clarity and ease of use. The menu system seems disjointed, and new or infrequent users often find themselves at apparent "dead-end menus" where the only clear alternative is to restart the entire program package. Published accounts suggest that the user interface is going to be upgraded; perhaps when that version is available, the system will be easier to use. The documentation accompanying each package was relatively clear about how to run the programs, but all three packages assumed that the user was familiar with the computational techniques employed. MicroGenie and IBI-Pustell further…

  7. The evolution of nanopore sequencing.

    PubMed

    Wang, Yue; Yang, Qiuping; Wang, Zhimin

    2014-01-01

    The "$1000 Genome" project has been drawing increasing attention since its launch a decade ago. Nanopore sequencing, a third-generation technology, is believed to be one of the most promising sequencing technologies for reaching the four gold standards set for the "$1000 Genome", while second-generation sequencing technologies are bringing about a revolution in the life sciences, particularly in genome sequencing-based personalized medicine. Both protein and solid-state nanopores have been extensively investigated for a series of issues, from detection of ionic current blockage to field-effect-transistor (FET) sensors. A newly released protein nanopore sequencer has shown encouraging potential that nanopore sequencing will ultimately fulfill the gold standards. In this review, we address advances, challenges, and possible solutions in nanopore sequencing with respect to these standards. PMID:25610451

  8. The evolution of nanopore sequencing

    PubMed Central

    Wang, Yue; Yang, Qiuping; Wang, Zhimin

    2014-01-01

    The “$1000 Genome” project has been drawing increasing attention since its launch a decade ago. Nanopore sequencing, a third-generation technology, is believed to be one of the most promising sequencing technologies for reaching the four gold standards set for the “$1000 Genome”, while second-generation sequencing technologies are bringing about a revolution in the life sciences, particularly in genome sequencing-based personalized medicine. Both protein and solid-state nanopores have been extensively investigated for a series of issues, from detection of ionic current blockage to field-effect-transistor (FET) sensors. A newly released protein nanopore sequencer has shown encouraging potential that nanopore sequencing will ultimately fulfill the gold standards. In this review, we address advances, challenges, and possible solutions in nanopore sequencing with respect to these standards. PMID:25610451

  9. ISHAN: sequence homology analysis package.

    PubMed

    Shil, Pratip; Dudani, Niraj; Vidyasagar, Pandit B

    2006-01-01

    Sequence-based homology studies play an important role in evolutionary tracing and classification of proteins. Various methods are available to analyze biological sequence information. However, with the advent of the proteomics era, there is a growing demand for analysis of huge amounts of biological sequence information, and programs that provide speedy analysis have become necessary. ISHAN has been developed as a homology analysis package built on various sequence analysis tools, viz. FASTA, ALIGN, CLUSTALW, PHYLIP and CODONW (for DNA sequences). This Java application offers the user a choice of analysis tools. For testing, ISHAN was applied to perform phylogenetic analysis for sets of Caspase 3 DNA sequences and NF-kappaB p105 amino acid sequences. By integrating several tools, it has made analysis much faster and reduced manual intervention. PMID:17274766

  10. Solid phase sequencing of biopolymers

    DOEpatents

    Cantor, Charles; Koster, Hubert

    2010-09-28

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include DNA or RNA in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  11. Making sense of deep sequencing.

    PubMed

    Goldman, D; Domschke, K

    2014-10-01

    This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of 'big data', to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306

  12. Asteroid Ida Rotation Sequence

    NASA Technical Reports Server (NTRS)

    1994-01-01

    This montage of 14 images (the time order is right to left, bottom to top) shows Ida as it appeared in the field of view of Galileo's camera on August 28, 1993. Asteroid Ida rotates clockwise, as viewed from above the north pole, once every 4 hours 39 minutes; these images cover about one Ida 'day.' This sequence has been used to create a 3-D model that shows Ida to be almost croissant-shaped. The earliest view (lower right) was taken from a range of 240,000 kilometers (150,000 miles), 5.4 hours before closest approach. The asteroid Ida draws its name from mythology, in which the Greek god Zeus was raised by the nymph Ida.

  13. Relay Sequence Generation Software

    NASA Technical Reports Server (NTRS)

    Gladden, Roy E.; Khanampornpan, Teerapat

    2009-01-01

    Because of thermal and electromagnetic interactions between the UHF (ultrahigh frequency) radio onboard the Mars Reconnaissance Orbiter (MRO), which performs relay sessions with the Martian landers, and the rest of the MRO payloads, relay sessions must be integrated and de-conflicted with the MRO science plan. The MRO relay SASF/PTF (spacecraft activity sequence file/payload target file) generation software facilitates this process by generating a PTF that is needed to integrate the periods of time during which MRO supports relay activities with the rest of the MRO science plans. The software also generates the command products needed to initiate the relay sessions, some features of which are provided by the lander team, some managed internally by MRO, and some derived.

  14. Isotropic sequence order learning.

    PubMed

    Porr, Bernd; Wörgötter, Florentin

    2003-04-01

    In this article, we present an isotropic unsupervised algorithm for temporal sequence learning. No special reward signal is used, such that all inputs are completely isotropic. All input signals are bandpass filtered before converging onto a linear output neuron. All synaptic weights change according to the correlation of bandpass-filtered inputs with the derivative of the output. We investigate the algorithm in an open-loop and a closed-loop condition, the latter being defined by embedding the learning system into a behavioral feedback loop. In the open-loop condition, we find that the linear structure of the algorithm allows the shape of the weight change to be calculated analytically; it is strictly heterosynaptic and follows the shape of the weight change curves found in spike-timing-dependent plasticity. Furthermore, we show that synaptic weights stabilize automatically, without additional normalizing measures, when no more temporal differences exist between the inputs. In the second part of this study, the algorithm is placed in an environment that leads to a closed sensor-motor loop. To this end, a robot is programmed with a prewired retraction reflex in response to collisions. Through isotropic sequence order (ISO) learning, the robot achieves collision avoidance by learning the correlation between its early range-finder signals and the later-occurring collision signal. Synaptic weights stabilize at the end of learning as theoretically predicted. Finally, we discuss the relation of ISO learning to other drive reinforcement models and to the commonly used temporal difference learning algorithm. This study is followed up by a mathematical analysis of the closed-loop situation in the companion article in this issue, "ISO Learning Approximates a Solution to the Inverse-Controller Problem in an Unsupervised Behavioral Paradigm" (pp. 865-884). PMID:12689389
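
    The open-loop learning rule described above, in which each weight changes in proportion to the correlation of its band-pass-filtered input u_j with the derivative of the linear output v = Σ w_j u_j, can be sketched in discrete time. This is a simplified illustration, not the authors' implementation: the band-pass filter is an assumed stand-in (a difference of two first-order low-pass filters), the derivative is a one-step difference, and the learning rate mu and the filter constants are arbitrary choices.

```python
from typing import List, Sequence, Tuple


def band_pass(signal: Sequence[float], fast: float = 0.5, slow: float = 0.1) -> List[float]:
    """Stand-in band-pass filter: difference of two first-order low-pass filters."""
    y_fast = y_slow = 0.0
    out = []
    for x in signal:
        y_fast += fast * (x - y_fast)
        y_slow += slow * (x - y_slow)
        out.append(y_fast - y_slow)
    return out


def iso_learning(inputs: Sequence[Sequence[float]], weights: Sequence[float],
                 mu: float = 0.01) -> Tuple[List[float], List[float]]:
    """Apply the rule dw_j/dt = mu * u_j * dv/dt to a set of input signals.

    inputs[j] is the raw input x_j(t); all inputs must have the same length.
    Returns the final weights and the output trace v(t).
    """
    filtered = [band_pass(x) for x in inputs]
    w = list(weights)
    v_prev = 0.0
    outputs = []
    for t in range(len(filtered[0])):
        u = [f[t] for f in filtered]
        v = sum(wj * uj for wj, uj in zip(w, u))  # linear output neuron
        dv = v - v_prev                            # one-step derivative estimate
        for j in range(len(w)):
            w[j] += mu * u[j] * dv                 # correlate input with output derivative
        v_prev = v
        outputs.append(v)
    return w, outputs


# An early input x1 (pulse at t=10) precedes a later input x0 (pulse at t=20).
x0 = [0.0] * 60
x0[20] = 1.0
x1 = [0.0] * 60
x1[10] = 1.0
final_w, v = iso_learning([x0, x1], weights=[1.0, 0.0])
```

    Because the rule depends only on the derivative of the output, constant or absent inputs leave the weights unchanged, which is the stabilization property the abstract mentions; the sign and size of the learned weight for a given delay depend on the filter, so no particular value is claimed here.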

  15. Adversary Sequence Interruption Model

    Energy Science and Technology Software Center (ESTSC)

    1985-11-15

    PC EASI is an IBM personal computer or PC-compatible version of an analytical technique for measuring the effectiveness of physical protection systems. PC EASI utilizes a methodology called Estimate of Adversary Sequence Interruption (EASI), which evaluates the probability of interruption (PI) for a given sequence of adversary tasks. Probability of interruption is defined as the probability that the response force will arrive before the adversary force has completed its task. The EASI methodology is a probabilistic approach that analytically evaluates basic functions of the physical security system (detection, assessment, communications, and delay) with respect to response time along a single adversary path. It is important that the most critical scenarios for each target be identified to ensure that vulnerabilities have not been overlooked. If the facility is not overly complex, this can be accomplished by examining all paths. If the facility is complex, a global model such as Safeguards Automated Facility Evaluation (SAFE) may be used to identify the most vulnerable paths. PC EASI is menu-driven, with screen forms for entering and editing the basic scenarios. In addition to evaluating PI for the basic scenario, the sensitivity of the result to many of the parameters chosen in the scenario can be analyzed. These sensitivities help the analyst determine the tradeoffs for reducing the probability of interruption. PC EASI runs under Micro Data Base Systems' proprietary database management system Knowledgeman. KMAN provides the user environment and file management for the specified basic scenarios, and KGRAPH the graphical output of the sensitivity calculations. This software is not included. Because of errors in release 2 of KMAN, PC EASI will not execute properly with it; release 1.07 of KMAN is required.
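
    The probability of interruption along a single path can be sketched under simplifying assumptions: tasks are visited in order and only the first detection counts; the adversary time remaining after detection at each task and the response-force time are independent normal variables; and one communication probability multiplies every term. This is an illustrative rendering of an EASI-style calculation, not PC EASI's code; the class and function names are hypothetical, and the real methodology also accounts for where within a task detection occurs.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Task:
    p_detect: float        # probability the adversary is detected during this task
    remaining_mean: float  # mean adversary time remaining after detection here (s)
    remaining_sd: float    # standard deviation of that remaining time (s)


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_interruption(tasks: Sequence[Task], response_mean: float,
                                response_sd: float, p_communication: float = 1.0) -> float:
    """EASI-style P(I) for one adversary path (simplified sketch)."""
    p_undetected_so_far = 1.0
    p_interrupt = 0.0
    for task in tasks:
        # P(response force arrives before the adversary finishes | first detection here)
        sd = math.sqrt(task.remaining_sd ** 2 + response_sd ** 2)
        margin = task.remaining_mean - response_mean
        if sd > 0:
            p_in_time = normal_cdf(margin / sd)
        else:
            p_in_time = 1.0 if margin > 0 else 0.0
        p_interrupt += p_undetected_so_far * task.p_detect * p_in_time
        p_undetected_so_far *= 1.0 - task.p_detect
    return p_communication * p_interrupt
```

    For example, two tasks that each detect with probability 0.5, both leaving ample time for the response force, give P(I) = 0.5 + 0.5 × 0.5 = 0.75. Varying one parameter at a time in such a function is the kind of sensitivity analysis the abstract describes.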

  16. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective single-molecule sequencing method. Here, we present the first demonstration of unique "electronic fingerprints" for all four nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic states of the nucleobases shift depending on the pH, with the most distinct states identified at acidic pH. We also demonstrate identification of single-nucleotide modifications (here, methylation). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of the beta-lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with a success rate of over 95%. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  17. BAC sequencing using pooled methods.

    PubMed

    Saski, Christopher A; Feltus, F Alex; Parida, Laxmi; Haiminen, Niina

    2015-01-01

    Shotgun sequencing and assembly of a large, complex genome can be expensive, and accurately reconstructing the true genome sequence is challenging. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are the main factors that plague de novo genome sequencing projects, which typically result in highly fragmented assemblies from which it is difficult to extract biological meaning. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome, and it provides a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences only the genome fraction that encodes the relevant biological information, it is possible to reduce the overall cost and effort of sequencing a targeted genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods that are described include BAC isolation and mapping, DNA sequencing, and sequence assembly. PMID:25239741

  18. Turtle Graphics of Morphic Sequences

    NASA Astrophysics Data System (ADS)

    Zantema, Hans

    2016-02-01

    The simplest infinite sequences that are not ultimately periodic are pure morphic sequences: fixed points of particular morphisms mapping single symbols to strings of symbols. A basic way to visualize a sequence is by a turtle curve: for every alphabet symbol fix an angle, and then consecutively for all sequence elements draw a unit segment and turn the drawing direction by the corresponding angle. This paper investigates turtle curves of pure morphic sequences. In particular, criteria are given for turtle curves being finite (consisting of finitely many segments), and for being fractal or self-similar: it contains an up-scaled copy of itself. Also space-filling turtle curves are considered, and a turtle curve that is dense in the plane. As a particular result we give an exact relationship between the Koch curve and a turtle curve for the Thue-Morse sequence, where until now for such a result only approximations were known.
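
    The turtle construction described above is easy to reproduce. The sketch below generates a prefix of a pure morphic sequence by iterating a morphism (the Thue-Morse morphism, mapping 0 to 01 and 1 to 10, serves as the example) and then, for each symbol, draws a unit segment and turns by that symbol's angle. The function names are hypothetical, and the angles are caller-chosen parameters; the specific angles behind the Koch-curve result are not given in the abstract and are not assumed here.

```python
import math
from typing import Dict, List, Tuple


def morphic_prefix(morphism: Dict[str, str], start: str, length: int) -> str:
    """First `length` symbols of the fixed point obtained by iterating `morphism` on `start`."""
    if not morphism[start].startswith(start) or len(morphism[start]) < 2:
        raise ValueError("morphism[start] must begin with start and be longer than it")
    word = start
    while len(word) < length:
        new_word = "".join(morphism[symbol] for symbol in word)
        if len(new_word) == len(word):
            raise ValueError("iteration stopped growing")
        word = new_word
    return word[:length]


def turtle_curve(sequence: str, angles_deg: Dict[str, float]) -> List[Tuple[float, float]]:
    """Vertices of the turtle curve: for each symbol, draw a unit segment,
    then turn the drawing direction by that symbol's angle (in degrees)."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for symbol in sequence:
        x += math.cos(math.radians(heading))
        y += math.sin(math.radians(heading))
        points.append((round(x, 9), round(y, 9)))  # rounding hides float noise
        heading = (heading + angles_deg[symbol]) % 360.0
    return points


thue_morse = morphic_prefix({"0": "01", "1": "10"}, "0", 8)  # "01101001"
vertices = turtle_curve(thue_morse, {"0": 180.0, "1": 60.0})  # example angles only
```

    Plotting the vertices for longer prefixes (for instance with matplotlib) is the quickest way to see whether a chosen pair of angles yields a finite, self-similar, or space-filling curve.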

  19. Solid phase sequencing of biopolymers

    DOEpatents

    Cantor, Charles R.; Koster, Hubert

    2014-06-24

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Probes may be affixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  20. Graphene nanodevices for DNA sequencing.

    PubMed

    Heerema, Stephanie J; Dekker, Cees

    2016-02-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with nanopores. Owing to its unique structure and properties, graphene provides interesting opportunities for the development of a new sequencing technology. In recent years, a wide range of creative ideas for graphene sequencers have been theoretically proposed and the first experimental demonstrations have begun to appear. Here, we review the different approaches to using graphene nanodevices for DNA sequencing, which involve DNA passing through graphene nanopores, nanogaps, and nanoribbons, and the physisorption of DNA on graphene nanostructures. We discuss the advantages and problems of each of these key techniques, and provide a perspective on the use of graphene in future DNA sequencing technology. PMID:26839258

  1. Graphene nanodevices for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Heerema, Stephanie J.; Dekker, Cees

    2016-02-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with nanopores. Owing to its unique structure and properties, graphene provides interesting opportunities for the development of a new sequencing technology. In recent years, a wide range of creative ideas for graphene sequencers have been theoretically proposed and the first experimental demonstrations have begun to appear. Here, we review the different approaches to using graphene nanodevices for DNA sequencing, which involve DNA passing through graphene nanopores, nanogaps, and nanoribbons, and the physisorption of DNA on graphene nanostructures. We discuss the advantages and problems of each of these key techniques, and provide a perspective on the use of graphene in future DNA sequencing technology.

  2. AB118. Validation of next generation sequencing by Sanger sequencing

    PubMed Central

    Low, Meow Hong Wendy; Lai, Hwei Meeng Angeline; Jamuar, Saumya Shekhar; Law, Hai Yang

    2015-01-01

    Background and objective: Development of the next generation sequencing (NGS) platform was driven by the completion of the Human Genome Project in 2003. With the availability of NGS, the time taken to sequence very large genomic regions was greatly reduced and the amount of data generated per unit of DNA increased significantly. Although the cost of using NGS in a clinical setting is far from ideal, there has been a significant decrease in the average cost per sequenced base. The objective was to validate NGS findings for mutations detected in FBN1, TGFBR2, RAF1, RTEL1, LMNA, MID2, KCNK9, DMD, SMARCA2 and IQSEC2 using the gold standard, Sanger sequencing. Methods: The coordinate of each mutation identified by NGS was used to retrieve the adjacent genomic sequence in the UCSC Genome Browser (available from URL: https://genome.ucsc.edu/). Targeted primers were designed with Primer3 software (available from URL: http://primer3.ut.ee/) based on the genomic sequence obtained from UCSC. The next step was optimization of a polymerase chain reaction (PCR) with the designed primers to amplify the DNA template for the targeted region. After optimization, the template was purified and subjected to dye-terminator sequencing to generate multiple DNA fragments of varying sizes. Finally, the DNA fragments were purified and analysed with an automated sequencer, which separates the fragments by size using capillary electrophoresis. Results: A total of 28 cases were validated with Sanger sequencing. Of these, 25 (89.3%) concurred with the NGS findings and 3 (10.7%) were false-positive calls. Conclusions: NGS shows promise for future molecular diagnostics; however, at present it needs to be done concurrently with Sanger sequencing for clinical applications.
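
    The percentages in the Results follow directly from the reported counts. A minimal sketch of that arithmetic, assuming (as the abstract states) that every case not confirmed by Sanger sequencing was a false-positive NGS call; the function name `concordance` is illustrative only.

```python
def concordance(validated: int, concordant: int) -> tuple[float, float]:
    """Return (% concordant, % false positive), each rounded to one decimal place.

    Assumes every validated case that Sanger sequencing did not confirm
    is counted as a false-positive NGS call.
    """
    if validated <= 0 or not 0 <= concordant <= validated:
        raise ValueError("need validated > 0 and 0 <= concordant <= validated")
    agree = 100 * concordant / validated
    return round(agree, 1), round(100 - agree, 1)


# Counts reported in the abstract: 28 cases validated, 25 concordant.
print(concordance(28, 25))  # (89.3, 10.7)
```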

  3. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    SciTech Connect

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to the start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in

  4. Venter wins sequencing race - twice

    SciTech Connect

    Nowak, R.

    1995-06-02

    This article discusses the end of the race to sequence the first complete genome of a free-living organism. Craig Venter of The Institute for Genomic Research unveiled the complete sequences of two bacteria, Haemophilus influenzae and Mycoplasma genitalium, at the American Society for Microbiology meeting in May 1995. Because bacterial and human biochemistry share many similarities, the sequences will be useful in searching for human genes.

  5. Sequence factorial and its applications

    NASA Astrophysics Data System (ADS)

    Asiru, Muniru A.

    2012-06-01

    In this note, we introduce the sequence factorial and use it to study generalized M-bonomial coefficients. The twin concepts of sequence factorial and generalized M-bonomial coefficients extend, respectively, the factorial of an integer and the binomial coefficients, to which they reduce for the sequence of natural numbers. We derive some latent properties of generalized M-bonomial coefficients with which a vast majority of practical problems involving these coefficients can be solved.
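
    The abstract does not spell out the definition, so the following is a minimal sketch under a stated assumption: the sequence factorial of a sequence a is taken to be the product a(1) * a(2) * ... * a(n), and the generalized coefficient is [n]! / ([k]! [n-k]!). With a(i) = i this reproduces the ordinary factorial and binomial coefficients; with Fibonacci numbers it gives the Fibonomial coefficients. All function names are hypothetical, and the paper's M-bonomial coefficients may be defined differently.

```python
from fractions import Fraction
from math import comb, prod
from typing import Callable

IntSeq = Callable[[int], int]  # a(i) for i = 1, 2, 3, ...; terms assumed nonzero


def seq_factorial(a: IntSeq, n: int) -> int:
    """Sequence factorial [n]! = a(1) * a(2) * ... * a(n); [0]! is the empty product 1."""
    return prod(a(i) for i in range(1, n + 1))


def seq_binomial(a: IntSeq, n: int, k: int) -> Fraction:
    """Generalized coefficient [n]! / ([k]! * [n-k]!), kept exact as a Fraction."""
    if not 0 <= k <= n:
        return Fraction(0)
    return Fraction(seq_factorial(a, n), seq_factorial(a, k) * seq_factorial(a, n - k))


def natural(i: int) -> int:
    return i


def fib(i: int) -> int:
    """Fibonacci numbers with fib(1) = fib(2) = 1."""
    x, y = 0, 1
    for _ in range(i):
        x, y = y, x + y
    return x


print(seq_binomial(natural, 6, 3) == comb(6, 3))  # True: reduces to the binomial coefficient
print(seq_binomial(fib, 5, 2))  # 15, a Fibonomial coefficient
```

    Using `Fraction` keeps the result exact even for sequences where the quotient is not guaranteed to be an integer.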

  6. Quasi-Random Sequence Generators.

    Energy Science and Technology Software Center (ESTSC)

    1994-03-01

    Version 00. LPTAU generates quasi-random sequences. The sequences are uniformly distributed sets of L=2**30 points in the N-dimensional unit cube I**N=[0,1]**N. They are used as nodes for multidimensional integration, as search points in global optimization, as trial points in multicriteria decision making, and as quasi-random points for quasi-Monte Carlo algorithms.
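
    The LPTAU code itself is not reproduced here. As a hedged illustration of the underlying idea, the sketch below generates the one-dimensional base-2 van der Corput (radical-inverse) sequence, a simple digital low-discrepancy construction of the same family as Sobol's LP-tau sequences. It covers one dimension only, and the function names are hypothetical.

```python
def radical_inverse_base2(i: int) -> float:
    """Mirror the binary digits of i about the binary point, e.g. 6 = 110b -> 0.011b = 0.375."""
    result, weight = 0.0, 0.5
    while i:
        if i & 1:
            result += weight
        i >>= 1
        weight /= 2
    return result


def van_der_corput(n: int) -> list[float]:
    """First n points of the base-2 van der Corput sequence in [0, 1)."""
    return [radical_inverse_base2(i) for i in range(n)]


print(van_der_corput(8))  # [0.0, 0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]

# Use as integration nodes: estimate the integral of x**2 over [0, 1] (exact value 1/3).
nodes = van_der_corput(1024)
print(sum(x * x for x in nodes) / len(nodes))
```

    Each new point falls into the largest gap left by the earlier ones, so the first 2**m points place exactly one point in every interval of width 2**-m; that even coverage is what makes such sequences useful as integration nodes.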

  7. Biosensors for DNA sequence detection

    NASA Technical Reports Server (NTRS)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  8. Representations of mechanical assembly sequences

    NASA Astrophysics Data System (ADS)

    Homem de Mello, Luiz S.; Sanderson, Arthur C.

    1991-04-01

    Five types of representations for assembly sequences are reviewed: the directed graph of feasible assembly sequences, the AND/OR graph of feasible assembly sequences, the set of establishment conditions, and two types of sets of precedence relationships (precedence relationships between the establishment of one connection between parts and the establishment of another connection, and precedence relationships between the establishment of one connection and states of the assembly process). The mappings of one representation into the others are established, as are the correctness and completeness of these representations. The results presented are needed in the proof of correctness and completeness of algorithms for the generation of mechanical assembly sequences.

  9. SNMR pulse sequence phase cycling

    DOEpatents

    Walsh, David O; Grunewald, Elliot D

    2013-11-12

    Technologies applicable to SNMR pulse sequence phase cycling are disclosed, including SNMR acquisition apparatus and methods, SNMR processing apparatus and methods, and combinations thereof. SNMR acquisition may include transmitting two or more SNMR pulse sequences and applying a phase shift to a pulse in at least one of the pulse sequences, according to any of a variety of cycling techniques. SNMR processing may include combining SNMR signals from a plurality of pulse sequences comprising pulses of different phases, so that desired signals are preserved and undesired signals are canceled.

  10. Sequence independent amplification of DNA

    DOEpatents

    Bohlander, S.K.

    1998-03-24

    The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example, the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei. 25 figs.

  11. Establishing homologies in protein sequences

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.; Barker, W. C.; Hunt, L. T.

    1983-01-01

    Computer-based statistical techniques used to determine homologies between proteins occurring in different species are reviewed. The technique is based on comparison of two protein sequences, either by relating all segments of a given length in one sequence to all segments of the second or by finding the best alignment of the two sequences. Approaches discussed include selection using printed tabulations, identification of very similar sequences, and computer searches of a database. The use of the SEARCH, RELATE, and ALIGN programs (Dayhoff, 1979) is explained; sample data are presented in graphs, diagrams, and tables and the construction of scoring matrices is considered.
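
    The first comparison strategy, relating every segment of a given length in one sequence to every segment of the other, can be sketched as follows. This is an assumption-laden illustration, not the RELATE program: it uses a hypothetical identity score (1 for a match, 0 otherwise) in place of a real amino acid scoring matrix, and segments are compared without gaps.

```python
from typing import Callable

Score = Callable[[str, str], int]


def identity_score(x: str, y: str) -> int:
    """Placeholder scoring matrix: 1 for identical residues, 0 otherwise."""
    return 1 if x == y else 0


def segment_scores(s: str, t: str, k: int, score: Score = identity_score) -> list[int]:
    """Score every ungapped length-k segment of s against every length-k segment of t."""
    return [
        sum(score(x, y) for x, y in zip(s[i:i + k], t[j:j + k]))
        for i in range(len(s) - k + 1)
        for j in range(len(t) - k + 1)
    ]


# The shared segment KVLA gives the top score for k = 4.
print(max(segment_scores("MKVLAAGIV", "GGKVLAAQ", 4)))  # 4
```

    A statistical judgment of homology would then compare these scores with those obtained from randomized sequences, for example shuffled copies; that step is omitted here.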

  12. Representations of mechanical assembly sequences

    NASA Technical Reports Server (NTRS)

    Homem De Mello, Luiz S.; Sanderson, Arthur C.

    1991-01-01

    Five types of representations for assembly sequences are reviewed: the directed graph of feasible assembly sequences, the AND/OR graph of feasible assembly sequences, the set of establishment conditions, and two types of sets of precedence relationships (precedence relationships between the establishment of one connection between parts and the establishment of another connection, and precedence relationships between the establishment of one connection and states of the assembly process). The mappings of one representation into the others are established, as are the correctness and completeness of these representations. The results presented are needed in the proof of correctness and completeness of algorithms for the generation of mechanical assembly sequences.

  13. Sequence independent amplification of DNA

    DOEpatents

    Bohlander, Stefan K.

    1998-01-01

    The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example, the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei.

  14. Complementary DNA sequencing: Expressed sequence tags and human genome project

    SciTech Connect

    Adams, M.D.; Kelley, J.M.; Gocayne, J.D.; Dubnick, M.; Wu, A.; Olde, B.; Moreno, R.F.; Kerlavage, A.R.; McCombie, W.R.; Venter, J.C.; Polymeropoulos, M.H.; Hong Xiao; Merril, C.R.

    1991-06-21

    Automated partial DNA sequencing was conducted on more than 600 randomly selected human brain complementary DNA (cDNA) clones to generate expressed sequence tags (ESTs). ESTs have applications in the discovery of new human genes, mapping of the human genome, and identification of coding regions in genomic sequences. Of the sequences generated, 337 represent new genes, including 48 with significant similarity to genes from other organisms, such as a yeast RNA polymerase II subunit; Drosophila kinesin, Notch, and Enhancer of split; and a murine tyrosine kinase receptor. Forty-six ESTs were mapped to chromosomes after amplification by the polymerase chain reaction. This fast approach to cDNA characterization will facilitate the tagging of most human genes in a few years at a fraction of the cost of complete genomic sequencing, provide new genetic markers, and serve as a resource in diverse biological research fields.

  15. Automated Sequence Preprocessing in a Large-Scale Sequencing Environment

    PubMed Central

    Wendl, Michael C.; Dear, Simon; Hodgson, Dave; Hillier, LaDeana

    1998-01-01

    A software system for transforming fragments from four-color fluorescence-based gel electrophoresis experiments into assembled sequence is described. It has been developed for large-scale processing of all trace data, including shotgun and finishing reads, regardless of clone origin. Design considerations are discussed in detail, as are programming implementation and graphic tools. The importance of input validation, record tracking, and use of base quality values is emphasized. Several quality analysis metrics are proposed and applied to sample results from recently sequenced clones. Such quantities prove to be a valuable aid in evaluating modifications of sequencing protocol. The system is in full production use at both the Genome Sequencing Center and the Sanger Centre, whose combined production is ∼100,000 sequencing reads per week. PMID:9750196

  16. AMPLIFICATION OF RIBOSOMAL RNA SEQUENCES

    EPA Science Inventory

    This book chapter offers an overview of the use of ribosomal RNA sequences. A history of the technology traces the evolution of techniques to measure bacterial phylogenetic relationships and recent advances in obtaining rRNA sequence information. The manual also describes procedu...

  17. Mass genotyping by sequencing technology

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Large scale genotyping of a moderate number of loci is cost prohibitive with current chip-based technologies. We demonstrate the ability to use next generation sequencing technologies to genotype many DNA samples for a moderate number of loci – a mass genotyping by sequencing technology (MGST). Ou...

  18. Sequence in the Social Studies

    ERIC Educational Resources Information Center

    Ediger, Marlow

    2010-01-01

    Quality sequence in the social studies is of utmost importance. Sequence emphasizes "when" selected concepts should be stressed in ongoing lessons and units of study. The social studies teacher needs to observe pupils carefully in teaching and learning situations to ascertain suitable, ordered experiences for pupils. Pupils face frustration if the…

  19. Diesel Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a diesel mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  20. Assembly of shotgun sequencing data

    SciTech Connect

    Huang, Xiaoqiu

    1996-12-31

    We present a simple algorithm for construction of the DNA sequence from a set of fragments generated in a shotgun sequencing project. The algorithm is based on rigorous detection of overlaps among fragments. We report assembly results of the algorithm on two genomic data sets. 14 refs., 1 fig.
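
    A naive version of overlap-based assembly can be sketched as a greedy merge of exact suffix-prefix overlaps. This is not Huang's algorithm: it assumes error-free reads from a single strand, whereas real shotgun data require error-tolerant overlap detection and handling of reverse complements. The function names and the `min_len` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), max(min_len, 1) - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads contained inside another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != s and r in s for s in contigs)]
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # no overlaps left: remaining contigs cannot be joined
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]


print(greedy_assemble(["ACGTTG", "TTGCAAG", "CAAGTC"]))  # ['ACGTTGCAAGTC']
```

    Greedy merging is easy to follow but can misassemble repeats; the abstract's emphasis on rigorous overlap detection addresses exactly that weakness.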

  1. Health Occupations: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 3-year program in health occupations. The guide consists of a course description; general course…

  2. Urban Horticulture: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 4-year program in urban horticulture. The guide consists of a course description; general course…

  3. DNA Sequencing by Capillary Electrophoresis

    PubMed Central

    Karger, Barry L.; Guttman, Andras

    2009-01-01

    Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA sequencing methods have evolved from labor-intensive slab gel electrophoresis, through automated multicapillary electrophoresis systems using fluorophore labeling with multispectral imaging, to the “next generation” technologies of cyclic array, hybridization-based, nanopore and single-molecule sequencing. Deciphering the genetic blueprint and follow-up confirmatory sequencing of Homo sapiens and other genomes was only possible with the advent of modern sequencing technologies, which resulted from step-by-step advances with contributions from academics, medical personnel and instrument companies. While next generation sequencing is moving ahead at break-neck speed, the multicapillary electrophoretic systems played an essential role in the sequencing of the Human Genome, the foundation of the field of genomics. In this perspective, we give an overview of the role of capillary electrophoresis in DNA sequencing, based in part on several of our articles in this journal. PMID:19517496

  4. Aircraft Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for an aircraft mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and…

  5. VOE Accounting: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 2-year program in accounting. The guide consists of a course description; general course objectives;…

  6. Auto Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for an auto mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  7. Recently published protein sequences. I.

    NASA Technical Reports Server (NTRS)

    Jukes, T. H.; Holmquist, R.

    1972-01-01

    Some polypeptide sequences that have been published in the 1972 scientific literature are listed. Only selected sequences are included. The compilation has two objectives: to assemble current information in the periods between publication of more comprehensive compilations, and to encourage the use of data that do not include arrangements of unsequenced peptides for 'maximum homology'.

  8. Sequence Learning and Selection Difficulty

    ERIC Educational Resources Information Center

    Rowland, Lee A.; Shanks, David R.

    2006-01-01

    The authors studied the role of attention as a selection mechanism in implicit learning by examining the effect on primary sequence learning of performing a demanding target-selection task. Participants were trained on probabilistic sequences in a novel version of the serial reaction time (SRT) task, with dual- and triple-stimulus participants…

  9. Commercial Photography: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a commercial photography vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents,…

  10. Rapid Diagnostics of Onboard Sequences

    NASA Technical Reports Server (NTRS)

    Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

    2012-01-01

    Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators' diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the RESTful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command

  11. Decoding the human genome sequence.

    PubMed

    Bentley, D R

    2000-10-01

    The year 2000 is marked by the production of the sequence of the human genome. A 'working draft' of high quality sequence covering 90% of the genome has been determined and a quarter is in finished form, including the first two completed chromosomes. All sequence data from the project is made freely available to the community via the Internet, for further analysis and exploitation. The challenge which lies ahead is to decipher the information. Knowledge of the human genome sequence will enable us to understand how the genetic information determines the development, structure and function of the human body. We will be able to explore how variations within our DNA sequence cause disease, how they affect our interaction with our environment and ultimately to develop new and effective ways to improve human health. PMID:11005789

  12. Poisson process approximation for sequence repeats, and sequencing by hybridization.

    PubMed

    Arratia, R; Martin, D; Reinert, G; Waterman, M S

    1996-01-01

    Sequencing by hybridization is a tool to determine a DNA sequence from the unordered list of all l-tuples contained in this sequence; typical numbers for l are l = 8, 10, 12. For theoretical purposes we assume that the multiset of all l-tuples is known. This multiset determines the DNA sequence uniquely if none of the so-called Ukkonen transformations are possible. These transformations require repeats of (l-1)-tuples in the sequence, with these repeats occurring in certain spatial patterns. We model DNA as an i.i.d. sequence. We first prove Poisson process approximations for the process of indicators of all leftmost long repeats allowing self-overlap and for the process of indicators of all leftmost long repeats without self-overlap. Using the Chen-Stein method, we get bounds on the error of these approximations. As a corollary, we approximate the distribution of longest repeats. In the second step we analyze the spatial patterns of the repeats. Finally we combine these two steps to prove an approximation for the probability that a random sequence is uniquely recoverable from its list of l-tuples. For all our results we give some numerical examples including error bounds. PMID:8891959
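
    The non-uniqueness the abstract describes can be reproduced on a toy example. A minimal sketch, assuming l = 3: the 2-tuple AC occurs three times in `s1`, and swapping the two segments that lie between its occurrences gives a different sequence `s2` with the identical multiset of 3-tuples, so the l-tuple list alone cannot distinguish them. The sequences are invented for illustration, and this shows only one kind of rearrangement.

```python
from collections import Counter


def spectrum(seq: str, l: int) -> Counter:
    """Multiset of all l-tuples (overlapping substrings of length l) in seq."""
    return Counter(seq[i:i + l] for i in range(len(seq) - l + 1))


# 'AC' is an (l-1)-tuple for l = 3 and occurs three times in s1; the segments
# 'G' and 'TT' between its occurrences are swapped to build s2.
s1 = "T" + "AC" + "G" + "AC" + "TT" + "AC" + "A"
s2 = "T" + "AC" + "TT" + "AC" + "G" + "AC" + "A"

print(s1, s2, s1 == s2)                   # TACGACTTACA TACTTACGACA False
print(spectrum(s1, 3) == spectrum(s2, 3))  # True: same 3-tuple multiset
```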

  13. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    PubMed Central

    Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Pontén, Thomas; Ussery, David W.; Aarestrup, Frank M.; Lund, Ole

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the “gold standard” of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST. PMID:22238442

  14. Multilocus sequence typing of total-genome-sequenced bacteria.

    PubMed

    Larsen, Mette V; Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Pontén, Thomas; Ussery, David W; Aarestrup, Frank M; Lund, Ole

    2012-04-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST. PMID:22238442
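
    The final step described above, turning the best-matching allele at each locus into a sequence type, is essentially a table lookup. A minimal sketch with an entirely hypothetical three-locus scheme; real schemes and their profile tables come from the MLST databases, and the BLAST-based allele ranking is not shown.

```python
from typing import Optional

# Hypothetical scheme: locus names, allele numbers and sequence types are invented.
LOCI = ("locA", "locB", "locC")
PROFILES = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 1, 4): 7,
}


def sequence_type(best_alleles: dict[str, int]) -> Optional[int]:
    """Return the sequence type for the allele combination, or None if it is not in the table.

    Raises KeyError if a locus of the scheme has no allele call.
    """
    profile = tuple(best_alleles[locus] for locus in LOCI)
    return PROFILES.get(profile)


print(sequence_type({"locA": 1, "locB": 2, "locC": 1}))  # 2
print(sequence_type({"locA": 9, "locB": 9, "locC": 9}))  # None
```

    Ordering the profile by the scheme's fixed locus list, rather than by the input dictionary, keeps the lookup independent of the order in which allele calls arrive.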

  15. Default processing of event sequences.

    PubMed

    Hymel, Alicia; Levin, Daniel T; Baker, Lewis J

    2016-02-01

    In a wide range of circumstances, it is important to perceive and represent the sequence of events. For example, sequence perception is necessary to learn statistical contingencies between events, and to generate predictions about events when segmenting actions. However, viewers' awareness of event sequence is rarely tested, and at least some means of encoding event sequence are likely to be resource-intensive. Therefore, previous research may have overestimated the degree to which viewers are aware of specific event sequences. In the experiments reported here, we tested viewers' ability to detect anomalies during visual event sequences. Participants viewed videos containing events that either did or did not contain an out-of-order action. Participants were unable to consistently detect the misordered events, and performance on the task decreased significantly to very low levels when participants performed a secondary task. In addition, participants almost never detected misorderings in an incidental version of the task, and performance increased when videos ended immediately after the misordering. We argue that these results demonstrate that viewers can effectively perceive the elements of events, but do not consistently test their expectations about the specific sequence of natural events unless bidden to do so by task-specific demands. PMID:26348070

  16. Image analysis for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Palaniappan, Kannappan; Huang, Thomas S.

    1991-07-01

    There is a great deal of interest in automating the process of DNA (deoxyribonucleic acid) sequencing to support the analysis of genomic DNA such as the Human and Mouse Genome projects. In one class of gel-based sequencing protocols, autoradiograph images are generated in the final step and usually require manual interpretation to reconstruct the DNA sequence represented by the image. The need to handle a large volume of sequence information necessitates automation of the manual autoradiograph reading step through image analysis in order to reduce the length of time required to obtain sequence data and reduce transcription errors. Various adaptive image enhancement, segmentation and alignment methods were applied to autoradiograph images. The methods are adaptive to the local characteristics of the image such as noise, background signal, or presence of edges. Once the two-dimensional data is converted to a set of aligned one-dimensional profiles, waveform analysis is used to determine the location of each band, which represents one nucleotide in the sequence. Different classification strategies, including a rule-based approach, are investigated to map the profile signals, augmented with the original two-dimensional image data as necessary, to textual DNA sequence information.
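
    As a toy illustration of the last two steps (locating bands in one-dimensional lane profiles and mapping them to bases), the Python sketch below finds local maxima above a fixed threshold in each of the four lanes and orders the bands by position. It is not the adaptive method described above; the fixed threshold, the simple peak rule, and the assumption that profile index 0 corresponds to the shortest fragment are all simplifications made for illustration.

```python
from typing import Dict, List

def find_bands(profile: List[float], threshold: float) -> List[int]:
    """Indices of local maxima at or above `threshold` in a 1D lane profile."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] >= threshold
            and profile[i] > profile[i - 1]
            and profile[i] >= profile[i + 1]]

def call_bases(lanes: Dict[str, List[float]], threshold: float) -> str:
    """Merge band positions from the A, C, G and T lanes into one read.
    Assumes index 0 is the shortest fragment, so ascending position
    gives the bases in order."""
    bands = [(pos, base)
             for base, profile in lanes.items()
             for pos in find_bands(profile, threshold)]
    return "".join(base for _, base in sorted(bands))

lanes = {b: [0.0] * 14 for b in "ACGT"}
lanes["A"][2] = lanes["C"][5] = lanes["G"][8] = lanes["T"][11] = 1.0
print(call_bases(lanes, threshold=0.5))  # ACGT
```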

  17. Phylogenetic Analysis of Poliovirus Sequences.

    PubMed

    Jorba, Jaume

    2016-01-01

    Comparative genomic sequencing is a major surveillance tool in the Polio Laboratory Network. Due to the rapid evolution of polioviruses (~1% per year), pathways of virus transmission can be reconstructed from the pathways of genomic evolution. Here, we describe three main phylogenetic methods: estimation of genetic distances, reconstruction of a maximum-likelihood (ML) tree, and estimation of substitution rates using Bayesian Markov chain Monte Carlo (MCMC). The data set used consists of complete capsid sequences from a survey of poliovirus sequences available in GenBank. PMID:26983737
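
    The first of these methods, estimating genetic distances, can be illustrated with the uncorrected p-distance and the Jukes-Cantor correction for multiple substitutions, d = -(3/4) ln(1 - 4p/3). The Python sketch below is a generic example of that textbook model, assumed here for illustration; the chapter itself may use other substitution models.

```python
import math

def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences;
    sites with a gap in either sequence are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        diffs += a != b
    return diffs / compared

def jukes_cantor(seq1: str, seq2: str) -> float:
    """Jukes-Cantor corrected distance (substitutions per site)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("distance undefined: sequences are saturated")
    return -0.75 * math.log(1 - 4 * p / 3)

print(round(jukes_cantor("AAAAAAAAAA", "AAAAAAAAAC"), 4))  # 0.1073
```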

  18. Study Design for Sequencing Studies.

    PubMed

    Honaas, Loren A; Altman, Naomi S; Krzywinski, Martin

    2016-01-01

    Once a biochemical method has been devised to sample RNA or DNA of interest, sequencing can be used to identify the sampled molecules with high fidelity and low bias. High-throughput sequencing has therefore become the primary data acquisition method for many genomics studies and is increasingly used to address molecular biology questions. By applying principles of statistical experimental design, sequencing experiments can be made more sensitive to the effects under study as well as more biologically sound, hence more replicable. PMID:27008009

  19. Pythagorean Triples from Harmonic Sequences.

    ERIC Educational Resources Information Center

    DiDomenico, Angelo S.; Tanner, Randy J.

    2001-01-01

    Shows how all primitive Pythagorean triples can be generated from harmonic sequences. Uses inductive and deductive reasoning to explore how Pythagorean triples are connected with another area of mathematics. (KHR)
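
    One classical instance of the connection (not necessarily the construction used in the article): adding consecutive terms 1/a and 1/(a+2) of the harmonic sequence 1, 1/3, 1/5, ... gives (2a+2)/(a^2+2a), and the unreduced numerator and denominator are the legs of a triple with hypotenuse a^2+2a+2. This identity alone only yields triples whose hypotenuse exceeds the odd leg by 2, so it illustrates the idea rather than generating all primitive triples.

```python
def triple_from_reciprocals(a: int):
    """Legs and hypotenuse obtained from 1/a + 1/(a + 2)."""
    num = a + (a + 2)      # numerator of the sum: 2a + 2
    den = a * (a + 2)      # denominator of the sum: a^2 + 2a
    hyp = den + 2          # a^2 + 2a + 2
    # (2a+2)^2 + (a^2+2a)^2 == (a^2+2a+2)^2 holds for every a
    assert num ** 2 + den ** 2 == hyp ** 2
    return num, den, hyp

for a in (1, 3, 5):
    print(triple_from_reciprocals(a))  # (4, 3, 5), (8, 15, 17), (12, 35, 37)
```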

  20. The Dynamics of DNA Sequencing.

    ERIC Educational Resources Information Center

    Morvillo, Nancy

    1997-01-01

    Describes a paper-and-pencil activity that helps students understand DNA sequencing and expands student understanding of DNA structure, replication, and gel electrophoresis. Appropriate for advanced biology students who are familiar with the Sanger method. (DDR)
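
    The logic behind reading a Sanger gel can also be checked in a few lines of Python. This sketch is a hypothetical companion to the activity, not part of it: each of four reactions terminates the growing complementary strand wherever its dideoxynucleotide is incorporated, and sorting all fragments by length spells out the new strand. For simplicity the template is written 3' to 5' so the new strand grows left to right.

```python
from typing import List

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_fragments(template: str, ddntp: str) -> List[int]:
    """Fragment lengths from one dideoxy reaction: the synthesized strand
    stops wherever base `ddntp` is incorporated."""
    synthesized = "".join(COMPLEMENT[b] for b in template)
    return [i + 1 for i, b in enumerate(synthesized) if b == ddntp]

def read_gel(template: str) -> str:
    """Order all bands from the four lanes by length (shortest first)."""
    bands = [(length, base)
             for base in "ACGT"
             for length in sanger_fragments(template, base)]
    return "".join(base for _, base in sorted(bands))

print(read_gel("TACG"))  # ATGC, the strand complementary to the template
```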

  1. Mining protein sequences for motifs.

    PubMed

    Narasimhan, Giri; Bu, Changsong; Gao, Yuan; Wang, Xuning; Xu, Ning; Mathee, Kalai

    2002-01-01

    We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs in protein sequences. The algorithm assumes that a motif is constituted by the presence of a "good" combination of residues in appropriate locations of the motif. The algorithm attempts to compile such good combinations into a "pattern dictionary" by processing an aligned training set of protein sequences. The dictionary is subsequently used to detect motifs in new protein sequences. Statistical significance of the detection results is ensured by statistically determining the various parameters of the algorithm. Based on this approach, we have implemented a program called GYM. The Helix-Turn-Helix motif was used as a model system on which to test our program. The program was also extended to detect Homeodomain motifs. The detection results for the two motifs compare favorably with existing programs. In addition, the GYM program provides considerable additional information about a given protein sequence. PMID:12487759
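
    GYM learns combinations of residues; the simpler Python sketch below instead scores each motif position independently (a position-specific log-odds profile built from an aligned training set), then slides it along a new sequence. It is offered only as a hedged illustration of training on aligned motifs and scanning new sequences, not as the GYM algorithm; the pseudocount and uniform background are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(aligned: List[str], pseudo: float = 1.0) -> List[Dict[str, float]]:
    """Per-position log-odds scores (bits) against a uniform background."""
    background = 1 / len(AMINO_ACIDS)
    profile = []
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        total = len(aligned) + pseudo * len(AMINO_ACIDS)
        profile.append({aa: math.log2((counts[aa] + pseudo) / total / background)
                        for aa in AMINO_ACIDS})
    return profile

def best_window(profile: List[Dict[str, float]], sequence: str) -> Tuple[int, float]:
    """Start index and score of the highest-scoring window in `sequence`."""
    width = len(profile)
    best = (-1, float("-inf"))
    for start in range(len(sequence) - width + 1):
        # Residues outside the 20-letter alphabet contribute 0 (neutral).
        score = sum(profile[i].get(sequence[start + i], 0.0) for i in range(width))
        if score > best[1]:
            best = (start, score)
    return best

profile = build_profile(["HTH", "HTH", "HSH"])
print(best_window(profile, "AAAHTHAAA")[0])  # 3
```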

  2. Paucity of moderately repetitive sequences

    SciTech Connect

    Schmid, C.W.

    1991-01-01

    We examined clones of renatured repetitive human DNA to find novel repetitive DNAs. After eliminating known repeats, the remaining clones were subjected to sequence analysis. These clones also corresponded to known repeats, but with greater sequence diversity. This indicates that either these libraries were depleted of short interspersed repeats in construction, or these repeats are much less prevalent in the human genome than is indicated by data from Xenopus or sea urchin studies. We directly investigated the sequence composition of human DNA through traditional renaturation techniques with the goal of estimating the limits of abundance of repetitive sequence classes in human DNA. Our results sharply limit the maximum possible abundance to 1-2% of the human genome. Our estimate, minus the known repeats in this fraction, leaves about 1% (3 × 10^7 nucleotides) of the human genome for novel repetitive elements. 2 refs. (MHB)

  3. Expressed sequence tags: an overview.

    PubMed

    Parkinson, John; Blaxter, Mark

    2009-01-01

    Expressed sequence tags (ESTs) are fragments of mRNA sequences derived through single sequencing reactions performed on randomly selected clones from cDNA libraries. To date, over 45 million ESTs have been generated from over 1400 different species of eukaryotes. For the most part, EST projects are used to either complement existing genome projects or serve as low-cost alternatives for purposes of gene discovery. However, with improvements in accuracy and coverage, they are beginning to find application in fields such as phylogenetics, transcript profiling and proteomics. This volume provides practical details on the generation and analysis of ESTs. Chapters are presented which cover creation of cDNA libraries; generation and processing of sequence data; bioinformatics analysis of ESTs; and their application to phylogenetics and transcript profiling. PMID:19277571

  4. Guitars, Violins, and Geometric Sequences

    ERIC Educational Resources Information Center

    Barger, Rita; Haehl, Martha

    2007-01-01

    This article describes middle school mathematics activities that relate measurement, ratios, and geometric sequences to finger positions or the placement of frets on stringed musical instruments. (Contains 2 figures and 2 tables.)
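
    Assuming frets are placed for twelve-tone equal temperament (standard on guitars, though the article's own activities may frame it differently), the vibrating string length beyond fret n is L·r^n with r = 2^(-1/12), a geometric sequence, and the 12th fret sits at half the scale length. A short Python sketch:

```python
from typing import List

def fret_positions(scale_length: float, frets: int = 12) -> List[float]:
    """Distance from the nut to each fret, assuming 12-tone equal temperament.
    The remaining string length after fret n is scale_length * r**n."""
    r = 2 ** (-1 / 12)  # common ratio of the geometric sequence
    return [scale_length * (1 - r ** n) for n in range(1, frets + 1)]

positions = fret_positions(650.0)        # a 650 mm scale length, for example
print(round(positions[11], 6))           # 325.0 (12th fret at the midpoint)
```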

  5. Rover Sequencing and Visualization Program

    NASA Technical Reports Server (NTRS)

    Cooper, Brian; Hartman, Frank; Maxwell, Scott; Yen, Jeng; Wright, John; Balacuit, Carlos

    2005-01-01

    The Rover Sequencing and Visualization Program (RSVP) is the software tool for use in the Mars Exploration Rover (MER) mission for planning rover operations and generating command sequences for accomplishing those operations. RSVP combines three-dimensional (3D) visualization for immersive exploration of the operations area, stereoscopic image display for high-resolution examination of the downlinked imagery, and a sophisticated command-sequence editing tool for analysis and completion of the sequences. RSVP is linked with actual flight-code modules for operations rehearsal to provide feedback on the expected behavior of the rover prior to committing to a particular sequence. Playback tools allow for review of both rehearsed rover behavior and downlinked results of actual rover operations. These can be displayed simultaneously for comparison of rehearsed and actual activities for verification. The primary inputs to RSVP are downlink data products from the Operations Storage Server (OSS) and activity plans generated by the science team. The activity plans are high-level goals for the next day's activities. The downlink data products include imagery, terrain models, and telemetered engineering data on rover activities and state. The Rover Sequence Editor (RoSE) component of RSVP performs activity expansion to command sequences, command creation and editing with setting of command parameters, and viewing and management of rover resources. The HyperDrive component of RSVP performs 2D and 3D visualization of the rover's environment, graphical and animated review of rover-predicted and telemetered state, and creation and editing of command sequences related to mobility and Instrument Deployment Device (IDD) operations. Additionally, RoSE and HyperDrive together evaluate command sequences for potential violations of flight and safety rules.
The products of RSVP include command sequences for uplink that are stored in the Distributed Object Manager (DOM) and predicted rover

  6. DNA Sequences at a Glance

    PubMed Central

    Pinho, Armando J.; Garcia, Sara P.; Pratas, Diogo; Ferreira, Paulo J. S. G.

    2013-01-01

    Data summarization and triage is one of the current top challenges in visual analytics. The goal is to let users visually inspect large data sets and examine or request data with particular characteristics. The need for summarization and visual analytics is also felt when dealing with digital representations of DNA sequences. Genomic data sets are growing rapidly, making their analysis increasingly difficult, and raising the need for new, scalable tools. For example, being able to look at very large DNA sequences while immediately identifying potentially interesting regions would provide the biologist with a flexible exploratory and analytical tool. In this paper we present a new concept, the “information profile”, which provides a quantitative measure of the local complexity of a DNA sequence, independently of the direction of processing. The computation of the information profiles is computationally tractable: we show that it can be done in time proportional to the length of the sequence. We also describe a tool to compute the information profiles of a given DNA sequence, and use the genome of the fission yeast Schizosaccharomyces pombe strain 972 h− and five human chromosome 22 sequences for illustration. We show that information profiles are useful for detecting large-scale genomic regularities by visual inspection. Several discovery strategies are possible, including the standalone analysis of single sequences, the comparative analysis of sequences from individuals from the same species, and the comparative analysis of sequences from different organisms. The comparison scale can be varied, allowing users to zoom in on specific details, or obtain a broad overview of a long segment. Software applications have been made available for non-commercial use at http://bioinformatics.ua.pt/software/dna-at-glance. PMID:24278218
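
    A rough Python sketch of a linear-time local-complexity profile in this spirit: an adaptive order-k finite-context model assigns each base a cost of -log2 p bits given the preceding k bases, the pass is repeated on the reverse complement, and the per-position minimum of the two directions is smoothed with a moving average. The model order, the estimator parameter alpha, the use of the minimum to remove direction dependence, and the window size are all assumptions for illustration; the paper's own models and parameters may differ.

```python
import math
from collections import defaultdict
from typing import List

def _one_pass(seq: str, k: int, alpha: float) -> List[float]:
    """Bits needed for each symbol under an adaptive order-k context model."""
    counts = defaultdict(lambda: defaultdict(int))
    bits = []
    for i, sym in enumerate(seq):
        ctx = seq[max(0, i - k):i]
        c = counts[ctx]
        p = (c[sym] + alpha) / (sum(c.values()) + 4 * alpha)  # 4-letter alphabet
        bits.append(-math.log2(p))
        c[sym] += 1  # update the model after coding the symbol
    return bits

def information_profile(seq: str, k: int = 4, alpha: float = 1.0,
                        window: int = 21) -> List[float]:
    comp = str.maketrans("ACGT", "TGCA")
    fwd = _one_pass(seq, k, alpha)
    rev = _one_pass(seq.translate(comp)[::-1], k, alpha)[::-1]
    raw = [min(a, b) for a, b in zip(fwd, rev)]
    half = window // 2
    smoothed = []
    for i in range(len(raw)):
        chunk = raw[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

    Repetitive stretches quickly fall toward zero bits per base, while novel material stays near two bits per base.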

  7. DNA sequences at a glance.

    PubMed

    Pinho, Armando J; Garcia, Sara P; Pratas, Diogo; Ferreira, Paulo J S G

    2013-01-01

    Data summarization and triage is one of the current top challenges in visual analytics. The goal is to let users visually inspect large data sets and examine or request data with particular characteristics. The need for summarization and visual analytics is also felt when dealing with digital representations of DNA sequences. Genomic data sets are growing rapidly, making their analysis increasingly difficult, and raising the need for new, scalable tools. For example, being able to look at very large DNA sequences while immediately identifying potentially interesting regions would provide the biologist with a flexible exploratory and analytical tool. In this paper we present a new concept, the "information profile", which provides a quantitative measure of the local complexity of a DNA sequence, independently of the direction of processing. The computation of the information profiles is computationally tractable: we show that it can be done in time proportional to the length of the sequence. We also describe a tool to compute the information profiles of a given DNA sequence, and use the genome of the fission yeast Schizosaccharomyces pombe strain 972 h− and five human chromosome 22 sequences for illustration. We show that information profiles are useful for detecting large-scale genomic regularities by visual inspection. Several discovery strategies are possible, including the standalone analysis of single sequences, the comparative analysis of sequences from individuals from the same species, and the comparative analysis of sequences from different organisms. The comparison scale can be varied, allowing users to zoom in on specific details, or obtain a broad overview of a long segment. Software applications have been made available for non-commercial use at http://bioinformatics.ua.pt/software/dna-at-glance. PMID:24278218

  8. Sequenced drive for rotary valves

    DOEpatents

    Mittell, Larry C.

    1981-01-01

    A sequenced drive for rotary valves which provides the benefits of applying rotary and linear motions to the movable sealing element of the valve. The sequenced drive provides a close approximation of linear motion while engaging or disengaging the movable element with the seat minimizing wear and damage due to scrubbing action. The rotary motion of the drive swings the movable element out of the flowpath thus eliminating obstruction to flow through the valve.

  9. Sequencing Centers Panel at SFAF

    SciTech Connect

    Schilkey, Faye; Ali, Johar; Grafham, Darren; Muzny, Donna; Fulton, Bob; Fitzgerald, Mike; Hostetler, Jessica; Daum, Chris

    2010-06-02

    From left to right: Faye Schilkey of NCGR, Johar Ali of OICR, Darren Grafham of Wellcome Trust Sanger Institute, Donna Muzny of the Baylor College of Medicine, Bob Fulton of Washington University, Mike Fitzgerald of the Broad Institute, Jessica Hostetler of the J. Craig Venter Institute and Chris Daum of the DOE Joint Genome Institute discuss sequencing technologies, applications and pipelines on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  10. Structural Complexity of DNA Sequence

    PubMed Central

    Liou, Cheng-Yuan; Cheng, Wei-Chen; Tsai, Huai-Ying

    2013-01-01

    In modern bioinformatics, finding an efficient way to locate sequence fragments with biological functions is an important issue. This paper presents a structural approach based on context-free grammars extracted from original DNA or protein sequences. This approach is radically different from statistical methods. Furthermore, this approach is compared with a topological entropy-based method to assess where the two sets of complexity results agree and where they differ. PMID:23662161
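
    For readers unfamiliar with the comparison method, one common definition of topological entropy for a finite DNA sequence (assumed here; the paper may use a variant) picks the largest word length n such that 4^n + n - 1 symbols fit in the sequence, counts the distinct n-mers in that prefix, and normalizes: H = log4(#distinct n-mers) / n, which ranges from 0 (a homopolymer) to 1 (every n-mer present). A minimal Python sketch:

```python
import math

def topological_entropy(seq: str) -> float:
    """log4(number of distinct n-mers) / n over the first 4**n + n - 1
    symbols, with n the largest word length for which that prefix fits."""
    n = 1
    while 4 ** (n + 1) + (n + 1) - 1 <= len(seq):
        n += 1
    if 4 ** n + n - 1 > len(seq):
        raise ValueError("sequence too short (need at least 4 symbols)")
    prefix = seq[: 4 ** n + n - 1]
    words = {prefix[i:i + n] for i in range(len(prefix) - n + 1)}
    return math.log(len(words), 4) / n

print(topological_entropy("A" * 100))  # 0.0
print(topological_entropy("ACGT"))     # 1.0
```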

  11. Poultry Genome Sequences: Progress and Outstanding Challenges

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The first build of the chicken genome sequence appeared in March 2004 – the first genome sequence of any animal agriculture species. That sequence was done primarily by whole genome shotgun Sanger sequencing, along with the use of an extensive BAC contig-based physical map to assemble the sequence ...

  12. Genomic sequencing in clinical trials

    PubMed Central

    2011-01-01

    Human genome sequencing is the process by which the exact order of nucleic acid base pairs in the 24 human chromosomes is determined. Since the completion of the Human Genome Project in 2003, genomic sequencing is rapidly becoming a major part of our translational research efforts to understand and improve human health and disease. This article reviews the current and future directions of clinical research with respect to genomic sequencing, a technology that is just beginning to find its way into clinical trials both nationally and worldwide. We highlight the currently available types of genomic sequencing platforms, outline the advantages and disadvantages of each, and compare first- and next-generation techniques with respect to capabilities, quality, and cost. We describe the current geographical distributions and types of disease conditions in which these technologies are used, and how next-generation sequencing is strategically being incorporated into new and existing studies. Lastly, recent major breakthroughs and the ongoing challenges of using genomic sequencing in clinical research are discussed. PMID:22206293

  13. Elimination sequence optimization for SPAR

    NASA Technical Reports Server (NTRS)

    Hogan, Harry A.

    1986-01-01

    SPAR is a large-scale computer program for finite element structural analysis. The program allows user specification of the order in which the joints of a structure are to be eliminated, since this order can have significant influence over solution performance, in terms of both storage requirements and computer time. An efficient elimination sequence can improve performance by over 50% for some problems. Obtaining such sequences, however, requires the expertise of an experienced user and can take hours of tedious effort to effect. Thus, an automatic elimination sequence optimizer would enhance productivity by reducing the analysts' problem definition time and by lowering computer costs. Two possible methods for automating the elimination sequence specifications were examined. Several algorithms based on the graph theory representations of sparse matrices were studied with mixed results. Significant improvement in the program performance was achieved, but sequencing by an experienced user still yields substantially better results. The initial results provide encouraging evidence that the potential benefits of such an automatic sequencer would be well worth the effort.
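
    A widely used graph-based ordering of this kind is the greedy minimum-degree heuristic: repeatedly eliminate the joint with the fewest remaining neighbours, and connect its neighbours to one another to model the fill-in that elimination creates. The Python sketch below illustrates that general heuristic; the report does not say it was one of the algorithms studied, and the symmetric adjacency-dictionary input is an assumption.

```python
from typing import Dict, List, Set

def minimum_degree_order(adj: Dict[int, Set[int]]) -> List[int]:
    """Greedy minimum-degree elimination order for a symmetric sparsity graph.
    Ties are broken by the smaller node label."""
    graph = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while graph:
        v = min(graph, key=lambda u: (len(graph[u]), u))
        nbrs = graph.pop(v)
        for a in nbrs:
            graph[a].discard(v)
            graph[a].update(nbrs - {a})  # fill-in: neighbours become adjacent
        order.append(v)
    return order

# Star-shaped structure: eliminating the hub first would couple every leaf.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(minimum_degree_order(star))  # [1, 2, 3, 0, 4]
```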

  14. Genome Sequence of Canine Herpesvirus

    PubMed Central

    Papageorgiou, Konstantinos V.; Suárez, Nicolás M.; Wilkie, Gavin S.; McDonald, Michael; Graham, Elizabeth M.; Davison, Andrew J.

    2016-01-01

    Canine herpesvirus is a widespread alphaherpesvirus that causes a fatal haemorrhagic disease of neonatal puppies. We have used high-throughput methods to determine the genome sequences of three viral strains (0194, V777 and V1154) isolated in the United Kingdom between 1985 and 2000. The sequences are very closely related to each other. The canine herpesvirus genome is estimated to be 125 kbp in size and consists of a unique long sequence (97.5 kbp) and a unique short sequence (7.7 kbp) that are each flanked by terminal and internal inverted repeats (38 bp and 10.0 kbp, respectively). The overall nucleotide composition is 31.6% G+C, which is the lowest among the completely sequenced alphaherpesviruses. The genome contains 76 open reading frames predicted to encode functional proteins, all of which have counterparts in other alphaherpesviruses. The availability of the sequences will facilitate future research on the diagnosis and treatment of canine herpesvirus-associated disease. PMID:27213534
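
    If "respectively" is read as pairing the 38 bp repeats with the unique long region and the 10.0 kbp repeats with the unique short region (our interpretation), the component sizes add up to the estimated genome size, with each repeat counted once at the terminus and once internally:

```latex
\underbrace{97.5}_{U_L} + \underbrace{7.7}_{U_S} + 2(0.038) + 2(10.0) = 125.276 \approx 125\ \text{kbp}
```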

  15. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    PubMed

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in the vertebrate immune response, and because the MHC is linked to a significant number of autoimmune and other diseases, it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC. PMID:23589541

  16. Sequence Factorization with Multiple References

    PubMed Central

    Wandelt, Sebastian; Leser, Ulf

    2015-01-01

    The success of high-throughput sequencing has led to an increasing number of projects that sequence large populations of a species. Storage and analysis of sequence data is a key challenge in these projects, because of the sheer size of the datasets. Compression is one simple technology to deal with this challenge. Referential factorization and compression schemes, which store only the differences between an input sequence and a reference sequence, have gained considerable interest in this field. Highly similar sequences, e.g., human genomes, can be compressed with a compression ratio of 1,000:1 and more, up to two orders of magnitude better than with standard compression techniques. Recently, it was shown that compression against multiple references from the same species can boost the compression ratio up to 4,000:1. However, a detailed analysis of using multiple references is lacking, e.g., for main memory consumption and optimality. In this paper, we describe one key technique for referential compression against multiple references: the factorization of sequences. Based on the notion of an optimal factorization, we propose optimization heuristics and identify parameter settings which greatly influence 1) the size of the factorization, 2) the time for factorization, and 3) the required amount of main memory. We evaluate a total of 30 setups with a varying number of references on data from three different species. Our results show a wide range of factorization sizes (optimal to an overhead of up to 300%), factorization speed (0.01 MB/s to more than 600 MB/s), and main memory usage (few dozen MB to dozens of GB). Based on our evaluation, we identify the best configurations for common use cases. Our evaluation shows that multi-reference factorization is much better than single-reference factorization. PMID:26422374
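
    The basic idea of referential factorization can be sketched as a greedy LZ77-style scheme: each factor copies the longest prefix of the remaining input found in any reference, followed by one literal character. The Python sketch below is a naive quadratic illustration of that greedy idea, not the authors' optimized heuristics; practical tools use index structures such as suffix arrays to find matches quickly.

```python
from typing import List, Tuple

Factor = Tuple[int, int, int, str]  # (reference index, start, length, literal)

def factorize(target: str, refs: List[str]) -> List[Factor]:
    """Greedy factorization of `target` against several references."""
    factors: List[Factor] = []
    i = 0
    while i < len(target):
        best = (0, 0, 0)  # (reference index, start, match length)
        for r, ref in enumerate(refs):
            for s in range(len(ref)):
                length = 0
                # Stop one short of the end so a literal character remains.
                while (i + length < len(target) - 1 and s + length < len(ref)
                       and ref[s + length] == target[i + length]):
                    length += 1
                if length > best[2]:
                    best = (r, s, length)
        r, s, length = best
        factors.append((r, s, length, target[i + length]))
        i += length + 1
    return factors

def decode(factors: List[Factor], refs: List[str]) -> str:
    return "".join(refs[r][s:s + n] + c for r, s, n, c in factors)

refs = ["ACGTACGTTT", "GGGGCCCC"]
f = factorize("ACGTACGGGGCC", refs)
print(f)  # [(0, 0, 7, 'G'), (1, 2, 3, 'C')]
```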

  17. A model of random sequences for de novo peptide sequencing

    SciTech Connect

    Jarman, Kenneth D.; Cannon, William R.; Jarman, Kristin H.; Heredia-Langner, Alejandro

    2003-04-15

    We present a model for the probability of random sequences appearing in product ion spectra obtained from tandem mass spectrometry experiments using collision-induced dissociation. We demonstrate the use of these probabilities for ranking candidate peptide sequences obtained using a de novo algorithm. Sequence candidates are obtained from a spectrum graph that is greatly reduced in size from those in previous graph-theoretical de novo approaches. Evidence of multiple instances of subsequences of each candidate, due to different fragment ion type series as well as isotopic peaks, is incorporated in a hierarchical scoring scheme. This approach is shown to be useful for confirming results from database search and as a first step towards a statistically rigorous de novo algorithm.
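
    For orientation, a basic (unreduced) spectrum graph connects two peaks when their mass difference matches a single amino acid residue. The Python sketch below builds such edges from monoisotopic residue masses; it assumes the peaks have already been converted to prefix-residue masses (for example, singly charged b-ion m/z minus a proton), and it does not reproduce the reduced graph or the probability model described in the abstract.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da). I and L are isobaric, so they share one entry.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def spectrum_graph(peaks: List[float], tol: float = 0.01) -> List[Tuple[int, int, str]]:
    """Edges (i, j, residue) between sorted peaks whose mass difference
    matches one residue mass within `tol` Da."""
    peaks = sorted(peaks)
    edges = []
    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            diff = peaks[j] - peaks[i]
            for residue, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges.append((i, j, residue))
    return edges

print(spectrum_graph([0.0, 87.03203, 186.10044, 333.16885]))
# [(0, 1, 'S'), (1, 2, 'V'), (2, 3, 'F')]
```

    A path through the graph spells a candidate sequence; isobaric pairs such as GA and Q (both 128.058 Da) show why multiple candidates must then be ranked.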

  18. Sequence change and phylogenetic signal in muscoid COII DNA sequences.

    PubMed

    Szalanski, Allen L; Owens, Carrie B

    2003-08-01

    The complete DNA sequence of the mtDNA cytochrome oxidase II gene from house fly, Musca domestica, face fly, Musca autumnalis, stable fly, Stomoxys calcitrans, horn fly, Haematobia irritans, and black garbage fly, Hydrotaea aenescens, is reported. The nucleotide sequence codes for a 229 amino acid peptide. The COII sequence is A + T rich (74.1%), with up to 12.3% nucleotide and 8.4% amino acid divergence among the five taxa. Of the 688 nucleotides encoding the gene, 135 nucleotide sites (19.6%) are variable, and 55 (8.0%) are phylogenetically informative. A phylogenetic analysis using three calliphorids as the outgroup taxa indicates that the two haematophagous species, horn fly and stable fly, form a sister group. PMID:14631656
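
    The variable and phylogenetically informative site counts above (135/688 = 19.6% and 55/688 = 8.0%) follow standard definitions: a site is variable if more than one state occurs, and parsimony-informative if at least two states each occur in at least two sequences. A minimal Python sketch on a toy alignment (the treatment of gaps and ambiguous bases as missing data is an assumption):

```python
from collections import Counter
from typing import List, Tuple

def site_counts(alignment: List[str]) -> Tuple[int, int]:
    """Return (variable sites, parsimony-informative sites)."""
    variable = informative = 0
    for column in zip(*alignment):
        states = Counter(b for b in column if b not in "-N?")  # skip missing data
        if len(states) > 1:
            variable += 1
            if sum(1 for n in states.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative

print(site_counts(["AAGTA", "AAGCA", "ACTTA", "ACTCG"]))  # (4, 3)
```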

  19. Whole Chloroplast Genome Sequencing in Fragaria Using Deep Sequencing: A Comparison of Three Methods

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Chloroplast sequences previously investigated in Fragaria revealed low amounts of variation. Deep sequencing technologies enable economical sequencing of complete chloroplast genomes. These sequences can potentially provide robust phylogenetic resolution, even at low taxonomic levels within plant gr...

  20. Predicting the molecular complexity of sequencing libraries.

    PubMed

    Daley, Timothy; Smith, Andrew D

    2013-04-01

    Predicting the molecular complexity of a genomic sequencing library is a critical but difficult problem in modern sequencing applications. Methods to determine how deeply to sequence to achieve complete coverage or to predict the benefits of additional sequencing are lacking. We introduce an empirical Bayesian method to accurately characterize the molecular complexity of a DNA sample for almost any sequencing application on the basis of limited preliminary sequencing. PMID:23435259
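
    A classical starting point for this kind of extrapolation is the Good-Toulmin estimator: if n_j molecules were each observed exactly j times, sequencing a further fraction t of the current effort is expected to reveal sum_j (-1)^(j+1) t^j n_j new distinct molecules. The Python sketch below implements only that textbook estimator, not the method of the paper; the alternating series becomes unstable for t > 1, which is one reason more elaborate approaches are needed for deep extrapolation.

```python
from collections import Counter
from typing import Iterable

def good_toulmin(reads: Iterable[str], t: float) -> float:
    """Expected number of NEW distinct molecules when sequencing effort is
    extended by a fraction t (reliable only for t <= 1)."""
    per_molecule = Counter(reads)            # molecule -> times observed
    n = Counter(per_molecule.values())       # j -> molecules seen exactly j times
    return sum((-1) ** (j + 1) * t ** j * n_j for j, n_j in n.items())

reads = ["a", "a", "b", "c", "c", "c", "d"]  # 4 distinct molecules so far
print(good_toulmin(reads, 1.0))              # 2.0: about 6 distinct after doubling
```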

  1. Engineered DNA sequence syntax inspector.

    PubMed

    Hsiau, Timothy Hwei-Chung; Anderson, J Christopher

    2014-02-21

    DNAs encoding polypeptides often contain design errors that cause experiments to prematurely fail. One class of design errors is incorrect or missing elements in the DNA, here termed syntax errors. We have identified three major causes of syntax errors: point mutations from sequencing or manual data entry, gene structure misannotation, and unintended open reading frames (ORFs). The Engineered DNA Sequence Syntax Inspector (EDSSI) is an online bioinformatics pipeline that checks for syntax errors through three steps. First, ORF prediction in input DNA sequences is done by GeneMark; next, homologous sequences are retrieved by BLAST, and finally, syntax errors in the protein sequence are predicted by using the SIFT algorithm. We show that the EDSSI is able to identify previously published examples of syntactical errors and also show that our indel addition to the SIFT program is 97% accurate on a test set of Escherichia coli proteins. The EDSSI is available at http://andersonlab.qb3.berkeley.edu/Software/EDSSI/ . PMID:24364864
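
    To make the third class of syntax errors concrete, the Python sketch below scans both strands in all three frames for ATG-to-stop open reading frames above a minimum length. It is a naive scanner for illustration only; EDSSI itself uses GeneMark for ORF prediction, and the minimum length of 30 codons is an arbitrary assumption.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna: str, min_codons: int = 30) -> List[Tuple[str, int, int]]:
    """ATG-to-stop ORFs as (strand, start, end); coordinates are 0-based and
    end-exclusive on the scanned strand (the reverse complement for '-')."""
    dna = dna.upper()
    rc = dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    orfs = []
    for strand, seq in (("+", dna), ("-", rc)):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:  # includes stop codon
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs

print(find_orfs("ATGAAATAA", min_codons=3))  # [('+', 0, 9)]
```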

  2. Statistical properties of DNA sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
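
    A minimal sketch of first-order detrended fluctuation analysis in Python with NumPy, assuming the purine/pyrimidine mapping (A, G to +1; C, T to -1) as the DNA-walk convention and non-overlapping boxes. The walk is cut into boxes of size n, a linear trend is removed in each box, and the scaling exponent alpha is the slope of log F(n) against log n; alpha near 0.5 indicates no long-range correlation, larger values indicate long-range correlation.

```python
from typing import List, Tuple
import numpy as np

def dna_walk(seq: str) -> np.ndarray:
    """Cumulative sum of +1 (purine A/G) and -1 (pyrimidine C/T) steps."""
    steps = np.array([1.0 if b in "AG" else -1.0 for b in seq.upper()])
    return np.cumsum(steps)

def dfa(seq: str, box_sizes: List[int]) -> Tuple[np.ndarray, float]:
    """Return the fluctuation F(n) for each box size and the fitted exponent."""
    y = dna_walk(seq)
    fluct = []
    for n in box_sizes:
        x = np.arange(n)
        resid = []
        for k in range(len(y) // n):
            seg = y[k * n:(k + 1) * n]
            coeffs = np.polyfit(x, seg, 1)  # local linear trend in this box
            resid.append(np.mean((seg - np.polyval(coeffs, x)) ** 2))
        fluct.append(np.sqrt(np.mean(resid)))
    alpha = np.polyfit(np.log(box_sizes), np.log(fluct), 1)[0]
    return np.array(fluct), float(alpha)
```

    Applied to an uncorrelated random sequence, this should return alpha close to 0.5.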

  3. Sequencing Needs for Viral Diagnostics

    SciTech Connect

    Gardner, S N; Lam, M; Mulakken, N J; Torres, C L; Smith, J R; Slezak, T

    2004-01-26

    We built a system to guide decisions regarding the amount of genomic sequencing required to develop diagnostic DNA signatures, which are short sequences that are sufficient to uniquely identify a viral species. We used our existing DNA diagnostic signature prediction pipeline, which selects regions of a target species genome that are conserved among strains of the target (for reliability, to prevent false negatives) and unique relative to other species (for specificity, to avoid false positives). We performed simulations, based on existing sequence data, to assess the number of genome sequences of a target species and of close phylogenetic relatives ("near neighbors") that are required to predict diagnostic signature regions that are conserved among strains of the target species and unique relative to other bacterial and viral species. For DNA viruses such as variola (smallpox), three target genomes provide sufficient guidance for selecting species-wide signatures. Three near neighbor genomes are critical for species specificity. In contrast, most RNA viruses require four target genomes and no near neighbor genomes, since lack of conservation among strains is more limiting than uniqueness. SARS and Ebola Zaire are exceptional, as additional target genomes currently do not improve predictions, but near neighbor sequences are urgently needed. Our results also indicate that double stranded DNA viruses are more conserved among strains than are RNA viruses, since in most cases there was at least one conserved signature candidate for the DNA viruses and zero conserved signature candidates for the RNA viruses.
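
    The two selection criteria (conserved among target strains, unique relative to other species) can be illustrated with exact k-mer sets: keep the k-mers present in every target genome and absent from every non-target genome. The Python sketch below is only that illustration; it ignores reverse complements, mismatches, and assay-design constraints, and is not the authors' pipeline.

```python
from typing import List, Set

def kmers(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def candidate_signatures(targets: List[str], others: List[str], k: int) -> Set[str]:
    """k-mers conserved in all target genomes and absent from all others."""
    conserved = set.intersection(*(kmers(t, k) for t in targets))
    background = set().union(*(kmers(o, k) for o in others))
    return conserved - background

print(sorted(candidate_signatures(["AACCGGTT", "TAACCGGA"], ["CCGGCCGG"], 4)))
# ['AACC', 'ACCG']
```

    Adding target genomes can only shrink the conserved set, and adding near-neighbor genomes can only shrink the unique set, which is the trade-off the simulations above quantify.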

  4. Detecting selection in immunoglobulin sequences.

    PubMed

    Uduman, Mohamed; Yaari, Gur; Hershberg, Uri; Stern, Jacob A; Shlomchik, Mark J; Kleinstein, Steven H

    2011-07-01

    The ability to detect selection by analyzing mutation patterns in experimentally derived immunoglobulin (Ig) sequences is a critical part of many studies. Such techniques are useful not only for understanding the response to pathogens, but also to determine the role of antigen-driven selection in autoimmunity, B cell cancers and the diversification of pre-immune repertoires in certain species. Despite its importance, quantifying selection in experimentally derived sequences is fraught with difficulties. The necessary parameters for statistical tests (such as the expected frequency of replacement mutations in the absence of selection) are non-trivial to calculate, and results are not easily interpretable when analyzing more than a handful of sequences. We have developed a web server that implements our previously proposed Focused binomial test for detecting selection. Several features are integrated into the web site in order to facilitate analysis, including V(D)J germline segment identification with IMGT alignment, batch submission of sequences and integration of additional test statistics proposed by other groups. We also implement a Z-score-based statistic that increases the power of detecting selection while maintaining specificity, and further allows for the combined analysis of sequences from different germlines. The tool is freely available at http://clip.med.yale.edu/selection. PMID:21665923
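
    As background for the statistics mentioned above, the Python sketch below shows a plain one-sided binomial test: given R replacement and S silent mutations and an expected replacement probability p in the absence of selection, it returns P(X >= R) for X ~ Binomial(R + S, p). It is not the authors' Focused test or Z-score statistic, and p, which the abstract notes is non-trivial to calculate, is simply supplied as a parameter.

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def replacement_excess_pvalue(r: int, s: int, p_expected: float) -> float:
    """One-sided p-value that replacement mutations exceed the expected share."""
    return binomial_upper_tail(r, r + s, p_expected)

print(replacement_excess_pvalue(3, 0, 0.5))  # 0.125
```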

  5. Statistical properties of DNA sequences

    NASA Astrophysics Data System (ADS)

    Peng, C.-K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-02-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range: indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the “non-stationarity” feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33 301 coding and 29 453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
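A minimal DFA sketch follows, assuming the purine (+1) / pyrimidine (-1) mapping commonly used for DNA walks and linear detrending in non-overlapping boxes. Function names are hypothetical. For an uncorrelated sequence the exponent alpha should come out close to 0.5, while long-range correlation shows up as alpha above 0.5.

```python
import math
import random


def dna_walk(seq: str) -> list:
    """Cumulative purine (+1) / pyrimidine (-1) walk; non-ACGT symbols are skipped."""
    walk, total = [], 0.0
    for base in seq.upper():
        if base in "AG":
            total += 1.0
        elif base in "CT":
            total -= 1.0
        else:
            continue
        walk.append(total)
    return walk


def dfa_fluctuation(walk: list, n: int) -> float:
    """F(n): RMS deviation of the walk from a least-squares line fitted in each box of size n."""
    boxes = len(walk) // n
    if n < 3 or boxes == 0:
        raise ValueError("need n >= 3 and at least one full box")
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sq_sum = 0.0
    for b in range(boxes):
        seg = walk[b * n:(b + 1) * n]
        y_mean = sum(seg) / n
        slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(seg)) / sxx
        # Residuals after removing the local linear trend (the "detrending" step).
        sq_sum += sum((y - y_mean - slope * (x - x_mean)) ** 2 for x, y in enumerate(seg))
    return math.sqrt(sq_sum / (boxes * n))


def dfa_alpha(seq: str, sizes: list) -> float:
    """Scaling exponent alpha: least-squares slope of log F(n) against log n."""
    walk = dna_walk(seq)
    xs = [math.log(n) for n in sizes]
    ys = [math.log(dfa_fluctuation(walk, n)) for n in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)


random.seed(0)
uncorrelated = "".join(random.choice("ACGT") for _ in range(20000))
print(round(dfa_alpha(uncorrelated, [8, 16, 32, 64, 128]), 2))  # should be close to 0.5
```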

  6. Comparison of novel decoy database designs for optimizing protein identification searches using ABRF sPRG2006 standard MS/MS data sets.

    PubMed

    Blanco, Luca; Mead, Jennifer A; Bessant, Conrad

    2009-04-01

    Decoy database searches are used to filter out false positive protein identifications derived from search engines, but there is no consensus about which decoy is "the best". We evaluate nine different decoy designs using public data sets from samples of known composition. Statistically significant performance differences were found, but no single decoy stood out among the best performers. Ultimately, we recommend peptide level reverse decoys searched independently from the target. PMID:19714810
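The recommended decoy design can be sketched as: digest the target proteins in silico, reverse each peptide, and estimate the false discovery rate from a separate decoy search. Keeping the C-terminal K/R in place and using the simple decoy/target ratio are common conventions assumed here, not details given in the abstract; the function names are hypothetical.

```python
import re


def tryptic_peptides(protein: str) -> list:
    """In-silico trypsin digest: cleave after K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def reverse_peptide(peptide: str, keep_c_term: bool = True) -> str:
    """Peptide-level reverse decoy; optionally keep the C-terminal K/R so decoys still look tryptic."""
    if keep_c_term and len(peptide) > 1:
        return peptide[-2::-1] + peptide[-1]
    return peptide[::-1]


def decoy_fdr(target_hits: int, decoy_hits: int) -> float:
    """Estimated FDR at a score threshold, from independent target and decoy searches."""
    return decoy_hits / target_hits if target_hits else 0.0


targets = tryptic_peptides("MKPAKRGLKPEPTIDEK")
decoys = [reverse_peptide(p) for p in targets]
print(targets)  # ['MKPAK', 'R', 'GLKPEPTIDEK']
print(decoys)   # ['APKMK', 'R', 'EDITPEPKLGK']
```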

  7. The Extrapolation of Elementary Sequences

    NASA Technical Reports Server (NTRS)

    Laird, Philip; Saul, Ronald

    1992-01-01

    We study sequence extrapolation as a stream-learning problem. Input examples are a stream of data elements of the same type (integers, strings, etc.), and the problem is to construct a hypothesis that both explains the observed sequence of examples and extrapolates the rest of the stream. A primary objective -- and one that distinguishes this work from previous extrapolation algorithms -- is that the same algorithm be able to extrapolate sequences over a variety of different types, including integers, strings, and trees. We define a generous family of constructive data types, and define as our learning bias a stream language called elementary stream descriptions. We then give an algorithm that extrapolates elementary descriptions over constructive datatypes and prove that it learns correctly. For freely-generated types, we prove a polynomial time bound on descriptions of bounded complexity. An especially interesting feature of this work is the ability to provide quantitative measures of confidence in competing hypotheses, using a Bayesian model of prediction.

  8. Numerical classification of coding sequences

    NASA Technical Reports Server (NTRS)

    Collins, D. W.; Liu, C. C.; Jukes, T. H.

    1992-01-01

    DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
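Both designations are straightforward to compute. The sketch below assumes an uppercase A/C/G/T reading frame starting at position 0 and lists codons in the alphabetical order of the example; the function names are illustrative.

```python
from collections import Counter
from itertools import product

# All 64 codons in the alphabetical order used in the abstract: AAA, AAC, AAG, ..., TTT.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def base_count(orf: str) -> str:
    """Base-count designation such as A76C158G121T74."""
    counts = Counter(orf.upper())
    return "".join(f"{b}{counts[b]}" for b in "ACGT")


def codon_table(orf: str) -> str:
    """Codon-table designation such as (AAA)0(AAC)1(AAG)9...(TTT)0, read in frame from position 0."""
    orf = orf.upper()
    # A trailing partial codon, if any, is ignored.
    counts = Counter(orf[i:i + 3] for i in range(0, len(orf) - 2, 3))
    return "".join(f"({c}){counts[c]}" for c in CODONS)


print(base_count("ATGGCCTAA"))          # A3C2G2T2
print(codon_table("ATGAAGTAA")[:18])    # (AAA)0(AAC)0(AAG)1
```

Since identical reading frames always produce identical strings, these designations can serve as cheap keys, for example to flag possibly redundant entries before a full sequence comparison.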

  9. Statistical analysis of nucleotide sequences.

    PubMed Central

    Stückle, E E; Emmrich, C; Grob, U; Nielsen, P J

    1990-01-01

    In order to scan nucleic acid databases for potentially relevant but as yet unknown signals, we have developed an improved statistical model for pattern analysis of nucleic acid sequences by modifying previous methods based on Markov chains. We demonstrate the importance of selecting the appropriate parameters in order for the method to function at all. The model allows the simultaneous analysis of several short sequences with unequal base frequencies and Markov order k not equal to 0 as is usually the case in databases. As a test of these modifications, we show that in E. coli sequences there is a bias against palindromic hexamers which correspond to known restriction enzyme recognition sites. PMID:2251125
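Two ingredients of this kind of analysis can be sketched: the expected count of a word under a first-order Markov model (one fixed choice of k, whereas the paper's model is more general) and a check for palindromic (self-reverse-complementary) hexamers such as restriction sites. The observed/expected ratio is a common summary of over- or under-representation; all names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def word_counts(seq: str, k: int) -> Counter:
    """Overlapping counts of all words of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_markov1(seq: str, word: str) -> float:
    """Expected count of word (length >= 2) under a first-order Markov model, ignoring end effects:
    E = N(w1w2) N(w2w3) ... N(w_{k-1}w_k) / (N(w2) ... N(w_{k-1}))."""
    di, mono = word_counts(seq, 2), word_counts(seq, 1)
    expected = float(di[word[:2]])
    for i in range(1, len(word) - 1):
        if mono[word[i]] == 0:
            return 0.0
        expected *= di[word[i:i + 2]] / mono[word[i]]
    return expected


def is_palindromic(word: str) -> bool:
    """True if word equals its reverse complement (e.g. the EcoRI site GAATTC)."""
    return word == word.translate(COMPLEMENT)[::-1]


seq = "GAATTCGAATTAGGCCATGAATTC"
observed = word_counts(seq, 6)["GAATTC"]
print(observed, expected_markov1(seq, "GAATTC"), is_palindromic("GAATTC"))
```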

  10. Sequence alignment with tandem duplication

    SciTech Connect

    Benson, G.

    1997-12-01

    Algorithm development for comparing and aligning biological sequences has, until recently, been based on the SI model of mutational events, which assumes that modification of sequences proceeds through any of the operations of substitution, insertion or deletion (the latter two collectively termed indels). While this model has worked fairly well, it has long been apparent that other mutational events occur. In this paper, we introduce a new model, the DSI model, which includes another common mutational event, tandem duplication. Tandem duplication produces tandem repeats, which are common in DNA, making up perhaps 10% of the human genome. They are responsible for some human diseases and may serve a multitude of functions in DNA regulation and evolution. Using the DSI model, we develop new exact and heuristic algorithms for comparing and aligning DNA sequences when they contain tandem repeats. 30 refs., 3 figs.
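The DSI alignment algorithms themselves are not described in the abstract, but the event they model is easy to illustrate. The sketch below only detects exact tandem repeats (a unit immediately followed by copies of itself), scanning greedily left to right; it is a hypothetical helper, not the paper's alignment method, and it ignores approximate repeats.

```python
def exact_tandem_repeats(seq: str, max_period: int, min_copies: int = 2) -> list:
    """Return (start, period, copies) for runs where a unit of length period repeats back to back."""
    found = []
    for period in range(1, max_period + 1):
        i = 0
        while i + 2 * period <= len(seq):
            unit, copies = seq[i:i + period], 1
            # Count how many further copies of the unit follow immediately.
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies:
                found.append((i, period, copies))
                i += copies * period  # skip past this run
            else:
                i += 1
    return found


print(exact_tandem_repeats("GCATATATGC", 5))  # [(2, 2, 3)]
```

Runs may be reported at several periods (a homopolymer AAAA appears with period 1 and period 2), so a real tool would also need to pick a primitive period for each run.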

  11. Sequence-invariant state machines

    NASA Astrophysics Data System (ADS)

    Whitaker, Sterling R.; Manjunath, Shamanna K.; Maki, Gary K.

    1991-08-01

    A synthesis method and an MOS VLSI architecture are presented to realize sequential circuits that have the ability to implement any state machine having N states and m inputs, regardless of the actual sequence specified in the flow table. The design method utilizes binary tree structured (BTS) logic to implement regular and dense circuits. The desired state sequence can be hardwired with power supply connections or can be dynamically reallocated if stored in a register. This allows programmable VLSI controllers to be designed with a compact size and performance approaching that of dedicated logic. Results of ICV implementations are reported and an example sequence-invariant state machine is contrasted with implementations based on traditional methods.

  12. Graph Partitioning and Sequencing Software

    Energy Science and Technology Software Center (ESTSC)

    1995-09-19

    Graph partitioning is a fundamental problem in many scientific contexts. CHACO2.0 is a software package designed to partition and sequence graphs. CHACO2.0 allows for recursive application of several methods for finding small edge separators in weighted graphs. These methods include inertial, spectral, Kernighan-Lin, and multilevel methods, in addition to several simpler strategies. Each of these approaches can be used to partition the graph into two, four, or eight pieces at each level of recursion. In addition, the Kernighan-Lin method can be used to improve partitions generated by any of the other algorithms. CHACO2.0 can also be used to address various graph sequencing problems, with applications to scientific computing, database design, gene sequencing, and other problems.
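Spectral bisection, one of the methods listed, can be sketched in a few lines: split the vertices at the median of the Laplacian's Fiedler vector, then recurse to get two, four, or eight parts. This assumes NumPy and a dense adjacency matrix; it is a textbook version, not CHACO's implementation, and it omits Kernighan-Lin refinement and multilevel coarsening.

```python
import numpy as np


def spectral_bisection(adj: np.ndarray) -> np.ndarray:
    """Boolean mask splitting nodes at the median of the Fiedler vector of L = D - A."""
    laplacian = np.diag(adj.sum(axis=1)) - adj
    _, vectors = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    fiedler = vectors[:, 1]
    return fiedler >= np.median(fiedler)


def recursive_partition(adj: np.ndarray, levels: int) -> np.ndarray:
    """Apply bisection recursively: 1 level -> 2 parts, 2 -> 4, 3 -> 8."""
    parts = np.zeros(adj.shape[0], dtype=int)

    def split(nodes: np.ndarray, level: int) -> None:
        if level == 0 or len(nodes) < 2:
            return
        side = spectral_bisection(adj[np.ix_(nodes, nodes)])
        parts[nodes[side]] += 2 ** (level - 1)
        split(nodes[side], level - 1)
        split(nodes[~side], level - 1)

    split(np.arange(adj.shape[0]), levels)
    return parts


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0
print(recursive_partition(adj, 1))  # the two triangles land in different parts
```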

  13. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
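Read-out in this scheme amounts to ordering detected incorporation events by time. The sketch assumes each event is a (time, dye) pair and that a one-to-one dye-to-nucleotide mapping is known; the dye names and times are hypothetical. Since the growing strand is complementary and antiparallel to the template, the template (written 5' to 3') is the reverse complement of the time-ordered calls.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def read_growing_strand(events: list, dye_to_base: dict) -> str:
    """Order detected incorporation events by time and map each dye to its nucleotide."""
    return "".join(dye_to_base[dye] for _, dye in sorted(events))


def template_sequence(growing: str) -> str:
    """The template is complementary and antiparallel: reverse-complement the growing strand."""
    return "".join(COMPLEMENT[b] for b in reversed(growing))


# Hypothetical dye labels and event times (seconds).
dyes = {"dye1": "A", "dye2": "C", "dye3": "G", "dye4": "T"}
events = [(0.31, "dye2"), (0.12, "dye1"), (0.54, "dye4")]
growing = read_growing_strand(events, dyes)
print(growing, template_sequence(growing))  # ACT AGT
```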

  14. A repetitive sequence assembler based on next-generation sequencing.

    PubMed

    Lian, S; Tu, Y; Wang, Y; Chen, X; Wang, L

    2016-01-01

    Repetitive sequences of variable length are common in almost all eukaryotic genomes; most of them are presumed to have important biomedical functions, and they can cause genomic instability. Next-generation sequencing (NGS) technologies make it possible to identify and capture these repetitive sequences directly from NGS data. In this study, we assessed how well leading assemblers, such as Velvet, SOAPdenovo, SGA, MSR-CA, Bambus2, ALLPATHS-LG, and ABySS, capture repeats, using three real NGS datasets. Our results indicated that most of them performed poorly in capturing repeats. We therefore proposed a repetitive sequence assembler, named NGSReper, for capturing repeats from NGS data. Simulated datasets were used to validate the feasibility of NGSReper; the results indicate that the completeness of repeat capture reaches up to 99%. Cross-validation was performed on three real NGS datasets, and extensive comparisons indicate that NGSReper performed best in terms of completeness and accuracy in capturing repeats. In conclusion, NGSReper is a suitable tool for capturing repeats directly from NGS data. PMID:27525861

  15. DNA Sequencing Using Capillary Electrophoresis

    SciTech Connect

    Dr. Barry Karger

    2011-05-09

    The overall goal of this program was to develop capillary electrophoresis as the tool used to sequence the human genome for the first time. Our program was part of the Human Genome Project. In this work, we were highly successful: the replaceable polymer we developed, linear polyacrylamide, was used by the DOE sequencing lab in California to sequence a significant portion of the human genome using the MegaBase multiple capillary array electrophoresis instrument. In this final report, we summarize our efforts and successes. We began our work by using capillary electrophoresis to separate double-stranded oligonucleotides in cross-linked polyacrylamide gels in fused-silica capillaries. This work showed the potential of the methodology. However, such cross-linked gel capillaries were difficult to prepare, with poor reproducibility, and, even more important, the columns were not very stable. We improved stability by using non-cross-linked linear polyacrylamide. Here, the entangled linear chains could move when osmotic pressure (e.g., from sample injection) was imposed on the polymer matrix. This relaxation of the polymer dissipated the stress in the column. Our next advance was to use linear polyacrylamide at concentrations low enough that the polymer could be automatically blown out after each run and replaced with fresh linear polymer solution. In this way, a new column was available for each analytical run. Finally, after testing many linear polymers, we selected linear polyacrylamide as the best matrix because it was the most hydrophilic polymer available. Under our DOE program, we initially demonstrated the success of linear polyacrylamide in separating double-stranded DNA. We note that the method is still used today to assay the purity of double-stranded DNA fragments. Our focus, of course, was on the separation of single-stranded DNA for sequencing purposes. In one paper, we demonstrated the success of our approach in sequencing up to 500 bases. Other

  16. Regulation of next generation sequencing.

    PubMed

    Javitt, Gail H; Carner, Katherine Strong

    2014-01-01

    Next-generation sequencing (NGS) raises new questions within the context of an existing and still-evolving regulatory landscape for device manufacturers and clinical laboratories. FDA cleared the first NGS platform in November 2013, but it is unclear what lies ahead for this technology. NGS will require new types of training and expertise to interpret the vast quantities of genetic data so as to provide meaningful clinical information to physicians and patients. This paper will describe the current regulatory landscape for NGS technologies, identify the regulatory challenges they present, and consider whether new regulatory paradigms are needed to accommodate NGS technologies and services. PMID:25298288

  17. Endogenized viral sequences in mammals.

    PubMed

    Parrish, Nicholas F; Tomonaga, Keizo

    2016-06-01

    Reverse-transcribed RNA molecules compose a significant portion of the human genome. Many of these RNA molecules were retrovirus genomes either infecting germline cells or having done so in a previous generation but retaining transcriptional activity. This mechanism itself accounts for a quarter of the genomic sequence information of mammals for which there is data. We understand relatively little about the causes and consequences of retroviral endogenization. This review highlights functions ascribed to sequences of viral origin endogenized into mammalian genomes and suggests some of the most pressing questions raised by these observations. PMID:27128186

  18. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescence resonance energy transfer (FRET) between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.
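The steep distance dependence that lets FRET report only nearby labels is captured by the standard Förster relation E = 1 / (1 + (r / R0)^6), where r is the donor-acceptor distance and R0 the Förster radius. The patent text above does not give distances, so the R0 below is a hypothetical value chosen for illustration.

```python
def fret_efficiency(r: float, r0: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / r0)**6) for donor-acceptor distance r."""
    return 1.0 / (1.0 + (r / r0) ** 6)


R0_NM = 5.0  # hypothetical Förster radius in nanometres
for r in (2.5, 5.0, 10.0):
    print(r, round(fret_efficiency(r, R0_NM), 3))
```

With R0 = 5 nm, the efficiency is about 0.985 at 2.5 nm, exactly 0.5 at 5 nm, and about 0.015 at 10 nm, so a label only a few nanometres farther away contributes almost no transferred signal.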

  19. Detection of latent sequence periodicities.

    PubMed Central

    Pizzi, E; Liuni, S; Frontali, C

    1990-01-01

    A method is proposed for the automatic detection of serial periodicities in a linear sequence. Its application to DNA subtelomeric sequences from two lower eukaryotes, P.falciparum and S.cerevisiae, reveals ordered patterns organised in hierarchical periodicities, not easily recognizable by other methods. The possible implications concerning the evolution of tandemly repetitive arrays are discussed in light of a model which involves, as successive steps, random repeat modification, the fusion of differently modified repeat versions into longer units, and the amplification of (and/or homogenization to) the more recent repeat units. PMID:2197595
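A simple way to look for candidate periods is a self-match profile: for each lag p, the fraction of positions where the sequence agrees with itself shifted by p. This is a generic illustration, not the method proposed in the paper, and the function names are hypothetical. Exact multiples of a true period also score highly, which is one way nested (hierarchical) periodicities can show up.

```python
def match_profile(seq: str, max_lag: int) -> dict:
    """Fraction of positions i with seq[i] == seq[i + p], for each lag p."""
    profile = {}
    for p in range(1, max_lag + 1):
        pairs = len(seq) - p
        if pairs <= 0:
            break
        profile[p] = sum(seq[i] == seq[i + p] for i in range(pairs)) / pairs
    return profile


def candidate_periods(seq: str, max_lag: int, threshold: float) -> list:
    """Lags whose self-match fraction reaches the threshold."""
    return [p for p, f in match_profile(seq, max_lag).items() if f >= threshold]


print(candidate_periods("ACGTTAGC" * 5, 20, 0.9))  # [8, 16]
```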

  20. Data compression for sequencing data

    PubMed Central

    2013-01-01

    Post-Sanger sequencing methods produce tons of data, and there is general agreement that the challenge to store and process them must be addressed with data compression. In this review we first answer the question “why compression” in a quantitative manner. Then we also answer the questions “what” and “how”, by sketching the fundamental compression ideas, describing the main sequencing data types and formats, and comparing the specialized compression algorithms and tools. Finally, we go back to the question “why compression” and give other, perhaps surprising answers, demonstrating the pervasiveness of data compression techniques in computational biology. PMID:24252160
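One of the most basic compression ideas for sequence data is 2-bit packing of nucleotides, four bases per byte instead of one. The sketch assumes reads contain only A, C, G, and T (real formats also need to handle N, quality scores, and headers); it is a baseline, not one of the specialized tools the review compares, and the function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Store four bases per byte, two bits each (first base in the low bits)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack; length is needed because the last byte may be partly unused."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length))


read = "ACGTACGTTGCA"
packed = pack(read)
print(len(read), len(packed), unpack(packed, len(read)) == read)  # 12 3 True
```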

  1. Tracking simple and complex sequences.

    PubMed

    Large, Edward W; Fink, Philip; Kelso, J A Scott

    2002-02-01

    We address issues of synchronization to rhythms of musical complexity. In two experiments, synchronization to simple and more complex rhythmic sequences was investigated. Experiment 1 examined responses to phase and tempo perturbations within simple, structurally isochronous sequences, presented at different base rates. Experiment 2 investigated responses to similar perturbations embedded within more complex, metrically structured sequences; participants were explicitly instructed to synchronize at different metrical levels (i.e., tap at different rates to the same rhythmic patterns) on different trials. We found evidence that (1) the intrinsic tapping frequency adapts in response to temporal perturbations in both simple (isochronous) and complex (metrically structured) rhythms, (2) people can synchronize with unpredictable, metrically structured rhythms at different metrical levels, with qualitatively different patterns of synchronization seen at higher versus lower levels of metrical structure, and (3) synchronization at each tapping level reflects information from other metrical levels. The latter finding provides evidence for a dynamic and flexible internal representation of the sequence's metrical structure. PMID:11963276

  2. Why Visual Sequences Come First.

    ERIC Educational Resources Information Center

    Barley, Steven D.

    Visual sequences should be the first visual literacy exercises for reasons that are physio-psychological, semantic, and curricular. In infancy, vision is undifferentiated and undetailed. The number of details a child sees increases with age. Therefore, a series of pictures, rather than one photograph which tells a whole story, is more appropriate…

  3. RSAT: regulatory sequence analysis tools.

    PubMed

    Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques

    2008-07-01

    The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundred researchers from all over the world. Several predictions made with RSAT were validated experimentally and published. PMID:18495751
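Matrix-based scanning of the kind described above can be sketched with a position-specific log-odds matrix against a Bernoulli (independent-base) background. This is a generic illustration, not RSAT's implementation; the pseudocount, threshold, toy TATA sites, and function names are assumptions, and only one strand is scanned.

```python
import math

BASES = "ACGT"


def log_odds_matrix(sites: list, background: dict, pseudocount: float = 1.0) -> list:
    """Position-specific log2-odds scores from aligned sites versus a Bernoulli background."""
    matrix = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2(counts[b] / total / background[b]) for b in BASES})
    return matrix


def scan(seq: str, matrix: list, threshold: float) -> list:
    """Return (position, score) for every window on this strand scoring at least threshold."""
    width, hits = len(matrix), []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, round(score, 2)))
    return hits


uniform = {b: 0.25 for b in BASES}
pssm = log_odds_matrix(["TATA", "TATA", "TATA"], uniform)
print(scan("GGTATAGG", pssm, 4.0))  # [(2, 4.77)]
```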

  4. Crop Sequence Calculator, v. 3

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Producers need to know how to sequence crops to develop sustainable dynamic cropping systems that take advantage of inherent internal resources, such as crop synergism, nutrient cycling, and soil water, and capitalize on external resources, such as weather, markets, and government programs. Version ...

  5. Genome Sequence of Spizellomyces punctatus

    PubMed Central

    Russ, Carsten; Lang, B. Franz; Chen, Zehua; Gujja, Sharvari; Shea, Terrance; Zeng, Qiandong; Young, Sarah; Nusbaum, Chad

    2016-01-01

    Spizellomyces punctatus is a basally branching chytrid fungus that is found in the Chytridiomycota phylum. Spizellomyces species are common in soil and of importance in terrestrial ecosystems. Here, we report the genome sequence of S. punctatus, which will facilitate the study of this group of early diverging fungi. PMID:27540072

  6. [Multilocus sequence typing (MLST) analysis].

    PubMed

    Matsumura, Yasufumi

    2013-12-01

    Multilocus sequence typing (MLST) analysis has been emerging as a powerful tool for genotyping specific bacterial species. MLST utilizes internal fragments of multiple housekeeping genes, and the combination of alleles defines the sequence type for each isolate. MLST databases contain reference data and are freely accessible via internet websites. The standard method for investigating short-term hospital outbreaks is still pulsed-field gel electrophoresis, and MLST analysis is not a substitute. However, analysis of sequence types and clonal complexes (closely related sequence types) enables identification and understanding of a specific clone that is widely spreading among drug-resistant organisms, or a key clone that is important for evolution of the organism. In the case of Escherichia coli, the CTX-M-15 or CTX-M-14 extended-spectrum beta-lactamase-producing ST131 clone has emerged and spread globally in the last 10 years. MLST analysis is an unambiguous procedure and is becoming a common typing method to characterize isolates. PMID:24605545
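The typing logic is a lookup: the allele numbers at each housekeeping locus form a profile, and the profile defines the sequence type (ST). The loci below are those of the Achtman E. coli scheme, but the allele profiles and ST numbers are hypothetical; real profiles come from the public MLST databases mentioned above.

```python
from typing import Optional

# Loci of the Achtman E. coli MLST scheme; the allele profiles below are hypothetical.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

PROFILES = {
    (1, 2, 3, 4, 5, 6, 7): 1001,  # hypothetical ST
    (1, 2, 3, 4, 5, 6, 9): 1002,  # hypothetical single-locus variant of 1001
    (8, 8, 8, 8, 8, 8, 8): 1003,
}


def sequence_type(alleles: dict) -> Optional[int]:
    """Look up the ST defined by the combination of allele numbers at all loci (None if unknown)."""
    return PROFILES.get(tuple(alleles[locus] for locus in LOCI))


def differing_loci(profile_a: tuple, profile_b: tuple) -> int:
    """Number of loci at which two allele profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


isolate = dict(zip(LOCI, (1, 2, 3, 4, 5, 6, 9)))
print(sequence_type(isolate))  # 1002
```

Clonal complexes group closely related STs; one common convention links STs whose profiles differ at a single locus, i.e. `differing_loci(...) == 1`.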

  7. Recent Scope-and-Sequence Models.

    ERIC Educational Resources Information Center

    Beem, Ronald

    1990-01-01

    Presents scope-and-sequence models from the National Commission on Social Studies in the Schools, the Bradley Commission on History in Schools, and three from the National Council for the Social Studies (NCSS) Ad Hoc Committee on Scope and Sequence. Provides NCSS's criteria on scope and sequence and notes that sequence of the five models varies…

  8. Teaching Task Sequencing via Verbal Mediation.

    ERIC Educational Resources Information Center

    Rusch, Frank R.; And Others

    1987-01-01

    Verbal sequence training was used to teach a moderately mentally retarded woman to sequence job-related tasks. Learning to say the tasks in the proper sequence resulted in the employee performing her tasks in that sequence, and the employee was capable of mediating her own work behavior when scheduled changes occurred. (Author/JDD)

  9. Multiple Strand Sequencing Using the Elaboration Theory.

    ERIC Educational Resources Information Center

    Beissner, Katherine; Reigeluth, Charles M.

    This study examined the sequencing of instruction in a course in physical therapy. In the first phase, a procedural elaboration sequence was designed using the Simplifying Assumptions Method. In the second phase, a prescriptive-theoretical elaboration sequence independent of the procedural sequence was designed. A descriptive-theoretical…

  10. Gene Discovery through Expressed Sequence Tag Sequencing in Trypanosoma cruzi

    PubMed Central

    Verdun, Ramiro E.; Di Paolo, Nelson; Urmenyi, Turan P.; Rondinelli, Edson; Frasch, Alberto C. C.; Sanchez, Daniel O.

    1998-01-01

    Analysis of expressed sequence tags (ESTs) constitutes a useful approach for gene identification that, in the case of human pathogens, might result in the identification of new targets for chemotherapy and vaccine development. As part of the Trypanosoma cruzi genome project, we have partially sequenced the 5′ ends of 1,949 clones to generate ESTs. The clones were randomly selected from a normalized CL Brener epimastigote cDNA library. A total of 14.6% of the clones were homologous to previously identified T. cruzi genes, while 18.4% had significant matches to genes from other organisms in the database. A total of 67% of the ESTs had no matches in the database, and thus, some of them might be T. cruzi-specific genes. Functional groups of those sequences with matches in the database were constructed according to their putative biological functions. The two largest categories were protein synthesis (23.3%) and cell surface molecules (10.8%). The information reported in this paper should be useful for researchers in the field to analyze genes and proteins of their own interest. PMID:9784549

  11. Program for Editing Spacecraft Command Sequences

    NASA Technical Reports Server (NTRS)

    Gladden, Roy; Waggoner, Bruce; Kordon, Mark; Hashemi, Mahnaz; Hanks, David; Salcedo, Jose

    2006-01-01

    Sequence Translator, Editor, and Expander Resource (STEER) is a computer program that facilitates construction of sequences and blocks of sequences (hereafter denoted generally as sequence products) for commanding a spacecraft. STEER also provides mechanisms for translating among various sequence product types and quickly expanding activities of a given sequence in chronological order for review and analysis of the sequence. To date, construction of sequence products has generally been done by use of such clumsy mechanisms as text-editor programs, translating among sequence product types has been challenging, and expanding sequences to time-ordered lists has involved arduous processes of converting sequence products to "real" sequences and running them through Class-A software (defined, loosely, as flight and ground software critical to a spacecraft mission). Also, heretofore, generating sequence products in standard formats has been troublesome because precise formatting and syntax are required. STEER alleviates these issues by providing a graphical user interface containing intuitive fields in which the user can enter the necessary information. The STEER expansion function provides a "quick and dirty" means of seeing how a sequence and sequence block would expand into a chronological list, without the need to use Class-A software.
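The expansion step can be sketched as recursively inlining blocks and sorting the result by absolute time. The sequence format here (relative offsets in seconds, named blocks) and the command names are hypothetical stand-ins; STEER's actual sequence product formats are not described in the abstract. Recursive block definitions are not detected.

```python
from dataclasses import dataclass


@dataclass
class Step:
    offset: float  # seconds after the start of the enclosing sequence or block
    name: str      # a command, or the name of a block to expand in place


def expand(steps: list, blocks: dict, start: float = 0.0) -> list:
    """Expand blocks recursively and return (absolute_time, command) pairs in time order."""
    timeline = []
    for step in steps:
        t = start + step.offset
        if step.name in blocks:
            timeline.extend(expand(blocks[step.name], blocks, t))
        else:
            timeline.append((t, step.name))
    return sorted(timeline)


blocks = {"WARMUP": [Step(0, "HEATER_ON"), Step(30, "HEATER_OFF")]}
sequence = [Step(10, "WARMUP"), Step(20, "CAMERA_ON")]
for t, cmd in expand(sequence, blocks):
    print(t, cmd)  # 10.0 HEATER_ON, then 20.0 CAMERA_ON, then 40.0 HEATER_OFF
```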

  12. Visual mislocalization during saccade sequences.

    PubMed

    Zimmermann, Eckart; Morrone, Maria Concetta; Burr, David

    2015-02-01

    Visual objects briefly presented around the time of saccadic eye movements are perceived compressed towards the saccade target. Here, we investigated perisaccadic mislocalization with a double-step saccade paradigm, measuring localization of small probe dots briefly flashed at various times around the sequence of the two saccades. At onset of the first saccade, probe dots were mislocalized towards the first and, to a lesser extent, also towards the second saccade target. However, there was very little mislocalization at the onset of the second saccade. When we increased the presentation duration of the saccade targets prior to onset of the saccade sequence, perisaccadic mislocalization did occur at the onset of the second saccade. PMID:25370348

  13. Triple helix purification and sequencing

    DOEpatents

    Wang, R.; Smith, L.M.; Tong, X.E.

    1995-03-28

    Disclosed herein are methods, kits, and equipment for purifying single stranded circular DNA and then using the DNA for DNA sequencing purposes. Templates are provided with an insert having a hybridization region. An elongated oligonucleotide has two regions that are complementary to the insert and the oligo is bound to a magnetic anchor. The oligo hybridizes to the insert on two sides to form a stable triple helix complex. The anchor can then be used to drag the template out of solution using a magnet. The system can purify sequencing templates, and if desired the triple helix complex can be opened up to a double helix so that the oligonucleotide will act as a primer for further DNA synthesis. 4 figures.

  14. Extrapolation methods for vector sequences

    NASA Technical Reports Server (NTRS)

    Smith, David A.; Ford, William F.; Sidi, Avram

    1987-01-01

    This paper derives, describes, and compares five extrapolation methods for accelerating convergence of vector sequences or transforming divergent vector sequences to convergent ones. These methods are the scalar epsilon algorithm (SEA), vector epsilon algorithm (VEA), topological epsilon algorithm (TEA), minimal polynomial extrapolation (MPE), and reduced rank extrapolation (RRE). MPE and RRE are first derived and proven to give the exact solution for the right 'essential degree' k. Then, Brezinski's (1975) generalization of the Shanks-Schmidt transform is presented; the generalized form leads from systems of equations to TEA. The necessary connections are then made with SEA and VEA. The algorithms are extended to the nonlinear case by cycling, the error analysis for MPE and VEA is sketched, and the theoretical support for quadratic convergence is discussed. Strategies for practical implementation of the methods are considered.
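Of the five methods, the scalar epsilon algorithm (SEA) is the easiest to show in isolation. The sketch applies Wynn's recursion to the partial sums of the slowly converging alternating series for ln 2; the vector variants (VEA, TEA) and MPE/RRE are not shown, and the function name is hypothetical.

```python
import math


def wynn_epsilon(partial_sums: list) -> float:
    """Scalar epsilon algorithm:
    eps_{-1}^{(n)} = 0, eps_0^{(n)} = s_n,
    eps_{k+1}^{(n)} = eps_{k-1}^{(n+1)} + 1 / (eps_k^{(n+1)} - eps_k^{(n)}).
    Even columns hold the Shanks-transform estimates; return the last one computed."""
    prev = [0.0] * (len(partial_sums) + 1)   # column k = -1
    curr = [float(s) for s in partial_sums]  # column k = 0
    best, k = curr[-1], 0
    while len(curr) > 1:
        nxt = []
        for n in range(len(curr) - 1):
            diff = curr[n + 1] - curr[n]
            if diff == 0.0:
                return best  # table breaks down; keep the last even-column estimate
            nxt.append(prev[n + 1] + 1.0 / diff)
        prev, curr, k = curr, nxt, k + 1
        if k % 2 == 0:
            best = curr[-1]
    return best


# Partial sums of 1 - 1/2 + 1/3 - ..., which converges slowly to ln 2.
sums, total = [], 0.0
for i in range(1, 11):
    total += (-1) ** (i + 1) / i
    sums.append(total)
print(abs(sums[-1] - math.log(2)), abs(wynn_epsilon(sums) - math.log(2)))
```

With ten partial sums the raw error is about 0.05, while the error of the epsilon estimate is far smaller.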

  15. Sequence correlations shape protein promiscuity

    NASA Astrophysics Data System (ADS)

    Lukatsky, David B.; Afek, Ariel; Shakhnovich, Eugene I.

    2011-08-01

    We predict analytically that diagonal correlations of amino acid positions within protein sequences statistically enhance protein propensity for nonspecific binding. We use the term "promiscuity" to describe such nonspecific binding. Diagonal correlations represent statistically significant repeats of sequence patterns where amino acids of the same type are clustered together. The predicted effect is qualitatively robust with respect to the form of the microscopic interaction potentials and the average amino acid composition. Our analytical results provide an explanation for the enhanced diagonal correlations observed in hubs of eukaryotic organismal proteomes [J. Mol. Biol. 409, 439 (2011); doi:10.1016/j.jmb.2011.03.056]. We suggest experiments that will allow direct testing of the predicted effect.

  16. Triple helix purification and sequencing

    DOEpatents

    Wang, Renfeng; Smith, Lloyd M.; Tong, Xinchun E.

    1995-01-01

    Disclosed herein are methods, kits, and equipment for purifying single stranded circular DNA and then using the DNA for DNA sequencing purposes. Templates are provided with an insert having a hybridization region. An elongated oligonucleotide has two regions that are complementary to the insert and the oligo is bound to a magnetic anchor. The oligo hybridizes to the insert on two sides to form a stable triple helix complex. The anchor can then be used to drag the template out of solution using a magnet. The system can purify sequencing templates, and if desired the triple helix complex can be opened up to a double helix so that the oligonucleotide will act as a primer for further DNA synthesis.

  17. Cassini Mission Sequence Subsystem (MSS)

    NASA Technical Reports Server (NTRS)

    Alland, Robert

    2011-01-01

    This paper describes my work with the Cassini Mission Sequence Subsystem (MSS) team during the summer of 2011. It gives some background on the motivation for this project and describes the expected benefit to the Cassini program. It then introduces the two tasks that I worked on - an automatic system auditing tool and a series of corrections to the Cassini Sequence Generator (SEQ_GEN) - and the specific objectives these tasks were to accomplish. Next, it details the approach I took to meet these objectives and the results of this approach, followed by a discussion of how the outcome of the project compares with my initial expectations. The paper concludes with a summary of my experience working on this project, lists what the next steps are, and acknowledges the help of my Cassini colleagues.

  18. Apparatus for improved DNA sequencing

    DOEpatents

    Douthart, R.J.; Crowell, S.L.

    1996-05-07

    This invention is a means for the rapid sequencing of DNA samples. More specifically, it consists of a newly designed direct blotting electrophoresis unit. The DNA sequence is deposited on a membrane attached to a rotating drum. Initial data compaction is facilitated by the use of a machined multi-channeled plate called a ribbon channel plate. Each channel is an isolated mini gel system much like a gel-filled capillary. The system as a whole, however, is in a slab-gel-like format with the advantages of uniformity and easy reusability. The system can be used in different embodiments. The drum system is unique in that after deposition the drum rotates the deposited DNA into a large non-buffer open space where processing and detection can occur. The drum can also be removed in toto to special workstations for downstream processing, multiplexing and detection. 18 figs.

  19. Apparatus for improved DNA sequencing

    DOEpatents

    Douthart, Richard J.; Crowell, Shannon L.

    1996-01-01

    This invention is a means for the rapid sequencing of DNA samples. More specifically, it consists of a newly designed direct blotting electrophoresis unit. The DNA sequence is deposited on a membrane attached to a rotating drum. Initial data compaction is facilitated by the use of a machined multi-channeled plate called a ribbon channel plate. Each channel is an isolated mini gel system much like a gel-filled capillary. The system as a whole, however, is in a slab-gel-like format with the advantages of uniformity and easy reusability. The system can be used in different embodiments. The drum system is unique in that after deposition the drum rotates the deposited DNA into a large non-buffer open space where processing and detection can occur. The drum can also be removed in toto to special workstations for downstream processing, multiplexing and detection.

  20. Channel plate for DNA sequencing

    DOEpatents

    Douthart, R.J.; Crowell, S.L.

    1998-01-13

    This invention is a channel plate that facilitates data compaction in DNA sequencing. The channel plate has a length, a width and a thickness, and further has a plurality of channels that are parallel. Each channel has a depth partially through the thickness of the channel plate. Additionally, an interface edge permits electrical communication across an interface through a buffer to a deposition membrane surface. 15 figs.

  1. Dynamical model for DNA sequences

    NASA Astrophysics Data System (ADS)

    Allegrini, P.; Barbi, M.; Grigolini, P.; West, B. J.

    1995-11-01

    We address the problem of DNA sequences, developing a "dynamical" method based on the assumption that the statistical properties of DNA paths are determined by the joint action of two processes, one deterministic with long-range correlations, and the other random and δ-function correlated. The generator of the deterministic evolution is a nonlinear map, belonging to a class of maps recently tailored to mimic the processes of weak chaos that are responsible for the birth of anomalous diffusion. It is assumed that the deterministic process corresponds to unknown biological rules that determine the DNA path, whereas the noise mimics the influence of an infinite-dimensional environment on the biological process under study. We prove that the resulting diffusion process, if the effect of the random process is neglected, is an α-stable Lévy process with 1<α<2. We also show that, if the diffusion process is determined by the joint action of the deterministic and the random process, the correlation effects of the "deterministic dynamics" are cancelled on the short-range scale, but show up in the long-range one. We denote our prescription to generate statistical sequences as the copying mistake map (CMM). We carry out our analysis of several DNA sequences and their CMM realizations with a variety of techniques, and we especially focus on a method of regression to equilibrium, which we call the Onsager analysis. With these techniques we establish the statistical equivalence of the real DNA sequences with their CMM realizations. We show that long-range correlations are present in exons as well as in introns, but are difficult to detect, since the exon "dynamics" is shown to be determined by the entanglement of three distinct and independent CMM's.

  2. Insertion Sequence Diversity in Archaea

    PubMed Central

    Filée, J.; Siguier, P.; Chandler, M.

    2007-01-01

    Insertion sequences (ISs) can constitute an important component of prokaryotic (bacterial and archaeal) genomes. Over 1,500 individual ISs are included at present in the ISfinder database (www-is.biotoul.fr), and these represent only a small portion of those in the available prokaryotic genome sequences and those that are being discovered in ongoing sequencing projects. In spite of this diversity, the transposition mechanisms of only a few of these ubiquitous mobile genetic elements are known, and these are all restricted to those present in bacteria. This review presents an overview of ISs within the archaeal kingdom. We first provide a general historical summary of the known properties and behaviors of archaeal ISs. We then consider how transposition might be regulated in some cases by small antisense RNAs and by termination codon readthrough. This is followed by an extensive analysis of the IS content in the sequenced archaeal genomes present in the public databases as of June 2006, which provides an overview of their distribution among the major archaeal classes and species. We show that the diversity of archaeal ISs is very great and comparable to that of bacteria. We compare archaeal ISs to known bacterial ISs and find that most are clearly members of families first described for bacteria. Several cases of lateral gene transfer between bacteria and archaea are clearly documented, notably for methanogenic archaea. However, several archaeal ISs do not have bacterial equivalents but can be grouped into Archaea-specific groups or families. In addition to ISs, we identify and list nonautonomous IS-derived elements, such as miniature inverted-repeat transposable elements. Finally, we present a possible scenario for the evolutionary history of ISs in the Archaea. PMID:17347521

  3. Channel plate for DNA sequencing

    DOEpatents

    Douthart, Richard J.; Crowell, Shannon L.

    1998-01-01

    This invention is a channel plate that facilitates data compaction in DNA sequencing. The channel plate has a length, a width and a thickness, and further has a plurality of channels that are parallel. Each channel has a depth partially through the thickness of the channel plate. Additionally, an interface edge permits electrical communication across an interface through a buffer to a deposition membrane surface.

  4. Shaping Action Sequences in Basal Ganglia Circuits

    PubMed Central

    Jin, Xin; Costa, Rui M

    2015-01-01

    Many behaviors necessary for organism survival are learned anew and become organized as complex sequences of actions. Recent studies suggest that cortico-basal ganglia circuits are important for chunking isolated movements into precise and robust action sequences that permit the achievement of particular goals. During sequence learning many neurons in the basal ganglia develop sequence-related activity - related to the initiation, execution, and termination of sequences - suggesting that action sequences are processed as action units. Corticostriatal plasticity is critical for the crystallization of action sequences, and for the development of sequence-related neural activity. Furthermore, this sequence-related activity is differentially expressed in direct and indirect basal ganglia pathways. These findings have implications for understanding the symptoms associated with movement and psychiatric disorders. PMID:26189204

  5. RNA sequence analysis using covariance models.

    PubMed Central

    Eddy, S R; Durbin, R

    1994-01-01

    We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family. We call these models 'covariance models'. A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases. A model can be built automatically from an existing sequence alignment. We also describe an algorithm for learning a model and hence a consensus secondary structure from initially unaligned example sequences and no prior structural information. Models trained on unaligned tRNA examples correctly predict tRNA secondary structure and produce high-quality multiple alignments. The approach may be applied to any family of small RNA sequences. PMID:8029015

  6. Biological sequence classification with multivariate string kernels.

    PubMed

    Kuksa, Pavel P

    2013-01-01

    String kernel-based machine learning methods have yielded great success in practical tasks of structured/sequential data analysis. They often exhibit state-of-the-art performance on many practical tasks of sequence analysis such as biological sequence classification, remote homology detection, or protein superfamily and fold prediction. However, typical string kernel methods rely on the analysis of discrete 1D string data (e.g., DNA or amino acid sequences). In this paper, we address the multiclass biological sequence classification problems using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physicochemical descriptors) and a class of multivariate string kernels that exploit these representations. On three protein sequence classification tasks, the proposed multivariate representations and kernels show significant 15-20 percent improvements compared to existing state-of-the-art sequence classification methods. PMID:24384708
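    The abstract contrasts its multivariate kernels with typical string kernels computed on discrete 1D strings. For reference, the simplest such kernel is the k-spectrum kernel, which is the inner product of two sequences' k-mer count vectors. The sketch below implements that standard baseline only, not the multivariate kernel proposed in the paper. The default `k=3` and the function names are arbitrary choices.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of x and y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Counter returns 0 for k-mers absent from y.
    return sum(count * cy[kmer] for kmer, count in cx.items())


def normalized_spectrum_kernel(x, y, k=3):
    """Cosine-normalized kernel in [0, 1]; 0.0 if either string is shorter than k."""
    denom = math.sqrt(spectrum_kernel(x, x, k) * spectrum_kernel(y, y, k))
    return spectrum_kernel(x, y, k) / denom if denom else 0.0


print(normalized_spectrum_kernel("MKVLAAGIV", "MKVLSAGIV"))
```

    A kernel like this can be passed to any kernel classifier (for example, an SVM) as a precomputed Gram matrix. The multivariate approach instead compares sequences of real-valued feature vectors.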

  7. Biological Sequence Analysis with Multivariate String Kernels.

    PubMed

    Kuksa, Pavel P

    2013-03-01

    String kernel-based machine learning methods have yielded great success in practical tasks of structured/sequential data analysis. They often exhibit state-of-the-art performance on many practical tasks of sequence analysis such as biological sequence classification, remote homology detection, or protein superfamily and fold prediction. However, typical string kernel methods rely on analysis of discrete one-dimensional (1D) string data (e.g., DNA or amino acid sequences). In this work we address the multi-class biological sequence classification problems using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physico-chemical descriptors) and a class of multivariate string kernels that exploit these representations. On a number of protein sequence classification tasks, the proposed multivariate representations and kernels show significant 15-20% improvements compared to existing state-of-the-art sequence classification methods. PMID:23509193

  8. Fluorescence-detected DNA sequencing

    SciTech Connect

    Haugland, R.P.

    1990-01-01

    Our research effort funded by this grant primarily focused on development of suitable fluorescent dyes for DNA sequencing studies. Prior to our efforts, the dyes being used in commercial DNA sequencers were various versions of fluorescein dyes for the shorter wavelengths and of rhodamine dyes for the longer wavelengths. Our initial goal was to synthesize a set of four dyes that could all be excited by the 488 and 514 nm lines of the argon laser and that have emission spectra that minimize spectral overlap. The specific result sought was higher fluorescence intensity than was available using existing dyes, particularly for the longest-wavelength dyes. Another important property of the desired set of dyes was uniform ionic charge, in order to minimize interference with electrophoretic mobility during sequencing. During the period of this grant we prepared and characterized four types of dyes: fluorescent bifluorophores, derivatives of rhodamine dyes, derivatives of rhodol dyes, and derivatives of boron dipyrromethene difluoride (BODIPY™) dyes.

  9. Spectral clustering of protein sequences

    PubMed Central

    Paccanaro, Alberto; Casbon, James A.; Saqi, Mansoor A. S.

    2006-01-01

    An important problem in genomics is automatically clustering homologous proteins when only sequence information is available. Most methods for clustering proteins are local, and are based on simply thresholding a measure related to sequence distance. We first show how locality limits the performance of such methods by analysing the distribution of distances between protein sequences. We then present a global method based on spectral clustering and provide theoretical justification of why it should yield a remarkable improvement over local methods. We extensively tested our method and compared its performance with other methods on several subsets of the SCOP (Structural Classification of Proteins) database, a gold standard for protein structure classification. We consistently observed that the number of clusters that we obtain for a given set of proteins is close to the number of superfamilies in that set, that there are fewer singletons, and that the method correctly groups most remote homologs. In our experiments, the quality of the clusters as quantified by a measure that combines sensitivity and specificity was consistently better [on average, improvements were 84% over hierarchical clustering, 34% over Connected Component Analysis (CCA) (similar to GeneRAGE) and 72% over another global method, TribeMCL]. PMID:16547200
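    As a rough illustration of the global, spectral approach, the sketch below splits a set of sequences into two groups. It is not the authors' full procedure, which produces many clusters. The input is a symmetric, non-negative similarity matrix `W`. The sketch assumes that `W` has already been computed from some sequence similarity score and that every item has positive total similarity. It forms the normalized affinity matrix D^-1/2 W D^-1/2 and finds its second eigenvector by power iteration with deflation, so only the standard library is needed. The sign of that eigenvector then assigns each sequence to a group. The function name is hypothetical.

```python
import math


def spectral_bipartition(W, iterations=1000):
    """Split items into two groups from a symmetric, non-negative similarity
    matrix W (list of lists) using the second eigenvector of D^-1/2 W D^-1/2."""
    n = len(W)
    degree = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) for d in degree]
    # Normalized affinity matrix; its eigenvalues lie in [-1, 1].
    M = [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    # Its leading eigenvector (eigenvalue 1) is proportional to sqrt(degree).
    top = [math.sqrt(d) for d in degree]
    norm = math.sqrt(sum(t * t for t in top))
    top = [t / norm for t in top]
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        # Deflate: project out the known leading eigenvector.
        overlap = sum(a * b for a, b in zip(v, top))
        v = [a - overlap * b for a, b in zip(v, top)]
        # Multiply by (M + I), whose eigenvalues are all >= 0, so the iteration
        # converges to the eigenvector of the largest remaining eigenvalue of M.
        v = [sum(M[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    # D^-1/2 has positive entries, so the sign of v alone decides the split.
    return [1 if a > 0 else 0 for a in v]


# Two families of three sequences: high similarity within, low across.
W = [[1.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.05) for j in range(6)]
     for i in range(6)]
print(spectral_bipartition(W))
```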

  10. Memory and learning with rapid audiovisual sequences

    PubMed Central

    Keller, Arielle S.; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193

  11. Memory and learning with rapid audiovisual sequences.

    PubMed

    Keller, Arielle S; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193

  12. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level and are long enough to be statistically significant. The device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short-fragment best matches for the block provides an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.
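    The filtering step divides the query into overlapping blocks, each large enough to contain a minimum-length alignment. The patent text here gives no block size or overlap, so the sketch below uses hypothetical parameters. Let the block length be B and the minimum alignment length be L, with L ≤ B. If each block start advances by B - L + 1 positions, then every length-L window of the query lies entirely inside at least one block. The helper `overlapping_blocks` is illustrative and is not the patented procedure.

```python
def overlapping_blocks(n, block_size, min_len):
    """Split positions 0..n-1 into half-open (start, end) blocks such that
    every window of length min_len fits entirely inside some block."""
    if not 0 < min_len <= block_size:
        raise ValueError("need 0 < min_len <= block_size")
    step = block_size - min_len + 1  # consecutive blocks overlap by min_len - 1
    blocks, start = [], 0
    while True:
        end = min(start + block_size, n)
        blocks.append((start, end))
        if end == n:
            return blocks
        start += step


print(overlapping_blocks(10, 4, 3))  # [(0, 4), (2, 6), (4, 8), (6, 10)]
```

    A smaller overlap would save comparisons, but an alignment straddling two blocks could then be missed. An overlap of L - 1 is the smallest that avoids this.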

  13. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, Thomas G.; Chang, William I-Wei

    1997-01-01

    A method and apparatus for comparing biological sequences from a known source of sequences with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level and are long enough to be statistically significant. The device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short-fragment best matches for the block provides an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence.

  14. The modular structure of informational sequences.

    PubMed

    Schmitt, A O; Ebeling, W; Herzel, H

    1996-01-01

    It is shown that DNA sequences can be decomposed into smaller units, much as texts can be decomposed into syllables, words, or groups of words. Those smaller units (modules) are extracted from DNA sequences according to statistical criteria. Tests were performed with sequences of known modular structure (two novels and a FORTRAN source code). The degree to which DNA sequences can be decomposed into modules (modularity) turns out to be a very sensitive measure for distinguishing DNA sequences from random sequences. PMID:8924645

  15. Sequencing Voyager II for the Uranus encounter

    NASA Technical Reports Server (NTRS)

    Morris, R. B.

    1986-01-01

    The process of developing the programmed sequence of events necessary for the Voyager 2 spacecraft to return desired data from its Uranus encounter is discussed. The major steps in the sequence process are reviewed, and the elements of the Mission Sequence Software are described. The design phase and the implementation phase of the sequence process are discussed, and the Computer Command Subsystem architecture is examined in detail. The software's role in constructing the sequences and converting them into onboard programs is elucidated, and the problems unique to the Uranus encounter sequences are considered.

  16. A simple method for global sequence comparison.

    PubMed Central

    Pizzi, E; Attimonelli, M; Liuni, S; Frontali, C; Saccone, C

    1992-01-01

    A simple method of sequence comparison, based on a correlation analysis of oligonucleotide frequency distributions, is here shown to be a reliable test of overall sequence similarity. The method does not involve sequence alignment procedures and permits the rapid screening of large amounts of sequence data. It identifies those sequences which deserve more careful analysis of sequence similarity at the level of resolution of the single nucleotide. It uses observed quantities only and does not involve the adoption of any theoretical model. PMID:1738591
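    Below is a minimal sketch of the alignment-free comparison described above. It counts overlapping oligonucleotides of a fixed length k over the alphabet ACGT and converts the counts to relative frequencies. It then scores similarity as the Pearson correlation coefficient between the two frequency vectors. The choice of Pearson correlation, the default k = 2, and the function names are illustrative assumptions; the paper's exact statistic may differ. The sketch also assumes both sequences are longer than k and are not perfectly uniform in composition.

```python
import math
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k DNA k-mers, in a fixed order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def profile_correlation(a, b, k=2):
    """Pearson correlation between the k-mer frequency profiles of a and b."""
    x, y = kmer_frequencies(a, k), kmer_frequencies(b, k)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sx = math.sqrt(sum((u - mx) ** 2 for u in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return cov / (sx * sy)


print(profile_correlation("ACGTTGCAAGCTTACG", "ACGTTGCAAGCATACG"))
print(profile_correlation("ATATATATAT", "GCGCGCGCGC"))
```

    The sketch never aligns the sequences, so its cost grows linearly with sequence length. This makes it usable as a quick screen before single-nucleotide comparison.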

  17. DNA sequencing: bench to bedside and beyond

    PubMed Central

    Hutchison, Clyde A.

    2007-01-01

    Fifteen years elapsed between the discovery of the double helix (1953) and the first DNA sequencing (1968). Modern DNA sequencing began in 1977, with development of the chemical method of Maxam and Gilbert and the dideoxy method of Sanger, Nicklen and Coulson, and with the first complete DNA sequence (phage ϕX174), which demonstrated that sequence could give profound insights into genetic organization. Incremental improvements allowed sequencing of molecules >200 kb (human cytomegalovirus) leading to an avalanche of data that demanded computational analysis and spawned the field of bioinformatics. The US Human Genome Project spurred sequencing activity. By 1992 the first ‘sequencing factory’ was established, and others soon followed. The first complete cellular genome sequences, from bacteria, appeared in 1995 and other eubacterial, archaebacterial and eukaryotic genomes were soon sequenced. Competition between the public Human Genome Project and Celera Genomics produced working drafts of the human genome sequence, published in 2001, but refinement and analysis of the human genome sequence will continue for the foreseeable future. New ‘massively parallel’ sequencing methods are greatly increasing sequencing capacity, but further innovations are needed to achieve the ‘thousand dollar genome’ that many feel is prerequisite to personalized genomic medicine. These advances will also allow new approaches to a variety of problems in biology, evolution and the environment. PMID:17855400

  18. The Art of Gymnastics: Creating Sequences.

    ERIC Educational Resources Information Center

    Rovegno, Inez

    1988-01-01

    Offering students opportunities for creating movement sequences in gymnastics allows them to understand the essence of gymnastics, have creative experiences, and learn about themselves. The process of creating sequences is described. (MT)

  19. An Assignment Sequence for Underprepared Writers.

    ERIC Educational Resources Information Center

    Nimmo, Kristi

    2000-01-01

    Presents a sequenced writing assignment on shopping to aid basic writers. Describes a writing assignment focused around online and mail-order shopping. Notes steps in preparing for the assignment, the sequence, and discusses responses to the assignments. (SC)

  20. Benchmarking short sequence mapping tools

    PubMed Central

    2013-01-01

    Background: The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time consuming and demands the development of fast and accurate alignment tools. However, the currently proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked while comparing the performance of a newly developed tool to the state of the art. Therefore, there is a need for an objective evaluation method that covers all the aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison. Results: We applied our benchmarking tests on 9 well-known mapping tools, namely Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST), using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests, while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be further applied to others. Conclusion: The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, there is no tool that outperforms all of the others in all the tests. Therefore, the end user should clearly specify their needs in order to choose the tool that provides the best results. PMID:23758764
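    The abstract distinguishes tools that build hash tables for the reads (MAQ and RMAP) from tools that index the reference genome. The sketch below shows the hashing idea in its simplest form, applied to the reference rather than the reads. A dictionary maps every k-mer of the reference to its positions, and each read is seeded with its first k bases. Every candidate position is then verified by counting mismatches. This is a toy under stated assumptions: substitutions only, no indels, one seed per read, and hypothetical function names. It does not reproduce how any of the benchmarked tools work. Bowtie and BWA, for example, use compressed Burrows-Wheeler indexes rather than hash tables.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash table from every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Seed with the read's first k bases, then verify the full read at each
    candidate position, allowing up to max_mismatches substitutions."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


ref = "ACGTTGCAACGTAGGCT"
idx = build_kmer_index(ref, 4)
print(map_read("ACGTAG", ref, idx, 4))  # [(0, 1), (8, 0)]
```

    Because only the first k bases serve as a seed, a mismatch inside the seed hides the true hit. This is one reason mappers commonly use several seeds or spaced seeds.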

  1. Genetic mapping and DNA sequencing

    SciTech Connect

    Speed, T.; Waterman, M.S.

    1996-12-31

    The Human Genome Initiative has as its primary objective the characterization of the human genome. High-resolution linkage maps of genetic markers will play an important role in completing the human genome project. This is one of two volumes based on the proceedings of the 1994 IMA Summer Program on Molecular Biology and comprises Weeks 1 and 2 of the four-week program. This volume focuses on genetic mapping and DNA sequencing. Selected papers are indexed separately for inclusion in the Energy Science and Technology Database.

  2. Apollo: a sequence annotation editor

    PubMed Central

    Lewis, SE; Searle, SMJ; Harris, N; Gibson, M; Iyer, V; Richter, J; Wiel, C; Bayraktaroglu, L; Birney, E; Crosby, MA; Kaminker, JS; Matthews, BB; Prochnik, SE; Smith, CD; Tupy, JL; Rubin, GM; Misra, S; Mungall, CJ; Clamp, ME

    2002-01-01

    The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool to allow biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects. PMID:12537571

  3. The PIR-International Protein Sequence Database.

    PubMed Central

    George, D G; Barker, W C; Mewes, H W; Pfeiffer, F; Tsugita, A

    1994-01-01

    PIR-International is an association of macromolecular sequence data collection centers dedicated to fostering international cooperation as an essential element in the development of scientific databases. A major objective of PIR-International is to continue the development of the Protein Sequence Database as an essential public resource for protein sequence information. This paper briefly describes the architecture of the Protein Sequence Database and how it and associated data sets are distributed and can be accessed electronically. PMID:7937060

  4. Sequences of Rational Numbers Converging to Surds

    ERIC Educational Resources Information Center

    Fletcher, Rodney

    2010-01-01

    In the sequence 1/1, 7/5, 41/29, 239/169, and so on, Thomas notes that the terms converge to the square root of 2. By observation, the numerators of this sequence follow the same generating pattern as the denominators. That is, the next term is found by multiplying the previous term by six and…
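    The abstract's statement of the rule is cut off, so the sketch below does not claim to reproduce it. As an assumption, checked only against the four listed terms, it generates numerators and denominators with the same recurrence x(n+1) = 6·x(n) - x(n-1). For example, 6·7 - 1 = 41 and 6·5 - 1 = 29. The sketch also checks that every generated fraction p/q satisfies p² - 2q² = -1. Since then p/q = sqrt(2 - 1/q²), the terms approach √2 as q grows. The function name is illustrative.

```python
from fractions import Fraction


def surd_approximations(n_terms):
    """Return the first n_terms fractions 1/1, 7/5, 41/29, ... generated by the
    assumed recurrence x(n+1) = 6*x(n) - x(n-1) on numerators and denominators."""
    nums, dens = [1, 7], [1, 5]
    while len(nums) < n_terms:
        nums.append(6 * nums[-1] - nums[-2])
        dens.append(6 * dens[-1] - dens[-2])
    return [Fraction(p, q) for p, q in zip(nums, dens)][:n_terms]


for f in surd_approximations(6):
    # Each term satisfies p^2 - 2 q^2 = -1, so p/q = sqrt(2 - 1/q^2).
    print(f, f.numerator ** 2 - 2 * f.denominator ** 2, float(f))
```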

  5. Sequencing crop genomes: approaches and applications

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Plant genome sequencing methodology parallels the sequencing of the human genome. The first projects were slow and very expensive. BAC-by-BAC approaches were utilized first, and whole-genome shotgun sequencing rapidly replaced that approach. So-called 'next generation' technologies such as short rea...

  6. Incidental Sequence Learning across the Lifespan

    ERIC Educational Resources Information Center

    Weiermann, Brigitte; Meier, Beat

    2012-01-01

    The purpose of the present study was to investigate incidental sequence learning across the lifespan. We tested 50 children (aged 7-16), 50 young adults (aged 20-30), and 50 older adults (aged >65) with a sequence learning paradigm that involved both a task and a response sequence. After several blocks of practice, all age groups slowed down…

  7. The recurrence sequence via the Fibonacci groups

    NASA Astrophysics Data System (ADS)

    Aküzüm, Yeşim; Deveci, Ömür

    2016-04-01

    This work develops properties of the recurrence sequence defined with the aid of the relation matrix of the Fibonacci groups. Studying this sequence modulo m yields cyclic groups and semigroups from the generating matrix. Finally, we extend the defined sequence to groups and obtain its period in the Fibonacci groups.
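
    The abstract does not spell out the relation matrix of the Fibonacci groups, so the sketch below substitutes the ordinary Fibonacci generating matrix M = [[1, 1], [1, 0]] to illustrate the underlying idea only: modulo m, the powers of an invertible integer matrix form a finite cyclic group, and the order of M in that group is the period of the sequence it generates. For this particular matrix that period is the Pisano period.

```python
def mat_mul_mod(a, b, m):
    """Multiply two 2x2 integer matrices and reduce entries modulo m."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) % m for j in range(2)]
            for i in range(2)]


def matrix_period(matrix, m):
    """Smallest k >= 1 with matrix^k == identity (mod m).

    Terminates only if the matrix is invertible mod m (det = +/-1 suffices).
    """
    identity = [[1 % m, 0], [0, 1 % m]]
    power = [[x % m for x in row] for row in matrix]
    k = 1
    while power != identity:
        power = mat_mul_mod(power, matrix, m)
        k += 1
    return k


FIBONACCI_MATRIX = [[1, 1], [1, 0]]  # stand-in, not the paper's relation matrix
for m in (2, 3, 5, 10):
    print(m, matrix_period(FIBONACCI_MATRIX, m))
```

    Because M^k = [[F(k+1), F(k)], [F(k), F(k-1)]], the matrix returns to the identity exactly when the Fibonacci sequence modulo m restarts at 0, 1.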

  8. Sequencing and mapping of the onion genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cost of DNA sequencing continues to decline and, in the near future, it will become reasonable to undertake sequencing of the enormous nuclear genome of onion. We undertook sequencing of expressed and genomic regions of the onion genome to learn about the structure of the onion genome, as well a...

  9. PacBio Sequencing and Its Applications

    PubMed Central

    Rhoads, Anthony; Au, Kin Fai

    2015-01-01

    Single-molecule, real-time sequencing developed by Pacific Biosciences offers longer read lengths than second-generation sequencing (SGS) technologies, making it well suited to unsolved problems in genome, transcriptome, and epigenetics research. The highly contiguous de novo assemblies produced with PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for identifying gene isoforms and facilitates reliable discovery of novel genes and novel isoforms of annotated genes, owing to its ability to sequence full-length transcripts or long fragments. Additionally, PacBio's sequencing technique provides information useful for the direct detection of base modifications, such as methylation. Beyond using PacBio sequencing alone, many hybrid strategies have been developed that combine more accurate short reads with PacBio long reads. In general, hybrid strategies are more affordable and scalable than PacBio sequencing alone, especially for small laboratories. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone. PMID:26542840

  10. Artificial sequences and complexity measures

    NASA Astrophysics Data System (ADS)

    Baronchelli, Andrea; Caglioti, Emanuele; Loreto, Vittorio

    2005-04-01

    In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools for extracting information, in an automatic and agnostic way, from a generic string of characters. In particular, we introduce a class of methods that rely crucially on data compression techniques to define a measure of remoteness and distance between pairs of character sequences (e.g., texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques could be used to introduce the notion of the dictionary of a given sequence and of an artificial text, and we show how these new tools can be used for information extraction. We point out the versatility and generality of our method, which applies to any corpus of character strings independently of the type of coding behind them. As case studies we consider linguistically motivated problems, and we present results for automatic language recognition, authorship attribution, and self-consistent classification.
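
    The abstract does not give the exact formula behind its remoteness measure, so the sketch below uses a different but closely related and widely used compression-based measure, the normalized compression distance (NCD), with Python's zlib standing in as the compressor. The idea is the same: if two strings share structure, compressing them together costs little more than compressing either one alone.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


english = b"the quick brown fox jumps over the lazy dog " * 20
similar = b"the quick brown fox leaps over the lazy cat " * 20
unrelated = bytes(range(256)) * 2

print(round(ncd(english, english), 3))
print(round(ncd(english, similar), 3))
print(round(ncd(english, unrelated), 3))
```

    Real compressors add fixed overhead, so on very short strings the values are noisy; the comparison is meaningful mainly for inputs of at least a few hundred bytes.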

  11. Not All Sequence Tags Are Created Equal: Designing and Validating Sequence Identification Tags Robust to Indels

    PubMed Central

    Faircloth, Brant C.; Glenn, Travis C.

    2012-01-01

    Ligating adapters with unique synthetic oligonucleotide sequences (sequence tags) onto individual DNA samples before massively parallel sequencing is a popular and efficient way to obtain sequence data from many individual samples. Tag sequences should be numerous and sufficiently different to ensure sequencing, replication, and oligonucleotide synthesis errors do not cause tags to be unrecoverable or confused. However, many design approaches only protect against substitution errors during sequencing and extant tag sets contain too few tag sequences. We developed an open-source software package to validate sequence tags for conformance to two distance metrics and design sequence tags robust to indel and substitution errors. We use this software package to evaluate several commercial and non-commercial sequence tag sets, design several large sets (maxcount = 7,198) of edit metric sequence tags having different lengths and degrees of error correction, and integrate a subset of these edit metric tags into polymerase chain reaction (PCR) primers and sequencing adapters. We validate a subset of these edit metric tagged PCR primers and sequencing adapters by sequencing on several platforms and subsequent comparison to commercially available alternatives. We find that several commonly used sets of sequence tags or design methodologies used to produce sequence tags do not meet the minimum expectations of their underlying distance metric, and we find that PCR primers and sequencing adapters incorporating edit metric sequence tags designed by our software package perform as well as their commercial counterparts. We suggest that researchers evaluate sequence tags prior to use or evaluate tags that they have been using. The sequence tag sets we design improve on extant sets because they are large, valid across the set, and robust to the suite of substitution, insertion, and deletion errors affecting massively parallel sequencing workflows on all currently used platforms.
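
    The difference between substitution-only and indel-robust tag design comes down to the distance metric. The minimal Python sketch below (not the authors' software package) computes Hamming and Levenshtein (edit) distance. A single leading deletion makes two tags look completely different under Hamming distance, yet they are only two edits apart. Under nearest-neighbour decoding in the edit metric, a set whose minimum pairwise edit distance is d can correct up to floor((d - 1) / 2) substitution, insertion, or deletion errors.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of mismatched positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def min_pairwise_distance(tags):
    """Smallest edit distance between any two tags in the set."""
    return min(edit_distance(x, y) for x, y in combinations(tags, 2))


def correctable_errors(tags):
    """Edits that nearest-neighbour decoding can always correct."""
    return (min_pairwise_distance(tags) - 1) // 2


print(hamming_distance("ACGTACGT", "CGTACGTA"), edit_distance("ACGTACGT", "CGTACGTA"))
```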

  12. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  13. Distinguishing Proteins From Arbitrary Amino Acid Sequences

    PubMed Central

    Yau, Stephen S.-T.; Mao, Wei-Guang; Benson, Max; He, Rong Lucy

    2015-01-01

    What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe. PMID:25609314

  14. Genotator: a workbench for sequence annotation.

    PubMed

    Harris, N L

    1997-07-01

    Sequencing centers such as the Human Genome Center at LBNL are producing an ever-increasing flood of genetic data. Annotation can greatly enhance the biological value of these sequences. Useful annotations include possible gene locations, homologies to known genes, and gene signals such as promoters and splice sites. Genotator is a workbench for automated sequence annotation and annotation browsing. The back end runs a series of sequence analysis tools on a DNA sequence, handling the various input and output formats required by the tools. Genotator currently runs five different gene-finding programs, three homology searches, and searches for promoters, splice sites, and ORFs. The results of the analyses run by Genotator can be viewed with the interactive graphical browser. The browser displays color-coded sequence annotations on a canvas that can be scrolled and zoomed, allowing the annotated sequence to be explored at multiple levels of detail. The user can view the actual DNA sequence in a separate window; when a region is selected in the map display, it is highlighted automatically in the sequence display, and vice versa. By displaying the output of all of the sequence analyses, Genotator provides an intuitive way to identify the significant regions (for example, probable exons) in a sequence. Users can interactively add personal annotations to label regions of interest. Additional capabilities of Genotator include primer design and pattern searching. PMID:9253604
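
    Genotator's own ORF search is not described in the abstract. The following is a minimal, hypothetical forward-strand ORF scan in Python, shown only to illustrate the kind of analysis such a workbench runs: in each of the three reading frames it reports the span from an ATG to the first in-frame stop codon. Reverse-strand scanning and nested start codons are deliberately ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1):
    """Return (start, end) spans of forward-strand ORFs, end exclusive.

    An ORF here runs from an ATG to the first in-frame stop codon
    (stop included); min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGTTTTGACC"))
```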

  15. Experimental investigation of an RNA sequence space

    NASA Technical Reports Server (NTRS)

    Lee, Youn-Hyung; Dsouza, Lisa; Fox, George E.

    1993-01-01

    Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space consisting only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein we describe an experimental system that can assess whether a particular sequence is likely to be valid as a eubacterial 5S rRNA. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs.

  16. The evolution of the Voyager mission sequence software and trends for future mission sequence software systems

    NASA Technical Reports Server (NTRS)

    Brooks, Robert N., Jr.

    1988-01-01

    The historical background of the spacecraft sequence generation process as it is represented by the Voyager mission to the outer planets is discussed. Present plans for future sequencing methods are examined, including the emphasis on cutting costs and the contrast between the centralized and distributed systems for sequencing. The use of artificial intelligence in mission sequencing is addressed.

  17. RNAome sequencing delineates the complete RNA landscape.

    PubMed

    Derks, Kasper W J; Pothof, Joris

    2015-09-01

    Standard RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species. For example, small and large RNAs from the same sample cannot be sequenced in a single sequencing run. We designed RNAome sequencing, which is a strand-specific method to determine the expression of small and large RNAs from ribosomal RNA-depleted total RNA in a single sequencing run. RNAome sequencing quantitatively preserves all RNA classes. This characteristic allows comparisons between RNA classes, thereby facilitating the identification of relationships between different RNA classes. Here, we describe in detail the experimental procedure associated with RNAome sequencing published by Derks and colleagues in RNA Biology (2015) [1]. We also provide the R code for the developed Total Rna Analysis Pipeline (TRAP), an algorithm to analyze RNAome sequencing datasets (deposited at the Gene Expression Omnibus data repository, accession number GSE48084). PMID:26484291

  18. Should the draft chimpanzee sequence be finished?

    PubMed

    Taudien, Stefan; Ebersberger, Ingo; Glöckner, Gernot; Platzer, Matthias

    2006-03-01

    Owing to the availability of genome working drafts (WDs), current comparative-sequence studies are frequently performed on a genome-wide scale. In this article, we appraise the utility of WD sequences in the detection of genomic differences in closely related species. We compared human DNA sequences with draft and high-quality versions of the corresponding chimpanzee loci to reveal the overall high quality of the chimp WD sequence. Nevertheless, a significant proportion of the differences between WD and high-quality sequences we observed can be attributed to sequencing errors in the draft. Although we suggest methods to reduce the number of such false positives efficiently, our study emphasizes the benefit expected from finishing the chimpanzee genome sequence. PMID:16406850

  19. Atypical regions in large genomic DNA sequences

    SciTech Connect

    Scherer, S.; McPeek, M.S.; Speed, T.P.

    1994-07-19

    Large genomic DNA sequences contain regions with distinctive patterns of sequence organization. The authors describe a method using logarithms of probabilities based on seventh-order Markov chains to rapidly identify genomic sequences that do not resemble models of genome organization built from compilations of octanucleotide usage. Databases have been constructed from Escherichia coli and Saccharomyces cerevisiae DNA sequences of >1000 nt and human sequences of >10,000 nt. Atypical genes and clusters of genes have been located in bacteriophage, yeast, and primate DNA sequences. The authors consider criteria for statistical significance of the results, offer possible explanations for the observed variation in genome organization, and give additional applications of these methods in DNA sequence analysis.
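
    The scoring idea can be sketched in Python as follows. The authors use seventh-order chains (octanucleotide counts); the toy example uses a lower order so that the training data can be tiny, and the add-one pseudocount and per-base averaging are illustrative choices, not details taken from the paper. A window whose average log probability is unusually low under the genome model would be flagged as atypical.

```python
import math
from collections import Counter


def train_markov(seqs, order):
    """Count (order+1)-mers and their order-length contexts."""
    kmer_counts, context_counts = Counter(), Counter()
    for s in seqs:
        for i in range(len(s) - order):
            word = s[i:i + order + 1]
            kmer_counts[word] += 1
            context_counts[word[:-1]] += 1
    return kmer_counts, context_counts


def log_prob_per_base(seq, model, order, alphabet="ACGT", pseudo=1.0):
    """Average natural-log probability of each base given its preceding context."""
    kmer_counts, context_counts = model
    total, n = 0.0, 0
    for i in range(len(seq) - order):
        word = seq[i:i + order + 1]
        # Pseudocounts keep unseen words from having probability zero.
        p = (kmer_counts[word] + pseudo) / (context_counts[word[:-1]] + pseudo * len(alphabet))
        total += math.log(p)
        n += 1
    return total / n if n else float("nan")


ORDER = 2  # the paper uses order 7
model = train_markov(["ACGT" * 30], ORDER)
print(log_prob_per_base("ACGTACGTACGT", model, ORDER))  # typical of the model
print(log_prob_per_base("AAAAAAAAAAAA", model, ORDER))  # atypical
```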

  20. De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data

    PubMed Central

    DiGuistini, Scott; Liao, Nancy Y; Platt, Darren; Robertson, Gordon; Seidel, Michael; Chan, Simon K; Docking, T Roderick; Birol, Inanc; Holt, Robert A; Hirst, Martin; Mardis, Elaine; Marra, Marco A; Hamelin, Richard C; Bohlmann, Jörg; Breuil, Colette; Jones, Steven JM

    2009-01-01

    Sequencing-by-synthesis technologies can reduce the cost of generating de novo genome assemblies. We report a method for assembling draft genome sequences of eukaryotic organisms that integrates sequence information from different sources, and demonstrate its effectiveness by assembling an approximately 32.5 Mb draft genome sequence for the forest pathogen Grosmannia clavigera, an ascomycete fungus. We also developed a method for assessing draft assemblies using Illumina paired end read data and demonstrate how we are using it to guide future sequence finishing. Our results demonstrate that eukaryotic genome sequences can be accurately assembled by combining Illumina, 454 and Sanger sequence data. PMID:19747388

  1. Feedback shift register sequences versus uniformly distributed random sequences for correlation chromatography

    NASA Technical Reports Server (NTRS)

    Kaljurand, M.; Valentin, J. R.; Shao, M.

    1996-01-01

    Two alternative input sequences are commonly employed in correlation chromatography (CC). They are sequences derived according to the algorithm of the feedback shift register (i.e., pseudo random binary sequences (PRBS)) and sequences derived by using the uniform random binary sequences (URBS). These two sequences are compared. By applying the "cleaning" data processing technique to the correlograms that result from these sequences, we show that when the PRBS is used the S/N of the correlogram is much higher than the one resulting from using URBS.
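
    As a hedged illustration of where a PRBS comes from, the Python sketch below implements a small linear feedback shift register. The recurrence a(k+4) = a(k+1) XOR a(k) (characteristic polynomial x^4 + x + 1, which is primitive) is an arbitrary example, not the register used by the authors. From any nonzero seed it yields a maximal-length sequence of period 2^4 - 1 = 15. A well-known property of such sequences, and one reason they suit correlation methods, is their two-valued circular autocorrelation: N at zero lag and -1 at every other lag.

```python
def lfsr_prbs(seed, taps, length):
    """Binary sequence from a Fibonacci-style linear feedback shift register.

    seed: initial register [a_0, ..., a_{n-1}], not all zero.
    taps: offsets t such that a_{k+n} = XOR of a_{k+t}.
    """
    register = list(seed)
    out = []
    for _ in range(length):
        out.append(register[0])
        feedback = 0
        for t in taps:
            feedback ^= register[t]
        register = register[1:] + [feedback]
    return out


def circular_autocorrelation(bits):
    """Autocorrelation of the +/-1 version of one period, for every lag."""
    s = [1 if b == 0 else -1 for b in bits]
    n = len(s)
    return [sum(s[i] * s[(i + lag) % n] for i in range(n)) for lag in range(n)]


seq = lfsr_prbs([1, 0, 0, 0], taps=(0, 1), length=30)
print(seq[:15])
print(circular_autocorrelation(seq[:15]))
```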

  2. Secondary-task effects on sequence learning.

    PubMed

    Heuer, H; Schmidtke, V

    1996-01-01

    With a repeated sequence of stimuli, performance in a serial reaction-time task improves more than with a random sequence. The difference has been taken as a measure of implicit sequence learning. Implicit sequence learning is impaired when a secondary task is added to the serial RT task. In the first experiment, secondary-task effects on different types of sequences were studied to test the hypothesis that the learning of unique sequences (where each sequence element has a unique relation to the following one) is not impaired by the secondary task, while the learning of ambiguous sequences is. The sequences were random up to a certain order of sequential dependencies, where they became deterministic. Contrary to the hypothesis, secondary-task effects on the learning of unique sequences were as strong or stronger than such effects on the learning of ambiguous sequences. In the second experiment a hybrid sequence (with unique as well as ambiguous transitions) was used with different secondary tasks. A visuo-spatial and a verbal memory task did not interfere with the learning of the sequence, but interference was observed with an auditory go/no-go task in which high- and low-pitched tones were presented after each manual response and a foot pedal had to be pressed in response to high-pitched tones. Thus, interference seems to be specific to certain secondary tasks and may be related to memory processes (but most likely not to visuo-spatial and verbal memory) or to the organization of sequences, consistent with previous suggestions. PMID:8810586

  3. Automatic Sequencing for Experimental Protocols

    NASA Astrophysics Data System (ADS)

    Hsieh, Paul F.; Stern, Ivan

    We present a paradigm and implementation of a system for the specification of the experimental protocols to be used for the calibration of AXAF mirrors. For the mirror calibration, several thousand individual measurements need to be defined. For each measurement, over one hundred parameters need to be tabulated for the facility test conductor and several hundred instrument parameters need to be set. We provide a high level protocol language which allows for a tractable representation of the measurement protocol. We present a procedure dispatcher which automatically sequences a protocol more accurately and more rapidly than is possible by an unassisted human operator. We also present back-end tools to generate printed procedure manuals and database tables required for review by the AXAF program. This paradigm has been tested and refined in the calibration of detectors to be used in mirror calibration.

  4. Replacement Sequence of Events Generator

    NASA Technical Reports Server (NTRS)

    Fisher, Forest; Gladden, Daniel Wenkert Roy; Khanampompan, Teerpat

    2008-01-01

    The soeWINDOW program automates the generation of an ITAR (International Traffic in Arms Regulations)-compliant sub-RSOE (Replacement Sequence of Events) by extracting a specified temporal window from an RSOE while maintaining page header information. RSOEs contain a significant amount of information that is not ITAR-compliant, yet that foreign partners need to see for command details to their instrument, as well as the surrounding commands that provide context for validation. soeWINDOW can serve as an example of how command support products can be made ITAR-compliant for future missions. This software is a Perl script intended for use in the mission operations UNIX environment. It is designed to support the MRO (Mars Reconnaissance Orbiter) instrument team. The tool also provides automated DOM (Distributed Object Manager) storage into the special ITAR-okay DOM collection, and can be used for creating focused RSOEs for product review by any of the MRO teams.
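
    soeWINDOW itself is a Perl script, and the RSOE file format is not described here. The Python sketch below assumes a hypothetical layout (a fixed number of header lines, then one event per line beginning with an ISO-8601 timestamp) purely to illustrate the windowing step: header lines are kept verbatim, and only events whose timestamps fall inside [start, end] survive.

```python
from datetime import datetime


def extract_window(lines, start, end, header_lines=2):
    """Keep the header plus events timestamped within [start, end].

    Hypothetical format: the first header_lines lines are page headers;
    every following line starts with an ISO-8601 timestamp.
    """
    header = lines[:header_lines]
    events = []
    for line in lines[header_lines:]:
        stamp = datetime.fromisoformat(line.split()[0])
        if start <= stamp <= end:
            events.append(line)
    return header + events


sample = [
    "HEADER mission=MRO",          # hypothetical header lines
    "HEADER page=1",
    "2008-05-01T10:00:00 CMD_A",   # hypothetical event lines
    "2008-05-01T11:00:00 CMD_B",
    "2008-05-01T12:00:00 CMD_C",
]
for line in extract_window(sample, datetime(2008, 5, 1, 10, 30), datetime(2008, 5, 1, 11, 30)):
    print(line)
```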

  5. Particle sizer and DNA sequencer

    DOEpatents

    Olivares, Jose A.; Stark, Peter C.

    2005-09-13

    An electrophoretic device separates and detects particles such as DNA fragments, proteins, and the like. The device has a capillary coated with a low-refractive-index material such as Teflon® AF. A sample of particles is fluorescently labeled and injected into the capillary. The capillary is filled with an electrolyte buffer solution. An electrical field is applied across the capillary causing the particles to migrate from a first end of the capillary to a second end of the capillary. A detector light beam is then scanned along the length of the capillary to detect the location of the separated particles. The device is amenable to a high throughput system by providing additional capillaries. The device can also be used to determine the actual size of the particles and for DNA sequencing.

  6. Dynamic Denoising of Tracking Sequences

    PubMed Central

    Michailovich, Oleg; Tannenbaum, Allen

    2009-01-01

    In this paper, we describe an approach to the problem of simultaneously enhancing image sequences and tracking the objects of interest represented by the latter. The enhancement part of the algorithm is based on Bayesian wavelet denoising, which has been chosen due to its exceptional ability to incorporate diverse a priori information into the process of image recovery. In particular, we demonstrate that, in dynamic settings, useful statistical priors can come both from some reasonable assumptions on the properties of the image to be enhanced as well as from the images that have already been observed before the current scene. Using such priors forms the main contribution of the present paper which is the proposal of the dynamic denoising as a tool for simultaneously enhancing and tracking image sequences. Within the proposed framework, the previous observations of a dynamic scene are employed to enhance its present observation. The mechanism that allows the fusion of the information within successive image frames is Bayesian estimation, while transferring the useful information between the images is governed by a Kalman filter that is used for both prediction and estimation of the dynamics of tracked objects. Therefore, in this methodology, the processes of target tracking and image enhancement “collaborate” in an interlacing manner, rather than being applied separately. The dynamic denoising is demonstrated on several examples of SAR imagery. The results demonstrated in this paper indicate a number of advantages of the proposed dynamic denoising over “static” approaches, in which the tracking images are enhanced independently of each other. PMID:18482881
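
    In the paper the Kalman filter is coupled to Bayesian wavelet denoising of SAR images, which is beyond a short example. As a hedged illustration of the prediction and estimation cycle alone, here is a textbook constant-velocity Kalman filter for a single coordinate, written with NumPy; the noise parameters q and r and the initial covariance are arbitrary assumptions.

```python
import numpy as np


def kalman_track(measurements, dt=1.0, q=1e-4, r=1.0):
    """Constant-velocity Kalman filter; returns [position, velocity] per step."""
    F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition
    H = np.array([[1.0, 0.0]])             # only position is observed
    Q = q * np.eye(2)                      # process noise covariance (assumed)
    R = np.array([[r]])                    # measurement noise covariance (assumed)
    x = np.zeros(2)                        # initial state guess
    P = 1e3 * np.eye(2)                    # large initial uncertainty
    estimates = []
    for z in measurements:
        # Prediction: propagate the state and its uncertainty one step.
        x = F @ x
        P = F @ P @ F.T + Q
        # Estimation: correct the prediction with the new measurement.
        y = z - H @ x                      # innovation
        S = H @ P @ H.T + R                # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)


track = kalman_track([2.0 * k for k in range(1, 51)])  # object moving at 2 units/step
print(track[-1])
```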

  7. Lygus hesperus polygalacturonase Characterization and Role in Plant Damage

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The amino terminus of a Lygus hesperus salivary gland protein revealing polygalacturonase (PG) activity in an SDS-PAGE activity gel assay has been sequenced via Edman degradation. The N-terminal amino acid sequence shares homology with the predicted amino acid sequence for putative L. lineolaris P...

  8. Diotic and dichotic discrimination of binary sequences

    NASA Astrophysics Data System (ADS)

    Sheft, Stanley; Yost, William A.; Dye, Raymond H.

    2005-04-01

    Binary-sequence discrimination was compared for diotic and dichotic stimuli. Sequences consisted of 4 to 32 wideband-noise pulses with pulse duration ranging from 8 to 32 ms. Diotic sequences were distinguished by pulse-amplitude pattern, while dichotic patterns differed by their sequence of ear of presentation. Discrimination was measured as a function of the number of pattern elements that differed between the standard and comparison sequences with temporal location of the altered pulses randomly selected on each trial. Additional fringe pulses bracketed the target sequences to avoid onset and offset cuing. Neither diotic nor dichotic performance was monotonic with the ratio of the number of altered pulses to sequence pulses, with larger departures from monotonicity in the dichotic results. Except at the shortest pulse duration, diotic performance was significantly better than that obtained in the dichotic condition with similar pulse duration and numbers of altered and sequence pulses. For the range of stimulus parameters used, sequence discrimination often relied on a global percept rather than processing of individual pulse attributes, with timbre differences cuing diotic discrimination. Though the binaural system exhibits fine resolution, the results suggest that it is poorer than the monaural system at extracting a global percept to cue sequence discrimination. [Work supported by NIDCD.]

  9. Randomness in Sequence Evolution Increases over Time

    PubMed Central

    Wang, Guangyu; Sun, Shixiang; Zhang, Zhang

    2016-01-01

    The second law of thermodynamics states that entropy, as a measure of randomness in a system, increases over time. Although studies have investigated biological sequence randomness from different aspects, it remains unknown whether sequence randomness changes over time and whether this change is consistent with the second law of thermodynamics. To capture the dynamics of randomness in molecular sequence evolution, here we detect sequence randomness based on a collection of eight statistical random tests and investigate the randomness variation of coding sequences with an application to Escherichia coli. Given that core/essential genes are more ancient than specific/non-essential genes, our results clearly show that core/essential genes are more random than specific/non-essential genes and accordingly indicate that sequence randomness indeed increases over time, in good agreement with the second law of thermodynamics. We further find that an increase in sequence randomness leads to increasing randomness of GC content and longer sequence length. Taken together, our study shows for the first time that sequence randomness increases over time, which may provide profound insights for unveiling the underlying mechanisms of molecular sequence evolution. PMID:27224236
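
    The abstract does not name the eight tests. As one illustrative example, the frequency (monobit) test from NIST SP 800-22 can be applied once a nucleotide sequence is mapped to bits; the purine/pyrimidine encoding below is a hypothetical choice, not necessarily the one used by the authors. A small p-value (conventionally below 0.01) means the bits are too unbalanced to look random.

```python
import math


def purine_encoding(seq: str):
    """Hypothetical binary encoding: purines (A, G) -> 1, pyrimidines (C, T) -> 0."""
    return [1 if base in "AG" else 0 for base in seq.upper()]


def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test p-value."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)  # map 1 -> +1, 0 -> -1 and sum
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


print(monobit_p_value(purine_encoding("ACGT" * 25)))  # perfectly balanced
print(monobit_p_value(purine_encoding("A" * 100)))    # all purines
```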

  10. Deciphering the RNA landscape by RNAome sequencing.

    PubMed

    Derks, Kasper W J; Misovic, Branislav; van den Hout, Mirjam C G N; Kockx, Christel E M; Gomez, Cesar Payan; Brouwer, Rutger W W; Vrieling, Harry; Hoeijmakers, Jan H J; van IJcken, Wilfred F J; Pothof, Joris

    2015-01-01

    Current RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species in an unperturbed manner. We report strand-specific RNAome sequencing that determines expression of small and large RNAs from rRNA-depleted total RNA in a single sequencing run. Since current analysis pipelines cannot reliably analyze small and large RNAs simultaneously, we developed TRAP, Total Rna Analysis Pipeline, a robust interface that is also compatible with existing RNA sequencing protocols. RNAome sequencing quantitatively preserved all RNA classes, allowing cross-class comparisons that facilitate the identification of relationships between different RNA classes. We demonstrate the strength of RNAome sequencing in mouse embryonic stem cells treated with cisplatin. MicroRNA and mRNA expression in RNAome sequencing significantly correlated between replicates and was in concordance with both existing RNA sequencing methods and gene expression arrays generated from the same samples. Moreover, RNAome sequencing also detected additional RNA classes such as enhancer RNAs, anti-sense RNAs, novel RNA species and numerous differentially expressed RNAs undetectable by other methods. At the level of complete RNA classes, RNAome sequencing also identified a specific global repression of the microRNA and microRNA isoform classes after cisplatin treatment whereas all other classes such as mRNAs were unchanged. These characteristics of RNAome sequencing will significantly improve expression analysis as well as studies on RNA biology not covered by existing methods. PMID:25826412

  11. Variable copy number DNA sequences in rice.

    PubMed

    Kikuchi, S; Takaiwa, F; Oono, K

    1987-12-01

    We have cloned two types of variable copy number DNA sequences from the rice embryo genome. One of these sequences, which was cloned in pRB301, was amplified about 50-fold during callus formation and diminished in copy number to the embryonic level during regeneration. The other clone, named pRB401, showed the reciprocal pattern. The copy numbers of both sequences were changed even in the early developmental stage and eliminated from nuclear DNA along with growth of the plant. Sequencing analysis of the pRB301 insert revealed some open reading frames and direct repeat structures, but corresponding sequences were not identified in the EMBL and LASL DNA databases. Sequencing of the nuclear genomic fragment cloned in pRB401 revealed the presence of the 3'rps12-rps7 region of rice chloroplast DNA. Our observations suggest that during callus formation (dedifferentiation), regeneration and the growth process the copy numbers of some DNA sequences are variable and that nuclear integrated chloroplast DNA acts as a variable copy number sequence in the rice genome. Based on data showing a common sequence in mitochondria and chloroplast DNA of maize (Stern and Lonsdale 1982) and that the rps12 gene of tobacco chloroplast DNA is a divided gene (Torazawa et al. 1986), it is suggested that the sequence on the inverted repeat structure of chloroplast DNA may have the character of a movable genetic element. PMID:3481021

  12. Randomness in Sequence Evolution Increases over Time.

    PubMed

    Wang, Guangyu; Sun, Shixiang; Zhang, Zhang

    2016-01-01

    The second law of thermodynamics states that entropy, as a measure of randomness in a system, increases over time. Although studies have investigated biological sequence randomness from different aspects, it remains unknown whether sequence randomness changes over time and whether this change is consistent with the second law of thermodynamics. To capture the dynamics of randomness in molecular sequence evolution, here we detect sequence randomness based on a collection of eight statistical random tests and investigate the randomness variation of coding sequences with an application to Escherichia coli. Given that core/essential genes are more ancient than specific/non-essential genes, our results clearly show that core/essential genes are more random than specific/non-essential genes and accordingly indicate that sequence randomness indeed increases over time, in good agreement with the second law of thermodynamics. We further find that an increase in sequence randomness leads to increasing randomness of GC content and longer sequence length. Taken together, our study shows for the first time that sequence randomness increases over time, which may provide profound insights for unveiling the underlying mechanisms of molecular sequence evolution. PMID:27224236

  13. Deciphering the RNA landscape by RNAome sequencing

    PubMed Central

    Derks, Kasper WJ; Misovic, Branislav; van den Hout, Mirjam CGN; Kockx, Christel EM; Payan Gomez, Cesar; Brouwer, Rutger WW; Vrieling, Harry; Hoeijmakers, Jan HJ; van IJcken, Wilfred FJ; Pothof, Joris

    2015-01-01

    Current RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species in an unperturbed manner. We report strand-specific RNAome sequencing that determines expression of small and large RNAs from rRNA-depleted total RNA in a single sequencing run. Since current analysis pipelines cannot reliably analyze small and large RNAs simultaneously, we developed TRAP, Total Rna Analysis Pipeline, a robust interface that is also compatible with existing RNA sequencing protocols. RNAome sequencing quantitatively preserved all RNA classes, allowing cross-class comparisons that facilitate the identification of relationships between different RNA classes. We demonstrate the strength of RNAome sequencing in mouse embryonic stem cells treated with cisplatin. MicroRNA and mRNA expression in RNAome sequencing significantly correlated between replicates and was in concordance with both existing RNA sequencing methods and gene expression arrays generated from the same samples. Moreover, RNAome sequencing also detected additional RNA classes such as enhancer RNAs, anti-sense RNAs, novel RNA species and numerous differentially expressed RNAs undetectable by other methods. At the level of complete RNA classes, RNAome sequencing also identified a specific global repression of the microRNA and microRNA isoform classes after cisplatin treatment whereas all other classes such as mRNAs were unchanged. These characteristics of RNAome sequencing will significantly improve expression analysis as well as studies on RNA biology not covered by existing methods. PMID:25826412

  14. The expanding scope of DNA sequencing

    PubMed Central

    Shendure, Jay; Aiden, Erez Lieberman

    2014-01-01

    In just seven years, next-generation technologies have reduced the cost and increased the speed of DNA sequencing by four orders of magnitude, and experiments requiring many millions of sequencing reads are now routine. In research, sequencing is being applied not only to assemble genomes and to investigate the genetic basis of human disease, but also to explore myriad phenomena in organismic and cellular biology. In the clinic, the utility of sequence data is being intensively evaluated in diverse contexts, including reproductive medicine, oncology and infectious disease. A recurrent theme in the development of new sequencing applications is the creative ‘recombination’ of existing experimental building blocks. However, there remain many potentially high-impact applications of next-generation DNA sequencing that are not yet fully realized. PMID:23138308

  15. Data management for re-sequencing DNA

    SciTech Connect

    Ying Jiahsu; Gilson, H.; Long, K.; Gibbs, R.A.

    1993-12-31

    The human genome project has greatly stimulated the advancement of techniques to sequence large fragments of DNA. The development of improved molecular methods has also simplified the process of comparing shorter, homologous DNA sequences from different individuals and species. This process of 're-sequencing' DNA has applications in medical genetics, in evolutionary studies, and for the identification of complex molecular variation that may explain multifactorial traits. Intrinsic differences in the processes of 'sequencing' and 're-sequencing' suggest new requirements for data management tools. A data management scheme for a 're-sequencing' project is demonstrated using the Virtual Notebook System, a flexible multi-user tool designed as a metaphor of the laboratory notebook.

  16. Catalog of PRA dominant accident sequence information

    SciTech Connect

    Cathey, N.G.; Krantz, E.A.; Poloski, J.P.; Shapiro, B.J.

    1985-07-01

    Information concerning the dominant accident sequences from twelve published probabilistic risk assessments (PRAs) is cataloged in this report, which is published as a part of the Accident Sequence Evaluation Program (ASEP). The purpose of this report is to provide users of PRA information with a single reference document. The cataloged results include plant operation information, core-melt frequency, event tree models, dominant factors affecting core-melt and sequence frequencies, and a description of each dominant accident sequence. The report provides a consistent set of insights on the factors that drive the dominant accident sequences. ASEP has reconstructed the PRA fault tree models at the system or train level of detail and requantified the sequence likelihoods to provide the consistent insights. This work provides the information for the other ASEP activities on accident likelihood assessment for the operating and near-term operating plants.
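
    In event-tree quantification, the frequency of an accident sequence is the initiating-event frequency multiplied by the conditional probabilities of the branch outcomes along that path. The "dominant" sequences are the ones that contribute most to the total. The sketch below is a simplified illustration with made-up numbers. Real PRAs quantify each branch with fault-tree models and minimal cut sets, and this sketch omits that step.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Tuple


@dataclass
class AccidentSequence:
    name: str
    initiator_freq: float      # initiating-event frequency, e.g. per reactor-year
    branch_probs: List[float]  # conditional probability of each branch taken along the path

    def frequency(self) -> float:
        # Sequence frequency = initiator frequency x product of branch probabilities.
        return self.initiator_freq * prod(self.branch_probs)


def dominant_sequences(sequences: List[AccidentSequence], top_n: int = 3) -> List[Tuple[str, float, float]]:
    """Rank sequences by frequency; return (name, frequency, share of total) for the top_n."""
    total = sum(s.frequency() for s in sequences)
    ranked = sorted(sequences, key=lambda s: s.frequency(), reverse=True)
    return [(s.name, s.frequency(), s.frequency() / total) for s in ranked[:top_n]]


# Hypothetical event-tree paths (illustrative numbers only).
example = [
    AccidentSequence("S1", 1e-3, [1e-2, 0.1]),
    AccidentSequence("T1", 0.5, [1e-4, 0.05]),
    AccidentSequence("S2", 1e-3, [2e-3]),
]
for name, freq, share in dominant_sequences(example):
    print(f"{name}: {freq:.2e} per year ({share:.0%} of total)")
```

    Ranking by share of the total mirrors how a catalog like this one singles out "dominant" sequences. The cut-off for "dominant" (here simply `top_n`) is an assumption.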

  17. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  18. Evolutionarily conserved sequences on human chromosome 21

    SciTech Connect

    Frazer, Kelly A.; Sheehan, John B.; Stokowski, Renee P.; Chen, Xiyin; Hosseini, Roya; Cheng, Jan-Fang; Fodor, Stephen P.A.; Cox, David R.; Patil, Nila

    2001-09-01

    Comparison of human sequences with the DNA of other mammals is an excellent means of identifying functional elements in the human genome. Here we describe the utility of high-density oligonucleotide arrays as a rapid approach for comparing human sequences with the DNA of multiple species whose sequences are not presently available. High-density arrays representing approximately 22.5 Mb of nonrepetitive human chromosome 21 sequence were synthesized and then hybridized with mouse and dog DNA to identify sequences conserved between humans and mice (human-mouse elements) and between humans and dogs (human-dog elements). Our data show that sequence comparison of multiple species provides a powerful empirical method for identifying actively conserved elements in the human genome. A large fraction of these evolutionarily conserved elements are present in regions on chromosome 21 that do not encode known genes.

  19. Sequencing Intractable DNA to Close Microbial Genomes

    SciTech Connect

    Hurt, Jr., Richard Ashley; Brown, Steven D; Podar, Mircea; Palumbo, Anthony Vito; Elias, Dwayne A

    2012-01-01

    Advancement in high-throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Often, difficult-to-sequence regions in microbial genomes are deemed intractable, resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These procedures support accurate gene annotation and provide a step-wise method that reduces the effort required for genome finishing.

  20. Specific heat spectra for quasiperiodic ladder sequences

    NASA Astrophysics Data System (ADS)

    Moreira, D. A.; Albuquerque, E. L.; Bezerra, C. G.

    2006-12-01

    We performed a theoretical study of the specific heat C(T) as a function of temperature for double-stranded quasiperiodic sequences. To mimic DNA molecules, the sequences are made up of the nucleotides guanine (G), adenine (A), cytosine (C) and thymine (T), arranged according to the Fibonacci and Rudin-Shapiro quasiperiodic sequences. The energy spectra are calculated using the two-dimensional Schrödinger equation in a tight-binding approximation, with the on-site energy exhibiting long-range disorder and non-random hopping amplitudes. We compare the specific heat features of these artificial quasiperiodic sequences with those of a segment of human chromosome 22 (Ch22), the first human chromosome to be sequenced, which is a real genomic DNA sequence.
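
    Both quasiperiodic chains can be generated by repeated letter substitution. The sketch below uses the standard Fibonacci rule (A to AB, B to A) and the standard four-letter Rudin-Shapiro rule (a to ab, b to ac, c to db, d to dc), then maps the letters onto nucleotides. The letter-to-nucleotide assignments `FIB_TO_NT` and `RS_TO_NT` are hypothetical placeholders and are not necessarily the ones used in the paper. The complementary strand of the double-stranded chain is not built here.

```python
# Standard substitution rules for the two quasiperiodic chains.
FIB_RULES = {"A": "AB", "B": "A"}
RS_RULES = {"a": "ab", "b": "ac", "c": "db", "d": "dc"}

# Hypothetical letter -> nucleotide assignments (assumptions, not from the paper).
FIB_TO_NT = {"A": "G", "B": "C"}
RS_TO_NT = {"a": "G", "b": "A", "c": "C", "d": "T"}


def substitute(seed, rules, generations):
    """Apply a substitution rule `generations` times to the seed word."""
    word = seed
    for _ in range(generations):
        word = "".join(rules[ch] for ch in word)
    return word


def fibonacci_dna(generations):
    """Fibonacci chain mapped to nucleotides; its length is a Fibonacci number."""
    return "".join(FIB_TO_NT[ch] for ch in substitute("A", FIB_RULES, generations))


def rudin_shapiro_dna(generations):
    """Four-letter Rudin-Shapiro chain mapped to nucleotides."""
    return "".join(RS_TO_NT[ch] for ch in substitute("a", RS_RULES, generations))


def rudin_shapiro_signs(n_terms):
    """Classic +/-1 Rudin-Shapiro sequence: (-1)**(number of adjacent '11' pairs in binary n)."""
    # n & (n >> 1) has a 1 bit for every (overlapping) pair of adjacent 1 bits in n.
    return [(-1) ** bin(n & (n >> 1)).count("1") for n in range(n_terms)]


print(fibonacci_dna(5))
print(rudin_shapiro_dna(3))
```

    Projecting the four-letter word onto signs (a and b to +1, c and d to -1) reproduces the classic Rudin-Shapiro sequence. `rudin_shapiro_signs` computes that sequence directly, so it can serve as a cross-check.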

  1. Novel bioinformatic developments for exome sequencing.

    PubMed

    Lelieveld, Stefan H; Veltman, Joris A; Gilissen, Christian

    2016-06-01

    With the widespread adoption of next-generation sequencing technologies by the genetics community and the rapid decrease in costs per base, exome sequencing has become a standard within the repertoire of genetic experiments for both research and diagnostics. Although bioinformatics now offers standard solutions for the analysis of exome sequencing data, many challenges remain; in particular, the increasing scale at which exome data are now being generated has given rise to novel challenges in how to efficiently store, analyze and interpret exome data of this magnitude. In this review we discuss some of the recent developments in bioinformatics for exome sequencing and the directions in which they are taking us. With these developments, exome sequencing is paving the way for the next big challenge, the application of whole genome sequencing. PMID:27075447

  2. Long-range correlations in nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Peng, C.-K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Sciortino, F.; Simons, M.; Stanley, H. E.

    1992-03-01

    DNA sequences have been analysed using models, such as an n-step Markov chain, that incorporate the possibility of short-range nucleotide correlations¹. We propose here a method for studying the stochastic properties of nucleotide sequences by constructing a 1:1 map of the nucleotide sequence onto a walk, which we term a 'DNA walk'. We then use the mapping to provide a quantitative measure of the correlation between nucleotides over long distances along the DNA chain. Thus we uncover in the nucleotide sequence a remarkably long-range power law correlation that implies a new scale-invariant property of DNA. We find such long-range correlations in intron-containing genes and in nontranscribed regulatory DNA sequences, but not in complementary DNA sequences or intron-less genes.
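
    A DNA walk and its fluctuation exponent can be sketched in a few lines. The sketch assumes the purine-pyrimidine rule associated with this work, in which a pyrimidine (C or T) steps up and a purine (A or G) steps down. It measures the root-mean-square fluctuation F(l) of the walk's displacement over windows of length l and fits F(l) ~ l^alpha. An exponent alpha near 0.5 indicates no long-range correlation. The function names are illustrative, not taken from the paper.

```python
import math
import random

# Assumed mapping: pyrimidines (C, T) step up, purines (A, G) step down.
STEP = {"C": 1, "T": 1, "A": -1, "G": -1}


def dna_walk(seq):
    """Cumulative walk displacement y(0..N) for a nucleotide string."""
    y = [0]
    for base in seq.upper():
        y.append(y[-1] + STEP[base])
    return y


def fluctuation(y, l):
    """RMS fluctuation F(l) of the displacement y(i + l) - y(i) over all start positions i."""
    diffs = [y[i + l] - y[i] for i in range(len(y) - l)]
    mean = sum(diffs) / len(diffs)
    var = sum(d * d for d in diffs) / len(diffs) - mean * mean
    return math.sqrt(max(var, 0.0))


def scaling_exponent(seq, window_sizes):
    """Least-squares slope alpha of log F(l) against log l."""
    y = dna_walk(seq)
    xs = [math.log(l) for l in window_sizes]
    ys = [math.log(fluctuation(y, l)) for l in window_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# An uncorrelated random sequence should give alpha close to 0.5.
rng = random.Random(0)
shuffled_like = "".join(rng.choice("ACGT") for _ in range(20000))
print(round(scaling_exponent(shuffled_like, [4, 8, 16, 32, 64, 128]), 2))
```

    Window sizes must be shorter than the sequence. A perfectly periodic input can make F(l) zero, and then the logarithm fails. Values of alpha clearly above 0.5 would signal the kind of long-range correlation described above.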

  3. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    David J. States

    1998-08-01

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  4. Comparison of Next-Generation Sequencing Systems

    PubMed Central

    Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie

    2012-01-01

    With the fast development and wide application of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid the achievement of goals to decode life's mysteries, make better crops, detect pathogens, and improve quality of life. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Technologies, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, the technologies of these systems are reviewed, and first-hand data from this experience are summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. Finally, applications of NGS are summarized. PMID:22829749

  5. A window into third-generation sequencing.

    PubMed

    Schadt, Eric E; Turner, Steve; Kasarskis, Andrew

    2010-10-15

    First- and second-generation sequencing technologies have led the way in revolutionizing the field of genomics and beyond, motivating an astonishing number of scientific advances, including enabling a more complete understanding of whole genome sequences and the information encoded therein, a more complete characterization of the methylome and transcriptome and a better understanding of interactions between proteins and DNA. Nevertheless, there are sequencing applications and aspects of genome biology that are presently beyond the reach of current sequencing technologies, leaving fertile ground for additional innovation in this space. In this review, we describe a new generation of single-molecule sequencing technologies (third-generation sequencing) that is emerging to fill this space, with the potential for dramatically longer read lengths, shorter time to result and lower overall cost. PMID:20858600

  6. The genome sequence of parrot bornavirus 5.

    PubMed

    Guo, Jianhua; Tizard, Ian

    2015-12-01

    Although several new avian bornaviruses have recently been described, information on their evolution, virulence, and sequence is often limited. Here we report the complete genome sequence of parrot bornavirus 5 (PaBV-5) isolated from a case of proventricular dilatation disease in a palm cockatoo (Probosciger aterrimus). The complete genome consists of 8842 nucleotides with distinct 5' and 3' end sequences. This virus shares nucleotide sequence identities of 69-74% with other bornaviruses in the genomic regions excluding the 5' and 3' terminal sequences. Phylogenetic analysis based on these genomic regions demonstrated that this new isolate forms a distinct branch within the clade that includes the aquatic bird bornaviruses and the passerine bornaviruses. Based on phylogenetic analyses and its low nucleotide sequence identities with other bornaviruses, we support the proposal that PaBV-5 be assigned to a new bornavirus species: Psittaciform 2 bornavirus. PMID:26403158
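
    Percent identity figures such as the 69-74% above depend on a pairwise alignment and on how gaps are counted. The sketch below takes two already-aligned sequences and makes one assumption: columns where either sequence has a gap are left out of the denominator. Other conventions, such as counting gaps as mismatches, give different numbers. The alignment step itself is not shown.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences.

    Assumption: columns where either sequence has a gap are excluded
    from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


print(percent_identity("ACGT-A", "ACTTAA"))  # 80.0
```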

  7. Nanopore DNA sequencing with MspA.

    PubMed

    Derrington, Ian M; Butler, Tom Z; Collins, Marcus D; Manrao, Elizabeth; Pavlenok, Mikhail; Niederweis, Michael; Gundlach, Jens H

    2010-09-14

    Nanopore sequencing has the potential to become a direct, fast, and inexpensive DNA sequencing technology. The simplest form of nanopore DNA sequencing relies on the hypothesis that individual nucleotides of single-stranded DNA passing through a nanopore will uniquely modulate an ionic current flowing through the pore, so that the record of the current yields the DNA sequence. We demonstrate that the ionic current through the engineered Mycobacterium smegmatis porin A, MspA, can distinguish all four DNA nucleotides and resolve single nucleotides in single-stranded DNA when double-stranded DNA temporarily holds the nucleotides in the pore constriction. Passing DNA with a series of double-stranded sections through MspA provides proof of principle of a simple DNA sequencing method using a nanopore. These findings highlight the importance of MspA in the future of nanopore sequencing. PMID:20798343

  8. Nanopore DNA sequencing with MspA

    PubMed Central

    Derrington, Ian M.; Butler, Tom Z.; Collins, Marcus D.; Manrao, Elizabeth; Pavlenok, Mikhail; Niederweis, Michael; Gundlach, Jens H.

    2010-01-01

    Nanopore sequencing has the potential to become a direct, fast, and inexpensive DNA sequencing technology. The simplest form of nanopore DNA sequencing relies on the hypothesis that individual nucleotides of single-stranded DNA passing through a nanopore will uniquely modulate an ionic current flowing through the pore, so that the record of the current yields the DNA sequence. We demonstrate that the ionic current through the engineered Mycobacterium smegmatis porin A, MspA, can distinguish all four DNA nucleotides and resolve single nucleotides in single-stranded DNA when double-stranded DNA temporarily holds the nucleotides in the pore constriction. Passing DNA with a series of double-stranded sections through MspA provides proof of principle of a simple DNA sequencing method using a nanopore. These findings highlight the importance of MspA in the future of nanopore sequencing. PMID:20798343
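
    The principle described in these two records is that each nucleotide held in the constriction gives its own current level. A nearest-level decoder illustrates it. This is a toy sketch under loud assumptions. The reference levels in `LEVELS` are invented placeholders, not measured MspA values. The input is also assumed to be already segmented into one mean current per held nucleotide. Practical base callers model neighbouring-base context and noise.

```python
# Hypothetical mean current levels (pA) per nucleotide: invented placeholders,
# not measured MspA values.
LEVELS = {"A": 48.0, "C": 40.0, "G": 36.0, "T": 44.0}


def call_bases(segment_means, levels=LEVELS):
    """Assign each pre-segmented current level to the nucleotide with the nearest reference level."""
    return "".join(
        min(levels, key=lambda nt: abs(levels[nt] - m)) for m in segment_means
    )


print(call_bases([47.5, 39.1, 36.4, 44.8]))  # ACGT
```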

  9. Mining frequent biological sequences based on bitmap without candidate sequence generation.

    PubMed

    Wang, Qian; Davis, Darryl N; Ren, Jiadong

    2016-02-01

    Biological sequences carry much of an organism's important genetic information. Furthermore, there are inheritance patterns related to protein function and structure that are useful for applications such as disease prediction. Frequent sequence mining is a core technique for association rule discovery, but existing algorithms suffer from low efficiency or high error rates because biological sequences have more characteristics than general sequences. In this paper, an algorithm for mining Frequent Biological Sequences based on Bitmap, FBSB, is proposed. FBSB uses bitmaps as a simple data structure and transforms each row into a quicksort list (QS-list) for sequence growth. To meet the continuity and accuracy requirements of biological sequence mining, the sequences tested during FBSB's mining process are real ones rather than generated candidates, and all frequent sequences can be mined without errors. The experimental results show that, compared with other algorithms, FBSB achieves better performance in both run time and scalability. PMID:26773937
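
    The abstract emphasises two ideas: growing patterns only from occurrences that really exist instead of generating candidates, and tracking support with bitmaps. The sketch below illustrates both together. It is not FBSB: it has no QS-list and is not tuned for speed. It mines contiguous patterns and assumes that support means the number of input sequences containing the pattern. Each pattern's support set is stored as a bitmap in a Python integer.

```python
def frequent_contiguous_patterns(sequences, min_support):
    """Mine contiguous patterns present in at least `min_support` input sequences.

    Patterns are extended only from real occurrences (no candidate generation),
    and each pattern's support set is a bitmap over sequence indices.
    """
    # pattern -> list of (sequence index, end position, exclusive)
    level = {}
    for sid, seq in enumerate(sequences):
        for pos, ch in enumerate(seq):
            level.setdefault(ch, []).append((sid, pos + 1))

    result = {}
    while level:
        next_level = {}
        for pattern, occs in level.items():
            bitmap = 0
            for sid, _ in occs:
                bitmap |= 1 << sid  # set the bit for each sequence containing the pattern
            support = bin(bitmap).count("1")
            if support < min_support:
                continue  # anti-monotone: no extension of this pattern can be frequent
            result[pattern] = support
            for sid, end in occs:
                if end < len(sequences[sid]):
                    grown = pattern + sequences[sid][end]
                    next_level.setdefault(grown, []).append((sid, end + 1))
        level = next_level
    return result


print(frequent_contiguous_patterns(["ACGT", "ACGA", "TACG"], 3))
```

    Pruning is safe because any sequence containing a longer contiguous pattern also contains its prefix. An infrequent prefix therefore never needs extending.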

  10. Preparing DNA Libraries for Multiplexed Paired-End Deep Sequencing for Illumina GA Sequencers

    PubMed Central

    Son, Mike S.; Taylor, Ronald K.

    2011-01-01

    Whole genome sequencing, also known as deep sequencing, is becoming a more affordable and efficient way to identify SNP mutations, deletions and insertions in DNA sequences across several different strains. Two major obstacles preventing the widespread use of deep sequencers are the costs involved in services used to prepare DNA libraries for sequencing and the overall accuracy of the sequencing data. This Unit describes the preparation of DNA libraries for multiplexed paired-end sequencing using the Illumina GA series sequencer. Self-preparation of DNA libraries can help reduce overall expenses, especially if optimization is required for the different samples, and use of the Illumina GA Sequencer can improve the quality of the data. PMID:21400673
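
    In a multiplexed run, each library carries its own index (barcode) sequence, and reads are later assigned back to samples by their index read. The sketch below shows one simple assignment rule and rests on several assumptions. The sample sheet `SAMPLE_INDEXES` is hypothetical and uses 6-nt indexes. Up to one mismatch is tolerated, and only when the closest index is unique. Production demultiplexing software has its own rules and options.

```python
# Hypothetical sample sheet: index sequence -> sample name.
SAMPLE_INDEXES = {"ACGTAC": "strain_1", "TGCAGT": "strain_2", "GATCCA": "strain_3"}


def hamming(a, b):
    """Mismatch count between two index sequences (a length difference counts as mismatches)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def assign_sample(index_read, indexes=SAMPLE_INDEXES, max_mismatches=1):
    """Return the sample whose index is uniquely closest within max_mismatches, else None."""
    scored = sorted((hamming(index_read, idx), name) for idx, name in indexes.items())
    best_dist, best_name = scored[0]
    if best_dist > max_mismatches:
        return None  # too far from every known index
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # ambiguous: two indexes equally close
    return best_name


print(assign_sample("ACGTAA"))  # strain_1 (one mismatch)
```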

  11. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    PubMed Central

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22, with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full-length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus, despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by GENSCAN (http://genes.mit.edu/GENSCAN.html). PMID:11070084

  12. Unlocking Short Read Sequencing for Metagenomics

    DOE PAGESBeta

    Rodrigue, Sébastien; Materna, Arne C.; Timberlake, Sonia C.; Blackburn, Matthew C.; Malmstrom, Rex R.; Alm, Eric J.; Chisholm, Sallie W.; Gilbert, Jack Anthony

    2010-07-28

    We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.
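
    Joining overlapping mates comes down to two steps. The second read is reverse-complemented, and then the longest suffix-prefix overlap with the first read that agrees within a mismatch tolerance is found. The sketch below is a simplified illustration, not SHERA's algorithm. It ignores quality scores and keeps read 1's base wherever the reads disagree. It also assumes the insert is at least as long as each read, with an assumed minimum overlap of 10 nt and a 10% mismatch tolerance.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Join a read pair into one composite read, or return None if no acceptable overlap.

    read2 is reverse-complemented onto read1's strand; the longest overlap within
    the mismatch tolerance wins. Read 1's bases are kept in the overlap.
    """
    mate = reverse_complement(read2)
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        tail = read1[len(read1) - overlap:]
        head = mate[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * overlap:
            return read1 + mate[overlap:]
    return None


# Usage: a 30-nt fragment read from both ends with 20-nt reads overlapping by 10 nt.
fragment = "ACGTTGCAAGGCTTACCGATAGCTAGGCAT"
read1 = fragment[:20]
read2 = reverse_complement(fragment[10:])
print(merge_pair(read1, read2) == fragment)  # True
```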

  13. Genome Sequencing and Analysis Conference IV

    SciTech Connect

    Not Available

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26 to 30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  14. Synchronization of motion sequences from different sources

    NASA Astrophysics Data System (ADS)

    Skurowski, Przemysław; Pruszowski, Przemysław; Peszor, Damian

    2016-06-01

    The paper describes an algorithm for the synchronization of motion sequences acquired with different motion capture (mocap) systems. The algorithm is designed for the temporal matching of motion represented as angular orientation time series obtained with different mocap systems. We employed PCA to reduce the problem to a single dimension; the algorithm then performs a twofold exhaustive search, allowing precise matching of the sequences. The method was verified with both semi-synthetic and real sequences.
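
    Once PCA has reduced each orientation time series to one dimension, matching becomes a search for the time offset that best aligns the two signals. The sketch below shows only a single-stage exhaustive search over integer lags on already 1-D signals. It scores each lag by the mean squared difference on the overlapping samples. The PCA step, the "twofold" structure of the authors' search and any sub-frame refinement are not reproduced. The function name and the scoring rule are assumptions.

```python
import random


def best_lag(reference, signal, max_lag):
    """Exhaustively search integer lags in [-max_lag, max_lag] for the one that
    minimises the mean squared difference between reference[t] and signal[t + lag]."""
    best, best_err = 0, float("inf")
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(reference[t], signal[t + lag])
                 for t in range(len(reference)) if 0 <= t + lag < len(signal)]
        if not pairs:
            continue  # no overlap at this lag
        err = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if err < best_err:
            best, best_err = lag, err
    return best


# Usage with synthetic 1-D data: `signal` is the same recording started 7 samples earlier.
rng = random.Random(1)
base = [rng.random() for _ in range(80)]
reference = base[10:60]
signal = base[3:73]
print(best_lag(reference, signal, 10))  # 7
```

    With real mocap data the overlap at extreme lags can become short enough to give spuriously low errors. A minimum-overlap requirement would be a sensible addition.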

  15. Some properties of generalized Fibonacci sequence

    NASA Astrophysics Data System (ADS)

    Chong, Chin-Yoon; Ho, C. K.

    2015-12-01

    For all non-negative integers $n$ and real constants $a$, $b$, $p$ and $q$, the generalized Fibonacci sequence $\{U_n\}$ is defined by $U_{n+2} = pU_{n+1} + qU_n$ with the initial values $U_0 = a$ and $U_1 = b$. Throughout the paper, we study some properties of the generalized Fibonacci sequence. Our results will motivate some new research problems concerning the contribution of the generalized sequence.
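
    The recurrence can be evaluated directly, which makes it easy to check special cases. The sketch below generates the first terms for given $a$, $b$, $p$ and $q$. Choosing $p = q = 1$ with $(a, b) = (0, 1)$ gives the ordinary Fibonacci numbers, $(a, b) = (2, 1)$ gives the Lucas numbers, and $p = 2, q = 1, (a, b) = (0, 1)$ gives the Pell numbers. The function name is illustrative.

```python
def generalized_fibonacci(a, b, p, q, n_terms):
    """First n_terms of U_0 = a, U_1 = b, U_{n+2} = p*U_{n+1} + q*U_n."""
    terms = [a, b][:n_terms]
    while len(terms) < n_terms:
        terms.append(p * terms[-1] + q * terms[-2])
    return terms


print(generalized_fibonacci(0, 1, 1, 1, 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```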

  16. Fractal analysis of DNA sequence data

    SciTech Connect

    Berthelsen, C.L.

    1993-01-01

    DNA sequence databases are growing at an almost exponential rate. New analysis methods are needed to extract knowledge about the organization of nucleotides from this vast amount of data. Fractal analysis is a new scientific paradigm that has been used successfully in many domains including the biological and physical sciences. Biological growth is a nonlinear dynamic process and some have suggested that to consider fractal geometry as a biological design principle may be most productive. This research is an exploratory study of the application of fractal analysis to DNA sequence data. A simple random fractal, the random walk, is used to represent DNA sequences. The fractal dimension of these walks is then estimated using the "sandbox method." Analysis of 164 human DNA sequences compared to three types of control sequences (random, base-content matched, and dimer-content matched) reveals that long-range correlations are present in DNA that are not explained by base or dimer frequencies. The study also revealed that the fractal dimension of coding sequences was significantly lower than that of sequences that were primarily noncoding, indicating the presence of longer-range correlations in functional sequences. The multifractal spectrum is used to analyze fractals that are heterogeneous and have a different fractal dimension for subsets with different scalings. The multifractal spectrum of the random walks of twelve mitochondrial genome sequences was estimated. Eight vertebrate mtDNA sequences had uniformly lower spectra values than did four invertebrate mtDNA sequences. Thus, vertebrate mitochondria show significantly longer-range correlations than do invertebrate mitochondria. The higher multifractal spectra values for invertebrate mitochondria suggest a more random organization of the sequences. This research also includes considerable theoretical work on the effects of finite size, embedding dimension, and scaling ranges.

  17. Fractal Analysis of DNA Sequence Data

    NASA Astrophysics Data System (ADS)

    Berthelsen, Cheryl Lynn

    DNA sequence databases are growing at an almost exponential rate. New analysis methods are needed to extract knowledge about the organization of nucleotides from this vast amount of data. Fractal analysis is a new scientific paradigm that has been used successfully in many domains including the biological and physical sciences. Biological growth is a nonlinear dynamic process and some have suggested that to consider fractal geometry as a biological design principle may be most productive. This research is an exploratory study of the application of fractal analysis to DNA sequence data. A simple random fractal, the random walk, is used to represent DNA sequences. The fractal dimension of these walks is then estimated using the "sandbox method." Analysis of 164 human DNA sequences compared to three types of control sequences (random, base-content matched, and dimer-content matched) reveals that long-range correlations are present in DNA that are not explained by base or dimer frequencies. The study also revealed that the fractal dimension of coding sequences was significantly lower than sequences that were primarily noncoding, indicating the presence of longer-range correlations in functional sequences. The multifractal spectrum is used to analyze fractals that are heterogeneous and have a different fractal dimension for subsets with different scalings. The multifractal spectrum of the random walks of twelve mitochondrial genome sequences was estimated. Eight vertebrate mtDNA sequences had uniformly lower spectra values than did four invertebrate mtDNA sequences. Thus, vertebrate mitochondria show significantly longer-range correlations than do invertebrate mitochondria. The higher multifractal spectra values for invertebrate mitochondria suggest a more random organization of the sequences. This research also includes considerable theoretical work on the effects of finite size, embedding dimension, and scaling ranges.

  18. Multiple tag labeling method for DNA sequencing

    DOEpatents

    Mathies, R.A.; Huang, X.C.; Quesada, M.A.

    1995-07-25

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.

  19. Nucleotide sequences encoding a thermostable alkaline protease

    DOEpatents

    Wilson, David B.; Lao, Guifang

    1998-01-01

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium.

  20. REvolver: modeling sequence evolution under domain constraints.

    PubMed

    Koestler, Tina; von Haeseler, Arndt; Ebersberger, Ingo

    2012-09-01

    Simulating the change of protein sequences over time in a biologically realistic way is fundamental for a broad range of studies with a focus on evolution. It is thus problematic that simulators typically evolve individual sites of a sequence identically and independently. More realistic simulations are possible; however, they are often prevented by limited knowledge of site-specific evolutionary constraints or functional dependencies between amino acids. As a consequence, a protein's functional and structural characteristics are rapidly lost in the course of simulated evolution. Here, we present REvolver (www.cibiv.at/software/revolver), a program that simulates protein sequence alteration such that evolutionarily stable sequence characteristics, like functional domains, are maintained. For this purpose, REvolver recruits profile hidden Markov models (pHMMs) to parameterize site-specific models of sequence evolution in an automated fashion. pHMMs derived from alignments of homologous proteins or protein domains capture information about which sequence sites remained conserved over time and where in a sequence insertions or deletions are more likely to occur. Thus, they describe constraints on the evolutionary process acting on these sequences. To demonstrate the performance of REvolver as well as its applicability in large-scale simulation studies, we evolved the entire human proteome up to 1.5 expected substitutions per site. Simultaneously, we analyzed the preservation of Pfam and SMART domains in the simulated sequences over time. REvolver preserved 92% of the Pfam domains originally present in the human sequences. This value drops to 15% when traditional models of amino acid sequence evolution are used. Thus, REvolver represents a significant advance toward a realistic simulation of protein sequence evolution on a proteome-wide scale. Further, REvolver facilitates the simulation of a protein family with a user-defined domain architecture at

  1. Discrete sequence prediction and its applications

    NASA Technical Reports Server (NTRS)

    Laird, Philip

    1992-01-01

    Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We apply a simple and practical sequence-prediction algorithm called TDAG. The TDAG algorithm is first tested by comparing its performance with that of some common data compression algorithms. It is then adapted to the detailed requirements of dynamic program optimization, with excellent results.
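
    Prediction of this kind can be illustrated with a simple variable-order context model. Such a model counts which symbol followed each recent context of up to `max_depth` symbols. It then predicts from the longest context it has seen before, backing off to shorter contexts when needed. This is a simplified model from the same general family, not Laird's TDAG. It does not reproduce TDAG's data structure, node-growth rules or parameters, and the class name and `max_depth` are illustrative assumptions.

```python
from collections import Counter, defaultdict


class ContextPredictor:
    """Variable-order context model: predicts the next symbol from the longest
    previously seen context of up to max_depth symbols."""

    def __init__(self, max_depth=3):
        self.max_depth = max_depth
        self.counts = defaultdict(Counter)  # context tuple -> Counter of next symbols
        self.history = []

    def update(self, symbol):
        # Record `symbol` as a successor of every suffix context of length 0..max_depth.
        for k in range(min(self.max_depth, len(self.history)) + 1):
            context = tuple(self.history[len(self.history) - k:])
            self.counts[context][symbol] += 1
        self.history.append(symbol)

    def predict(self):
        # Back off from the longest context to shorter ones until one has been seen.
        for k in range(min(self.max_depth, len(self.history)), -1, -1):
            context = tuple(self.history[len(self.history) - k:])
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
        return None  # nothing observed yet


p = ContextPredictor(max_depth=2)
for s in "abcabcabcab":
    p.update(s)
print(p.predict())  # c
```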

  2. Nucleotide sequences encoding a thermostable alkaline protease

    DOEpatents

    Wilson, D.B.; Lao, G.

    1998-01-06

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium. 3 figs.

  3. FRESCO: Referential compression of highly similar sequences.

    PubMed

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, which store only the differences between a to-be-compressed input and a known reference sequence, have gained considerable interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios far beyond the state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware. PMID:24524158
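
    The core idea of referential compression, storing only how an input differs from a known reference, can be sketched as a greedy matcher that emits (reference position, match length, next literal) triples. This is a minimal illustration, assuming a k-mer index over the reference and exact matches only; it is not FRESCO's implementation, and the function names are hypothetical.

```python
def build_index(reference, k):
    """Map each k-mer of the reference to the positions where it starts."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index

def compress(target, reference, k=4):
    """Greedy referential compression into (ref_pos, match_len, literal) triples."""
    index = build_index(reference, k)
    out, i = [], 0
    while i < len(target):
        best_pos, best_len = 0, 0
        # Extend every reference hit of the next k-mer and keep the longest match.
        for pos in index.get(target[i:i + k], []):
            length = 0
            while (i + length < len(target) and pos + length < len(reference)
                   and target[i + length] == reference[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        i += best_len
        literal = target[i] if i < len(target) else ""  # first mismatching char
        out.append((best_pos, best_len, literal))
        i += 1
    return out

def decompress(triples, reference):
    """Rebuild the target by copying reference slices and appending literals."""
    return "".join(reference[pos:pos + length] + literal
                   for pos, length, literal in triples)

ref = "ACGTACGTTTGACCA"
target = "ACGTACGATTGACCA"  # one substitution relative to ref
triples = compress(target, ref)
print(triples)  # [(0, 7, 'A'), (8, 7, '')]
assert decompress(triples, ref) == target
```

    A 15-character input collapses to two triples here; on highly similar genomes, long matches make the triple list tiny relative to the input, which is what drives the high ratios described above.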

  4. Unlocking Short Read Sequencing for Metagenomics.

    SciTech Connect

    Rodrigue, S.; Materna, A. C.; Timberlake, S. C.; Blackburn, M. C.; Malmstrom, R. R.; Alm, E. J.; Chisholm, S. W.

    2010-01-01

    We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.
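
    Joining overlapping mates into a composite read can be sketched as follows: reverse-complement the second read, find the longest suffix/prefix overlap with few mismatches, and concatenate. This is a simplified illustration, not the SHERA algorithm. It ignores quality scores (a real joiner would use them to resolve disagreements in the overlap), handles only the case where the end of read 1 overlaps the start of read 2's reverse complement, and the thresholds `min_overlap` and `max_mismatch_frac` are assumed values.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Join an overlapping read pair into one composite read, or return None.

    read2 is assumed to come from the opposite strand, so it is
    reverse-complemented before the overlap search.
    """
    r2 = revcomp(read2)
    for olen in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        a, b = read1[-olen:], r2[:olen]  # candidate overlapping segments
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch_frac * olen:
            # Keep read1's bases in the overlap; append the rest of read 2.
            return read1 + r2[olen:]
    return None

fragment = "GATTACAGCTTGCAGGCATCCGTAAGTC"  # 28 bp insert, 20 bp reads
read1, read2 = fragment[:20], revcomp(fragment[8:])
print(merge_pair(read1, read2) == fragment)  # True
```

    Searching from the longest overlap downward favours the most conservative join; a mismatch allowance tolerates sequencing errors at the cost of occasional spurious joins in repetitive sequence.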

  5. Multiple tag labeling method for DNA sequencing

    DOEpatents

    Mathies, Richard A.; Huang, Xiaohua C.; Quesada, Mark A.

    1995-01-01

    A DNA sequencing method is described which uses single-lane or single-channel electrophoresis. Sequencing fragments are separated in that lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels.

  6. On the Origin of Sequence.

    PubMed

    Gulik, Peter T S van der

    2015-01-01

    Three aspects which make planet Earth special, and which must be taken into consideration with respect to the emergence of peptides, are the mineralogical composition, the Moon, which is in the same size class, and the triple environment consisting of ocean, atmosphere, and continent. GlyGly is a remarkable peptide because it stimulates peptide bond formation in the Salt-Induced Peptide Formation reaction. The role glycine and aspartic acid play in the active site of RNA polymerase is remarkable too. GlyGly might have been the original product of coded peptide synthesis because of its importance in stimulating the production of oligopeptides with a high aspartic acid content, which protected small RNA molecules by binding Mg2+ ions. The feedback loop, which is closed by having RNA molecules produce GlyGly, is proposed as the essential element fundamental to life. Having this system running, longer sequences could evolve, gradually solving the problem of error catastrophe. The basic structure of the standard genetic code (8 fourfold degenerate codon boxes and 8 split codon boxes) is an example of the way information concerning the emergence of life is frozen in the biological constitution of organisms: the structure of the code contains historical information. PMID:26580656

  7. On the Origin of Sequence

    PubMed Central

    van der Gulik, Peter T. S.

    2015-01-01

    Three aspects which make planet Earth special, and which must be taken into consideration with respect to the emergence of peptides, are the mineralogical composition, the Moon, which is in the same size class, and the triple environment consisting of ocean, atmosphere, and continent. GlyGly is a remarkable peptide because it stimulates peptide bond formation in the Salt-Induced Peptide Formation reaction. The role glycine and aspartic acid play in the active site of RNA polymerase is remarkable too. GlyGly might have been the original product of coded peptide synthesis because of its importance in stimulating the production of oligopeptides with a high aspartic acid content, which protected small RNA molecules by binding Mg2+ ions. The feedback loop, which is closed by having RNA molecules produce GlyGly, is proposed as the essential element fundamental to life. Having this system running, longer sequences could evolve, gradually solving the problem of error catastrophe. The basic structure of the standard genetic code (8 fourfold degenerate codon boxes and 8 split codon boxes) is an example of the way information concerning the emergence of life is frozen in the biological constitution of organisms: the structure of the code contains historical information. PMID:26580656
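
    The claim about the structure of the standard genetic code can be checked directly. Grouping the 64 codons by their first two bases gives 16 "boxes" of four codons each; a box is fourfold degenerate when all four codons encode the same amino acid and split otherwise. The 64-character string below is the standard code written in TCAG order (TTT, TTC, TTA, TTG, TCT, ...), with `*` marking stop codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_boxes(code):
    """Split the 16 first-two-base boxes into fourfold-degenerate and split boxes."""
    fourfold, split = [], []
    for b1, b2 in product(BASES, repeat=2):
        box = b1 + b2
        amino_acids = {code[box + b3] for b3 in BASES}
        (fourfold if len(amino_acids) == 1 else split).append(box)
    return fourfold, split

fourfold, split = classify_boxes(CODE)
print(len(fourfold), len(split))  # 8 8
print(fourfold)  # ['TC', 'CT', 'CC', 'CG', 'AC', 'GT', 'GC', 'GG']
```

    The fourfold boxes are those of Ser (TCN), Leu (CTN), Pro, Arg (CGN), Thr, Val, Ala and Gly; the other eight boxes each encode two or more amino acids or stops.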

  8. Quantitative phenotyping via deep barcode sequencing.

    PubMed

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale. PMID:19622793
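
    The counting step behind barcode-based fitness profiling can be sketched in a few lines: tally reads per known barcode in a control pool and a treated pool, normalize by library size, and compare strains by a log2 ratio. This is a simplified illustration, not the published Bar-seq pipeline; the fixed barcode position, exact matching, the pseudocount of 1, and the function names are all assumptions.

```python
import math
from collections import Counter

def count_barcodes(reads, barcode_to_strain, start=0, length=20):
    """Tally reads whose barcode (at an assumed fixed position) is known."""
    counts = Counter()
    for read in reads:
        strain = barcode_to_strain.get(read[start:start + length])
        if strain is not None:
            counts[strain] += 1
    return counts

def log2_fitness_ratios(control, treated, pseudocount=1.0):
    """Per-strain log2(treated / control) after normalizing to library size."""
    total_c = sum(control.values())
    total_t = sum(treated.values())
    return {
        s: math.log2(((treated[s] + pseudocount) / total_t)
                     / ((control[s] + pseudocount) / total_c))
        for s in set(control) | set(treated)
    }

barcodes = {"AAAA": "strainA", "CCCC": "strainB"}  # toy 4-nt barcodes
control = count_barcodes(["AAAAGT"] * 10 + ["CCCCGT"] * 10, barcodes, length=4)
treated = count_barcodes(["AAAAGT"] * 10 + ["CCCCGT"] * 1, barcodes, length=4)
print(log2_fitness_ratios(control, treated))  # strainB depleted, strainA relatively enriched
```

    A strongly negative ratio flags a strain that is sensitive to the treatment, which is how a deletion strain can point to a drug target in a chemogenomic screen.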

  9. Using SEQUEST with Theoretically Complete Sequence Databases

    NASA Astrophysics Data System (ADS)

    Sadygov, Rovshan G.

    2015-11-01

    SEQUEST has long been used to identify peptides/proteins from their tandem mass spectra and protein sequence databases. The algorithm has proven hugely successful for its sensitivity and specificity in identifying peptides/proteins whose sequences are present in the protein sequence databases. In this work, we attempt a new use for the algorithm: applying it to search a complete list of theoretically possible peptides, a de novo-like form of sequencing. We used freely available mass spectral data and determined a set of unique peptides identified by SEQUEST. Using the masses of these peptides and a mass accuracy of 0.001 Da, we created a database of all theoretically possible peptide sequences corresponding to the precursor masses. We used our recently developed algorithm for determining all amino acid compositions corresponding to a mass interval, and used a lexicographic ordering to generate theoretical sequences from the compositions. The newly generated theoretical database was many-fold more complex than the original protein sequence database. We used SEQUEST to search for and identify the best matches to the spectra among all theoretically possible peptide sequences. We found that the SEQUEST cross-correlation score ranked the correct peptide match among the top sequence matches. The results testify to the high specificity of SEQUEST when combined with high mass accuracy for intact peptides.
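
    The database-generation step described above (all amino acid compositions within a mass window, then their orderings in lexicographic order) can be illustrated with a brute-force sketch. It is not the author's composition algorithm and is practical only for small peptides; it uses standard monoisotopic residue masses, ignores modifications, and lists only L for the isobaric pair I/L. The example shows why 0.001 Da accuracy still leaves ambiguity: the residues G + A and Q differ by only about 0.00001 Da.

```python
from itertools import permutations

# Monoisotopic residue masses (Da); I is omitted because it is isobaric with L.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini

def peptide_mass(seq):
    return sum(RESIDUE_MASS[r] for r in seq) + WATER

def compositions(target, tol=0.001):
    """All residue multisets whose peptide mass lies within tol of target."""
    residues = sorted(RESIDUE_MASS)
    found = []

    def extend(start, chosen, mass):
        if chosen and abs(mass + WATER - target) <= tol:
            found.append(tuple(chosen))
        for i in range(start, len(residues)):  # non-decreasing index: multisets
            r = residues[i]
            if mass + RESIDUE_MASS[r] + WATER > target + tol:
                continue  # too heavy; residues are not sorted by mass
            chosen.append(r)
            extend(i, chosen, mass + RESIDUE_MASS[r])
            chosen.pop()

    extend(0, [], 0.0)
    return found

def sequences(composition):
    """Distinct orderings of a composition, in lexicographic order."""
    return sorted({"".join(p) for p in permutations(composition)})

for comp in compositions(peptide_mass("GA")):
    print(comp, sequences(comp))
# ('A', 'G') ['AG', 'GA']
# ('Q',) ['Q']
```

    Even at this tiny mass, the candidate set already contains three sequences, which hints at why a theoretically complete database is many-fold larger than a protein sequence database.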

  10. Finding Sequences for over 270 Orphan Enzymes

    PubMed Central

    Shearer, Alexander G.; Altman, Tomer; Rhee, Christine D.

    2014-01-01

    Despite advances in sequencing technology, there are still significant numbers of well-characterized enzymatic activities for which there are no known associated sequences. These ‘orphan enzymes’ represent glaring holes in our biological understanding, and it is a top priority to reunite them with their coding sequences. Here we report a methodology for resolving orphan enzymes through a combination of database search and literature review. Using this method we were able to reconnect over 270 orphan enzymes with their corresponding sequence. This success points toward how we can systematically eliminate the remaining orphan enzymes and prevent the introduction of future orphan enzymes. PMID:24826896

  11. Repetitive sequence environment distinguishes housekeeping genes

    PubMed Central

    Eller, C. Daniel; Regelson, Moira; Merriman, Barry; Nelson, Stan; Horvath, Steve; Marahrens, York

    2007-01-01

    Housekeeping genes are expressed across a wide variety of tissues. Since repetitive sequences have been reported to influence the expression of individual genes, we employed a novel approach to determine whether housekeeping genes can be distinguished from tissue-specific genes by their repetitive sequence context. We show that Alu elements are more highly concentrated around housekeeping genes while various longer (>400-bp) repetitive sequences ("repeats"), including Long Interspersed Nuclear Element 1 (LINE-1) elements, are excluded from these regions. We further show that isochore membership does not distinguish housekeeping genes from tissue-specific genes and that repetitive sequence environment distinguishes housekeeping genes from tissue-specific genes in every isochore. The distinct repetitive sequence environment, in combination with other previously published sequence properties of housekeeping genes, was used to develop a method of predicting housekeeping genes on the basis of DNA sequence alone. Using expression across tissue types as a measure of success, we demonstrate that repetitive sequence environment is by far the most important sequence feature identified to date for distinguishing housekeeping genes. PMID:17141428

  12. Next generation sequencing of viral RNA genomes

    PubMed Central

    2013-01-01

    Background With the advent of Next Generation Sequencing (NGS) technologies, the ability to generate large amounts of sequence data has revolutionized the genomics field. Most RNA viruses have relatively small genomes in comparison to other organisms and as such, would appear to be an obvious success story for the use of NGS technologies. However, due to the relatively low abundance of viral RNA in relation to host RNA, RNA viruses have proved relatively difficult to sequence using NGS technologies. Here we detail a simple, robust methodology, without the use of ultra-centrifugation, filtration or viral enrichment protocols, to prepare RNA from diagnostic clinical tissue samples, cell monolayers and tissue culture supernatant, for subsequent sequencing on the Roche 454 platform. Results As representative RNA viruses, full genome sequence was successfully obtained from known lyssaviruses belonging to recognized species and a novel lyssavirus species using these protocols and assembling the reads using de novo algorithms. Furthermore, genome sequences were generated from considerably less than 200 ng of RNA, indicating that manufacturers’ minimum template guidance is conservative. In addition to obtaining genome consensus sequence, a high proportion of SNPs (Single Nucleotide Polymorphisms) were identified in the majority of samples analyzed. Conclusions The approaches reported clearly facilitate successful full genome lyssavirus sequencing and can be universally applied to discovering and obtaining consensus genome sequences of RNA viruses from a variety of sources. PMID:23822119

  13. Genomic sequencing of Pleistocene cave bears

    SciTech Connect

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Pääbo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of ~1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  14. A measurement of disorder in binary sequences

    NASA Astrophysics Data System (ADS)

    Gong, Longyan; Wang, Haihong; Cheng, Weiwen; Zhao, Shengmei

    2015-03-01

    We propose a complex quantity, A_L, to characterize the degree of disorder of binary symbolic sequences of length L. As examples, we apply it to typical random and deterministic sequences. One kind of random sequence is generated from a periodic binary sequence and the other from the logistic map. The deterministic sequences are the Fibonacci and Thue-Morse sequences. For these sequences, we find that the modulus of A_L, denoted |A_L|, is a (statistically) equivalent quantity to the Boltzmann entropy, the metric entropy, the conditional block entropy and/or other quantities, so it is a useful quantitative measure of disorder. It can serve as an index to discern which sequence is more disordered. Moreover, a single value of |A_L| characterizes the overall disorder. It has an extremely low computational cost and can easily be realized experimentally. For these reasons, we believe that the proposed measure is a valuable complement to existing measures of disorder in symbolic sequences.
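
    The abstract does not define A_L, so it is not reproduced here. As a point of comparison, the sketch below generates the sequence types mentioned (Fibonacci, Thue-Morse, and a binarized logistic map) and computes Shannon block entropy, one of the standard disorder measures the abstract compares |A_L| against. The seed `x=0.3`, the parameter `r=4.0`, and the 0.5 threshold for the logistic map are assumptions for illustration.

```python
import math
from collections import Counter

def fibonacci_word(n):
    """Prefix of length n of the Fibonacci word 0100101001001..."""
    a, b = "0", "01"
    while len(b) < n:
        a, b = b, b + a
    return b[:n]

def thue_morse_word(n):
    """t(i) = parity of the number of 1 bits in i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))

def logistic_binary(n, x=0.3, r=4.0):
    """Binarize a logistic-map orbit: 1 if x > 0.5, else 0."""
    out = []
    for _ in range(n):
        out.append("1" if x > 0.5 else "0")
        x = r * x * (1 - x)
    return "".join(out)

def block_entropy(seq, L):
    """Shannon entropy (bits) of the overlapping length-L blocks of seq."""
    blocks = Counter(seq[i:i + L] for i in range(len(seq) - L + 1))
    total = sum(blocks.values())
    return -sum(c / total * math.log2(c / total) for c in blocks.values())

n, L = 10000, 4
for name, s in [("Fibonacci", fibonacci_word(n)),
                ("Thue-Morse", thue_morse_word(n)),
                ("logistic", logistic_binary(n))]:
    print(name, round(block_entropy(s, L) / L, 3))  # bits per symbol
```

    The deterministic sequences have few distinct blocks (the Fibonacci word has exactly L + 1 blocks of length L), so their per-symbol entropy is well below the 1 bit expected for a fair random sequence.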

  15. Pyrosequencing sheds light on DNA sequencing.

    PubMed

    Ronaghi, M

    2001-01-01

    DNA sequencing is one of the most important platforms for the study of biological systems today. Sequence determination is most commonly performed using dideoxy chain termination technology. Recently, pyrosequencing has emerged as a new sequencing methodology. This technique is a widely applicable, alternative technology for the detailed characterization of nucleic acids. Pyrosequencing offers potential advantages in accuracy, flexibility, and parallel processing, and it can be easily automated. Furthermore, the technique dispenses with the need for labeled primers, labeled nucleotides, and gel electrophoresis. This article considers key features of pyrosequencing technology, including the general principles, enzyme properties, sequencing modes, instrumentation, and potential applications. PMID:11156611
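
    The principle of pyrosequencing readout can be shown with an idealized flowgram. Nucleotides are dispensed one at a time in a fixed cyclic order; each incorporation releases pyrophosphate, which an enzyme cascade converts into light, so each flow's signal is roughly proportional to the number of identical bases incorporated. The dispensation order `"TACG"` and the simple rounding base caller below are assumptions; real signals need noise modelling, particularly for long homopolymers.

```python
def simulate_flowgram(read, order="TACG"):
    """Ideal signals: one per dispensed nucleotide, equal to the homopolymer
    length incorporated at that flow (0 when the base does not match)."""
    if set(read) - set(order):
        raise ValueError("read contains bases that are never dispensed")
    signals, pos, flow = [], 0, 0
    while pos < len(read):
        base = order[flow % len(order)]
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append(float(run))
        flow += 1
    return signals

def call_bases(signals, order="TACG"):
    """Round each signal to a homopolymer length and emit that many bases."""
    return "".join(order[i % len(order)] * round(s) for i, s in enumerate(signals))

print(simulate_flowgram("AAGTC"))  # [0.0, 2.0, 0.0, 1.0, 1.0, 0.0, 1.0]
print(call_bases([0.1, 2.2, 0.05, 0.9, 1.1, 0.2, 1.3]))  # AAGTC
```

    Because a homopolymer is read from a single signal height, its length depends on distinguishing, say, 5 from 6 units of light, which is why homopolymers are the classic error mode of this chemistry.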

  16. Learning of Sensory Sequences in Cerebellar Patients

    PubMed Central

    Frings, Markus; Boenisch, Raoul; Gerwig, Marcus; Diener, Hans-Christoph; Timmann, Dagmar

    2004-01-01

    A possible role of the cerebellum in detecting and recognizing event sequences has been proposed. The present study sought to determine whether patients with cerebellar lesions are impaired in the acquisition and discrimination of sequences of sensory stimuli of different modalities. A group of 26 cerebellar patients and 26 controls matched for age, sex, handedness, musicality, and level of education were tested. Auditory and visual sensory sequences were presented out of different sensory pattern categories (tones with different acoustic frequencies and durations, visual stimuli with different spatial locations and colors, sequential vision of irregular shapes) and different ranges of inter-cue time intervals (fast and slow). Motor requirements were small, with vocal responses and no time restrictions. Perception of visual and acoustic stimuli was generally preserved in patients and controls. The number of errors was significantly higher in the faster tempo of sequence presentation in learning of sequences of tones of different frequencies and in learning of sequences of visual stimuli of different spatial locations and different colors. No difference in tempo between the groups was shown. The total number of errors between the two groups was identical in the sequence conditions. No major disturbances in acquisition or discrimination of various sensory sequences were observed in the group of cerebellar patients. Sequence learning may be impaired only in tasks with significant motor demands. PMID:15169865

  17. Recursive sequences in first-year calculus

    NASA Astrophysics Data System (ADS)

    Krainer, Thomas

    2016-02-01

    This article provides ready-to-use supplementary material on recursive sequences for a second-semester calculus class. It equips first-year calculus students with a basic methodical procedure based on which they can conduct a rigorous convergence or divergence analysis of many simple recursive sequences on their own without the need to invoke inductive arguments as is typically required in calculus textbooks. The sequences that are accessible to this kind of analysis are predominantly (eventually) monotonic, but also certain recursive sequences that alternate around their limit point as they converge can be considered.

  18. Towards modeling DNA sequences as automata

    NASA Astrophysics Data System (ADS)

    Burks, Christian; Farmer, Doyne

    1984-01-01

    We seek to describe a starting point for modeling the evolution and role of DNA sequences within the framework of cellular automata by discussing the current understanding of genetic information storage in DNA sequences. This includes alternately viewing the role of DNA in living organisms as a simple scheme and as a complex scheme; a brief review of strategies for identifying and classifying patterns in DNA sequences; and finally, notes towards establishing DNA-like automata models, including a discussion of the extent of experimentally determined DNA sequence data present in the database at Los Alamos.

  19. Multiplexed microsatellite recovery using massively parallel sequencing

    USGS Publications Warehouse

    Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.

    2011-01-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356,958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).
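
    Identifying reads that contain di- or trinucleotide microsatellites can be sketched with a regular expression that looks for a 2- or 3-base motif repeated in tandem. The abstract does not give the screening criteria, so the minimum of five perfect repeats and the exclusion of homopolymer motifs below are assumptions, as is the function name.

```python
import re

def find_microsatellites(seq, motif_lengths=(2, 3), min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats."""
    hits = []
    for k in motif_lengths:
        # A k-base motif followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAAAAAA
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits

read = "GG" + "AC" * 6 + "TT" + "AGC" * 5 + "G"
print(find_microsatellites(read))  # [(2, 'AC', 6), (16, 'AGC', 5)]
```

    Perfect-repeat matching is strict; real SSR finders usually tolerate a few interruptions, and marker development additionally needs unique flanking sequence for primer design.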

  20. Picoeukaryotic sequences in the Sargasso Sea metagenome

    PubMed Central

    Piganeau, Gwenael; Desdevises, Yves; Derelle, Evelyne; Moreau, Herve

    2008-01-01

    Background With genome sequencing becoming more and more affordable, environmental shotgun sequencing of the microorganisms present in an environment generates a challenging amount of sequence data for the scientific community. These sequence data enable the diversity of the microbial world and the metabolic pathways within an environment to be investigated, a previously unthinkable achievement when using traditional approaches. DNA sequence data assembled from extracts of 0.8 μm filtered Sargasso seawater unveiled an unprecedented glimpse of marine prokaryotic diversity and gene content. Serendipitously, many sequences representing picoeukaryotes (cell size <2 μm) were also present within this dataset. We investigated the picoeukaryotic diversity of this database by searching sequences containing homologs of eight nuclear anchor genes that are well conserved throughout the eukaryotic lineage, as well as one chloroplastic and one mitochondrial gene. Results We found up to 41 distinct eukaryotic scaffolds, with a broad phylogenetic spread on the eukaryotic tree of life. The average eukaryotic scaffold size is 2,909 bp, with one gap every 1,253 bp. Strikingly, the AT frequency of the eukaryotic sequences (51.4%) is significantly lower than the average AT frequency of the metagenome (61.4%). This represents 4% to 18% of the estimated prokaryotic diversity, depending on the average prokaryotic versus eukaryotic genome size ratio. Conclusion Despite similar cell size, eukaryotic sequences of the Sargasso Sea metagenome have higher GC content, suggesting that different environmental pressures affect the evolution of their base composition. PMID:18179699
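
    The AT frequencies quoted above are base-composition fractions. A minimal sketch, assuming ambiguous bases (such as N) and gaps are excluded, shows one design choice that matters when summarizing many scaffolds: pooling base counts weights each scaffold by its length, whereas averaging per-scaffold fractions does not. Which convention the authors used is not stated here.

```python
def base_counts(seq):
    """Return (A+T count, G+C count), ignoring N, gaps and other symbols."""
    seq = seq.upper()
    return seq.count("A") + seq.count("T"), seq.count("G") + seq.count("C")

def at_fraction(seq):
    at, gc = base_counts(seq)
    return at / (at + gc) if at + gc else float("nan")

def pooled_at_fraction(seqs):
    """AT fraction of all sequences taken together (length-weighted)."""
    at = gc = 0
    for s in seqs:
        a, g = base_counts(s)
        at += a
        gc += g
    return at / (at + gc) if at + gc else float("nan")

scaffolds = ["AAAA", "GG"]
print(pooled_at_fraction(scaffolds))  # 4/6, length-weighted
print(sum(at_fraction(s) for s in scaffolds) / len(scaffolds))  # 0.5, unweighted mean
```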

  1. Small scale sequence automation pays big dividends

    NASA Technical Reports Server (NTRS)

    Nelson, Bill

    1994-01-01

    Galileo sequence design and integration are supported by a suite of formal software tools. Sequence review, however, is largely a manual process with reviewers scanning hundreds of pages of cryptic computer printouts to verify sequence correctness. Beginning in 1990, a series of small, PC based sequence review tools evolved. Each tool performs a specific task but all have a common 'look and feel'. The narrow focus of each tool means simpler operation, and easier creation, testing, and maintenance. Benefits from these tools are (1) decreased review time by factors of 5 to 20 or more with a concomitant reduction in staffing, (2) increased review accuracy, and (3) excellent returns on time invested.

  2. Corruption of genomic databases with anomalous sequence.

    PubMed Central

    Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L

    1992-01-01

    We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%. PMID:1614861
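
    A crude version of the screen described above can be sketched with shared k-mers: index every k-mer of known vector sequences on both strands, then report the spans of an entry covered by those k-mers. This is a minimal illustration with an assumed default k of 20 (10 in the example); it is not the authors' method, and in practice an alignment search such as BLAST against a curated vector collection is the usual route, which also yields percent identities like those quoted in the abstract.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def vector_kmer_index(vectors, k=20):
    """All k-mers of the vector sequences, on both strands."""
    index = set()
    for v in vectors:
        index |= kmers(v, k)
        index |= kmers(v.translate(COMPLEMENT)[::-1], k)
    return index

def flag_vector_regions(entry, index, k=20):
    """Merged (start, end) spans of entry covered by vector k-mers."""
    spans = []
    for i in range(len(entry) - k + 1):
        if entry[i:i + k] in index:
            if spans and i <= spans[-1][1]:
                spans[-1][1] = i + k  # extend the current span
            else:
                spans.append([i, i + k])
    return [tuple(s) for s in spans]

vector = "GATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTTGGCAC"  # toy vector fragment
index = vector_kmer_index([vector], k=10)
entry = "A" * 15 + vector[5:35] + "A" * 15  # 30 bp of vector embedded in an entry
print(flag_vector_regions(entry, index, k=10))  # [(15, 45)]
```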

  3. On a Class of Thue-Morse Type Sequences

    NASA Astrophysics Data System (ADS)

    Astudillo, Ricardo

    2003-12-01

    We consider a class of binary sequences that generalize the Thue-Morse sequence. In particular, we investigate the occurrences of palindromes in such sequences. We also introduce the notion of the first difference of a binary sequence and characterize first differences of our class of Thue-Morse type sequences. Finally, we define the concept of a "change sequence" of a given binary sequence, a sequence which encodes the positions at which a binary sequence changes values. We characterize the change sequences corresponding to our class of Thue-Morse type sequences.
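
    The notions above can be made concrete for the classical Thue-Morse sequence, the base case of the class studied. The general class is not defined in this abstract, and the definition of "first difference" used here (consecutive terms combined modulo 2, i.e. XOR) is an assumption; the change sequence is taken as the list of indices at which the value switches.

```python
def thue_morse(n):
    """First n terms: t(i) = parity of the number of 1 bits in i."""
    return [bin(i).count("1") % 2 for i in range(n)]

def first_difference(x):
    """Assumed definition: d(i) = x(i+1) - x(i) mod 2, i.e. x(i+1) XOR x(i)."""
    return [a ^ b for a, b in zip(x, x[1:])]

def change_positions(x):
    """Indices i at which the sequence changes value, i.e. x(i) != x(i-1)."""
    return [i for i in range(1, len(x)) if x[i] != x[i - 1]]

def palindromic_factors(x, length):
    """Distinct palindromes of the given length occurring in x."""
    words = {tuple(x[i:i + length]) for i in range(len(x) - length + 1)}
    return {w for w in words if w == w[::-1]}

tm = thue_morse(8)
print(tm)                    # [0, 1, 1, 0, 1, 0, 0, 1]
print(first_difference(tm))  # [1, 0, 1, 1, 1, 0, 1]
print(change_positions(tm))  # [1, 3, 4, 5, 7]
```

    With this definition, the first difference of the Thue-Morse sequence is the period-doubling sequence: d(i) = 1 exactly when the number of trailing zeros in the binary expansion of i + 1 is even.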

  4. Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding

    PubMed Central

    Brozynska, Marta; Furtado, Agnelo; Henry, Robert James

    2014-01-01

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technologies) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis. PMID:25329378

  5. Transfer in Motor Sequence Learning: Effects of Practice Schedule and Sequence Context

    PubMed Central

    Müssgens, Diana M.; Ullén, Fredrik

    2015-01-01

    Transfer (i.e., the application of a learned skill in a novel context) is an important and desirable outcome of motor skill learning. While much research has been devoted to understanding transfer of explicit skills, the mechanisms of skill transfer after incidental learning remain poorly understood. The aim of this study was to (1) examine the effect of practice schedule on transfer and (2) investigate whether sequence-specific knowledge can transfer to an unfamiliar sequence context. We trained two groups of participants on an implicit serial response time task under a Constant (one sequence for 10 blocks) or Variable (alternating between two sequences for a total of 10 blocks) practice schedule. We evaluated response times for three types of transfer: task-general transfer to a structurally non-overlapping sequence, inter-manual transfer to a perceptually identical sequence, and sequence-specific transfer to a partially overlapping (three shared triplets) sequence. Results showed partial skill transfer to all three sequences and an advantage of Variable practice only for task-general transfer. Further, we found expression of sequence-specific knowledge for familiar sub-sequences in the overlapping sequence. These findings suggest that (1) constant practice may create interference for task-general transfer and (2) sequence-specific knowledge can transfer to a new sequential context. PMID:26635591

  6. Representing objects, relations, and sequences.

    PubMed

    Gallant, Stephen I; Okaywe, T Wendy

    2013-08-01

    Vector symbolic architectures (VSAs) are high-dimensional vector representations of objects (e.g., words, image parts), relations (e.g., sentence structures), and sequences for use with machine learning algorithms. They consist of a vector addition operator for representing a collection of unordered objects, a binding operator for associating groups of objects, and a methodology for encoding complex structures. We first develop constraints that machine learning imposes on VSAs; for example, similar structures must be represented by similar vectors. The constraints suggest that current VSAs should represent phrases ("The smart Brazilian girl") by binding sums of terms, in addition to simply binding the terms directly. We show that matrix multiplication can be used as the binding operator for a VSA, and that matrix elements can be chosen at random. A consequence for living systems is that binding is mathematically possible without the need to specify, in advance, precise neuron-to-neuron connection properties for large numbers of synapses. A VSA that incorporates these ideas, Matrix Binding of Additive Terms (MBAT), is described that satisfies all constraints. With respect to machine learning, for some types of problems appropriate VSA representations permit us to prove learnability rather than relying on simulations. We also propose dividing machine (and neural) learning and representation into three stages, with differing roles for learning in each stage. For neural modeling, we give representational reasons for nervous systems to have many recurrent connections, as well as for the importance of phrases in language processing. Sizing simulations and analyses suggest that VSAs in general, and MBAT in particular, are ready for real-world applications. PMID:23607563
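
    Matrix binding with random matrices can be illustrated in a few lines. The sketch below assumes random Gaussian word vectors, represents a phrase by multiplying one random matrix with the sum of its word vectors (one simple reading of "binding sums of terms"), and compares phrases by cosine similarity. It is a simplification, not the full MBAT scheme; the dimension of 256 and the toy vocabulary are arbitrary choices.

```python
import math
import random

def random_vector(d, rng):
    return [rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)]

def add(*vectors):
    """Element-wise sum: superposition of an unordered collection of terms."""
    return [sum(xs) for xs in zip(*vectors)]

def bind(matrix, v):
    """Binding by multiplying a vector with a (random) matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

rng = random.Random(0)
d = 256
vocab = ["the", "smart", "brazilian", "girl", "old", "dog"]
words = {w: random_vector(d, rng) for w in vocab}
M = [random_vector(d, rng) for _ in range(d)]  # random binding matrix

def phrase(*ws):
    """Bind the sum of the phrase's word vectors with M."""
    return bind(M, add(*(words[w] for w in ws)))

a = phrase("smart", "brazilian", "girl")
b = phrase("smart", "brazilian", "dog")
c = phrase("the", "old", "dog")
print(round(cosine(a, b), 2), round(cosine(a, c), 2))
```

    Phrases sharing two of three words should come out far more similar (cosine near 2/3) than phrases sharing none (near 0), because a random matrix approximately preserves inner products in high dimensions; this is the "similar structures, similar vectors" constraint, obtained without hand-specified connection weights.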

  7. SNP discovery through de novo deep sequencing using the next generation of DNA sequencers

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presents practical methodology for the production and analysis of DNA sequence data for SNP discovery....

  8. Serological evidence and amino acid sequence of ubiquitin-like protein isolated from coelomic fluid and cells of the earthworm Eisenia fetida andrei.

    PubMed

    Lassalle, F; Lassègues, M; Roch, P

    1993-03-01

    1. A small protein with an apparent molecular mass of 10 kDa has been isolated by reverse-phase chromatography of the basic proteins contained in the coelomic fluid and cell lysate of the earthworm Eisenia fetida andrei. 2. The protein cross-reacted in dot-blot with an anti-bovine ubiquitin antiserum. 3. Its N-terminal primary structure was determined by automated Edman degradation over 26 consecutive amino acids; it showed 69% identity (over all 26 amino acids) or 82% identity (over the first 19 consecutive amino acids) with many ubiquitins, as well as similar charge and hydrophobicity profiles and secondary structure conformation. PMID:8386996

  9. Towards a reference pecan genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cost of generating DNA sequence data has declined dramatically over the previous 15 years as a result of the Human Genome Project and the potential applications of genome sequencing for human medicine. This cost reduction has generated renewed interest among crop breeding scientists in applying...

  10. Bonobos Extract Meaning from Call Sequences

    PubMed Central

    Clay, Zanna; Zuberbühler, Klaus

    2011-01-01

    Studies on language-trained bonobos have revealed their remarkable abilities in representational and communication tasks. Surprisingly, however, corresponding research into their natural communication has largely been neglected. We address this issue with the first playback study on the natural vocal behaviour of bonobos. Bonobos produce five acoustically distinct call types when finding food, which they regularly mix together into longer call sequences. We found that individual call types were relatively poor indicators of food quality, while context specificity was much greater at the call sequence level. We therefore investigated whether receivers could extract meaning about the quality of food encountered by the caller by integrating across different call sequences. We first trained four captive individuals to find two types of food, kiwi (preferred) and apples (less preferred), at two different locations. We then conducted naturalistic playback experiments during which we broadcast sequences of four calls, originally produced by a familiar individual responding to either kiwi or apples. All sequences contained the same number of calls but varied in the composition of call types. Following playbacks, we found that subjects devoted significantly more search effort to the field indicated by the call sequence. Rather than attending to individual calls, bonobos attended to the entire sequences to make inferences about the food encountered by a caller. These results provide the first empirical evidence that bonobos are able to extract information about external events by attending to vocal sequences of other individuals and highlight the importance of call combinations in their natural communication system. PMID:21556149

  11. Learning of Sensory Sequences in Cerebellar Patients

    ERIC Educational Resources Information Center

    Frings, Markus; Boenisch, Raoul; Gerwig, Marcus; Diener, Hans-Christoph; Timmann, Dagmar

    2004-01-01

    A possible role of the cerebellum in detecting and recognizing event sequences has been proposed. The present study sought to determine whether patients with cerebellar lesions are impaired in the acquisition and discrimination of sequences of sensory stimuli of different modalities. A group of 26 cerebellar patients and 26 controls matched for…

  12. A Statistical Approach for Ambiguous Sequence Mappings

    Technology Transfer Automated Retrieval System (TEKTRAN)

    When attempting to map RNA sequences to a reference genome, high percentages of short sequence reads are often assigned to multiple genomic locations. One approach to handling these “ambiguous mappings” has been to discard them. This results in a loss of data, which can sometimes be as much as 45% o...

  13. From Arithmetic Sequences to Linear Equations

    ERIC Educational Resources Information Center

    Matsuura, Ryota; Harless, Patrick

    2012-01-01

    The first part of the article focuses on deriving the essential properties of arithmetic sequences by appealing to students' sense making and reasoning. The second part describes how to guide students to translate their knowledge of arithmetic sequences into an understanding of linear equations. Ryota Matsuura originally wrote these lessons for…
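
    The bridge the article describes, from arithmetic sequences to linear equations, rests on one rewrite: the nth term a_n = a_1 + (n − 1)d equals d·n + (a_1 − d), a line with slope d and intercept a_1 − d. A minimal sketch of that correspondence follows; the function names are illustrative and are not taken from the lessons.

```python
from fractions import Fraction


def nth_term(a1, d, n):
    """nth term of the arithmetic sequence with first term a1 and difference d."""
    return a1 + (n - 1) * d


def as_linear(a1, d):
    """(slope, intercept) of the line y = slope * n + intercept through the terms."""
    return d, a1 - d


def from_two_terms(i, ai, j, aj):
    """Recover (a1, d) from two known terms, like finding a line through two points."""
    d = Fraction(aj - ai, j - i)      # common difference = slope
    a1 = ai - (i - 1) * d
    return a1, d


if __name__ == "__main__":
    # 3, 7, 11, 15, ...: slope 4, intercept -1, so the 10th term is 4*10 - 1.
    print(nth_term(3, 4, 10), as_linear(3, 4))
```

    Recovering a_1 and d from two terms is the same computation as finding a line through two points, which is the connection the second part of the article builds on.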

  14. Molecular selection in a unified evolutionary sequence

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1986-01-01

    With guidance from experiments and observations that indicate internally limited phenomena, an outline of a unified evolutionary sequence is inferred. Such unification is not visible in a context of random matrix and random mutation. The sequence proceeds from the Big Bang through prebiotic matter and protocells, through the evolving cell via molecular and natural selection, to mind, behavior, and society.

  15. Program Helps To Optimize Assembly Sequences

    NASA Technical Reports Server (NTRS)

    Borden, Chester S.; Werntz, David G.; Loyola, Steven J.

    1992-01-01

    FAST project-management software tool designed to optimize sequence of assembly of Space Station Freedom. Assesses effects of detailed changes upon system and produces output metrics identifying preferred assembly sequences. Incorporates Space-Shuttle integration, Space-Station hardware, on-orbit operations, and governing programmatic considerations as either precedence relations or numerical data. Written in C language.

  16. Turning yeast sequence into protein function

    SciTech Connect

    Heijne, G. von

    1996-04-01

    The complete genome sequencing of the yeast Saccharomyces cerevisiae opens a new era in the potential use of such database information. Protein engineering studies suggest that genetic selection of overproducing strains may aid the assignment of protein function. Database management and sequencing software have been developed to scan entire genomes.

  17. Draft Genome Sequences of Fungus Aspergillus calidoustus.

    PubMed

    Horn, Fabian; Linde, Jörg; Mattern, Derek J; Walther, Grit; Guthke, Reinhard; Scherlach, Kirstin; Martin, Karin; Brakhage, Axel A; Petzke, Lutz; Valiante, Vito

    2016-01-01

    Here, we report the draft genome sequence of Aspergillus calidoustus (strain SF006504). The functional annotation of A. calidoustus predicts a relatively large number of secondary metabolite gene clusters. The presented genome sequence provides the basis for further genome mining. PMID:26966204

  18. Multilocus Sequence Typing Tool for Cyclospora cayetanensis

    PubMed Central

    Guo, Yaqiong; Roellig, Dawn M.; Li, Na; Tang, Kevin; Frace, Michael; Ortega, Ynes; Arrowood, Michael J.; Qvarnstrom, Yvonne; Wang, Lin; Moss, Delynn M.; Zhang, Longxian; Xiao, Lihua

    2016-01-01

    Because the lack of typing tools for Cyclospora cayetanensis has hampered outbreak investigations, we sequenced its genome and developed a genotyping tool. We observed 2 to 10 geographically segregated sequence types at each of 5 selected loci. This new tool could be useful for case linkage and infection/contamination source tracking. PMID:27433881
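
    Typing at several loci supports case linkage because isolates can be compared by their combined per-locus types. The abstract does not say how profiles are compared, so this is a minimal sketch under a hypothetical rule: isolates with identical types at all five loci receive the same multilocus genotype and are flagged as potentially linked. The isolate names and type numbers are made up for illustration.

```python
def assign_genotypes(profiles):
    """Map each isolate to a genotype number; identical per-locus profiles share a number.

    profiles: dict of isolate name -> tuple of per-locus sequence types.
    Numbers are assigned in order of first appearance.
    """
    ids = {}
    result = {}
    for isolate, profile in profiles.items():
        key = tuple(profile)
        if key not in ids:
            ids[key] = len(ids) + 1
        result[isolate] = ids[key]
    return result


def linked_cases(profiles):
    """Groups of two or more isolates that share a complete multilocus profile."""
    groups = {}
    for isolate, genotype in assign_genotypes(profiles).items():
        groups.setdefault(genotype, []).append(isolate)
    return [sorted(g) for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    # Hypothetical types at five loci for three cases.
    cases = {
        "case1": (1, 2, 1, 3, 1),
        "case2": (1, 2, 1, 3, 1),
        "case3": (2, 2, 1, 1, 1),
    }
    print(linked_cases(cases))
```

    Requiring an exact match at every locus is the strictest possible rule; a real investigation might tolerate missing data or single-locus differences.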

  19. Project Report: Automatic Sequence Processor Software Analysis

    NASA Technical Reports Server (NTRS)

    Benjamin, Brandon

    2011-01-01

    The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. The Jet Propulsion Laboratory (JPL) is currently flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and of the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments, and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence, and ultimately each sequence becomes a series of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow begins with a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system composed of scripts written in Perl, C shell, and awk. Once this start process is complete, the system checks for errors and aborts if it finds any; otherwise, it converts the commands to binary and sends the resulting data to be radiated to the spacecraft.

  20. Expression Profiling Using New Generation Sequencing Technologies

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Microarray hybridization technology has become widely used in parallel analysis of gene expression. Recent advances in genome sequencing platforms point to an alternate approach through digital quantitation of sequencing reads produced from cDNA samples. This presentation will compare advantages a...

  1. Complete Genome Sequencing of Trivittatus virus

    PubMed Central

    Groseth, Allison; Vine, Veronica; Weisend, Carla; Ebihara, Hideki

    2015-01-01

    Trivittatus virus (family Bunyaviridae, genus Orthobunyavirus) represents an important genetic intermediate between the California encephalitis group and the Bwamba/Pongola and Nyando groups. Here, we report the first complete genome sequence of the prototype (Eklund) strain, isolated in 1948, which, interestingly, shows only a few differences compared with partial sequences of modern strains. PMID:26212363

  2. Draft Genome Sequence of Goose Dicistrovirus

    PubMed Central

    Jerome, Keith R.

    2016-01-01

    We report the draft genome sequence of goose dicistrovirus assembled from the filtered feces of a Canada goose from South Lake Union in Seattle, Washington. The 9.1-kb dicistronic RNA virus falls within the family Dicistroviridae; however, it shares <33% translated amino acid sequence identity within the nonstructural open reading frame (ORF) with aparavirus or cripavirus. PMID:26941149

  3. Genome sequence of Lactobacillus rhamnosus ATCC 8530.

    PubMed

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R; Ziola, Barry

    2012-02-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences. PMID:22247527

  4. Feature expressions: creating and manipulating sequence datasets.

    PubMed

    Fristensky, B

    1993-12-25

    Annotation of features, such as introns, exons and protein coding regions in GenBank/EMBL/DDBJ entries is now standardized through use of the Features Table (FT) language. The essence of the FT language is described by the relation 'expression-->sequence', meaning that each FT expression evaluates to a sequence. For example, the expression M74750:1..50 evaluates to the first 50 bases of the sequence with accession number M74750. Because FT is intrinsic to the database definition, it can serve as a software- and platform-independent lingua franca for sequence manipulation. The XYLEM package makes it possible to create and manipulate sequence datasets using FT expressions. FEATURES is a program that resolves FT expressions into their corresponding sequences. Annotated features can be retrieved either by feature key or by expression. Even unannotated portions of a sequence can be retrieved by user-generated FT expressions. Applications of the FT language include retrieval of subsequences from large sequence entries, generation of chromosome models or artificial DNA constructs, and representation of restriction maps or mutants. PMID:8290362
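
    The "expression evaluates to a sequence" idea can be sketched with a tiny resolver for two location forms: `ACC:start..end` (1-based, inclusive, so `M74750:1..50` means the first 50 bases) and `complement(...)` (reverse complement). This is a minimal illustration, not the XYLEM or FEATURES code; the in-memory `db` and the accession `DEMO1` are hypothetical stand-ins for database entries.

```python
import re

# accession:start..end, e.g. "M74750:1..50"
_RANGE = re.compile(r"^(?P<acc>[A-Za-z0-9_.]+):(?P<start>\d+)\.\.(?P<end>\d+)$")
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def resolve(expr, db):
    """Evaluate a simple Feature Table-style expression to a sequence string.

    db: dict mapping accession -> full nucleotide sequence (hypothetical store).
    """
    expr = expr.strip()
    if expr.startswith("complement(") and expr.endswith(")"):
        inner = resolve(expr[len("complement("):-1], db)
        return inner.translate(_COMPLEMENT)[::-1]   # reverse complement
    m = _RANGE.match(expr)
    if not m:
        raise ValueError(f"unsupported expression: {expr!r}")
    seq = db[m.group("acc")]
    start, end = int(m.group("start")), int(m.group("end"))
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range out of bounds: {expr!r}")
    return seq[start - 1:end]   # convert 1-based inclusive to Python slice


if __name__ == "__main__":
    db = {"DEMO1": "ACGTTGCAAC"}
    print(resolve("DEMO1:2..5", db), resolve("complement(DEMO1:2..5)", db))
```

    Because `complement` is handled recursively, nested expressions resolve naturally; the real FT language also has `join`, fuzzy ends, and other forms not covered here.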

  5. A prime number approach to biological sequencing.

    PubMed

    Greer, W; Barrett, A N; Sowden, J M

    1985-03-01

    Computational sequencing of nucleic acid and amino acid sequences is placing increasing demands on computer resources. The use of prime numbers is explored as a convenient means of improving program speed and reducing storage requirements. It is concluded that the application of the prime number approach leads to significant increases in speed and some reduction in storage requirements. PMID:3840126

  6. Complete Genome Sequences of 63 Mycobacteriophages

    PubMed Central

    2013-01-01

    Mycobacteriophages are viruses that infect mycobacterial hosts. The current collection of sequenced mycobacteriophages, all isolated on a single host strain, Mycobacterium smegmatis mc²155, reveals substantial genetic diversity. The complete genome sequences of 63 newly isolated mycobacteriophages further refine our understanding of phage diversity. PMID:24285655

  7. Sequence Factorial of "g"-Gonal Numbers

    ERIC Educational Resources Information Center

    Asiru, Muniru A.

    2013-01-01

    The gamma function, which interpolates the factorial at integer arguments, is a special case (the case g = 2) of the general term of the sequence factorial of g-gonal numbers. In relation to this special case, a formula for calculating the general term of the sequence factorial of any…
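
    The special case mentioned above can be checked numerically. The nth g-gonal number is P(g, n) = ((g − 2)n² − (g − 4)n)/2, so for g = 2 it reduces to n, and the product of the first n of them is n! = Γ(n + 1). The article's general-term formula is not reproduced here; this sketch simply multiplies the polygonal numbers directly, and the function names are illustrative.

```python
import math


def polygonal(g, n):
    """nth g-gonal number: ((g - 2) n^2 - (g - 4) n) / 2 (always an integer)."""
    return ((g - 2) * n * n - (g - 4) * n) // 2


def sequence_factorial(g, n):
    """Product of the first n g-gonal numbers."""
    return math.prod(polygonal(g, k) for k in range(1, n + 1))


if __name__ == "__main__":
    # g = 2 gives 1, 2, 3, ..., so the product is the ordinary factorial.
    print(sequence_factorial(2, 5), math.factorial(5), math.gamma(6))
    # g = 3 gives triangular numbers 1, 3, 6, ...
    print(sequence_factorial(3, 3))
```

    The integer division is exact because (g − 2)n² − (g − 4)n has the same parity as g·n(n − 1), which is always even. `math.prod` requires Python 3.8 or later.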

  8. Auto Body Repair: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for an auto body repair vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and…

  9. Multiplex De Novo Sequencing of Peptide Antibiotics

    NASA Astrophysics Data System (ADS)

    Mohimani, Hosein; Liu, Wei-Ting; Yang, Yu-Liang; Gaudêncio, Susana P.; Fenical, William; Dorrestein, Pieter C.; Pevzner, Pavel A.

    Proliferation of drug-resistant diseases raises the challenge of searching for new, more efficient antibiotics. Currently, some of the most effective antibiotics (e.g., vancomycin and daptomycin) are cyclic peptides produced by nonribosomal biosynthetic pathways. The isolation and sequencing of cyclic peptide antibiotics, unlike those of linear peptides, are time-consuming and error-prone. The dominant technique for sequencing cyclic peptides is NMR-based and requires large amounts (milligrams) of purified material that, for most compounds, cannot be obtained. Given these facts, there is a need for new tools to sequence cyclic nonribosomal peptides (NRPs) using picograms of material. Since nearly all cyclic NRPs are produced along with related analogs, we develop a mass spectrometry approach for sequencing all related peptides at once (in contrast to the existing approach that analyzes individual peptides). Our results suggest that instead of attempting to isolate and NMR-sequence the most abundant compound, one should acquire spectra of many related compounds and sequence all of them simultaneously using tandem mass spectrometry. We illustrate applications of this approach by sequencing new variants of cyclic peptide antibiotics from Bacillus brevis, as well as a previously unknown family of cyclic NRPs produced by marine bacteria.

  10. Marketing and Distributive Education: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 2-year program in marketing and distributive education. The guide consists of a course description;…

  11. Development of Ordinal Sequence Perception in Infancy

    ERIC Educational Resources Information Center

    Lewkowicz, David J.

    2013-01-01

    Perception of the ordinal position of a sequence element is critical to many cognitive and motor functions. Here, the prediction that this ability is based on a domain-general perceptual mechanism and, thus, that it emerges prior to the emergence of language was tested. Infants were habituated with sequences of moving/sounding objects and then…

  12. VOE Computer Programming: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 3-year program in computer programming. The guide consists of a course description; general course…

  13. Using Conventional Sequences in L2 French

    ERIC Educational Resources Information Center

    Forsberg, Fanny

    2010-01-01

    By means of a phraseological identification method, this study provides a general description of the use of conventional sequences (CSs) in interviews at four different levels of spoken L2 French as well as in interviews with native speakers. Use of conventional sequences is studied with regard to overall quantity, category distribution and type…

  14. Genome Sequence of Pseudomonas chlororaphis Strain 189

    PubMed Central

    Town, Jennifer; Audy, Patrice; Boyetchko, Susan M.

    2016-01-01

    Pseudomonas chlororaphis strain 189 is a potent inhibitor of the growth of the potato pathogen Phytophthora infestans. We determined the complete, finished sequence of the 6.8-Mbp genome of this strain, consisting of a single contiguous molecule. Strain 189 is closely related to previously sequenced strains of P. chlororaphis. PMID:27340063

  15. Regular Pentagons and the Fibonacci Sequence.

    ERIC Educational Resources Information Center

    French, Doug

    1989-01-01

    Illustrates how to draw a regular pentagon. Shows the sequence of a succession of regular pentagons formed by extending the sides. Calculates the general formula of the Lucas and Fibonacci sequences. Presents a regular icosahedron as an example of the golden ratio. (YP)
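
    The general formulas for the Lucas and Fibonacci sequences are, in their standard (Binet) form, F_n = (φⁿ − ψⁿ)/√5 and L_n = φⁿ + ψⁿ, where φ = (1 + √5)/2 is the golden ratio (also the diagonal-to-side ratio of a regular pentagon) and ψ = (1 − √5)/2. The article's own derivation is not reproduced here; this sketch only checks the closed forms against the shared recurrence x_{k+1} = x_k + x_{k−1}.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # golden ratio
PSI = (1 - math.sqrt(5)) / 2   # its conjugate


def fib_lucas(n):
    """Return (F_n, L_n) using the recurrence; F_0 = 0, F_1 = 1, L_0 = 2, L_1 = 1."""
    f0, f1 = 0, 1
    l0, l1 = 2, 1
    for _ in range(n):
        f0, f1 = f1, f0 + f1
        l0, l1 = l1, l0 + l1
    return f0, l0


def binet(n):
    """Return (F_n, L_n) from the closed-form formulas, rounded to integers."""
    return round((PHI ** n - PSI ** n) / math.sqrt(5)), round(PHI ** n + PSI ** n)


if __name__ == "__main__":
    print(fib_lucas(10))   # (55, 123)
    print(binet(10))       # (55, 123)
```

    Rounding absorbs floating-point error, which stays far below 0.5 for the moderate n used here; for very large n, the integer recurrence is the safer choice.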

  16. Archaebacterial rhodopsin sequences: Implications for evolution

    NASA Technical Reports Server (NTRS)

    Lanyi, J. K.

    1991-01-01

    It was proposed over 10 years ago that the archaebacteria represent a separate kingdom that diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immunoreactions of the protein and of synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared with previously proposed archaebacterial consensus sequences. Silent divergences were calculated from pairwise comparisons of the open reading frame with the DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium. These indicate a very considerable evolutionary distance between each pair of genes, even within the same organism. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures.

  17. Test Sequences for Reed-Solomon Encoders

    NASA Technical Reports Server (NTRS)

    Lee, J. J.

    1985-01-01

    Theory of Reed-Solomon codes yields sequences of input test symbols. Two specific sequences worked out for codes of 8 bits per symbol with 223 information symbols and 32 parity check symbols per code word. Test patterns also used for decoders.

  18. Gene Sequence Homology of Chemokines Across Species

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The abundance of expressed gene and protein sequences available in the biological information databases facilitates comparison of protein homologies. A high degree of sequence similarity typically implies homology regarding structure and function and may provide clues to antibody cross-reactivities...

  19. GENE SEQUENCE HOMOLOGY OF CHEMOKINES ACROSS SPECIES

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The abundance of expressed gene and protein sequences available in the biological information databases facilitates comparison of protein homologies. A high degree of sequence similarity typically implies homology regarding structure and function and may provide clues to antibody cross-react...

  20. Concept For Generation Of Long Pseudorandom Sequences

    NASA Technical Reports Server (NTRS)

    Wang, C. C.

    1990-01-01

    Conceptual very-large-scale integrated (VLSI) digital circuit performs exponentiation in finite field. Algorithm that generates unusually long sequences of pseudorandom numbers executed by digital processor that includes such circuits. Concepts particularly advantageous for such applications as spread-spectrum communications, cryptography, and generation of ranging codes, synthetic noise, and test data, where usually desirable to make pseudorandom sequences as long as possible.

  1. Genome Sequence of Gordonia Phage Yvonnetastic.

    PubMed

    Pope, Welkin H; Bandyopadhyay, Anshika; Carlton, Meghan L; Kane, Meghan T; Panchal, Niyati J; Pham, Yvonne C; Reynolds, Zachary J; Sapienza, Michael S; German, Brian A; McDonnell, Jill E; Schafer, Claire E; Yu, Victor J; Furbee, Emily C; Grubb, Sarah R; Warner, Marcie H; Montgomery, Matthew T; Garlena, Rebecca A; Russell, Daniel A; Jacobs-Sera, Deborah; Hatfull, Graham F

    2016-01-01

    Gordonia bacteriophage Yvonnetastic was isolated from soil in Pittsburgh, PA, using Gordonia terrae 3612 as a host. Yvonnetastic has siphoviral morphology and a genome of 98,136 bp, with 198 predicted protein-coding genes and five tRNA genes. Yvonnetastic does not share substantial sequence similarity with other sequenced bacteriophage genomes. PMID:27389265

  2. Draft Genome Sequences of Fungus Aspergillus calidoustus

    PubMed Central

    Horn, Fabian; Linde, Jörg; Mattern, Derek J.; Walther, Grit; Guthke, Reinhard; Scherlach, Kirstin; Martin, Karin; Brakhage, Axel A.; Petzke, Lutz

    2016-01-01

    Here, we report the draft genome sequence of Aspergillus calidoustus (strain SF006504). The functional annotation of A. calidoustus predicts a relatively large number of secondary metabolite gene clusters. The presented genome sequence provides the basis for further genome mining. PMID:26966204

  3. Genome Sequence of Gordonia Phage Yvonnetastic

    PubMed Central

    Bandyopadhyay, Anshika; Carlton, Meghan L.; Kane, Meghan T.; Panchal, Niyati J.; Pham, Yvonne C.; Reynolds, Zachary J.; Sapienza, Michael S.; German, Brian A.; McDonnell, Jill E.; Schafer, Claire E.; Yu, Victor J.; Furbee, Emily C.; Grubb, Sarah R.; Warner, Marcie H.; Montgomery, Matthew T.; Garlena, Rebecca A.; Russell, Daniel A.; Jacobs-Sera, Deborah; Hatfull, Graham F.

    2016-01-01

    Gordonia bacteriophage Yvonnetastic was isolated from soil in Pittsburgh, PA, using Gordonia terrae 3612 as a host. Yvonnetastic has siphoviral morphology and a genome of 98,136 bp, with 198 predicted protein-coding genes and five tRNA genes. Yvonnetastic does not share substantial sequence similarity with other sequenced bacteriophage genomes. PMID:27389265

  4. Sequencing for the cream of the crop

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In this invited commentary, we discuss how next-generation sequencing methods are beginning to find their way into plant genetics, promising substantial improvements in crop yields over the coming decades. Next-generation sequencing facilitates the construction of high-resolution variation maps, whi...

  5. Draft Genome Sequence of Goose Dicistrovirus.

    PubMed

    Greninger, Alexander L; Jerome, Keith R

    2016-01-01

    We report the draft genome sequence of goose dicistrovirus assembled from the filtered feces of a Canada goose from South Lake Union in Seattle, Washington. The 9.1-kb dicistronic RNA virus falls within the family Dicistroviridae; however, it shares <33% translated amino acid sequence identity within the nonstructural open reading frame (ORF) with aparavirus or cripavirus. PMID:26941149

  6. What's Next? Judging Sequences of Binary Events

    ERIC Educational Resources Information Center

    Oskarsson, An T.; Van Boven, Leaf; McClelland, Gary H.; Hastie, Reid

    2009-01-01

    The authors review research on judgments of random and nonrandom sequences involving binary events with a focus on studies documenting gambler's fallacy and hot hand beliefs. The domains of judgment include random devices, births, lotteries, sports performances, stock prices, and others. After discussing existing theories of sequence judgments,…

  7. Occupational Sequences: Auto Engines 1. AT 121.

    ERIC Educational Resources Information Center

    Korb, A. W.; And Others

    In an attempt to individualize an automotive course, the Vocational-Technical Division of Northern Montana College has developed Occupational Sequences for an engine rebuilding course. Occupational Sequences, a learning or teaching aid, is an analysis of numbered operations involved in engine rebuilding. Job sheets, included in the book, provide a…

  8. Global Alignment System for Large Genomic Sequencing

    Energy Science and Technology Software Center (ESTSC)

    2002-03-01

    AVID is a global alignment system tailored for the alignment of large genomic sequences up to megabases in length. Features include the possibility of one sequence being in draft form, fast alignment, robustness and accuracy. The method is an anchor based alignment using maximal matches derived from suffix trees.
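
    The anchor idea can be illustrated on toy strings: find maximal exact matches (matches that cannot be extended left or right), then keep the heaviest chain of anchors that appear in the same order in both sequences. This is a minimal sketch, not AVID. It swaps in a naive k-mer seed-and-extend search in place of suffix trees, which is fine for short inputs but not for megabase genomes, and it uses a simple quadratic chaining step chosen for clarity.

```python
def maximal_matches(a, b, min_len):
    """Return sorted (i, j, length) exact matches of length >= min_len that are
    maximal (not extendable left or right). Naive k-mer seeding, not suffix trees."""
    index = {}
    for j in range(len(b) - min_len + 1):
        index.setdefault(b[j:j + min_len], []).append(j)
    found = set()
    for i in range(len(a) - min_len + 1):
        for j in index.get(a[i:i + min_len], []):
            # Skip seeds that extend to the left; the seed one step left covers them.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = min_len
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            found.add((i, j, length))
    return sorted(found)


def chain(anchors):
    """Heaviest colinear, non-overlapping chain of anchors (O(n^2) dynamic programming)."""
    anchors = sorted(anchors)
    if not anchors:
        return []
    best = [length for _, _, length in anchors]
    prev = [-1] * len(anchors)
    for k, (i, j, length) in enumerate(anchors):
        for p in range(k):
            pi, pj, pl = anchors[p]
            # Predecessor must end before this anchor starts in both sequences.
            if pi + pl <= i and pj + pl <= j and best[p] + length > best[k]:
                best[k] = best[p] + length
                prev[k] = p
    k = max(range(len(anchors)), key=best.__getitem__)
    out = []
    while k != -1:
        out.append(anchors[k])
        k = prev[k]
    return out[::-1]


if __name__ == "__main__":
    mems = maximal_matches("GATTACAXXXCCGGAA", "GATTACAYYCCGGAA", 4)
    print(mems, chain(mems))
```

    The gaps between chained anchors are what a global aligner then fills in with ordinary dynamic-programming alignment, which keeps the expensive step confined to short regions.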

  9. Meeting Highlights: Genome Sequencing and Biology 2001

    PubMed Central

    2001-01-01

    We bring you a report from the CSHL Genome Sequencing and Biology Meeting, which has a long and prestigious history. This year there were sessions on large-scale sequencing and analysis, polymorphisms (covering discovery and technologies and mapping and analysis), comparative genomics of mammalian and model organism genomes, functional genomics and bioinformatics. PMID:18628920

  10. Nonspatial Sequence Coding in CA1 Neurons

    PubMed Central

    Allen, Timothy A.; Salz, Daniel M.; McKenzie, Sam

    2016-01-01

    The hippocampus is critical to the memory for sequences of events, a defining feature of episodic memory. However, the fundamental neuronal mechanisms underlying this capacity remain elusive. While considerable research indicates hippocampal neurons can represent sequences of locations, direct evidence of coding for the memory of sequential relationships among nonspatial events remains lacking. To address this important issue, we recorded neural activity in CA1 as rats performed a hippocampus-dependent sequence-memory task. Briefly, the task involves the presentation of repeated sequences of odors at a single port and requires rats to identify each item as “in sequence” or “out of sequence”. We report that, while the animals' location and behavior remained constant, hippocampal activity differed depending on the temporal context of items—in this case, whether they were presented in or out of sequence. Some neurons showed this effect across items or sequence positions (general sequence cells), while others exhibited selectivity for specific conjunctions of item and sequence position information (conjunctive sequence cells) or for specific probe types (probe-specific sequence cells). We also found that the temporal context of individual trials could be accurately decoded from the activity of neuronal ensembles, that sequence coding at the single-cell and ensemble level was linked to sequence memory performance, and that slow-gamma oscillations (20–40 Hz) were more strongly modulated by temporal context and performance than theta oscillations (4–12 Hz). These findings provide compelling evidence that sequence coding extends beyond the domain of spatial trajectories and is thus a fundamental function of the hippocampus. SIGNIFICANCE STATEMENT The ability to remember the order of life events depends on the hippocampus, but the underlying neural mechanisms remain poorly understood. Here we addressed this issue by recording neural activity in hippocampal

  11. Transcriptome sequencing goals, assembly, and assessment.

    PubMed

    Wheat, Christopher W; Vogel, Heiko

    2011-01-01

    Transcriptome sequencing provides quick, direct access to the mRNA. With this information, one can design primers for PCR of thousands of different genes, SNP markers, probes for microarrays and qPCR, or simply use the sequence data itself in comparative studies. Transcriptome sequencing, while getting cheaper, is still an expensive endeavor, yet data quality and assembly are infrequently examined in depth. Here, we outline many of the important issues we think need consideration when starting a transcriptome sequencing project. We also walk the reader through a detailed analysis of an example transcriptome dataset, highlighting the importance of both within-dataset analysis and comparative inferences. Our hope is that with greater attention focused on assessing assembly performance, advances in transcriptome assembly will increase as prices continue to drop and new technologies, such as Illumina sequencing, start to be used. PMID:22065435

  12. Metagenomics using next-generation sequencing.

    PubMed

    Bragg, Lauren; Tyson, Gene W

    2014-01-01

    Traditionally, microbial genome sequencing has been restricted to the small number of species that can be grown in pure culture. The progressive development of culture-independent methods over the last 15 years now allows researchers to sequence microbial communities directly from environmental samples. This approach is commonly referred to as "metagenomics" or "community genomics". However, the term metagenomics is applied liberally in the literature to describe any culture-independent analysis of microbial communities. Here, we define metagenomics as shotgun ("random") sequencing of the genomic DNA of a sample taken directly from the environment. The metagenome can be thought of as a sampling of the collective genome of the microbial community. We outline the considerations and analyses that should be undertaken to ensure the success of a metagenomic sequencing project, including the choice of sequencing platform and methods for assembly, binning, annotation, and comparative analysis. PMID:24515370

  13. M-sequences in ophthalmic electrophysiology.

    PubMed

    Müller, Philipp L; Meigen, Thomas

    2016-01-01

    The aim of this review is to use the multimedia aspects of a purely digital online publication to explain and illustrate the highly capable technique of m-sequences in multifocal ophthalmic electrophysiology. M-sequences have been successfully applied in clinical routine during the past 20 years. However, the underlying mathematical rationale is often daunting. The mathematical properties of m-sequences allow one not only to separate the responses from different fields but also to analyze adaptational effects and the impact of preceding events. By explaining the history, the generation, and the various applications of m-sequences, we aim to provide a better understanding of the technique. With this review we aim to clarify the opportunities m-sequences offer, in order to motivate scientists to use them in their future research. PMID:26818968
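    As an illustration of the property the review refers to, the Python sketch below generates one period of an m-sequence from a binary linear recurrence and checks its two-valued periodic autocorrelation (N at zero lag, -1 at every other lag). It is commonly this property that lets time-shifted copies of one m-sequence drive different stimulus fields and still be separated by cross-correlation. The register length (n = 5) and the feedback polynomial x^5 + x^3 + 1 are our own illustrative assumptions, not parameters taken from the review or from any stimulator.

```python
def m_sequence(n=5, taps=(3, 0)):
    """Return one period (2**n - 1 chips) of a binary m-sequence.

    Uses the linear recurrence s[k + n] = XOR of s[k + t] for t in taps.
    The default taps encode the feedback polynomial x^5 + x^3 + 1, which is
    primitive over GF(2); other (n, taps) pairs must also be primitive.
    """
    length = 2 ** n - 1
    s = [1] + [0] * (n - 1)  # any nonzero seed works for a primitive polynomial
    while len(s) < length:
        k = len(s) - n
        bit = 0
        for t in taps:
            bit ^= s[k + t]
        s.append(bit)
    return s


def periodic_autocorrelation(seq, lag):
    """Circular autocorrelation of the +/-1 version of a 0/1 sequence."""
    x = [1 - 2 * b for b in seq]  # map 0 -> +1, 1 -> -1
    n = len(x)
    return sum(x[i] * x[(i + lag) % n] for i in range(n))


seq = m_sequence()
print(len(seq), [periodic_autocorrelation(seq, lag) for lag in range(4)])
# 31 [31, -1, -1, -1]
```

    The recurrence form is used instead of a shift-register simulation only because its tap convention is unambiguous; both produce the same family of sequences.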

  14. EST processing: from trace to sequence.

    PubMed

    Schmid, Ralf; Blaxter, Mark

    2009-01-01

    A common task in EST projects is the conversion of sequence chromatograms originating from gel-based or capillary sequencers into annotated sequence objects. Here we describe the usage of a software pipeline (available from http://www.nematodes.org/bioinformatics/ ), which has been developed to make the most of EST datasets. This modular software solution is targeted toward small- to medium-sized EST projects and comprises a series of Perl scripts. The software design is based on our experience during EST projects for parasitic nematodes and other species. The trace2dbest module processes sequence trace files and prepares the text files necessary for the submission of the sequences to the public repository dbEST. PartiGene provides facilities for clustering and assembling the ESTs into putative gene objects or unigenes and organizes the data in a relational database. Additional tools are available for annotation and for making the data accessible via the World Wide Web. PMID:19277557

  15. Sequencing and comparing whole mitochondrial genomes of animals

    SciTech Connect

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  16. Corrected sequence of the wheat plastid genome.

    PubMed

    Bahieldin, Ahmed; Al-Kordy, Magdy A; Shokry, Ahmed M; Gadalla, Nour O; Al-Hejin, Ahmed M M; Sabir, Jamal S M; Hassan, Sabah M; Al-Ahmadi, Ahlam A; Schwarz, Erika N; Eissa, Hala F; El-Domyati, Fotouh M; Jansen, Robert K

    2014-09-01

    Wheat is the most important cereal in the world in terms of acreage and productivity. We sequenced and assembled the plastid genome of one Egyptian wheat cultivar using next-generation sequence data. The size of the plastid genome is 133,873 bp, which is 672 bp smaller than the published plastid genome of the "Chinese Spring" cultivar, mainly because the published genome contains three sequences from the rice plastid genome. This size difference reflects contamination of the published genome with rice plastid DNA, most of which is present in three sequences of 332, 131 and 131 bp. The corrected plastid genome of wheat has been submitted to GenBank (accession number KJ592713) and can be used in future comparisons. PMID:25242688
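    The size figures in the abstract can be cross-checked with a few lines of Python. The implied length of the published "Chinese Spring" plastid genome is derived here from the two quoted numbers, not quoted from the paper.

```python
# Sizes quoted in the abstract, in base pairs.
egyptian_cultivar = 133_873
size_difference = 672
rice_fragments = [332, 131, 131]

# Implied length of the published "Chinese Spring" plastid genome.
chinese_spring = egyptian_cultivar + size_difference

# Share of the size difference accounted for by the three rice-derived fragments.
contaminant_total = sum(rice_fragments)
share = contaminant_total / size_difference

print(chinese_spring, contaminant_total, f"{share:.1%}")
# 134545 594 88.4%
```

    The three fragments account for 594 of the 672 bp, consistent with "most of which"; the remaining 78 bp is not broken down in the abstract.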

  17. Genotyping-by-Sequencing in Plants

    PubMed Central

    Deschamps, Stéphane; Llaca, Victor; May, Gregory D.

    2012-01-01

    The advent of next-generation DNA sequencing (NGS) technologies has led to the development of rapid genome-wide Single Nucleotide Polymorphism (SNP) detection applications in various plant species. Recent improvements in sequencing throughput, combined with an overall decrease in costs per gigabase of sequence, are allowing NGS to be applied not only to the evaluation of small subsets of parental inbred lines but also to the mapping and characterization of traits of interest in much larger populations. Such an approach, in which sequences are used simultaneously to detect and score SNPs, thereby bypassing the entire marker assay development stage, is known as genotyping-by-sequencing (GBS). This review summarizes the current state of GBS in plants and the promise it holds as a genome-wide genotyping application. PMID:24832503

  18. Sequence Affects the Cyclization of DNA Minicircles.

    PubMed

    Wang, Qian; Pettitt, B Montgomery

    2016-03-17

    Understanding how the sequence of a DNA molecule affects its dynamic properties is a central problem affecting biochemistry and biotechnology. The process of cyclizing short DNA, as a critical step in molecular cloning, lacks a comprehensive picture of the kinetic process containing sequence information. We have elucidated this process by using coarse-grained simulations, enhanced sampling methods, and recent theoretical advances. We are able to identify the types and positions of structural defects during the looping process at a base-pair level. Correlations along a DNA molecule dictate critical sequence positions that can affect the looping rate. Structural defects change the bending elasticity of the DNA molecule from a harmonic to subharmonic potential with respect to bending angles. We explore the subelastic chain as a possible model in loop formation kinetics. A sequence-dependent model is developed to qualitatively predict the relative loop formation time as a function of DNA sequence. PMID:26938490

  19. Optimal assembly for high throughput shotgun sequencing

    PubMed Central

    2013-01-01

    We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier works, we design a de Bruijn graph-based assembly algorithm that comes very close to the lower bound for the repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization. PMID:23902516
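    To make the de Bruijn approach concrete, the Python sketch below shows the textbook construction, not the paper's optimized algorithm: each k-mer from the reads becomes an edge between its (k-1)-mer prefix and suffix, and an Eulerian path (found with Hierholzer's algorithm) spells out the genome. It assumes error-free reads, complete k-mer coverage, and no repeated (k-1)-mers; repeats longer than k-1 are exactly what makes reconstruction ambiguous, which is why the paper ties the required read length to repeat statistics. The function names and the toy reads are hypothetical.

```python
from collections import defaultdict


def kmers(reads, k):
    """Collect the distinct k-mers present in a list of reads."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def de_bruijn(kmer_set):
    """Each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    indeg = defaultdict(int)
    for km in sorted(kmer_set):  # sorted for deterministic edge order
        graph[km[:-1]].append(km[1:])
        indeg[km[1:]] += 1
    return graph, indeg


def eulerian_path(graph, indeg):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    nodes = sorted(set(graph) | set(indeg))
    adj = {n: list(graph.get(n, [])) for n in nodes}
    # Start where out-degree exceeds in-degree by one; otherwise the graph is
    # an Eulerian cycle and any node with outgoing edges will do.
    start = next((n for n in nodes if len(adj[n]) - indeg.get(n, 0) == 1),
                 next(n for n in nodes if adj[n]))
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if adj[node]:
            stack.append(adj[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the genome from the Eulerian path through the de Bruijn graph."""
    path = eulerian_path(*de_bruijn(kmers(reads, k)))
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ACGTGCA", "TGCATTG", "ATTGAC"]
print(assemble(reads, k=4))  # ACGTGCATTGAC
```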

  20. Optimized ultra-fast imaging sequence (OUFIS).

    PubMed

    Zha, L; Lowe, I J

    1995-03-01

    The DUFIS sequence can make ultra-fast images (approximately 10 ms) without the use of rapidly switched gradients. The RF excitation sequence is spatially selective, so only a small fraction of the magnetization in each imaging pixel is used, which produces a poor imaging signal-to-noise ratio (SNR). We have developed several alternative RF sequences that use RF pulses with multiple phases, and also with just 0 degree and 180 degree phases, to excite almost all the magnetization in a pixel and greatly improve the SNR. The optimization of these pulse sequences (now called OUFIS) has been conducted both analytically and by numerical searches, with various linear and nonlinear models. Both the theoretical and computational methods used in the optimizations are described in detail. Preliminary experimental results are briefly presented, and several possible applications of the OUFIS excitation sequences are suggested. PMID:7760705

  1. The DNA sequence of human chromosome 7.

    PubMed

    Hillier, Ladeana W; Fulton, Robert S; Fulton, Lucinda A; Graves, Tina A; Pepin, Kymberlie H; Wagner-McPherson, Caryn; Layman, Dan; Maas, Jason; Jaeger, Sara; Walker, Rebecca; Wylie, Kristine; Sekhon, Mandeep; Becker, Michael C; O'Laughlin, Michelle D; Schaller, Mark E; Fewell, Ginger A; Delehaunty, Kimberly D; Miner, Tracie L; Nash, William E; Cordes, Matt; Du, Hui; Sun, Hui; Edwards, Jennifer; Bradshaw-Cordum, Holland; Ali, Johar; Andrews, Stephanie; Isak, Amber; Vanbrunt, Andrew; Nguyen, Christine; Du, Feiyu; Lamar, Betty; Courtney, Laura; Kalicki, Joelle; Ozersky, Philip; Bielicki, Lauren; Scott, Kelsi; Holmes, Andrea; Harkins, Richard; Harris, Anthony; Strong, Cynthia Madsen; Hou, Shunfang; Tomlinson, Chad; Dauphin-Kohlberg, Sara; Kozlowicz-Reilly, Amy; Leonard, Shawn; Rohlfing, Theresa; Rock, Susan M; Tin-Wollam, Aye-Mon; Abbott, Amanda; Minx, Patrick; Maupin, Rachel; Strowmatt, Catrina; Latreille, Phil; Miller, Nancy; Johnson, Doug; Murray, Jennifer; Woessner, Jeffrey P; Wendl, Michael C; Yang, Shiaw-Pyng; Schultz, Brian R; Wallis, John W; Spieth, John; Bieri, Tamberlyn A; Nelson, Joanne O; Berkowicz, Nicolas; Wohldmann, Patricia E; Cook, Lisa L; Hickenbotham, Matthew T; Eldred, James; Williams, Donald; Bedell, Joseph A; Mardis, Elaine R; Clifton, Sandra W; Chissoe, Stephanie L; Marra, Marco A; Raymond, Christopher; Haugen, Eric; Gillett, Will; Zhou, Yang; James, Rose; Phelps, Karen; Iadanoto, Shawn; Bubb, Kerry; Simms, Elizabeth; Levy, Ruth; Clendenning, James; Kaul, Rajinder; Kent, W James; Furey, Terrence S; Baertsch, Robert A; Brent, Michael R; Keibler, Evan; Flicek, Paul; Bork, Peer; Suyama, Mikita; Bailey, Jeffrey A; Portnoy, Matthew E; Torrents, David; Chinwalla, Asif T; Gish, Warren R; Eddy, Sean R; McPherson, John D; Olson, Maynard V; Eichler, Evan E; Green, Eric D; Waterston, Robert H; Wilson, Richard K

    2003-07-10

    Human chromosome 7 has historically received prominent attention in the human genetics community, primarily related to the search for the cystic fibrosis gene and the frequent cytogenetic changes associated with various forms of cancer. Here we present more than 153 million base pairs representing 99.4% of the euchromatic sequence of chromosome 7, the first metacentric chromosome completed so far. The sequence has excellent concordance with previously established physical and genetic maps, and it exhibits an unusual amount of segmentally duplicated sequence (8.2%), with marked differences between the two arms. Our initial analyses have identified 1,150 protein-coding genes, 605 of which have been confirmed by complementary DNA sequences, and an additional 941 pseudogenes. Of genes confirmed by transcript sequences, some are polymorphic for mutations that disrupt the reading frame. PMID:12853948

  2. Pulse sequences in photoassociation via adiabatic passage

    NASA Astrophysics Data System (ADS)

    Li, Xuan; Dupre, William; Parker, Gregory A.

    2012-07-01

    We perform a detailed study of pulse sequences in a photoassociation via adiabatic passage (PAP) process to transfer population from an ensemble of ultracold atomic clouds to a vibrationally cold molecular state. We show that an appreciable final population of ultracold NaCs molecules can be achieved with optimized pulses in either the ‘counter-intuitive’ (tP > tS) or ‘intuitive’ (tP < tS) PAP pulse sequences, with tP and tS denoting the temporal centers of the pump and Stokes pulses, respectively. By investigating the dependence of the reactive yield on pulse sequences, in a wide range of tP-tS, we show that there is not a fundamental preference to either pulse sequence in a PAP process. We explain this no-sequence-preference phenomenon by analyzing a multi-bound model so that an analogy can be drawn to the conventional stimulated Raman adiabatic passage.

  3. [Sequence learning in major depressive disorder].

    PubMed

    Borbély-Ipkovich, Emöke; Németh, Dezsö; Janacsek, Karolina; Gonda, Xénia

    2014-01-01

    Major Depressive Disorder (MDD) is one of the most common psychiatric diagnoses. It is accompanied by several psychological, behavioural and emotional symptoms, and in addition to the symptoms affecting quality of life, it can lead to severe consequences, including suicide. Sequence learning plays a key role in adapting to the environment, neural plasticity, first language acquisition, and social learning and skills; at the same time, it shapes the patient's behaviour and therapeutic possibilities. The aim of this paper is to review sequence learning and its consolidation in MDD. Little is known about the effects of mood disorders on sequence learning, and the available results are contradictory; therefore, further studies are needed to test the effects of MDD on sequence learning and on the consolidation of implicitly acquired sequence knowledge. PMID:25411225

  4. De novo assembly of a bell pepper endornavirus genome sequence using RNA sequencing data.

    PubMed

    Jo, Yeonhwa; Choi, Hoseng; Cho, Won Kyong

    2015-01-01

    Members of the genus Endornavirus are double-stranded RNA viruses that infect a wide range of hosts. In this study, we report the de novo assembly of a bell pepper endornavirus genome sequence by RNA sequencing (RNA-Seq). Our results demonstrate the successful application of RNA-Seq to obtain a complete viral genome sequence from transcriptome data. PMID:25792042

  5. Complete genome sequence of southern tomato virus identified from China using next generation sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), from tomatoes in China was elucidated using deep sequencing of small RNAs. The identified isolate STV_CN12 shares 99% sequence identity with other isolates from Mexico, France, Spain, and the U.S. This is the first report ...

  6. Whole-Genome Shotgun Sequencing of a Colonizing Multilocus Sequence Type 17 Streptococcus agalactiae Strain

    PubMed Central

    Singh, Pallavi; Springman, A. Cody; Davies, H. Dele

    2012-01-01

    This report highlights the whole-genome shotgun draft sequence for a Streptococcus agalactiae strain representing multilocus sequence type (ST) 17, isolated from a colonized woman at 8 weeks postpartum. This sequence represents an important addition to the published genomes and will promote comparative genomic studies of S. agalactiae recovered from diverse sources. PMID:23045509

  7. Sequencing and analysis of a genomic fragment provide an insight into the Dunaliella viridis genomic sequence.

    PubMed

    Sun, Xiao-Ming; Tang, Yuan-Ping; Meng, Xiang-Zong; Zhang, Wen-Wen; Li, Shan; Deng, Zhi-Rui; Xu, Zheng-Kai; Song, Ren-Tao

    2006-11-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for studying stress tolerance. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to sequences in the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one showed significant homology at the protein level to the GenBank database. The average GC content of this sequence was 51.1%, much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment; the duplicated sequences accounted for about 35.7% of the entire region. Large numbers of simple sequence repeats (microsatellites) were found, with a strong bias towards the (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features make Dunaliella genomic DNA difficult to sequence. Further investigation is needed to reveal the biological significance of these unique sequence features. PMID:17091199
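    The two sequence statistics this abstract relies on, GC content and (AC)n microsatellite runs, can be computed with a short Python sketch. The five-copy minimum for calling a repeat, the function names, and the toy sequence are illustrative assumptions, not the thresholds or tools used by the authors.

```python
import re


def gc_content(seq):
    """Percentage of G + C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


def dinucleotide_repeats(seq, unit="AC", min_copies=5):
    """Return (start, copies) for each run of `unit` repeated >= min_copies times.

    Only the given strand and orientation are scanned: an (AC)n run on one
    strand appears as (GT)n on the other, so call again with unit="GT" if needed.
    """
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [(m.start(), len(m.group()) // len(unit))
            for m in pattern.finditer(seq.upper())]


seq = "GG" + "AC" * 6 + "TTGCA"
print(round(gc_content(seq), 1), dinucleotide_repeats(seq))
# 52.6 [(2, 6)]
```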

  8. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid sequence disclosures must include a copy of the sequence listing in accordance with the requirements in 37 CFR...

  9. Draft Genome Sequence of Neisseria gonorrhoeae Sequence Type 1407, a Multidrug-Resistant Clinical Isolate.

    PubMed

    Anselmo, A; Ciammaruconi, A; Carannante, A; Neri, A; Fazio, C; Fortunato, A; Palozzi, A M; Vacca, P; Fillo, S; Lista, F; Stefanelli, P

    2015-01-01

    Gonorrhea may become untreatable due to the spread of resistant or multidrug-resistant strains. Cefixime-resistant gonococci belonging to sequence type 1407 have been described worldwide. We report the genome sequence of Neisseria gonorrhoeae strain G2891, a multidrug-resistant isolate of sequence type 1407, collected in Italy in 2013. PMID:26272575

  10. De novo sequencing of unique sequence tags for discovery of post-translational modifications of proteins

    SciTech Connect

    Shen, Yufeng; Tolic, Nikola; Hixson, Kim K.; Purvine, Samuel O.; Anderson, Gordon A.; Smith, Richard D.

    2008-10-15

    De novo sequencing promises to enable discovery of protein post-translational modifications; however, such approaches are still in their infancy and are not widely applied in proteomics practice because of their limited reliability. In this work, we describe a de novo sequencing approach for discovering protein modifications through identification of unique sequence tags (UStags; Anal. Chem. 2008, 80, 1871-1882). The de novo information was obtained from Fourier-transform tandem mass spectrometry of peptides and polypeptides in a yeast lysate, and the de novo sequences obtained were filtered to define a more limited set of UStags. The DNA-predicted database protein sequences were then compared to the UStags, and the differences observed across or in the UStags (i.e., the UStags’ prefix and suffix sequences and the UStags themselves) were used to infer possible sequence modifications. With this de novo-UStag approach, we uncovered some unexpected variations in yeast protein sequences due to amino acid mutations and/or multiple modifications to the predicted protein sequences. Random matching of the de novo sequences to the predicted sequences was examined using two random (false) databases, and false discovery rates of ~3% were estimated for the de novo-UStag approach. The factors affecting reliability (e.g., the existence of de novo sequencing noise residues and redundant sequences) and sensitivity are described. The de novo-UStag approach complements the previously reported UStag method by enabling discovery of new protein modifications.
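    The prefix comparison at the heart of this idea can be illustrated with a simplified Python sketch: locate a tag in a database-predicted peptide, sum the residue masses N-terminal to it, and compare the result with the prefix mass observed in the spectrum. The helper name, the use of a plain residue-mass sum (no ion-type or terminal-group offsets), and the phosphorylation example are our assumptions for illustration; the published method also handles suffixes, mass tolerances, and noise residues that this sketch ignores.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def prefix_mass_shifts(peptide, tag, observed_prefix_mass):
    """Hypothetical helper: for each occurrence of `tag` in a database-predicted
    `peptide`, return (tag_start, observed - predicted), where the predicted
    prefix mass is the summed residue mass N-terminal to the tag. A shift far
    from zero hints at a modification or substitution in the prefix region."""
    shifts = []
    start = peptide.find(tag)
    while start != -1:
        predicted = sum(RESIDUE_MASS[aa] for aa in peptide[:start])
        shifts.append((start, observed_prefix_mass - predicted))
        start = peptide.find(tag, start + 1)
    return shifts


PHOSPHO = 79.96633  # monoisotopic mass of HPO3, the phosphorylation shift
observed = RESIDUE_MASS["A"] + RESIDUE_MASS["G"] + RESIDUE_MASS["S"] + PHOSPHO
for start, delta in prefix_mass_shifts("AGSPEPTIDEK", "PEPTIDE", observed):
    print(start, round(delta, 4))
# 3 79.9663
```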

  11. Nanopore sequencing detects structural variants in cancer.

    PubMed

    Norris, Alexis L; Workman, Rachael E; Fan, Yunfan; Eshleman, James R; Timp, Winston

    2016-03-01

    Despite advances in sequencing, structural variants (SVs) remain difficult to reliably detect due to the short read length (<300 bp) of 2nd generation sequencing. Not only do the reads (or paired-end reads) need to straddle a breakpoint, but repetitive elements often lead to ambiguities in the alignment of short reads. We propose to use the long reads (up to 20 kb) possible with 3rd generation sequencing, specifically nanopore sequencing on the MinION. Nanopore sequencing relies on a similar concept to a Coulter counter, reading the DNA sequence from the change in electrical current resulting from a DNA strand being forced through a nanometer-sized pore embedded in a membrane. Though nanopore sequencing currently has a relatively high mismatch rate that precludes base substitution and small frameshift mutation detection, its accuracy is sufficient for SV detection because of its long reads. In fact, long reads in some cases may improve SV detection efficiency. We have tested the ability of nanopore sequencing to detect a series of well-characterized SVs, including large deletions, inversions, and translocations that inactivate the CDKN2A/p16 and SMAD4/DPC4 tumor suppressor genes in pancreatic cancer. Using PCR amplicon mixes, we have demonstrated that nanopore sequencing can detect large deletions, translocations and inversions at dilutions as low as 1:100, with as few as 500 reads per sample. Given the speed, small footprint, and low capital cost, nanopore sequencing could become the ideal tool for the low-level detection of cancer-associated SVs needed for molecular relapse, early detection, or therapeutic monitoring. PMID:26787508
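    The point that a read must straddle a breakpoint can be made concrete: a long read spanning an SV typically aligns as two or more separate blocks, and the geometry of adjacent blocks suggests the SV type. The Python sketch below is a simplified, hypothetical classifier; the `Segment` record, the 50 bp threshold, the coordinates, and the rules are our assumptions, not the authors' pipeline. Real callers also handle strand-aware coordinates, duplications, and alignment error.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned block of a long read (hypothetical, simplified record)."""
    chrom: str
    ref_start: int  # 0-based, half-open reference interval
    ref_end: int
    strand: str     # "+" or "-"
    q_start: int    # position of the block within the read
    q_end: int


def classify_split(a: Segment, b: Segment, min_size: int = 50) -> str:
    """Classify the junction between two consecutive blocks of one read.

    Assumes `a` precedes `b` in read coordinates and, when both blocks share a
    strand, that reference coordinates increase along the read.
    """
    if a.chrom != b.chrom:
        return "translocation"
    if a.strand != b.strand:
        return "inversion"
    ref_gap = b.ref_start - a.ref_end  # reference bases skipped between blocks
    q_gap = b.q_start - a.q_end        # read bases skipped between blocks
    if ref_gap - q_gap >= min_size:
        return f"deletion of ~{ref_gap - q_gap} bp"
    if q_gap - ref_gap >= min_size:
        return f"insertion of ~{q_gap - ref_gap} bp"
    return "no SV"


a = Segment("chr9", 1_000, 5_000, "+", 0, 4_000)
b = Segment("chr9", 25_000, 29_000, "+", 4_000, 8_000)
print(classify_split(a, b))  # deletion of ~20000 bp
```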

  12. Reading biological processes from nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Murugan, Anand

    Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. 
Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical
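    For readers unfamiliar with "thermodynamic" promoter models, the Python sketch below shows the simplest textbook case: a single RNA polymerase site that can be blocked by one repressor, with transcription assumed proportional to the equilibrium probability that polymerase is bound. This is a generic illustration under stated assumptions (parameter names, energies in units of kT, about 4.6 million nonspecific sites), not the specific model inferred in the thesis.

```python
import math


def p_bound(P, R, d_eps_p, d_eps_r, n_ns=4.6e6):
    """Equilibrium probability that RNA polymerase occupies the promoter.

    Simple-repression thermodynamic model: the promoter is empty, bound by
    polymerase, or bound by a repressor that excludes polymerase.
    P, R: copy numbers of polymerase and repressor.
    d_eps_p, d_eps_r: specific minus nonspecific binding energies, in kT
    (negative values mean tighter specific binding).
    n_ns: number of nonspecific genomic sites (assumed ~E. coli genome size).
    """
    w_p = (P / n_ns) * math.exp(-d_eps_p)  # statistical weight of polymerase-bound state
    w_r = (R / n_ns) * math.exp(-d_eps_r)  # statistical weight of repressor-bound state
    return w_p / (1.0 + w_p + w_r)


# Hypothetical parameters; fold change = expression with repressor / without.
unrepressed = p_bound(P=500, R=0, d_eps_p=-5, d_eps_r=-15)
repressed = p_bound(P=500, R=10, d_eps_p=-5, d_eps_r=-15)
print(f"fold change = {repressed / unrepressed:.3f}")
```

    Inferring such a model from mutagenized-promoter data amounts to fitting the energy terms (often as sums of per-position contributions) so that predicted occupancy matches measured activity.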

  13. Nanopore sequencing detects structural variants in cancer

    PubMed Central

    Norris, Alexis L.; Workman, Rachael E.; Fan, Yunfan; Eshleman, James R.; Timp, Winston

    2016-01-01

    Despite advances in sequencing, structural variants (SVs) remain difficult to reliably detect due to the short read length (<300 bp) of 2nd generation sequencing. Not only do the reads (or paired-end reads) need to straddle a breakpoint, but repetitive elements often lead to ambiguities in the alignment of short reads. We propose to use the long reads (up to 20 kb) possible with 3rd generation sequencing, specifically nanopore sequencing on the MinION. Nanopore sequencing relies on a similar concept to a Coulter counter, reading the DNA sequence from the change in electrical current resulting from a DNA strand being forced through a nanometer-sized pore embedded in a membrane. Though nanopore sequencing currently has a relatively high mismatch rate that precludes base substitution and small frameshift mutation detection, its accuracy is sufficient for SV detection because of its long reads. In fact, long reads in some cases may improve SV detection efficiency. We have tested the ability of nanopore sequencing to detect a series of well-characterized SVs, including large deletions, inversions, and translocations that inactivate the CDKN2A/p16 and SMAD4/DPC4 tumor suppressor genes in pancreatic cancer. Using PCR amplicon mixes, we have demonstrated that nanopore sequencing can detect large deletions, translocations and inversions at dilutions as low as 1:100, with as few as 500 reads per sample. Given the speed, small footprint, and low capital cost, nanopore sequencing could become the ideal tool for the low-level detection of cancer-associated SVs needed for molecular relapse, early detection, or therapeutic monitoring. PMID:26787508

  14. Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks

    PubMed Central

    Ma, Qicheng; Chirn, Gung-Wei; Cai, Richard; Szustakowski, Joseph D; Nirmala, NR

    2005-01-01

    Background: The sequencing of the human genome has enabled us to access a comprehensive list of genes (both experimental and predicted) for further analysis. While a majority of the approximately 30000 known and predicted human coding genes are characterized and have been assigned at least one function, there remains a fair number of genes (about 12000) for which no annotation has been made. The recent sequencing of other genomes has provided us with a huge amount of auxiliary sequence data which could help in the characterization of the human genes. Clustering these sequences into families is one of the first steps in performing comparative studies across several genomes. Results: Here we report a novel clustering algorithm (CLUGEN) that has been used to cluster sequences of experimentally verified and predicted proteins from all sequenced genomes using a novel distance metric which is a neural network score between a pair of protein sequences. This distance metric is based on the pairwise sequence similarity score and the similarity between their domain structures. The distance metric is the probability that a pair of protein sequences are of the same Interpro family/domain, which facilitates the modelling of transitive homology closure to detect remote homologues. The hierarchical average clustering method is applied with the new distance metric. Conclusion: Benchmarking studies of our algorithm versus those reported in the literature show that our algorithm provides clustering results with lower false positive and false negative rates. The clustering algorithm is applied to cluster several eukaryotic genomes and several dozen prokaryotic genomes. PMID:16202129
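    The final clustering step the abstract names, hierarchical average-linkage clustering over a pairwise distance, can be sketched in plain Python. Here the distances are assumed to be dissimilarities in [0, 1] (for example 1 minus a same-family probability), and the merge threshold and toy matrix are illustrative assumptions; the neural-network scoring that produces CLUGEN's distances is not reproduced.

```python
def average_linkage(dist, threshold):
    """Agglomerative clustering with average linkage.

    dist: symmetric matrix (list of lists) of pairwise distances.
    Clusters are merged while the smallest average inter-cluster distance
    is <= threshold. Returns a sorted list of sorted index lists.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(clusters)


dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
print(average_linkage(dist, threshold=0.5))  # [[0, 1], [2, 3]]
```

    The naive pair search is cubic overall and is meant only to show the merge rule; genome-scale clustering would need a more efficient implementation.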

  15. Exploration of noncoding sequences in metagenomes.

    PubMed

    Tobar-Tosse, Fabián; Rodríguez, Adrián C; Vélez, Patricia E; Zambrano, María M; Moreno, Pedro A

    2013-01-01

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identifying ORFs and their functional categories is the most common way to associate functional with environmental features. However, this ORF-based analysis misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and of the kind of relevant information present in those sequences. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences contain relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment. PMID:23536879
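    The trinucleotide usage (Tn) profile described above can be computed with a short Python sketch. The overlapping-window default, the step=3 codon option, and the skipping of ambiguous bases are our own illustrative choices, not the authors' exact protocol.

```python
from collections import Counter
from itertools import product


def trinucleotide_usage(seq, step=1):
    """Relative frequency of each of the 64 trinucleotides in `seq`.

    step=1 counts all overlapping windows (usable on noncoding DNA);
    step=3 counts in-frame codons instead. Windows containing non-ACGT
    characters are counted by Counter but excluded from the 64-letter total.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, step))
    alphabet = ["".join(p) for p in product("ACGT", repeat=3)]
    total = sum(counts[t] for t in alphabet)
    return {t: (counts[t] / total if total else 0.0) for t in alphabet}


usage = trinucleotide_usage("ATGATGATG")
print(round(usage["ATG"], 3))  # 0.429 (3 of 7 overlapping windows)
```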

  16. Value of a newly sequenced bacterial genome.

    PubMed

    Barbosa, Eudes Gv; Aburjaile, Flavia F; Ramos, Rommel Tj; Carneiro, Adriana R; Le Loir, Yves; Baumbach, Jan; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

    2014-05-26

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interest is expressed in a particular bacterial genome, it is more likely that its sequencing will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information. PMID:24921006

  17. Exploration of Noncoding Sequences in Metagenomes

    PubMed Central

    Tobar-Tosse, Fabián; Rodríguez, Adrián C.; Vélez, Patricia E.; Zambrano, María M.; Moreno, Pedro A.

    2013-01-01

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identifying ORFs and their functional categories is the most common way to associate functional with environmental features. However, this ORF-based analysis misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and of the kind of relevant information present in those sequences. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences contain relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment. PMID:23536879

  18. Gelada vocal sequences follow Menzerath's linguistic law.

    PubMed

    Gustison, Morgan L; Semple, Stuart; Ferrer-I-Cancho, Ramon; Bergman, Thore J

    2016-05-10

    Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath's law is a linguistic law stating that the larger the construct, the smaller the size of its constituents. Here, to our knowledge, we present the first evidence that Menzerath's law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size and most call types are abbreviated in larger sequences. We also find that intercall intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath's law reflects compression, the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language. PMID:27091968
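    The core Menzerath test described above is a negative correlation between construct size (calls per sequence) and constituent size (call duration). The sketch below shows that relationship with a plain Pearson coefficient on made-up numbers; the toy durations are hypothetical and are not the study's data, and the authors' actual statistical procedure may differ.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical toy data (NOT from the study): each inner list holds the
# call durations, in seconds, of one vocal sequence.
sequences = [
    [0.9, 0.8],
    [0.7, 0.8, 0.6],
    [0.6, 0.5, 0.6, 0.5],
    [0.5, 0.4, 0.5, 0.4, 0.4],
]
sizes = [len(s) for s in sequences]             # construct size
mean_durations = [mean(s) for s in sequences]   # constituent size
r = pearson_r(sizes, mean_durations)
print(f"r = {r:.3f}")  # a negative r is the pattern Menzerath's law predicts
```

    On Python 3.10 or later, `statistics.correlation` computes the same Pearson coefficient; the explicit version is kept here to show the arithmetic.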

  19. Fungal genome sequencing: basic biology to biotechnology.

    PubMed

    Sharma, Krishna Kant

    2016-08-01

    The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast Saccharomyces cerevisiae, with its small genome size, unicellular growth, and rich history of genetic and molecular analyses, was a milestone of early genomics in the 1990s. The subsequent completion of the genomes of the fission yeast Schizosaccharomyces pombe and the genetic model Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. In due course, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research in medical science, agricultural science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics has considerable potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides a road map for basic and applied research. PMID:25721271

  20. A comparative evaluation of sequence classification programs

    PubMed Central

    2012-01-01

    Background A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying their basic algorithmic approach to classification, some methods screen sequence reads for ’barcoding genes’ like 16S rRNA, or various types of protein-coding genes. Due to the sheer number and complexity of methods, it can be difficult for a researcher to choose one that is well-suited for a particular analysis. Results We divided the very large number of programs that have been released in recent years for solving the sequence classification problem into three main categories based on the general algorithm they use to compare a query sequence against a database of sequences. We also evaluated the performance of the leading programs in each category on data sets whose taxonomic and functional composition is known. Conclusions We found significant variability in classification accuracy, precision, and resource consumption of sequence classification programs when used to analyze various metagenomics data sets. However, we observe some general trends and patterns that will be useful to researchers who use sequence classification programs. PMID:22574964

  1. Microfluidic droplet enrichment for targeted sequencing

    PubMed Central

    Eastburn, Dennis J.; Huang, Yong; Pellegrino, Maurizio; Sciambi, Adam; Ptáček, Louis J.; Abate, Adam R.

    2015-01-01

    Targeted sequence enrichment enables better identification of genetic variation by providing increased sequencing coverage for genomic regions of interest. Here, we report the development of a new target enrichment technology that is highly differentiated from other approaches currently in use. Our method, MESA (Microfluidic droplet Enrichment for Sequence Analysis), isolates genomic DNA fragments in microfluidic droplets and performs TaqMan PCR reactions to identify droplets containing a desired target sequence. The TaqMan positive droplets are subsequently recovered via dielectrophoretic sorting, and the TaqMan amplicons are removed enzymatically prior to sequencing. We demonstrated the utility of this approach by generating an average 31.6-fold sequence enrichment across 250 kb of targeted genomic DNA from five unique genomic loci. Significantly, this enrichment enabled a more comprehensive identification of genetic polymorphisms within the targeted loci. MESA requires low amounts of input DNA, minimal prior locus sequence information and enriches the target region without PCR bias or artifacts. These features make it well suited for the study of genetic variation in a number of research and diagnostic applications. PMID:25873629

  2. HLA typing by direct DNA sequencing.

    PubMed

    Smith, Linda K

    2012-01-01

    Sequencing-based typing is a high-resolution method for the identification of HLA polymorphisms. The majority of HLA Class I alleles can be discriminated by their exon 2 and 3 sequences, and for Class II alleles, exon 2 is generally sufficient. There are polymorphic positions in other exons which may require additional sequencing to exclude certain alleles with differences outside exons 2 and 3, depending on the clinical requirement and relevant accreditation guidelines. The process involves selective amplification of target alleles by PCR, agarose gel electrophoresis of the PCR products to assess their quantity and quality, followed by purification of PCR amplicons to remove excess primer and dNTPs. Cycle sequencing reactions using the Applied Biosystems™ BigDye(®) Terminator Ready Reaction v1.1 or v3.1 Kit are performed, and the sequencing reactions are purified before electrophoresis on an Applied Biosystems™ 3730 or 3730XL Genetic Analyser (or similar). Data are processed by specialised software packages, which compare the sample sequence to the sequences of all possible theoretical allele combinations to assign an accurate genotype. Examination of all nucleotides, at both conserved and polymorphic positions, enables the direct identification of new alleles, which may not be possible with techniques such as SSP and SSO typing. PMID:22665229
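    The genotype-assignment step described above (comparing the sample sequence with every theoretical allele combination) can be illustrated with a toy matcher. This is a minimal sketch, not the behaviour of any commercial typing package: it assumes ungapped, equal-length aligned sequences, represents heterozygous positions with standard IUPAC ambiguity codes, and uses hypothetical allele names and sequences.

```python
from itertools import combinations_with_replacement

# IUPAC codes for a position where two different bases are present.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def combine(a1: str, a2: str) -> str:
    """Expected mixed-template base calls for an allele pair (aligned, same length)."""
    return "".join(x if x == y else IUPAC[frozenset((x, y))] for x, y in zip(a1, a2))


def assign_genotypes(observed: str, alleles: dict) -> list:
    """Return every allele pair whose combined sequence matches the observed calls.

    More than one hit means the genotype is ambiguous for this region.
    """
    return [
        (n1, n2)
        for n1, n2 in combinations_with_replacement(sorted(alleles), 2)
        if combine(alleles[n1], alleles[n2]) == observed
    ]


# Hypothetical allele library; real libraries span full exons.
ALLELES = {"X*01": "ACGTAC", "X*02": "ACGTGC", "X*03": "ATGTAC"}
print(assign_genotypes("AYGTRC", ALLELES))  # [('X*02', 'X*03')]
```

    Returning a list rather than a single answer reflects the point made in the text: when polymorphisms outside the sequenced exons distinguish alleles, several combinations can explain the same calls, and additional sequencing is needed to resolve them.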

  3. CATEGORIZATION OF EVENT SEQUENCES FOR LICENSE APPLICATION

    SciTech Connect

    G.E. Ragan; P. Mecheret; D. Dexheimer

    2005-04-14

    The purposes of this analysis are: (1) Categorize (as Category 1, Category 2, or Beyond Category 2) internal event sequences that may occur before permanent closure of the repository at Yucca Mountain. (2) Categorize external event sequences that may occur before permanent closure of the repository at Yucca Mountain. This includes examining DBGM-1 seismic classifications and upgrading to DBGM-2, if appropriate, to ensure Beyond Category 2 categorization. (3) State the design and operational requirements that are invoked to make the categorization assignments valid. (4) Indicate the amount of material put at risk by Category 1 and Category 2 event sequences. (5) Estimate frequencies of Category 1 event sequences at the maximum capacity and receipt rate of the repository. (6) Distinguish occurrences associated with normal operations from event sequences. It is beyond the scope of the analysis to propose design requirements that may be required to control radiological exposure associated with normal operations. (7) Provide a convenient compilation of the results of the analysis in tabular form. The results of this analysis are used as inputs to the consequence analyses in an iterative design process that is depicted in Figure 1. Categorization of event sequences for permanent retrieval of waste from the repository is beyond the scope of this analysis. Cleanup activities that take place after an event sequence and other responses to abnormal events are also beyond the scope of the analysis.

  4. Sequencing and Analysis of Neanderthal Genomic DNA

    PubMed Central

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Pääbo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2008-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library are of Neanderthal origin, the strongest being the ascertainment of sequence identities between Neanderthal and chimpanzee at sites where the human genomic sequence is different. These results enabled us to calculate the human-Neanderthal divergence time based on multiple randomly distributed autosomal loci. Our analyses suggest that on average the Neanderthal genomic sequence we obtained and the reference human genome sequence share a most recent common ancestor ~706,000 years ago, and that the human and Neanderthal ancestral populations split ~370,000 years ago, before the emergence of anatomically modern humans. Our finding that the Neanderthal and human genomes are at least 99.5% identical led us to develop and successfully implement a targeted method for recovering specific ancient DNA sequences from metagenomic libraries. This initial analysis of the Neanderthal genome advances our understanding of the evolutionary relationship of Homo sapiens and Homo neanderthalensis and signifies the dawn of Neanderthal genomics. PMID:17110569

  5. Effects of the Ion PGM™ Hi-Q™ sequencing chemistry on sequence data quality.

    PubMed

    Churchill, Jennifer D; King, Jonathan L; Chakraborty, Ranajit; Budowle, Bruce

    2016-09-01

    Massively parallel sequencing (MPS) offers substantial improvements over current forensic DNA typing methodologies such as increased resolution, scalability, and throughput. The Ion PGM™ is a promising MPS platform for analysis of forensic biological evidence. The system employs a sequencing-by-synthesis chemistry on a semiconductor chip that measures a pH change due to the release of hydrogen ions as nucleotides are incorporated into the growing DNA strands. However, implementation of MPS into forensic laboratories requires a robust chemistry. Ion Torrent's Hi-Q™ Sequencing Chemistry was evaluated to determine if it could improve on the quality of the generated sequence data in association with selected genetic marker targets. The whole mitochondrial genome and the HID-Ion STR 10-plex panel were sequenced on the Ion PGM™ system with the Ion PGM™ Sequencing 400 Kit and the Ion PGM™ Hi-Q™ Sequencing Kit. Concordance, coverage, strand balance, noise, and deletion ratios were assessed in evaluating the performance of the Ion PGM™ Hi-Q™ Sequencing Kit. The results indicate that reliable, accurate data are generated and that sequencing through homopolymeric regions can be improved with the use of Ion Torrent's Hi-Q™ Sequencing Chemistry. Overall, the quality of the generated sequencing data supports the potential for use of the Ion PGM™ in forensic genetic laboratories. PMID:27025714

  6. Statistical properties of filtered pseudorandom digital sequences formed from the sum of maximum-length sequences

    NASA Technical Reports Server (NTRS)

    Wallace, G. R.; Weathers, G. D.; Graf, E. R.

    1973-01-01

    The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
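    A hybrid-sum sequence, as defined above, is the modulo-two sum of several maximum-length sequences. The sketch below generates m-sequences with a linear recurrence, XORs them, and applies a crude filter that is a convolution with a rectangular pulse (the difference of two unit steps). The primitive polynomials x^3 + x + 1 and x^4 + x + 1 are illustrative choices, not taken from the report, and the report's statistical quality factor is not reproduced.

```python
def m_sequence(taps: list, seed: list, length: int) -> list:
    """Binary sequence from the recurrence a[k+n] = XOR of a[k+t] for t in taps.

    With taps from a primitive polynomial of degree n = len(seed) and a
    nonzero seed, the output is a maximum-length sequence of period 2**n - 1.
    """
    a = list(seed)
    n = len(seed)
    while len(a) < length:
        k = len(a) - n
        bit = 0
        for t in taps:
            bit ^= a[k + t]
        a.append(bit)
    return a[:length]


def hybrid_sum(components: list, length: int) -> list:
    """Modulo-two sum of several (taps, seed) maximum-length sequences."""
    out = [0] * length
    for taps, seed in components:
        for i, b in enumerate(m_sequence(taps, seed, length)):
            out[i] ^= b
    return out


def moving_sum(bits: list, window: int) -> list:
    """Map 0/1 to +1/-1 and sum over a sliding window (rectangular-pulse filter)."""
    x = [1 - 2 * b for b in bits]
    return [sum(x[i:i + window]) for i in range(len(x) - window + 1)]


# x^3 + x + 1  ->  a[k+3] = a[k+1] ^ a[k]   (period 7)
# x^4 + x + 1  ->  a[k+4] = a[k+1] ^ a[k]   (period 15)
COMPONENTS = [([0, 1], [1, 0, 0]), ([0, 1], [1, 0, 0, 0])]
h = hybrid_sum(COMPONENTS, 210)
filtered = moving_sum(h, 8)
```

    Because the two component periods (7 and 15) are coprime, the hybrid-sum sequence has period 105, far longer than either component, while each component register stays short.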

  7. Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius.

    PubMed

    Al-Swailem, Abdulaziz M; Shehata, Maher M; Abu-Duhier, Faisel M; Al-Yamani, Essam J; Al-Busadah, Khalid A; Al-Arawi, Mohammed S; Al-Khider, Ali Y; Al-Muhaimeed, Abdullah N; Al-Qahtani, Fahad H; Manee, Manee M; Al-Shomrani, Badr M; Al-Qhtani, Saad M; Al-Harthi, Amer S; Akdemir, Kadir C; Inan, Mehmet S; Otu, Hasan H

    2010-01-01

    Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and approximately 40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665
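    The ORF>300 bp criterion above requires locating open reading frames in each contig. The sketch below finds the longest ATG-to-stop ORF on both strands across all six frames; the exact ORF definition used by the authors (start codon requirement, strand handling, partial ORFs at contig ends) is not stated, so these choices are assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_forward(seq: str) -> str:
    """Longest ATG...stop ORF in the three forward frames (incomplete ORFs ignored)."""
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop gives the longest ORF
            elif start is not None and codon in STOPS:
                if i + 3 - start > len(best):
                    best = seq[start:i + 3]
                start = None
    return best


def longest_orf(seq: str) -> str:
    """Longest ORF over all six reading frames (EST orientation is often unknown)."""
    seq = seq.upper()
    return max(longest_orf_forward(seq),
               longest_orf_forward(reverse_complement(seq)), key=len)


def has_long_orf(seq: str, min_bp: int = 300) -> bool:
    """True if the contig carries an ORF longer than min_bp, stop codon included."""
    return len(longest_orf(seq)) > min_bp
```

    Scanning both strands matters for ESTs, because a read or contig may represent either orientation of the transcript.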

  8. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665

  9. Robust temporal alignment of multimodal cardiac sequences

    NASA Astrophysics Data System (ADS)

    Perissinotto, Andrea; Queirós, Sandro; Morais, Pedro; Baptista, Maria J.; Monaghan, Mark; Rodrigues, Nuno F.; D'hooge, Jan; Vilaça, João. L.; Barbosa, Daniel

    2015-03-01

    Given the dynamic nature of cardiac function, correct temporal alignment of pre-operative models and intraoperative images is crucial for augmented reality in cardiac image-guided interventions. As such, the current study focuses on the development of an image-based strategy for temporal alignment of multimodal cardiac imaging sequences, such as cine Magnetic Resonance Imaging (MRI) or 3D Ultrasound (US). First, we derive a robust, modality-independent signal from the image sequences, estimated by computing the normalized cross-correlation between each frame in the temporal sequence and the end-diastolic frame. This signal serves as a surrogate for the left-ventricular (LV) volume curve over time, whose variation indicates different temporal landmarks of the cardiac cycle. We then perform the temporal alignment of these surrogate signals derived from MRI and US sequences of the same patient through Dynamic Time Warping (DTW), allowing both sequences to be synchronized. The proposed framework was evaluated in 98 patients who had undergone both 3D+t MRI and US scans. The end-systolic frame could be accurately estimated as the minimum of the image-derived surrogate signal, with a relative error of 1.6 ± 1.9% and 4.0 ± 4.2% for the MRI and US sequences, respectively, thus supporting its association with key temporal instants of the cardiac cycle. The use of DTW reduces the desynchronization of cardiac events in MRI and US sequences, allowing multimodal cardiac imaging sequences to be temporally aligned. Overall, a generic, fast and accurate method for temporal synchronization of MRI and US sequences of the same patient was introduced. This approach could be straightforwardly used for the correct temporal alignment of pre-operative MRI information and intra-operative US images.
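    The two steps described above, a per-frame normalized cross-correlation against the end-diastolic frame followed by Dynamic Time Warping of the resulting signals, can be sketched as follows. Frames are assumed to be flattened lists of pixel intensities, the local DTW cost is assumed to be the absolute difference, and no warping-window constraint is applied; the study's actual implementation details are not given in the abstract.

```python
import math


def ncc(x, y):
    """Normalized cross-correlation between two equally sized, flattened frames."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def surrogate_signal(frames, ed_index=0):
    """One value per frame: its similarity to the end-diastolic reference frame."""
    ref = frames[ed_index]
    return [ncc(f, ref) for f in frames]


def dtw(a, b):
    """Classic DTW with |a_i - b_j| as local cost.

    Returns (total cost, warping path as a list of (i, j) index pairs).
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from (n, m), preferring the diagonal step on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, step = min((D[i - 1][j - 1], 0), (D[i - 1][j], 1), (D[i][j - 1], 2))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return D[n][m], path
```

    The warping path maps each MRI frame index to one or more US frame indices, which is what allows two sequences with different frame rates or cycle lengths to be synchronized.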

  10. Oligonucleotide Sequence Motifs as Nucleosome Positioning Signals

    PubMed Central

    Collings, Clayton K.; Fernandez, Alfonso G.; Pitschka, Chad G.; Hawkins, Troy B.; Anderson, John N.

    2010-01-01

    To gain a better understanding of the sequence patterns that characterize positioned nucleosomes, we first performed an analysis of the periodicities of the 256 tetranucleotides in a yeast genome-wide library of nucleosomal DNA sequences that was prepared by in vitro reconstitution. The approach entailed the identification and analysis of 24 unique tetranucleotides that were defined by 8 consensus sequences. These consensus sequences were shown to be responsible for most if not all of the tetranucleotide and dinucleotide periodicities displayed by the entire library, demonstrating that the periodicities of dinucleotides that characterize the yeast genome are, in actuality, due primarily to the 8 consensus sequences. A novel combination of experimental and bioinformatic approaches was then used to show that these tetranucleotides are important for preferred formation of nucleosomes at specific sites along DNA in vitro. These results were then compared to tetranucleotide patterns in genome-wide in vivo libraries from yeast and C. elegans in order to assess the contributions of DNA sequence in the control of nucleosome residency in the cell. These comparisons revealed striking similarities in the tetranucleotide occurrence profiles that are likely to be involved in nucleosome positioning in both in vitro and in vivo libraries, suggesting that DNA sequence is an important factor in the control of nucleosome placement in vivo. However, the strengths of the tetranucleotide periodicities were 3–4 fold higher in the in vitro as compared to the in vivo libraries, which implies that DNA sequence plays less of a role in dictating nucleosome positions in vivo. The results of this study have important implications for models of sequence-dependent positioning since they suggest that a defined subset of tetranucleotides is involved in preferred nucleosome occupancy and that these tetranucleotides are the major source of the dinucleotide periodicities that are characteristic of

  11. Quantitative texton sequences for legible bivariate maps.

    PubMed

    Ware, Colin

    2009-01-01

    Representing bivariate scalar maps is a common but difficult visualization problem. One solution has been to use two-dimensional color schemes, but the results are often hard to interpret and read inaccurately. An alternative is to use a color sequence for one variable and a texture sequence for the other. This has been used, for example, in geology, but it has been much less studied than the two-dimensional color scheme, although theory suggests that it should lead to easier perceptual separation of information relating to the two variables. To make a texture sequence more clearly readable, the concept of the quantitative texton sequence (QTonS) is introduced. A QTonS is defined as a sequence of small graphical elements, called textons, where each texton represents a different numerical value and sets of textons can be densely displayed to produce visually differentiable textures. An experiment was carried out to compare two bivariate color coding schemes with two schemes using QTonS for one bivariate map component and a color sequence for the other. Two different key designs were investigated (a key being a sequence of colors or textures used in obtaining quantitative values from a map). The first design used two separate keys, one for each dimension, in order to measure how accurately subjects could independently estimate the underlying scalar variables. The second key design was two-dimensional and intended to measure the overall integral accuracy that could be obtained. The results show that the accuracy is substantially higher for the QTonS/color sequence schemes. A hypothesis that texture/color sequence combinations are better for independent judgments of mapped quantities was supported. A second experiment probed the limits of spatial resolution for QTonSs. PMID:19834229

  12. Robot Sequencing and Visualization Program (RSVP)

    NASA Technical Reports Server (NTRS)

    Cooper, Brian K.; Maxwell, Scott A.; Hartman, Frank R.; Wright, John R.; Yen, Jeng; Toole, Nicholas T.; Gorjian, Zareh; Morrison, Jack C

    2013-01-01

    The Robot Sequencing and Visualization Program (RSVP) is being used in the Mars Science Laboratory (MSL) mission for downlink data visualization and command sequence generation. RSVP reads and writes downlink data products from the operations data server (ODS) and writes uplink data products to the ODS. The primary users of RSVP are members of the Rover Planner team (part of the Integrated Planning and Execution Team (IPE)), who use it to perform traversability/articulation analyses, take activity plan input from the Science and Mission Planning teams, and create a set of rover sequences to be sent to the rover every sol. The primary inputs to RSVP are downlink data products and activity plans in the ODS database. The primary outputs are command sequences to be placed in the ODS for further processing prior to uplink to each rover. RSVP is composed of two main subsystems. The first, called the Robot Sequence Editor (RoSE), understands the MSL activity and command dictionaries and takes care of converting incoming activity level inputs into command sequences. The Rover Planners use the RoSE component of RSVP to put together command sequences and to view and manage command level resources like time, power, temperature, etc. (via a transparent realtime connection to SEQGEN). The second component of RSVP is called HyperDrive, a set of high-fidelity computer graphics displays of the Martian surface in 3D and in stereo. The Rover Planners can explore the environment around the rover, create commands related to motion of all kinds, and see the simulated result of those commands via its underlying tight coupling with flight navigation, motor, and arm software. This software is the evolutionary replacement for the Rover Sequencing and Visualization software used to create command sequences (and visualize the Martian surface) for the Mars Exploration Rover mission.

  13. DNA Sequence Determination by Hybridization: A Strategy for Efficient Large-Scale Sequencing

    NASA Astrophysics Data System (ADS)

    Drmanac, R.; Drmanac, S.; Strezoska, Z.; Paunesku, T.; Labat, I.; Zeremski, M.; Snoddy, J.; Funkhouser, W. K.; Koop, B.; Hood, L.; Crkvenjakov, R.

    1993-06-01

    The concept of sequencing by hybridization (SBH) makes use of an array of all possible n-nucleotide oligomers (n-mers) to identify n-mers present in an unknown DNA sequence. Computational approaches can then be used to assemble the complete sequence. As a validation of this concept, the sequences of three DNA fragments, 343 base pairs in length, were determined with octamer oligonucleotides. Possible applications of SBH include physical mapping (ordering) of overlapping DNA clones, sequence checking, DNA fingerprinting comparisons of normal and disease-causing genes, and the identification of DNA fragments with particular sequence motifs in complementary DNA and genomic libraries. The SBH techniques may accelerate the mapping and sequencing phases of the human genome project.
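    The SBH idea above has two halves: the chip reports which n-mers are present (the spectrum), and a computation reassembles the sequence. The sketch below models an idealized chip that reports presence only, then reassembles by finding an Eulerian path in the de Bruijn graph (nodes are (n-1)-mers, edges are n-mers). This graph formulation is a standard substitute chosen for illustration, not necessarily the assembly procedure used in the paper, and it recovers the original sequence only when no (n-1)-mer repeats.

```python
from collections import defaultdict


def spectrum(seq: str, n: int) -> set:
    """n-mers an idealized SBH array would report (presence only, no counts)."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def assemble(spec: set) -> str:
    """Rebuild a sequence consistent with the spectrum via an Eulerian path."""
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree per (n-1)-mer
    for kmer in sorted(spec):
        u, v = kmer[:-1], kmer[1:]
        graph[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    starts = [node for node in sorted(balance) if balance[node] == 1]
    start = starts[0] if starts else min(balance)
    # Hierholzer's algorithm, iterative form.
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if graph[u]:
            stack.append(graph[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(spec) + 1:
        raise ValueError("spectrum is not consistent with a single linear sequence")
    return path[0] + "".join(node[-1] for node in path[1:])


# Toy example with 4-mers; the paper used octamers on 343-bp fragments.
original = "ATGGCGTGCA"
print(assemble(spectrum(original, 4)))  # ATGGCGTGCA
```

    Repeated (n-1)-mers create branching in the graph, so several sequences can share one spectrum. Longer probes reduce this ambiguity, which helps explain why octamers rather than shorter oligomers were used.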

  14. Pittosporum cryptic virus 1: genome sequence completion using next-generation sequencing.

    PubMed

    Elbeaino, Toufic; Kubaa, Raied Abou; Tuzlali, Hasan Tuna; Digiaro, Michele

    2016-07-01

    Next-generation sequencing (NGS) was applied to dsRNAs extracted from an Italian pittosporum plant infected with pittosporum cryptic virus 1 (PiCV1). NGS allowed assembly of the full genome sequence of PiCV1, comprising dsRNA1 (1.9 kbp) and dsRNA2 (1.5 kbp), which encode the RNA-dependent RNA polymerase and capsid protein genes, respectively. Phylogenetic and sequence analyses confirmed that PiCV1 is a new member of the genus Deltapartitivirus, family Partitiviridae. From the same plant, NGS also permitted assembly of the complete genome sequence of eggplant mottled dwarf virus (EMDV), which shared 86 % to 98 % nucleotide sequence identity with complete and partial sequences (ca 6750 nt) of other known EMDV isolates with sequences available in the GenBank database. PMID:27087112

  15. DNA sequence determination by hybridization: A strategy for efficient large-scale sequencing

    SciTech Connect

    Drmanac, R.; Drmanac, S.; Strezoska, Z.; Paunesku, T.; Labat, I.; Zeremski, M.; Snoddy, J.; Crkvenjakov, R.; Funkhouser, W.K.; Koop, B.; Hood, L.

    1993-06-11

    The concept of sequencing by hybridization (SBH) makes use of an array of all possible n-nucleotide oligomers (n-mers) to identify n-mers present in an unknown DNA sequence. Computational approaches can then be used to assemble the complete sequence. As a validation of this concept, the sequences of three DNA fragments, 343 base pairs in length, were determined with octamer oligonucleotides. Possible applications of SBH include physical mapping (ordering) of overlapping DNA clones, sequence checking, DNA fingerprinting comparisons of normal and disease-causing genes, and the identification of DNA fragments with particular sequence motifs in complementary DNA and genomic libraries. The SBH techniques may accelerate the mapping and sequencing phases of the human genome project. 22 refs., 3 figs.

  16. Association Claims in the Sequencing Era

    PubMed Central

    Pulit, Sara L.; Leusink, Maarten; Menelaou, Androniki; de Bakker, Paul I. W.

    2014-01-01

    Since the completion of the Human Genome Project, the field of human genetics has been in great flux, largely due to technological advances in studying DNA sequence variation. Although community-wide adoption of statistical standards was key to the success of genome-wide association studies, similar standards have not yet been globally applied to the processing and interpretation of sequencing data. It has proven particularly challenging to pinpoint disease variants unequivocally in sequencing studies of polygenic traits. Here, we comment on a number of factors that may contribute to irreproducible claims of association in the scientific literature and discuss possible steps that we can take towards cultural change. PMID:24705293

  17. Initial retrieval sequence and blending strategy

    SciTech Connect

    Pemwell, D.L.; Grenard, C.E.

    1996-09-01

    This report documents the initial retrieval sequence and the methodology used to select it. Waste retrieval, storage, pretreatment and vitrification were modeled for candidate single-shell tank retrieval sequences. Performance of the sequences was measured by a set of metrics (for example, high-level waste glass volume, relative risk and schedule). Computer models were used to evaluate estimated glass volumes, process rates, retrieval dates, and blending strategy effects. The models were based on estimates of component inventories and concentrations, sludge wash factors and timing, retrieval annex limitations, etc.

  18. Update on Rover Sequencing and Visualization Program

    NASA Technical Reports Server (NTRS)

    Cooper, Brian; Hartman, Frank; Maxwell, Scott; Yen, Jeng; Wright, John; Balacuit, Carlos

    2005-01-01

    The Rover Sequencing and Visualization Program (RSVP) has been updated. RSVP was reported in Rover Sequencing and Visualization Program (NPO-30845), NASA Tech Briefs, Vol. 29, No. 4 (April 2005), page 38. To recapitulate: The Rover Sequencing and Visualization Program (RSVP) is the software tool to be used in the Mars Exploration Rover (MER) mission for planning rover operations and generating command sequences for accomplishing those operations. RSVP combines three-dimensional (3D) visualization for immersive exploration of the operations area, stereoscopic image display for high-resolution examination of the downlinked imagery, and a sophisticated command-sequence editing tool for analysis and completion of the sequences. RSVP is linked with actual flight code modules for operations rehearsal to provide feedback on the expected behavior of the rover prior to committing to a particular sequence. Playback tools allow for review of both rehearsed rover behavior and downlinked results of actual rover operations. These can be displayed simultaneously for comparison of rehearsed and actual activities for verification. The primary inputs to RSVP are downlink data products from the Operations Storage Server (OSS) and activity plans generated by the science team. The activity plans are high-level goals for the next day's activities. The downlink data products include imagery, terrain models, and telemetered engineering data on rover activities and state. The Rover Sequence Editor (RoSE) component of RSVP performs activity expansion to command sequences, command creation and editing with setting of command parameters, and viewing and management of rover resources. The HyperDrive component of RSVP performs 2D and 3D visualization of the rover's environment, graphical and animated review of rover predicted and telemetered state, and creation and editing of command sequences related to mobility and Instrument Deployment Device (robotic arm) operations. Additionally, RoSE and

  19. Clinical sequencing: is WGS the better WES?

    PubMed

    Meienberg, Janine; Bruggmann, Rémy; Oexle, Konrad; Matyas, Gabor

    2016-03-01

    Current clinical next-generation sequencing is done by using gene panels and exome analysis, both of which involve selective capturing of target regions. However, capturing has limitations in sufficiently covering coding exons, especially GC-rich regions. We compared whole exome sequencing (WES) with the most recent PCR-free whole genome sequencing (WGS), showing that only the latter provides unprecedented, complete coverage of the coding region of the genome. Thus, from a clinical/technical point of view, WGS is the better WES, so that capturing is no longer necessary for the most comprehensive genomic testing of Mendelian disorders. PMID:26742503

  20. A strategy for sequence phylogeny research.

    PubMed Central

    Sankoff, D; Cedergren, R J; McKay, W

    1982-01-01

    Minimal mutation trees, and almost minimal trees, are constructed from two data sets, one of phenylalanine tRNA sequences, and the other of 5S RNA sequences, from a diverse range of organisms. The two sets of results are mutually consistent. Trees representing previous evolutionary hypotheses are compared using a total weighted mutational distance criterion. The importance of sequence data from relatively little-studied phylogenetic lines is stressed. A procedure is illustrated which circumvents the computational difficulty of evaluating the astronomically large number of possible trees, without resorting to suboptimal methods. PMID:6917153

  1. Iterative method for generating correlated binary sequences

    NASA Astrophysics Data System (ADS)

    Usatenko, O. V.; Melnik, S. S.; Apostolov, S. S.; Makarov, N. M.; Krokhin, A. A.

    2014-11-01

    We propose an efficient iterative method for generating random correlated binary sequences with a prescribed correlation function. The method is based on consecutive linear modulations of an initially uncorrelated sequence into a correlated one. Each step of modulation increases the correlations until the desired level has been reached. The robustness and efficiency of the proposed algorithm are tested by generating sequences with inverse power-law correlations. The substantial increase in the strength of correlation in the iterative method with respect to single-step filtering generation is shown for all studied correlation functions. Our results can be used for design of disordered superlattices, waveguides, and surfaces with selective transport properties.
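    The abstract does not give the modulation formulas, so the sketch below does not reproduce the authors' iterative method. As a simpler, well-known point of comparison, a symmetric two-state Markov chain on {-1, +1} yields a binary sequence whose correlation decays exponentially, C(r) = (1 - 2p)^r, where p is the per-step flip probability. The function names and parameters are hypothetical. Only exponential decay is reachable this way, which is why targets such as inverse power-law correlations call for methods like the one described.

```python
import random

def markov_binary_sequence(n, flip_prob, seed=None):
    """Symmetric two-state Markov chain on {-1, +1}.

    Each step keeps the current value with probability 1 - flip_prob and
    flips it with probability flip_prob, so the stationary correlation
    function is C(r) = (1 - 2 * flip_prob) ** r.
    """
    rng = random.Random(seed)
    x = rng.choice((-1, 1))  # uniform start is already stationary
    seq = []
    for _ in range(n):
        seq.append(x)
        if rng.random() < flip_prob:
            x = -x
    return seq

def autocorrelation(seq, lag):
    """Empirical normalized autocorrelation of seq at the given lag."""
    n = len(seq) - lag
    mean = sum(seq) / len(seq)
    var = sum((v - mean) ** 2 for v in seq) / len(seq)
    cov = sum((seq[i] - mean) * (seq[i + lag] - mean) for i in range(n)) / n
    return cov / var

seq = markov_binary_sequence(200_000, flip_prob=0.1, seed=1)
print(round(autocorrelation(seq, 1), 2))  # close to the theoretical 1 - 2 * 0.1 = 0.8
```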

  2. Goldenhar sequence and mosaic trisomy 22

    SciTech Connect

    Pridjian, G.; Gill, W.L.; Shapira, E.

    1995-12-04

    We describe a term infant with facio-auriculo-vertebral "dysplasia" (Goldenhar sequence), hypertelorism, and mosaic trisomy 22: peripheral blood, 46, XY/47, XY,+22 (72%/28%); skin fibroblasts, 47, XY,+22(100%). This is the second report of Goldenhar anomaly with epibulbar dermoids in a live-born infant with aneuploidy. Hypertelorism is rare in Goldenhar sequence, but typical of trisomy 22. We recommend chromosome analysis in all patients with Goldenhar sequence. Those with hypertelorism may be more likely to have aneuploidy as well. 19 refs., 3 figs.

  3. Genomic sequence analysis tools: a user's guide.

    PubMed

    Fortna, A; Gardiner, K

    2001-03-01

    The wealth of information from various genome sequencing projects provides the biologist with a new perspective from which to analyze, and design experiments with, mammalian systems. The complexity of the information, however, requires new software tools, and numerous such tools are now available. Which type and which specific system is most effective depends, in part, upon how much sequence is to be analyzed and with what level of experimental support. Here we survey a number of mammalian genomic sequence analysis systems with respect to the data they provide and the ease of their use. The hope is to aid the experimental biologist in choosing the most appropriate tool for their analyses. PMID:11226611

  4. How Long is an Aftershock Sequence?

    NASA Astrophysics Data System (ADS)

    Godano, Cataldo; Tramelli, Anna

    2016-06-01

    The occurrence of a mainshock is always followed by aftershocks spatially distributed within the fault area. The decay of the aftershock rate with time is described by the empirical Omori law, which was inferred from catalogue analysis. Discriminating sequences within catalogues is not a straightforward operation, especially for low-magnitude mainshocks. Here, we describe the rate decay of the Omori law obtained using different sequence discrimination tools, and we find that, when the background seismicity is excluded, the sequences tend to last for the full temporal extent of the catalogue.
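    For reference, the Omori law is most often applied in its modified (Omori-Utsu) form, in which the aftershock rate n(t) at time t after the mainshock depends on a productivity constant K, a short-time offset c and a decay exponent p (typically near 1). Integrating the rate gives the expected cumulative number of aftershocks N(t). The abstract does not say which parameterization the authors fit, so this is offered only as background notation.

```latex
n(t) = \frac{K}{(t + c)^{p}}, \qquad
N(t) = \int_{0}^{t} n(s)\,ds =
\begin{cases}
\dfrac{K\left[c^{\,1-p} - (t + c)^{1-p}\right]}{p - 1}, & p \neq 1,\\[2ex]
K \ln\!\left(1 + \dfrac{t}{c}\right), & p = 1.
\end{cases}
```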

  5. Expressed sequence tags: analysis and annotation.

    PubMed

    Parkinson, John; Blaxter, Mark

    2004-01-01

    Expressed sequence tags (ESTs) present a special set of problems for bioinformatic analysis. They are partial and error-prone, and large datasets can have significant internal redundancy. To facilitate analysis of small EST datasets from in-house projects, we present an integrated "pipeline" of tools that take EST data from sequence trace to database submission. These tools also can be used to provide clustering of ESTs into putative genes and to annotate these genes with preliminary sequence similarity searches. The systems are written to use the public-domain LINUX environment and other openly available analytical tools. PMID:15153624

  6. Complete Nucleotide Sequence of Tn10

    PubMed Central

    Chalmers, Ronald; Sewitz, Sven; Lipkow, Karen; Crellin, Paul

    2000-01-01

    The complete nucleotide sequence of Tn10 has been determined. The dinucleotide signature and percent G+C of the sequence had no discontinuities, indicating that Tn10 constitutes a homogeneous unit. The new sequence contained three new open reading frames corresponding to a glutamate permease, repressors of heavy metal resistance operons, and a hypothetical protein in Bacillus subtilis. The glutamate permease was fully functional when expressed, but Tn10 did not protect Escherichia coli from the toxic effects of various metals. PMID:10781570
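    The abstract reports that percent G+C and the dinucleotide signature showed no discontinuities along Tn10. A minimal sketch of how such profiles can be computed follows. It assumes the common relative-abundance definition rho_XY = f_XY / (f_X * f_Y), computed here on one strand only (published genomic signatures usually symmetrize over both strands), and the window and step sizes are hypothetical choices; the function names are illustrative, not from the paper.

```python
from collections import Counter
from itertools import product

def gc_percent(seq):
    """Percent G+C of a sequence (all characters count toward the length)."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def windowed_gc(seq, window, step):
    """G+C profile in sliding windows; an abrupt jump hints at a discontinuity."""
    return [gc_percent(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]

def dinucleotide_relative_abundance(seq):
    """rho_XY = f_XY / (f_X * f_Y) for the 16 dinucleotides, single strand."""
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n1, n2 = len(seq), len(seq) - 1
    rho = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n1, mono[y] / n1
        if fx and fy:  # skip dinucleotides whose bases never occur
            rho[x + y] = (di[x + y] / n2) / (fx * fy)
    return rho

print(windowed_gc("GGGGAAAA", window=4, step=4))  # [100.0, 0.0]
```

    Comparing rho profiles from consecutive windows (for example with a summed absolute difference) is one way to look for the kind of discontinuity the abstract rules out.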

  7. Method for sequencing DNA base pairs

    DOEpatents

    Sessler, Andrew M.; Dawson, John

    1993-01-01

    The base pairs of a DNA structure are sequenced with the use of a scanning tunneling microscope (STM). The DNA structure is scanned by the STM probe tip, and, as it is being scanned, the DNA structure is separately subjected to a sequence of infrared radiation from four different sources, each source being selected to preferentially excite one of the four different bases in the DNA structure. Each particular base being scanned is subjected to such sequence of infrared radiation from the four different sources as that particular base is being scanned. The DNA structure as a whole is separately imaged for each subjection thereof to radiation from one only of each source.

  8. How Long is an Aftershock Sequence?

    NASA Astrophysics Data System (ADS)

    Godano, Cataldo; Tramelli, Anna

    2016-07-01

    The occurrence of a mainshock is always followed by aftershocks spatially distributed within the fault area. The decay of the aftershock rate with time is described by the empirical Omori law, which was inferred from catalogue analysis. Discriminating sequences within catalogues is not a straightforward operation, especially for low-magnitude mainshocks. Here, we describe the rate decay of the Omori law obtained using different sequence discrimination tools, and we find that, when the background seismicity is excluded, the sequences tend to last for the full temporal extent of the catalogue.

  9. Deep sequencing increases hepatitis C virus phylogenetic cluster detection compared to Sanger sequencing.

    PubMed

    Montoya, Vincent; Olmstead, Andrea; Tang, Patrick; Cook, Darrel; Janjua, Naveed; Grebely, Jason; Jacka, Brendan; Poon, Art F Y; Krajden, Mel

    2016-09-01

    Effective surveillance and treatment strategies are required to control the hepatitis C virus (HCV) epidemic. Phylogenetic analyses are powerful tools for reconstructing the evolutionary history of viral outbreaks and identifying transmission clusters. These studies often rely on Sanger sequencing which typically generates a single consensus sequence for each infected individual. For rapidly mutating viruses such as HCV, consensus sequencing underestimates the complexity of the viral quasispecies population and could therefore generate different phylogenetic tree topologies. Although deep sequencing provides a more detailed quasispecies characterization, in-depth phylogenetic analyses are challenging due to dataset complexity and computational limitations. Here, we apply deep sequencing to a characterized population to assess its ability to identify phylogenetic clusters compared with consensus Sanger sequencing. For deep sequencing, a sample specific threshold determined by the 50th percentile of the patristic distance distribution for all variants within each individual was used to identify clusters. Among seven patristic distance thresholds tested for the Sanger sequence phylogeny ranging from 0.005-0.06, a threshold of 0.03 was found to provide the maximum balance between positive agreement (samples in a cluster) and negative agreement (samples not in a cluster) relative to the deep sequencing dataset. From 77 HCV seroconverters, 10 individuals were identified in phylogenetic clusters using both methods. Deep sequencing analysis identified an additional 4 individuals and excluded 8 other individuals relative to Sanger sequencing. The application of this deep sequencing approach could be a more effective tool to understand onward HCV transmission dynamics compared with Sanger sequencing, since the incorporation of minority sequence variants improves the discrimination of phylogenetically linked clusters. PMID:27282472
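    A minimal sketch of the Sanger-side clustering step follows. It takes pairwise patristic distances (the sum of branch lengths on the tree path between two tips, assumed here to be precomputed from the phylogeny), links two individuals when their distance is at or below a threshold such as 0.03, and reports linked groups as clusters. Treating clusters as connected components (single linkage) is an assumption, since the abstract does not state the linkage rule; the function name and input format are hypothetical.

```python
from itertools import combinations

def threshold_clusters(names, dist, threshold):
    """Single-linkage clusters: link two samples if their distance <= threshold.

    names: list of sample IDs.
    dist: dict mapping frozenset({a, b}) -> patristic distance (precomputed).
    Returns clusters with at least two members, as a list of sets.
    """
    parent = {n: n for n in names}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted((g for g in groups.values() if len(g) > 1), key=sorted)
```

    With distances A-B 0.01, B-C 0.02 and A-C 0.05, a 0.03 threshold groups A, B and C together because single linkage chains through B; this chaining is one reason the choice of threshold changes cluster membership.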

  10. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

    PubMed Central

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-01-01

    Motivation: Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913
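    The core of the classifier is a weighted combination of similarity scores. The sketch below assumes each measure already yields a similarity in [0, 1] for every candidate class and takes the per-sequence weights as given; how the paper derives those weights from the training set is not reproduced. A k-mer cosine similarity is included as one example of an alignment-free measure. All names are hypothetical.

```python
from collections import Counter
from math import sqrt

def kmer_cosine(a, b, k=3):
    """Alignment-free similarity: cosine of the two k-mer count vectors."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    dot = sum(ca[w] * cb[w] for w in ca)  # missing k-mers count as 0
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def combined_scores(similarities, weights):
    """similarities: {measure: {label: score}}; weights: {measure: weight}.

    Returns the weight-normalized combined score for every label.
    Assumes every measure scores the same set of labels.
    """
    total = sum(weights.values())
    labels = next(iter(similarities.values())).keys()
    return {lab: sum(weights[m] * similarities[m][lab] for m in similarities) / total
            for lab in labels}

def classify(similarities, weights):
    """Pick the label with the highest combined score."""
    scores = combined_scores(similarities, weights)
    return max(scores, key=scores.get)

sims = {"alignment": {"virusA": 0.9, "virusB": 0.4},
        "kmer": {"virusA": 0.2, "virusB": 0.8}}
print(classify(sims, {"alignment": 0.8, "kmer": 0.2}))  # virusA
print(classify(sims, {"alignment": 0.3, "kmer": 0.7}))  # virusB
```

    The two calls show why per-sequence weights matter: the same similarity scores lead to different assignments depending on which measure is trusted more for that query.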

  11. Probabilistic Motor Sequence Yields Greater Offline and Less Online Learning than Fixed Sequence

    PubMed Central

    Du, Yue; Prashad, Shikha; Schoenbrun, Ilana; Clark, Jane E.

    2016-01-01

    It is well acknowledged that motor sequences can be learned quickly through online learning. Subsequently, the initial acquisition of a motor sequence is boosted or consolidated by offline learning. However, little is known about whether offline learning can drive the fast learning of motor sequences (i.e., initial sequence learning in the first training session). To examine offline learning in the fast learning stage, we asked four groups of young adults to perform the serial reaction time (SRT) task with either a fixed or probabilistic sequence and with or without preliminary knowledge (PK) of the presence of a sequence. The sequence and PK were manipulated to emphasize either procedural (probabilistic sequence; no preliminary knowledge (NPK)) or declarative (fixed sequence; with PK) memory, which have been found to facilitate or inhibit offline learning, respectively. In the SRT task, there were six learning blocks with a 2 min break between each consecutive block. Throughout the session, stimuli followed the same fixed or probabilistic pattern except in Block 5, in which stimuli appeared in a random order. We found that PK facilitated the learning of a fixed sequence, but not a probabilistic sequence. In addition to overall learning measured by the mean reaction time (RT), we examined the progressive changes in RT within and between blocks (i.e., online and offline learning, respectively). The two groups who performed the fixed sequence, regardless of PK, showed greater online learning than the other two groups who performed the probabilistic sequence. The groups who performed the probabilistic sequence, regardless of PK, did not display online learning, as indicated by a decline in performance within the learning blocks. However, they did demonstrate remarkably greater offline improvement in RT, which suggests that they were learning the probabilistic sequence offline. 
These results suggest that in the SRT task, the fast acquisition of a motor sequence is driven
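    The distinction between online and offline learning can be made concrete with a small calculation on per-trial reaction times. In the sketch below, online gain is the speed-up from the first to the last n trials within a block, and offline gain is the speed-up from the last n trials of one block to the first n trials of the next (across the break); positive values mean faster responses. The window size n and the function name are assumptions, since the abstract does not specify how the within- and between-block changes were computed.

```python
from statistics import mean

def online_offline(blocks, n=5):
    """blocks: list of per-trial RT lists (ms), one list per block, in order.

    Returns (online, offline):
      online[i]  = mean of first n RTs of block i - mean of last n RTs of block i
      offline[i] = mean of last n RTs of block i - mean of first n RTs of block i+1
    Positive values indicate speeding up (learning).
    """
    online = [mean(b[:n]) - mean(b[-n:]) for b in blocks]
    offline = [mean(prev[-n:]) - mean(nxt[:n])
               for prev, nxt in zip(blocks, blocks[1:])]
    return online, offline

blocks = [[500, 490, 480, 470], [460, 455, 450, 445]]
print(online_offline(blocks, n=2))  # ([20, 10.0], [17.5])
```

    A negative online value would correspond to the within-block decline reported for the probabilistic-sequence groups, while a positive offline value captures improvement across the break.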

  12. Using mobile sequencers in an academic classroom.

    PubMed

    Zaaijer, Sophie; Erlich, Yaniv

    2016-01-01

    The advent of mobile DNA sequencers has made it possible to generate DNA sequencing data outside of laboratories and genome centers. Here, we report our experience of using the MinION, a mobile sequencer, in a 13-week academic course for undergraduate and graduate students. The course consisted of theoretical sessions that presented fundamental topics in genomics and several applied hackathon sessions. In these hackathons, the students used MinION sequencers to generate and analyze their own data and gain hands-on experience in the topics discussed in the theoretical classes. The manuscript describes the structure of our class, the educational material, and the lessons we learned in the process. We hope that the knowledge and material presented here will provide the community with useful tools to help educate future generations of genome scientists. PMID:27054412

  13. Automated correction of genome sequence errors

    PubMed Central

    Gajer, Pawel; Schatz, Michael; Salzberg, Steven L.

    2004-01-01

    By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the overall accuracy of genome sequences and facilitates the use of these sequences for polymorphism discovery. We describe the algorithm and its application in a large set of recent genome sequencing projects. The number of erroneous base calls in these projects was reduced by 80%. In an analysis of over one million corrections, we found that AutoEditor made just one error per 8828 corrections. By substantially increasing the accuracy of base calling, AutoEditor can dramatically accelerate the process of finishing genomes, which involves closing all gaps and ensuring minimum quality standards for the final sequence. It also greatly improves our ability to discover single nucleotide polymorphisms (SNPs) between closely related strains and isolates of the same species. PMID:14744981

  14. Adaptive seeds tame genomic sequence comparison.

    PubMed

    Kiełbasa, Szymon M; Wan, Raymond; Sato, Kengo; Horton, Paul; Frith, Martin C

    2011-03-01

    The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition. PMID:21209072
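    The adaptive-seed rule can be illustrated with a deliberately naive sketch: starting at a query position, the seed is lengthened one base at a time until it occurs at most max_freq times in the reference. LAST implements this efficiently with suffix arrays; this version rescans the reference for every candidate length, so it does not have the running-time properties described in the abstract. Function names are hypothetical.

```python
def occurrences(text, pattern):
    """Start positions of all (possibly overlapping) occurrences of pattern."""
    hits, start = [], 0
    while (i := text.find(pattern, start)) >= 0:
        hits.append(i)
        start = i + 1
    return hits

def adaptive_seed(query, pos, reference, max_freq):
    """Shortest prefix of query[pos:] occurring at most max_freq times in reference.

    Returns (seed, hit_positions). Frequent (repetitive) words are extended
    until they become rare enough, so seed length adapts to composition.
    """
    for end in range(pos + 1, len(query) + 1):
        hits = occurrences(reference, query[pos:end])
        if len(hits) <= max_freq:
            return query[pos:end], hits
    # Whole remainder is still too frequent: return it with all its hits.
    return query[pos:], occurrences(reference, query[pos:])

ref = "AAAAAAAAACGTTTGCA"
print(adaptive_seed("AAAACGT", 0, ref, 1))  # ('AAAAC', [5])
print(adaptive_seed("AAAACGT", 4, ref, 1))  # ('CG', [9])
```

    The example shows the point of the method: in the A-rich region the seed grows to five bases before it becomes rare, while in the more varied region two bases suffice.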

  15. The genome sequence of Drosophila melanogaster.

    SciTech Connect

    2000-03-24

    The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the ~120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes ~13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

  16. Fibonacci Sequence and Supramolecular Structure of DNA.

    PubMed

    Shabalkin, I P; Grigor'eva, E Yu; Gudkova, M V; Shabalkin, P I

    2016-05-01

    We proposed a new model of supramolecular DNA structure. Like our previously developed model of primary DNA structure [11-15], the 3D structure of the DNA molecule is assembled according to a mathematical rule known as the Fibonacci sequence. Unlike the primary DNA structure, the supramolecular 3D structure is assembled from complex moieties, including a regular tetrahedron and a regular octahedron consisting of monomers, which are elements of the primary DNA structure. The moieties of the supramolecular DNA structure, forming fragments of a regular spatial lattice, are bound via linker (joint) sequences of the DNA chain. The lattice perceives and transmits information signals over a considerable distance without acoustic aberrations. Linker sequences expand the conformational space between lattice segments, allowing the segments to slide relative to each other under the action of external forces. In this case, sliding is provided by stretching of the stacked linker sequences. PMID:27265133

  17. "X"-tending the Fibonacci Sequence.

    ERIC Educational Resources Information Center

    Moran, Glenn T.

    2002-01-01

    Outlines a lesson on the Fibonacci and Lucas sequences that captures student interest by presenting the opportunity for computation practice, mental mathematics, and proof for algebra students. Discusses an extension for solving simultaneous equations. (YDS)
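    The abstract gives no details of the lesson itself. As a hedged illustration, the sketch below generates both sequences from the same recurrence with different starting values (Fibonacci: 0, 1; Lucas: 2, 1) and checks two standard identities that link them, L_n = F_(n-1) + F_(n+1) and F_(2n) = F_n * L_n, the kind of relationship students can verify numerically before proving it. The function name is hypothetical.

```python
def fib_and_lucas(count):
    """Return the first `count` Fibonacci and Lucas numbers (count >= 2)."""
    fib, luc = [0, 1], [2, 1]
    while len(fib) < count:
        fib.append(fib[-1] + fib[-2])  # same recurrence for both sequences
        luc.append(luc[-1] + luc[-2])
    return fib[:count], luc[:count]

F, L = fib_and_lucas(20)
# Each Lucas number is the sum of the two Fibonacci numbers flanking it.
assert all(L[n] == F[n - 1] + F[n + 1] for n in range(1, 19))
# Doubling formula linking the two sequences.
assert all(F[2 * n] == F[n] * L[n] for n in range(10))
print(F[:10])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(L[:10])  # [2, 1, 3, 4, 7, 11, 18, 29, 47, 76]
```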

  18. A Clinician's perspective on clinical exome sequencing.

    PubMed

    O'Donnell-Luria, Anne H; Miller, David T

    2016-06-01

    Clinical exome sequencing has clearly improved our ability as clinicians to identify the cause of a wide variety of disorders. Prior to exome sequencing, a majority of patients with apparent syndromes never received a specific molecular genetic diagnosis despite extensive diagnostic odysseys. Even for those receiving an answer to the question of what caused their disorder, the diagnostic odyssey often spanned years to decades. Determining the particular genetic cause in an individual patient can be challenging due to inherent phenotypic and genetic heterogeneity of disease, technical limitations of testing or both. Blended phenotypes, due to multiple monogenic disorders in the same patient, are true dilemmas for traditional genetic evaluations, but are increasingly being diagnosed through clinical exome sequencing. New sequencing technologies have increased the proportion of patients receiving molecular diagnoses, while significantly shortening the time scale, providing multiple benefits for the health-care team, patient and family. PMID:27126233

  19. Complete genome sequence of tobacco mosqueado virus.

    PubMed

    Blawid, Rosana; Rodrigues, Kelly Barreto; de Moraes Rêgo, Camila; Inoue-Nagata, Alice K; Nagata, Tatsuya

    2016-09-01

    We describe the genomic characteristics of a new potyvirus isolated from tobacco plants showing mottling ("mosqueado" in Portuguese) in southern Brazil. The complete genomic sequence consists of 9896 nucleotides, without the poly(A) tail, and shares the highest pairwise nucleotide sequence identities of 68.5 % with pepper yellow mosaic virus and 68.2 % with Brugmansia mosaic virus isolate D437. These identity values are below the level of 76.0 % used as a criterion for species demarcation in the genus Potyvirus based on the complete genome sequence. The viral genomic organization and sequence comparison thus suggest that this virus, tentatively named "tobacco mosqueado virus" (TMosqV), represents a new potyvirus species. PMID:27368991
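    The species call in the abstract reduces to comparing a pairwise nucleotide identity with the 76% demarcation criterion. A minimal sketch follows, assuming the two genomes are already aligned. It ignores columns where either sequence has a gap, which is only one of several conventions (some tools count gaps as mismatches), so identities computed this way need not match published values. Names are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identical columns in an alignment, skipping gap-containing columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)

def is_distinct_potyvirus_species(identity_pct, threshold=76.0):
    """Complete-genome nt identity below the threshold suggests a distinct species."""
    return identity_pct < threshold

print(percent_identity("ACGT-A", "ACCTTA"))    # 80.0
print(is_distinct_potyvirus_species(68.5))     # True
```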

  20. Using mobile sequencers in an academic classroom

    PubMed Central

    Zaaijer, Sophie; Erlich, Yaniv

    2016-01-01

    The advent of mobile DNA sequencers has made it possible to generate DNA sequencing data outside of laboratories and genome centers. Here, we report our experience of using the MinION, a mobile sequencer, in a 13-week academic course for undergraduate and graduate students. The course consisted of theoretical sessions that presented fundamental topics in genomics and several applied hackathon sessions. In these hackathons, the students used MinION sequencers to generate and analyze their own data and gain hands-on experience in the topics discussed in the theoretical classes. The manuscript describes the structure of our class, the educational material, and the lessons we learned in the process. We hope that the knowledge and material presented here will provide the community with useful tools to help educate future generations of genome scientists. DOI: http://dx.doi.org/10.7554/eLife.14258.001 PMID:27054412

  1. Genetics Home Reference: isolated lissencephaly sequence

    MedlinePlus

    ... lissencephaly sequence (ILS) is a condition that affects brain development before birth. Normally, the cells that make up ... the brain do not form. This impairment of brain development leads to the smooth brain appearance and the ...

  2. Multifunctional pulse sequence generator for pulse NMR

    NASA Astrophysics Data System (ADS)

    Wang, Dongsheng

    1988-06-01

    A new multifunctional pulse sequence generator has been designed and constructed. It can conveniently generate various pulse sequences used in nuclear magnetic resonance (NMR) to measure the spin-lattice relaxation time T1, the spin-spin relaxation time T2, and the spin-locking relaxation time T1ρ. It can also be used in pulse Fourier transform NMR and double resonance. The intervals of pulses can increase automatically with sequence repetitions, and the generator can be used in two-dimensional spectrum measurement and spin-density imaging research. The sequences can be generated through four different triggering methods, and there are two synchronous pulse outputs and fifteen auxiliary pulse outputs, so the generator can be conveniently interfaced with a computer or other instruments. The circuitry, functions, and features of the generator are described in this article.
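    The generator itself is hardware, but the timing logic it automates can be sketched in software. The example below assumes a 180°-τ-90° inversion-recovery sequence, one standard way to measure T1 (the abstract does not say which T1 sequence was used), and generates event times whose interval τ grows with each repetition. It also gives the ideal recovery curve commonly fitted to obtain T1, assuming full relaxation between repetitions. All names and parameters are hypothetical.

```python
import math

def inversion_recovery_schedule(tau0_us, dtau_us, repetitions, tr_us):
    """Timed pulse events for a 180-tau-90 inversion-recovery experiment.

    Times are integer microseconds; tau grows by dtau_us on every
    repetition, mimicking an automatically incremented pulse interval.
    """
    events = []
    for k in range(repetitions):
        start = k * tr_us                    # each repetition starts one TR later
        tau = tau0_us + k * dtau_us          # interval increases with repetition
        events.append((start, "180"))        # inverting pulse
        events.append((start + tau, "90"))   # read pulse after delay tau
    return events

def inversion_recovery_signal(tau_s, t1_s, m0=1.0):
    """Ideal signal M0 * (1 - 2 exp(-tau / T1)), assuming full recovery (TR >> T1)."""
    return m0 * (1.0 - 2.0 * math.exp(-tau_s / t1_s))

print(inversion_recovery_schedule(10_000, 10_000, 2, 5_000_000))
# [(0, '180'), (10000, '90'), (5000000, '180'), (5020000, '90')]
```

    For the ideal curve the signal crosses zero at τ = T1 ln 2, which gives a quick sanity check on measured data before a full fit.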

  3. Quadruplex DNA: sequence, topology and structure

    PubMed Central

    Burge, Sarah; Parkinson, Gary N.; Hazel, Pascale; Todd, Alan K.; Neidle, Stephen

    2006-01-01

    G-quadruplexes are higher-order DNA and RNA structures formed from G-rich sequences that are built around tetrads of hydrogen-bonded guanine bases. Potential quadruplex sequences have been identified in G-rich eukaryotic telomeres, and more recently in non-telomeric genomic DNA, e.g. in nuclease-hypersensitive promoter regions. The natural role and biological validation of these structures is starting to be explored, and there is particular interest in them as targets for therapeutic intervention. This survey focuses on the folding and structural features of quadruplexes formed from telomeric and non-telomeric DNA sequences, and examines fundamental aspects of topology and the emerging relationships with sequence. Emphasis is placed on information from the high-resolution methods of X-ray crystallography and NMR, and their scope and current limitations are discussed. Such information, together with biological insights, will be important for the discovery of drugs targeting quadruplexes from particular genes. PMID:17012276
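    Potential quadruplex sequences are often located computationally with a simple pattern: four runs of three or more guanines separated by loops of one to seven nucleotides. The sketch below uses that widely cited pattern, which comes from the broader G4 literature rather than from this survey. C-rich matches on the given strand are reported as G-rich motifs on the complementary strand. It finds non-overlapping matches only, and the function name is hypothetical.

```python
import re

# Four G-runs (>= 3 G) separated by loops of 1-7 nucleotides, and the
# equivalent C-rich pattern, which marks a G-rich motif on the other strand.
PQS_G = re.compile(r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}")
PQS_C = re.compile(r"C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}")

def find_pqs(seq):
    """Return sorted (start, end, strand) tuples for potential quadruplex sequences."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in PQS_G.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in PQS_C.finditer(seq)]
    return sorted(hits)

print(find_pqs("TTAGGGTTAGGGTTAGGGTTAGGG"))  # [(3, 24, '+')]
```

    The example input is four copies of the vertebrate telomeric repeat TTAGGG, the classic telomeric quadruplex-forming sequence mentioned in the abstract.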

  4. Evolutionary sequences for horizontal branch stars

    NASA Technical Reports Server (NTRS)

    Sweigart, Allen V.

    1987-01-01

    A new grid of canonical evolutionary horizontal branch (HB) sequences is presented. Sequences are computed for each combination of the following helium and heavy-element abundances, respectively: Y(main sequence) = 0.20, 0.25, 0.30, and Z = 0.0001, 0.001, and 0.01. The results show that the bifurcation point at which the HB morphology changes from redward-evolving tracks to tracks with blueward loops shifts to higher effective temperatures with increasing helium abundance or metallicity. The sequences can be used to study in more detail how a number of HB properties such as the HB lifetime, the effective temperature at the bifurcation point in the track morphology, the luminosity dropoff of the blue HB, and the luminosity width of the red HB depend on the composition.

  5. Mutations in the K+ channel signature sequence.

    PubMed Central

    Heginbotham, L; Lu, Z; Abramson, T; MacKinnon, R

    1994-01-01

    Potassium channels share a highly conserved stretch of eight amino acids, a K+ channel signature sequence. The conserved sequence falls within the previously defined P-region of voltage-activated K+ channels. In this study we investigate the effect of mutations in the signature sequence of the Shaker channel on K+ selectivity determined under bi-ionic conditions. Nonconservative substitutions of two threonine residues and the tyrosine residue leave selectivity intact. In contrast, mutations at some positions render the channel nonselective among monovalent cations. These findings are consistent with a proposal that the signature sequence contributes to a selectivity filter. Furthermore, the results illustrate that the hydroxyl groups at the third and fourth positions, and the aromatic group at position seven, are not essential in determining K+ selectivity. PMID:8038378

  6. Compilation of DNA sequences of Escherichia coli

    PubMed Central

    Kröger, Manfred

    1989-01-01

    We have compiled the DNA sequence data for E. coli K12 available from the GenBank and EMBL databases and, over a period of several years, independently from the literature. We have introduced all available genetic map data and have arranged the sequences accordingly. As far as possible, overlaps have been removed, giving a total of 940,449 individual bp determined as of the beginning of 1989. This corresponds to 19.92% of the entire E. coli chromosome of about 4,720 kbp. The figure may actually be about 2% higher if the sequence of the lysogenic bacteriophage lambda and the various insertion sequences are included. This compilation may become available in machine-readable form from one of the international databanks in the future. PMID:2654890
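    Removing overlaps before totalling sequenced base pairs is an interval-union problem. A minimal sketch follows, assuming each sequenced stretch has been placed on the chromosome map as a half-open [start, end) coordinate interval; the compilation's actual coordinates are not given here, so the example intervals are hypothetical. The last line reproduces the percentage quoted in the abstract.

```python
def covered_bp(intervals):
    """Total length covered by half-open [start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start   # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)        # overlapping or adjacent: extend
    if cur_end is not None:
        total += cur_end - cur_start
    return total

def percent_of_genome(determined_bp, genome_bp):
    return 100.0 * determined_bp / genome_bp

print(covered_bp([(0, 100), (50, 150), (200, 250)]))    # 200
print(round(percent_of_genome(940_449, 4_720_000), 2))  # 19.92
```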

  7. Genetics Home Reference: isolated Pierre Robin sequence

    MedlinePlus

    ... of isolated Pierre Robin sequence: Boston Children's Hospital: Cleft Lip and Cleft Palate Treatment and Care Genetic Testing ... 7 links) Centers for Disease Control: Facts About Cleft Lip and Cleft Palate Children's Craniofacial Association: A Guide ...

  8. Sequence finishing and mapping of Drosophila melanogaster heterochromatin

    SciTech Connect

    Hoskins, Roger A.; Carlson, Joseph W.; Kennedy, Cameron; Acevedo, David; Evans-Holm, Martha; Frise, Erwin; Wan, Kenneth H.; Park, Soo; Mendez-Lago, Maria; Rossi, Fabrizio; Villasante, Alfredo; Dimitri, Patrizio; Karpen, Gary H.; Celniker, Susan E.

    2007-06-15

    Genome sequences for most metazoans are incomplete due to the presence of repeated DNA in the pericentromeric heterochromatin. The heterochromatic regions of D. melanogaster contain 20 Mb of sequence amenable to mapping, sequence assembly and finishing. Here we describe the generation of 15 Mb of finished or improved heterochromatic sequence using available clone resources and assembly and mapping methods. We also constructed a BAC-based physical map that spans approximately 13 Mb of the pericentromeric heterochromatin, and a cytogenetic map that positions approximately 11 Mb of BAC contigs and sequence scaffolds in specific chromosomal locations. The integrated sequence assembly and maps greatly improve our understanding of the structure and composition of this poorly understood fraction of a metazoan genome and provide a framework for functional analyses.

  9. CLaMS: Classifier for Metagenomic Sequences

    Energy Science and Technology Software Center (ESTSC)

    2010-12-01

    CLaMS ("Classifier for Metagenomic Sequences") is a Java application for binning assembled metagenomes using user-specified training sequence sets and other user-specified initial parameters. Because CLaMS analyzes and matches sequence-composition-based genomic signatures, it is much faster than binning tools that rely on alignments to homologs; CLaMS can bin ~20,000 sequences in 3 minutes on a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM. CLaMS is meant to be a desktop application for biologists and can be run on any machine under any operating system on which the Java Runtime Environment is enabled. CLaMS is freely available in both GUI-based and command-line-based forms.
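    Composition-based binning of this kind can be sketched as follows: each sequence is reduced to a normalized k-mer frequency vector (a genomic signature), and a query is assigned to the training set whose mean signature is closest. This is a generic illustration, not CLaMS's code; the choice of k, the Manhattan distance and the centroid comparison are assumptions and need not match the signature distance CLaMS actually uses. Names are hypothetical.

```python
from collections import Counter
from itertools import product

def signature(seq, k=4):
    """Normalized frequencies of all 4**k DNA k-mers (non-ACGT k-mers ignored)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in kmers) or 1  # avoid division by zero
    return [counts[w] / total for w in kmers]

def bin_sequence(seq, training, k=4):
    """Assign seq to the training bin with the nearest mean signature (Manhattan distance)."""
    query = signature(seq, k)
    best_name, best_dist = None, float("inf")
    for name, seqs in training.items():
        sigs = [signature(s, k) for s in seqs]
        centroid = [sum(col) / len(sigs) for col in zip(*sigs)]
        dist = sum(abs(q - c) for q, c in zip(query, centroid))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

training = {"AT_rich": ["ATATATTAATATAT", "TTATAATATTAT"],
            "GC_rich": ["GCGCGGCCGCGC", "CGGCGCCGCG"]}
print(bin_sequence("ATTATATAAT", training, k=2))  # AT_rich
```

    No alignment is computed at any point, which is the property the abstract credits for the speed of composition-based binning.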

  10. A disruptive sequencer meets disruptive publishing

    PubMed Central

    Loman, Nick; Goodwin, Sarah; Jansen, Hans; Loose, Matt

    2015-01-01

    Nanopore sequencing was recently made available to users in the form of the Oxford Nanopore MinION. Released to users through an early access programme, the MinION is made unique by its tiny form factor and ability to generate very long sequences from single DNA molecules. The platform is undergoing rapid evolution with three distinct nanopore types and five updates to library preparation chemistry in the last 18 months. To keep pace with the rapid evolution of this sequencing platform, and to provide a space where new analysis methods can be openly discussed, we present a new F1000Research channel devoted to updates to and analysis of nanopore sequence data. PMID:26998227

  11. The Value of DNA Sequencing - TCGA

    Cancer.gov

    DNA sequencing: what it tells us about DNA changes in cancer, how looking across many tumors will help to identify meaningful changes and potential drug targets, and how genomics is changing the way we think about cancer.

  12. Exome Sequencing in Parkinson’s disease

    PubMed Central

    Bras, Jose M; Singleton, Andrew B

    2011-01-01

    Exome sequencing is rapidly becoming a fundamental tool for genetics and functional genomics laboratories. This methodology has enabled the discovery of novel pathogenic mutations causing Mendelian diseases that had, until now, remained elusive. In this review we discuss not only how we envisage exome sequencing being applied to a complex disease, such as Parkinson’s disease, but also the known caveats of this approach. PMID:21651510

  13. Transcriptional profiling of Dictyostelium with RNA sequencing

    PubMed Central

    Miranda, Edward Roshan; Rot, Gregor; Toplak, Marko; Santhanam, Balaji; Curk, Tomaz; Shaulsky, Gad; Zupan, Blaz

    2014-01-01

    Summary Transcriptional profiling methods have been utilized in the analysis of various biological processes in Dictyostelium. Recent advances in high-throughput sequencing have increased the resolution and the dynamic range of transcriptional profiling. Here we describe the utility of RNA-sequencing with the Illumina technology for production of transcriptional profiles. We also describe methods for data mapping and storage as well as common and specialized tools for data analysis, both online and offline. PMID:23494306

  14. Proteomics-grade de novo sequencing approach.

    PubMed

    Savitski, Mikhail M; Nielsen, Michael L; Kjeldsen, Frank; Zubarev, Roman A

    2005-01-01

    The conventional approach in modern proteomics to identify proteins from limited information provided by molecular and fragment masses of their enzymatic degradation products carries an inherent risk of both false positive and false negative identifications. For reliable identification of even known proteins, complete de novo sequencing of their peptides is desired. The main problems of conventional sequencing based on tandem mass spectrometry are incomplete backbone fragmentation and the frequent overlap of fragment masses. In this work, the first proteomics-grade de novo approach is presented, where the above problems are alleviated by the use of the complementary fragmentation techniques collision-activated dissociation (CAD) and electron capture dissociation (ECD). Implementation of a high-current, large-area dispenser cathode as a source of low-energy electrons provided efficient ECD of doubly charged peptides, the most abundant species (65-80%) in a typical trypsin-based proteomics experiment. A new linear de novo algorithm that combines efficiency and speed is also developed; on a conventional 3 GHz PC it processes 1000 MS/MS data sets in 60 s. More than 6% of all MS/MS data for doubly charged peptides yielded complete sequences, and another 13% gave nearly complete sequences with a maximum gap of two amino acid residues. These figures are comparable with the typical success rates (5-15%) of database identification. For peptides reliably found in the database (Mowse score ≥ 34), the agreement with de novo-derived full sequences was >95%. Full sequences were derived in 67% of the cases when full sequence information was present in MS/MS spectra. Thus the new de novo sequencing approach reached the same level of efficiency and reliability as conventional database-identification strategies. PMID:16335984
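The basic idea behind de novo MS/MS sequencing is that consecutive fragment ions in a ladder differ by exactly one residue mass, so the sequence can be read from mass differences; a missing fragment leaves a two-residue gap, like the "maximum gap of two amino acid residues" mentioned above. The Python sketch below reads a single singly charged b-ion ladder under simplifying assumptions (monoisotopic masses, a fixed 0.01 Da tolerance, "L" standing for leucine or isoleucine). It is not the authors' CAD/ECD algorithm, which combines complementary ion series, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da). "L" stands for both leucine and isoleucine,
# which are isobaric and cannot be told apart by mass alone.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of the proton carried by a singly charged b-type ion


def b_ladder(peptide):
    """Singly charged prefix (b-type) masses b1..bn.

    bn is included for simplicity; in practice it would be inferred from the precursor mass.
    """
    masses, total = [], PROTON
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        masses.append(round(total, 5))
    return masses


def read_ladder(ladder, tol=0.01):
    """Read a sequence from consecutive mass differences in a prefix ladder.

    One-residue gaps become letters; a two-residue gap (missing fragment) becomes a
    bracketed list of candidate compositions, because the order inside it is unknown.
    """
    seq, prev = [], PROTON
    for mass in ladder:
        gap = mass - prev
        singles = [aa for aa, w in RESIDUE_MASS.items() if abs(w - gap) <= tol]
        if singles:
            seq.append("/".join(singles))
        else:
            pairs = [a + b for a, b in combinations_with_replacement(sorted(RESIDUE_MASS), 2)
                     if abs(RESIDUE_MASS[a] + RESIDUE_MASS[b] - gap) <= tol]
            seq.append("[" + "|".join(pairs) + "]" if pairs else "?")
        prev = mass
    return "".join(seq)
```

Dropping the first fragment (`read_ladder(b_ladder("SAMPLER")[1:])`) makes the reader report `[AS|GT]MPLER`: the missing cleavage leaves a gap that matches two isobaric compositions. Because single-residue matches are tried first, a GG gap would be misread as N (the masses agree to about 10^-5 Da), an ambiguity that a second, complementary ion series can help resolve.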

  15. Reporting Differences Between Spacecraft Sequence Files

    NASA Technical Reports Server (NTRS)

    Khanampornpan, Teerapat; Gladden, Roy E.; Fisher, Forest W.

    2010-01-01

    A suite of computer programs, called seq diff suite, reports differences between the products of other computer programs involved in the generation of sequences of commands for spacecraft. These products consist of files of several types: replacement sequence of events (RSOE), DSN keyword file [DKF (wherein DSN signifies Deep Space Network)], spacecraft activities sequence file (SASF), spacecraft sequence file (SSF), and station allocation file (SAF). These products can include line numbers, request identifications, and other information that is not relevant when generating command sequence products; such fields can produce many apparent changes between files, particularly when the files are compared with the UNIX diff command. The outputs of prior software tools for reporting differences between such products include differences in these irrelevant fields. In contrast, seq diff suite removes the fields containing the irrelevant information before extracting differences, so that only relevant differences are reported. Thus, seq diff suite is especially useful for reporting changes between successive versions of the various products and, in particular, for flagging differences in fields relevant to the sequence command generation and review process.
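The central step of seq diff suite is to remove fields that are irrelevant to command generation before computing differences. The Python sketch below shows that step with the standard-library `difflib` module. The field patterns (a leading line number and a `REQUEST_ID=...` token) are hypothetical stand-ins, as are the function names; the actual RSOE, DKF, SASF, SSF and SAF layouts are not described here.

```python
import difflib
import re

# Hypothetical patterns for fields that change between versions but are not
# relevant to command generation.
IRRELEVANT = [
    re.compile(r"^\s*\d+\s+"),           # leading line number, e.g. "0042 "
    re.compile(r"\bREQUEST_ID=\S+\s*"),  # hypothetical request-identification field
]


def normalize(line):
    """Strip irrelevant fields and trailing whitespace from one line."""
    for pattern in IRRELEVANT:
        line = pattern.sub("", line)
    return line.rstrip()


def relevant_diff(old_lines, new_lines):
    """Unified diff computed only after the irrelevant fields are removed."""
    old = [normalize(line) for line in old_lines]
    new = [normalize(line) for line in new_lines]
    return list(difflib.unified_diff(old, new, "old", "new", lineterm=""))


old = ["0001 REQUEST_ID=A17 CMD_POWER_ON", "0002 REQUEST_ID=A18 CMD_SLEW 10"]
renumbered = ["0005 REQUEST_ID=B02 CMD_POWER_ON", "0006 REQUEST_ID=B03 CMD_SLEW 10"]
edited = ["0005 REQUEST_ID=B02 CMD_POWER_ON", "0006 REQUEST_ID=B03 CMD_SLEW 12"]
```

Here `relevant_diff(old, renumbered)` is empty even though every raw line differs, while `relevant_diff(old, edited)` reports only the changed slew argument. Normalizing before diffing (rather than filtering the diff output afterwards) keeps the diff algorithm itself unchanged.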

  16. Polynomials Generated by the Fibonacci Sequence

    NASA Astrophysics Data System (ADS)

    Garth, David; Mills, Donald; Mitchell, Patrick

    2007-06-01

    The Fibonacci sequence's initial terms are $F_0 = 0$ and $F_1 = 1$, with $F_n = F_{n-1} + F_{n-2}$ for $n \ge 2$. We define the polynomial sequence $p$ by setting $p_0(x) = 1$ and $p_n(x) = x\,p_{n-1}(x) + F_{n+1}$ for $n \ge 1$, so that $p_n(x) = \sum_{k=0}^{n} F_{k+1} x^{n-k}$. We call $p_n(x)$ the Fibonacci-coefficient polynomial (FCP) of order $n$. The FCP sequence is distinct from the well-known Fibonacci polynomial sequence. We answer several questions regarding these polynomials. Specifically, we show that each even-degree FCP has no real zeros, while each odd-degree FCP has a unique, and (for degree at least 3) irrational, real zero. Further, we show that this sequence of unique real zeros converges monotonically to the negative of the golden ratio. Using Rouché's theorem, we prove that the zeros of the FCPs approach the golden ratio in modulus. We also prove a general result that gives the Mahler measures of an infinite subsequence of the FCP sequence whose coefficients are reduced modulo an integer $m \ge 2$. We then apply this to the case $m = L_n$, the $n$th Lucas number, showing that the Mahler measure of the subsequence is $\phi^{n-1}$, where $\phi = (1 + \sqrt{5})/2$.
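The recurrence $p_n(x) = x\,p_{n-1}(x) + F_{n+1}$ is exactly Horner's rule applied to the coefficient list $F_1, F_2, \dots, F_{n+1}$, which gives a direct way to evaluate the FCPs numerically. The Python sketch below uses that to locate the real zero of an odd-degree FCP by bisection on $[-2, -1]$; one can check that $p_n(-1) = F_n - 1 \ge 0$ and $p_n(-2) < 0$ for odd $n$, so the bracket is valid. It only illustrates the trend numerically (the paper's proofs are what establish it), and the function names are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def fib(n):
    """F_n with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fcp_coeffs(n):
    """Coefficients of p_n, highest degree first: F_1, F_2, ..., F_{n+1}."""
    return [fib(k + 1) for k in range(n + 1)]


def fcp_eval(n, x):
    """Evaluate p_n(x) by Horner's rule; each step computes p_k = x * p_{k-1} + F_{k+1}."""
    acc = 0.0
    for c in fcp_coeffs(n):
        acc = acc * x + c
    return acc


def odd_fcp_real_zero(n, lo=-2.0, hi=-1.0, iters=100):
    """Bisection for the real zero of an odd-degree FCP, with p_n(lo) < 0 <= p_n(hi)."""
    if n % 2 == 0:
        raise ValueError("even-degree FCPs have no real zeros")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if fcp_eval(n, mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

For n = 1, 3, 5, ... the zeros start at -1, then about -1.276, and creep down toward -φ ≈ -1.618 from above. Convergence is slow: even at n = 41 the zero is still a few hundredths above -φ.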

  17. SEQOPTICS: a protein sequence clustering system

    PubMed Central

    Chen, Yonghui; Reilly, Kevin D; Sprague, Alan P; Guan, Zhijie

    2006-01-01

    Background Protein sequence clustering has been widely used as a part of the analysis of protein structure and function. In most cases single linkage or graph-based clustering algorithms have been applied. OPTICS (Ordering Points To Identify the Clustering Structure) is an attractive approach due to its emphasis on visualization of results and support for interactive work, e.g., in choosing parameters. However, as far as we know, OPTICS has not been used for protein sequence clustering. Results In this paper, a protein clustering system, SEQOPTICS (SEQuence clustering with OPTICS), is demonstrated. The system uses Smith-Waterman as the protein distance measure and OPTICS at its core to perform protein sequence clustering. SEQOPTICS is tested with four data sets from different data sources. Visualization of the sequence clustering structure is demonstrated as well. Conclusion The system was evaluated by comparison with other existing methods. Analysis of the results demonstrates that SEQOPTICS performs better on some evaluation criteria, including Jaccard coefficient, precision, and recall. It is a promising protein sequence clustering method, with possible future improvements through parallel computing and other protein distance measures. PMID:17217502
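SEQOPTICS uses the Smith-Waterman local alignment score as its protein distance measure. The Python sketch below computes a Smith-Waterman score with simple match/mismatch scoring and a linear gap penalty, then turns it into a dissimilarity in [0, 1] by normalizing against the smaller self-alignment score. The scoring values, the normalization and the function names are assumptions for illustration only. Real protein work would normally use a substitution matrix such as BLOSUM62 and affine gaps, and the OPTICS ordering step itself is not shown.

```python
def smith_waterman(a, b, match=3, mismatch=-3, gap=2):
    """Best local alignment score (no traceback) with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            cur[j] = max(0, diag, prev[j] - gap, cur[j - 1] - gap)
            best = max(best, cur[j])
        prev = cur
    return best


def sw_distance(a, b, **scoring):
    """Hypothetical normalization of the SW score into a dissimilarity in [0, 1]."""
    s_ab = smith_waterman(a, b, **scoring)
    s_self = min(smith_waterman(a, a, **scoring), smith_waterman(b, b, **scoring))
    return 1.0 - s_ab / s_self if s_self > 0 else 1.0
```

With uniform match scoring, no pairwise score can exceed the smaller self-score, so the distance stays in [0, 1]. A full pairwise matrix of such distances is what a density-based method like OPTICS would then consume.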

  18. Variations on strongly lacunary quasi Cauchy sequences

    NASA Astrophysics Data System (ADS)

    Kaplan, Huseyin; Cakalli, Huseyin

    2016-08-01

    We introduce a new function space, namely the space of $N_\theta(p)$-ward continuous functions, which turns out to be a closed subspace of the space of continuous functions for each positive integer $p$. $N_\theta^\alpha(p)$-ward continuity is also introduced and investigated for any fixed $0 < \alpha \le 1$ and any fixed positive integer $p$. A real-valued function $f$ defined on a subset $A$ of $\mathbb{R}$, the set of real numbers, is $N_\theta^\alpha(p)$-ward continuous if it preserves $N_\theta^\alpha(p)$-quasi-Cauchy sequences, i.e. $(f(x_n))$ is an $N_\theta^\alpha(p)$-quasi-Cauchy sequence whenever $(x_n)$ is an $N_\theta^\alpha(p)$-quasi-Cauchy sequence of points in $A$, where a sequence $(x_k)$ of points in $\mathbb{R}$ is called $N_\theta^\alpha(p)$-quasi-Cauchy if $\lim_{r \to \infty} \frac{1}{h_r^\alpha} \sum_{k \in I_r} |\Delta x_k|^p = 0$, where $\Delta x_k = x_{k+1} - x_k$ for each positive integer $k$, $p$ is a fixed positive integer, $\alpha$ is fixed in $]0, 1]$, $I_r = (k_{r-1}, k_r]$, and $\theta = (k_r)$ is a lacunary sequence, i.e. an increasing sequence of positive integers such that $k_0 = 0$ and $h_r = k_r - k_{r-1} \to \infty$.
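To make the definition concrete, the Python sketch below evaluates the averaged quantity inside the limit, $\frac{1}{h_r^\alpha}\sum_{k \in I_r} |\Delta x_k|^p$, for the hypothetical lacunary sequence $k_0 = 0$, $k_r = 2^r$ ($r \ge 1$). With $\alpha = p = 1$, the averages for $x_k = \sqrt{k}$ shrink as $r$ grows, while for $x_k = k$ they stay at exactly 1. A finite computation like this only suggests the limiting behavior; it is not a proof, and the function names are illustrative.

```python
import math


def theta(r):
    """Hypothetical lacunary sequence: k_0 = 0 and k_r = 2**r for r >= 1 (h_r -> infinity)."""
    return 0 if r == 0 else 2 ** r


def n_theta_alpha_p_mean(x, r, alpha=1.0, p=1):
    """(1 / h_r**alpha) * sum over k in I_r = (k_{r-1}, k_r] of |x(k+1) - x(k)|**p, for r >= 1."""
    lo, hi = theta(r - 1), theta(r)
    h_r = hi - lo
    total = sum(abs(x(k + 1) - x(k)) ** p for k in range(lo + 1, hi + 1))
    return total / h_r ** alpha


sqrt_means = [n_theta_alpha_p_mean(math.sqrt, r) for r in range(2, 17)]
identity_mean = n_theta_alpha_p_mean(lambda k: float(k), 10)
```

For $x_k = \sqrt{k}$ the sum telescopes to roughly $\sqrt{2^r} - \sqrt{2^{r-1}}$, so the average decays like $2^{-(r-1)/2}$; for $x_k = k$ every difference is 1 and the average equals $h_r^{1-\alpha}$, which is 1 when $\alpha = 1$.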

  19. Unlocking Short Read Sequencing for Metagenomics

    PubMed Central

    Timberlake, Sonia C.; Blackburn, Matthew C.; Malmstrom, Rex R.; Alm, Eric J.; Chisholm, Sallie W.

    2010-01-01

    Background Different high-throughput nucleic acid sequencing platforms are currently available, but a trade-off exists between the cost and number of reads that can be generated and the read length that can be achieved. Methodology/Principal Findings We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching those of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read. Conclusions/Significance This strategy is broadly applicable to sequencing applications that benefit from low-cost high-throughput sequencing but require longer read lengths. We demonstrate that our approach enables metagenomic analyses using the Illumina Genome Analyzer, with low error rates, and at a fraction of the cost of pyrosequencing. PMID:20676378
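With appropriately sized inserts, the two mates of a pair overlap, and joining them yields one longer composite read. The Python sketch below shows the basic operation: reverse-complement the second mate, take the longest suffix/prefix overlap (at least 10 bases) with no more than 10% mismatches, and concatenate. The thresholds and function names are hypothetical, and this is not SHERA's implementation. In the overlap it simply keeps the first mate's bases, whereas a practical joiner would typically also weigh base-quality scores.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Join two mates into one composite read, or return None if they do not overlap.

    read2 is given as sequenced (from the opposite strand), so it is reverse-complemented
    first. The longest acceptable overlap is tried first.
    """
    r2 = revcomp(read2)
    for olen in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail, head = read1[-olen:], r2[:olen]
        mismatches = sum(x != y for x, y in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            # Keep read1 through the overlap, then append the non-overlapping rest of mate 2.
            return read1 + r2[olen:]
    return None
```

For a 30-base insert read as two 20-base mates, the mates share 10 bases and `merge_pair` recovers the full insert. Trying long overlaps first favors the most-supported join, but repetitive sequence can still produce spurious overlaps, which is why the minimum overlap and mismatch limit matter.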

  20. A neurocomputational model of automatic sequence production.

    PubMed

    Helie, Sebastien; Roeder, Jessica L; Vucovich, Lauren; Rünger, Dennis; Ashby, F Gregory

    2015-07-01

    Most behaviors unfold in time and include a sequence of submovements or cognitive activities. In addition, most behaviors are automatic and repeated daily throughout life. Yet relatively little is known about the neurobiology of automatic sequence production. Past research suggests a gradual transfer from the associative striatum to the sensorimotor striatum, but a number of more recent studies challenge this role of the basal ganglia (BG) in automatic sequence production. In this article, we propose a new neurocomputational model of automatic sequence production in which the main role of the BG is to train cortico-cortical connections within the premotor areas that are responsible for automatic sequence production. The new model is used to simulate four different data sets from human and nonhuman animals, including (1) behavioral data (e.g., reaction times [RTs]), (2) electrophysiology data (e.g., single-neuron recordings), (3) macrostructure data (e.g., transcranial magnetic stimulation [TMS]), and (4) neurological circuit data (e.g., inactivation studies). We conclude with a comparison of the new model with existing models of automatic sequence production and discuss a possible new role for the BG in automaticity and its implications for Parkinson's disease. PMID:25671503