Deroost, Natacha; Coomans, Daphné
2018-02-01
We examined the role of sequence awareness in a pure perceptual sequence learning design. Participants had to react to the target's colour that changed according to a perceptual sequence. By varying the mapping of the target's colour onto the response keys, motor responses changed randomly. The effect of sequence awareness on perceptual sequence learning was determined by manipulating the learning instructions (explicit versus implicit) and assessing the amount of sequence awareness after the experiment. In the explicit instruction condition (n = 15), participants were instructed to intentionally search for the colour sequence, whereas in the implicit instruction condition (n = 15), they were left uninformed about the sequenced nature of the task. Sequence awareness after the sequence learning task was tested by means of a questionnaire and the process-dissociation-procedure. The results showed that the instruction manipulation had no effect on the amount of perceptual sequence learning. Based on their report to have actively applied their sequence knowledge during the experiment, participants were subsequently regrouped in a sequence strategy group (n = 14, of which 4 participants from the implicit instruction condition and 10 participants from the explicit instruction condition) and a no-sequence strategy group (n = 16, of which 11 participants from the implicit instruction condition and 5 participants from the explicit instruction condition). Only participants of the sequence strategy group showed reliable perceptual sequence learning and sequence awareness. These results indicate that perceptual sequence learning depends upon the continuous employment of strategic cognitive control processes on sequence knowledge. Sequence awareness is suggested to be a necessary but not sufficient condition for perceptual learning to take place. Copyright © 2018 Elsevier B.V. All rights reserved.
Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide
2000-01-01
The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can be revealed by seven RISA systems within one month. PMID:11076861
Synchronized excitability in a network enables generation of internal neuronal sequences
Wang, Yingxue; Roth, Zachary; Pastalkova, Eva
2016-01-01
Hippocampal place field sequences are supported by sensory cues and network internal mechanisms. In contrast, sharp-wave (SPW) sequences, theta sequences, and episode field sequences are internally generated. The relationship of these sequences to memory is unclear. SPW sequences have been shown to support learning and have been assumed to also support episodic memory. Conversely, we demonstrate these SPW sequences were present in trained rats even after episodic memory was impaired and after other internal sequences – episode field and theta sequences – were eliminated. SPW sequences did not support memory despite continuing to ‘replay’ all task-related sequences – place- field and episode field sequences. Sequence replay occurred selectively during synchronous increases of population excitability -- SPWs. Similarly, theta sequences depended on the presence of repeated synchronized waves of excitability – theta oscillations. Thus, we suggest that either intermittent or rhythmic synchronized changes of excitability trigger sequential firing of neurons, which in turn supports learning and/or memory. DOI: http://dx.doi.org/10.7554/eLife.20697.001 PMID:27677848
Method for identifying and quantifying nucleic acid sequence aberrations
Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.
1998-01-01
A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.
Method for identifying and quantifying nucleic acid sequence aberrations
Lucas, J.N.; Straume, T.; Bogen, K.T.
1998-07-21
A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.
Sequencing artifacts in the type A influenza databases and attempts to correct them.
Suarez, David L; Chester, Nikki; Hatfield, Jason
2014-07-01
There are over 276 000 influenza gene sequences in public databases, with the quality of the sequences determined by the contributor. As part of a high school class project, influenza sequences with possible errors were identified in the public databases based on the size of the gene being longer than expected, with the hypothesis that these sequences would have an error. Students contacted sequence submitters alerting them of the possible sequence issue(s) and requested they the suspect sequence(s) be correct as appropriate. Type A influenza viruses were screened, and gene segments longer than the accepted size were identified for further analysis. Attention was placed on sequences with additional nucleotides upstream or downstream of the highly conserved non-coding ends of the viral segments. A total of 1081 sequences were identified that met this criterion. Three types of errors were commonly observed: non-influenza primer sequence wasn't removed from the sequence; PCR product was cloned and plasmid sequence was included in the sequence; and Taq polymerase added an adenine at the end of the PCR product. Internal insertions of nucleotide sequence were also commonly observed, but in many cases it was unclear if the sequence was correct or actually contained an error. A total of 215 sequences, or 22.8% of the suspect sequences, were corrected in the public databases in the first year of the student project. Unfortunately 138 additional sequences with possible errors were added to the databases in the second year. Additional awareness of the need for data integrity of sequences submitted to public databases is needed to fully reap the benefits of these large data sets. © 2014 The Authors. Influenza and Other Respiratory Viruses Published by John Wiley & Sons Ltd.
First complete genome sequence of infectious laryngotracheitis virus
2011-01-01
Background Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes acute respiratory disease in chickens worldwide. To date, only one complete genomic sequence of ILTV has been reported. This sequence was generated by concatenating partial sequences from six different ILTV strains. Thus, the full genomic sequence of a single (individual) strain of ILTV has not been determined previously. This study aimed to use high throughput sequencing technology to determine the complete genomic sequence of a live attenuated vaccine strain of ILTV. Results The complete genomic sequence of the Serva vaccine strain of ILTV was determined, annotated and compared to the concatenated ILTV reference sequence. The genome size of the Serva strain was 152,628 bp, with a G + C content of 48%. A total of 80 predicted open reading frames were identified. The Serva strain had 96.5% DNA sequence identity with the concatenated ILTV sequence. Notably, the concatenated ILTV sequence was found to lack four large regions of sequence, including 528 bp and 594 bp of sequence in the UL29 and UL36 genes, respectively, and two copies of a 1,563 bp sequence in the repeat regions. Considerable differences in the size of the predicted translation products of 4 other genes (UL54, UL30, UL37 and UL38) were also identified. More than 530 single-nucleotide polymorphisms (SNPs) were identified. Most SNPs were located within three genomic regions, corresponding to sequence from the SA-2 ILTV vaccine strain in the concatenated ILTV sequence. Conclusions This is the first complete genomic sequence of an individual ILTV strain. This sequence will facilitate future comparative genomic studies of ILTV by providing an appropriate reference sequence for the sequence analysis of other ILTV strains. PMID:21501528
Locating Sequence on FPC Maps and Selecting a Minimal Tiling Path
Engler, Friedrich W.; Hatfield, James; Nelson, William; Soderlund, Carol A.
2003-01-01
This study discusses three software tools, the first two aid in integrating sequence with an FPC physical map and the third automatically selects a minimal tiling path given genomic draft sequence and BAC end sequences. The first tool, FSD (FPC Simulated Digest), takes a sequenced clone and adds it back to the map based on a fingerprint generated by an in silico digest of the clone. This allows verification of sequenced clone positions and the integration of sequenced clones that were not originally part of the FPC map. The second tool, BSS (Blast Some Sequence), takes a query sequence and positions it on the map based on sequence associated with the clones in the map. BSS has multiple uses as follows: (1) When the query is a file of marker sequences, they can be added as electronic markers. (2) When the query is draft sequence, the results of BSS can be used to close gaps in a sequenced clone or the physical map. (3) When the query is a sequenced clone and the target is BAC end sequences, one may select the next clone for sequencing using both sequence comparison results and map location. (4) When the query is whole-genome draft sequence and the target is BAC end sequences, the results can be used to select many clones for a minimal tiling path at once. The third tool, pickMTP, automates the majority of this last usage of BSS. Results are presented using the rice FPC map, BAC end sequences, and whole-genome shotgun from Syngenta. PMID:12915486
Program for Editing Spacecraft Command Sequences
NASA Technical Reports Server (NTRS)
Gladden, Roy; Waggoner, Bruce; Kordon, Mark; Hashemi, Mahnaz; Hanks, David; Salcedo, Jose
2006-01-01
Sequence Translator, Editor, and Expander Resource (STEER) is a computer program that facilitates construction of sequences and blocks of sequences (hereafter denoted generally as sequence products) for commanding a spacecraft. STEER also provides mechanisms for translating among various sequence product types and quickly expanding activities of a given sequence in chronological order for review and analysis of the sequence. To date, construction of sequence products has generally been done by use of such clumsy mechanisms as text-editor programs, translating among sequence product types has been challenging, and expanding sequences to time-ordered lists has involved arduous processes of converting sequence products to "real" sequences and running them through Class-A software (defined, loosely, as flight and ground software critical to a spacecraft mission). Also, heretofore, generating sequence products in standard formats has been troublesome because precise formatting and syntax are required. STEER alleviates these issues by providing a graphical user interface containing intuitive fields in which the user can enter the necessary information. The STEER expansion function provides a "quick and dirty" means of seeing how a sequence and sequence block would expand into a chronological list, without need to use of Class-A software.
Kemény, Ferenc; Meier, Beat
2016-02-01
While sequence learning research models complex phenomena, previous studies have mostly focused on unimodal sequences. The goal of the current experiment is to put implicit sequence learning into a multimodal context: to test whether it can operate across different modalities. We used the Task Sequence Learning paradigm to test whether sequence learning varies across modalities, and whether participants are able to learn multimodal sequences. Our results show that implicit sequence learning is very similar regardless of the source modality. However, the presence of correlated task and response sequences was required for learning to take place. The experiment provides new evidence for implicit sequence learning of abstract conceptual representations. In general, the results suggest that correlated sequences are necessary for implicit sequence learning to occur. Moreover, they show that elements from different modalities can be automatically integrated into one unitary multimodal sequence. Copyright © 2015 Elsevier B.V. All rights reserved.
A trace display and editing program for data from fluorescence based sequencing machines.
Gleeson, T; Hillier, L
1991-12-11
'Ted' (Trace editor) is a graphical editor for sequence and trace data from automated fluorescence sequencing machines. It provides facilities for viewing sequence and trace data (in top or bottom strand orientation), for editing the base sequence, for automated or manual trimming of the head (vector) and tail (uncertain data) from the sequence, for vertical and horizontal trace scaling, for keeping a history of sequence editing, and for output of the edited sequence. Ted has been used extensively in the C.elegans genome sequencing project, both as a stand-alone program and integrated into the Staden sequence assembly package, and has greatly aided in the efficiency and accuracy of sequence editing. It runs in the X windows environment on Sun workstations and is available from the authors. Ted currently supports sequence and trace data from the ABI 373A and Pharmacia A.L.F. sequencers.
Sequences show rapid motor transfer and spatial translation in the oculomotor system.
Stainer, Matthew J; Carpenter, R H S; Brotchie, Peter; Anderson, Andrew J
2016-07-01
Every day we perform learnt sequences of actions that seem to happen almost without awareness. It has been argued that for learning such sequences parallel learning networks exist - one using spatial coordinates and one using motor coordinates - with sequence acquisition involving a progressive shift from the former to the latter as a sequence is rehearsed. When sequences are interrupted by an out-of-sequence target, there is a delay in the response to the target, and so here we transiently interrupt oculomotor sequences to probe the influence of oculomotor rehearsal and spatial coordinates in sequence acquisition. For our main experiments, we used a repeating sequences of eight targets in length that was first learnt either using saccadic eye movements (left/right), manual responses (left/right or up/down) or as a sequence of colour (blue/red) requiring no motor response. The sequence was immediately repeated for saccadic eye movements, during which the influence of on out-of-sequence target (an interruption) was assessed. When a sequence is learnt beforehand in an abstract way (for example, as a sequence of colours or of orthogonally mapped manual responses), interruptions are immediately disruptive to latency, suggesting neither motor rehearsal nor specific spatial coordinates are essential for encoding sequences of actions and that sequences - no matter how they are encoded - can be rapidly translated into oculomotor coordinates. The magnitude of a disruption does, however, correspond to how well a sequence is learnt: introducing an interruption to an extended sequence before it was reliably learnt reduces the magnitude of the latency disruption. Copyright © 2016 Elsevier Ltd. All rights reserved.
Seo, Joo-Hyun; Park, Jihyang; Kim, Eun-Mi; Kim, Juhan; Joo, Keehyoung; Lee, Jooyoung; Kim, Byung-Gee
2014-02-01
Sequence subgrouping for a given sequence set can enable various informative tasks such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm which automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To meet this end, an automatic sequence subgrouping method, named 'Subgrouping Automata' was constructed. Firstly, tree analysis module analyzes the structure of tree and calculates the all possible subgroups in each node. Sequence similarity analysis module calculates average sequence similarity for all subgroups in each node. Representative sequence generation module finds a representative sequence using profile analysis and self-scoring for each subgroup. For all nodes, average sequence similarities are calculated and 'Subgrouping Automata' searches a node showing statistically maximum sequence similarity increase using Student's t-value. A node showing the maximum t-value, which gives the most significant differences in average sequence similarity between two adjacent nodes, is determined as an optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping. Copyright © 2013. Published by Elsevier Ltd.
Large-Scale Concatenation cDNA Sequencing
Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.
1997-01-01
A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174
Finding functional features in Saccharomyces genomes by phylogenetic footprinting.
Cliften, Paul; Sudarsanam, Priya; Desikan, Ashwin; Fulton, Lucinda; Fulton, Bob; Majors, John; Waterston, Robert; Cohen, Barak A; Johnston, Mark
2003-07-04
The sifting and winnowing of DNA sequence that occur during evolution cause nonfunctional sequences to diverge, leaving phylogenetic footprints of functional sequence elements in comparisons of genome sequences. We searched for such footprints among the genome sequences of six Saccharomyces species and identified potentially functional sequences. Comparison of these sequences allowed us to revise the catalog of yeast genes and identify sequence motifs that may be targets of transcriptional regulatory proteins. Some of these conserved sequence motifs reside upstream of genes with similar functional annotations or similar expression patterns or those bound by the same transcription factor and are thus good candidates for functional regulatory sequences.
Matsuda, M; Tazumi, A; Kagawa, S; Sekizuka, T; Murayama, O; Moore, JE; Millar, BC
2006-01-01
Background At present, six accessible sequences of 16S rDNA from Taylorella equigenitalis (T. equigenitalis) are available, whose sequence differences occur at a few nucleotide positions. Thus it is important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximate full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Results Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. When cloning, sequencing and comparison of the approximate full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France, nucleotide sequence differences were demonstrated at the six loci in the 1,469 nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion High sequence similarity (99.5% or more) was observed throughout, except from nucleotide positions 138 to 501 where substitutions and deletions were noted. PMID:16398935
Use of the Minion nanopore sequencer for rapid sequencing of avian influenza virus isolates
USDA-ARS?s Scientific Manuscript database
A relatively new sequencing technology, the MinION nanopore sequencer, provides a platform that is smaller, faster, and cheaper than existing Next Generation Sequence (NGS) technologies. The MinION sequences of individual strands of DNA and can produce millions of sequencing reads. The cost of the s...
NASA Technical Reports Server (NTRS)
Kaljurand, M.; Valentin, J. R.; Shao, M.
1996-01-01
Two alternative input sequences are commonly employed in correlation chromatography (CC). They are sequences derived according to the algorithm of the feedback shift register (i.e., pseudo random binary sequences (PRBS)) and sequences derived by using the uniform random binary sequences (URBS). These two sequences are compared. By applying the "cleaning" data processing technique to the correlograms that result from these sequences, we show that when the PRBS is used the S/N of the correlogram is much higher than the one resulting from using URBS.
Process of labeling specific chromosomes using recombinant repetitive DNA
Moyzis, R.K.; Meyne, J.
1988-02-12
Chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members and consensus sequences of the repetitive DNA families for the chromosome preferential sequences. The selected low homology regions are then hybridized with chromosomes to determine those low homology regions hybridized with a specific chromosome under normal stringency conditions.
NASA Technical Reports Server (NTRS)
Fisher, Forest; Gladden, Roy; Khanampornpan, Teerapat
2008-01-01
The MRO Sequence Checking Tool program, mro_check, automates significant portions of the MRO (Mars Reconnaissance Orbiter) sequence checking procedure. Though MRO has similar checks to the ODY s (Mars Odyssey) Mega Check tool, the checks needed for MRO are unique to the MRO spacecraft. The MRO sequence checking tool automates the majority of the sequence validation procedure and check lists that are used to validate the sequences generated by MRO MPST (mission planning and sequencing team). The tool performs more than 50 different checks on the sequence. The automation varies from summarizing data about the sequence needed for visual verification of the sequence, to performing automated checks on the sequence and providing a report for each step. To allow for the addition of new checks as needed, this tool is built in a modular fashion.
Comparison of next generation sequencing technologies for transcriptome characterization
2009-01-01
Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms. PMID:19646272
Siah, Ahmed; Morrison, Diane B.; Fringuelli, Elena; Savage, Paul S.; Richmond, Zina; Purcell, Maureen K.; Johns, Robert; Johnson, Stewart C.; Sakasida, Sonja M.
2015-01-01
Piscine reovirus (PRV) is a double stranded non-enveloped RNA virus detected in farmed and wild salmonids. This study examined the phylogenetic relationships among different PRV sequence types present in samples from salmonids in Western Canada and the US, including Alaska (US), British Columbia (Canada) and Washington State (US). Tissues testing positive for PRV were partially sequenced for segment S1, producing 71 sequences that grouped into 10 unique sequence types. Sequence analysis revealed no identifiable geographical or temporal variation among the sequence types. Identical sequence types were found in fish sampled in 2001, 2005 and 2014. In addition, PRV positive samples from fish derived from Alaska, British Columbia and Washington State share identical sequence types. Comparative analysis of the phylogenetic tree indicated that Canada/US Pacific Northwest sequences formed a subgroup with some Norwegian sequence types (group II), distinct from other Norwegian and Chilean sequences (groups I, III and IV). Representative PRV positive samples from farmed and wild fish in British Columbia and Washington State were subjected to genome sequencing using next generation sequencing methods. Individual analysis of each of the 10 partial segments indicated that the Canadian and US PRV sequence types clustered separately from available whole genome sequences of some Norwegian and Chilean sequences for all segments except the segment S4. In summary, PRV was genetically homogenous over a large geographic distance (Alaska to Washington State), and the sequence types were relatively stable over a 13 year period.
Siah, Ahmed; Morrison, Diane B.; Fringuelli, Elena; Savage, Paul; Richmond, Zina; Johns, Robert; Purcell, Maureen K.; Johnson, Stewart C.; Saksida, Sonja M.
2015-01-01
Piscine reovirus (PRV) is a double stranded non-enveloped RNA virus detected in farmed and wild salmonids. This study examined the phylogenetic relationships among different PRV sequence types present in samples from salmonids in Western Canada and the US, including Alaska (US), British Columbia (Canada) and Washington State (US). Tissues testing positive for PRV were partially sequenced for segment S1, producing 71 sequences that grouped into 10 unique sequence types. Sequence analysis revealed no identifiable geographical or temporal variation among the sequence types. Identical sequence types were found in fish sampled in 2001, 2005 and 2014. In addition, PRV positive samples from fish derived from Alaska, British Columbia and Washington State share identical sequence types. Comparative analysis of the phylogenetic tree indicated that Canada/US Pacific Northwest sequences formed a subgroup with some Norwegian sequence types (group II), distinct from other Norwegian and Chilean sequences (groups I, III and IV). Representative PRV positive samples from farmed and wild fish in British Columbia and Washington State were subjected to genome sequencing using next generation sequencing methods. Individual analysis of each of the 10 partial segments indicated that the Canadian and US PRV sequence types clustered separately from available whole genome sequences of some Norwegian and Chilean sequences for all segments except the segment S4. In summary, PRV was genetically homogenous over a large geographic distance (Alaska to Washington State), and the sequence types were relatively stable over a 13 year period. PMID:26536673
Sanchez, Daniel J; Reber, Paul J
2012-04-01
The memory system that supports implicit perceptual-motor sequence learning relies on brain regions that operate separately from the explicit, medial temporal lobe memory system. The implicit learning system therefore likely has distinct operating characteristics and information processing constraints. To attempt to identify the limits of the implicit sequence learning mechanism, participants performed the serial interception sequence learning (SISL) task with covertly embedded repeating sequences that were much longer than most previous studies: ranging from 30 to 60 (Experiment 1) and 60 to 90 (Experiment 2) items in length. Robust sequence-specific learning was observed for sequences up to 80 items in length, extending the known capacity of implicit sequence learning. In Experiment 3, 12-item repeating sequences were embedded among increasing amounts of irrelevant nonrepeating sequences (from 20 to 80% of training trials). Despite high levels of irrelevant trials, learning occurred across conditions. A comparison of learning rates across all three experiments found a surprising degree of constancy in the rate of learning regardless of sequence length or embedded noise. Sequence learning appears to be constant with the logarithm of the number of sequence repetitions practiced during training. The consistency in learning rate across experiments and conditions implies that the mechanisms supporting implicit sequence learning are not capacity-constrained by very long sequences nor adversely affected by high rates of irrelevant sequences during training.
[Study on ITS sequences of Aconitum vilmorinianum and its medicinal adulterant].
Zhang, Xiao-nan; Du, Chun-hua; Fu, De-huan; Gao, Li; Zhou, Pei-jun; Wang, Li
2012-09-01
To analyze and compare the ITS sequences of Aconitum vilmorinianum and its medicinal adulterant Aconitum austroyunnanense. Total genomic DNA were extracted from sample materials by improved CTAB method, ITS sequences of samples were amplified using PCR systems, directly sequenced and analyzed using software DNAStar, ClustalX1.81 and MEGA 4.0. 299 consistent sites, 19 variable sites and 13 informative sites were found in ITS1 sequences, 162 consistent sites, 2 variable sites and 1 informative sites were found in 5.8S sequences, 217 consistent sites, 3 variable sites and 1 informative site were found in ITS2 sequences. Base transition and transversion was not found only in 5.8S sequences, 2 sites transition and 1 site transversion were found in ITS1 sequences, only 1 site transversion was found in ITS2 sequences comparting the ITS sequences data matrix. By analyzing the ITS sequences data matrix from 2 population of Aconitum vilmorinianum and 3 population of Aconitum austroyunnanense, we found a stable informative site at the 596th base in ITS2 sequences, in all the samples of Aconitum vilmorinianum the base was C, and in all the samples of Aconitum austroyunnanense the base was A. Aconitum vilmorinianum and Aconitum austroyunnanense can be identified by their characters of ITS sequences, and the variable sites in ITS1 sequences are more than in ITS2 sequences.
Long-range barcode labeling-sequencing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Feng; Zhang, Tao; Singh, Kanwar K.
Methods for sequencing single large DNA molecules by clonal multiple displacement amplification using barcoded primers. Sequences are binned based on barcode sequences and sequenced using a microdroplet-based method for sequencing large polynucleotide templates to enable assembly of haplotype-resolved complex genomes and metagenomes.
Coordinate cytokine regulatory sequences
Frazer, Kelly A.; Rubin, Edward M.; Loots, Gabriela G.
2005-05-10
The present invention provides CNS sequences that regulate the cytokine gene expression, expression cassettes and vectors comprising or lacking the CNS sequences, host cells and non-human transgenic animals comprising the CNS sequences or lacking the CNS sequences. The present invention also provides methods for identifying compounds that modulate the functions of CNS sequences as well as methods for diagnosing defects in the CNS sequences of patients.
Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.
Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi
2017-07-01
PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.
Sequence verification of synthetic DNA by assembly of sequencing reads
Wilson, Mandy L.; Cai, Yizhi; Hanlon, Regina; Taylor, Samantha; Chevreux, Bastien; Setubal, João C.; Tyler, Brett M.; Peccoud, Jean
2013-01-01
Gene synthesis attempts to assemble user-defined DNA sequences with base-level precision. Verifying the sequences of construction intermediates and the final product of a gene synthesis project is a critical part of the workflow, yet one that has received the least attention. Sequence validation is equally important for other kinds of curated clone collections. Ensuring that the physical sequence of a clone matches its published sequence is a common quality control step performed at least once over the course of a research project. GenoREAD is a web-based application that breaks the sequence verification process into two steps: the assembly of sequencing reads and the alignment of the resulting contig with a reference sequence. GenoREAD can determine if a clone matches its reference sequence. Its sophisticated reporting features help identify and troubleshoot problems that arise during the sequence verification process. GenoREAD has been experimentally validated on thousands of gene-sized constructs from an ORFeome project, and on longer sequences including whole plasmids and synthetic chromosomes. Comparing GenoREAD results with those from manual analysis of the sequencing data demonstrates that GenoREAD tends to be conservative in its diagnostic. GenoREAD is available at www.genoread.org. PMID:23042248
Droege, Marcus; Hill, Brendon
2008-08-31
The Genome Sequencer FLX System (GS FLX), powered by 454 Sequencing, is a next-generation DNA sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high throughput. It has been proven to be the most versatile of all currently available next-generation sequencing technologies, supporting many high-profile studies in over seven applications categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole genomes and target DNA regions, metagenomics, and RNA analysis. 454 Sequencing is a powerful tool for human genetics research, having recently re-sequenced the genome of an individual human, currently re-sequencing the complete human exome and targeted genomic regions using the NimbleGen sequence capture process, and detected low-frequency somatic mutations linked to cancer.
Method to amplify variable sequences without imposing primer sequences
Bradbury, Andrew M.; Zeytun, Ahmet
2006-11-14
The present invention provides methods of amplifying target sequences without including regions flanking the target sequence in the amplified product or imposing amplification primer sequences on the amplified product. Also provided are methods of preparing a library from such amplified target sequences.
Sequence Complexity of Chromosome 3 in Caenorhabditis elegans
Pierro, Gaetano
2012-01-01
The nucleotide sequences complexity in chromosome 3 of Caenorhabditis elegans (C. elegans) is studied. The complexity of these sequences is compared with some random sequences. Moreover, by using some parameters related to complexity such as fractal dimension and frequency, indicator matrix is given a first classification of sequences of C. elegans. In particular, the sequences with highest and lowest fractal value are singled out. It is shown that the intrinsic nature of the low fractal dimension sequences has many common features with the random sequences. PMID:22919380
Chromosome specific repetitive DNA sequences
Moyzis, Robert K.; Meyne, Julianne
1991-01-01
A method is provided for determining specific nucleotide sequences useful in forming a probe which can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family me This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).
Children's discrimination of vowel sequences
NASA Astrophysics Data System (ADS)
Coady, Jeffry A.; Kluender, Keith R.; Evans, Julia
2003-10-01
Children's ability to discriminate sequences of steady-state vowels was investigated. Vowels (as in ``beet,'' ``bat,'' ``bought,'' and ``boot'') were synthesized at durations of 40, 80, 160, 320, 640, and 1280 ms. Four different vowel sequences were created by concatenating different orders of vowels for each duration, separated by 10-ms intervening silence. Thus, sequences differed in vowel order and duration (rate). Sequences were 12 s in duration, with amplitude ramped linearly over the first and last 2 s. Sequence pairs included both same (identical sequences) and different trials (sequences with vowels in different orders). Sequences with vowel of equal duration were presented on individual trials. Children aged 7;0 to 10;6 listened to pairs of sequences (with 100 ms between sequences) and responded whether sequences sounded the same or different. Results indicate that children are best able to discriminate sequences of intermediate-duration vowels, typical of conversational speaking rate. Children were less accurate with both shorter and longer vowels. Results are discussed in terms of auditory processing (shortest vowels) and memory (longest vowels). [Research supported by NIDCD DC-05263, DC-04072, and DC-005650.
Gibbs, Mark J; Armstrong, John S; Gibbs, Adrian J
2005-01-01
Background Most current DNA diagnostic tests for identifying organisms use specific oligonucleotide probes that are complementary in sequence to, and hence only hybridise with the DNA of one target species. By contrast, in traditional taxonomy, specimens are usually identified by 'dichotomous keys' that use combinations of characters shared by different members of the target set. Using one specific character for each target is the least efficient strategy for identification. Using combinations of shared bisectionally-distributed characters is much more efficient, and this strategy is most efficient when they separate the targets in a progressively binary way. Results We have developed a practical method for finding minimal sets of sub-sequences that identify individual sequences, and could be targeted by combinations of probes, so that the efficient strategy of traditional taxonomic identification could be used in DNA diagnosis. The sizes of minimal sub-sequence sets depended mostly on sequence diversity and sub-sequence length and interactions between these parameters. We found that 201 distinct cytochrome oxidase subunit-1 (CO1) genes from moths (Lepidoptera) were distinguished using only 15 sub-sequences 20 nucleotides long, whereas only 8–10 sub-sequences 6–10 nucleotides long were required to distinguish the CO1 genes of 92 species from the 9 largest orders of insects. Conclusion The presence/absence of sub-sequences in a set of gene sequences can be used like the questions in a traditional dichotomous taxonomic key; hybridisation probes complementary to such sub-sequences should provide a very efficient means for identifying individual species, subtypes or genotypes. Sequence diversity and sub-sequence length are the major factors that determine the numbers of distinguishing sub-sequences in any set of sequences. PMID:15817134
Tzou, Philip L; Ariyaratne, Pramila; Varghese, Vici; Lee, Charlie; Rakhmanaliev, Elian; Villy, Carolin; Yee, Meiqi; Tan, Kevin; Michel, Gerd; Pinsky, Benjamin A; Shafer, Robert W
2018-06-01
The ability of next-generation sequencing (NGS) technologies to detect low frequency HIV-1 drug resistance mutations (DRMs) not detected by dideoxynucleotide Sanger sequencing has potential advantages for improved patient outcomes. We compared the performance of an in vitro diagnostic (IVD) NGS assay, the Sentosa SQ HIV genotyping assay for HIV-1 genotypic resistance testing, with Sanger sequencing on 138 protease/reverse transcriptase (RT) and 39 integrase sequences. The NGS assay used a 5% threshold for reporting low-frequency variants. The level of complete plus partial nucleotide sequence concordance between Sanger sequencing and NGS was 99.9%. Among the 138 protease/RT sequences, a mean of 6.4 DRMs was identified by both Sanger and NGS, a mean of 0.5 DRM was detected by NGS alone, and a mean of 0.1 DRM was detected by Sanger sequencing alone. Among the 39 integrase sequences, a mean of 1.6 DRMs was detected by both Sanger sequencing and NGS and a mean of 0.15 DRM was detected by NGS alone. Compared with Sanger sequencing, NGS estimated higher levels of resistance to one or more antiretroviral drugs for 18.2% of protease/RT sequences and 5.1% of integrase sequences. There was little evidence for technical artifacts in the NGS sequences, but the G-to-A hypermutation was detected in three samples. In conclusion, the IVD NGS assay evaluated in this study was highly concordant with Sanger sequencing. At the 5% threshold for reporting minority variants, NGS appeared to attain a modestly increased sensitivity for detecting low-frequency DRMs without compromising sequence accuracy. Copyright © 2018 American Society for Microbiology.
NASA Astrophysics Data System (ADS)
Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.
2017-07-01
DNA sequence can be defined as a succession of letters, representing the order of nucleotides within DNA, using a permutation of four DNA base codes including adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of the sequences is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and currently become highly developed, advanced and highly throughput sequencing technologies. So far, DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing could produce any ambiguous and not clear enough sequencing results that make them quite difficult to be determined whether these codes are A, T, G, or C. To solve these problems, in this study we can introduce other representation of DNA codes namely Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probability of A, T, G, C bases that could appear in Q and PA + PT + PG + PC = 1. Furthermore, using Quaternion representations we are able to construct the improved scoring matrix for global sequence alignment processes, by applying a dot product method. Moreover, this scoring matrix produces better and higher quality of the match and mismatch score between two DNA base codes. In implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave, to analyze our target sequence which contains some ambiguous sequence data. The subject sequences are the DNA sequences of Streptococcus pneumoniae families obtained from the Genebank, meanwhile the target DNA sequence are received from our collaborator database. As the results we found the Quaternion representations improve the quality of the sequence alignment score and we can conclude that DNA sequence target has maximum similarity with Streptococcus pneumoniae.
Sequence Bundles: a novel method for visualising, discovering and exploring sequence motifs
2014-01-01
Background We introduce Sequence Bundles--a novel data visualisation method for representing multiple sequence alignments (MSAs). We identify and address key limitations of the existing bioinformatics data visualisation methods (i.e. the Sequence Logo) by enabling Sequence Bundles to give salient visual expression to sequence motifs and other data features, which would otherwise remain hidden. Methods For the development of Sequence Bundles we employed research-led information design methodologies. Sequences are encoded as uninterrupted, semi-opaque lines plotted on a 2-dimensional reconfigurable grid. Each line represents a single sequence. The thickness and opacity of the stack at each residue in each position indicates the level of conservation and the lines' curved paths expose patterns in correlation and functionality. Several MSAs can be visualised in a composite image. The Sequence Bundles method is designed to favour a tangible, continuous and intuitive display of information. Results We have developed a software demonstration application for generating a Sequence Bundles visualisation of MSAs provided for the BioVis 2013 redesign contest. A subsequent exploration of the visualised line patterns allowed for the discovery of a number of interesting features in the dataset. Reported features include the extreme conservation of sequences displaying a specific residue and bifurcations of the consensus sequence. Conclusions Sequence Bundles is a novel method for visualisation of MSAs and the discovery of sequence motifs. It can aid in generating new insight and hypothesis making. Sequence Bundles is well disposed for future implementation as an interactive visual analytics software, which can complement existing visualisation tools. PMID:25237395
Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions
Gardner, Shea N [San Leandro, CA; Mariella, Jr., Raymond P.; Christian, Allen T [Tracy, CA; Young, Jennifer A [Berkeley, CA; Clague, David S [Livermore, CA
2011-01-18
A method of fabricating a DNA molecule of user-defined sequence. The method comprises the steps of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an even or odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths. In one embodiment starting sequence fragments are of different lengths, n, n+1, n+2, etc.
Program Synthesizes UML Sequence Diagrams
NASA Technical Reports Server (NTRS)
Barry, Matthew R.; Osborne, Richard N.
2006-01-01
A computer program called "Rational Sequence" generates Universal Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse engineering function that aids in the design documentation of the target Java program. Whereas previously, the construction of sequence diagrams was a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.
Probabilistic Motor Sequence Yields Greater Offline and Less Online Learning than Fixed Sequence
Du, Yue; Prashad, Shikha; Schoenbrun, Ilana; Clark, Jane E.
2016-01-01
It is well acknowledged that motor sequences can be learned quickly through online learning. Subsequently, the initial acquisition of a motor sequence is boosted or consolidated by offline learning. However, little is known whether offline learning can drive the fast learning of motor sequences (i.e., initial sequence learning in the first training session). To examine offline learning in the fast learning stage, we asked four groups of young adults to perform the serial reaction time (SRT) task with either a fixed or probabilistic sequence and with or without preliminary knowledge (PK) of the presence of a sequence. The sequence and PK were manipulated to emphasize either procedural (probabilistic sequence; no preliminary knowledge (NPK)) or declarative (fixed sequence; with PK) memory that were found to either facilitate or inhibit offline learning. In the SRT task, there were six learning blocks with a 2 min break between each consecutive block. Throughout the session, stimuli followed the same fixed or probabilistic pattern except in Block 5, in which stimuli appeared in a random order. We found that PK facilitated the learning of a fixed sequence, but not a probabilistic sequence. In addition to overall learning measured by the mean reaction time (RT), we examined the progressive changes in RT within and between blocks (i.e., online and offline learning, respectively). It was found that the two groups who performed the fixed sequence, regardless of PK, showed greater online learning than the other two groups who performed the probabilistic sequence. The groups who performed the probabilistic sequence, regardless of PK, did not display online learning, as indicated by a decline in performance within the learning blocks. However, they did demonstrate remarkably greater offline improvement in RT, which suggests that they are learning the probabilistic sequence offline. These results suggest that in the SRT task, the fast acquisition of a motor sequence is driven by concurrent online and offline learning. In addition, as the acquisition of a probabilistic sequence requires greater procedural memory compared to the acquisition of a fixed sequence, our results suggest that offline learning is more likely to take place in a procedural sequence learning task. PMID:26973502
Probabilistic Motor Sequence Yields Greater Offline and Less Online Learning than Fixed Sequence.
Du, Yue; Prashad, Shikha; Schoenbrun, Ilana; Clark, Jane E
2016-01-01
It is well acknowledged that motor sequences can be learned quickly through online learning. Subsequently, the initial acquisition of a motor sequence is boosted or consolidated by offline learning. However, little is known whether offline learning can drive the fast learning of motor sequences (i.e., initial sequence learning in the first training session). To examine offline learning in the fast learning stage, we asked four groups of young adults to perform the serial reaction time (SRT) task with either a fixed or probabilistic sequence and with or without preliminary knowledge (PK) of the presence of a sequence. The sequence and PK were manipulated to emphasize either procedural (probabilistic sequence; no preliminary knowledge (NPK)) or declarative (fixed sequence; with PK) memory that were found to either facilitate or inhibit offline learning. In the SRT task, there were six learning blocks with a 2 min break between each consecutive block. Throughout the session, stimuli followed the same fixed or probabilistic pattern except in Block 5, in which stimuli appeared in a random order. We found that PK facilitated the learning of a fixed sequence, but not a probabilistic sequence. In addition to overall learning measured by the mean reaction time (RT), we examined the progressive changes in RT within and between blocks (i.e., online and offline learning, respectively). It was found that the two groups who performed the fixed sequence, regardless of PK, showed greater online learning than the other two groups who performed the probabilistic sequence. The groups who performed the probabilistic sequence, regardless of PK, did not display online learning, as indicated by a decline in performance within the learning blocks. However, they did demonstrate remarkably greater offline improvement in RT, which suggests that they are learning the probabilistic sequence offline. These results suggest that in the SRT task, the fast acquisition of a motor sequence is driven by concurrent online and offline learning. In addition, as the acquisition of a probabilistic sequence requires greater procedural memory compared to the acquisition of a fixed sequence, our results suggest that offline learning is more likely to take place in a procedural sequence learning task.
Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags
de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.
2000-01-01
Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by genscan. (http://genes.mit.edu/GENSCAN.html). PMID:11070084
The recurrence sequences via Sylvester matrices
NASA Astrophysics Data System (ADS)
Karaduman, Erdal; Deveci, Ömür
2017-07-01
In this work, we define the Pell-Jacobsthal-Slyvester sequence and the Jacobsthal-Pell-Slyvester sequence by using the Slyvester matrices which are obtained from the characteristic polynomials of the Pell and Jacobsthal sequences and then, we study the sequences defined modulo m. Also, we obtain the cyclic groups and the semigroups from the generating matrices of these sequences when read modulo m and then, we derive the relationships among the orders of the cyclic groups and the periods of the sequences. Furthermore, we redefine Pell-Jacobsthal-Slyvester sequence and the Jacobsthal-Pell-Slyvester sequence by means of the elements of the groups and then, we examine them in the finite groups.
BAC sequencing using pooled methods.
Saski, Christopher A; Feltus, F Alex; Parida, Laxmi; Haiminen, Niina
2015-01-01
Shotgun sequencing and assembly of a large, complex genome can be both expensive and challenging to accurately reconstruct the true genome sequence. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are main factors that plague de novo genome sequencing projects that typically result in highly fragmented assemblies and are difficult to extract biological meaning. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome and a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences the genome fraction that encodes the relevant biological information, then it is possible to reduce overall sequencing costs and efforts that target a genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods that are described include BAC isolation and mapping, DNA sequencing, and sequence assembly.
Long-range correlations and charge transport properties of DNA sequences
NASA Astrophysics Data System (ADS)
Liu, Xiao-liang; Ren, Yi; Xie, Qiong-tao; Deng, Chao-sheng; Xu, Hui
2010-04-01
By using Hurst's analysis and transfer approach, the rescaled range functions and Hurst exponents of human chromosome 22 and enterobacteria phage lambda DNA sequences are investigated and the transmission coefficients, Landauer resistances and Lyapunov coefficients of finite segments based on above genomic DNA sequences are calculated. In a comparison with quasiperiodic and random artificial DNA sequences, we find that λ-DNA exhibits anticorrelation behavior characterized by a Hurst exponent 0.5
Single molecule sequencing of the M13 virus genome without amplification
Zhao, Luyang; Deng, Liwei; Li, Gailing; Jin, Huan; Cai, Jinsen; Shang, Huan; Li, Yan; Wu, Haomin; Xu, Weibin; Zeng, Lidong; Zhang, Renli; Zhao, Huan; Wu, Ping; Zhou, Zhiliang; Zheng, Jiao; Ezanno, Pierre; Yang, Andrew X.; Yan, Qin; Deem, Michael W.; He, Jiankui
2017-01-01
Next generation sequencing (NGS) has revolutionized life sciences research. However, GC bias and costly, time-intensive library preparation make NGS an ill fit for increasing sequencing demands in the clinic. A new class of third-generation sequencing platforms has arrived to meet this need, capable of directly measuring DNA and RNA sequences at the single-molecule level without amplification. Here, we use the new GenoCare single-molecule sequencing platform from Direct Genomics to sequence the genome of the M13 virus. Our platform detects single-molecule fluorescence by total internal reflection microscopy, with sequencing-by-synthesis chemistry. We sequenced the genome of M13 to a depth of 316x, with 100% coverage. We determined a consensus sequence accuracy of 100%. In contrast to GC bias inherent to NGS results, we demonstrated that our single-molecule sequencing method yields minimal GC bias. PMID:29253901
Single molecule sequencing of the M13 virus genome without amplification.
Zhao, Luyang; Deng, Liwei; Li, Gailing; Jin, Huan; Cai, Jinsen; Shang, Huan; Li, Yan; Wu, Haomin; Xu, Weibin; Zeng, Lidong; Zhang, Renli; Zhao, Huan; Wu, Ping; Zhou, Zhiliang; Zheng, Jiao; Ezanno, Pierre; Yang, Andrew X; Yan, Qin; Deem, Michael W; He, Jiankui
2017-01-01
Next generation sequencing (NGS) has revolutionized life sciences research. However, GC bias and costly, time-intensive library preparation make NGS an ill fit for increasing sequencing demands in the clinic. A new class of third-generation sequencing platforms has arrived to meet this need, capable of directly measuring DNA and RNA sequences at the single-molecule level without amplification. Here, we use the new GenoCare single-molecule sequencing platform from Direct Genomics to sequence the genome of the M13 virus. Our platform detects single-molecule fluorescence by total internal reflection microscopy, with sequencing-by-synthesis chemistry. We sequenced the genome of M13 to a depth of 316x, with 100% coverage. We determined a consensus sequence accuracy of 100%. In contrast to GC bias inherent to NGS results, we demonstrated that our single-molecule sequencing method yields minimal GC bias.
DNA sequencing using polymerase substrate-binding kinetics
Previte, Michael John Robert; Zhou, Chunhong; Kellinger, Matthew; Pantoja, Rigo; Chen, Cheng-Yao; Shi, Jin; Wang, BeiBei; Kia, Amirali; Etchin, Sergey; Vieceli, John; Nikoomanzar, Ali; Bomati, Erin; Gloeckner, Christian; Ronaghi, Mostafa; He, Molly Min
2015-01-01
Next-generation sequencing (NGS) has transformed genomic research by decreasing the cost of sequencing. However, whole-genome sequencing is still costly and complex for diagnostics purposes. In the clinical space, targeted sequencing has the advantage of allowing researchers to focus on specific genes of interest. Routine clinical use of targeted NGS mandates inexpensive instruments, fast turnaround time and an integrated and robust workflow. Here we demonstrate a version of the Sequencing by Synthesis (SBS) chemistry that potentially can become a preferred targeted sequencing method in the clinical space. This sequencing chemistry uses natural nucleotides and is based on real-time recording of the differential polymerase/DNA-binding kinetics in the presence of correct or mismatch nucleotides. This ensemble SBS chemistry has been implemented on an existing Illumina sequencing platform with integrated cluster amplification. We discuss the advantages of this sequencing chemistry for targeted sequencing as well as its limitations for other applications. PMID:25612848
Memory and learning with rapid audiovisual sequences
Keller, Arielle S.; Sekuler, Robert
2015-01-01
We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193
Memory and learning with rapid audiovisual sequences.
Keller, Arielle S; Sekuler, Robert
2015-01-01
We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed.
SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read
2010-01-01
Background High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms. Results SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming. Conclusions SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts. PMID:20089148
Liu, Gary W; Livesay, Brynn R; Kacherovsky, Nataly A; Cieslewicz, Maryelise; Lutz, Emi; Waalkes, Adam; Jensen, Michael C; Salipante, Stephen J; Pun, Suzie H
2015-08-19
Peptide ligands are used to increase the specificity of drug carriers to their target cells and to facilitate intracellular delivery. One method to identify such peptide ligands, phage display, enables high-throughput screening of peptide libraries for ligands binding to therapeutic targets of interest. However, conventional methods for identifying target binders in a library by Sanger sequencing are low-throughput, labor-intensive, and provide a limited perspective (<0.01%) of the complete sequence space. Moreover, the small sample space can be dominated by nonspecific, preferentially amplifying "parasitic sequences" and plastic-binding sequences, which may lead to the identification of false positives or exclude the identification of target-binding sequences. To overcome these challenges, we employed next-generation Illumina sequencing to couple high-throughput screening and high-throughput sequencing, enabling more comprehensive access to the phage display library sequence space. In this work, we define the hallmarks of binding sequences in next-generation sequencing data, and develop a method that identifies several target-binding phage clones for murine, alternatively activated M2 macrophages with a high (100%) success rate: sequences and binding motifs were reproducibly present across biological replicates; binding motifs were identified across multiple unique sequences; and an unselected, amplified library accurately filtered out parasitic sequences. In addition, we validate the Multiple Em for Motif Elicitation tool as an efficient and principled means of discovering binding sequences.
Integration of Temporal and Ordinal Information During Serial Interception Sequence Learning
Gobel, Eric W.; Sanchez, Daniel J.; Reber, Paul J.
2011-01-01
The expression of expert motor skills typically involves learning to perform a precisely timed sequence of movements (e.g., language production, music performance, athletic skills). Research examining incidental sequence learning has previously relied on a perceptually-cued task that gives participants exposure to repeating motor sequences but does not require timing of responses for accuracy. Using a novel perceptual-motor sequence learning task, learning a precisely timed cued sequence of motor actions is shown to occur without explicit instruction. Participants learned a repeating sequence through practice and showed sequence-specific knowledge via a performance decrement when switched to an unfamiliar sequence. In a second experiment, the integration of representation of action order and timing sequence knowledge was examined. When either action order or timing sequence information was selectively disrupted, performance was reduced to levels similar to completely novel sequences. Unlike prior sequence-learning research that has found timing information to be secondary to learning action sequences, when the task demands require accurate action and timing information, an integrated representation of these types of information is acquired. These results provide the first evidence for incidental learning of fully integrated action and timing sequence information in the absence of an independent representation of action order, and suggest that this integrative mechanism may play a material role in the acquisition of complex motor skills. PMID:21417511
Method and apparatus for biological sequence comparison
Marr, T.G.; Chang, W.I.
1997-12-23
A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.
Method and apparatus for biological sequence comparison
Marr, Thomas G.; Chang, William I-Wei
1997-01-01
A method and apparatus for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence.
Budiman, Muhammad A.; Mao, Long; Wood, Todd C.; Wing, Rod A.
2000-01-01
Recently a new strategy using BAC end sequences as sequence-tagged connectors (STCs) was proposed for whole-genome sequencing projects. In this study, we present the construction and detailed characterization of a 15.0 haploid genome equivalent BAC library for the cultivated tomato, Lycopersicon esculentum cv. Heinz 1706. The library contains 129,024 clones with an average insert size of 117.5 kb and a chloroplast content of 1.11%. BAC end sequences from 1490 ends were generated and analyzed as a preliminary evaluation for using this library to develop an STC framework to sequence the tomato genome. A total of 1205 BAC end sequences (80.9%) were obtained, with an average length of 360 high-quality bases, and were searched against the GenBank database. Using a cutoff expectation value of <10−6, and combining the results from BLASTN, BLASTX, and TBLASTX searches, 24.3% of the BAC end sequences were similar to known sequences, of which almost half (48.7%) share sequence similarities to retrotransposons and 7% to known genes. Some of the transposable element sequences were the first reported in tomato, such as sequences similar to maize transposon Activator (Ac) ORF and tobacco pararetrovirus-like sequences. Interestingly, there were no BAC end sequences similar to the highly repeated TGRI and TGRII elements. However, the majority (70.3%) of STCs did not share significant sequence similarities to any sequences in GenBank at either the DNA or predicted protein levels, indicating that a large portion of the tomato genome is still unknown. Our data demonstrate that this BAC library is suitable for developing an STC database to sequence the tomato genome. The advantages of developing an STC framework for whole-genome sequencing of tomato are discussed. [The BAC end sequences described in this paper have been deposited in the GenBank data library under accession nos. AQ367111–AQ368361.] PMID:10645957
Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske
2007-02-14
The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5'tag-analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (miss-assignment rate<0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5'primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions
Gardner, Shea N; Mariella, Jr., Raymond P; Christian, Allen T; Young, Jennifer A; Clague, David S
2013-06-25
A method of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths.
Repetitive sequences in plant nuclear DNA: types, distribution, evolution and function.
Mehrotra, Shweta; Goyal, Vinod
2014-08-01
Repetitive DNA sequences are a major component of eukaryotic genomes and may account for up to 90% of the genome size. They can be divided into minisatellite, microsatellite and satellite sequences. Satellite DNA sequences are considered to be a fast-evolving component of eukaryotic genomes, comprising tandemly-arrayed, highly-repetitive and highly-conserved monomer sequences. The monomer unit of satellite DNA is 150-400 base pairs (bp) in length. Repetitive sequences may be species- or genus-specific, and may be centromeric or subtelomeric in nature. They exhibit cohesive and concerted evolution caused by molecular drive, leading to high sequence homogeneity. Repetitive sequences accumulate variations in sequence and copy number during evolution, hence they are important tools for taxonomic and phylogenetic studies, and are known as "tuning knobs" in the evolution. Therefore, knowledge of repetitive sequences assists our understanding of the organization, evolution and behavior of eukaryotic genomes. Repetitive sequences have cytoplasmic, cellular and developmental effects and play a role in chromosomal recombination. In the post-genomics era, with the introduction of next-generation sequencing technology, it is possible to evaluate complex genomes for analyzing repetitive sequences and deciphering the yet unknown functional potential of repetitive sequences. Copyright © 2014 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.
Timmermans, M J T N; Dodsworth, S; Culverwell, C L; Bocak, L; Ahrens, D; Littlewood, D T J; Pons, J; Vogler, A P
2010-11-01
Mitochondrial genome sequences are important markers for phylogenetics but taxon sampling remains sporadic because of the great effort and cost required to acquire full-length sequences. Here, we demonstrate a simple, cost-effective way to sequence the full complement of protein coding mitochondrial genes from pooled samples using the 454/Roche platform. Multiplexing was achieved without the need for expensive indexing tags ('barcodes'). The method was trialled with a set of long-range polymerase chain reaction (PCR) fragments from 30 species of Coleoptera (beetles) sequenced in a 1/16th sector of a sequencing plate. Long contigs were produced from the pooled sequences with sequencing depths ranging from ∼10 to 100× per contig. Species identity of individual contigs was established via three 'bait' sequences matching disparate parts of the mitochondrial genome obtained by conventional PCR and Sanger sequencing. This proved that assembly of contigs from the sequencing pool was correct. Our study produced sequences for 21 nearly complete and seven partial sets of protein coding mitochondrial genes. Combined with existing sequences for 25 taxa, an improved estimate of basal relationships in Coleoptera was obtained. The procedure could be employed routinely for mitochondrial genome sequencing at the species level, to provide improved species 'barcodes' that currently use the cox1 gene only.
Novel methodologies for spectral classification of exon and intron sequences
NASA Astrophysics Data System (ADS)
Kwan, Hon Keung; Kwan, Benjamin Y. M.; Kwan, Jennifer Y. Y.
2012-12-01
Digital processing of a nucleotide sequence requires it to be mapped to a numerical sequence in which the choice of nucleotide to numeric mapping affects how well its biological properties can be preserved and reflected from nucleotide domain to numerical domain. Digital spectral analysis of nucleotide sequences unfolds a period-3 power spectral value which is more prominent in an exon sequence as compared to that of an intron sequence. The success of a period-3 based exon and intron classification depends on the choice of a threshold value. The main purposes of this article are to introduce novel codes for 1-sequence numerical representations for spectral analysis and compare them to existing codes to determine appropriate representation, and to introduce novel thresholding methods for more accurate period-3 based exon and intron classification of an unknown sequence. The main findings of this study are summarized as follows: Among sixteen 1-sequence numerical representations, the K-Quaternary Code I offers an attractive performance. A windowed 1-sequence numerical representation (with window length of 9, 15, and 24 bases) offers a possible speed gain over non-windowed 4-sequence Voss representation which increases as sequence length increases. A winner threshold value (chosen from the best among two defined threshold values and one other threshold value) offers a top precision for classifying an unknown sequence of specified fixed lengths. An interpolated winner threshold value applicable to an unknown and arbitrary length sequence can be estimated from the winner threshold values of fixed length sequences with a comparable performance. In general, precision increases as sequence length increases. The study contributes an effective spectral analysis of nucleotide sequences to better reveal embedded properties, and has potential applications in improved genome annotation.
NASA Astrophysics Data System (ADS)
Yoshida, S.
2000-11-01
High-frequency stratigraphic sequences that comprise the Desert Member of the Blackhawk Formation, the Lower Castlegate Sandstone, and the Buck Tongue in the Green River area of Utah display changes in sequence architecture from marine deposits to marginal marine deposits to an entirely nonmarine section. Facies and sequence architecture differ above and below the regionally extensive Castlegate sequence boundary, which separates two low-frequency (106-year cyclicity) sequences. Below this surface, high-frequency sequences are identified and interpreted as comprising the highstand systems tract of the low-frequency Blackhawk sequence. Each high-frequency sequence has a local incised valley system on top of the wave-dominated delta, and coastal plain to shallow marine deposits are preserved. Above the Castlegate sequence boundary, in contrast, a regionally extensive sheet sandstone of fluvial to estuarine origin with laterally continuous internal erosional surfaces occurs. These deposits above the Castlegate sequence boundary are interpreted as the late lowstand to early transgressive systems tracts of the low-frequency Castlegate sequence. The base-level changes that generated both the low- and high-frequency sequences are attributed to crustal response to fluctuations in compressive intraplate stress on two different time scales. The low-frequency stratigraphic sequences are attributed to changes in the long-term regional subsidence rate and regional tilting of foreland basin fill. High-frequency sequences probably reflect the response of anisotropic basement to tectonism. Sequence architecture changes rapidly across the faulted margin of the underlying Paleozoic Paradox Basin. The high-frequency sequences are deeply eroded and stack above the Paradox Basin, but display less relief and become conformable updip. These features indicate that the area above the Paradox Basin was more prone to vertical structural movements during formation of the Blackhawk-Lower Castlegate succession.
Equally parsimonious pathways through an RNA sequence space are not equally likely
NASA Technical Reports Server (NTRS)
Lee, Y. H.; DSouza, L. M.; Fox, G. E.
1997-01-01
An experimental system for determining the potential ability of sequences resembling 5S ribosomal RNA (rRNA) to perform as functional 5S rRNAs in vivo in the Escherichia coli cellular environment was devised previously. Presumably, the only 5S rRNA sequences that would have been fixed by ancestral populations are ones that were functionally valid, and hence the actual historical paths taken through RNA sequence space during 5S rRNA evolution would have most likely utilized valid sequences. Herein, we examine the potential validity of all sequence intermediates along alternative equally parsimonious trajectories through RNA sequence space which connect two pairs of sequences that had previously been shown to behave as valid 5S rRNAs in E. coli. The first trajectory requires a total of four changes. The 14 sequence intermediates provide 24 apparently equally parsimonious paths by which the transition could occur. The second trajectory involves three changes, six intermediate sequences, and six potentially equally parsimonious paths. In total, only eight of the 20 sequence intermediates were found to be clearly invalid. As a consequence of the position of these invalid intermediates in the sequence space, seven of the 30 possible paths consisted of exclusively valid sequences. In several cases, the apparent validity/invalidity of the intermediate sequences could not be anticipated on the basis of current knowledge of the 5S rRNA structure. This suggests that the interdependencies in RNA sequence space may be more complex than currently appreciated. If ancestral sequences predicted by parsimony are to be regarded as actual historical sequences, then the present results would suggest that they should also satisfy a validity requirement and that, in at least limited cases, this conjecture can be tested experimentally.
Aftershock occurrence rate decay for individual sequences and catalogs
NASA Astrophysics Data System (ADS)
Nyffenegger, Paul A.
One of the earliest observations of the Earth's seismicity is that the rate of aftershock occurrence decays with time according to a power law commonly known as modified Omori-law (MOL) decay. However, the physical reasons for aftershock occurrence and the empirical decay in rate remain unclear despite numerous models that yield similar rate decay behavior. Key problems in relating the observed empirical relationship to the physical conditions of the mainshock and fault are the lack of studies including small magnitude mainshocks and the lack of uniformity between studies. We use simulated aftershock sequences to investigate the factors which influence the maximum likelihood (ML) estimate of the Omori-law p value, the parameter describing aftershock occurrence rate decay, for both individual aftershock sequences and "stacked" or superposed sequences. Generally the ML estimate of p is accurate, but since the ML estimated uncertainty is unaffected by whether the sequence resembles an MOL model, a goodness-of-fit test such as the Anderson-Darling statistic is necessary. While stacking aftershock sequences permits the study of entire catalogs and sequences with small aftershock populations, stacking introduces artifacts. The p value for stacked sequences is approximately equal to the mean of the individual sequence p values. We apply single-link cluster analysis to identify all aftershock sequences from eleven regional seismicity catalogs. We observe two new mathematically predictable empirical relationships for the distribution of aftershock sequence populations. The average properties of aftershock sequences are not correlated with tectonic environment, but aftershock populations and p values do show a depth dependence. The p values show great variability with time, and large values or changes in p sometimes precedes major earthquakes. Studies of teleseismic earthquake catalogs over the last twenty years have led seismologists to question seismicity models and aftershock sequence decay for deep sequences. For seven exceptional deep sequences, we conclude that MOL decay adequately describes these sequences, and little difference exists compared to shallow sequences. However, they do include larger aftershock populations compared to most deep sequences. These results imply that p values for deep sequences are larger than those for intermediate depth sequences.
"First generation" automated DNA sequencing technology.
Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M
2011-10-01
Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.
Jurka, Jerzy W.
1997-01-01
Enhanced homologous recombination is obtained by employing a consensus sequence which has been found to be associated with integration of repeat sequences, such as Alu and ID. The consensus sequence or sequence having a single transition mutation determines one site of a double break which allows for high efficiency of integration at the site. By introducing single or double stranded DNA having the consensus sequence flanking region joined to a sequence of interest, one can reproducibly direct integration of the sequence of interest at one or a limited number of sites. In this way, specific sites can be identified and homologous recombination achieved at the site by employing a second flanking sequence associated with a sequence proximal to the 3'-nick.
The sequence of sequencers: The history of sequencing DNA
Heather, James M.; Chain, Benjamin
2016-01-01
Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. PMID:26554401
Borozan, Ivan; Watt, Stuart; Ferretti, Vincent
2015-05-01
Alignment-based sequence similarity searches, while accurate for some type of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. ivan.borozan@gmail.com Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Borozan, Ivan; Watt, Stuart; Ferretti, Vincent
2015-01-01
Motivation: Alignment-based sequence similarity searches, while accurate for some type of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913
Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco
2016-03-01
Ultrafast-metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classify 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning and then Sanger sequenced. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Deeper classification performance was achieved by Kraken than by the RDP: 73% of the contigs were classified up to the species or variety levels, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or inclusively deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken cumulates the information of both sequence senses, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for use in the taxonomic assignment of sequences obtained by Sanger sequencing or based on third generation sequencing, of which the main goal is to generate larger sequences. Copyright © 2016 Elsevier B.V. All rights reserved.
Rapid identification of causative species in patients with Old World leishmaniasis.
Minodier, P; Piarroux, R; Gambarelli, F; Joblet, C; Dumon, H
1997-01-01
Conventional methods for the identification of species of Leishmania parasite causing infections have limitations. By using a DNA-based alternative, the present study tries to develop a new tool for this purpose. Thirty-three patients living in Marseilles (in the south of France) were suffering from visceral or cutaneous leishmaniasis. DNA of the parasite in clinical samples (bone marrow, peripheral blood, or skin) from these patients were amplified by PCR and were directly sequenced. The sequences observed were compared to these of 30 strains of the genus causing Old World leishmaniasis collected in Europe, Africa, or Asia. In the analysis of the sequences of the strains, two different sequence patterns for Leishmania infantum, one sequence for Leishmania donovani, one sequence for Leishmania major, two sequences for Leishmania tropica, and one sequence for Leishmania aethiopica were obtained. Four sequences were observed among the strains from the patients: one was similar to the sequence for the L. major strains, two were identical to the sequences for the L. infantum strains, and the last sequence was not observed within the strains but had a high degree of homology with the sequences of the L. infantum and L. donovani strains. The L. infantum strains from all immunocompetent patients had the same sequence. The L. infantum strains from immunodeficient patients suffering from visceral leishmaniasis had three different sequences. This fact might signify that some variants of L. infantum acquire pathogenicity exclusively in immunocompromised patients. To dispense with the sequencing step, a restriction assay with HaeIII was used. Some restriction patterns might support genetic exchanges in members of the genus Leishmania. PMID:9316906
NASA Technical Reports Server (NTRS)
Koskela, P. E.; Bollman, W. E.; Freeman, J. E.; Helton, M. R.; Reichert, R. J.; Travers, E. S.; Zawacki, S. J.
1973-01-01
The activities of the following members of the Navigation Team are recorded: the Science Sequence Design Group, responsible for preparing the final science sequence designs; the Advanced Sequence Planning Group, responsible for sequence planning; and the Science Recommendation Team (SRT) representatives, responsible for conducting the necessary sequence design interfaces with the teams during the mission. The interface task included science support in both advance planning and daily operations. Science sequences designed during the mission are also discussed.
The first genome sequences of human bocaviruses from Vietnam
Thanh, Tran Tan; Van, Hoang Minh Tu; Hong, Nguyen Thi Thu; Nhu, Le Nguyen Truc; Anh, Nguyen To; Tuan, Ha Manh; Hien, Ho Van; Tuong, Nguyen Manh; Kien, Trinh Trung; Khanh, Truong Huu; Nhan, Le Nguyen Thanh; Hung, Nguyen Thanh; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H. Rogier; Tan, Le Van
2017-01-01
As part of an ongoing effort to generate complete genome sequences of hand, foot and mouth disease-causing enteroviruses directly from clinical specimens, two complete coding sequences and two partial genomic sequences of human bocavirus 1 (n=3) and 2 (n=1) were co-amplified and sequenced, representing the first genome sequences of human bocaviruses from Vietnam. The sequences may aid future study aiming at understanding the evolution of the virus. PMID:28090592
Meeting the challenges of non-referenced genome assembly from short-read sequence data
M. Parks; A. Liston; R. Cronn
2010-01-01
Massively parallel sequencing technologies (MPST) offer unprecedented opportunities for novel sequencing projects. MPST, while offering tremendous sequencing capacity, are typically most effective in resequencing projects (as opposed to the sequencing of novel genomes) due to the fact that sequence is returned in relatively short reads. Nonetheless, there is great...
Experimental investigation of an RNA sequence space
NASA Technical Reports Server (NTRS)
Lee, Youn-Hyung; Dsouza, Lisa; Fox, George E.
1993-01-01
Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space that is comprised only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein an experimental system which can assess whether a particular sequence is likely to have validity as a eubacterial 5S rRNA is described. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs.
Verwey, Willem B; Lammens, Robin; van Honk, Jack
2002-01-01
Participants practiced two discrete six-key sequences for a total of 420 trials. The 1 x 6 sequence had a unique order of key presses while the 2 x 3 sequence involved repetition of a three-key segment. Both sequences showed a long interkey interval halfway the sequence indicating hierarchical sequence control in that not only the 2 x 3 but also the 1 x 6 sequence was executed as two successive motor chunks. Besides, the second part of both sequences was executed faster than the first part. This supports the earlier notion of a motor processor executing the elements of familiar motor chunks and a cognitive processor triggering either these motor chunks or individual sequence elements. Low-frequency, off-line transcranial magnetic stimulation (TMS) of the supplementary motor area (SMA) counteracted normal improvement with practice of key presses at all sequence positions. Together, these results are in line with the notion that with moderate practice, the SMA executes short sequence fragments that are concatenated by other brain structures.
First-order and higher order sequence learning in specific language impairment.
Clark, Gillian M; Lum, Jarrad A G
2017-02-01
A core claim of the procedural deficit hypothesis of specific language impairment (SLI) is that the disorder is associated with poor implicit sequence learning. This study investigated whether implicit sequence learning problems in SLI are present for first-order conditional (FOC) and higher order conditional (HOC) sequences. Twenty-five children with SLI and 27 age-matched, nonlanguage-impaired children completed 2 serial reaction time tasks. On 1 version, the sequence to be implicitly learnt comprised a FOC sequence and on the other a HOC sequence. Results showed that the SLI group learned the HOC sequence (η p ² = .285, p = .005) but not the FOC sequence (η p ² = .099, p = .118). The control group learned both sequences (FOC η p ² = .497, HOC η p 2= .465, ps < .001). The SLI group's difficulty learning the FOC sequence is consistent with the procedural deficit hypothesis. However, the study provides new evidence that multiple mechanisms may underpin the learning of FOC and HOC sequences. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Lucas, J.N.; Straume, T.; Bogen, K.T.
1998-03-24
A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.
Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.
1998-01-01
A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.
Shiroguchi, Katsuyuki; Jia, Tony Z.; Sims, Peter A.; Xie, X. Sunney
2012-01-01
RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we applied paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amendable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq. PMID:22232676
Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions
2014-01-01
Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920
Büssow, Konrad; Hoffmann, Steve; Sievert, Volker
2002-12-19
Functional genomics involves the parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite of the cloning and recombinant expression of these proteins. A Java program was developed for retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include sequence name, organism and also the completeness of the sequence. The program has a graphical user interface, although it can be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks its correct translation. ORFer accepts user input in the form of single or lists of GenBank GI identifiers or accession numbers. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or can be exported as text files in Fasta or tabulator delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information system (LIMS) with appropriate sequence information.
Targeted Re-Sequencing Emulsion PCR Panel for Myopathies: Results in 94 Cases.
Punetha, Jaya; Kesari, Akanchha; Uapinyoying, Prech; Giri, Mamta; Clarke, Nigel F; Waddell, Leigh B; North, Kathryn N; Ghaoui, Roula; O'Grady, Gina L; Oates, Emily C; Sandaradura, Sarah A; Bönnemann, Carsten G; Donkervoort, Sandra; Plotz, Paul H; Smith, Edward C; Tesi-Rocha, Carolina; Bertorini, Tulio E; Tarnopolsky, Mark A; Reitter, Bernd; Hausmanowa-Petrusewicz, Irena; Hoffman, Eric P
2016-05-27
Molecular diagnostics in the genetic myopathies often requires testing of the largest and most complex transcript units in the human genome (DMD, TTN, NEB). Iteratively targeting single genes for sequencing has traditionally entailed high costs and long turnaround times. Exome sequencing has begun to supplant single targeted genes, but there are concerns regarding coverage and needed depth of the very large and complex genes that frequently cause myopathies. To evaluate efficiency of next-generation sequencing technologies to provide molecular diagnostics for patients with previously undiagnosed myopathies. We tested a targeted re-sequencing approach, using a 45 gene emulsion PCR myopathy panel, with subsequent sequencing on the Illumina platform in 94 undiagnosed patients. We compared the targeted re-sequencing approach to exome sequencing for 10 of these patients studied. We detected likely pathogenic mutations in 33 out of 94 patients with a molecular diagnostic rate of approximately 35%. The remaining patients showed variants of unknown significance (35/94 patients) or no mutations detected in the 45 genes tested (26/94 patients). Mutation detection rates for targeted re-sequencing vs. whole exome were similar in both methods; however exome sequencing showed better distribution of reads and fewer exon dropouts. Given that costs of highly parallel re-sequencing and whole exome sequencing are similar, and that exome sequencing now takes considerably less laboratory processing time than targeted re-sequencing, we recommend exome sequencing as the standard approach for molecular diagnostics of myopathies.
Shotgun Protein Sequencing with Meta-contig Assembly*
Guthals, Adrian; Clauser, Karl R.; Bandeira, Nuno
2012-01-01
Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings. PMID:22798278
Shotgun protein sequencing with meta-contig assembly.
Guthals, Adrian; Clauser, Karl R; Bandeira, Nuno
2012-10-01
Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
El-Sherry, Shiem; Ogedengbe, Mosun E; Hafeez, Mian A; Barta, John R
2013-07-01
Multiple 18S rDNA sequences were obtained from two single-oocyst-derived lines of each of Eimeria meleagrimitis and Eimeria adenoeides. After analysing the 15 new 18S rDNA sequences from two lines of E. meleagrimitis and 17 new sequences from two lines of E. adenoeides, there were clear indications that divergent, paralogous 18S rDNA copies existed within the nuclear genome of E. meleagrimitis. In contrast, mitochondrial cytochrome c oxidase subunit I (COI) partial sequences from all lines of a particular Eimeria sp. were identical and, in phylogenetic analyses, COI sequences clustered unambiguously in monophyletic and highly-supported clades specific to individual Eimeria sp. Phylogenetic analysis of the new 18S rDNA sequences from E. meleagrimitis showed that they formed two distinct clades: Type A with four new sequences; and Type B with nine new sequences; both Types A and B sequences were obtained from each of the single-oocyst-derived lines of E. meleagrimitis. Together these rDNA types formed a well-supported E. meleagrimitis clade. Types A and B 18S rDNA sequences from E. meleagrimitis had a mean sequence identity of only 97.4% whereas mean sequence identity within types was 99.1-99.3%. The observed intraspecific sequence divergence among E. meleagrimitis 18S rDNA sequence types was even higher (approximately 2.6%) than the interspecific sequence divergence present between some well-recognized species such as Eimeria tenella and Eimeria necatrix (1.1%). Our observations suggest that, unlike COI sequences, 18S rDNA sequences are not reliable molecular markers to be used alone for species identification with coccidia, although 18S rDNA sequences have clear utility for phylogenetic reconstruction of apicomplexan parasites at the genus and higher taxonomic ranks. Copyright © 2013. Published by Elsevier Ltd.
Existence of host-related DNA sequences in the schistosome genome.
Iwamura, Y; Irie, Y; Kominami, R; Nara, T; Yasuraoka, K
1991-06-01
DNA sequences homologous to the mouse intracisternal A particle and endogenous type C retrovirus were detected in the DNAs of Schistosoma japonicum adults and S. mansoni eggs. Furthermore, other kinds of repetitive sequences in the host genome such as mouse type 1 Alu sequence (B1), mouse type 2 Alu sequence (B2) and mo-2 sequence, a mouse mini-satellite, were also detected in the DNAs from adults and eggs of S. japonicum and eggs of S. mansoni. Almost all of the sequences described above were absent in the DNAs of S. mansoni adults. The DNA fingerprints of schistosomes, using the mo-2 sequence, were indistinguishable from each other and resembled those of their murine hosts. Moreover, the mo-2 sequence was hypermethylated in the DNAs of schistosomes and its amount was variable in them. These facts indicate that host-related sequences are actually present in schistosomes and that the mo-2 repetitive sequence exists probably in extra-chromosome.
The complete CDS of the prion protein (PRNP) gene of African lion (Panthera leo).
Maj, Andrzej; Spellman, Garth M; Sarver, Shane K
2008-04-01
We provide the complete PRNP CDS sequence for the African lion, which is different from the previously published sequence and more similar to other carnivore sequences. The newly obtained prion protein sequence differs from the domestic cat sequence at three amino acid positions and contains only four octapeptide repeats. We recommend that this sequence be used as the reference sequence for future studies of the PRNP gene for this species.
Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R. Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M.; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas
2011-01-01
Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi. PMID:21949797
The rapid evolution of molecular genetic diagnostics in neuromuscular diseases.
Volk, Alexander E; Kubisch, Christian
2017-10-01
The development of massively parallel sequencing (MPS) has revolutionized molecular genetic diagnostics in monogenic disorders. The present review gives a brief overview of different MPS-based approaches used in clinical diagnostics of neuromuscular disorders (NMDs) and highlights their advantages and limitations. MPS-based approaches like gene panel sequencing, (whole) exome sequencing, (whole) genome sequencing, and RNA sequencing have been used to identify the genetic cause in NMDs. Although gene panel sequencing has evolved as a standard test for heterogeneous diseases, it is still debated, mainly because of financial issues and unsolved problems of variant interpretation, whether genome sequencing (and to a lesser extent also exome sequencing) of single patients can already be regarded as routine diagnostics. However, it has been shown that the inclusion of parents and additional family members often leads to a substantial increase in the diagnostic yield in exome-wide/genome-wide MPS approaches. In addition, MPS-based RNA sequencing just enters the research and diagnostic scene. Next-generation sequencing increasingly enables the detection of the genetic cause in highly heterogeneous diseases like NMDs in an efficient and affordable way. Gene panel sequencing and family-based exome sequencing have been proven as potent and cost-efficient diagnostic tools. Although clinical validation and interpretation of genome sequencing is still challenging, diagnostic RNA sequencing represents a promising tool to bypass some hurdles of diagnostics using genomic DNA.
Brandon Schlautman; Vera Pfeiffer; Juan Zalapa; Johanne Brunet
2014-01-01
Numerous microsatellite markers were developed for Aquilegia formosafrom sequences deposited within the Expressed Sequence Tag (EST), Genomic Survey Sequence (GSS), and Nucleotide databases in NCBI. Microsatellites (SSRs) were identified and primers were designed for 9 SSR containing sequences in the Nucleotide database, 3803 sequences in the EST...
Sequence repeats and protein structure
NASA Astrophysics Data System (ADS)
Hoang, Trinh X.; Trovato, Antonio; Seno, Flavio; Banavar, Jayanth R.; Maritan, Amos
2012-11-01
Repeats are frequently found in known protein sequences. The level of sequence conservation in tandem repeats correlates with their propensities to be intrinsically disordered. We employ a coarse-grained model of a protein with a two-letter amino acid alphabet, hydrophobic (H) and polar (P), to examine the sequence-structure relationship in the realm of repeated sequences. A fraction of repeated sequences comprises a distinct class of bad folders, whose folding temperatures are much lower than those of random sequences. Imperfection in sequence repetition improves the folding properties of the bad folders while deteriorating those of the good folders. Our results may explain why nature has utilized repeated sequences for their versatility and especially to design functional proteins that are intrinsically unstructured at physiological temperatures.
The sequence of sequencers: The history of sequencing DNA.
Heather, James M; Chain, Benjamin
2016-01-01
Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
An improved model for whole genome phylogenetic analysis by Fourier transform.
Yin, Changchuan; Yau, Stephen S-T
2015-10-07
DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences undergone rearrangements as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information contents of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor and the 2D mapping reduces the nucleotide composition bias in distance measure, and thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra are in the same dimensionality of the Fourier frequency space, then the Euclidean distances of full Fourier power spectra of the DNA sequences are used as the dissimilarity metrics. The improved DFT method, with increased computational performance by 2D numerical representation, can be applicable to any DNA sequences of different length ranges. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences including simulated and real datasets. The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied on phylogenetic analysis for individual genes and large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
Kit for detecting nucleic acid sequences using competitive hybridization probes
Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.
2001-01-01
A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the target sequence.
PFAAT version 2.0: a tool for editing, annotating, and analyzing multiple sequence alignments.
Caffrey, Daniel R; Dana, Paul H; Mathur, Vidhya; Ocano, Marco; Hong, Eun-Jong; Wang, Yaoyu E; Somaroo, Shyamal; Caffrey, Brian E; Potluri, Shobha; Huang, Enoch S
2007-10-11
By virtue of their shared ancestry, homologous sequences are similar in their structure and function. Consequently, multiple sequence alignments are routinely used to identify trends that relate to function. This type of analysis is particularly productive when it is combined with structural and phylogenetic analysis. Here we describe the release of PFAAT version 2.0, a tool for editing, analyzing, and annotating multiple sequence alignments. Support for multiple annotations is a key component of this release as it provides a framework for most of the new functionalities. The sequence annotations are accessible from the alignment and tree, where they are typically used to label sequences or hyperlink them to related databases. Sequence annotations can be created manually or extracted automatically from UniProt entries. Once a multiple sequence alignment is populated with sequence annotations, sequences can be easily selected and sorted through a sophisticated search dialog. The selected sequences can be further analyzed using statistical methods that explicitly model relationships between the sequence annotations and residue properties. Residue annotations are accessible from the alignment viewer and are typically used to designate binding sites or properties for a particular residue. Residue annotations are also searchable, and allow one to quickly select alignment columns for further sequence analysis, e.g. computing percent identities. Other features include: novel algorithms to compute sequence conservation, mapping conservation scores to a 3D structure in Jmol, displaying secondary structure elements, and sorting sequences by residue composition. PFAAT provides a framework whereby end-users can specify knowledge for a protein family in the form of annotation. The annotations can be combined with sophisticated analysis to test hypothesis that relate to sequence, structure and function.
Bryant, D A; de Lorimier, R; Lambert, D H; Dubbs, J M; Stirewalt, V L; Stevens, S E; Porter, R D; Tam, J; Jay, E
1985-01-01
The genes for the alpha- and beta-subunit apoproteins of allophycocyanin (AP) were isolated from the cyanelle genome of Cyanophora paradoxa and subjected to nucleotide sequence analysis. The AP beta-subunit apoprotein gene was localized to a 7.8-kilobase-pair Pst I restriction fragment from cyanelle DNA by hybridization with a tetradecameric oligonucleotide probe. Sequence analysis using that oligonucleotide and its complement as primers for the dideoxy chain-termination sequencing method confirmed the presence of both AP alpha- and beta-subunit genes on this restriction fragment. Additional oligonucleotide primers were synthesized as sequencing progressed and were used to determine rapidly the nucleotide sequence of a 1336-base-pair region of this cloned fragment. This strategy allowed the sequencing to be completed without a detailed restriction map and without extensive and time-consuming subcloning. The sequenced region contains two open reading frames whose deduced amino acid sequences are 81-85% homologous to cyanobacterial and red algal AP subunits whose amino acid sequences have been determined. The two open reading frames are in the same orientation and are separated by 39 base pairs. AP alpha is 5' to AP beta and both coding sequences are preceded by a polypurine, Shine-Dalgarno-type sequence. Sequences upstream from AP alpha closely resemble the Escherichia coli consensus promoter sequences and also show considerable homology to promoter sequences for several chloroplast-encoded psbA genes. A 56-base-pair palindromic sequence downstream from the AP beta gene could play a role in the termination of transcription or translation. The allophycocyanin apoprotein subunit genes are located on the large single-copy region of the cyanelle genome. PMID:2987916
Sequence dependent aggregation of peptides and fibril formation
NASA Astrophysics Data System (ADS)
Hung, Nguyen Ba; Le, Duy-Manh; Hoang, Trinh X.
2017-09-01
Deciphering the links between amino acid sequence and amyloid fibril formation is key for understanding protein misfolding diseases. Here we use Monte Carlo simulations to study the aggregation of short peptides in a coarse-grained model with hydrophobic-polar (HP) amino acid sequences and correlated side chain orientations for hydrophobic contacts. A significant heterogeneity is observed in the aggregate structures and in the thermodynamics of aggregation for systems of different HP sequences and different numbers of peptides. Fibril-like ordered aggregates are found for several sequences that contain the common HPH pattern, while other sequences may form helix bundles or disordered aggregates. A wide variation of the aggregation transition temperatures among sequences, even among those of the same hydrophobic fraction, indicates that not all sequences undergo aggregation at a presumable physiological temperature. The transition is found to be the most cooperative for sequences forming fibril-like structures. For a fibril-prone sequence, it is shown that fibril formation follows the nucleation and growth mechanism. Interestingly, a binary mixture of peptides of an aggregation-prone and a non-aggregation-prone sequence shows the association and conversion of the latter to the fibrillar structure. Our study highlights the role of a sequence in selecting fibril-like aggregates and also the impact of a structural template on fibril formation by peptides of unrelated sequences.
The primary structure of the Saccharomyces cerevisiae gene for 3-phosphoglycerate kinase.
Hitzeman, R A; Hagie, F E; Hayflick, J S; Chen, C Y; Seeburg, P H; Derynck, R
1982-01-01
The DNA sequence of the gene for the yeast glycolytic enzyme, 3-phosphoglycerate kinase (PGK), has been obtained by sequencing part of a 3.1 kbp HindIII fragment obtained from the yeast genome. The structural gene sequence corresponds to a reading frame of 1251 bp coding for 416 amino acids with no intervening DNA sequences. The amino acid sequence is approximately 65 percent homologous with human and horse PGK protein sequences and is in general agreement with the published protein sequence for yeast PGK. As for other highly expressed structural genes in yeast, the coding sequence is highly codon biased with 95 percent of the amino acids coded for by a select 25 codons (out of 61 possible). Besides structural DNA sequence, 291 bp of 5'-flanking sequence and 286 bp of 3'-flanking sequence were determined. Transcription starts 36 nucleotides upstream from the translational start and stops 86-93 nucleotides downstream from the translational stop. These results suggest a non-polyadenylated mRNA length of 1373 to 1380 nucleotides, which is consistent with the observed length of 1500 nucleotides for polyadenylated PGK mRNA. A sequence TATATATAAA is found at 145 nucleotides upstream from the translational start. This sequence resembles the TATAAA box that is possibly associated with RNA polymerase II binding. Images PMID:6296791
NASA Technical Reports Server (NTRS)
Wallace, G. R.; Weathers, G. D.; Graf, E. R.
1973-01-01
The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
Effects of the Ion PGM™ Hi-Q™ sequencing chemistry on sequence data quality.
Churchill, Jennifer D; King, Jonathan L; Chakraborty, Ranajit; Budowle, Bruce
2016-09-01
Massively parallel sequencing (MPS) offers substantial improvements over current forensic DNA typing methodologies such as increased resolution, scalability, and throughput. The Ion PGM™ is a promising MPS platform for analysis of forensic biological evidence. The system employs a sequencing-by-synthesis chemistry on a semiconductor chip that measures a pH change due to the release of hydrogen ions as nucleotides are incorporated into the growing DNA strands. However, implementation of MPS into forensic laboratories requires a robust chemistry. Ion Torrent's Hi-Q™ Sequencing Chemistry was evaluated to determine if it could improve on the quality of the generated sequence data in association with selected genetic marker targets. The whole mitochondrial genome and the HID-Ion STR 10-plex panel were sequenced on the Ion PGM™ system with the Ion PGM™ Sequencing 400 Kit and the Ion PGM™ Hi-Q™ Sequencing Kit. Concordance, coverage, strand balance, noise, and deletion ratios were assessed in evaluating the performance of the Ion PGM™ Hi-Q™ Sequencing Kit. The results indicate that reliable, accurate data are generated and that sequencing through homopolymeric regions can be improved with the use of Ion Torrent's Hi-Q™ Sequencing Chemistry. Overall, the quality of the generated sequencing data supports the potential for use of the Ion PGM™ in forensic genetic laboratories.
Fundamental Bounds for Sequence Reconstruction from Nanopore Sequencers.
Magner, Abram; Duda, Jarosław; Szpankowski, Wojciech; Grama, Ananth
2016-06-01
Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to accurately reconstruct the true sequence with high probability? Our results provide a number of important insights: (i) the probability of accurate reconstruction of a sequence from a single sample in the presence of indel errors tends quickly (i.e., exponentially) to zero as the length of the sequence increases; and (ii) replicated extrusion is an effective technique for accurate reconstruction. We show that for typical distributions of indel errors, the required number of replicas is a slow function (polylogarithmic) of sequence length - implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Moreover, we show that in certain cases, the required number of replicas can be related to information-theoretic parameters of the indel error distributions.
AlignMe—a membrane protein sequence alignment web server
Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.
2014-01-01
We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
Quantitative phenotyping via deep barcode sequencing.
Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey
2009-10-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
Polypeptide having or assisting in carbohydrate material degrading activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter
2016-02-16
The invention relates to a polypeptide which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well asmore » the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.« less
Polypeptide having swollenin activity and uses thereof
Schoonneveld-Bergmans, Margot Elizabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica D; Damveld, Robbertus Antonius
2015-11-04
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko
2016-08-15
Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. Transgenic construct- and event-specific sequences were identified as a GM papaya developed to resist infection from a Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled identifying unknown transgenic construct- and event-specific sequences in GM papaya and development of a reliable method for detecting them in papaya food commodities. Copyright © 2016 Elsevier Ltd. All rights reserved.
Polypeptide having beta-glucosidase activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel; Damveld, Robbertus Antonius
2015-09-01
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having cellobiohydrolase activity and uses thereof
Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter
2015-09-15
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having acetyl xylan esterase activity and uses thereof
Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter
2015-10-20
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having carbohydrate degrading activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica Diana; Damveld, Robbertus Antonius
2015-08-18
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
GATA: A graphic alignment tool for comparative sequenceanalysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nix, David A.; Eisen, Michael B.
2005-01-01
Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dotmore » plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.« less
Deep Sequencing to Identify the Causes of Viral Encephalitis
Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.
2014-01-01
Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691
Method for phosphorothioate antisense DNA sequencing by capillary electrophoresis with UV detection.
Froim, D; Hopkins, C E; Belenky, A; Cohen, A S
1997-11-01
The progress of antisense DNA therapy demands development of reliable and convenient methods for sequencing short single-stranded oligonucleotides. A method of phosphorothioate antisense DNA sequencing analysis using UV detection coupled to capillary electrophoresis (CE) has been developed based on a modified chain termination sequencing method. The proposed method reduces the sequencing cost since it uses affordable CE-UV instrumentation and requires no labeling with minimal sample processing before analysis. Cycle sequencing with ThermoSequenase generates quantities of sequencing products that are readily detectable by UV. Discrimination of undesired components from sequencing products in the reaction mixture, previously accomplished by fluorescent or radioactive labeling, is now achieved by bringing concentrations of undesired components below the UV detection range which yields a 'clean', well defined sequence. UV detection coupled with CE offers additional conveniences for sequencing since it can be accomplished with commercially available CE-UV equipment and is readily amenable to automation.
Method for phosphorothioate antisense DNA sequencing by capillary electrophoresis with UV detection.
Froim, D; Hopkins, C E; Belenky, A; Cohen, A S
1997-01-01
The progress of antisense DNA therapy demands development of reliable and convenient methods for sequencing short single-stranded oligonucleotides. A method of phosphorothioate antisense DNA sequencing analysis using UV detection coupled to capillary electrophoresis (CE) has been developed based on a modified chain termination sequencing method. The proposed method reduces the sequencing cost since it uses affordable CE-UV instrumentation and requires no labeling with minimal sample processing before analysis. Cycle sequencing with ThermoSequenase generates quantities of sequencing products that are readily detectable by UV. Discrimination of undesired components from sequencing products in the reaction mixture, previously accomplished by fluorescent or radioactive labeling, is now achieved by bringing concentrations of undesired components below the UV detection range which yields a 'clean', well defined sequence. UV detection coupled with CE offers additional conveniences for sequencing since it can be accomplished with commercially available CE-UV equipment and is readily amenable to automation. PMID:9336449
Orthogonal Polynomials Associated with Complementary Chain Sequences
NASA Astrophysics Data System (ADS)
Behera, Kiran Kumar; Sri Ranga, A.; Swaminathan, A.
2016-07-01
Using the minimal parameter sequence of a given chain sequence, we introduce the concept of complementary chain sequences, which we view as perturbations of chain sequences. Using the relation between these complementary chain sequences and the corresponding Verblunsky coefficients, the para-orthogonal polynomials and the associated Szegő polynomials are analyzed. Two illustrations, one involving Gaussian hypergeometric functions and the other involving Carathéodory functions are also provided. A connection between these two illustrations by means of complementary chain sequences is also observed.
The sequence measurement system of the IR camera
NASA Astrophysics Data System (ADS)
Geng, Ai-hui; Han, Hong-xia; Zhang, Hai-bo
2011-08-01
Currently, the IR cameras are broadly used in the optic-electronic tracking, optic-electronic measuring, fire control and optic-electronic countermeasure field, but the output sequence of the most presently applied IR cameras in the project is complex and the giving sequence documents from the leave factory are not detailed. Aiming at the requirement that the continuous image transmission and image procession system need the detailed sequence of the IR cameras, the sequence measurement system of the IR camera is designed, and the detailed sequence measurement way of the applied IR camera is carried out. The FPGA programming combined with the SignalTap online observation way has been applied in the sequence measurement system, and the precise sequence of the IR camera's output signal has been achieved, the detailed document of the IR camera has been supplied to the continuous image transmission system, image processing system and etc. The sequence measurement system of the IR camera includes CameraLink input interface part, LVDS input interface part, FPGA part, CameraLink output interface part and etc, thereinto the FPGA part is the key composed part in the sequence measurement system. Both the video signal of the CmaeraLink style and the video signal of LVDS style can be accepted by the sequence measurement system, and because the image processing card and image memory card always use the CameraLink interface as its input interface style, the output signal style of the sequence measurement system has been designed into CameraLink interface. The sequence measurement system does the IR camera's sequence measurement work and meanwhile does the interface transmission work to some cameras. Inside the FPGA of the sequence measurement system, the sequence measurement program, the pixel clock modification, the SignalTap file configuration and the SignalTap online observation has been integrated to realize the precise measurement to the IR camera. Te sequence measurement program written by the verilog language combining the SignalTap tool on line observation can count the line numbers in one frame, pixel numbers in one line and meanwhile account the line offset and row offset of the image. Aiming at the complex sequence of the IR camera's output signal, the sequence measurement system of the IR camera accurately measures the sequence of the project applied camera, supplies the detailed sequence document to the continuous system such as image processing system and image transmission system and gives out the concrete parameters of the fval, lval, pixclk, line offset and row offset. The experiment shows that the sequence measurement system of the IR camera can get the precise sequence measurement result and works stably, laying foundation for the continuous system.
USDA-ARS?s Scientific Manuscript database
Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...
USDA-ARS?s Scientific Manuscript database
Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...
Noncoding sequence classification based on wavelet transform analysis: part I
NASA Astrophysics Data System (ADS)
Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.
2017-09-01
DNA sequences in human genome can be divided into the coding and noncoding ones. Coding sequences are those that are read during the transcription. The identification of coding sequences has been widely reported in literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA elements) and Epigenomic Roadmap Project projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.
Setoh, Yin Xiang; Amarilla, Alberto A; Peng, Nias Y; Slonchak, Andrii; Periasamy, Parthiban; Figueiredo, Luiz T M; Aquino, Victor H; Khromykh, Alexander A
2018-01-01
Rocio virus (ROCV) is an arbovirus belonging to the genus Flavivirus, family Flaviviridae. We present an updated sequence of ROCV strain SPH 34675 (GenBank: AY632542.4), the only available full genome sequence prior to this study. Using next-generation sequencing of the entire genome, we reveal substantial sequence variation from the prototype sequence, with 30 nucleotide differences amounting to 14 amino acid changes, as well as significant changes to predicted 3'UTR RNA structures. Our results present an updated and corrected sequence of a potential emerging human-virulent flavivirus uniquely indigenous to Brazil (GenBank: MF461639).
What is a melody? On the relationship between pitch and brightness of timbre.
Cousineau, Marion; Carcagno, Samuele; Demany, Laurent; Pressnitzer, Daniel
2013-01-01
Previous studies showed that the perceptual processing of sound sequences is more efficient when the sounds vary in pitch than when they vary in loudness. We show here that sequences of sounds varying in brightness of timbre are processed with the same efficiency as pitch sequences. The sounds used consisted of two simultaneous pure tones one octave apart, and the listeners' task was to make same/different judgments on pairs of sequences varying in length (one, two, or four sounds). In one condition, brightness of timbre was varied within the sequences by changing the relative level of the two pure tones. In other conditions, pitch was varied by changing fundamental frequency, or loudness was varied by changing the overall level. In all conditions, only two possible sounds could be used in a given sequence, and these two sounds were equally discriminable. When sequence length increased from one to four, discrimination performance decreased substantially for loudness sequences, but to a smaller extent for brightness sequences and pitch sequences. In the latter two conditions, sequence length had a similar effect on performance. These results suggest that the processes dedicated to pitch and brightness analysis, when probed with a sequence-discrimination task, share unexpected similarities.
Prefrontal neural correlates of memory for sequences.
Averbeck, Bruno B; Lee, Daeyeol
2007-02-28
The sequence of actions appropriate to solve a problem often needs to be discovered by trial and error and recalled in the future when faced with the same problem. Here, we show that when monkeys had to discover and then remember a sequence of decisions across trials, ensembles of prefrontal cortex neurons reflected the sequence of decisions the animal would make throughout the interval between trials. This signal could reflect either an explicit memory process or a sequence-planning process that begins far in advance of the actual sequence execution. This finding extended to error trials such that, when the neural activity during the intertrial interval specified the wrong sequence, the animal also attempted to execute an incorrect sequence. More specifically, we used a decoding analysis to predict the sequence the monkey was planning to execute at the end of the fore-period, just before sequence execution. When this analysis was applied to error trials, we were able to predict where in the sequence the error would occur, up to three movements into the future. This suggests that prefrontal neural activity can retain information about sequences between trials, and that regardless of whether information is remembered correctly or incorrectly, the prefrontal activity veridically reflects the animal's action plan.
Local alignment of two-base encoded DNA sequence
Homer, Nils; Merriman, Barry; Nelson, Stanley F
2009-01-01
Background DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732
Soares, René Arderius; Passaglia, Luciane Maria Pereira
2010-10-01
Bradyrhizobium elkanii is successfully used in the formulation of commercial inoculants and, together with B. japonicum, it fully supplies the plant nitrogen demands. Despite the similarity between B. japonicum and B. elkanii species, several works demonstrated genetic and physiological differences between them. In this work Representational Difference Analysis (RDA) was used for genomic comparison between B. elkanii SEMIA 587, a crop inoculant strain, and B. japonicum USDA 110, a reference strain. Two hundred sequences were obtained. From these, 46 sequences belonged exclusively to the genome of B. elkanii strain, and 154 showed similarity to sequences from B. japonicum genome. From the 46 sequences with no similarity to sequences from B. japonicum, 39 showed no similarity to sequences in public databases and seven showed similarity to sequences of genes coding for known proteins. These seven sequences were divided in three groups: similar to sequences from other Bradyrhizobium strains, similar to sequences from other nitrogen-fixing bacteria, and similar to sequences from non nitrogen-fixing bacteria. These new sequences could be used as DNA markers in order to investigate the rates of genetic material gain and loss in natural Bradyrhizobium strains.
Augmented brain function by coordinated reset stimulation with slowly varying sequences.
Zeitler, Magteld; Tass, Peter A
2015-01-01
Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e., an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e., CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS.
Augmented brain function by coordinated reset stimulation with slowly varying sequences
Zeitler, Magteld; Tass, Peter A.
2015-01-01
Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e., an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e., CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS. PMID:25873867
Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike
2018-01-01
ABSTRACT Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2, with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publically available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection. PMID:29564396
Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S
2018-01-01
Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2, with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publically available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection.
Memory for sequences of events impaired in typical aging.
Allen, Timothy A; Morris, Andrea M; Stark, Shauna M; Fortin, Norbert J; Stark, Craig E L
2015-03-01
Typical aging is associated with diminished episodic memory performance. To improve our understanding of the fundamental mechanisms underlying this age-related memory deficit, we previously developed an integrated, cross-species approach to link converging evidence from human and animal research. This novel approach focuses on the ability to remember sequences of events, an important feature of episodic memory. Unlike existing paradigms, this task is nonspatial, nonverbal, and can be used to isolate different cognitive processes that may be differentially affected in aging. Here, we used this task to make a comprehensive comparison of sequence memory performance between younger (18-22 yr) and older adults (62-86 yr). Specifically, participants viewed repeated sequences of six colored, fractal images and indicated whether each item was presented "in sequence" or "out of sequence." Several out of sequence probe trials were used to provide a detailed assessment of sequence memory, including: (i) repeating an item from earlier in the sequence ("Repeats"; e.g., AB A: DEF), (ii) skipping ahead in the sequence ("Skips"; e.g., AB D: DEF), and (iii) inserting an item from a different sequence into the same ordinal position ("Ordinal Transfers"; e.g., AB 3: DEF). We found that older adults performed as well as younger controls when tested on well-known and predictable sequences, but were severely impaired when tested using novel sequences. Importantly, overall sequence memory performance in older adults steadily declined with age, a decline not detected with other measures (RAVLT or BPS-O). We further characterized this deficit by showing that performance of older adults was severely impaired on specific probe trials that required detailed knowledge of the sequence (Skips and Ordinal Transfers), and was associated with a shift in their underlying mnemonic representation of the sequences. Collectively, these findings provide unambiguous evidence that the capacity to remember sequences of events is fundamentally affected by typical aging. © 2015 Allen et al.; Published by Cold Spring Harbor Laboratory Press.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Onda, M.; Kudo, S.; Fukuda, M.
Human glycophorin A, B, and E (GPA, GPB, and GPE) genes belong to a gene family located at the long arm of chromosome 4. These three genes are homologous from the 5'-flanking sequence to the Alu sequence, which is 1 kb downstream from the exon encoding the transmembrane domain. Analysis of the Alu sequence and flanking direct repeat sequences suggested that the GPA gene most closely resembles the ancestral gene, whereas the GPB and GPE gene arose by homologous recombination within the Alu sequence, acquiring 3' sequences from an unrelated precursor genomic segment. Here the authors describe the identification ofmore » this putative precursor genomic segment. A human genomic library was screened by using the sequence of the 3' region of the GPB gene as a probe. The genomic clones isolated were found to contain an Alu sequence that appeared to be involved in the recombination. Downstream from the Alu sequence, the nucleotide sequence of the precursor genomic segment is almost identical to that of the GPB or GPE gene. In contrast, the upstream sequence of the genomic segment differs entirely from that of the GPA, GPB, and GPE genes. Conservation of the direct repeats flanking the Alu sequence of the genomic segment strongly suggests that the sequence of this genomic segment has been maintained during evolution. This identified genomic segment was found to reside downstream from the GPA gene by both gene mapping and in situ chromosomal localization. The precursor genomic segment was also identified in the orangutan genome, which is known to lack GPB and GPE genes. These results indicate that one of the duplicated ancestral glycophorin genes acquired a unique 3' sequence by unequal crossing-over through its Alu sequence and the further downstream Alu sequence present in the duplicated gene. Further duplication and divergence of this gene yielded the GPB and GPE genes. 37 refs., 5 figs.« less
Browning, J.V.; Miller, K.G.; McLaughlin, P.P.; Edwards, L.E.; Kulpecz, A.A.; Powars, D.S.; Wade, B.S.; Feigenson, M.D.; Wright, J.D.
2009-01-01
The Eyreville core holes provide the first continuously cored record of postimpact sequences from within the deepest part of the central Chesapeake Bay impact crater. We analyzed the upper Eocene to Pliocene postimpact sediments from the Eyreville A and C core holes for lithology (semiquantitative measurements of grain size and composition), sequence stratigraphy, and chronostratigraphy. Age is based primarily on Sr isotope stratigraphy supplemented by biostratigraphy (dinocysts, nannofossils, and planktonic foraminifers); age resolution is approximately ??0.5 Ma for early Miocene sequences and approximately ??1.0 Ma for younger and older sequences. Eocene-lower Miocene sequences are subtle, upper middle to lower upper Miocene sequences are more clearly distinguished, and upper Miocene- Pliocene sequences display a distinct facies pattern within sequences. We recognize two upper Eocene, two Oligocene, nine Miocene, three Pliocene, and one Pleistocene sequence and correlate them with those in New Jersey and Delaware. The upper Eocene through Pleistocene strata at Eyreville record changes from: (1) rapidly deposited, extremely fi ne-grained Eocene strata that probably represent two sequences deposited in a deep (>200 m) basin; to (2) highly dissected Oligocene (two very thin sequences) to lower Miocene (three thin sequences) with a long hiatus; to (3) a thick, rapidly deposited (43-73 m/Ma), very fi ne-grained, biosiliceous middle Miocene (16.5-14 Ma) section divided into three sequences (V5-V3) deposited in middle neritic paleoenvironments; to (4) a 4.5-Ma-long hiatus (12.8-8.3 Ma); to (5) sandy, shelly upper Miocene to Pliocene strata (8.3-2.0 Ma) divided into six sequences deposited in shelf and shoreface environments; and, last, to (6) a sandy middle Pleistocene paralic sequence (~400 ka). The Eyreville cores thus record the fi lling of a deep impact-generated basin where the timing of sequence boundaries is heavily infl uenced by eustasy. ?? 2009 The Geological Society of America.
Are commercial providers a viable option for clinical bacterial sequencing?
Raven, Kathy; Blane, Beth; Churcher, Carol; Parkhill, Julian; Peacock, Sharon J
2018-04-05
Bacterial whole-genome sequencing in the clinical setting has the potential to bring major improvements to infection control and clinical practice. Sequencing instruments are not currently available in the majority of routine microbiology laboratories worldwide, but an alternative is to use external sequencing providers. To foster discussion around this we investigated whether send-out services were a viable option. Four providers offering MiSeq sequencing were selected based on cost and evaluated based on the service provided and sequence data quality. DNA was prepared from five methicillin-resistant Staphylococcus aureus (MRSA) isolates, four of which were investigated during a previously published outbreak in the UK together with a reference MRSA isolate (ST22 HO 5096 0412). Cost of sequencing per isolate ranged from £155 to £342 and turnaround times from DNA postage to arrival of sequence data ranged from 12 to 63 days. Comparison of commercially generated genomes against the original sequence data demonstrated very high concordance, with no more than one single nucleotide polymorphism (SNP) difference on core genome mapping between the original sequences and the new sequence for all four providers. Multilocus sequence type could not be assigned based on assembly for the two cheapest sequence providers due to fragmented assemblies probably caused by a lower output of sequence data per isolate. Our results indicate that external providers returned highly accurate genome data, but that improvements are required in turnaround time to make this a viable option for use in clinical practice.
Universal sequence map (USM) of arbitrary discrete sequences
2002-01-01
Background For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale independent sequence analysis – without the context of fixed memory length. A simple example would consist on being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results We have successfully identified such an iterative function for bijective mappingψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4 unit type sequences (like DNA) as an order free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool:http://bioinformatics.musc.edu/~jonas/usm/. Conclusions USM is shown to enable a statistical mechanics approach to sequence analysis. The scale independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules. PMID:11895567
Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai
2017-06-01
Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.
Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami
2012-08-01
Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature ( http://hivdb.Stanford.edu ). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if with >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or <0.5% or >15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.
Effects of pre- and pro-sequence of thaumatin on the secretion by Pichia pastoris.
Ide, Nobuyuki; Masuda, Tetsuya; Kitabatake, Naofumi
2007-11-23
Thaumatin is a 22-kDa sweet-tasting protein containing eight disulfide bonds. When thaumatin is expressed in Pichia pastoris using the thaumatin cDNA fused with both the alpha-factor signal sequence and the Kex2 protease cleavage site from Saccharomyces cerevisiae, the N-terminal sequence of the secreted thaumatin molecule is not processed correctly. To examine the role of the thaumatin cDNA-encoded N-terminal pre-sequence and C-terminal pro-sequence on the processing of thaumatin and efficiency of thaumatin production in P. pastoris, four expression plasmids with different pre-sequence and pro-sequence were constructed and transformed into P. pastoris. The transformants containing pre-thaumatin gene that has the native plant signal, secreted thaumatin molecules in the medium. The N-terminal amino acid sequence of the secreted thaumatin molecule was processed correctly. The production yield of thaumatin was not affected by the C-terminal pro-sequence, and the pro-sequence was not processed in P. pastoris, indicating that pro-sequence is not necessary for thaumatin synthesis.
From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes.
Kwok, Hin; Chiang, Alan Kwok Shing
2016-02-24
Genomic sequences of Epstein-Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of the first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, increasing number of EBV genomes has been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explored ways through which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and discovery of potentially pathogenic variants in specific diseases.
Wang, Penghao; Wilson, Susan R
2013-01-01
Mass spectrometry-based protein identification is a very challenging task. The main identification approaches include de novo sequencing and database searching. Both approaches have shortcomings, so an integrative approach has been developed. The integrative approach firstly infers partial peptide sequences, known as tags, directly from tandem spectra through de novo sequencing, and then puts these sequences into a database search to see if a close peptide match can be found. However the current implementation of this integrative approach has several limitations. Firstly, simplistic de novo sequencing is applied and only very short sequence tags are used. Secondly, most integrative methods apply an algorithm similar to BLAST to search for exact sequence matches and do not accommodate sequence errors well. Thirdly, by applying these methods the integrated de novo sequencing makes a limited contribution to the scoring model which is still largely based on database searching. We have developed a new integrative protein identification method which can integrate de novo sequencing more efficiently into database searching. Evaluated on large real datasets, our method outperforms popular identification methods.
Metagenome assembly through clustering of next-generation sequencing data using protein sequences.
Sim, Mikang; Kim, Jaebum
2015-02-01
The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be solved by investigating metagenomes. However, the difficulty of metagenomes, such as the combination of multiple microbes and different species abundance, makes metagenome assembly tasks more challenging. In this paper, we developed a new metagenome assembly method by utilizing protein sequences, in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences, as a guide for the assembly, is promising. Copyright © 2015 Elsevier B.V. All rights reserved.
Yin, Changchuan
2015-04-01
To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be specially mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC) in which the numerical values are optimized by simulation annealing to maximize the 3-periodicity signal to noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction by Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
An efficient approach to BAC based assembly of complex genomes.
Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David
2016-01-01
There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, to variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.
Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D
2017-01-01
This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
Sequence memory based on coherent spin-interaction neural networks.
Xia, Min; Wong, W K; Wang, Zhijie
2014-12-01
Sequence information processing, for instance, the sequence memory, plays an important role on many functions of brain. In the workings of the human brain, the steady-state period is alterable. However, in the existing sequence memory models using heteroassociations, the steady-state period cannot be changed in the sequence recall. In this work, a novel neural network model for sequence memory with controllable steady-state period based on coherent spininteraction is proposed. In the proposed model, neurons fire collectively in a phase-coherent manner, which lets a neuron group respond differently to different patterns and also lets different neuron groups respond differently to one pattern. The simulation results demonstrating the performance of the sequence memory are presented. By introducing a new coherent spin-interaction sequence memory model, the steady-state period can be controlled by dimension parameters and the overlap between the input pattern and the stored patterns. The sequence storage capacity is enlarged by coherent spin interaction compared with the existing sequence memory models. Furthermore, the sequence storage capacity has an exponential relationship to the dimension of the neural network.
A Next-Generation Sequencing Primer—How Does It Work and What Can It Do?
Alekseyev, Yuriy O.; Fazeli, Roghayeh; Yang, Shi; Basran, Raveen; Miller, Nancy S.
2018-01-01
Next-generation sequencing refers to a high-throughput technology that determines the nucleic acid sequences and identifies variants in a sample. The technology has been introduced into clinical laboratory testing and produces test results for precision medicine. Since next-generation sequencing is relatively new, graduate students, medical students, pathology residents, and other physicians may benefit from a primer to provide a foundation about basic next-generation sequencing methods and applications, as well as specific examples where it has had diagnostic and prognostic utility. Next-generation sequencing technology grew out of advances in multiple fields to produce a sophisticated laboratory test with tremendous potential. Next-generation sequencing may be used in the clinical setting to look for specific genetic alterations in patients with cancer, diagnose inherited conditions such as cystic fibrosis, and detect and profile microbial organisms. This primer will review DNA sequencing technology, the commercialization of next-generation sequencing, and clinical uses of next-generation sequencing. Specific applications where next-generation sequencing has demonstrated utility in oncology are provided. PMID:29761157
Towards predicting the encoding capability of MR fingerprinting sequences.
Sommer, K; Amthor, T; Doneva, M; Koken, P; Meineke, J; Börnert, P
2017-09-01
Sequence optimization and appropriate sequence selection is still an unmet need in magnetic resonance fingerprinting (MRF). The main challenge in MRF sequence design is the lack of an appropriate measure of the sequence's encoding capability. To find such a measure, three different candidates for judging the encoding capability have been investigated: local and global dot-product-based measures judging dictionary entry similarity as well as a Monte Carlo method that evaluates the noise propagation properties of an MRF sequence. Consistency of these measures for different sequence lengths as well as the capability to predict actual sequence performance in both phantom and in vivo measurements was analyzed. While the dot-product-based measures yielded inconsistent results for different sequence lengths, the Monte Carlo method was in a good agreement with phantom experiments. In particular, the Monte Carlo method could accurately predict the performance of different flip angle patterns in actual measurements. The proposed Monte Carlo method provides an appropriate measure of MRF sequence encoding capability and may be used for sequence optimization. Copyright © 2017 Elsevier Inc. All rights reserved.
USDA-ARS?s Scientific Manuscript database
Over the past decade, Next Generation Sequencing (NGS) technologies, also called deep sequencing, have continued to evolve, increasing capacity and lower the cost necessary for large genome sequencing projects. The one of the advantage of NGS platforms is the possibility to sequence the samples with...
New Sequences with Low Correlation and Large Family Size
NASA Astrophysics Data System (ADS)
Zeng, Fanxin
In direct-sequence code-division multiple-access (DS-CDMA) communication systems and direct-sequence ultra wideband (DS-UWB) radios, sequences with low correlation and large family size are important for reducing multiple access interference (MAI) and accepting more active users, respectively. In this paper, a new collection of families of sequences of length pn-1, which includes three constructions, is proposed. The maximum number of cyclically distinct families without GMW sequences in each construction is φ(pn-1)/n·φ(pm-1)/m, where p is a prime number, n is an even number, and n=2m, and these sequences can be binary or polyphase depending upon choice of the parameter p. In Construction I, there are pn distinct sequences within each family and the new sequences have at most d+2 nontrivial periodic correlation {-pm-1, -1, pm-1, 2pm-1,…,dpm-1}. In Construction II, the new sequences have large family size p2n and possibly take the nontrivial correlation values in {-pm-1, -1, pm-1, 2pm-1,…,(3d-4)pm-1}. In Construction III, the new sequences possess the largest family size p(d-1)n and have at most 2d correlation levels {-pm-1, -1,pm-1, 2pm-1,…,(2d-2)pm-1}. Three constructions are near-optimal with respect to the Welch bound because the values of their Welch-Ratios are moderate, WR_??_d, WR_??_3d-4 and WR_??_2d-2, respectively. Each family in Constructions I, II and III contains a GMW sequence. In addition, Helleseth sequences and Niho sequences are special cases in Constructions I and III, and their restriction conditions to the integers m and n, pm≠2 (mod 3) and n≅0 (mod 4), respectively, are removed in our sequences. Our sequences in Construction III include the sequences with Niho type decimation 3·2m-2, too. Finally, some open questions are pointed out and an example that illustrates the performance of these sequences is given.
Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment
2013-01-01
Background Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. Results In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Conclusion Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to identify conserved regions fast or even interactively using a standard PC. Our method has many potential applications such as finding characteristic signature sequences for families of organisms and studying conserved and variable regions in, for example, 16S rRNA. PMID:24564200
Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.
Nagar, Anurag; Hahsler, Michael
2013-01-01
Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to identify conserved regions fast or even interactively using a standard PC. Our method has many potential applications such as finding characteristic signature sequences for families of organisms and studying conserved and variable regions in, for example, 16S rRNA.
Zopf, Agnes; Raim, Roman; Danzer, Martin; Niklas, Norbert; Spilka, Rita; Pröll, Johannes; Gabriel, Christian; Nechansky, Andreas; Roucka, Markus
2015-03-01
The detection of KRAS mutations in codons 12 and 13 is critical for anti-EGFR therapy strategies; however, only those methodologies with high sensitivity, specificity, and accuracy as well as the best cost and turnaround balance are suitable for routine daily testing. Here we compared the performance of compact sequencing using the novel hybcell technology with 454 next-generation sequencing (454-NGS), Sanger sequencing, and pyrosequencing, using an evaluation panel of 35 specimens. A total of 32 mutations and 10 wild-type cases were reported using 454-NGS as the reference method. Specificity ranged from 100% for Sanger sequencing to 80% for pyrosequencing. Sanger sequencing and hybcell-based compact sequencing achieved a sensitivity of 96%, whereas pyrosequencing had a sensitivity of 88%. Accuracy was 97% for Sanger sequencing, 85% for pyrosequencing, and 94% for hybcell-based compact sequencing. Quantitative results were obtained for 454-NGS and hybcell-based compact sequencing data, resulting in a significant correlation (r = 0.914). Whereas pyrosequencing and Sanger sequencing were not able to detect multiple mutated cell clones within one tumor specimen, 454-NGS and the hybcell-based compact sequencing detected multiple mutations in two specimens. Our comparison shows that the hybcell-based compact sequencing is a valuable alternative to state-of-the-art methodologies used for detection of clinically relevant point mutations.
tRNADB-CE: tRNA gene database well-timed in the era of big sequence data.
Abe, Takashi; Inokuchi, Hachiro; Yamada, Yuko; Muto, Akira; Iwasaki, Yuki; Ikemura, Toshimichi
2014-01-01
The tRNA gene data base curated by experts "tRNADB-CE" (http://trna.ie.niigata-u.ac.jp) was constructed by analyzing 1,966 complete and 5,272 draft genomes of prokaryotes, 171 viruses', 121 chloroplasts', and 12 eukaryotes' genomes plus fragment sequences obtained by metagenome studies of environmental samples. 595,115 tRNA genes in total, and thus two times of genes compiled previously, have been registered, for which sequence, clover-leaf structure, and results of sequence-similarity and oligonucleotide-pattern searches can be browsed. To provide collective knowledge with help from experts in tRNA researches, we added a column for enregistering comments to each tRNA. By grouping bacterial tRNAs with an identical sequence, we have found high phylogenetic preservation of tRNA sequences, especially at the phylum level. Since many species-unknown tRNAs from metagenomic sequences have sequences identical to those found in species-known prokaryotes, the identical sequence group (ISG) can provide phylogenetic markers to investigate the microbial community in an environmental ecosystem. This strategy can be applied to a huge amount of short sequences obtained from next-generation sequencers, as showing that tRNADB-CE is a well-timed database in the era of big sequence data. It is also discussed that batch-learning self-organizing-map with oligonucleotide composition is useful for efficient knowledge discovery from big sequence data.
Singh, Aditya; Bhatia, Prateek
2016-12-01
Sanger sequencing platforms, such as applied biosystems instruments, generate chromatogram files. Generally, for 1 region of a sequence, we use both forward and reverse primers to sequence that area, in that way, we have 2 sequences that need to be aligned and a consensus generated before mutation detection studies. This work is cumbersome and takes time, especially if the gene is large with many exons. Hence, we devised a rapid automated command system to filter, build, and align consensus sequences and also optionally extract exonic regions, translate them in all frames, and perform an amino acid alignment starting from raw sequence data within a very short time. In full capabilities of Automated Mutation Analysis Pipeline (ASAP), it is able to read "*.ab1" chromatogram files through command line interface, convert it to the FASTQ format, trim the low-quality regions, reverse-complement the reverse sequence, create a consensus sequence, extract the exonic regions using a reference exonic sequence, translate the sequence in all frames, and align the nucleic acid and amino acid sequences to reference nucleic acid and amino acid sequences, respectively. All files are created and can be used for further analysis. ASAP is available as Python 3.x executable at https://github.com/aditya-88/ASAP. The version described in this paper is 0.28.
TaxI: a software tool for DNA barcoding using distance methods
Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel
2005-01-01
DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid, but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as good as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius
Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.
2010-01-01
Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665
Multiplexed fragaria chloroplast genome sequencing
W. Njuguna; A. Liston; R. Cronn; N.V. Bassil
2010-01-01
A method to sequence multiple chloroplast genomes using ultra high throughput sequencing technologies was recently described. Complete chloroplast genome sequences can resolve phylogenetic relationships at low taxonomic levels and identify informative point mutations and indels. The objective of this research was to sequence multiple Fragaria...
Detection of a divergent variant of grapevine virus F by next-generation sequencing.
Molenaar, Nicholas; Burger, Johan T; Maree, Hans J
2015-08-01
The complete genome sequence of a South African isolate of grapevine virus F (GVF) is presented. It was first detected by metagenomic next-generation sequencing of field samples and validated through direct Sanger sequencing. The genome sequence of GVF isolate V5 consists of 7539 nucleotides and contains a poly(A) tail. It has a typical vitivirus genome arrangement that comprises five open reading frames (ORFs), which share only 88.96 % nucleotide sequence identity with the existing complete GVF genome sequence (JX105428).
van der Kuyl, A C; Kuiken, C L; Dekker, J T; Perizonius, W R; Goudsmit, J
1995-06-01
Monkey mummy bones and teeth originating from the North Saqqara Baboon Galleries (Egypt), soft tissue from a mummified baboon in a museum collection, and nineteenth/twentieth-century skin fragments from mangabeys were used for DNA extraction and PCR amplification of part of the mitochondrial 12S rRNA gene. Sequences aligning with the 12S rRNA gene were recovered but were only distantly related to contemporary monkey mitochondrial 12S rRNA sequences. However, many of these sequences were identical or closely related to human nuclear DNA sequences resembling mitochondrial 12S rRNA (isolated from a cell line depleted in mitochondria) and therefore have to be considered contamination. Subsequently in a separate study we were able to recover genuine mitochondrial 12S rRNA sequences from many extant species of nonhuman Old World primates and sequences closely resembling the human nuclear integrations. Analysis of all sequences by the neighbor-joining (NJ) method indicated that mitochondrial DNA sequences and their nuclear counterparts can be divided into two distinct clusters. One cluster contained all temporary cytoplasmic mitochondrial DNA sequences and approximately half of the monkey nuclear mitochondriallike sequences. A second cluster contained most human nuclear sequences and the other half of monkey nuclear sequences with a separate branch leading to human and gorilla mitochondrial and nuclear sequences. Sequences recovered from ancient materials were equally divided between the two clusters. These results constitute a warning for when working with ancient DNA or performing phylogenetic analysis using mitochondrial DNA as a target sequence: Nuclear counterparts of mitochondrial genes may lead to faulty interpretation of results.
Tins, B; Cassar-Pullicino, V; Haddaway, M; Nachtrab, U
2012-01-01
Objectives The bulk of spinal imaging is still performed with conventional two-dimensional sequences. This study assesses the suitability of three-dimensional sampling perfection with application-optimised contrasts using a different flip angle evolutions (SPACE) sequence for routine spinal imaging. Methods 62 MRI examinations of the spine were evaluated by 2 examiners in consensus for the depiction of anatomy and presence of artefact. We noted pathologies that might be missed using the SPACE sequence only or the SPACE and a sagittal T1 weighted sequence. The reference standards were sagittal and axial T1 weighted and T2 weighted sequences. At a later date the evaluation was repeated by one of the original examiners and an additional examiner. Results There was good agreement of the single evaluations and consensus evaluation for the conventional sequences: κ>0.8, confidence interval (CI)>0.6–1.0. For the SPACE sequence, depiction of anatomy was very good for 84% of cases, with high interobserver agreement, but there was poor interobserver agreement for other cases. For artefact assessment of SPACE, κ=0.92, CI=0.92–1.0. The SPACE sequence was superior to conventional sequences for depiction of anatomy and artefact resistance. The SPACE sequence occasionally missed bone marrow oedema. In conjunction with sagittal T1 weighted sequences, no abnormality was missed. The isotropic SPACE sequence was superior to conventional sequences in imaging difficult anatomy such as in scoliosis and spondylolysis. Conclusion The SPACE sequence allows excellent assessment of anatomy owing to high spatial resolution and resistance to artefact. The sensitivity for bone marrow abnormalities is limited. PMID:22374284
Tins, B; Cassar-Pullicino, V; Haddaway, M; Nachtrab, U
2012-08-01
The bulk of spinal imaging is still performed with conventional two-dimensional sequences. This study assesses the suitability of three-dimensional sampling perfection with application-optimised contrasts using a different flip angle evolutions (SPACE) sequence for routine spinal imaging. 62 MRI examinations of the spine were evaluated by 2 examiners in consensus for the depiction of anatomy and presence of artefact. We noted pathologies that might be missed using the SPACE sequence only or the SPACE and a sagittal T(1) weighted sequence. The reference standards were sagittal and axial T(1) weighted and T(2) weighted sequences. At a later date the evaluation was repeated by one of the original examiners and an additional examiner. There was good agreement of the single evaluations and consensus evaluation for the conventional sequences: κ>0.8, confidence interval (CI)>0.6-1.0. For the SPACE sequence, depiction of anatomy was very good for 84% of cases, with high interobserver agreement, but there was poor interobserver agreement for other cases. For artefact assessment of SPACE, κ=0.92, CI=0.92-1.0. The SPACE sequence was superior to conventional sequences for depiction of anatomy and artefact resistance. The SPACE sequence occasionally missed bone marrow oedema. In conjunction with sagittal T(1) weighted sequences, no abnormality was missed. The isotropic SPACE sequence was superior to conventional sequences in imaging difficult anatomy such as in scoliosis and spondylolysis. The SPACE sequence allows excellent assessment of anatomy owing to high spatial resolution and resistance to artefact. The sensitivity for bone marrow abnormalities is limited.
Alachiotis, Nikolaos; Vogiatzi, Emmanouella; Pavlidis, Pavlos; Stamatakis, Alexandros
2013-01-01
Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translates the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, a manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis becomes an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), an open-source software that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To provide users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors. PMID:24688709
Alachiotis, Nikolaos; Vogiatzi, Emmanouella; Pavlidis, Pavlos; Stamatakis, Alexandros
2013-01-01
Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translates the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, a manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis becomes an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), an open-source software that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To provide users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors.
Chiu, Kuo Ping; Wong, Chee-Hong; Chen, Qiongyu; Ariyaratne, Pramila; Ooi, Hong Sain; Wei, Chia-Lin; Sung, Wing-Kin Ken; Ruan, Yijun
2006-08-25
We recently developed the Paired End diTag (PET) strategy for efficient characterization of mammalian transcriptomes and genomes. The paired end nature of short PET sequences derived from long DNA fragments raised a new set of bioinformatics challenges, including how to extract PETs from raw sequence reads, and correctly yet efficiently map PETs to reference genome sequences. To accommodate and streamline data analysis of the large volume PET sequences generated from each PET experiment, an automated PET data process pipeline is desirable. We designed an integrated computation program package, PET-Tool, to automatically process PET sequences and map them to the genome sequences. The Tool was implemented as a web-based application composed of four modules: the Extractor module for PET extraction; the Examiner module for analytic evaluation of PET sequence quality; the Mapper module for locating PET sequences in the genome sequences; and the Project Manager module for data organization. The performance of PET-Tool was evaluated through the analyses of 2.7 million PET sequences. It was demonstrated that PET-Tool is accurate and efficient in extracting PET sequences and removing artifacts from large volume dataset. Using optimized mapping criteria, over 70% of quality PET sequences were mapped specifically to the genome sequences. With a 2.4 GHz LINUX machine, it takes approximately six hours to process one million PETs from extraction to mapping. The speed, accuracy, and comprehensiveness have proved that PET-Tool is an important and useful component in PET experiments, and can be extended to accommodate other related analyses of paired-end sequences. The Tool also provides user-friendly functions for data quality check and system for multi-layer data management.
Making sense of deep sequencing
Goldman, D.; Domschke, K.
2016-01-01
This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of ‘big data’, to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306
A statistical method for the detection of variants from next-generation resequencing of DNA pools.
Bansal, Vikas
2010-06-15
Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.
Sequence modelling and an extensible data model for genomic database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Peter Wei-Der
1992-01-01
The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of these information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMS's do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data modelmore » that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the Extensible Object Model'', to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implement the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.« less
Sequence modelling and an extensible data model for genomic database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Peter Wei-Der
1992-01-01
The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of these information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMS`s do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data modelmore » that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the ``Extensible Object Model``, to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implement the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.« less
Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C
2012-01-01
The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases assembly results although on average are of high quality, need to be viewed critically and consider sources of errors in them prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).
Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.
2015-01-01
This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030
NASA Technical Reports Server (NTRS)
Springer, E.; Sachs, M. S.; Woese, C. R.; Boone, D. R.
1995-01-01
Representatives of the family Methanosarcinaceae were analyzed phylogenetically by comparing partial sequences of their methyl-coenzyme M reductase (mcrI) genes. A 490-bp fragment from the A subunit of the gene was selected, amplified by the PCR, cloned, and sequenced for each of 25 strains belonging to the Methanosarcinaceae. The sequences obtained were aligned with the corresponding portions of five previously published sequences, and all of the sequences were compared to determine phylogenetic distances by Fitch distance matrix methods. We prepared analogous trees based on 16S rRNA sequences; these trees corresponded closely to the mcrI trees, although the mcrI sequences of pairs of organisms had 3.01 +/- 0.541 times more changes than the respective pairs of 16S rRNA sequences, suggesting that the mcrI fragment evolved about three times more rapidly than the 16S rRNA gene. The qualitative similarity of the mcrI and 16S rRNA trees suggests that transfer of genetic information between dissimilar organisms has not significantly affected these sequences, although we found inconsistencies between some mcrI distances that we measured and and previously published DNA reassociation data. It is unlikely that multiple mcrI isogenes were present in the organisms that we examined, because we found no major discrepancies in multiple determinations of mcrI sequences from the same organism. Our primers for the PCR also match analogous sites in the previously published mcrII sequences, but all of the sequences that we obtained from members of the Methanosarcinaceae were more closely related to mcrI sequences than to mcrII sequences, suggesting that members of the Methanosarcinaceae do not have distinct mcrII genes.
Palzkill, T G; Oliver, S G; Newlon, C S
1986-01-01
Four fragments of Saccharomyces cerevisiae chromosome III DNA which carry ARS elements have been sequenced. Each fragment contains multiple copies of sequences that have at least 10 out of 11 bases of homology to a previously reported 11 bp core consensus sequence. A survey of these new ARS sequences and previously reported sequences revealed the presence of an additional 11 bp conserved element located on the 3' side of the T-rich strand of the core consensus. Subcloning analysis as well as deletion and transposon insertion mutagenesis of ARS fragments support a role for 3' conserved sequence in promoting ARS activity. PMID:3529036
Single-cell genomic sequencing using Multiple Displacement Amplification.
Lasken, Roger S
2007-10-01
Single microbial cells can now be sequenced using DNA amplified by the Multiple Displacement Amplification (MDA) reaction. The few femtograms of DNA in a bacterium are amplified into micrograms of high molecular weight DNA suitable for DNA library construction and Sanger sequencing. The MDA-generated DNA also performs well when used directly as template for pyrosequencing by the 454 Life Sciences method. While MDA from single cells loses some of the genomic sequence, this approach will greatly accelerate the pace of sequencing from uncultured microbes. The genetically linked sequences from single cells are also a powerful tool to be used in guiding genomic assembly of shotgun sequences of multiple organisms from environmental DNA extracts (metagenomic sequences).
Suckau, Detlev; Resemann, Anja
2009-12-01
The ability to match Top-Down protein sequencing (TDS) results by MALDI-TOF to protein sequences by classical protein database searching was evaluated in this work. Resulting from these analyses were the protein identity, the simultaneous assignment of the N- and C-termini and protein sequences of up to 70 residues from either terminus. In combination with de novo sequencing using the MALDI-TDS data, even fusion proteins were assigned and the detailed sequence around the fusion site was elucidated. MALDI-TDS allowed to efficiently match protein sequences quickly and to validate recombinant protein structures-in particular, protein termini-on the level of undigested proteins.
Computational analysis of sequence selection mechanisms.
Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron
2004-04-01
Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.
Nonparametric Combinatorial Sequence Models
NASA Astrophysics Data System (ADS)
Wauthier, Fabian L.; Jordan, Michael I.; Jojic, Nebojsa
This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology but methodologies for analyzing them are still being developed. This paper presents a nonparametric prior on sequences which allows combinatorial structures to emerge and which induces a posterior distribution over factorized sequence representations. We carry out experiments on three sequence datasets which indicate that combinatorial structures are indeed present and that combinatorial sequence models can more succinctly describe them than simpler mixture models. We conclude with an application to MHC binding prediction which highlights the utility of the posterior distribution induced by the prior. By integrating out the posterior our method compares favorably to leading binding predictors.
Real-Time DNA Sequencing in the Antarctic Dry Valleys Using the Oxford Nanopore Sequencer
Johnson, Sarah S.; Zaikova, Elena; Goerlitz, David S.; Bai, Yu; Tighe, Scott W.
2017-01-01
The ability to sequence DNA outside of the laboratory setting has enabled novel research questions to be addressed in the field in diverse areas, ranging from environmental microbiology to viral epidemics. Here, we demonstrate the application of offline DNA sequencing of environmental samples using a hand-held nanopore sequencer in a remote field location: the McMurdo Dry Valleys, Antarctica. Sequencing was performed using a MK1B MinION sequencer from Oxford Nanopore Technologies (ONT; Oxford, United Kingdom) that was equipped with software to operate without internet connectivity. One-direction (1D) genomic libraries were prepared using portable field techniques on DNA isolated from desiccated microbial mats. By adequately insulating the sequencer and laptop, it was possible to run the sequencing protocol for up to 2½ h under arduous conditions. PMID:28337073
WebLogo: A Sequence Logo Generator
Crooks, Gavin E.; Hon, Gary; Chandonia, John-Marc; Brenner, Steven E.
2004-01-01
WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Each logo consists of stacks of letters, one stack for each position in the sequence. The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. WebLogo has been enhanced recently with additional features and options, to provide a convenient and highly configurable sequence logo generator. A command line interface and the complete, open WebLogo source code are available for local installation and customization. PMID:15173120
Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding.
Lan, Freeman; Demaree, Benjamin; Ahmed, Noorsher; Abate, Adam R
2017-07-01
The application of single-cell genome sequencing to large cell populations has been hindered by technical challenges in isolating single cells during genome preparation. Here we present single-cell genomic sequencing (SiC-seq), which uses droplet microfluidics to isolate, fragment, and barcode the genomes of single cells, followed by Illumina sequencing of pooled DNA. We demonstrate ultra-high-throughput sequencing of >50,000 cells per run in a synthetic community of Gram-negative and Gram-positive bacteria and fungi. The sequenced genomes can be sorted in silico based on characteristic sequences. We use this approach to analyze the distributions of antibiotic-resistance genes, virulence factors, and phage sequences in microbial communities from an environmental sample. The ability to routinely sequence large populations of single cells will enable the de-convolution of genetic heterogeneity in diverse cell populations.
A measurement of disorder in binary sequences
NASA Astrophysics Data System (ADS)
Gong, Longyan; Wang, Haihong; Cheng, Weiwen; Zhao, Shengmei
2015-03-01
We propose a complex quantity, AL, to characterize the degree of disorder of L-length binary symbolic sequences. As examples, we respectively apply it to typical random and deterministic sequences. One kind of random sequences is generated from a periodic binary sequence and the other is generated from the logistic map. The deterministic sequences are the Fibonacci and Thue-Morse sequences. In these analyzed sequences, we find that the modulus of AL, denoted by |AL | , is a (statistically) equivalent quantity to the Boltzmann entropy, the metric entropy, the conditional block entropy and/or other quantities, so it is a useful quantitative measure of disorder. It can be as a fruitful index to discern which sequence is more disordered. Moreover, there is one and only one value of |AL | for the overall disorder characteristics. It needs extremely low computational costs. It can be easily experimentally realized. From all these mentioned, we believe that the proposed measure of disorder is a valuable complement to existing ones in symbolic sequences.
Quantitative phenotyping via deep barcode sequencing
Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey
2009-01-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793
Resurgence of Integrated Behavioral Units
Bachá-Méndez, Gustavo; Reid, Alliston K; Mendoza-Soylovna, Adela
2007-01-01
Two experiments with rats examined the dynamics of well-learned response sequences when reinforcement contingencies were changed. Both experiments contained four phases, each of which reinforced a 2-response sequence of lever presses until responding was stable. The contingencies then were shifted to a new reinforced sequence until responding was again stable. Extinction-induced resurgence of previously reinforced, and then extinguished, heterogeneous response sequences was observed in all subjects in both experiments. These sequences were demonstrated to be integrated behavioral units, controlled by processes acting at the level of the entire sequence. Response-level processes were also simultaneously operative. Errors in sequence production were strongly influenced by the terminal, not the initial, response in the currently reinforced sequence, but not by the previously reinforced sequence. These studies demonstrate that sequence-level and response-level processes can operate simultaneously in integrated behavioral units. Resurgence and the development of integrated behavioral units may be dissociated; thus the observation of one does not necessarily imply the other. PMID:17345948
Effect of Next-Generation Exome Sequencing Depth for Discovery of Diagnostic Variants.
Kim, Kyung; Seong, Moon-Woo; Chung, Won-Hyong; Park, Sung Sup; Leem, Sangseob; Park, Won; Kim, Jihyun; Lee, KiYoung; Park, Rae Woong; Kim, Namshin
2015-06-01
Sequencing depth, which is directly related to the cost and time required for the generation, processing, and maintenance of next-generation sequencing data, is an important factor in the practical utilization of such data in clinical fields. Unfortunately, identifying an exome sequencing depth adequate for clinical use is a challenge that has not been addressed extensively. Here, we investigate the effect of exome sequencing depth on the discovery of sequence variants for clinical use. Toward this, we sequenced ten germ-line blood samples from breast cancer patients on the Illumina platform GAII(x) at a high depth of ~200×. We observed that most function-related diverse variants in the human exonic regions could be detected at a sequencing depth of 120×. Furthermore, investigation using a diagnostic gene set showed that the number of clinical variants identified using exome sequencing reached a plateau at an average sequencing depth of about 120×. Moreover, the phenomena were consistent across the breast cancer samples.
Carbohydrate degrading polypeptide and uses thereof
Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter
2015-10-20
The invention relates to a polypeptide having carbohydrate material degrading activity which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 4, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional protein and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
High-Resolution Sequence-Function Mapping of Full-Length Proteins
Kowalsky, Caitlin A.; Klesmith, Justin R.; Stapleton, James A.; Kelly, Vince; Reichkitzer, Nolan; Whitehead, Timothy A.
2015-01-01
Comprehensive sequence-function mapping involves detailing the fitness contribution of every possible single mutation to a gene by comparing the abundance of each library variant before and after selection for the phenotype of interest. Deep sequencing of library DNA allows frequency reconstruction for tens of thousands of variants in a single experiment, yet short read lengths of current sequencers makes it challenging to probe genes encoding full-length proteins. Here we extend the scope of sequence-function maps to entire protein sequences with a modular, universal sequence tiling method. We demonstrate the approach with both growth-based selections and FACS screening, offer parameters and best practices that simplify design of experiments, and present analytical solutions to normalize data across independent selections. Using this protocol, sequence-function maps covering full sequences can be obtained in four to six weeks. Best practices introduced in this manuscript are fully compatible with, and complementary to, other recently published sequence-function mapping protocols. PMID:25790064
On the joint spectral density of bivariate random sequences. Thesis Technical Report No. 21
NASA Technical Reports Server (NTRS)
Aalfs, David D.
1995-01-01
For univariate random sequences, the power spectral density acts like a probability density function of the frequencies present in the sequence. This dissertation extends that concept to bivariate random sequences. For this purpose, a function called the joint spectral density is defined that represents a joint probability weighing of the frequency content of pairs of random sequences. Given a pair of random sequences, the joint spectral density is not uniquely determined in the absence of any constraints. Two approaches to constraining the sequences are suggested: (1) assume the sequences are the margins of some stationary random field, (2) assume the sequences conform to a particular model that is linked to the joint spectral density. For both approaches, the properties of the resulting sequences are investigated in some detail, and simulation is used to corroborate theoretical results. It is concluded that under either of these two constraints, the joint spectral density can be computed from the non-stationary cross-correlation.
Applications of Single-Cell Sequencing for Multiomics.
Xu, Yungang; Zhou, Xiaobo
2018-01-01
Single-cell sequencing interrogates the sequence or chromatin information from individual cells with advanced next-generation sequencing technologies. It provides a higher resolution of cellular differences and a better understanding of the underlying genetic and epigenetic mechanisms of an individual cell in the context of its survival and adaptation to microenvironment. However, it is more challenging to perform single-cell sequencing and downstream data analysis, owing to the minimal amount of starting materials, sample loss, and contamination. In addition, due to the picogram level of the amount of nucleic acids used, heavy amplification is often needed during sample preparation of single-cell sequencing, resulting in the uneven coverage, noise, and inaccurate quantification of sequencing data. All these unique properties raise challenges in and thus high demands for computational methods that specifically fit single-cell sequencing data. We here comprehensively survey the current strategies and challenges for multiple single-cell sequencing, including single-cell transcriptome, genome, and epigenome, beginning with a brief introduction to multiple sequencing techniques for single cells.
Innovative /ye/ and /we/ sequences in recent loans in Japanese
NASA Astrophysics Data System (ADS)
Vance, Timothy; Matsugu, Yuka
2005-04-01
The GV sequences /ye/ and /we/ do not occur in Japanese except perhaps in recent loans. Katakana spellings of the relevant loans in authoritative dictionaries are inconsistent, and it is not clear whether native speakers treat them as containing the GV sequences /ye/ and /we/ or as containing the VV sequences /ie/ and /ue/. Native speakers of Japanese with minimal exposure to spoken English were recorded producing some relevant loans in response to picture prompts. The same speakers were also recorded producing some native words containing uncontroversial /ie/ and /ue/ sequences. All the productions are being analyzed acoustically to determine whether they show the expected contrast between GV and VV sequences. A VV sequence is disyllabic (and bimoraic) and should therefore have greater duration and more gradual formant movements than a monosyllabic (and monomoraic) GV sequence. Utterance-initially, a VV sequence should have a LH pitch pattern and should be preceded by a nondistinctive glottal stop, whereas a GV sequence should have a H pitch pattern and should have smooth onset.
[The principle and application of the single-molecule real-time sequencing technology].
Yanhu, Liu; Lu, Wang; Li, Yu
2015-03-01
Last decade witnessed the explosive development of the third-generation sequencing strategy, including single-molecule real-time sequencing (SMRT), true single-molecule sequencing (tSMSTM) and the single-molecule nanopore DNA sequencing. In this review, we summarize the principle, performance and application of the SMRT sequencing technology. Compared with the traditional Sanger method and the next-generation sequencing (NGS) technologies, the SMRT approach has several advantages, including long read length, high speed, PCR-free and the capability of direct detection of epigenetic modifications. However, the disadvantage of its low accuracy, most of which resulted from insertions and deletions, is also notable. So, the raw sequence data need to be corrected before assembly. Up to now, the SMRT is a good fit for applications in the de novo genomic sequencing and the high-quality assemblies of small genomes. In the future, it is expected to play an important role in epigenetics, transcriptomic sequencing, and assemblies of large genomes.
Life Cycle Evolution and Systematics of Campanulariid Hydrozoans
2004-09-01
kit according to manufacturer’s protocol. Purified PCR product was cycle-sequenced using either Big Dye 2 or 3 sequencing chemistry (ABI), following...ethidium bromide and purified with PCR purification kits (Qiagen). Purified products were cycle- sequenced with either Big Dye 2 or 3 sequencing chemistry...PCR purification kit (Qiagen). The purified product was cycle-sequenced using Big Dye 2 sequencing chemistry (ABI) following the manufacturer’s
What is a melody? On the relationship between pitch and brightness of timbre
Cousineau, Marion; Carcagno, Samuele; Demany, Laurent; Pressnitzer, Daniel
2014-01-01
Previous studies showed that the perceptual processing of sound sequences is more efficient when the sounds vary in pitch than when they vary in loudness. We show here that sequences of sounds varying in brightness of timbre are processed with the same efficiency as pitch sequences. The sounds used consisted of two simultaneous pure tones one octave apart, and the listeners’ task was to make same/different judgments on pairs of sequences varying in length (one, two, or four sounds). In one condition, brightness of timbre was varied within the sequences by changing the relative level of the two pure tones. In other conditions, pitch was varied by changing fundamental frequency, or loudness was varied by changing the overall level. In all conditions, only two possible sounds could be used in a given sequence, and these two sounds were equally discriminable. When sequence length increased from one to four, discrimination performance decreased substantially for loudness sequences, but to a smaller extent for brightness sequences and pitch sequences. In the latter two conditions, sequence length had a similar effect on performance. These results suggest that the processes dedicated to pitch and brightness analysis, when probed with a sequence-discrimination task, share unexpected similarities. PMID:24478638
Jacob, Jacob H; Hussein, Emad I; Shakhatreh, Muhamad Ali K; Cornelison, Christopher T
2017-10-01
Amplicon sequencing using next-generation technology (bTEFAP ® ) has been utilized in describing the diversity of Dead Sea microbiota. The investigated area is a well-known salt lake in the western part of Jordan found in the lowest geographical location in the world (more than 420 m below sea level) and characterized by extreme salinity (approximately, 34%) in addition to other extreme conditions (low pH, unique ionic composition different from sea water). DNA was extracted from Dead Sea water. A total of 314,310 small subunit RNA (SSU rRNA) sequences were parsed, and 288,452 sequences were then clustered. For alpha diversity analysis, sample was rarefied to 3,000 sequences. The Shannon-Wiener index curve plot reached a plateau at approximately 3,000 sequences indicating that sequencing depth was sufficient to capture the full scope of microbial diversity. Archaea was found to be dominating the sequences (52%), whereas Bacteria constitute 45% of the sequences. Altogether, prokaryotic sequences (which constitute 97% of all sequences) were found to predominate. The findings expand on previous studies by using high-throughput amplicon sequencing to describe the microbial community in an environment which in recent years has been shown to hide some interesting diversity. © 2017 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.
2009-01-01
Background ESTs or variable sequence reads can be available in prokaryotic studies well before a complete genome is known. Use cases include (i) transcriptome studies or (ii) single cell sequencing of bacteria. Without suitable software their further analysis and mapping would have to await finalization of the corresponding genome. Results The tool JANE rapidly maps ESTs or variable sequence reads in prokaryotic sequencing and transcriptome efforts to related template genomes. It provides an easy-to-use graphics interface for information retrieval and a toolkit for EST or nucleotide sequence function prediction. Furthermore, we developed for rapid mapping an enhanced sequence alignment algorithm which reassembles and evaluates high scoring pairs provided from the BLAST algorithm. Rapid assembly on and replacement of the template genome by sequence reads or mapped ESTs is achieved. This is illustrated (i) by data from Staphylococci as well as from a Blattabacteria sequencing effort, (ii) mapping single cell sequencing reads is shown for poribacteria to sister phylum representative Rhodopirellula Baltica SH1. The algorithm has been implemented in a web-server accessible at http://jane.bioapps.biozentrum.uni-wuerzburg.de. Conclusion Rapid prokaryotic EST mapping or mapping of sequence reads is achieved applying JANE even without knowing the cognate genome sequence. PMID:19943962
Sedimentary sequence evolution in a Foredeep basin: Eastern Venezuela
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bejarano, C.; Funes, D.; Sarzalho, S.
1996-08-01
Well log-seismic sequence stratigraphy analysis in the Eastern Venezuela Foreland Basin leads to study of the evolution of sedimentary sequences onto the Cretaceous-Paleocene passive margin. This basin comprises two different foredeep sub-basins: The Guarico subbasin to the west, older, and the Maturin sub-basin to the east, younger. A foredeep switching between these two sub-basins is observed at 12.5 m.y. Seismic interpretation and well log sections across the study area show sedimentary sequences with transgressive sands and coastal onlaps to the east-southeast for the Guarico sub-basin, as well as truncations below the switching sequence (12.5 m.y.), and the Maturin sub-basin showsmore » apparent coastal onlaps to the west-northwest, as well as a marine onlap (deeper water) in the west, where it starts to establish. Sequence stratigraphy analysis of these sequences with well logs allowed the study of the evolution of stratigraphic section from Paleocene to middle Miocene (68.0-12.0 m.y.). On the basis of well log patterns, the sequences were divided in regressive-transgressive-regressive sedimentary cycles caused by changes in relative sea level. Facies distributions were analyzed and the sequences were divided into simple sequences or sub- sequences of a greater frequencies than third order depositional sequences.« less
FOUNTAIN: A JAVA open-source package to assist large sequencing projects
Buerstedde, Jean-Marie; Prill, Florian
2001-01-01
Background Better automation, lower cost per reaction and a heightened interest in comparative genomics has led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences. Results We describe here a new program, named FOUNTAIN, for the management of large sequencing projects . FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented. Conclusions A simple, but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. Open source and largely platform and database independent, we wish FOUNTAIN to be improved and extended in a community effort. PMID:11591214
Sequence-specific bias correction for RNA-seq data using recurrent neural networks.
Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru
2017-01-25
The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bare on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data redusing Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training a RNN-based nucleotide sequence model is efficient and RNN-based bias correction methods compare well with the-state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provides an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.
Zhao, Ying; Tsang, Chi-Ching; Xiao, Meng; Cheng, Jingwei; Xu, Yingchun; Lau, Susanna K P; Woo, Patrick C Y
2015-10-22
Internal transcribed spacer region (ITS) sequencing is the most extensively used technology for accurate molecular identification of fungal pathogens in clinical microbiology laboratories. Intra-genomic ITS sequence heterogeneity, which makes fungal identification based on direct sequencing of PCR products difficult, has rarely been reported in pathogenic fungi. During the process of performing ITS sequencing on 71 yeast strains isolated from various clinical specimens, direct sequencing of the PCR products showed ambiguous sequences in six of them. After cloning the PCR products into plasmids for sequencing, interpretable sequencing electropherograms could be obtained. For each of the six isolates, 10-49 clones were selected for sequencing and two to seven intra-genomic ITS copies were detected. The identities of these six isolates were confirmed to be Candida glabrata (n=2), Pichia (Candida) norvegensis (n=2), Candida tropicalis (n=1) and Saccharomyces cerevisiae (n=1). Multiple sequence alignment revealed that one to four intra-genomic ITS polymorphic sites were present in the six isolates, and all these polymorphic sites were located in the ITS1 and/or ITS2 regions. We report and describe the first evidence of intra-genomic ITS sequence heterogeneity in four different pathogenic yeasts, which occurred exclusively in the ITS1 and ITS2 spacer regions for the six isolates in this study.
Zhao, Ying; Tsang, Chi-Ching; Xiao, Meng; Cheng, Jingwei; Xu, Yingchun; Lau, Susanna K. P.; Woo, Patrick C. Y.
2015-01-01
Internal transcribed spacer region (ITS) sequencing is the most extensively used technology for accurate molecular identification of fungal pathogens in clinical microbiology laboratories. Intra-genomic ITS sequence heterogeneity, which makes fungal identification based on direct sequencing of PCR products difficult, has rarely been reported in pathogenic fungi. During the process of performing ITS sequencing on 71 yeast strains isolated from various clinical specimens, direct sequencing of the PCR products showed ambiguous sequences in six of them. After cloning the PCR products into plasmids for sequencing, interpretable sequencing electropherograms could be obtained. For each of the six isolates, 10–49 clones were selected for sequencing and two to seven intra-genomic ITS copies were detected. The identities of these six isolates were confirmed to be Candida glabrata (n = 2), Pichia (Candida) norvegensis (n = 2), Candida tropicalis (n = 1) and Saccharomyces cerevisiae (n = 1). Multiple sequence alignment revealed that one to four intra-genomic ITS polymorphic sites were present in the six isolates, and all these polymorphic sites were located in the ITS1 and/or ITS2 regions. We report and describe the first evidence of intra-genomic ITS sequence heterogeneity in four different pathogenic yeasts, which occurred exclusively in the ITS1 and ITS2 spacer regions for the six isolates in this study. PMID:26506340
Yamada, Kazuhiko; Kamimura, Eikichi; Kondo, Mariko; Tsuchiya, Kimiyuki; Nishida-Umehara, Chizuko; Matsuda, Yoichi
2006-02-01
We molecularly cloned new families of site-specific repetitive DNA sequences from BglII- and EcoRI-digested genomic DNA of the Syrian hamster (Mesocricetus auratus, Cricetrinae, Rodentia) and characterized them by chromosome in situ hybridization and filter hybridization. They were classified into six different types of repetitive DNA sequence families according to chromosomal distribution and genome organization. The hybridization patterns of the sequences were consistent with the distribution of C-positive bands and/or Hoechst-stained heterochromatin. The centromeric major satellite DNA and sex chromosome-specific and telomeric region-specific repetitive sequences were conserved in the same genus (Mesocricetus) but divergent in different genera. The chromosome-2-specific sequence was conserved in two genera, Mesocricetus and Cricetulus, and a low copy number of repetitive sequences on the heterochromatic chromosome arms were conserved in the subfamily Cricetinae but not in the subfamily Calomyscinae. By contrast, the other type of repetitive sequences on the heterochromatic chromosome arms, which had sequence similarities to a LINE sequence of rodents, was conserved through the three subfamilies, Cricetinae, Calomyscinae and Murinae. The nucleotide divergence of the repetitive sequences of heterochromatin was well correlated with the phylogenetic relationships of the Cricetinae species, and each sequence has been independently amplified and diverged in the same genome.
Guo, Bingfu; Guo, Yong; Hong, Huilong; Qiu, Li-Juan
2016-01-01
Molecular characterization of sequence flanking exogenous fragment insertion is essential for safety assessment and labeling of genetically modified organism (GMO). In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybeans GE-J16 and ZH10-6 based on whole genome sequencing (WGS) method. More than 22.4 Gb sequence data (∼21 × coverage) for each line was generated on Illumina HiSeq 2500 platform. The junction reads mapped to boundaries of T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with soybean reference genome and sequence of transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses supported that exogenous T-DNA fragments were integrated in positions of Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in these two transgenic lines. Identification of genomic insertion sites of G2-EPSPS and GAT transgenes will facilitate the utilization of their glyphosate-tolerant traits in soybean breeding program. These results also demonstrated that WGS was a cost-effective and rapid method for identifying sites of T-DNA insertions and flanking sequences in soybean.
Geoseq: a tool for dissecting deep-sequencing datasets.
Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi
2010-10-12
Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (ddbj). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to, a) identify differential isoform expression in mRNA-seq datasets, b) identify miRNAs (microRNAs) in libraries, and identify mature and star sequences in miRNAS and c) to identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
Novel primers for complete mitochondrial cytochrome b genesequencing in mammals
Naidu, Ashwin; Fitak, Robert R.; Munguia-Vega, Adrian; Culver, Melanie
2011-01-01
Sequence-based species identification relies on the extent and integrity of sequence data available in online databases such as GenBank. When identifying species from a sample of unknown origin, partial DNA sequences obtained from the sample are aligned against existing sequences in databases. When the sequence from the matching species is not present in the database, high-scoring alignments with closely related sequences might produce unreliable results on species identity. For species identification in mammals, the cytochrome b (cyt b) gene has been identified to be highly informative; thus, large amounts of reference sequence data from the cyt b gene are much needed. To enhance availability of cyt b gene sequence data on a large number of mammalian species in GenBank and other such publicly accessible online databases, we identified a primer pair for complete cyt b gene sequencing in mammals. Using this primer pair, we successfully PCR amplified and sequenced the complete cyt b gene from 40 of 44 mammalian species representing 10 orders of mammals. We submitted 40 complete, correctly annotated, cyt b protein coding sequences to GenBank. To our knowledge, this is the first single primer pair to amplify the complete cyt b gene in a broad range of mammalian species. This primer pair can be used for the addition of new cyt b gene sequences and to enhance data available on species represented in GenBank. The availability of novel and complete gene sequences as high-quality reference data can improve the reliability of sequence-based species identification.
Browning, James V.; Miller, Kenneth G.; Sugarman, Peter J.; Barron, John; McCarthy, Francine M.G.; Kulhanek, Denise K.; Katz, Miriam E.; Feigenson, Mark D.
2013-01-01
Integrated Ocean Drilling Program Expedition 313 continuously cored and logged latest Eocene to early-middle Miocene sequences at three sites (M27, M28, and M29) on the inner-middle continental shelf offshore New Jersey, providing an opportunity to evaluate the ages, global correlations, and significance of sequence boundaries. We provide a chronology for these sequences using integrated strontium isotopic stratigraphy and biostratigraphy (primarily calcareous nannoplankton, diatoms, and dinocysts [dinoflagellate cysts]). Despite challenges posed by shallow-water sediments, age resolution is typically ±0.5 m.y. and in many sequences is as good as ±0.25 m.y. Three Oligocene sequences were sampled at Site M27 on sequence bottomsets. Fifteen early to early-middle Miocene sequences were dated at Sites M27, M28, and M29 across clinothems in topsets, foresets (where the sequences are thickest), and bottomsets. A few sequences have coarse (∼1 m.y.) or little age constraint due to barren zones; we constrain the age estimates of these less well dated sequences by applying the principle of superposition, i.e., sediments above sequence boundaries in any site are younger than the sediments below the sequence boundaries at other sites. Our age control provides constraints on the timing of deposition in the clinothem; sequences on the topsets are generally the youngest in the clinothem, whereas the bottomsets generally are the oldest. The greatest amount of time is represented on foresets, although we have no evidence for a correlative conformity. Our chronology provides a baseline for regional and interregional correlations and sea-level reconstructions: (1) we correlate a major increase in sedimentation rate precisely with the timing of the middle Miocene climate changes associated with the development of a permanent East Antarctic Ice Sheet; and (2) the timing of sequence boundaries matches the deep-sea oxygen isotopic record, implicating glacioeustasy as a major driver for forming sequence boundaries.
Memory for sequences of events impaired in typical aging
Allen, Timothy A.; Morris, Andrea M.; Stark, Shauna M.; Fortin, Norbert J.
2015-01-01
Typical aging is associated with diminished episodic memory performance. To improve our understanding of the fundamental mechanisms underlying this age-related memory deficit, we previously developed an integrated, cross-species approach to link converging evidence from human and animal research. This novel approach focuses on the ability to remember sequences of events, an important feature of episodic memory. Unlike existing paradigms, this task is nonspatial, nonverbal, and can be used to isolate different cognitive processes that may be differentially affected in aging. Here, we used this task to make a comprehensive comparison of sequence memory performance between younger (18–22 yr) and older adults (62–86 yr). Specifically, participants viewed repeated sequences of six colored, fractal images and indicated whether each item was presented “in sequence” or “out of sequence.” Several out of sequence probe trials were used to provide a detailed assessment of sequence memory, including: (i) repeating an item from earlier in the sequence (“Repeats”; e.g., ABADEF), (ii) skipping ahead in the sequence (“Skips”; e.g., ABDDEF), and (iii) inserting an item from a different sequence into the same ordinal position (“Ordinal Transfers”; e.g., AB3DEF). We found that older adults performed as well as younger controls when tested on well-known and predictable sequences, but were severely impaired when tested using novel sequences. Importantly, overall sequence memory performance in older adults steadily declined with age, a decline not detected with other measures (RAVLT or BPS-O). We further characterized this deficit by showing that performance of older adults was severely impaired on specific probe trials that required detailed knowledge of the sequence (Skips and Ordinal Transfers), and was associated with a shift in their underlying mnemonic representation of the sequences. Collectively, these findings provide unambiguous evidence that the capacity to remember sequences of events is fundamentally affected by typical aging. PMID:25691514
Theta oscillations promote temporal sequence learning.
Crivelli-Decker, Jordan; Hsieh, Liang-Tien; Clarke, Alex; Ranganath, Charan
2018-05-17
Many theoretical models suggest that neural oscillations play a role in learning or retrieval of temporal sequences, but the extent to which oscillations support sequence representation remains unclear. To address this question, we used scalp electroencephalography (EEG) to examine oscillatory activity over learning of different object sequences. Participants made semantic decisions on each object as they were presented in a continuous stream. For three "Consistent" sequences, the order of the objects was always fixed. Activity during Consistent sequences was compared to "Random" sequences that consisted of the same objects presented in a different order on each repetition. Over the course of learning, participants made faster semantic decisions to objects in Consistent, as compared to objects in Random sequences. Thus, participants were able to use sequence knowledge to predict upcoming items in Consistent sequences. EEG analyses revealed decreased oscillatory power in the theta (4-7 Hz) band at frontal sites following decisions about objects in Consistent sequences, as compared with objects in Random sequences. The theta power difference between Consistent and Random only emerged in the second half of the task, as participants were more effectively able to predict items in Consistent sequences. Moreover, we found increases in parieto-occipital alpha (10-13 Hz) and beta (14-28 Hz) power during the pre-response period for objects in Consistent sequences, relative to objects in Random sequences. Linear mixed effects modeling revealed that single trial theta oscillations were related to reaction time for future objects in a sequence, whereas beta and alpha oscillations were only predictive of reaction time on the current trial. These results indicate that theta and alpha/beta activity preferentially relate to future and current events, respectively. More generally our findings highlight the importance of band-specific neural oscillations in the learning of temporal order information. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Mariappan, Yogesh K.; Dzyubak, Bogdan; Glaser, Kevin J.; Venkatesh, Sudhakar K.; Sirlin, Claude B.; Hooker, Jonathan; McGee, Kiaran P.
2017-01-01
Purpose To (a) evaluate modified spin-echo (SE) magnetic resonance (MR) elastographic sequences for acquiring MR images with improved signal-to-noise ratio (SNR) in patients in whom the standard gradient-echo (GRE) MR elastographic sequence yields low hepatic signal intensity and (b) compare the stiffness values obtained with these sequences with those obtained with the conventional GRE sequence. Materials and Methods This HIPAA-compliant retrospective study was approved by the institutional review board; the requirement to obtain informed consent was waived. Data obtained with modified SE and SE echo-planar imaging (EPI) MR elastographic pulse sequences with short echo times were compared with those obtained with the conventional GRE MR elastographic sequence in two patient cohorts, one that exhibited adequate liver signal intensity and one that exhibited low liver signal intensity. Shear stiffness values obtained with the three sequences in 130 patients with successful GRE-based examinations were retrospectively tested for statistical equivalence by using a 5% margin. In 47 patients in whom GRE examinations were considered to have failed because of low SNR, the SNR and confidence level with the SE-based sequences were compared with those with the GRE sequence. Results The results of this study helped confirm the equivalence of SE MR elastography and SE-EPI MR elastography to GRE MR elastography (P = .0212 and P = .0001, respectively). The SE and SE-EPI MR elastographic sequences provided substantially improved SNR and stiffness inversion confidence level in 47 patients in whom GRE MR elastography had failed. Conclusion Modified SE-based MR elastographic sequences provide higher SNR MR elastographic data and reliable stiffness measurements; thus, they enable quantification of stiffness in patients in whom the conventional GRE MR elastographic sequence failed owing to low signal intensity. The equivalence of the three sequences indicates that the current diagnostic thresholds are applicable to SE MR elastographic sequences for assessing liver fibrosis. © RSNA, 2016 PMID:27509543
(Pea)nuts and bolts of visual narrative: Structure and meaning in sequential image comprehension
Cohn, Neil; Paczynski, Martin; Jackendoff, Ray; Holcomb, Phillip J.; Kuperberg, Gina R.
2012-01-01
Just as syntax differentiates coherent sentences from scrambled word strings, the comprehension of sequential images must also use a cognitive system to distinguish coherent narrative sequences from random strings of images. We conducted experiments analogous to two classic studies of language processing to examine the contributions of narrative structure and semantic relatedness to processing sequential images. We compared four types of comic strips: 1) Normal sequences with both structure and meaning, 2) Semantic Only sequences (in which the panels were related to a common semantic theme, but had no narrative structure), 3) Structural Only sequences (narrative structure but no semantic relatedness), and 4) Scrambled sequences of randomly-ordered panels. In Experiment 1, participants monitored for target panels in sequences presented panel-by-panel. Reaction times were slowest to panels in Scrambled sequences, intermediate in both Structural Only and Semantic Only sequences, and fastest in Normal sequences. This suggests that both semantic relatedness and narrative structure offer advantages to processing. Experiment 2 measured ERPs to all panels across the whole sequence. The N300/N400 was largest to panels in both the Scrambled and Structural Only sequences, intermediate in Semantic Only sequences and smallest in the Normal sequences. This implies that a combination of narrative structure and semantic relatedness can facilitate semantic processing of upcoming panels (as reflected by the N300/N400). Also, panels in the Scrambled sequences evoked a larger left-lateralized anterior negativity than panels in the Structural Only sequences. This localized effect was distinct from the N300/N400, and appeared despite the fact that these two sequence types were matched on local semantic relatedness between individual panels. These findings suggest that sequential image comprehension uses a narrative structure that may be independent of semantic relatedness. Altogether, we argue that the comprehension of visual narrative is guided by an interaction between structure and meaning. PMID:22387723
A robust and cost-effective approach to sequence and analyze complete genomes of small RNA viruses
USDA-ARS?s Scientific Manuscript database
Background: Next-generation sequencing (NGS) allows ultra-deep sequencing of nucleic acids. The use of sequence-independent amplification of viral nucleic acids without utilization of target-specific primers provides advantages over traditional sequencing methods and allows detection of unsuspected ...
Genome Wide Characterization of Simple Sequence Repeats in Cucumber
USDA-ARS?s Scientific Manuscript database
The whole genome sequence of the cucumber cultivar Gy14 was recently sequenced at 15× coverage with the Roche 454 Titanium technology. The microsatellite DNA sequences (simple sequence repeats, SSRs) in the assembled scaffolds were computationally explored and characterized. A total of 112,073 SSRs ...
Jenkins, David
2018-01-10
David Jenkins on "Ion Torrent semiconductor sequencing allows rapid, low-cost sequencing of the human exome" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jenkins, David
David Jenkins on "Ion Torrent semiconductor sequencing allows rapid, low-cost sequencing of the human exome" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Genome sequence of Phytophthora ramorum: implications for management
Brett Tyler; Sucheta Tripathy; Nik Grunwald; Kurt Lamour; Kelly Ivors; Matteo Garbelotto; Daniel Rokhsar; Nik Putnam; Igor Grigoriev; Jeffrey Boore
2006-01-01
A draft genome sequence has been determined for Phytophthora ramorum, together with a draft sequence of the soybean pathogen Phytophthora sojae. The P. ramorum genome was sequenced to a depth of 7-fold coverage, while the P. sojae genome was sequenced to a depth of 9-fold coverage. The genome...
Teaching Task Sequencing via Verbal Mediation.
ERIC Educational Resources Information Center
Rusch, Frank R.; And Others
1987-01-01
Verbal sequence training was used to teach a moderately mentally retarded woman to sequence job-related tasks. Learning to say the tasks in the proper sequence resulted in the employee performing her tasks in that sequence, and the employee was capable of mediating her own work behavior when scheduled changes occurred. (Author/JDD)
Sequencing Adventure Activities: A New Perspective.
ERIC Educational Resources Information Center
Bisson, Christian
Sequencing in adventure education involves putting activities in an order appropriate to the needs of the group. Contrary to the common assumption that each adventure sequence is unique, a review of literature concerning five sequencing models reveals a certain universality. These models present sequences that move through four phases: group…
USDA-ARS?s Scientific Manuscript database
The advancement of next-generation sequencing technologies in conjunction with new bioinformatics tools enabled fine-tuning of sequence-based high resolution mapping strategies for complex genomes. Although genotyping-by-sequencing (GBS) provides a large number of markers, its application for assoc...
Federal Register 2010, 2011, 2012, 2013, 2014
2012-10-29
... DEPARTMENT OF COMMERCE Patent and Trademark Office Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request... Patent applications that contain nucleotide and/or amino acid sequence disclosures must include a copy of...
A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA
USDA-ARS?s Scientific Manuscript database
A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and sequence data were assembled. A large scale identification of simple sequence repeat (SSR) was performed and SSR sequence data were used for the develo...
USDA-ARS?s Scientific Manuscript database
Next-generation sequencing technologies were used to rapidly and efficiently sequence the genome of the domestic turkey (Meleagris gallopavo). The current genome assembly (~1.1 Gb) includes 917 Mb of sequence assigned to chromosomes. Innate heterozygosity of the sequenced bird allowed discovery of...
Bellerophon: A program to detect chimeric sequences in multiple sequence alignments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip
2003-12-23
Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments.
A Code Division Multiple Access Communication System for the Low Frequency Band.
1983-04-01
frequency channels spread-spectrum communication / complex sequences, orthogonal codes impulsive noise 20. ABSTRACT (Continue an reverse side It...their transmissions with signature sequences. Our LF/CDMA scheme is different in that each user’s signature sequence set consists of M orthogonal ...signature sequences. Our LF/CDMA scheme is different in that each user’s signature sequence set consists of M orthogonal sequences and thus log 2 M
A vision for ubiquitous sequencing
Erlich, Yaniv
2015-01-01
Genomics has recently celebrated reaching the $1000 genome milestone, making affordable DNA sequencing a reality. With this goal successfully completed, the next goal of the sequencing revolution can be sequencing sensors—miniaturized sequencing devices that are manufactured for real-time applications and deployed in large quantities at low costs. The first part of this manuscript envisions applications that will benefit from moving the sequencers to the samples in a range of domains. In the second part, the manuscript outlines the critical barriers that need to be addressed in order to reach the goal of ubiquitous sequencing sensors. PMID:26430149
Data compression of discrete sequence: A tree based approach using dynamic programming
NASA Technical Reports Server (NTRS)
Shivaram, Gurusrasad; Seetharaman, Guna; Rao, T. R. N.
1994-01-01
A dynamic programming based approach for data compression of a ID sequence is presented. The compression of an input sequence of size N to that of a smaller size k is achieved by dividing the input sequence into k subsequences and replacing the subsequences by their respective average values. The partitioning of the input sequence is carried with the intention of reducing the mean squared error in the reconstructed sequence. The complexity involved in finding the partitions which would result in such an optimal compressed sequence is reduced by using the dynamic programming approach, which is presented.
Whole-genome sequencing for comparative genomics and de novo genome assembly.
Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C
2015-01-01
Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).
Library construction for next-generation sequencing: Overviews and challenges
Head, Steven R.; Komori, H. Kiyomi; LaMere, Sarah A.; Whisenant, Thomas; Van Nieuwerburgh, Filip; Salomon, Daniel R.; Ordoukhanian, Phillip
2014-01-01
High-throughput sequencing, also known as next-generation sequencing (NGS), has revolutionized genomic research. In recent years, NGS technology has steadily improved, with costs dropping and the number and range of sequencing applications increasing exponentially. Here, we examine the critical role of sequencing library quality and consider important challenges when preparing NGS libraries from DNA and RNA sources. Factors such as the quantity and physical characteristics of the RNA or DNA source material as well as the desired application (i.e., genome sequencing, targeted sequencing, RNA-seq, ChIP-seq, RIP-seq, and methylation) are addressed in the context of preparing high quality sequencing libraries. In addition, the current methods for preparing NGS libraries from single cells are also discussed. PMID:24502796
PCR Amplification Strategies towards full-length HIV-1 Genome sequencing.
Liu, Chao Chun; Ji, Hezhao
2018-06-26
The advent of next generation sequencing has enabled greater resolution of viral diversity and improved feasibility of full viral genome sequencing allowing routine HIV-1 full genome sequencing in both research and diagnostic settings. Regardless of the sequencing platform selected, successful PCR amplification of the HIV-1 genome is essential for sequencing template preparation. As such, full HIV-1 genome amplification is a crucial step in dictating the successful and reliable sequencing downstream. Here we reviewed existing PCR protocols leading to HIV-1 full genome sequencing. In addition to the discussion on basic considerations on relevant PCR design, the advantages as well as the pitfalls of published protocols were reviewed. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.
Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab
2012-01-01
RDNAnalyzer is an innovative computer based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate the DNA sequence or user can upload the sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various application for the sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides the tools for routinely performed sequence analysis by the biological scientists such as DNA replication, reverse compliment generation, transcription, translation, sequence specific information as total number of nucleotide bases, ATGC base contents along with their respective percentages and sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides user friendly environment for sequence analysis. It is freely available. http://www.cemb.edu.pk/sw.html RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.
Long sequence correlation coprocessor
NASA Astrophysics Data System (ADS)
Gage, Douglas W.
1994-09-01
A long sequence correlation coprocessor (LSCC) accelerates the bitwise correlation of arbitrarily long digital sequences by calculating in parallel the correlation score for 16, for example, adjacent bit alignments between two binary sequences. The LSCC integrated circuit is incorporated into a computer system with memory storage buffers and a separate general purpose computer processor which serves as its controller. Each of the LSCC's set of sequential counters simultaneously tallies a separate correlation coefficient. During each LSCC clock cycle, computer enable logic associated with each counter compares one bit of a first sequence with one bit of a second sequence to increment the counter if the bits are the same. A shift register assures that the same bit of the first sequence is simultaneously compared to different bits of the second sequence to simultaneously calculate the correlation coefficient by the different counters to represent different alignments of the two sequences.
It’s More Than Stamp Collecting: How Genome Sequencing Can Unify Biological Research
Richards, Stephen
2015-01-01
The availability of reference genome sequences, especially the human reference, has revolutionized the study of biology. However, whilst the genomes of some species have been fully sequenced, a wide range of biological problems still cannot be effectively studied for lack of genome sequence information. Here, I identify neglected areas of biology and describe how both targeted species sequencing and more broad taxonomic surveys of the tree of life can address important biological questions. I enumerate the significant benefits that would accrue from sequencing a broader range of taxa, as well as discuss the technical advances in sequencing and assembly methods that would allow for wide-ranging application of whole-genome analysis. Finally, I suggest that in addition to “Big Science” survey initiatives to sequence the tree of life, a modified infrastructure-funding paradigm would better support reference genome sequence generation for research communities most in need. PMID:26003218
It's more than stamp collecting: how genome sequencing can unify biological research.
Richards, Stephen
2015-07-01
The availability of reference genome sequences, especially the human reference, has revolutionized the study of biology. However, while the genomes of some species have been fully sequenced, a wide range of biological problems still cannot be effectively studied for lack of genome sequence information. Here, I identify neglected areas of biology and describe how both targeted species sequencing and more broad taxonomic surveys of the tree of life can address important biological questions. I enumerate the significant benefits that would accrue from sequencing a broader range of taxa, as well as discuss the technical advances in sequencing and assembly methods that would allow for wide-ranging application of whole-genome analysis. Finally, I suggest that in addition to 'big science' survey initiatives to sequence the tree of life, a modified infrastructure-funding paradigm would better support reference genome sequence generation for research communities most in need. Copyright © 2015 Elsevier Ltd. All rights reserved.
Sequence information signal processor
Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.
1999-01-01
An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi
2016-03-02
Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not understood completely. Evaluation of genomic-level epigenetic changes by Tet protein requires unbiased identification of the highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6×10(4) -member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence of Tet1 protein. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Rapid Threat Organism Recognition Pipeline
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Kelly P.; Solberg, Owen D.; Schoeniger, Joseph S.
2013-05-07
The RAPTOR computational pipeline identifies microbial nucleic acid sequences present in sequence data from clinical samples. It takes as input raw short-read genomic sequence data (in particular, the type generated by the Illumina sequencing platforms) and outputs taxonomic evaluation of detected microbes in various human-readable formats. This software was designed to assist in the diagnosis or characterization of infectious disease, by detecting pathogen sequences in nucleic acid sequence data from clinical samples. It has also been applied in the detection of algal pathogens, when algal biofuel ponds became unproductive. RAPTOR first trims and filters genomic sequence reads based on qualitymore » and related considerations, then performs a quick alignment to the human (or other host) genome to filter out host sequences, then performs a deeper search against microbial genomes. Alignment to a protein sequence database is optional. Alignment results are summarized and placed in a taxonomic framework using the Lowest Common Ancestor algorithm.« less
Next-Generation Sequencing Platforms
NASA Astrophysics Data System (ADS)
Mardis, Elaine R.
2013-06-01
Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.
Verwey, Willem B
2015-05-01
Research has provided many indications that highly practiced 6-key sequences are carried out in a chunking mode in which key-specific stimuli past the first are largely ignored. When in such sequences a deviating stimulus occasionally occurs at an unpredictable location, participants fall back to responding to individual stimuli (Verwey & Abrahamse, 2012). The observation that in such a situation execution still benefits from prior practice has been attributed to the possibility to operate in an associative mode. To better understand the contribution to the execution of keying sequences of motor chunks, associative sequence knowledge and also of explicit sequence knowledge, the present study tested three alternative accounts for the earlier finding of an execution rate increase at the end of 6-key sequences performed in the associative mode. The results provide evidence that the earlier observed execution rate increase can be attributed to the use of explicit sequence knowledge. In the present experiment this benefit was limited to sequences that are executed at the moderately fast rates of the associative mode, and occurred at both the earlier and final elements of the sequences. Copyright © 2015 Elsevier B.V. All rights reserved.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.
This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted.more » PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.« less
Pease, Anthony; Sullivan, Stacey; Olby, Natasha; Galano, Heather; Cerda-Gonzalez, Sophia; Robertson, Ian D; Gavin, Patrick; Thrall, Donald
2006-01-01
Three case history reports are presented to illustrate the value of the single-shot turbo spin-echo pulse sequence for assessment of the subarachnoid space. The use of the single-shot turbo spin-echo pulse sequence, which is a heavily T2-weighted sequence, allows for a rapid, noninvasive evaluation of the subarachnoid space by using the high signal from cerebrospinal fluid. This sequence can be completed in seconds rather than the several minutes required for a T2-fast spin-echo sequence. Unlike the standard T2-fast spin-echo sequence, a single-shot turbo spin-echo pulse sequence also provides qualitative information about the protein and the cellular content of the cerebrospinal fluid, such as in patients with inflammatory debris or hemorrhage in the cerebrospinal fluid. Although the resolution of the single-shot turbo spin-echo pulse sequence images is relatively poor compared with more conventional sequences, the qualitative information about the subarachnoid space and cerebrospinal fluid and the rapid acquisition time, make it a useful sequence to include in standard protocols of spinal magnetic resonance imaging.
Protein sequence annotation in the genome era: the annotation concept of SWISS-PROT+TREMBL.
Apweiler, R; Gateau, A; Contrino, S; Martin, M J; Junker, V; O'Donovan, C; Lang, F; Mitaritonna, N; Kappus, S; Bairoch, A
1997-01-01
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation, a minimal level of redundancy and high level of integration with other databases. Ongoing genome sequencing projects have dramatically increased the number of protein sequences to be incorporated into SWISS-PROT. Since we do not want to dilute the quality standards of SWISS-PROT by incorporating sequences without proper sequence analysis and annotation, we cannot speed up the incorporation of new incoming data indefinitely. However, as we also want to make the sequences available as fast as possible, we introduced TREMBL (TRanslation of EMBL nucleotide sequence database), a supplement to SWISS-PROT. TREMBL consists of computer-annotated entries in SWISS-PROT format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except for CDS already included in SWISS-PROT. While TREMBL is already of immense value, its computer-generated annotation does not match the quality of SWISS-PROTs. The main difference is in the protein functional information attached to sequences. With this in mind, we are dedicating substantial effort to develop and apply computer methods to enhance the functional information attached to TREMBL entries.
rpoB Gene Sequencing for Identification of Corynebacterium Species
Khamis, Atieh; Raoult, Didier; La Scola, Bernard
2004-01-01
The genus Corynebacterium is a heterogeneous group of species comprising human and animal pathogens and environmental bacteria. It is defined on the basis of several phenotypic characters and the results of DNA-DNA relatedness and, more recently, 16S rRNA gene sequencing. However, the 16S rRNA gene is not polymorphic enough to ensure reliable phylogenetic studies and needs to be completely sequenced for accurate identification. The almost complete rpoB sequences of 56 Corynebacterium species were determined by both PCR and genome walking methods. In all cases the percent similarities between different species were lower than those observed by 16S rRNA gene sequencing, even for those species with degrees of high similarity. Several clusters supported by high bootstrap values were identified. In order to propose a method for strain identification which does not require sequencing of the complete rpoB sequence (approximately 3,500 bp), we identified an area with a high degree of polymorphism, bordered by conserved sequences that can be used as universal primers for PCR amplification and sequencing. The sequence of this fragment (434 to 452 bp) allows accurate species identification and may be used in the future for routine sequence-based identification of Corynebacterium species. PMID:15364970
Fernández-Caballero Rico, Jose Ángel; Chueca Porcuna, Natalia; Álvarez Estévez, Marta; Mosquera Gutiérrez, María Del Mar; Marcos Maeso, María Ángeles; García, Federico
2018-02-01
To show how to generate a consensus sequence from the information of massive parallel sequences data obtained from routine HIV anti-retroviral resistance studies, and that may be suitable for molecular epidemiology studies. Paired Sanger (Trugene-Siemens) and next-generation sequencing (NGS) (454 GSJunior-Roche) HIV RT and protease sequences from 62 patients were studied. NGS consensus sequences were generated using Mesquite, using 10%, 15%, and 20% thresholds. Molecular evolutionary genetics analysis (MEGA) was used for phylogenetic studies. At a 10% threshold, NGS-Sanger sequences from 17/62 patients were phylogenetically related, with a median bootstrap-value of 88% (IQR83.5-95.5). Association increased to 36/62 sequences, median bootstrap 94% (IQR85.5-98)], using a 15% threshold. Maximum association was at the 20% threshold, with 61/62 sequences associated, and a median bootstrap value of 99% (IQR98-100). A safe method is presented to generate consensus sequences from HIV-NGS data at 20% threshold, which will prove useful for molecular epidemiological studies. Copyright © 2016 Elsevier España, S.L.U. and Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
Method for isolating chromosomal DNA in preparation for hybridization in suspension
Lucas, Joe N.
2000-01-01
A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. Chromosomal DNA in a sample containing cell debris is prepared for hybridization in suspension by treating the mixture with RNase. The treated DNA can also be fixed prior to hybridization.
Megabase sequencing of human genome by ordered-shotgun-sequencing (OSS) strategy
NASA Astrophysics Data System (ADS)
Chen, Ellson Y.
1997-05-01
So far we have used OSS strategy to sequence over 2 megabases DNA in large-insert clones from regions of human X chromosomes with different characteristic levels of GC content. The method starts by randomly fragmenting a BAC, YAC or PAC to 8-12 kb pieces and subcloning those into lambda phage. Insert-ends of these clones are sequenced and overlapped to create a partial map. Complete sequencing is then done on a minimal tiling path of selected subclones, recursively focusing on those at the edges of contigs to facilitate mergers of clones across the entire target. To reduce manual labor, PCR processes have been adapted to prepare sequencing templates throughout the entire operation. The streamlined process can thus lend itself to further automation. The OSS approach is suitable for large- scale genomic sequencing, providing considerable flexibility in the choice of subclones or regions for more or less intensive sequencing. For example, subclones containing contaminating host cell DNA or cloning vector can be recognized and ignored with minimal sequencing effort; regions overlapping a neighboring clone already sequenced need not be redone; and segments containing tandem repeats or long repetitive sequences can be spotted early on and targeted for additional attention.
The future scalability of pH-based genome sequencers: A theoretical perspective
NASA Astrophysics Data System (ADS)
Go, Jonghyun; Alam, Muhammad A.
2013-10-01
Sequencing of human genome is an essential prerequisite for personalized medicine and early prognosis of various genetic diseases. The state-of-art, high-throughput genome sequencing technologies provide improved sequencing; however, their reliance on relatively expensive optical detection schemes has prevented wide-spread adoption of the technology in routine care. In contrast, the recently announced pH-based electronic genome sequencers achieve fast sequencing at low cost because of the compatibility with the current microelectronics technology. While the progress in technology development has been rapid, the physics of the sequencing chips and the potential for future scaling (and therefore, cost reduction) remain unexplored. In this article, we develop a theoretical framework and a scaling theory to explain the principle of operation of the pH-based sequencing chips and use the framework to explore various perceived scaling limits of the technology related to signal to noise ratio, well-to-well crosstalk, and sequencing accuracy. We also address several limitations inherent to the key steps of pH-based genome sequencers, which are widely shared by many other sequencing platforms in the market but remained unexplained properly so far.
Pulse sequence programming in a dynamic visual environment: SequenceTree.
Magland, Jeremy F; Li, Cheng; Langham, Michael C; Wehrli, Felix W
2016-01-01
To describe SequenceTree, an open source, integrated software environment for implementing MRI pulse sequences and, ideally, exporting them to actual MRI scanners. The software is a user-friendly alternative to vendor-supplied pulse sequence design and editing tools and is suited for programmers and nonprogrammers alike. The integrated user interface was programmed using the Qt4/C++ toolkit. As parameters and code are modified, the pulse sequence diagram is automatically updated within the user interface. Several aspects of pulse programming are handled automatically, allowing users to focus on higher-level aspects of sequence design. Sequences can be simulated using a built-in Bloch equation solver and then exported for use on a Siemens MRI scanner. Ideally, other types of scanners will be supported in the future. SequenceTree has been used for 8 years in our laboratory and elsewhere and has contributed to more than 50 peer-reviewed publications in areas such as cardiovascular imaging, solid state and nonproton NMR, MR elastography, and high-resolution structural imaging. SequenceTree is an innovative, open source, visual pulse sequence environment for MRI combining simplicity with flexibility and is ideal both for advanced users and users with limited programming experience. © 2015 Wiley Periodicals, Inc.
Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M G; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes
2015-08-19
Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material.
Modeling genome coverage in single-cell sequencing
Daley, Timothy; Smith, Andrew D.
2014-01-01
Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. But because they begin with such small amounts of starting material, the amount of information that is obtained from single-cell sequencing experiment is highly sensitive to the choice of protocol employed and variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.; ...
2017-07-18
This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted.more » PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.« less
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Richard A.; Brown, Steven D.
2017-01-01
This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences. PMID:28769883
Beltman, Joost B; Urbanus, Jos; Velds, Arno; van Rooij, Nienke; Rohr, Jan C; Naik, Shalin H; Schumacher, Ton N
2016-04-02
Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations that can both be used to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue, for instance, applies to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells, by supplying these with unique heritable DNA tags. Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences. Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets.
Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo
2003-01-01
To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979
Levels of integration in cognitive control and sequence processing in the prefrontal cortex.
Bahlmann, Jörg; Korb, Franziska M; Gratton, Caterina; Friederici, Angela D
2012-01-01
Cognitive control is necessary to flexibly act in changing environments. Sequence processing is needed in language comprehension to build the syntactic structure in sentences. Functional imaging studies suggest that sequence processing engages the left ventrolateral prefrontal cortex (PFC). In contrast, cognitive control processes additionally recruit bilateral rostral lateral PFC regions. The present study aimed to investigate these two types of processes in one experimental paradigm. Sequence processing was manipulated using two different sequencing rules varying in complexity. Cognitive control was varied with different cue-sets that determined the choice of a sequencing rule. Univariate analyses revealed distinct PFC regions for the two types of processing (i.e. sequence processing: left ventrolateral PFC and cognitive control processing: bilateral dorsolateral and rostral PFC). Moreover, in a common brain network (including left lateral PFC and intraparietal sulcus) no interaction between sequence and cognitive control processing was observed. In contrast, a multivariate pattern analysis revealed an interaction of sequence and cognitive control processing, such that voxels in left lateral PFC and parietal cortex showed different tuning functions for tasks involving different sequencing and cognitive control demands. These results suggest that the difference between the process of rule selection (i.e. cognitive control) and the process of rule-based sequencing (i.e. sequence processing) find their neuronal underpinnings in distinct activation patterns in lateral PFC. Moreover, the combination of rule selection and rule sequencing can shape the response of neurons in lateral PFC and parietal cortex.
Non-Cartesian Balanced SSFP Pulse Sequences for Real-Time Cardiac MRI
Feng, Xue; Salerno, Michael; Kramer, Christopher M.; Meyer, Craig H.
2015-01-01
Purpose To develop a new spiral-in/out balanced steady-state free precession (bSSFP) pulse sequence for real-time cardiac MRI and compare it with radial and spiral-out techniques. Methods Non-Cartesian sampling strategies are efficient and robust to motion and thus have important advantages for real-time bSSFP cine imaging. This study describes a new symmetric spiral-in/out sequence with intrinsic gradient moment compensation and SSFP refocusing at TE=TR/2. In-vivo real-time cardiac imaging studies were performed to compare radial, spiral-out, and spiral-in/out bSSFP pulse sequences. Furthermore, phase-based fat-water separation taking advantage of the refocusing mechanism of the spiral-in/out bSSFP sequence was also studied. Results The image quality of the spiral-out and spiral-in/out bSSFP sequences was improved with off-resonance and k-space trajectory correction. The spiral-in/out bSSFP sequence had the highest SNR, CNR, and image quality ratings, with spiral-out bSSFP sequence second in each category and the radial bSSFP sequence third. The spiral-in/out bSSFP sequence provides separated fat and water images with no additional scan time. Conclusions In this work a new spiral-in/out bSSFP sequence was developed and tested. The superiority of spiral bSSFP sequences over the radial bSSFP sequence in terms of SNR and reduced artifacts was demonstrated in real-time MRI of cardiac function without image acceleration. PMID:25960254
Levels of Integration in Cognitive Control and Sequence Processing in the Prefrontal Cortex
Bahlmann, Jörg; Korb, Franziska M.; Gratton, Caterina; Friederici, Angela D.
2012-01-01
Cognitive control is necessary to flexibly act in changing environments. Sequence processing is needed in language comprehension to build the syntactic structure in sentences. Functional imaging studies suggest that sequence processing engages the left ventrolateral prefrontal cortex (PFC). In contrast, cognitive control processes additionally recruit bilateral rostral lateral PFC regions. The present study aimed to investigate these two types of processes in one experimental paradigm. Sequence processing was manipulated using two different sequencing rules varying in complexity. Cognitive control was varied with different cue-sets that determined the choice of a sequencing rule. Univariate analyses revealed distinct PFC regions for the two types of processing (i.e. sequence processing: left ventrolateral PFC and cognitive control processing: bilateral dorsolateral and rostral PFC). Moreover, in a common brain network (including left lateral PFC and intraparietal sulcus) no interaction between sequence and cognitive control processing was observed. In contrast, a multivariate pattern analysis revealed an interaction of sequence and cognitive control processing, such that voxels in left lateral PFC and parietal cortex showed different tuning functions for tasks involving different sequencing and cognitive control demands. These results suggest that the difference between the process of rule selection (i.e. cognitive control) and the process of rule-based sequencing (i.e. sequence processing) find their neuronal underpinnings in distinct activation patterns in lateral PFC. Moreover, the combination of rule selection and rule sequencing can shape the response of neurons in lateral PFC and parietal cortex. PMID:22952762
Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation
Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.
2013-01-01
The power of genome sequencing depends on the ability to understand what those genes and their proteins products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392
Improving performance of DS-CDMA systems using chaotic complex Bernoulli spreading codes
NASA Astrophysics Data System (ADS)
Farzan Sabahi, Mohammad; Dehghanfard, Ali
2014-12-01
The most important goal of spreading spectrum communication system is to protect communication signals against interference and exploitation of information by unintended listeners. In fact, low probability of detection and low probability of intercept are two important parameters to increase the performance of the system. In Direct Sequence Code Division Multiple Access (DS-CDMA) systems, these properties are achieved by multiplying the data information in spreading sequences. Chaotic sequences, with their particular properties, have numerous applications in constructing spreading codes. Using one-dimensional Bernoulli chaotic sequence as spreading code is proposed in literature previously. The main feature of this sequence is its negative auto-correlation at lag of 1, which with proper design, leads to increase in efficiency of the communication system based on these codes. On the other hand, employing the complex chaotic sequences as spreading sequence also has been discussed in several papers. In this paper, use of two-dimensional Bernoulli chaotic sequences is proposed as spreading codes. The performance of a multi-user synchronous and asynchronous DS-CDMA system will be evaluated by applying these sequences under Additive White Gaussian Noise (AWGN) and fading channel. Simulation results indicate improvement of the performance in comparison with conventional spreading codes like Gold codes as well as similar complex chaotic spreading sequences. Similar to one-dimensional Bernoulli chaotic sequences, the proposed sequences also have negative auto-correlation. Besides, construction of complex sequences with lower average cross-correlation is possible with the proposed method.
Rapid and Accurate Sequencing of Enterovirus Genomes Using MinION Nanopore Sequencer.
Wang, Ji; Ke, Yue Hua; Zhang, Yong; Huang, Ke Qiang; Wang, Lei; Shen, Xin Xin; Dong, Xiao Ping; Xu, Wen Bo; Ma, Xue Jun
2017-10-01
Knowledge of an enterovirus genome sequence is very important in epidemiological investigation to identify transmission patterns and ascertain the extent of an outbreak. The MinION sequencer is increasingly used to sequence various viral pathogens in many clinical situations because of its long reads, portability, real-time accessibility of sequenced data, and very low initial costs. However, information is lacking on MinION sequencing of enterovirus genomes. In this proof-of-concept study using Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) strains as examples, we established an amplicon-based whole genome sequencing method using MinION. We explored the accuracy, minimum sequencing time, discrimination and high-throughput sequencing ability of MinION, and compared its performance with Sanger sequencing. Within the first minute (min) of sequencing, the accuracy of MinION was 98.5% for the single EV71 strain and 94.12%-97.33% for 10 genetically-related CA16 strains. In as little as 14 min, 99% identity was reached for the single EV71 strain, and in 17 min (on average), 99% identity was achieved for 10 CA16 strains in a single run. MinION is suitable for whole genome sequencing of enteroviruses with sufficient accuracy and fine discrimination and has the potential as a fast, reliable and convenient method for routine use. Copyright © 2017 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.
Rapid identification of sequences for orphan enzymes to power accurate protein annotation.
Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G
2013-01-01
The power of genome sequencing depends on the ability to understand what those genes and their proteins products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.
Comparison of the quality of different magnetic resonance image sequences of multiple myeloma.
Sun, Zhao-yong; Zhang, Hai-bo; Li, Shuo; Wang, Yun; Xue, Hua-dan; Jin, Zheng-yu
2015-02-01
To compare the image quality of T1WI fat phase,T1WI water phase, short time inversion recovery (STIR) sequence, and diffusion weighted imaging (DWI) sequence in the evaluation of multiple myeloma (MM). Totally 20MM patients were enrolled in this study. All patients underwent scanning at coronal T1WI fat phase, coronal T1WI water phase, coronal STIR sequence, and axial DWI sequence. The image quality of the four different sequences was evaluated. The image was divided into seven sections(head and neck, chest, abdomen, pelvis, thigh, leg, and foot), and the signal-to-noise ratio (SNR) of each section was measured at 7 segments (skull, spine, pelvis, humerus, femur, tibia and fibula and ribs) were measured. In addition, 20 active MM lesions were selected, and the contrast-to-noise ratio (CNR) of each scan sequence was calculated. The average image quality scores of T1WI fat phase,T1WI water phase, STIR sequence, and DWI sequence were 4.19 ± 0.70,4.16 ± 0.73,3.89 ± 0.70, and 3.76 ± 0.68, respectively. The image quality at T1-fat phase and T1-water phase were significantly higher than those at STIR (P=0.000 and P=0.001) and DWI sequence (both P=0.000); however, there was no significant difference between T1-fat and T1-water phase (P=0.723)and between STIR and DWI sequence (P=0.167). The SNR of T1WI fat phase was significantly higher than those of the other three sequences (all P=0.000), and there was no significant difference among the other three sequences (all P>0.05). Although the CNR of DWI sequences was slightly higher than those of the other three sequences,there was no significant difference among all of them (all P>0.05). Imaging at T1WI fat phase,T1WI water phase, STIR sequence, and DWI sequence has certain advantages,and they should be combined in the diagnosis of MM.
Evaluating the protein coding potential of exonized transposable element sequences
Piriyapongsa, Jittima; Rutledge, Mark T; Patel, Sanil; Borodovsky, Mark; Jordan, I King
2007-01-01
Background Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences have turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: i-by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and ii-through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: 1-a profile-based hidden Markov model (HMM) approach, 2-BLAST methods and 3-RepeatMasker. Profile based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double stranded RNA related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggests that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. Reviewers: This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.). PMID:18036258
In silico Analysis of 2085 Clones from a Normalized Rat Vestibular Periphery 3′ cDNA Library
Roche, Joseph P.; Cioffi, Joseph A.; Kwitek, Anne E.; Erbe, Christy B.; Popper, Paul
2005-01-01
The inserts from 2400 cDNA clones isolated from a normalized Rattus norvegicus vestibular periphery cDNA library were sequenced and characterized. The Wackym-Soares vestibular 3′ cDNA library was constructed from the saccular and utricular maculae, the ampullae of all three semicircular canals and Scarpa's ganglia containing the somata of the primary afferent neurons, microdissected from 104 male and female rats. The inserts from 2400 randomly selected clones were sequenced from the 5′ end. Each sequence was analyzed using the BLAST algorithm compared to the Genbank nonredundant, rat genome, mouse genome and human genome databases to search for high homology alignments. Of the initial 2400 clones, 315 (13%) were found to be of poor quality and did not yield useful information, and therefore were eliminated from the analysis. Of the remaining 2085 sequences, 918 (44%) were found to represent 758 unique genes having useful annotations that were identified in databases within the public domain or in the published literature; these sequences were designated as known characterized sequences. 1141 sequences (55%) aligned with 1011 unique sequences had no useful annotations and were designated as known but uncharacterized sequences. Of the remaining 26 sequences (1%), 24 aligned with rat genomic sequences, but none matched previously described rat expressed sequence tags or mRNAs. No significant alignment to the rat or human genomic sequences could be found for the remaining 2 sequences. Of the 2085 sequences analyzed, 86% were singletons. The known, characterized sequences were analyzed with the FatiGO online data-mining tool (http://fatigo.bioinfo.cnio.es/) to identify level 5 biological process gene ontology (GO) terms for each alignment and to group alignments with similar or identical GO terms. Numerous genes were identified that have not been previously shown to be expressed in the vestibular system. Further characterization of the novel cDNA sequences may lead to the identification of genes with vestibular-specific functions. Continued analysis of the rat vestibular periphery transcriptome should provide new insights into vestibular function and generate new hypotheses. Physiological studies are necessary to further elucidate the roles of the identified genes and novel sequences in vestibular function. PMID:16103642
2012-01-01
Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods. Conclusions Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems. PMID:22643026
Staňková, Helena; Hastie, Alex R; Chan, Saki; Vrána, Jan; Tulpová, Zuzana; Kubaláková, Marie; Visendi, Paul; Hayashi, Satomi; Luo, Mingcheng; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana
2016-07-01
The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Henninger, B; Zoller, H; Rauch, S; Schocke, M; Kannengiesser, S; Zhong, X; Reiter, G; Jaschke, W; Kremser, C
2015-05-01
To evaluate the automated two-point Dixon screening sequence for the detection and estimated quantification of hepatic iron and fat compared with standard sequences as a reference. One hundred and two patients with suspected diffuse liver disease were included in this prospective study. The following MRI protocol was used: 3D-T1-weighted opposed- and in-phase gradient echo with two-point Dixon reconstruction and dual-ratio signal discrimination algorithm ("screening" sequence); fat-saturated, multi-gradient-echo sequence with 12 echoes; gradient-echo T1 FLASH opposed- and in-phase. Bland-Altman plots were generated and correlation coefficients were calculated to compare the sequences. The screening sequence diagnosed fat in 33, iron in 35 and a combination of both in 4 patients. Correlation between R2* values of the screening sequence and the standard relaxometry was excellent (r = 0.988). A slightly lower correlation (r = 0.978) was found between the fat fraction of the screening sequence and the standard sequence. Bland-Altman revealed systematically lower R2* values obtained from the screening sequence and higher fat fraction values obtained with the standard sequence with a rather high variability in agreement. The screening sequence is a promising method with fast diagnosis of the predominant liver disease. It is capable of estimating the amount of hepatic fat and iron comparable to standard methods. • MRI plays a major role in the clarification of diffuse liver disease. • The screening sequence was introduced for the assessment of diffuse liver disease. • It is a fast and automated algorithm for the evaluation of hepatic iron and fat. • It is capable of estimating the amount of hepatic fat and iron.
Probabilistic topic modeling for the analysis and classification of genomic sequences
2015-01-01
Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies are focusing on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques are gaining more importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequences clustering and classification is proposed. The method is based on k-mers representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied on DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and Support Vector Machine (SVM) classification algorithm in a extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased. PMID:25916734
Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas
2009-06-01
The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene product encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.
How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.
Tian, Pengfei; Best, Robert B
2017-10-17
Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
Zienius, D; Lelešius, R; Kavaliauskis, H; Stankevičius, A; Šalomskas, A
2016-01-01
The aim of the present study was to detect canine parvovirus (CPV) from faecal samples of clinically ill domestic dogs by polymerase chain reaction (PCR) followed by VP2 gene partial sequencing and molecular characterization of circulating strains in Lithuania. Eleven clinically and antigen-tested positive dog faecal samples, collected during the period of 2014-2015, were investigated by using PCR. The phylogenetic investigations indicated that the Lithuanian CPV VP2 partial sequences (3025-3706 cds) were closely related and showed 99.0-99.9% identity. All Lithuanian sequences were associated with one phylogroup, but grouped in different clusters. Ten of investigated Lithuanian CPV VP2 sequences were closely associated with CPV 2a antigenic variant (99.4% nt identity). Five CPV VP2 sequences from Lithuania were related to CPV-2a, but were rather divergent (6.8 nt differences). Only one CPV VP2 sequence from Lithuania was associated (99.3% nt identity) with CPV-2b VP2 sequences from France, Italy, USA and Korea. The four of eleven investigated Lithuanian dogs with CPV infection symptoms were vaccinated with CPV-2 vaccine, but their VP2 sequences were phylogenetically distantly associated with CPV vaccine strains VP2 sequences (11.5-15.8 nt differences). Ten Lithuanian CPV VP2 sequences had monophyletic relations among the close geographically associated samples, but five of them were rather divergent (1.0% less sequence similarity). The one Lithuanian CPV VP2 sequence was closely related with CPV-2b antigenic variant. All the Lithuanian CPV VP2 partial sequences were conservative and phylogenetically low associated with most commonly used CPV vaccine strains.
Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang
2018-02-01
Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, the prediction of the incremental large-scale and diversity of sequences has heavily relied on the involvement of machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach, considering the rich generation of multiple sequence alignment algorithms, N-gram probabilistic language model and the deep learning technique. The essence behind the proposed method is that if each group of sequences can be represented by one feature sequence, composed of homologous sites, there should be less loss when the sequence is rebuilt, when a more relevant sequence is added to the group. On the basis of this consideration, the prediction becomes whether a query sequence belonging to a group of sequences can be transferred to calculate the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCRs sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are imported into a convolutional neural network to make a prediction. The experimental results elucidate that the proposed method provides significant performance improvements. The classification error rate of the proposed method is reduced by at least 4.67% (family level I) and 5.75% (family Level II), in comparison with the current state-of-the-art methods. The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN .
Yohda, Masafumi; Yagi, Osami; Takechi, Ayane; Kitajima, Mizuki; Matsuda, Hisashi; Miyamura, Naoaki; Aizawa, Tomoko; Nakajima, Mutsuyasu; Sunairi, Michio; Daiba, Akito; Miyajima, Takashi; Teruya, Morimi; Teruya, Kuniko; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Juan, Ayaka; Nakano, Kazuma; Aoyama, Misako; Terabayashi, Yasunobu; Satou, Kazuhito; Hirano, Takashi
2015-07-01
A Dehalococcoides-containing bacterial consortium that performed dechlorination of 0.20 mM cis-1,2-dichloroethene to ethene in 14 days was obtained from the sediment mud of the lotus field. To obtain detailed information of the consortium, the metagenome was analyzed using the short-read next-generation sequencer SOLiD 3. Matching the obtained sequence tags with the reference genome sequences indicated that the Dehalococcoides sp. in the consortium was highly homologous to Dehalococcoides mccartyi CBDB1 and BAV1. Sequence comparison with the reference sequence constructed from 16S rRNA gene sequences in a public database showed the presence of Sedimentibacter, Sulfurospirillum, Clostridium, Desulfovibrio, Parabacteroides, Alistipes, Eubacterium, Peptostreptococcus and Proteocatella in addition to Dehalococcoides sp. After further enrichment, the members of the consortium were narrowed down to almost three species. Finally, the full-length circular genome sequence of the Dehalococcoides sp. in the consortium, D. mccartyi IBARAKI, was determined by analyzing the metagenome with the single-molecule DNA sequencer PacBio RS. The accuracy of the sequence was confirmed by matching it to the tag sequences obtained by SOLiD 3. The genome is 1,451,062 nt and the number of CDS is 1566, which includes 3 rRNA genes and 47 tRNA genes. There exist twenty-eight RDase genes that are accompanied by the genes for anchor proteins. The genome exhibits significant sequence identity with other Dehalococcoides spp. throughout the genome, but there exists significant difference in the distribution RDase genes. The combination of a short-read next-generation DNA sequencer and a long-read single-molecule DNA sequencer gives detailed information of a bacterial consortium. Copyright © 2014 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
Hazes, Bart
2014-02-28
Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.
PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities.
Troshin, Peter V; Postis, Vincent Lg; Ashworth, Denise; Baldwin, Stephen A; McPherson, Michael J; Barton, Geoffrey J
2011-03-07
Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.
Nanopore Technology: A Simple, Inexpensive, Futuristic Technology for DNA Sequencing.
Gupta, P D
2016-10-01
In health care, importance of DNA sequencing has been fully established. Sanger's Capillary Electrophoresis DNA sequencing methodology is time consuming, cumbersome, hence become more expensive. Lately, because of its versatility DNA sequencing became house hold name, and therefore, there is an urgent need of simple, fast, inexpensive, DNA sequencing technology. In the beginning of this century efforts were made, and Nanopore DNA sequencing technology was developed; still it is infancy, nevertheless, it is the futuristic technology.
Computation of repetitions and regularities of biologically weighted sequences.
Christodoulakis, M; Iliopoulos, C; Mouchard, L; Perdikuri, K; Tsakalidis, A; Tsichlas, K
2006-01-01
Biological weighted sequences are used extensively in molecular biology as profiles for protein families, in the representation of binding sites and often for the representation of sequences produced by a shotgun sequencing strategy. In this paper, we address three fundamental problems in the area of biologically weighted sequences: (i) computation of repetitions, (ii) pattern matching, and (iii) computation of regularities. Our algorithms can be used as basic building blocks for more sophisticated algorithms applied on weighted sequences.
Morozumi, Takeya; Toki, Daisuke; Eguchi-Ogawa, Tomoko; Uenishi, Hirohide
2011-09-01
Large-scale cDNA-sequencing projects require an efficient strategy for mass sequencing. Here we describe a method for sequencing pooled cDNA clones using a combination of transposon insertion and Gateway technology. Our method reduces the number of shotgun clones that are unsuitable for reconstruction of cDNA sequences, and has the advantage of reducing the total costs of the sequencing project.
Effects of informed consent for individual genome sequencing on relevant knowledge.
Kaphingst, K A; Facio, F M; Cheng, M-R; Brooks, S; Eidem, H; Linn, A; Biesecker, B B; Biesecker, L G
2012-11-01
Increasing availability of individual genomic information suggests that patients will need knowledge about genome sequencing to make informed decisions, but prior research is limited. In this study, we examined genome sequencing knowledge before and after informed consent among 311 participants enrolled in the ClinSeq™ sequencing study. An exploratory factor analysis of knowledge items yielded two factors (sequencing limitations knowledge; sequencing benefits knowledge). In multivariable analysis, high pre-consent sequencing limitations knowledge scores were significantly related to education [odds ratio (OR): 8.7, 95% confidence interval (CI): 2.45-31.10 for post-graduate education, and OR: 3.9; 95% CI: 1.05, 14.61 for college degree compared with less than college degree] and race/ethnicity (OR: 2.4, 95% CI: 1.09, 5.38 for non-Hispanic Whites compared with other racial/ethnic groups). Mean values increased significantly between pre- and post-consent for the sequencing limitations knowledge subscale (6.9-7.7, p < 0.0001) and sequencing benefits knowledge subscale (7.0-7.5, p < 0.0001); increase in knowledge did not differ by sociodemographic characteristics. This study highlights gaps in genome sequencing knowledge and underscores the need to target educational efforts toward participants with less education or from minority racial/ethnic groups. The informed consent process improved genome sequencing knowledge. Future studies could examine how genome sequencing knowledge influences informed decision making. © 2012 John Wiley & Sons A/S.
Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi
2013-01-01
Background Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) unigenes showed similarity to the sequences in NCBI database. Out of 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the database of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 primer pairs randomly chosen, 81 markers have amplicons and 20 are polymorphic for genotypes analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799
Learning of goal-relevant and -irrelevant complex visual sequences in human V1.
Rosenthal, Clive R; Mallik, Indira; Caballero-Gaudes, Cesar; Sereno, Martin I; Soto, David
2018-06-12
Learning and memory are supported by a network involving the medial temporal lobe and linked neocortical regions. Emerging evidence indicates that primary visual cortex (i.e., V1) may contribute to recognition memory, but this has been tested only with a single visuospatial sequence as the target memorandum. The present study used functional magnetic resonance imaging to investigate whether human V1 can support the learning of multiple, concurrent complex visual sequences involving discontinous (second-order) associations. Two peripheral, goal-irrelevant but structured sequences of orientated gratings appeared simultaneously in fixed locations of the right and left visual fields alongside a central, goal-relevant sequence that was in the focus of spatial attention. Pseudorandom sequences were introduced at multiple intervals during the presentation of the three structured visual sequences to provide an online measure of sequence-specific knowledge at each retinotopic location. We found that a network involving the precuneus and V1 was involved in learning the structured sequence presented at central fixation, whereas right V1 was modulated by repeated exposure to the concurrent structured sequence presented in the left visual field. The same result was not found in left V1. These results indicate for the first time that human V1 can support the learning of multiple concurrent sequences involving complex discontinuous inter-item associations, even peripheral sequences that are goal-irrelevant. Copyright © 2018. Published by Elsevier Inc.
Biological sequence compression algorithms.
Matsumoto, T; Sadakane, K; Imai, H
2000-01-01
Today, more and more DNA sequences are becoming available. The information about DNA sequences are stored in molecular biology databases. The size and importance of these databases will be bigger and bigger in the future, therefore this information must be stored or communicated efficiently. Furthermore, sequence compression can be used to define similarities between biological sequences. The standard compression algorithms such as gzip or compress cannot compress DNA sequences, but only expand them in size. On the other hand, CTW (Context Tree Weighting Method) can compress DNA sequences less than two bits per symbol. These algorithms do not use special structures of biological sequences. Two characteristic structures of DNA sequences are known. One is called palindromes or reverse complements and the other structure is approximate repeats. Several specific algorithms for DNA sequences that use these structures can compress them less than two bits per symbol. In this paper, we improve the CTW so that characteristic structures of DNA sequences are available. Before encoding the next symbol, the algorithm searches an approximate repeat and palindrome using hash and dynamic programming. If there is a palindrome or an approximate repeat with enough length then our algorithm represents it with length and distance. By using this preprocessing, a new program achieves a little higher compression ratio than that of existing DNA-oriented compression algorithms. We also describe new compression algorithm for protein sequences.
A novel, privacy-preserving cryptographic approach for sharing sequencing data
Cassa, Christopher A; Miller, Rachel A; Mandl, Kenneth D
2013-01-01
Objective DNA samples are often processed and sequenced in facilities external to the point of collection. These samples are routinely labeled with patient identifiers or pseudonyms, allowing for potential linkage to identity and private clinical information if intercepted during transmission. We present a cryptographic scheme to securely transmit externally generated sequence data which does not require any patient identifiers, public key infrastructure, or the transmission of passwords. Materials and methods This novel encryption scheme cryptographically protects participant sequence data using a shared secret key that is derived from a unique subset of an individual’s genetic sequence. This scheme requires access to a subset of an individual’s genetic sequence to acquire full access to the transmitted sequence data, which helps to prevent sample mismatch. Results We validate that the proposed encryption scheme is robust to sequencing errors, population uniqueness, and sibling disambiguation, and provides sufficient cryptographic key space. Discussion Access to a set of an individual’s genotypes and a mutually agreed cryptographic seed is needed to unlock the full sequence, which provides additional sample authentication and authorization security. We present modest fixed and marginal costs to implement this transmission architecture. Conclusions It is possible for genomics researchers who sequence participant samples externally to protect the transmission of sequence data using unique features of an individual’s genetic sequence. PMID:23125421
Zseq: An Approach for Preprocessing Next-Generation Sequencing Data.
Alkhateeb, Abedalrhman; Rueda, Luis
2017-08-01
Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts. Preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq finds the complexity of the sequences by counting the number of unique k-mers in each sequence as its corresponding score and also takes into the account other factors such as ambiguous nucleotides or high GC-content percentage in k-mers. Based on a z-score threshold, Zseq sweeps through the sequences again and filters those with a z-score less than the user-defined threshold. Zseq algorithm is able to provide a better mapping rate; it reduces the number of ambiguous bases significantly in comparison with other methods. Evaluation of the filtered reads has been conducted by aligning the reads and assembling the transcripts using the reference genome as well as de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Moreover, de novo assembled transcripts from the reads filtered by Zseq have longer genomic sequences than other tested methods. Estimating the threshold of the cutoff point is introduced using labeling rules with optimistic results.
Optimal choice of word length when comparing two Markov sequences using a χ 2-statistic.
Bai, Xin; Tang, Kujin; Ren, Jie; Waterman, Michael; Sun, Fengzhu
2017-10-03
Alignment-free sequence comparison using counts of word patterns (grams, k-tuples) has become an active research topic due to the large amount of sequence data from the new sequencing technologies. Genome sequences are frequently modelled by Markov chains and the likelihood ratio test or the corresponding approximate χ 2 -statistic has been suggested to compare two sequences. However, it is not known how to best choose the word length k in such studies. We develop an optimal strategy to choose k by maximizing the statistical power of detecting differences between two sequences. Let the orders of the Markov chains for the two sequences be r 1 and r 2 , respectively. We show through both simulations and theoretical studies that the optimal k= max(r 1 ,r 2 )+1 for both long sequences and next generation sequencing (NGS) read data. The orders of the Markov chains may be unknown and several methods have been developed to estimate the orders of Markov chains based on both long sequences and NGS reads. We study the power loss of the statistics when the estimated orders are used. It is shown that the power loss is minimal for some of the estimators of the orders of Markov chains. Our studies provide guidelines on choosing the optimal word length for the comparison of Markov sequences.
Shih, Arthur Chun-Chieh; Lee, DT; Peng, Chin-Lin; Wu, Yu-Wei
2007-01-01
Background When aligning several hundreds or thousands of sequences, such as epidemic virus sequences or homologous/orthologous sequences of some big gene families, to reconstruct the epidemiological history or their phylogenies, how to analyze and visualize the alignment results of many sequences has become a new challenge for computational biologists. Although there are several tools available for visualization of very long sequence alignments, few of them are applicable to the alignments of many sequences. Results A multiple-logo alignment visualization tool, called Phylo-mLogo, is presented in this paper. Phylo-mLogo calculates the variabilities and homogeneities of alignment sequences by base frequencies or entropies. Different from the traditional representations of sequence logos, Phylo-mLogo not only displays the global logo patterns of the whole alignment of multiple sequences, but also demonstrates their local homologous logos for each clade hierarchically. In addition, Phylo-mLogo also allows the user to focus only on the analysis of some important, structurally or functionally constrained sites in the alignment selected by the user or by built-in automatic calculation. Conclusion With Phylo-mLogo, the user can symbolically and hierarchically visualize hundreds of aligned sequences simultaneously and easily check the changes of their amino acid sites when analyzing many homologous/orthologous or influenza virus sequences. More information of Phylo-mLogo can be found at URL . PMID:17319966
Method and Apparatus for Evaluating the Visual Quality of Processed Digital Video Sequences
NASA Technical Reports Server (NTRS)
Watson, Andrew B. (Inventor)
2002-01-01
A Digital Video Quality (DVQ) apparatus and method that incorporate a model of human visual sensitivity to predict the visibility of artifacts. The DVQ method and apparatus are used for the evaluation of the visual quality of processed digital video sequences and for adaptively controlling the bit rate of the processed digital video sequences without compromising the visual quality. The DVQ apparatus minimizes the required amount of memory and computation. The input to the DVQ apparatus is a pair of color image sequences: an original (R) non-compressed sequence, and a processed (T) sequence. Both sequences (R) and (T) are sampled, cropped, and subjected to color transformations. The sequences are then subjected to blocking and discrete cosine transformation, and the results are transformed to local contrast. The next step is a time filtering operation which implements the human sensitivity to different time frequencies. The results are converted to threshold units by dividing each discrete cosine transform coefficient by its respective visual threshold. At the next stage the two sequences are subtracted to produce an error sequence. The error sequence is subjected to a contrast masking operation, which also depends upon the reference sequence (R). The masked errors can be pooled in various ways to illustrate the perceptual error over various dimensions, and the pooled error can be converted to a visual quality measure.
Kirschbaum, Mark A.; Hettinger, Robert D.
2004-01-01
Facies and sequence-stratigraphic analysis identifies six high-resolution sequences within upper Campanian strata across about 120 miles of the Book Cliffs in western Colorado and eastern Utah. The six sequences are named after prominent sandstone units and include, in ascending order, upper Sego sequence, Neslen sequence, Corcoran sequence, Buck Canyon/lower Cozzette sequence, upper Cozzette sequence, and Cozzette/Rollins sequence. A seventh sequence, the Bluecastle sequence, is present in the extreme western part of the study area. Facies analysis documents deepening- and shallowing- upward successions, parasequence stacking patterns, downlap in subsurface cross sections, facies dislocations, basinward shifts in facies, and truncation of strata.All six sequences display major incision into shoreface deposits of the Sego Sandstone and sandstones of the Corcoran and Cozzette Members of the Mount Garfield Formation. The incised surfaces represent sequence-boundary unconformities that allowed bypass of sediment to lowstand shorelines that are either attached to the older highstand shorelines or are detached from the older highstand shorelines and located southeast of the main study area. The sequence boundary unconformities represent valley incisions that were cut during successive lowstands of relative sea level. The overlying valley-fill deposits generally consist of tidally influenced strata deposited during an overall base level rise. Transgressive surfaces can be traced or projected over, or locally into, estuarine deposits above and landward of their associated shoreface deposits. Maximum flooding surfaces can be traced or projected landward from offshore strata into, or above, coastal-plain deposits. With the exception of the Cozzette/Rollins sequence, the majority of coal-bearing coastal-plain strata was deposited before maximum flooding and is therefore within the transgressive systems tracts. Maximum flooding was followed by strong progradation of parasequences and low preservation potential of coastal-plain strata within the highstand systems tract. The large incised valleys, lack of transgressive retrogradational parasequences, strong progradational nature of highstand parasequences, and low preservation of coastal-plain strata in the highstand systems tracts argue for relatively low accommodation space during deposition of the Sego, Corcoran, and Cozzette sequences. The Buck Canyon/Cozzette and Cozzette/Rollins sequences contrast with other sequences in that the preservation of retrogradational parasequences and the development of large estuaries coincident with maximum flooding indicate a relative increase in accommodation space during deposition of these strata. Following maximum flooding, the Buck Canyon/Cozzette sequence follows the pattern of the other sequences, but the Cozzette/Rollins sequence exhibits a contrasting offlapping pattern with development of offshore clinoforms that downlap and eventually parallel its maximum flooding surface. This highstand systems tract preserves a thick coal-bearing section where the Rollins Sandstone Member of the Mount Garfield Formation parasequences prograde out of the study area, stepping up as much as 800 ft stratigraphically over a distance of about 90 miles. This progradational stacking pattern indicates a higher accommodation space and increased sedimentation rate compared to the previous sequences.
USDA-ARS?s Scientific Manuscript database
The current pig reference genome sequence (Sscrofa10.2) was established using Sanger sequencing and following the clone-by-clone hierarchical shotgun sequencing approach used in the public human genome project. However, as sequence coverage was low (4-6x) the resulting assembly was only of draft qua...
Sequence Matters but How Exactly? A Method for Evaluating Activity Sequences from Data
ERIC Educational Resources Information Center
Doroudi, Shayan; Holstein, Kenneth; Aleven, Vincent; Brunskill, Emma
2016-01-01
How should a wide variety of educational activities be sequenced to maximize student learning? Although some experimental studies have addressed this question, educational data mining methods may be able to evaluate a wider range of possibilities and better handle many simultaneous sequencing constraints. We introduce Sequencing Constraint…
Pure Perceptual-Based Sequence Learning: A Role for Visuospatial Attention
ERIC Educational Resources Information Center
Remillard, Gilbert
2009-01-01
Learning the structure of a sequence of target locations when target location is not the response dimension and the sequence of target locations is uncorrelated with the sequence of responses is called pure perceptual-based sequence learning. The paradigm introduced by G. Remillard (2003) was used to determine whether orienting of visuospatial…
USDA-ARS?s Scientific Manuscript database
Current technologies for next generation sequencing (NGS) have revolutionized metagenomics analysis of clinical samples. One advantage of the NGS platform is the possibility to sequence the genetic material in samples without any prior knowledge of the sequence contained within. Sequence-Independent...
Targeted sequencing of plant genomes
Mark D. Huynh
2014-01-01
Next-generation sequencing (NGS) has revolutionized the field of genetics by providing a means for fast and relatively affordable sequencing. With the advancement of NGS, wholegenome sequencing (WGS) has become more commonplace. However, sequencing an entire genome is still not cost effective or even beneficial in all cases. In studies that do not require a whole-...
USDA-ARS?s Scientific Manuscript database
Polymorphic genetic markers were identified and characterized using a partial genomic library of Heliothis virescens enriched for simple sequence repeats (SSR) and nucleotide sequences of expressed sequence tags (EST). Nucleotide sequences of 192 clones from the partial genomic library yielded 147 u...
A Simulation of DNA Sequencing Utilizing 3M Post-It[R] Notes
ERIC Educational Resources Information Center
Christensen, Doug
2009-01-01
An inexpensive and equipment free approach to teaching the technical aspects of DNA sequencing. The activity described requires an instructor with a familiarity of DNA sequencing technology but provides a straight forward method of teaching the technical aspects of sequencing in the absence of expensive sequencing equipment. The final sequence…
Serial Reaction Time Learning in Preschool- and School-Age Children.
ERIC Educational Resources Information Center
Thomas, Kathleen M.; Nelson, Charles A.
2001-01-01
Two experiments assessed visuomotor sequence learning in 4- to 10-year-olds using a serial reaction time (SRT) task with random and sequenced trials. Found that children demonstrated sequence-specific decreases in RT. Participants with explicit awareness of the sequence at the session's end showed larger sequence-specific RT decrements than…
The Sequencing of Basic Chemistry Topics by Physical Science Teachers
ERIC Educational Resources Information Center
Sibanda, Doras; Hobden, Paul
2016-01-01
The purpose of this study was to find out teachers' preferred teaching sequence for basic chemistry topics in Physical Science in South Africa, to obtain their reasons underpinning their preferred sequence, and to compare these sequences with the prescribed sequences in the current curriculum. The study was located within a pragmatic paradigm and…
Rubin, D A; Dores, R M
1995-06-01
In order to obtain a more resolute phylogeny of teleosts based on growth hormone (GH) sequences, phylogenetic analyses were performed in which deletions (gaps), which appear to be order specific, were upheld to maintain GH's structural information. Sequences were analyzed at 194 amino acid positions. In addition, the two closest genealogically related groups to the teleosts, Amia calva and Acipenser guldenstadti, were used as outgroups. Modified sequence alignments were also analyzed to determine clade stability. Analyses indicated, in the most parsimonious cladogram, that molecular and morphological relationships for the orders of fishes are congruent. With GH molecular sequence data it was possible to resolve all clades at the familial level. Analyses of the primary sequence data indicate that: (a) the halecomorphean and chondrostean GH sequences are the appropriate outgroups for generating the most parsimonious cladogram for teleosts; (b) proper alignment of teleost GH sequence by the inclusion of gaps is necessary for resolution of the Percomorpha; and (c) removal of sequence information by deleting improperly aligned sequence decreases the phylogenetic signal obtained.
Analysis of sequence repeats of proteins in the PDB.
Mary Rajathei, David; Selvaraj, Samuel
2013-12-01
Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that identity of the sequence repeats falls in the range of 20-40% and, the superimposed structures of the most of the sequence repeats maintain similar overall folding. Analysis sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single and two domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.
Fourment, Mathieu; Gibbs, Mark J
2008-02-05
Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.
Dölz, R; Mossé, M O; Slonimski, P P; Bairoch, A; Linder, P
1994-01-01
We continued our effort to make a comprehensive database (LISTA) for the yeast Saccharomyces cerevisiae. In this database each sequence has been attributed a single genetic name. In the case of duplicated sequences a simple method has been applied to distinguish between sequences of one and the same gene from non-allelic sequences of duplicated genes. If necessary, synonyms are given in the case of allelic duplicated sequences. Thus sequences can be found either by the name or by synonyms given in LISTA. Each entry contains the genetic name, the mnemonic from the EMBL data bank, the codon bias, reference of the publication of the sequence, Chromosomal location as far as known, Swissprot and EMBL accession numbers. To obtain more information on the included sequences, each entry has been screened against non-redundant nucleotide and protein data bank collections resulting in LISTA-HON and LISTA-HOP. The LISTA data base can be linked to the associated data sets or to nucleotide and protein banks by the Sequence Retrieval System (SRS). PMID:7937046
Nolan, Danielle; Carlson, Martha
2016-06-01
Genetic heterogeneity in neurologic disorders has been an obstacle to phenotype-based diagnostic testing. The authors hypothesized that information compiled via whole exome sequencing will improve clinical diagnosis and management of pediatric neurology patients. The authors performed a retrospective chart review of patients evaluated in the University of Michigan Pediatric Neurology clinic between 6/2011 and 6/2015. The authors recorded previous diagnostic testing, indications for whole exome sequencing, and whole exome sequencing results. Whole exome sequencing was recommended for 135 patients and obtained in 53 patients. Insurance barriers often precluded whole exome sequencing. The most common indication for whole exome sequencing was neurodevelopmental disorders. Whole exome sequencing improved the presumptive diagnostic rate in the patient cohort from 25% to 48%. Clinical implications included family planning, medication selection, and systemic investigation. Compared to current second tier testing, whole exome sequencing can result in lower long-term charges and more timely diagnosis. Overcoming barriers related to whole exome sequencing insurance authorization could allow for more efficient and fruitful diagnostic neurological evaluations. © The Author(s) 2016.
Accurate and exact CNV identification from targeted high-throughput sequence data.
Nord, Alex S; Lee, Ming; King, Mary-Claire; Walsh, Tom
2011-04-12
Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We present a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data. Sequencing data is first scanned for gains and losses using a comparison of normalized coverage data between samples. CNV calls are confirmed by testing for a signature of sequences that span the CNV breakpoint. With our method, CNVs can be identified regardless of whether breakpoints are within regions targeted for sequencing. For CNVs where at least one breakpoint is within targeted sequence, exact CNV breakpoints can be identified. In a test data set of 96 subjects sequenced across ~1 Mb genomic sequence using multiplexing technology, our method detected mutations as small as 31 bp, predicted quantitative copy count, and had a low false-positive rate. Application of this method allows for identification of gains and losses in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.
Extension of the COG and arCOG databases by amino acid and nucleotide sequences
Meereis, Florian; Kaufmann, Michael
2008-01-01
Background The current versions of the COG and arCOG databases, both excellent frameworks for studies in comparative and functional genomics, do not contain the nucleotide sequences corresponding to their protein or protein domain entries. Results Using sequence information obtained from GenBank flat files covering the completely sequenced genomes of the COG and arCOG databases, we constructed NUCOCOG (nucleotide sequences containing COG databases) as an extended version including all nucleotide sequences and in addition the amino acid sequences originally utilized to construct the current COG and arCOG databases. We make available three comprehensive single XML files containing the complete databases including all sequence information. In addition, we provide a web interface as a utility suitable to browse the NUCOCOG database for sequence retrieval. The database is accessible at . Conclusion NUCOCOG offers the possibility to analyze any sequence related property in the context of the COG and arCOG framework simply by using script languages such as PERL applied to a large but single XML document. PMID:19014535
MIPS: a database for genomes and protein sequences.
Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D
1999-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138
Enhanced learning of natural visual sequences in newborn chicks.
Wood, Justin N; Prasad, Aditya; Goldman, Jason G; Wood, Samantha M W
2016-07-01
To what extent are newborn brains designed to operate over natural visual input? To address this question, we used a high-throughput controlled-rearing method to examine whether newborn chicks (Gallus gallus) show enhanced learning of natural visual sequences at the onset of vision. We took the same set of images and grouped them into either natural sequences (i.e., sequences showing different viewpoints of the same real-world object) or unnatural sequences (i.e., sequences showing different images of different real-world objects). When raised in virtual worlds containing natural sequences, newborn chicks developed the ability to recognize familiar images of objects. Conversely, when raised in virtual worlds containing unnatural sequences, newborn chicks' object recognition abilities were severely impaired. In fact, the majority of the chicks raised with the unnatural sequences failed to recognize familiar images of objects despite acquiring over 100 h of visual experience with those images. Thus, newborn chicks show enhanced learning of natural visual sequences at the onset of vision. These results indicate that newborn brains are designed to operate over natural visual input.
Gray, Stacy W; Martins, Yolanda; Feuerman, Lindsay Z; Bernhardt, Barbara A; Biesecker, Barbara B; Christensen, Kurt D; Joffe, Steven; Rini, Christine; Veenstra, David; McGuire, Amy L
2014-10-01
The routine use of genomic sequencing in clinical medicine has the potential to dramatically alter patient care and medical outcomes. To fully understand the psychosocial and behavioral impact of sequencing integration into clinical practice, it is imperative that we identify the factors that influence sequencing-related decision making and patient outcomes. In an effort to develop a collaborative and conceptually grounded approach to studying sequencing adoption, members of the National Human Genome Research Institute's Clinical Sequencing Exploratory Research Consortium formed the Outcomes and Measures Working Group. Here we highlight the priority areas of investigation and psychosocial and behavioral outcomes identified by the Working Group. We also review some of the anticipated challenges to measurement in social and behavioral research related to genomic sequencing; opportunities for instrument development; and the importance of qualitative, quantitative, and mixed-method approaches. This work represents the early, shared efforts of multiple research teams as we strive to understand individuals' experiences with genomic sequencing. The resulting body of knowledge will guide recommendations for the optimal use of sequencing in clinical practice.
Approaches for in silico finishing of microbial genome sequences
Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva
2017-01-01
Abstract The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as “drafts”, incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing. PMID:28898352
Approaches for in silico finishing of microbial genome sequences.
Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva
The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.
NASA Technical Reports Server (NTRS)
Smith, David J.; Burton, Aaron; Castro-Wallace, Sarah; John, Kristen; Stahl, Sarah E.; Dworkin, Jason Peter; Lupisella, Mark L.
2016-01-01
On the International Space Station (ISS), technologies capable of rapid microbial identification and disease diagnostics are not currently available. NASA still relies upon sample return for comprehensive, molecular-based sample characterization. Next-generation DNA sequencing is a powerful approach for identifying microorganisms in air, water, and surfaces onboard spacecraft. The Biomolecule Sequencer payload, manifested to SpaceX-9 and scheduled on the Increment 4748 research plan (June 2016), will assess the functionality of a commercially-available next-generation DNA sequencer in the microgravity environment of ISS. The MinION device from Oxford Nanopore Technologies (Oxford, UK) measures picoamp changes in electrical current dependent on nucleotide sequences of the DNA strand migrating through nanopores in the system. The hardware is exceptionally small (9.5 x 3.2 x 1.6 cm), lightweight (120 grams), and powered only by a USB connection. For the ISS technology demonstration, the Biomolecule Sequencer will be powered by a Microsoft Surface Pro3. Ground-prepared samples containing lambda bacteriophage, Escherichia coli, and mouse genomic DNA, will be launched and stored frozen on the ISS until experiment initiation. Immediately prior to sequencing, a crew member will collect and thaw frozen DNA samples, connect the sequencer to the Surface Pro3, inject thawed samples into a MinION flow cell, and initiate sequencing. At the completion of the sequencing run, data will be downlinked for ground analysis. Identical, synchronous ground controls will be used for data comparisons to determine sequencer functionality, run-time sequence, current dynamics, and overall accuracy. We will present our latest results from the ISS flight experiment the first time DNA has ever been sequenced in space and discuss the many potential applications of the Biomolecule Sequencer for environmental monitoring, medical diagnostics, higher fidelity and more adaptable Space Biology Human Research Program investigations, and even life detection experiments for astrobiology missions.
Korber, B T; Kunstman, K J; Patterson, B K; Furtado, M; McEvilly, M M; Levy, R; Wolinsky, S M
1994-01-01
Human immunodeficiency virus type 1 (HIV-1) sequences were generated from blood and from brain tissue obtained by stereotactic biopsy from six patients undergoing a diagnostic neurosurgical procedure. Proviral DNA was directly amplified by nested PCR, and 8 to 36 clones from each sample were sequenced. Phylogenetic analysis of intrapatient envelope V3-V5 region HIV-1 DNA sequence sets revealed that brain viral sequences were clustered relative to the blood viral sequences, suggestive of tissue-specific compartmentalization of the virus in four of the six cases. In the other two cases, the blood and brain virus sequences were intermingled in the phylogenetic analyses, suggesting trafficking of virus between the two tissues. Slide-based PCR-driven in situ hybridization of two of the patients' brain biopsy samples confirmed our interpretation of the intrapatient phylogenetic analyses. Interpatient V3 region brain-derived sequence distances were significantly less than blood-derived sequence distances. Relative to the tip of the loop, the set of brain-derived viral sequences had a tendency towards negative or neutral charge compared with the set of blood-derived viral sequences. Entropy calculations were used as a measure of the variability at each position in alignments of blood and brain viral sequences. A relatively conserved set of positions were found, with a significantly lower entropy in the brain-than in the blood-derived viral sequences. These sites constitute a brain "signature pattern," or a noncontiguous set of amino acids in the V3 region conserved in viral sequences derived from brain tissue. This brain-derived signature pattern was also well preserved among isolates previously characterized in vitro as macrophage tropic. Macrophage-monocyte tropism may be the biological constraint that results in the conservation of the viral brain signature pattern. Images PMID:7933130
Siebert, Stefan; Robinson, Mark D; Tintori, Sophia C; Goetz, Freya; Helm, Rebecca R; Smith, Stephen A; Shaner, Nathan; Haddock, Steven H D; Dunn, Casey W
2011-01-01
We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing.
Siebert, Stefan; Robinson, Mark D.; Tintori, Sophia C.; Goetz, Freya; Helm, Rebecca R.; Smith, Stephen A.; Shaner, Nathan; Haddock, Steven H. D.; Dunn, Casey W.
2011-01-01
We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing. PMID:21829563
Anwar, R; Booth, A; Churchill, A J; Markham, A F
1996-01-01
The determination of nucleotide sequence is fundamental to the identification and molecular analysis of genes. Direct sequencing of PCR products is now becoming a commonplace procedure for haplotype analysis, and for defining mutations and polymorphism within genes, particularly for diagnostic purposes. A previously unrecognised phenomenon, primer related variability, observed in sequence data generated using Taq cycle sequencing and T7 Sequenase sequencing, is reported. This suggests that caution is necessary when interpreting DNA sequence data. This is particularly important in situations where treatment may be dependent on the accuracy of the molecular diagnosis. Images PMID:16696096
Inaugural Genomics Automation Congress and the coming deluge of sequencing data.
Creighton, Chad J
2010-10-01
Presentations at Select Biosciences's first 'Genomics Automation Congress' (Boston, MA, USA) in 2010 focused on next-generation sequencing and the platforms and methodology around them. The meeting provided an overview of sequencing technologies, both new and emerging. Speakers shared their recent work on applying sequencing to profile cells for various levels of biomolecular complexity, including DNA sequences, DNA copy, DNA methylation, mRNA and microRNA. With sequencing time and costs continuing to drop dramatically, a virtual explosion of very large sequencing datasets is at hand, which will probably present challenges and opportunities for high-level data analysis and interpretation, as well as for information technology infrastructure.
Advances in high throughput DNA sequence data compression.
Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz
2016-06-01
Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.
Sequence search on a supercomputer.
Gotoh, O; Tagashira, Y
1986-01-10
A set of programs was developed for searching nucleic acid and protein sequence data bases for sequences similar to a given sequence. The programs, written in FORTRAN 77, were optimized for vector processing on a Hitachi S810-20 supercomputer. A search of a 500-residue protein sequence against the entire PIR data base Ver. 1.0 (1) (0.5 M residues) is carried out in a CPU time of 45 sec. About 4 min is required for an exhaustive search of a 1500-base nucleotide sequence against all mammalian sequences (1.2M bases) in Genbank Ver. 29.0. The CPU time is reduced to about a quarter with a faster version.
Acquisition of New DNA Sequences After Infection of Chicken Cells with Avian Myeloblastosis Virus
Shoyab, M.; Baluda, M. A.; Evans, R.
1974-01-01
DNA-RNA hybridization studies between 70S RNA from avian myeloblastosis virus (AMV) and an excess of DNA from (i) AMV-induced leukemic chicken myeloblasts or (ii) a mixture of normal and of congenitally infected K-137 chicken embryos producing avian leukosis viruses revealed the presence of fast- and slow-hybridizing virus-specific DNA sequences. However, the leukemic cells contained twice the level of AMV-specific DNA sequences observed in normal chicken embryonic cells. The fast-reacting sequences were two to three times more numerous in leukemic DNA than in DNA from the mixed embryos. The slow-reacting sequences had a reiteration frequency of approximately 9 and 6, in the two respective systems. Both the fast- and the slow-reacting DNA sequences in leukemic cells exhibited a higher Tm (2 C) than the respective DNA sequences in normal cells. In normal and leukemic cells the slow hybrid sequences appeared to have a Tm which was 2 C higher than that of the fast hybrid sequences. Individual non-virus-producing chicken embryos, either group-specific antigen positive or negative, contained 40 to 100 copies of the fast sequences and 2 to 6 copies of the slowly hybridizing sequences per cell genome. Normal rat cells did not contain DNA that hybridized with AMV RNA, whereas non-virus-producing rat cells transformed by B-77 avian sarcoma virus contained only the slowly reacting sequences. The results demonstrate that leukemic cells transformed by AMV contain new AMV-specific DNA sequences which were not present before infection. PMID:16789139
Fredlake, Christopher P; Hert, Daniel G; Kan, Cheuk-Wai; Chiesl, Thomas N; Root, Brian E; Forster, Ryan E; Barron, Annelise E
2008-01-15
To realize the immense potential of large-scale genomic sequencing after the completion of the second human genome (Venter's), the costs for the complete sequencing of additional genomes must be dramatically reduced. Among the technologies being developed to reduce sequencing costs, microchip electrophoresis is the only new technology ready to produce the long reads most suitable for the de novo sequencing and assembly of large and complex genomes. Compared with the current paradigm of capillary electrophoresis, microchip systems promise to reduce sequencing costs dramatically by increasing throughput, reducing reagent consumption, and integrating the many steps of the sequencing pipeline onto a single platform. Although capillary-based systems require approximately 70 min to deliver approximately 650 bases of contiguous sequence, we report sequencing up to 600 bases in just 6.5 min by microchip electrophoresis with a unique polymer matrix/adsorbed polymer wall coating combination. This represents a two-thirds reduction in sequencing time over any previously published chip sequencing result, with comparable read length and sequence quality. We hypothesize that these ultrafast long reads on chips can be achieved because the combined polymer system engenders a recently discovered "hybrid" mechanism of DNA electromigration, in which DNA molecules alternate rapidly between repeating through the intact polymer network and disrupting network entanglements to drag polymers through the solution, similar to dsDNA dynamics we observe in single-molecule DNA imaging studies. Most importantly, these results reveal the surprisingly powerful ability of microchip electrophoresis to provide ultrafast Sanger sequencing, which will translate to increased system throughput and reduced costs.
Fredlake, Christopher P.; Hert, Daniel G.; Kan, Cheuk-Wai; Chiesl, Thomas N.; Root, Brian E.; Forster, Ryan E.; Barron, Annelise E.
2008-01-01
To realize the immense potential of large-scale genomic sequencing after the completion of the second human genome (Venter's), the costs for the complete sequencing of additional genomes must be dramatically reduced. Among the technologies being developed to reduce sequencing costs, microchip electrophoresis is the only new technology ready to produce the long reads most suitable for the de novo sequencing and assembly of large and complex genomes. Compared with the current paradigm of capillary electrophoresis, microchip systems promise to reduce sequencing costs dramatically by increasing throughput, reducing reagent consumption, and integrating the many steps of the sequencing pipeline onto a single platform. Although capillary-based systems require ≈70 min to deliver ≈650 bases of contiguous sequence, we report sequencing up to 600 bases in just 6.5 min by microchip electrophoresis with a unique polymer matrix/adsorbed polymer wall coating combination. This represents a two-thirds reduction in sequencing time over any previously published chip sequencing result, with comparable read length and sequence quality. We hypothesize that these ultrafast long reads on chips can be achieved because the combined polymer system engenders a recently discovered “hybrid” mechanism of DNA electromigration, in which DNA molecules alternate rapidly between reptating through the intact polymer network and disrupting network entanglements to drag polymers through the solution, similar to dsDNA dynamics we observe in single-molecule DNA imaging studies. Most importantly, these results reveal the surprisingly powerful ability of microchip electrophoresis to provide ultrafast Sanger sequencing, which will translate to increased system throughput and reduced costs. PMID:18184818
Deng, Yi-Mo; Spirason, Natalie; Iannello, Pina; Jelley, Lauren; Lau, Hilda; Barr, Ian G
2015-07-01
Full genome sequencing of influenza A viruses (IAV), including those that arise from annual influenza epidemics, is undertaken to determine if reassorting has occurred or if other pathogenic traits are present. Traditionally IAV sequencing has been biased toward the major surface glycoproteins haemagglutinin and neuraminidase, while the internal genes are often ignored. Despite the development of next generation sequencing (NGS), many laboratories are still reliant on conventional Sanger sequencing to sequence IAV. To develop a minimal and robust set of primers for Sanger sequencing of the full genome of IAV currently circulating in humans. A set of 13 primer pairs was designed that enabled amplification of the six internal genes of multiple human IAV subtypes including the recent avian influenza A(H7N9) virus from China. Specific primers were designed to amplify the HA and NA genes of each IAV subtype of interest. Each of the primers also incorporated a binding site at its 5'-end for either a forward or reverse M13 primer, such that only two M13 primers were required for all subsequent sequencing reactions. This minimal set of primers was suitable for sequencing the six internal genes of all currently circulating human seasonal influenza A subtypes as well as the avian A(H7N9) viruses that have infected humans in China. This streamlined Sanger sequencing protocol could be used to generate full genome sequence data more rapidly and easily than existing influenza genome sequencing protocols. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Optical Processing Techniques For Pseudorandom Sequence Prediction
NASA Astrophysics Data System (ADS)
Gustafson, Steven C.
1983-11-01
Pseudorandom sequences are series of apparently random numbers generated, for example, by linear or nonlinear feedback shift registers. An important application of these sequences is in spread spectrum communication systems, in which, for example, the transmitted carrier phase is digitally modulated rapidly and pseudorandomly and in which the information to be transmitted is incorporated as a slow modulation in the pseudorandom sequence. In this case the transmitted information can be extracted only by a receiver that uses for demodulation the same pseudorandom sequence used by the transmitter, and thus this type of communication system has a very high immunity to third-party interference. However, if a third party can predict in real time the probable future course of the transmitted pseudorandom sequence given past samples of this sequence, then interference immunity can be significantly reduced.. In this application effective pseudorandom sequence prediction techniques should be (1) applicable in real time to rapid (e.g., megahertz) sequence generation rates, (2) applicable to both linear and nonlinear pseudorandom sequence generation processes, and (3) applicable to error-prone past sequence samples of limited number and continuity. Certain optical processing techniques that may meet these requirements are discussed in this paper. In particular, techniques based on incoherent optical processors that perform general linear transforms or (more specifically) matrix-vector multiplications are considered. Computer simulation examples are presented which indicate that significant prediction accuracy can be obtained using these transforms for simple pseudorandom sequences. However, the useful prediction of more complex pseudorandom sequences will probably require the application of more sophisticated optical processing techniques.
The Release 6 reference sequence of the Drosophila melanogaster genome
Hoskins, Roger A.; Carlson, Joseph W.; Wan, Kenneth H.; ...
2015-01-14
Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy andmore » middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. In conclusion, further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.« less
The Release 6 reference sequence of the Drosophila melanogaster genome
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoskins, Roger A.; Carlson, Joseph W.; Wan, Kenneth H.
Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy andmore » middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. In conclusion, further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.« less
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq
Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru
2015-01-01
Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of aHIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
High-throughput sequence alignment using Graphics Processing Units
Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh
2007-01-01
Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
Biological function in the twilight zone of sequence conservation.
Ponting, Chris P
2017-08-16
Strong DNA conservation among divergent species is an indicator of enduring functionality. With weaker sequence conservation we enter a vast 'twilight zone' in which sequence subject to transient or lower constraint cannot be distinguished easily from neutrally evolving, non-functional sequence. Twilight zone functional sequence is illuminated instead by principles of selective constraint and positive selection using genomic data acquired from within a species' population. Application of these principles reveals that despite being biochemically active, most twilight zone sequence is not functional.
Bounds on the cross-correlation functions of state m-sequences
NASA Astrophysics Data System (ADS)
Woodcock, C. F.; Davies, Phillip A.; Shaar, Ahmed A.
1987-03-01
Lower and upper bounds on the peaks of the periodic Hamming cross-correlation function for state m-sequences, which are often used in frequency-hopped spread-spectrum systems, are derived. The state position mapped (SPM) sequences of the state m-sequences are described. The use of SPM sequences for OR-channel code division multiplexing is studied. The relation between the Hamming cross-correlation function and the correlation function of SPM sequence is examined. Numerical results which support the theoretical data are presented.
JVM: Java Visual Mapping tool for next generation sequencing read.
Yang, Ye; Liu, Juan
2015-01-01
We developed a program JVM (Java Visual Mapping) for mapping next generation sequencing read to reference sequence. The program is implemented in Java and is designed to deal with millions of short read generated by sequence alignment using the Illumina sequencing technology. It employs seed index strategy and octal encoding operations for sequence alignments. JVM is useful for DNA-Seq, RNA-Seq when dealing with single-end resequencing. JVM is a desktop application, which supports reads capacity from 1 MB to 10 GB.
Cloud, Joann L; Harmsen, Dag; Iwen, Peter C; Dunn, James J; Hall, Gerri; Lasala, Paul Rocco; Hoggan, Karen; Wilson, Deborah; Woods, Gail L; Mellmann, Alexander
2010-04-01
Correct identification of nonfermenting Gram-negative bacilli (NFB) is crucial for patient management. We compared phenotypic identifications of 96 clinical NFB isolates with identifications obtained by 5' 16S rRNA gene sequencing. Sequencing identified 88 isolates (91.7%) with >99% similarity to a sequence from the assigned species; 61.5% of sequencing results were concordant with phenotypic results, indicating the usability of sequencing to identify NFB.
FARME DB: a functional antibiotic resistance element database
Wallace, James C.; Port, Jesse A.; Smith, Marissa N.; Faustman, Elaine M.
2017-01-01
Antibiotic resistance (AR) is a major global public health threat but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences derived from environmental samples that confer resistance in functional metagenomic studies. These environmental metagenomic sequences often show little or no similarity to AR sequences from clinical isolates using standard classification criteria. In addition, existing AR databases provide no information about flanking sequences containing regulatory or mobile genetic elements. To help address this issue, we created an annotated database of DNA and protein sequences derived exclusively from environmental metagenomic sequences showing AR in laboratory experiments. Our Functional Antibiotic Resistant Metagenomic Element (FARME) database is a compilation of publically available DNA sequences and predicted protein sequences conferring AR as well as regulatory elements, mobile genetic elements and predicted proteins flanking antibiotic resistant genes. FARME is the first database to focus on functional metagenomic AR gene elements and provides a resource to better understand AR in the 99% of bacteria which cannot be cultured and the relationship between environmental AR sequences and antibiotic resistant genes derived from cultured isolates. Database URL: http://staff.washington.edu/jwallace/farme PMID:28077567
Sequence-Selective Formation of Synthetic H-Bonded Duplexes
2017-01-01
Oligomers equipped with a sequence of phenol and pyridine N-oxide groups form duplexes via H-bonding interactions between these recognition units. Reductive amination chemistry was used to synthesize all possible 3-mer sequences: AAA, AAD, ADA, DAA, ADD, DAD, DDA, and DDD. Pairwise interactions between the oligomers were investigated using NMR titration and dilution experiments in toluene. The measured association constants vary by 3 orders of magnitude (102 to 105 M–1). Antiparallel sequence-complementary oligomers generally form more stable complexes than mismatched duplexes. Mismatched duplexes that have an excess of H-bond donors are stabilized by the interaction of two phenol donors with one pyridine N-oxide acceptor. Oligomers that have a H-bond donor and acceptor on the ends of the chain can fold to form intramolecular H-bonds in the free state. The 1,3-folding equilibrium competes with duplex formation and lowers the stability of duplexes involving these sequences. As a result, some of the mismatch duplexes are more stable than some of the sequence-complementary duplexes. However, the most stable mismatch duplexes contain DDD and compete with the most stable sequence-complementary duplex, AAA·DDD, so in mixtures that contain all eight sequences, sequence-complementary duplexes dominate. Even higher fidelity sequence selectivity can be achieved if alternating donor–acceptor sequences are avoided. PMID:28857551
Sequencing and assembly of the 22-gb loblolly pine genome.
Zimin, Aleksey; Stevens, Kristian A; Crepeau, Marc W; Holtz-Morris, Ann; Koriabine, Maxim; Marçais, Guillaume; Puiu, Daniela; Roberts, Michael; Wegrzyn, Jill L; de Jong, Pieter J; Neale, David B; Salzberg, Steven L; Yorke, James A; Langley, Charles H
2014-03-01
Conifers are the predominant gymnosperm. The size and complexity of their genomes has presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data were aided by condensing the enormous number of paired-end reads into a much smaller set of longer "super-reads," rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.
Analysis of xylem formation in pine by cDNA sequencing
NASA Technical Reports Server (NTRS)
Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.;
1998-01-01
Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.
Aokic, Jun-ya; Kawase, Junya; Hamada, Kazuhisa; Fujimoto, Hiroshi; Yamamoto, Ikki; Usuki, Hironori
2018-01-01
Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence to identify single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection as well as to identify functional genes for biological traits. We obtained 200 times coverage and constructed a high-quality genome assembly using next generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were lined up on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, we found differences in the genome structural variations between two greater amberjack populations using BreakDancer. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence. PMID:29785397
Isolation and characterization of target sequences of the chicken CdxA homeobox gene.
Margalit, Y; Yarus, S; Shapira, E; Gruenbaum, Y; Fainsod, A
1993-01-01
The DNA binding specificity of the chicken homeodomain protein CDXA was studied. Using a CDXA-glutathione-S-transferase fusion protein, DNA fragments containing the binding site for this protein were isolated. The sources of DNA were oligonucleotides with random sequence and chicken genomic DNA. The DNA fragments isolated were sequenced and tested in DNA binding assays. Sequencing revealed that most DNA fragments are AT rich which is a common feature of homeodomain binding sites. By electrophoretic mobility shift assays it was shown that the different target sequences isolated bind to the CDXA protein with different affinities. The specific sequences bound by the CDXA protein in the genomic fragments isolated, were determined by DNase I footprinting. From the footprinted sequences, the CDXA consensus binding site was determined. The CDXA protein binds the consensus sequence A, A/T, T, A/T, A, T, A/G. The CAUDAL binding site in the ftz promoter is also included in this consensus sequence. When tested, some of the genomic target sequences were capable of enhancing the transcriptional activity of reporter plasmids when introduced into CDXA expressing cells. This study determined the DNA sequence specificity of the CDXA protein and it also shows that this protein can further activate transcription in cells in culture. Images PMID:7909943
Williams, Emma L; Bagg, Eleanor A L; Mueller, Michael; Vandrovcova, Jana; Aitman, Timothy J; Rumsby, Gill
2015-01-01
Definitive diagnosis of primary hyperoxaluria (PH) currently utilizes sequential Sanger sequencing of the AGXT, GRPHR, and HOGA1 genes but efficacy is unproven. This analysis is time-consuming, relatively expensive, and delays in diagnosis and inappropriate treatment can occur if not pursued early in the diagnostic work-up. We reviewed testing outcomes of Sanger sequencing in 200 consecutive patient samples referred for analysis. In addition, the Illumina Truseq custom amplicon system was evaluated for paralleled next-generation sequencing (NGS) of AGXT,GRHPR, and HOGA1 in 90 known PH patients. AGXT sequencing was requested in all patients, permitting a diagnosis of PH1 in 50%. All remaining patients underwent targeted exon sequencing of GRHPR and HOGA1 with 8% diagnosed with PH2 and 8% with PH3. Complete sequencing of both GRHPR and HOGA1 was not requested in 25% of patients referred leaving their diagnosis in doubt. NGS analysis showed 98% agreement with Sanger sequencing and both approaches had 100% diagnostic specificity. Diagnostic sensitivity of Sanger sequencing was 98% and for NGS it was 97%. NGS has comparable diagnostic performance to Sanger sequencing for the diagnosis of PH and, if implemented, would screen for all forms of PH simultaneously ensuring prompt diagnosis at decreased cost. PMID:25629080
Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling
Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien
2012-01-01
The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697
enoLOGOS: a versatile web tool for energy normalized sequence logos
Workman, Christopher T.; Yin, Yutong; Corcoran, David L.; Ideker, Trey; Stormo, Gary D.; Benos, Panayiotis V.
2005-01-01
enoLOGOS is a web-based tool that generates sequence logos from various input sources. Sequence logos have become a popular way to graphically represent DNA and amino acid sequence patterns from a set of aligned sequences. Each position of the alignment is represented by a column of stacked symbols with its total height reflecting the information content in this position. Currently, the available web servers are able to create logo images from a set of aligned sequences, but none of them generates weighted sequence logos directly from energy measurements or other sources. With the advent of high-throughput technologies for estimating the contact energy of different DNA sequences, tools that can create logos directly from binding affinity data are useful to researchers. enoLOGOS generates sequence logos from a variety of input data, including energy measurements, probability matrices, alignment matrices, count matrices and aligned sequences. Furthermore, enoLOGOS can represent the mutual information of different positions of the consensus sequence, a unique feature of this tool. Another web interface for our software, C2H2-enoLOGOS, generates logos for the DNA-binding preferences of the C2H2 zinc-finger transcription factor family members. enoLOGOS and C2H2-enoLOGOS are accessible over the web at . PMID:15980495
Effects of learning duration on implicit transfer.
Tanaka, Kanji; Watanabe, Katsumi
2015-10-01
Implicit learning and transfer in sequence acquisition play important roles in daily life. Several previous studies have found that even when participants are not aware that a transfer sequence has been transformed from the learning sequence, they are able to perform the transfer sequence faster and more accurately; this suggests implicit transfer of visuomotor sequences. Here, we investigated whether implicit transfer could be modulated by the number of trials completed in a learning session. Participants learned a sequence through trial and error, known as the m × n task (Hikosaka et al. in J Neurophysiol 74:1652-1661, 1995). In the learning session, participants were required to successfully perform the same sequence 4, 12, 16, or 20 times. In the transfer session, participants then learned one of two other sequences: one where the button configuration Vertically Mirrored the learning sequence, or a randomly generated sequence. Our results show that even when participants did not notice the alternation rule (i.e., vertical mirroring), their total working time was less and their total number of errors was lower in the transfer session compared with those who performed a Random sequence, irrespective of the number of trials completed in the learning session. This result suggests that implicit transfer likely occurs even over a shorter learning duration.
Riman, Sarah; Kiesler, Kevin M; Borsuk, Lisa A; Vallone, Peter M
2017-07-01
Standard Reference Materials SRM 2392 and 2392-I are intended to provide quality control when amplifying and sequencing human mitochondrial genome sequences. The National Institute of Standards and Technology (NIST) offers these SRMs to laboratories performing DNA-based forensic human identification, molecular diagnosis of mitochondrial diseases, mutation detection, evolutionary anthropology, and genetic genealogy. The entire mtGenome (∼16569bp) of SRM 2392 and 2392-I have previously been characterized at NIST by Sanger sequencing. Herein, we used the sensitivity, specificity, and accuracy offered by next generation sequencing (NGS) to: (1) re-sequence the certified values of the SRM 2392 and 2392-I; (2) confirm Sanger data with a high coverage new sequencing technology; (3) detect lower level heteroplasmies (<20%); and thus (4) support mitochondrial sequencing communities in the adoption of NGS methods. To obtain a consensus sequence for the SRMs as well as identify and control any bias, sequencing was performed using two NGS platforms and data was analyzed using different bioinformatics pipelines. Our results confirm five low level heteroplasmy sites that were not previously observed with Sanger sequencing: three sites in the GM09947A template in SRM 2392 and two sites in the HL-60 template in SRM 2392-I. Copyright © 2017 Elsevier B.V. All rights reserved.
Sources of PCR-induced distortions in high-throughput sequencing data sets
Kebschull, Justus M.; Zador, Anthony M.
2015-01-01
PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991
Rapid and accurate pyrosequencing of angiosperm plastid genomes
Moore, Michael J; Dhingra, Amit; Soltis, Pamela S; Shaw, Regina; Farmerie, William G; Folta, Kevin M; Soltis, Douglas E
2006-01-01
Background Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae). Results More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, the high accuracy observed in the GS 20 plastid genome sequence was generated for a significant reduction in time and cost over traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically. PMID:16934154
Xu, Yi-Hua; Manoharan, Herbert T; Pitot, Henry C
2007-09-01
The bisulfite genomic sequencing technique is one of the most widely used techniques to study sequence-specific DNA methylation because of its unambiguous ability to reveal DNA methylation status to the order of a single nucleotide. One characteristic feature of the bisulfite genomic sequencing technique is that a number of sample sequence files will be produced from a single DNA sample. The PCR products of bisulfite-treated DNA samples cannot be sequenced directly because they are heterogeneous in nature; therefore they should be cloned into suitable plasmids and then sequenced. This procedure generates an enormous number of sample DNA sequence files as well as adding extra bases belonging to the plasmids to the sequence, which will cause problems in the final sequence comparison. Finding the methylation status for each CpG in each sample sequence is not an easy job. As a result CpG PatternFinder was developed for this purpose. The main functions of the CpG PatternFinder are: (i) to analyze the reference sequence to obtain CpG and non-CpG-C residue position information. (ii) To tailor sample sequence files (delete insertions and mark deletions from the sample sequence files) based on a configuration of ClustalW multiple alignment. (iii) To align sample sequence files with a reference file to obtain bisulfite conversion efficiency and CpG methylation status. And, (iv) to produce graphics, highlighted aligned sequence text and a summary report which can be easily exported to Microsoft Office suite. CpG PatternFinder is designed to operate cooperatively with BioEdit, a freeware on the internet. It can handle up to 100 files of sample DNA sequences simultaneously, and the total CpG pattern analysis process can be finished in minutes. CpG PatternFinder is an ideal software tool for DNA methylation studies to determine the differential methylation pattern in a large number of individuals in a population. Previously we developed the CpG Analyzer program; CpG PatternFinder is our further effort to create software tools for DNA methylation studies.
Detecting and Characterizing Repeating Earthquake Sequences During Volcanic Eruptions
NASA Astrophysics Data System (ADS)
Tepp, G.; Haney, M. M.; Wech, A.
2017-12-01
A major challenge in volcano seismology is forecasting eruptions. Repeating earthquake sequences often precede volcanic eruptions or lava dome activity, providing an opportunity for short-term eruption forecasting. Automatic detection of these sequences can lead to timely eruption notification and aid in continuous monitoring of volcanic systems. However, repeating earthquake sequences may also occur after eruptions or along with magma intrusions that do not immediately lead to an eruption. This additional challenge requires a better understanding of the processes involved in producing these sequences to distinguish those that are precursory. Calculation of the inverse moment rate and concepts from the material failure forecast method can lead to such insights. The temporal evolution of the inverse moment rate is observed to differ for precursory and non-precursory sequences, and multiple earthquake sequences may occur concurrently. These observations suggest that sequences may occur in different locations or through different processes. We developed an automated repeating earthquake sequence detector and near real-time alarm to send alerts when an in-progress sequence is identified. Near real-time inverse moment rate measurements can further improve our ability to forecast eruptions by allowing for characterization of sequences. We apply the detector to eruptions of two Alaskan volcanoes: Bogoslof in 2016-2017 and Redoubt Volcano in 2009. The Bogoslof eruption produced almost 40 repeating earthquake sequences between its start in mid-December 2016 and early June 2017, 21 of which preceded an explosive eruption, and 2 sequences in the months before eruptive activity. Three of the sequences occurred after the implementation of the alarm in late March 2017 and successfully triggered alerts. The nearest seismometers to Bogoslof are over 45 km away, requiring a detector that can work with few stations and a relatively low signal-to-noise ratio. During the Redoubt eruption, earthquake sequences were observed in the months leading up to the eruptive activity beginning in March 2009 as well as immediately preceding 7 of the 19 explosive events. In contrast to Bogoslof, Redoubt has a local monitoring network which allows for better detection and more detailed analysis of the repeating earthquake sequences.
NASA Astrophysics Data System (ADS)
Zhong, G.; Wang, L.
2013-12-01
The northern South China Sea (SCS) margin is suggested as one of the ideal sites for documenting the late Cenozoic sea level changes for its characteristics of rapid sedimentation and relatively stable structural subsidence since the Late Miocene. In this study, high-resolution seismic profiles acquired by the Guangzhou Marine Geological Survey, calibrated by well control from the ODP sites 1146 and 1148, were utilized to construct a time-significant sequence stratigraphic framework, from which the history of relative sea level changes since the Late Miocene on the northern SCS margin was derived. Our study area is situated in the middle segment of the margin, between the Hainan Island to the west and the Dongsha Islands to the east. This region is to a certain degree far away from the active structural zones and is suggested as the most stable region in the margin. Totally 4000 km seismic profiles were used, which controls an area of about 6×104 km2. The seismic data have a vertical resolution of 5 to 15 m for the Upper Miocene to Quaternary interval. Three regional seismic sequence boundaries were identified. They subdivide the Late Miocene to Quaternary into three mega-sequences, which correspond to the Quaternary, Pliocene and Late Miocene, respectively by tying to well control. The Late Miocene mega-sequence, including 13 component sequences, is characterized with a basal incised canyon-developed interval overlain by three sets of progradational sequences formed in deep-water slope environments. The Pliocene mega-sequence consists of four sets of progradational sequences. Each sequence set contains one to three component sequences. At least 7 component sequences can be identified. The Quaternary mega-sequence consists of five sets of progradational sequences, in which the lower two constitute a retrogressive sequence set and the upper three a progradational sequence set. At least 9 component sequences can be recognized. Most of the component sequences within the Pliocene and Quaternary mega-sequences occur adjacent to modern shelf margin, and therefore were interpreted as shelf-marginal progradational deltaic sequences. A relative sea level curve since the Late Miocene was compiled by integrating the shift trajectory of onlap points, the stacking pattern of component sequences, and the chronostratigraphic diagrams. The curve contains about 29 cycles of relative sea level changes, showing a much higher resolution than the previous results in the region. These cycles constitute three large relative sea level rise and fall cycles. General trend of sea level variations is rising since the Late Miocene, which is opposite to the global sea level changes and is in accordance with the previous regional researches. This deviation is ascribed to the combined effects of very rapid regional subsidence and relative deficiency of sediment supply. This research was funded by the National Natural Science Foundation of China (Grant Nos. 91028003 and 41076020).
Microsatellite DNA in genomic survey sequences and UniGenes of loblolly pine
Craig S Echt; Surya Saha; Dennis L Deemer; C Dana Nelson
2011-01-01
Genomic DNA sequence databases are a potential and growing resource for simple sequence repeat (SSR) marker development in loblolly pine (Pinus taeda L.). Loblolly pine also has many expressed sequence tags (ESTs) available for microsatellite (SSR) marker development. We compared loblolly pine SSR densities in genome survey sequences (GSSs) to those in non-redundant...
ERIC Educational Resources Information Center
Du, Wenchong; Kelly, Steve W.
2013-01-01
The present study examines implicit sequence learning in adult dyslexics with a focus on comparing sequence transitions with different statistical complexities. Learning of a 12-item deterministic sequence was assessed in 12 dyslexic and 12 non-dyslexic university students. Both groups showed equivalent standard reaction time increments when the…
USDA-ARS?s Scientific Manuscript database
The complete nucleotide sequence of a recently discovered Florida (FL) isolate of Hibiscus infecting Cilevirus (HiCV) was determined by Sanger sequencing. The movement- and coat- protein gene sequences of the HiCV-FL isolate are more divergent than other genes of the previously sequenced HiCV-HA (Ha...
USDA-ARS?s Scientific Manuscript database
The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares high degree of sequence identity (99%) with several known STV isol...
USDA-ARS?s Scientific Manuscript database
Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...
ERIC Educational Resources Information Center
Taylor, D. Leland; Campbell, A. Malcolm; Heyer, Laurie J.
2013-01-01
Next-generation sequencing technologies have greatly reduced the cost of sequencing genomes. With the current sequencing technology, a genome is broken into fragments and sequenced, producing millions of "reads." A computer algorithm pieces these reads together in the genome assembly process. PHAST is a set of online modules…
TABLE D - WMO AND LOCAL (NCEP) DESCRIPTORS AS WELL AS THOSE AWAITING
sequences common to satellite observations None 3 05 Meteorological or hydrological sequences common to Vertical sounding sequences (conventional data) None 3 10 Vertical sounding sequences (satellite data) None (satellite data) None 3 13 Sequences common to image data None 3 14 Reserved None 3 15 Oceanographic report
Abe, Takashi; Hamano, Yuta; Ikemura, Toshimichi
2014-01-01
A strategy of evolutionary studies that can compare vast numbers of genome sequences is becoming increasingly important with the remarkable progress of high-throughput DNA sequencing methods. We previously established a sequence alignment-free clustering method "BLSOM" for di-, tri-, and tetranucleotide compositions in genome sequences, which can characterize sequence characteristics (genome signatures) of a wide range of species. In the present study, we generated BLSOMs for tetra- and pentanucleotide compositions in approximately one million sequence fragments derived from 101 eukaryotes, for which almost complete genome sequences were available. BLSOM recognized phylotype-specific characteristics (e.g., key combinations of oligonucleotide frequencies) in the genome sequences, permitting phylotype-specific clustering of the sequences without any information regarding the species. In our detailed examination of 12 Drosophila species, the correlation between their phylogenetic classification and the classification on the BLSOMs was observed to visualize oligonucleotides diagnostic for species-specific clustering.
Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.
Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil
2015-07-17
In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.
RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis
Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab
2012-01-01
RDNAnalyzer is an innovative computer based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate the DNA sequence or user can upload the sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various application for the sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides the tools for routinely performed sequence analysis by the biological scientists such as DNA replication, reverse compliment generation, transcription, translation, sequence specific information as total number of nucleotide bases, ATGC base contents along with their respective percentages and sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides user friendly environment for sequence analysis. It is freely available. Availability http://www.cemb.edu.pk/sw.html Abbreviations RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language. PMID:23055611
Froenicke, Lutz; Lavelle, Dean; Martineau, Belinda; Perroud, Bertrand; Michelmore, Richard
2013-01-01
Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce. PMID:23409088
Ancient DNA sequence revealed by error-correcting codes.
Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo
2015-07-10
A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.
Ancient DNA sequence revealed by error-correcting codes
Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo
2015-01-01
A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228
Folding and Stabilization of Native-Sequence-Reversed Proteins
Zhang, Yuanzhao; Weber, Jeffrey K; Zhou, Ruhong
2016-01-01
Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols. PMID:27113844
Ultra-deep mutant spectrum profiling: improving sequencing accuracy using overlapping read pairs.
Chen-Harris, Haiyin; Borucki, Monica K; Torres, Clinton; Slezak, Tom R; Allen, Jonathan E
2013-02-12
High throughput sequencing is beginning to make a transformative impact in the area of viral evolution. Deep sequencing has the potential to reveal the mutant spectrum within a viral sample at high resolution, thus enabling the close examination of viral mutational dynamics both within- and between-hosts. The challenge however, is to accurately model the errors in the sequencing data and differentiate real viral mutations, particularly those that exist at low frequencies, from sequencing errors. We demonstrate that overlapping read pairs (ORP) -- generated by combining short fragment sequencing libraries and longer sequencing reads -- significantly reduce sequencing error rates and improve rare variant detection accuracy. Using this sequencing protocol and an error model optimized for variant detection, we are able to capture a large number of genetic mutations present within a viral population at ultra-low frequency levels (<0.05%). Our rare variant detection strategies have important implications beyond viral evolution and can be applied to any basic and clinical research area that requires the identification of rare mutations.
Makiguchi, Wataru; Tanabe, Junki; Yamada, Hidekazu; Iida, Hiroki; Taura, Daisuke; Ousaka, Naoki; Yashima, Eiji
2015-01-01
Self-recognition and self-discrimination within complex mixtures are of fundamental importance in biological systems, which entirely rely on the preprogrammed monomer sequences and homochirality of biological macromolecules. Here we report artificial chirality- and sequence-selective successive self-sorting of chiral dimeric strands bearing carboxylic acid or amidine groups joined by chiral amide linkers with different sequences through homo- and complementary-duplex formations. A mixture of carboxylic acid dimers linked by racemic-1,2-cyclohexane bis-amides with different amide sequences (NHCO or CONH) self-associate to form homoduplexes in a completely sequence-selective way, the structures of which are different from each other depending on the linker amide sequences. The further addition of an enantiopure amide-linked amidine dimer to a mixture of the racemic carboxylic acid dimers resulted in the formation of a single optically pure complementary duplex with a 100% diastereoselectivity and complete sequence specificity stabilized by the amidinium–carboxylate salt bridges, leading to the perfect chirality- and sequence-selective duplex formation. PMID:26051291
Folding and Stabilization of Native-Sequence-Reversed Proteins
NASA Astrophysics Data System (ADS)
Zhang, Yuanzhao; Weber, Jeffrey K.; Zhou, Ruhong
2016-04-01
Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols.
Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.
2005-01-01
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248
Coarse-grained sequences for protein folding and design.
Brown, Scott; Fawzi, Nicolas J; Head-Gordon, Teresa
2003-09-16
We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the alpha/beta ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design.
The diploid genome sequence of an Asian individual
Wang, Jun; Wang, Wei; Li, Ruiqiang; Li, Yingrui; Tian, Geng; Goodman, Laurie; Fan, Wei; Zhang, Junqing; Li, Jun; Zhang, Juanbin; Guo, Yiran; Feng, Binxiao; Li, Heng; Lu, Yao; Fang, Xiaodong; Liang, Huiqing; Du, Zhenglin; Li, Dong; Zhao, Yiqing; Hu, Yujie; Yang, Zhenzhen; Zheng, Hancheng; Hellmann, Ines; Inouye, Michael; Pool, John; Yi, Xin; Zhao, Jing; Duan, Jinjie; Zhou, Yan; Qin, Junjie; Ma, Lijia; Li, Guoqing; Yang, Zhentao; Zhang, Guojie; Yang, Bin; Yu, Chang; Liang, Fang; Li, Wenjie; Li, Shaochuan; Li, Dawei; Ni, Peixiang; Ruan, Jue; Li, Qibin; Zhu, Hongmei; Liu, Dongyuan; Lu, Zhike; Li, Ning; Guo, Guangwu; Zhang, Jianguo; Ye, Jia; Fang, Lin; Hao, Qin; Chen, Quan; Liang, Yu; Su, Yeyang; san, A.; Ping, Cuo; Yang, Shuang; Chen, Fang; Li, Li; Zhou, Ke; Zheng, Hongkun; Ren, Yuanyuan; Yang, Ling; Gao, Yang; Yang, Guohua; Li, Zhuo; Feng, Xiaoli; Kristiansen, Karsten; Wong, Gane Ka-Shu; Nielsen, Rasmus; Durbin, Richard; Bolund, Lars; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian
2009-01-01
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics. PMID:18987735
Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.
2016-01-01
Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175
Flexible theta sequence compression mediated via phase precessing interneurons
Chadwick, Angus; van Rossum, Mark CW; Nolan, Matthew F
2016-01-01
Encoding of behavioral episodes as spike sequences during hippocampal theta oscillations provides a neural substrate for computations on events extended across time and space. However, the mechanisms underlying the numerous and diverse experimentally observed properties of theta sequences remain poorly understood. Here we account for theta sequences using a novel model constrained by the septo-hippocampal circuitry. We show that when spontaneously active interneurons integrate spatial signals and theta frequency pacemaker inputs, they generate phase precessing action potentials that can coordinate theta sequences in place cell populations. We reveal novel constraints on sequence generation, predict cellular properties and neural dynamics that characterize sequence compression, identify circuit organization principles for high capacity sequential representation, and show that theta sequences can be used as substrates for association of conditioned stimuli with recent and upcoming events. Our results suggest mechanisms for flexible sequence compression that are suited to associative learning across an animal’s lifespan. DOI: http://dx.doi.org/10.7554/eLife.20349.001 PMID:27929374
Roca, Alberto I
2014-01-01
The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
Matvienko, Marta; Kozik, Alexander; Froenicke, Lutz; Lavelle, Dean; Martineau, Belinda; Perroud, Bertrand; Michelmore, Richard
2013-01-01
Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce.
Coarse-grained sequences for protein folding and design
Brown, Scott; Fawzi, Nicolas J.; Head-Gordon, Teresa
2003-01-01
We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the α/β ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design. PMID:12963815
Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform
Van Nostrand, Joy D.; Ning, Daliang; Sun, Bo; Xue, Kai; Liu, Feifei; Deng, Ye; Liang, Yuting; Zhou, Jizhong
2017-01-01
Illumina’s MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1–3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined. These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility. PMID:28453559
Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wen, Chongqing; Wu, Liyou; Qin, Yujia
Illumina's MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered,more » the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1-3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined. These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility.« less
Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform
Wen, Chongqing; Wu, Liyou; Qin, Yujia; ...
2017-04-28
Illumina's MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered,more » the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1-3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined. These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility.« less
Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform.
Wen, Chongqing; Wu, Liyou; Qin, Yujia; Van Nostrand, Joy D; Ning, Daliang; Sun, Bo; Xue, Kai; Liu, Feifei; Deng, Ye; Liang, Yuting; Zhou, Jizhong
2017-01-01
Illumina's MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1-3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined. These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility.
30 CFR 75.1102 - Slippage and sequence switches.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 30 Mineral Resources 1 2013-07-01 2013-07-01 false Slippage and sequence switches. 75.1102 Section... sequence switches. [Statutory Provisions] Underground belt conveyors shall be equipped with slippage and sequence switches. ...
30 CFR 75.1102 - Slippage and sequence switches.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Slippage and sequence switches. 75.1102 Section... sequence switches. [Statutory Provisions] Underground belt conveyors shall be equipped with slippage and sequence switches. ...
30 CFR 75.1102 - Slippage and sequence switches.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 30 Mineral Resources 1 2014-07-01 2014-07-01 false Slippage and sequence switches. 75.1102 Section... sequence switches. [Statutory Provisions] Underground belt conveyors shall be equipped with slippage and sequence switches. ...
30 CFR 75.1102 - Slippage and sequence switches.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 30 Mineral Resources 1 2011-07-01 2011-07-01 false Slippage and sequence switches. 75.1102 Section... sequence switches. [Statutory Provisions] Underground belt conveyors shall be equipped with slippage and sequence switches. ...
30 CFR 75.1102 - Slippage and sequence switches.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 30 Mineral Resources 1 2012-07-01 2012-07-01 false Slippage and sequence switches. 75.1102 Section... sequence switches. [Statutory Provisions] Underground belt conveyors shall be equipped with slippage and sequence switches. ...
[Current applications of high-throughput DNA sequencing technology in antibody drug research].
Yu, Xin; Liu, Qi-Gang; Wang, Ming-Rong
2012-03-01
Since the publication of a high-throughput DNA sequencing technology based on PCR reaction was carried out in oil emulsions in 2005, high-throughput DNA sequencing platforms have been evolved to a robust technology in sequencing genomes and diverse DNA libraries. Antibody libraries with vast numbers of members currently serve as a foundation of discovering novel antibody drugs, and high-throughput DNA sequencing technology makes it possible to rapidly identify functional antibody variants with desired properties. Herein we present a review of current applications of high-throughput DNA sequencing technology in the analysis of antibody library diversity, sequencing of CDR3 regions, identification of potent antibodies based on sequence frequency, discovery of functional genes, and combination with various display technologies, so as to provide an alternative approach of discovery and development of antibody drugs.
Scar-less multi-part DNA assembly design automation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hillson, Nathan J.
The present invention provides a method of a method of designing an implementation of a DNA assembly. In an exemplary embodiment, the method includes (1) receiving a list of DNA sequence fragments to be assembled together and an order in which to assemble the DNA sequence fragments, (2) designing DNA oligonucleotides (oligos) for each of the DNA sequence fragments, and (3) creating a plan for adding flanking homology sequences to each of the DNA oligos. In an exemplary embodiment, the method includes (1) receiving a list of DNA sequence fragments to be assembled together and an order in which tomore » assemble the DNA sequence fragments, (2) designing DNA oligonucleotides (oligos) for each of the DNA sequence fragments, and (3) creating a plan for adding optimized overhang sequences to each of the DNA oligos.« less
Company profile: Complete Genomics Inc.
Reid, Clifford
2011-02-01
Complete Genomics Inc. is a life sciences company that focuses on complete human genome sequencing. It is taking a completely different approach to DNA sequencing than other companies in the industry. Rather than building a general-purpose platform for sequencing all organisms and all applications, it has focused on a single application - complete human genome sequencing. The company's Complete Genomics Analysis Platform (CGA™ Platform) comprises an integrated package of biochemistry, instrumentation and software that sequences human genomes at the highest quality, lowest cost and largest scale available. Complete Genomics offers a turnkey service that enables customers to outsource their human genome sequencing to the company's genome sequencing center in Mountain View, CA, USA. Customers send in their DNA samples, the company does all the library preparation, DNA sequencing, assembly and variant analysis, and customers receive research-ready data that they can use for biological discovery.
Quantum-Sequencing: Fast electronic single DNA molecule sequencing
NASA Astrophysics Data System (ADS)
Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant
2014-03-01
A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique ``electronic fingerprint'' of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shift depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.
RefSeq microbial genomes database: new representation and annotation strategy.
Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris; O'Neill, Kathleen; Tolstoy, Igor
2014-01-01
The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.
Linder, P; Dölz, R; Mossé, M O; Lazowska, J; Slonimski, P P
1993-01-01
The amount of nucleotide sequence data is increasing exponentially. We therefore made an effort to make a comprehensive database (LISTA) for the yeast Saccharomyces cerevisiae. Each sequence has been attributed a single genetic name and in the case of allelic duplicated sequences, synonyms are given, if necessary. For the nomenclature we have introduced a standard principle for naming gene sequences based on priority rules. We have also applied a simple method to distinguish duplicated sequences of one and the same gene from non-allelic sequences of duplicated genes. By using these principles we have sorted out a lot of confusion in the literature and databanks. Along with the genetic name, the mnemonic from the EMBL databank, the codon bias, reference of the publication of the sequence and the EMBL accession numbers are included in each entry. PMID:8332521
NASA Astrophysics Data System (ADS)
Brenner, Tom; Chen, Johnny; Stait-Gardner, Tim; Zheng, Gang; Matsukawa, Shingo; Price, William S.
2018-03-01
A new family of binomial-like inversion sequences, named jump-and-return sandwiches (JRS), has been developed by inserting a binomial-like sequence into a standard jump-and-return sequence, discovered through use of a stochastic Genetic Algorithm optimisation. Compared to currently used binomial-like inversion sequences (e.g., 3-9-19 and W5), the new sequences afford wider inversion bands and narrower non-inversion bands with an equal number of pulses. As an example, two jump-and-return sandwich 10-pulse sequences achieved 95% inversion at offsets corresponding to 9.4% and 10.3% of the non-inversion band spacing, compared to 14.7% for the binomial-like W5 inversion sequence, i.e., they afforded non-inversion bands about two thirds the width of the W5 non-inversion band.
Bobrova, E V; Liakhovetskiĭ, V A; Borshchevskaia, E R
2011-01-01
The dependence of errors during reproduction of a sequence of hand movements without visual feedback on the previous right- and left-hand performance ("prehistory") and on positions in space of sequence elements (random or ordered by the explicit rule) was analyzed. It was shown that the preceding information about the ordered positions of the sequence elements was used during right-hand movements, whereas left-hand movements were performed with involvement of the information about the random sequence. The data testify to a central mechanism of the analysis of spatial structure of sequence elements. This mechanism activates movement coding specific for the left hemisphere (vector coding) in case of an ordered sequence structure and positional coding specific for the right hemisphere in case of a random sequence structure.
A powerful graphical pulse sequence programming tool for magnetic resonance imaging.
Jie, Shen; Ying, Liu; Jianqi, Li; Gengying, Li
2005-12-01
A powerful graphical pulse sequence programming tool has been designed for creating magnetic resonance imaging (MRI) applications. It allows rapid development of pulse sequences in graphical mode (allowing for the visualization of sequences), and consists of three modules which include a graphical sequence editor, a parameter management module and a sequence compiler. Its key features are ease to use, flexibility and hardware independence. When graphic elements are combined with a certain text expressions, the graphical pulse sequence programming is as flexible as text-based programming tool. In addition, a hardware-independent design is implemented by using the strategy of two step compilations. To demonstrate the flexibility and the capability of this graphical sequence programming tool, a multi-slice fast spin echo experiment is performed on our home-made 0.3 T permanent magnet MRI system.
Thomsen, Martin Christen Frølund; Nielsen, Morten
2012-01-01
Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583
Tanaka, Mizuki; Sakai, Yoshifumi; Yamada, Osamu; Shintani, Takahiro; Gomi, Katsuya
2011-01-01
To investigate 3′-end-processing signals in Aspergillus oryzae, we created a nucleotide sequence data set of the 3′-untranslated region (3′ UTR) plus 100 nucleotides (nt) sequence downstream of the poly(A) site using A. oryzae expressed sequence tags and genomic sequencing data. This data set comprised 1065 sequences derived from 1042 unique genes. The average 3′ UTR length in A. oryzae was 241 nt, which is greater than that in yeast but similar to that in plants. The 3′ UTR and 100 nt sequence downstream of the poly(A) site is notably U-rich, while the region located 15–30 nt upstream of the poly(A) site is markedly A-rich. The most frequently found hexanucleotide in this A-rich region is AAUGAA, although this sequence accounts for only 6% of all transcripts. These data suggested that A. oryzae has no highly conserved sequence element equivalent to AAUAAA, a mammalian polyadenylation signal. We identified that putative 3′-end-processing signals in A. oryzae, while less well conserved than those in mammals, comprised four sequence elements: the furthest upstream U-rich element, A-rich sequence, cleavage site, and downstream U-rich element flanking the cleavage site. Although these putative 3′-end-processing signals are similar to those in yeast and plants, some notable differences exist between them. PMID:21586533
Verstappen, Koen M; Huijbregts, Loes; Spaninks, Mirlin; Wagenaar, Jaap A; Fluit, Ad C; Duim, Birgitta
2017-01-01
Staphylococcus pseudintermedius is an opportunistic pathogen in dogs and cats and occasionally causes infections in humans. S. pseudintermedius is often resistant to multiple classes of antimicrobials. It requires a reliable detection so that it is not misidentified as S. aureus. Phenotypic and currently-used molecular-based diagnostic assays lack specificity or are labour-intensive using multiplex PCR or nucleic acid sequencing. The aim of this study was to identify a specific target for real-time PCR by comparing whole genome sequences of S. pseudintermedius and non-pseudintermedius.Genome sequences were downloaded from public repositories and supplemented by isolates that were sequenced in this study. A Perl-script was written that analysed 300-nt fragments from a reference genome sequence of S. pseudintermedius and checked if this sequence was present in other S. pseudintermedius genomes (n = 74) and non-pseudintermedius genomes (n = 138). Six sequences specific for S. pseudintermedius were identified (sequence length between 300-500 nt). One sequence, which was located in the spsJ gene, was used to develop primers and a probe. The real-time PCR showed 100% specificity when testing for S. pseudintermedius isolates (n = 54), and eight other staphylococcal species (n = 43). In conclusion, a novel approach by comparing whole genome sequences identified a sequence that is specific for S. pseudintermedius and provided a real-time PCR target for rapid and reliable detection of S. pseudintermedius.
REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era
Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.
2009-01-01
The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
Yanagi, Tomohiro; Shirasawa, Kenta; Terachi, Mayuko; Isobe, Sachiko
2017-01-01
Cultivated strawberry ( Fragaria × ananassa Duch.) has homoeologous chromosomes because of allo-octoploidy. For example, two homoeologous chromosomes that belong to different sub-genome of allopolyploids have similar base sequences. Thus, when conducting de novo assembly of DNA sequences, it is difficult to determine whether these sequences are derived from the same chromosome. To avoid the difficulties associated with homoeologous chromosomes and demonstrate the possibility of sequencing allopolyploids using single chromosomes, we conducted sequence analysis using microdissected single somatic chromosomes of cultivated strawberry. Three hundred and ten somatic chromosomes of the Japanese octoploid strawberry 'Reiko' were individually selected under a light microscope using a microdissection system. DNA from 288 of the dissected chromosomes was successfully amplified using a DNA amplification kit. Using next-generation sequencing, we decoded the base sequences of the amplified DNA segments, and on the basis of mapping, we identified DNA sequences from 144 samples that were best matched to the reference genomes of the octoploid strawberry, F. × ananassa , and the diploid strawberry, F. vesca . The 144 samples were classified into seven pseudo-molecules of F. vesca . The coverage rates of the DNA sequences from the single chromosome onto all pseudo-molecular sequences varied from 3 to 29.9%. We demonstrated an efficient method for sequence analysis of allopolyploid plants using microdissected single chromosomes. On the basis of our results, we believe that whole-genome analysis of allopolyploid plants can be enhanced using methodology that employs microdissected single chromosomes.
Grahn, Jessica A.; Henry, Molly J.; McAuley, J. Devin
2011-01-01
How we measure time and integrate temporal cues from different sensory modalities are fundamental questions in neuroscience. Sensitivity to a “beat” (such as that routinely perceived in music) differs substantially between auditory and visual modalities. Here we examined beat sensitivity in each modality, and examined cross-modal influences, using functional magnetic resonance imaging (fMRI) to characterize brain activity during perception of auditory and visual rhythms. In separate fMRI sessions, participants listened to auditory sequences or watched visual sequences. The order of auditory and visual sequence presentation was counterbalanced so that cross-modal order effects could be investigated. Participants judged whether sequences were speeding up or slowing down, and the pattern of tempo judgments was used to derive a measure of sensitivity to an implied beat. As expected, participants were less sensitive to an implied beat in visual sequences than in auditory sequences. However, visual sequences produced a stronger sense of beat when preceded by auditory sequences with identical temporal structure. Moreover, increases in brain activity were observed in the bilateral putamen for visual sequences preceded by auditory sequences when compared to visual sequences without prior auditory exposure. No such order-dependent differences (behavioral or neural) were found for the auditory sequences. The results provide further evidence for the role of the basal ganglia in internal generation of the beat and suggest that an internal auditory rhythm representation may be activated during visual rhythm perception. PMID:20858544
Perceptions of randomness in binary sequences: Normative, heuristic, or both?
Reimers, Stian; Donkin, Chris; Le Pelley, Mike E
2018-03-01
When people consider a series of random binary events, such as tossing an unbiased coin and recording the sequence of heads (H) and tails (T), they tend to erroneously rate sequences with less internal structure or order (such as HTTHT) as more probable than sequences containing more structure or order (such as HHHHH). This is traditionally explained as a local representativeness effect: Participants assume that the properties of long sequences of random outcomes-such as an equal proportion of heads and tails, and little internal structure-should also apply to short sequences. However, recent theoretical work has noted that the probability of a particular sequence of say, heads and tails of length n, occurring within a larger (>n) sequence of coin flips actually differs by sequence, so P(HHHHH)
Differentiating Visual from Response Sequencing during Long-term Skill Learning.
Lynch, Brighid; Beukema, Patrick; Verstynen, Timothy
2017-01-01
The dual-system model of sequence learning posits that during early learning there is an advantage for encoding sequences in sensory frames; however, it remains unclear whether this advantage extends to long-term consolidation. Using the serial RT task, we set out to distinguish the dynamics of learning sequential orders of visual cues from learning sequential responses. On each day, most participants learned a new mapping between a set of symbolic cues and responses made with one of four fingers, after which they were exposed to trial blocks of either randomly ordered cues or deterministic ordered cues (12-item sequence). Participants were randomly assigned to one of four groups (n = 15 per group): Visual sequences (same sequence of visual cues across training days), Response sequences (same order of key presses across training days), Combined (same serial order of cues and responses on all training days), and a Control group (a novel sequence each training day). Across 5 days of training, sequence-specific measures of response speed and accuracy improved faster in the Visual group than any of the other three groups, despite no group differences in explicit awareness of the sequence. The two groups that were exposed to the same visual sequence across days showed a marginal improvement in response binding that was not found in the other groups. These results indicate that there is an advantage, in terms of rate of consolidation across multiple days of training, for learning sequences of actions in a sensory representational space, rather than as motoric representations.
Choe, Se-Eun; Nguyen, Thuy Thi-Dieu; Kang, Tae-Gyu; Kweon, Chang-Hee; Kang, Seung-Won
2011-09-01
Nuclear ribosomal DNA sequence of the second internal transcribed spacer (ITS-2) has been used efficiently to identify the liver fluke species collected from different hosts and various geographic regions. ITS-2 sequences of 19 Fasciola samples collected from Korean native cattle were determined and compared. Sequence comparison including ITS-2 sequences of isolates from this study and reference sequences from Fasciola hepatica and Fasciola gigantica and intermediate Fasciola in Genbank revealed seven identical variable sites of investigated isolates. Among 19 samples, 12 individuals had ITS-2 sequences completely identical to that of pure F. hepatica, five possessed the sequences identical to F. gigantica type, whereas two shared the sequence of both F. hepatica and F. gigantica. No variations in length and nucleotide composition of ITS-2 sequence were observed within isolates that belonged to F. hepatica or F. gigantica. At the position of 218, five Fasciola containing a single-base substitution (C>T) formed a distinct branch inside the F. gigantica-type group which was similar to those of Asian-origin isolates. The phylogenetic tree of the Fasciola spp. based on complete ITS-2 sequences from this study and other representative isolates in different locations clearly showed that pure F. hepatica, F. gigantica type and intermediate Fasciola were observed. The result also provided additional genetic evidence for the existence of three forms of Fasciola isolated from native cattle in Korea by genetic approach using ITS-2 sequence.
Development of Scoring Functions for Antibody Sequence Assessment and Optimization
Seeliger, Daniel
2013-01-01
Antibody development is still associated with substantial risks and difficulties as single mutations can radically change molecule properties like thermodynamic stability, solubility or viscosity. Since antibody generation methodologies cannot select and optimize for molecule properties which are important for biotechnological applications, careful sequence analysis and optimization is necessary to develop antibodies that fulfil the ambitious requirements of future drugs. While efforts to grab the physical principles of undesired molecule properties from the very bottom are becoming increasingly powerful, the wealth of publically available antibody sequences provides an alternative way to develop early assessment strategies for antibodies using a statistical approach which is the objective of this paper. Here, publically available sequences were used to develop heuristic potentials for the framework regions of heavy and light chains of antibodies of human and murine origin. The potentials take into account position dependent probabilities of individual amino acids but also conditional probabilities which are inevitable for sequence assessment and optimization. It is shown that the potentials derived from human sequences clearly distinguish between human sequences and sequences from mice and, hence, can be used as a measure of humaness which compares a given sequence with the phenotypic pool of human sequences instead of comparing sequence identities to germline genes. Following this line, it is demonstrated that, using the developed potentials, humanization of an antibody can be described as a simple mathematical optimization problem and that the in-silico generated framework variants closely resemble native sequences in terms of predicted immunogenicity. PMID:24204701
Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E
2013-08-15
Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.
Software for pre-processing Illumina next-generation sequencing short read sequences
2014-01-01
Background When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157 H7. Results Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacteria genomic short read sequences with the focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference-based assembly as measured by assembly contiguity and correctness. Conclusions Trimming of short read sequences can improve the quality of de novo and reference-based assembly and assembler performance. The parallel processing capability of ngsShoRT reduces trimming time and improves the memory efficiency when dealing with large datasets. We recommend combining sequencing artifacts removal, and quality score based read filtering and base trimming as the most consistent method for improving sequence quality and downstream assemblies. ngsShoRT source code, user guide and tutorial are available at http://research.bioinformatics.udel.edu/genomics/ngsShoRT/. ngsShoRT can be incorporated as a pre-processing step in genome and transcriptome assembly projects. PMID:24955109
Tahir, Muhammad N; Lockhart, Ben; Grinstead, Samuel; Mollov, Dimitre
2017-04-01
Bermuda grass samples were examined by transmission electron microscopy and 28-30 nm spherical virus particles were observed. Total RNA from these plants was subjected to high-throughput sequencing (HTS). The nearly full genome sequence of a panicovirus was identified from one HTS scaffold. Sanger sequencing was used to confirm the HTS results and complete the genome sequence of 4404 nt. This virus was provisionally named Bermuda grass latent virus (BGLV). Its predicted open reading frames follow the typical arrangement of the genus Panicovirus. Based on sequence comparisons and phylogenetic analyses BGLV differs from other viruses and therefore taxonomically it is a new member of the genus Panicovirus, family Tombusviridae.
Unlocking hidden genomic sequence
Keith, Jonathan M.; Cochran, Duncan A. E.; Lala, Gita H.; Adams, Peter; Bryant, Darryn; Mitchelson, Keith R.
2004-01-01
Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs. PMID:14973330
FASH: A web application for nucleotides sequence search.
Veksler-Lublinksy, Isana; Barash, Danny; Avisar, Chai; Troim, Einav; Chew, Paul; Kedem, Klara
2008-05-27
: FASH (Fourier Alignment Sequence Heuristics) is a web application, based on the Fast Fourier Transform, for finding remote homologs within a long nucleic acid sequence. Given a query sequence and a long text-sequence (e.g, the human genome), FASH detects subsequences within the text that are remotely-similar to the query. FASH offers an alternative approach to Blast/Fasta for querying long RNA/DNA sequences. FASH differs from these other approaches in that it does not depend on the existence of contiguous seed-sequences in its initial detection phase. The FASH web server is user friendly and very easy to operate. FASH can be accessed athttps://fash.bgu.ac.il:8443/fash/default.jsp (secured website).
ERIC Educational Resources Information Center
Flowers, Susan K.; Easter, Carla; Holmes, Andrea; Cohen, Brian; Bednarski, April E.; Mardis, Elaine R.; Wilson, Richard K.; Elgin, Sarah C. R.
2005-01-01
Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington…
USDA-ARS?s Scientific Manuscript database
A bacterial artificial chromosome (BAC) library and BAC-end sequences for Gossypium hirsutum L. have recently been developed. Here we report on genomic-based genome-wide SNP mining utilizing re-sequencing data with a BAC-end sequence reference for twelve G. hirsutum L. lines, one G. barbadense L. li...
37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.
Code of Federal Regulations, 2011 CFR
2011-07-01
... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...
USDA-ARS?s Scientific Manuscript database
Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...
Bandelt, Hans-Jürgen; Yao, Yong-Gang; Bravi, Claudio M; Salas, Antonio; Kivisild, Toomas
2009-03-01
Sequence analysis of the mitochondrial genome has become a routine method in the study of mitochondrial diseases. Quite often, the sequencing efforts in the search of pathogenic or disease-associated mutations are affected by technical and interpretive problems, caused by sample mix-up, contamination, biochemical problems, incomplete sequencing, misdocumentation and insufficient reference to previously published data. To assess data quality in case studies of mitochondrial diseases, it is recommended to compare any mtDNA sequence under consideration to their phylogenetically closest lineages available in the Web. The median network method has proven useful for visualizing potential problems with the data. We contrast some early reports of complete mtDNA sequences to more recent total mtDNA sequencing efforts in studies of various mitochondrial diseases. We conclude that the quality of complete mtDNA sequences generated in the medical field in the past few years is somewhat unsatisfactory and may even fall behind that of pioneer manual sequencing in the early nineties. Our study provides a paradigm for an a posteriori evaluation of sequence quality and for detection of potential problems with inferring a pathogenic status of a particular mutation.
A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS
Jiao, Xiaoli; Zheng, Xin; Ma, Liang; Kutty, Geetha; Gogineni, Emile; Sun, Qiang; Sherman, Brad T.; Hu, Xiaojun; Jones, Kristine; Raley, Castle; Tran, Bao; Munroe, David J.; Stephens, Robert; Liang, Dun; Imamichi, Tomozumi; Kovacs, Joseph A.; Lempicki, Richard A.; Huang, Da Wei
2013-01-01
PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20-kb) in contrast to the shorter reads produced by the first and second generation sequencing technologies. As a new platform, it is important to assess the sequencing error rate, as well as the quality control (QC) parameters associated with the PacBio sequence data. In this study, a mixture of 10 prior known, closely related DNA amplicons were sequenced using the PacBio RS sequencing platform. After aligning Circular Consensus Sequence (CCS) reads derived from the above sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC, and improved to 1.3% with an SVM based multi-parameter QC method. In addition, a De Novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are post error-corrected it is still necessary to perform appropriate QC on CCS reads in order to produce successful downstream bioinformatics analytical results. PMID:24179701
A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS.
Jiao, Xiaoli; Zheng, Xin; Ma, Liang; Kutty, Geetha; Gogineni, Emile; Sun, Qiang; Sherman, Brad T; Hu, Xiaojun; Jones, Kristine; Raley, Castle; Tran, Bao; Munroe, David J; Stephens, Robert; Liang, Dun; Imamichi, Tomozumi; Kovacs, Joseph A; Lempicki, Richard A; Huang, Da Wei
2013-07-31
PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20-kb) in contrast to the shorter reads produced by the first and second generation sequencing technologies. As a new platform, it is important to assess the sequencing error rate, as well as the quality control (QC) parameters associated with the PacBio sequence data. In this study, a mixture of 10 prior known, closely related DNA amplicons were sequenced using the PacBio RS sequencing platform. After aligning Circular Consensus Sequence (CCS) reads derived from the above sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC, and improved to 1.3% with an SVM based multi-parameter QC method. In addition, a De Novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are post error-corrected it is still necessary to perform appropriate QC on CCS reads in order to produce successful downstream bioinformatics analytical results.
Jakupciak, John P; Wells, Jeffrey M; Karalus, Richard J; Pawlowski, David R; Lin, Jeffrey S; Feldman, Andrew B
2013-01-01
Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations.
Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.
2013-01-01
Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204
Evolutionary Dynamics of the Gametologous CTNNB1 Gene on the Z and W Chromosomes of Snakes.
Laopichienpong, Nararat; Muangmai, Narongrit; Chanhome, Lawan; Suntrarachun, Sunutcha; Twilprawat, Panupon; Peyachoknagul, Surin; Srikulnath, Kornsorn
2017-03-01
Snakes exhibit genotypic sex determination with female heterogamety (ZZ males and ZW females), and the state of sex chromosome differentiation also varies among lineages. To investigate the evolutionary history of homologous genes located in the nonrecombining region of differentiated sex chromosomes in snakes, partial sequences of the gametologous CTNNB1 gene were analyzed for 12 species belonging to henophid (Cylindrophiidae, Xenopeltidae, and Pythonidae) and caenophid snakes (Viperidae, Elapidae, and Colubridae). Nonsynonymous/synonymous substitution ratios (Ka/Ks) in coding sequences were low (Ka/Ks < 1) between CTNNB1Z and CTNNB1W, suggesting that these 2 genes may have similar functional properties. However, frequencies of intron sequence substitutions and insertion–deletions were higher in CTNNB1Z than CTNNB1W, suggesting that Z-linked sequences evolved faster than W-linked sequences. Molecular phylogeny based on both intron and exon sequences showed the presence of 2 major clades: 1) Z-linked sequences of Caenophidia and 2) W-linked sequences of Caenophidia clustered with Z-linked sequences of Henophidia, which suggests that the sequence divergence between CTNNB1Z and CTNNB1W in Caenophidia may have occurred by the cessation of recombination after the split from Henophidia.
Genomic sequencing of Pleistocene cave bears
DOE Office of Scientific and Technical Information (OSTI.GOV)
Noonan, James P.; Hofreiter, Michael; Smith, Doug
2005-04-01
Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of {approx}1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome,more » the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.« less
A High-Throughput Process for the Solid-Phase Purification of Synthetic DNA Sequences
Grajkowski, Andrzej; Cieślak, Jacek; Beaucage, Serge L.
2017-01-01
An efficient process for the purification of synthetic phosphorothioate and native DNA sequences is presented. The process is based on the use of an aminopropylated silica gel support functionalized with aminooxyalkyl functions to enable capture of DNA sequences through an oximation reaction with the keto function of a linker conjugated to the 5′-terminus of DNA sequences. Deoxyribonucleoside phosphoramidites carrying this linker, as a 5′-hydroxyl protecting group, have been synthesized for incorporation into DNA sequences during the last coupling step of a standard solid-phase synthesis protocol executed on a controlled pore glass (CPG) support. Solid-phase capture of the nucleobase- and phosphate-deprotected DNA sequences released from the CPG support is demonstrated to proceed near quantitatively. Shorter than full-length DNA sequences are first washed away from the capture support; the solid-phase purified DNA sequences are then released from this support upon reaction with tetra-n-butylammonium fluoride in dry dimethylsulfoxide (DMSO) and precipitated in tetrahydrofuran (THF). The purity of solid-phase-purified DNA sequences exceeds 98%. The simulated high-throughput and scalability features of the solid-phase purification process are demonstrated without sacrificing purity of the DNA sequences. PMID:28628204
Sequence investigation of 34 forensic autosomal STRs with massively parallel sequencing.
Zhang, Suhua; Niu, Yong; Bian, Yingnan; Dong, Rixia; Liu, Xiling; Bao, Yun; Jin, Chao; Zheng, Hancheng; Li, Chengtao
2018-05-01
STRs vary not only in the length of the repeat units and the number of repeats but also in the region with which they conform to an incremental repeat pattern. Massively parallel sequencing (MPS) offers new possibilities in the analysis of STRs since they can simultaneously sequence multiple targets in a single reaction and capture potential internal sequence variations. Here, we sequenced 34 STRs applied in the forensic community of China with a custom-designed panel. MPS performance were evaluated from sequencing reads analysis, concordance study and sensitivity testing. High coverage sequencing data were obtained to determine the constitute ratios and heterozygous balance. No actual inconsistent genotypes were observed between capillary electrophoresis (CE) and MPS, demonstrating the reliability of the panel and the MPS technology. With the sequencing data from the 200 investigated individuals, 346 and 418 alleles were obtained via CE and MPS technologies at the 34 STRs, indicating MPS technology provides higher discrimination than CE detection. The whole study demonstrated that STR genotyping with the custom panel and MPS technology has the potential not only to reveal length and sequence variations but also to satisfy the demands of high throughput and high multiplexing with acceptable sensitivity.
Fourment, Mathieu; Gibbs, Mark J
2008-01-01
Background Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. Results The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. Conclusion VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically. PMID:18251994
Rapid DNA Sequencing by Direct Nanoscale Reading of Nucleotide Bases on Individual DNA Chains
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, James Weifu; Meller, Amit
2007-01-01
Since the independent invention of DNA sequencing by Sanger and by Gilbert 30 years ago, it has grown from a small scale technique capable of reading several kilobase-pair of sequence per day into today's multibillion dollar industry. This growth has spurred the development of new sequencing technologies that do not involve either electrophoresis or Sanger sequencing chemistries. Sequencing by Synthesis (SBS) involves multiple parallel micro-sequencing addition events occurring on a surface, where data from each round is detected by imaging. New High Throughput Technologies for DNA Sequencing and Genomics is the second volume in the Perspectives in Bioanalysis series, whichmore » looks at the electroanalytical chemistry of nucleic acids and proteins, development of electrochemical sensors and their application in biomedicine and in the new fields of genomics and proteomics. The authors have expertly formatted the information for a wide variety of readers, including new developments that will inspire students and young scientists to create new tools for science and medicine in the 21st century. Reviews of complementary developments in Sanger and SBS sequencing chemistries, capillary electrophoresis and microdevice integration, MS sequencing and applications set the framework for the book.« less
Spectra library assisted de novo peptide sequencing for HCD and ETD spectra pairs.
Yan, Yan; Zhang, Kaizhong
2016-12-23
De novo peptide sequencing via tandem mass spectrometry (MS/MS) has been developed rapidly in recent years. With the use of spectra pairs from the same peptide under different fragmentation modes, performance of de novo sequencing is greatly improved. Currently, with large amount of spectra sequenced everyday, spectra libraries containing tens of thousands of annotated experimental MS/MS spectra become available. These libraries provide information of the spectra properties, thus have the potential to be used with de novo sequencing to improve its performance. In this study, an improved de novo sequencing method assisted with spectra library is proposed. It uses spectra libraries as training datasets and introduces significant scores of the features used in our previous de novo sequencing method for HCD and ETD spectra pairs. Two pairs of HCD and ETD spectral datasets were used to test the performance of the proposed method and our previous method. The results show that this proposed method achieves better sequencing accuracy with higher ranked correct sequences and less computational time. This paper proposed an advanced de novo sequencing method for HCD and ETD spectra pair and used information from spectra libraries and significant improved previous similar methods.
Solopchuk, Oleg; Alamia, Andrea; Dricot, Laurence; Duque, Julie; Zénon, Alexandre
2017-12-01
Neuroimaging studies have repeatedly emphasized the role of the supplementary motor area (SMA) in motor sequence learning, but interferential approaches have led to inconsistent findings. Here, we aimed to test the role of the SMA in motor skill learning by combining interferential and neuroimaging techniques. Sixteen subjects were trained on simple finger movement sequences for 4 days. Afterwards, they underwent two neuroimaging sessions, in which they executed both trained and novel sequences. Prior to entering the scanner, the subjects received inhibitory transcranial magnetic stimulation (TMS) over the SMA or a control site. Using multivariate fMRI analysis, we confirmed that motor training enhances the neural representation of motor sequences in the SMA, in accordance with previous findings. However, although SMA inhibition altered sequence representation (i.e. between-sequence decoding accuracy) in this area, behavioural performance remained unimpaired. Our findings question the causal link between the neuroimaging correlate of elementary motor sequence representation in the SMA and sequence generation, calling for a more thorough investigation of the role of this region in performance of learned motor sequences. Copyright © 2017 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fields, C.A.
1996-06-01
The objective of this project is the development of practical software to automate the identification of genes in anonymous DNA sequences from the human, and other higher eukaryotic genomes. A software system for automated sequence analysis, gm (gene modeler) has been designed, implemented, tested, and distributed to several dozen laboratories worldwide. A significantly faster, more robust, and more flexible version of this software, gm 2.0 has now been completed, and is being tested by operational use to analyze human cosmid sequence data. A range of efforts to further understand the features of eukaryoyic gene sequences are also underway. This progressmore » report also contains papers coming out of the project including the following: gm: a Tool for Exploratory Analysis of DNA Sequence Data; The Human THE-LTR(O) and MstII Interspersed Repeats are subfamilies of a single widely distruted highly variable repeat family; Information contents and dinucleotide compostions of plant intron sequences vary with evolutionary origin; Splicing signals in Drosophila: intron size, information content, and consensus sequences; Integration of automated sequence analysis into mapping and sequencing projects; Software for the C. elegans genome project.« less
Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko
2017-07-12
Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.
Using expected sequence features to improve basecalling accuracy of amplicon pyrosequencing data.
Rask, Thomas S; Petersen, Bent; Chen, Donald S; Day, Karen P; Pedersen, Anders Gorm
2016-04-22
Amplicon pyrosequencing targets a known genetic region and thus inherently produces reads highly anticipated to have certain features, such as conserved nucleotide sequence, and in the case of protein coding DNA, an open reading frame. Pyrosequencing errors, consisting mainly of nucleotide insertions and deletions, are on the other hand likely to disrupt open reading frames. Such an inverse relationship between errors and expectation based on prior knowledge can be used advantageously to guide the process known as basecalling, i.e. the inference of nucleotide sequence from raw sequencing data. The new basecalling method described here, named Multipass, implements a probabilistic framework for working with the raw flowgrams obtained by pyrosequencing. For each sequence variant Multipass calculates the likelihood and nucleotide sequence of several most likely sequences given the flowgram data. This probabilistic approach enables integration of basecalling into a larger model where other parameters can be incorporated, such as the likelihood for observing a full-length open reading frame at the targeted region. We apply the method to 454 amplicon pyrosequencing data obtained from a malaria virulence gene family, where Multipass generates 20 % more error-free sequences than current state of the art methods, and provides sequence characteristics that allow generation of a set of high confidence error-free sequences. This novel method can be used to increase accuracy of existing and future amplicon sequencing data, particularly where extensive prior knowledge is available about the obtained sequences, for example in analysis of the immunoglobulin VDJ region where Multipass can be combined with a model for the known recombining germline genes. Multipass is available for Roche 454 data at http://www.cbs.dtu.dk/services/MultiPass-1.0 , and the concept can potentially be implemented for other sequencing technologies as well.
Li, Jonathan Z; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J; Samuel, Reshmi; Vardhanabhuti, Saran; Zheng, Lu; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C; Henn, Matthew R; Kuritzkes, Daniel R; Hide, Winston; Wilson, Cara C; Berzins, Baiba I; Acosta, Edward P; Bastow, Barbara; Kim, Peter S; Read, Sarah W; Janik, Jennifer; Meres, Debra S; Lederman, Michael M; Mong-Kryspin, Lori; Shaw, Karl E; Zimmerman, Louis G; Leavitt, Randi; De La Rosa, Guy; Jennings, Amy
2014-01-01
The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R2 = 0.92, P<0.001). Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.
Holcomb, C L; Rastrou, M; Williams, T C; Goodridge, D; Lazaro, A M; Tilanus, M; Erlich, H A
2014-01-01
The high-resolution human leukocyte antigen (HLA) genotyping assay that we developed using 454 sequencing and Conexio software uses generic polymerase chain reaction (PCR) primers for DRB exon 2. Occasionally, we observed low abundance DRB amplicon sequences that resulted from in vitro PCR 'crossing over' between DRB1 and DRB3/4/5. These hybrid sequences, revealed by the clonal sequencing property of the 454 system, were generally observed at a read depth of 5%-10% of the true alleles. They usually contained at least one mismatch with the IMGT/HLA database, and consequently, were easily recognizable and did not cause a problem for HLA genotyping. Sometimes, however, these artifactual sequences matched a rare allele and the automatic genotype assignment was incorrect. These observations raised two issues: (1) could PCR conditions be modified to reduce such artifacts? and (2) could some of the rare alleles listed in the IMGT/HLA database be artifacts rather than true alleles? Because PCR crossing over occurs during late cycles of PCR, we compared DRB genotypes resulting from 28 and (our standard) 35 cycles of PCR. For all 21 cell line DNAs amplified for 35 cycles, crossover products were detected. In 33% of the cases, these hybrid sequences corresponded to named alleles. With amplification for only 28 cycles, these artifactual sequences were not detectable. To investigate whether some rare alleles in the IMGT/HLA database might be due to PCR artifacts, we analyzed four samples obtained from the investigators who submitted the sequences. In three cases, the sequences were generated from true alleles. In one case, our 454 sequencing revealed an error in the previously submitted sequence. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
2012-01-01
Background The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's genbank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from other mammalian genomes. Conclusions The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742
Gullapalli, Rama R; Desai, Ketaki V; Santana-Santos, Lucas; Kant, Jeffrey A; Becich, Michael J
2012-01-01
The Human Genome Project (HGP) provided the initial draft of mankind's DNA sequence in 2001. The HGP was produced by 23 collaborating laboratories using Sanger sequencing of mapped regions as well as shotgun sequencing techniques in a process that occupied 13 years at a cost of ~$3 billion. Today, Next Generation Sequencing (NGS) techniques represent the next phase in the evolution of DNA sequencing technology at dramatically reduced cost compared to traditional Sanger sequencing. A single laboratory today can sequence the entire human genome in a few days for a few thousand dollars in reagents and staff time. Routine whole exome or even whole genome sequencing of clinical patients is well within the realm of affordability for many academic institutions across the country. This paper reviews current sequencing technology methods and upcoming advancements in sequencing technology as well as challenges associated with data generation, data manipulation and data storage. Implementation of routine NGS data in cancer genomics is discussed along with potential pitfalls in the interpretation of the NGS data. The overarching importance of bioinformatics in the clinical implementation of NGS is emphasized.[7] We also review the issue of physician education which also is an important consideration for the successful implementation of NGS in the clinical workplace. NGS technologies represent a golden opportunity for the next generation of pathologists to be at the leading edge of the personalized medicine approaches coming our way. Often under-emphasized issues of data access and control as well as potential ethical implications of whole genome NGS sequencing are also discussed. Despite some challenges, it's hard not to be optimistic about the future of personalized genome sequencing and its potential impact on patient care and the advancement of knowledge of human biology and disease in the near future.
FRAGS: estimation of coding sequence substitution rates from fragmentary data
Swart, Estienne C; Hide, Winston A; Seoighe, Cathal
2004-01-01
Background Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account. Results We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper-bounds on the amount of sequencing error in the datasets that we have analysed. Conclusion We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data is available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics our system enables the user to manage and query alignment and substitution data. PMID:15005802
Wolffe, E J; Gause, W C; Pelfrey, C M; Holland, S M; Steinberg, A D; August, J T
1990-01-05
We describe the isolation and sequencing of a cDNA encoding mouse Pgp-1. An oligonucleotide probe corresponding to the NH2-terminal sequence of the purified protein was synthesized by the polymerase chain reaction and used to screen a mouse macrophage lambda gt11 library. A cDNA clone with an insert of 1.2 kilobases was selected and sequenced. In Northern blot analysis, only cells expressing Pgp-1 contained mRNA species that hybridized with this Pgp-1 cDNA. The nucleotide sequence of the cDNA has a single open reading frame that yields a protein-coding sequence of 1076 base pairs followed by a 132-base pair 3'-untranslated sequence that includes a putative polyadenylation signal but no poly(A) tail. The translated sequence comprises a 13-amino acid signal peptide followed by a polypeptide core of 345 residues corresponding to an Mr of 37,800. Portions of the deduced amino acid sequence were identical to those obtained by amino acid sequence analysis from the purified glycoprotein, confirming that the cDNA encodes Pgp-1. The predicted structure of Pgp-1 includes an NH2-terminal extracellular domain (residues 14-265), a transmembrane domain (residues 266-286), and a cytoplasmic tail (residues 287-358). Portions of the mouse Pgp-1 sequence are highly similar to that of the human CD44 cell surface glycoprotein implicated in cell adhesion. The protein also shows sequence similarity to the proteoglycan tandem repeat sequences found in cartilage link protein and cartilage proteoglycan core protein which are thought to be involved in binding to hyaluronic acid.
Osmundson, Todd W.; Robert, Vincent A.; Schoch, Conrad L.; Baker, Lydia J.; Smith, Amy; Robich, Giovanni; Mizzan, Luca; Garbelotto, Matteo M.
2013-01-01
Despite recent advances spearheaded by molecular approaches and novel technologies, species description and DNA sequence information are significantly lagging for fungi compared to many other groups of organisms. Large scale sequencing of vouchered herbarium material can aid in closing this gap. Here, we describe an effort to obtain broad ITS sequence coverage of the approximately 6000 macrofungal-species-rich herbarium of the Museum of Natural History in Venice, Italy. Our goals were to investigate issues related to large sequencing projects, develop heuristic methods for assessing the overall performance of such a project, and evaluate the prospects of such efforts to reduce the current gap in fungal biodiversity knowledge. The effort generated 1107 sequences submitted to GenBank, including 416 previously unrepresented taxa and 398 sequences exhibiting a best BLAST match to an unidentified environmental sequence. Specimen age and taxon affected sequencing success, and subsequent work on failed specimens showed that an ITS1 mini-barcode greatly increased sequencing success without greatly reducing the discriminating power of the barcode. Similarity comparisons and nonmetric multidimensional scaling ordinations based on pairwise distance matrices proved to be useful heuristic tools for validating the overall accuracy of specimen identifications, flagging potential misidentifications, and identifying taxa in need of additional species-level revision. Comparison of within- and among-species nucleotide variation showed a strong increase in species discriminating power at 1–2% dissimilarity, and identified potential barcoding issues (same sequence for different species and vice-versa). All sequences are linked to a vouchered specimen, and results from this study have already prompted revisions of species-sequence assignments in several taxa. PMID:23638077
Inferring Short-Range Linkage Information from Sequencing Chromatograms
Beggel, Bastian; Neumann-Fraune, Maria; Kaiser, Rolf; Verheyen, Jens; Lengauer, Thomas
2013-01-01
Direct Sanger sequencing of viral genome populations yields multiple ambiguous sequence positions. It is not straightforward to derive linkage information from sequencing chromatograms, which in turn hampers the correct interpretation of the sequence data. We present a method for determining the variants existing in a viral quasispecies in the case of two nearby ambiguous sequence positions by exploiting the effect of sequence context-dependent incorporation of dideoxynucleotides. The computational model was trained on data from sequencing chromatograms of clonal variants and was evaluated on two test sets of in vitro mixtures. The approach achieved high accuracies in identifying the mixture components of 97.4% on a test set in which the positions to be analyzed are only one base apart from each other, and of 84.5% on a test set in which the ambiguous positions are separated by three bases. In silico experiments suggest two major limitations of our approach in terms of accuracy. First, due to a basic limitation of Sanger sequencing, it is not possible to reliably detect minor variants with a relative frequency of no more than 10%. Second, the model cannot distinguish between mixtures of two or four clonal variants, if one of two sets of linear constraints is fulfilled. Furthermore, the approach requires repetitive sequencing of all variants that might be present in the mixture to be analyzed. Nevertheless, the effectiveness of our method on the two in vitro test sets shows that short-range linkage information of two ambiguous sequence positions can be inferred from Sanger sequencing chromatograms without any further assumptions on the mixture composition. Additionally, our model provides new insights into the established and widely used Sanger sequencing technology. The source code of our method is made available at http://bioinf.mpi-inf.mpg.de/publications/beggel/linkageinformation.zip. PMID:24376502
CLAST: CUDA implemented large-scale alignment search tool.
Yano, Masahiro; Mori, Hiroshi; Akiyama, Yutaka; Yamada, Takuji; Kurokawa, Ken
2014-12-11
Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. Therefore, there is a clear need for a sequence similarity search tool that can rapidly detect weak similarity in large datasets. We developed a tool, which we named CLAST (CUDA implemented large-scale alignment search tool), that enables analyses of millions of reads and thousands of reference genome sequences, and runs on NVIDIA Fermi architecture graphics processing units. CLAST has four main advantages over existing alignment tools. First, CLAST was capable of identifying sequence similarities ~80.8 times faster than BLAST and 9.6 times faster than BLAT. Second, CLAST executes global alignment as the default (local alignment is also an option), enabling CLAST to assign reads to taxonomic and functional groups based on evolutionarily distant nucleotide sequences with high accuracy. Third, CLAST does not need a preprocessed sequence database like Burrows-Wheeler Transform-based tools, and this enables CLAST to incorporate large, frequently updated sequence databases. Fourth, CLAST requires <2 GB of main memory, making it possible to run CLAST on a standard desktop computer or server node. CLAST achieved very high speed (similar to the Burrows-Wheeler Transform-based Bowtie 2 for long reads) and sensitivity (equal to BLAST, BLAT, and FR-HIT) without the need for extensive database preprocessing or a specialized computing platform. Our results demonstrate that CLAST has the potential to be one of the most powerful and realistic approaches to analyze the massive amount of sequence data from next-generation sequencing technologies.
Houseknecht, D.W.; Bird, K.J.
2004-01-01
Beaufortian strata (Jurassic-Lower Cretaceous) in the National Petroleum Reserve in Alaska (NPRA) are a focus of exploration since the 1994 discovery of the nearby Alpine oil field (>400 MMBO). These strata include the Kingak Shale, a succession of depositional sequences influenced by rift opening of the Arctic Ocean Basin. Interpretation of sequence stratigraphy and depositional facies from a regional two-dimensional seismic grid and well data allows the definition of four sequence sets that each displays unique stratal geometries and thickness trends across NPRA. A Lower to Middle Jurassic sequence set includes numerous transgressive-regressive sequences that collectively built a clastic shelf in north-central NPRA. Along the south-facing, lobate shelf margin, condensed shales in transgressive systems tracts downlap and coalesce into a basinal condensed section that is likely an important hydrocarbon source rock. An Oxfordian-Kimmeridgian sequence set, deposited during pulses of uplift on the Barrow arch, includes multiple transgressive-regressive sequences that locally contain well-winnowed, shoreface sandstones at the base of transgressive systems tracts. These shoreface sandstones and overlying shales, deposited during maximum flooding, form stratigraphic traps that are the main objective of exploration in the Alpine play in NPRA. A Valanginian sequence set includes at least two transgressive-regressive sequences that display relatively distal characteristics, suggesting high relative sea level. An important exception is the presence of a basal transgressive systems tract that locally contains shoreface sandstones of reservoir quality. A Hauterivian sequence set includes two transgressive-regressive sequences that constitute a shelf-margin wedge developed as the result of tectonic uplift along the Barrow arch during rift opening of the Arctic Ocean Basin. This sequence set displays stratal geometries suggesting incision and synsedimentary collapse of the shelf margin. ?? 2004. The American Association of Petroleum Geologists. All rights reserved.
Statistical learning of music- and language-like sequences and tolerance for spectral shifts.
Daikoku, Tatsuya; Yatomi, Yutaka; Yumoto, Masato
2015-02-01
In our previous study (Daikoku, Yatomi, & Yumoto, 2014), we demonstrated that the N1m response could be a marker for the statistical learning process of pitch sequence, in which each tone was ordered by a Markov stochastic model. The aim of the present study was to investigate how the statistical learning of music- and language-like auditory sequences is reflected in the N1m responses based on the assumption that both language and music share domain generality. By using vowel sounds generated by a formant synthesizer, we devised music- and language-like auditory sequences in which higher-ordered transitional rules were embedded according to a Markov stochastic model by controlling fundamental (F0) and/or formant frequencies (F1-F2). In each sequence, F0 and/or F1-F2 were spectrally shifted in the last one-third of the tone sequence. Neuromagnetic responses to the tone sequences were recorded from 14 right-handed normal volunteers. In the music- and language-like sequences with pitch change, the N1m responses to the tones that appeared with higher transitional probability were significantly decreased compared with the responses to the tones that appeared with lower transitional probability within the first two-thirds of each sequence. Moreover, the amplitude difference was even retained within the last one-third of the sequence after the spectral shifts. However, in the language-like sequence without pitch change, no significant difference could be detected. The pitch change may facilitate the statistical learning in language and music. Statistically acquired knowledge may be appropriated to process altered auditory sequences with spectral shifts. The relative processing of spectral sequences may be a domain-general auditory mechanism that is innate to humans. Copyright © 2014 Elsevier Inc. All rights reserved.
Reilly, Kevin J.; Spencer, Kristie A.
2013-01-01
The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absent length effects observed in the speakers with ataxic dysarthria is consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involves processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121
Identification of Mycobacterium spp. of veterinary importance using rpoB gene sequencing
2011-01-01
Background Studies conducted on Mycobacterium spp. isolated from human patients indicate that sequencing of a 711 bp portion of the rpoB gene can be useful in assigning a species identity, particularly for members of the Mycobacterium avium complex (MAC). Given that MAC are important pathogens in livestock, companion animals, and zoo/exotic animals, we were interested in evaluating the use of rpoB sequencing for identification of Mycobacterium isolates of veterinary origin. Results A total of 386 isolates, collected over 2008 - June 2011 from 378 animals (amphibians, reptiles, birds, and mammals) underwent PCR and sequencing of a ~ 711 bp portion of the rpoB gene; 310 isolates (80%) were identified to the species level based on similarity at ≥ 98% with a reference sequence. The remaining 76 isolates (20%) displayed < 98% similarity with reference sequences and were assigned to a clade based on their location in a neighbor-joining tree containing reference sequences. For a subset of 236 isolates that received both 16S rRNA and rpoB sequencing, 167 (70%) displayed a similar species/clade assignation for both sequencing methods. For the remaining 69 isolates, species/clade identities were different with each sequencing method. Mycobacterium avium subsp. hominissuis was the species most frequently isolated from specimens from pigs, cervids, companion animals, cattle, and exotic/zoo animals. Conclusions rpoB sequencing proved useful in identifying Mycobacterium isolates of veterinary origin to clade, species, or subspecies levels, particularly for assemblages (such as the MAC) where 16S rRNA sequencing alone is not adequate to demarcate these taxa. rpoB sequencing can represent a cost-effective identification tool suitable for routine use in the veterinary diagnostic laboratory. PMID:22118247
Grimm, Alexandra; Meyer, Heiko; Nickel, Marcel D; Nittka, Mathias; Raithel, Esther; Chaudry, Oliver; Friedberger, Andreas; Uder, Michael; Kemmler, Wolfgang; Quick, Harald H; Engelke, Klaus
2018-06-01
The purpose of this study is to evaluate and compare 2-point (2pt), 3-point (3pt), and 6-point (6pt) Dixon magnetic resonance imaging (MRI) sequences with flexible echo times (TE) to measure proton density fat fraction (PDFF) within muscles. Two subject groups were recruited (G1: 23 young and healthy men, 31 ± 6 years; G2: 50 elderly men, sarcopenic, 77 ± 5 years). A 3-T MRI system was used to perform Dixon imaging on the left thigh. PDFF was measured with six Dixon prototype sequences: 2pt, 3pt, and 6pt sequences once with optimal TEs (in- and opposed-phase echo times), lower resolution, and higher bandwidth (optTE sequences) and once with higher image resolution (highRes sequences) and shortest possible TE, respectively. Intra-fascia PDFF content was determined. To evaluate the comparability among the sequences, Bland-Altman analysis was performed. The highRes 6pt Dixon sequences served as reference as a high correlation of this sequence to magnetic resonance spectroscopy has been shown before. The PDFF difference between the highRes 6pt Dixon sequence and the optTE 6pt, both 3pt, and the optTE 2pt was low (between 2.2% and 4.4%), however, not to the highRes 2pt Dixon sequence (33%). For the optTE sequences, difference decreased with the number of echoes used. In conclusion, for Dixon sequences with more than two echoes, the fat fraction measurement was reliable with arbitrary echo times, while for 2pt Dixon sequences, it was reliable with dedicated in- and opposed-phase echo timing. Copyright © 2018 Elsevier B.V. All rights reserved.
Abdulhadi, Mike A; Perno, Joseph R; Melhem, Elias R; Nucifora, Paolo G P
2014-01-01
In patients with spinal stenosis, magnetic resonance imaging of the cervical spine can be improved by using 3D driven-equilibrium fast spin echo sequences to provide a high-resolution assessment of osseous and ligamentous structures. However, it is not yet clear whether 3D driven-equilibrium fast spin echo sequences adequately evaluate the spinal cord itself. As a result, they are generally supplemented by additional 2D fast spin echo sequences, adding time to the examination and potential discomfort to the patient. Here we investigate the hypothesis that in patients with spinal stenosis and spondylotic myelopathy, 3D driven-equilibrium fast spin echo sequences can characterize cord lesions equally well as 2D fast spin echo sequences. We performed a retrospective analysis of 30 adult patients with spondylotic myelopathy who had been examined with both 3D driven-equilibrium fast spin echo sequences and 2D fast spin echo sequences at the same scanning session. The two sequences were inspected separately for each patient, and visible cord lesions were manually traced. We found no significant differences between 3D driven-equilibrium fast spin echo and 2D fast spin echo sequences in the mean number, mean area, or mean transverse dimensions of spondylotic cord lesions. Nevertheless, the mean contrast-to-noise ratio of cord lesions was decreased on 3D driven-equilibrium fast spin echo sequences compared to 2D fast spin echo sequences. These findings suggest that 3D driven-equilibrium fast spin echo sequences do not need supplemental 2D fast spin echo sequences for the diagnosis of spondylotic myelopathy, but they may be less well suited for quantitative signal measurements in the spinal cord.
FRESCO: Referential compression of highly similar sequences.
Wandelt, Sebastian; Leser, Ulf
2013-01-01
In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitudes faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition,we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios way beyond state of the art, for instance,4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.
Exome-wide DNA capture and next generation sequencing in domestic and wild species.
Cosart, Ted; Beja-Pereira, Albano; Chen, Shanyuan; Ng, Sarah B; Shendure, Jay; Luikart, Gordon
2011-07-05
Gene-targeted and genome-wide markers are crucial to advance evolutionary biology, agriculture, and biodiversity conservation by improving our understanding of genetic processes underlying adaptation and speciation. Unfortunately, for eukaryotic species with large genomes it remains costly to obtain genome sequences and to develop genome resources such as genome-wide SNPs. A method is needed to allow gene-targeted, next-generation sequencing that is flexible enough to include any gene or number of genes, unlike transcriptome sequencing. Such a method would allow sequencing of many individuals, avoiding ascertainment bias in subsequent population genetic analyses.We demonstrate the usefulness of a recent technology, exon capture, for genome-wide, gene-targeted marker discovery in species with no genome resources. We use coding gene sequences from the domestic cow genome sequence (Bos taurus) to capture (enrich for), and subsequently sequence, thousands of exons of B. taurus, B. indicus, and Bison bison (wild bison). Our capture array has probes for 16,131 exons in 2,570 genes, including 203 candidate genes with known function and of interest for their association with disease and other fitness traits. We successfully sequenced and mapped exon sequences from across the 29 autosomes and X chromosome in the B. taurus genome sequence. Exon capture and high-throughput sequencing identified thousands of putative SNPs spread evenly across all reference chromosomes, in all three individuals, including hundreds of SNPs in our targeted candidate genes. This study shows exon capture can be customized for SNP discovery in many individuals and for non-model species without genomic resources. Our captured exome subset was small enough for affordable next-generation sequencing, and successfully captured exons from a divergent wild species using the domestic cow genome as reference.
Ultraaccurate genome sequencing and haplotyping of single human cells.
Chu, Wai Keung; Edge, Peter; Lee, Ho Suk; Bansal, Vikas; Bafna, Vineet; Huang, Xiaohua; Zhang, Kun
2017-11-21
Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10 -8 and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs. Copyright © 2017 the Author(s). Published by PNAS.
Boll, Daniel T; Lewin, Jonathan S; Duerk, Jeffrey L; Aschoff, Andrik J; Merkle, Elmar M
2004-05-01
To compare the appropriate pulse sequences for interventional device guidance during magnetic resonance (MR) imaging at 0.2 T and to evaluate the dependence of sequence selection on the anatomic region of the procedure. Using a C-arm 0.2 T system, four interventional MR sequences were applied in 23 liver cases and during MR-guided neck interventions in 13 patients. The imaging protocol consisted of: multislice turbo spin echo (TSE) T2w, sequential-slice fast imaging with steady precession (FISP), a time-reversed version of FISP (PSIF), and FISP with balanced gradients in all spatial directions (True-FISP) sequences. Vessel conspicuity was rated and contrast-to-noise ratio (CNR) was calculated for each sequence and a differential receiver operating characteristic was performed. Liver findings were detected in 96% using the TSE sequence. PSIF, FISP, and True-FISP imaging showed lesions in 91%, 61%, and 65%, respectively. The TSE sequence offered the best CNR, followed by PSIF imaging. Differential receiver operating characteristic analysis also rated TSE and PSIF to be the superior sequences. Lesions in the head and neck were detected in all cases by TSE and FISP, in 92% using True-FISP, and in 84% using PSIF. True-FISP offered the best CNR, followed by TSE imaging. Vessels appeared bright on FISP and True-FISP imaging and dark on the other sequences. In interventional MR imaging, no single sequence fits all purposes. Image guidance for interventional MR during liver procedures is best achieved by PSIF or TSE, whereas biopsies in the head and neck are best performed using FISP or True-FISP sequences.
Osmundson, Todd W; Robert, Vincent A; Schoch, Conrad L; Baker, Lydia J; Smith, Amy; Robich, Giovanni; Mizzan, Luca; Garbelotto, Matteo M
2013-01-01
Despite recent advances spearheaded by molecular approaches and novel technologies, species description and DNA sequence information are significantly lagging for fungi compared to many other groups of organisms. Large scale sequencing of vouchered herbarium material can aid in closing this gap. Here, we describe an effort to obtain broad ITS sequence coverage of the approximately 6000 macrofungal-species-rich herbarium of the Museum of Natural History in Venice, Italy. Our goals were to investigate issues related to large sequencing projects, develop heuristic methods for assessing the overall performance of such a project, and evaluate the prospects of such efforts to reduce the current gap in fungal biodiversity knowledge. The effort generated 1107 sequences submitted to GenBank, including 416 previously unrepresented taxa and 398 sequences exhibiting a best BLAST match to an unidentified environmental sequence. Specimen age and taxon affected sequencing success, and subsequent work on failed specimens showed that an ITS1 mini-barcode greatly increased sequencing success without greatly reducing the discriminating power of the barcode. Similarity comparisons and nonmetric multidimensional scaling ordinations based on pairwise distance matrices proved to be useful heuristic tools for validating the overall accuracy of specimen identifications, flagging potential misidentifications, and identifying taxa in need of additional species-level revision. Comparison of within- and among-species nucleotide variation showed a strong increase in species discriminating power at 1-2% dissimilarity, and identified potential barcoding issues (same sequence for different species and vice-versa). All sequences are linked to a vouchered specimen, and results from this study have already prompted revisions of species-sequence assignments in several taxa.
Gullapalli, Rama R.; Desai, Ketaki V.; Santana-Santos, Lucas; Kant, Jeffrey A.; Becich, Michael J.
2012-01-01
The Human Genome Project (HGP) provided the initial draft of mankind's DNA sequence in 2001. The HGP was produced by 23 collaborating laboratories using Sanger sequencing of mapped regions as well as shotgun sequencing techniques in a process that occupied 13 years at a cost of ~$3 billion. Today, Next Generation Sequencing (NGS) techniques represent the next phase in the evolution of DNA sequencing technology at dramatically reduced cost compared to traditional Sanger sequencing. A single laboratory today can sequence the entire human genome in a few days for a few thousand dollars in reagents and staff time. Routine whole exome or even whole genome sequencing of clinical patients is well within the realm of affordability for many academic institutions across the country. This paper reviews current sequencing technology methods and upcoming advancements in sequencing technology as well as challenges associated with data generation, data manipulation and data storage. Implementation of routine NGS data in cancer genomics is discussed along with potential pitfalls in the interpretation of the NGS data. The overarching importance of bioinformatics in the clinical implementation of NGS is emphasized.[7] We also review the issue of physician education which also is an important consideration for the successful implementation of NGS in the clinical workplace. NGS technologies represent a golden opportunity for the next generation of pathologists to be at the leading edge of the personalized medicine approaches coming our way. Often under-emphasized issues of data access and control as well as potential ethical implications of whole genome NGS sequencing are also discussed. Despite some challenges, it's hard not to be optimistic about the future of personalized genome sequencing and its potential impact on patient care and the advancement of knowledge of human biology and disease in the near future. PMID:23248761
Wong, Danny Ka-Ho; Tsoi, Ottilia; Huang, Fung-Yu; Seto, Wai-Kay; Fung, James; Lai, Ching-Lung
2014-01-01
Nucleoside/nucleotide analogue for the treatment of chronic hepatitis B virus (HBV) infection is hampered by the emergence of drug resistance mutations. Conventional PCR sequencing cannot detect minor variants of <20%. We developed a modified co-amplification at lower denaturation temperature-PCR (COLD-PCR) method for the detection of HBV minority drug resistance mutations. The critical denaturation temperature for COLD-PCR was determined to be 78°C. Sensitivity of COLD-PCR sequencing was determined using serially diluted plasmids containing mixed proportions of HBV reverse transcriptase (rt) wild-type and mutant sequences. Conventional PCR sequencing detected mutations only if they existed in ≥25%, whereas COLD-PCR sequencing detected mutations when they existed in 5 to 10% of the viral population. The performance of COLD-PCR was compared to conventional PCR sequencing and a line probe assay (LiPA) using 215 samples obtained from 136 lamivudine- or telbivudine-treated patients with virological breakthrough. Among these 215 samples, drug resistance mutations were detected in 155 (72%), 148 (69%), and 113 samples (53%) by LiPA, COLD-PCR, and conventional PCR sequencing, respectively. Nineteen (9%) samples had mutations detectable by COLD-PCR but not LiPA, while 26 (12%) samples had mutations detectable by LiPA but not COLD-PCR, indicating both methods were comparable (P = 0.371). COLD-PCR was more sensitive than conventional PCR sequencing. Thirty-five (16%) samples had mutations detectable by COLD-PCR but not conventional PCR sequencing, while none had mutations detected by conventional PCR sequencing but not COLD-PCR (P < 0.0001). COLD-PCR sequencing is a simple method which is comparable to LiPA and superior to conventional PCR sequencing in detecting minor lamivudine/telbivudine resistance mutations. PMID:24951803
What are Whole Exome Sequencing and Whole Genome Sequencing?
... the future. For more information about DNA sequencing technologies and their use: Genetics Home Reference discusses whether ... University in St. Louis describes the different sequencing technologies and what the new technologies have meant for ...
Kaul, M G; Stork, A; Bansmann, P M; Nolte-Ernsting, C; Lund, G K; Weber, C; Adam, G
2004-11-01
To test the feasibility of k-space segmented gradient-echo pulse sequences for free-breathing coronary magnetic resonance angiography (cMRA) on a clinical 3T system. T2-prepared, fat-suppressed turbo field echo (TFE, turboFLASH, SFPGR) as well as balanced TFE (b-TFE, trueFISP, FIESTA, segmented SSFP) sequences with navigator gating for prospective motion correction were applied on a 3T system equipped with a six-element phased-array cardiac coil. In 15 healthy volunteers, the right coronary artery (RCA) was examined with TFE and b-TFE sequences. Due to examination time limitations, the left coronary artery (LM/LAD) was examined exclusively with the TFE sequence in ten volunteers. Image quality was graded on a five point scale (0 = not visualized to 4 = excellent). The length, diameter and sharpness of the vessels and the contrast-to-noise ratios (CNR) were measured. 98 % of all major segments (proximal/middle/distal) of the RCA could be seen with the TFE sequence and 82 % with the b-TFE sequence. The image quality for the three segments was graded higher for the TFE sequence (2.7/2.7/1.5) than for the b-TFE sequence (1.9/1.6/0.9) with P: (< or = 0.001/< or = 0.004/< or = 0.056). The kappa of the interobserver variability was 0.75 for the TFE sequence and 0.8 for the b-TFE sequence. The measured vessel lengths were longer for the TFE sequence (95 +/- 22 mm) than for the b-TFE sequence (80 +/- 40 mm; P < or = 0.115). No significant changes (P < or = 0.074, P < or = 0.145) in diameter and vessel sharpness of the RCAs were observed between the TFE (2.4 +/- 0.3 mm, 60 % +/- 5) and b-TFE sequences (2.4 +/- 0.3 mm, 62 % +/- 6). The CNR was higher for the TFE sequence (10.1 +/- 3.4) than for the b-TFE sequence (6.6 +/- 2.1; P < or = 0.014). All ten main and proximal segments of the LM/LAD, which were examined exclusively with the TFE sequence, were visible with grade 2.5 and 2.1. The middle segment was visible in seven cases with grade 1.3. In three cases, the distal segment was visible with grade 0.5. The vessel length was 78 +/- 27 mm and the CNR 11.9 +/- 2.4. The conventional TFE technique has demonstrated good feasibility for cMRA at 3T. In its operational availability at 3T, the b-TFE sequence is inferior to the TFE sequence.
Regulatory sequence analysis tools.
van Helden, Jacques
2003-07-01
The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.
SEQassembly: A Practical Tools Program for Coding Sequences Splicing
NASA Astrophysics Data System (ADS)
Lee, Hongbin; Yang, Hang; Fu, Lei; Qin, Long; Li, Huili; He, Feng; Wang, Bo; Wu, Xiaoming
CDS (Coding Sequences) is a portion of mRNA sequences, which are composed by a number of exon sequence segments. The construction of CDS sequence is important for profound genetic analysis such as genotyping. A program in MATLAB environment is presented, which can process batch of samples sequences into code segments under the guide of reference exon models, and splice these code segments of same sample source into CDS according to the exon order in queue file. This program is useful in transcriptional polymorphism detection and gene function study.
1998-12-01
Type II restriction enzymes, such as Eco R1 endonulease, present a unique advantage for the study of sequence-specific recognition because they leave a record of where they have been in the form of the cleaved ends of the DNA sites where they were bound. The differential behavior of a sequence -specific protein at sites of differing base sequence is the essence of the sequence-specificity; the core question is how do these proteins discriminate between different DNA sequences especially when the two sequences are very similar. Principal Investigator: Dan Carter/New Century Pharmaceuticals
Curated eutherian third party data gene data sets.
Premzl, Marko
2016-03-01
The free available eutherian genomic sequence data sets advanced scientific field of genomics. Of note, future revisions of gene data sets were expected, due to incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance in protection against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applicable in updates of 7 major eutherian gene data sets, including 812 complete coding sequences deposited in European Nucleotide Archive as curated third party data gene data sets.
Protein Crystal Eco R1 Endonulease-DNA Complex
NASA Technical Reports Server (NTRS)
1998-01-01
Type II restriction enzymes, such as Eco R1 endonulease, present a unique advantage for the study of sequence-specific recognition because they leave a record of where they have been in the form of the cleaved ends of the DNA sites where they were bound. The differential behavior of a sequence -specific protein at sites of differing base sequence is the essence of the sequence-specificity; the core question is how do these proteins discriminate between different DNA sequences especially when the two sequences are very similar. Principal Investigator: Dan Carter/New Century Pharmaceuticals
Combined hairpin-antisense compositions and methods for modulating expression
Shanklin, John; Nguyen, Tam
2014-08-05
A nucleotide construct comprising a nucleotide sequence that forms a stem and a loop, wherein the loop comprises a nucleotide sequence that modulates expression of a target, wherein the stem comprises a nucleotide sequence that modulates expression of a target, and wherein the target modulated by the nucleotide sequence in the loop and the target modulated by the nucleotide sequence in the stem may be the same or different. Vectors, methods of regulating target expression, methods of providing a cell, and methods of treating conditions comprising the nucleotide sequence are also disclosed.
Combined hairpin-antisense compositions and methods for modulating expression
Shanklin, John; Nguyen, Tam Huu
2015-11-24
A nucleotide construct comprising a nucleotide sequence that forms a stem and a loop, wherein the loop comprises a nucleotide sequence that modulates expression of a target, wherein the stem comprises a nucleotide sequence that modulates expression of a target, and wherein the target modulated by the nucleotide sequence in the loop and the target modulated by the nucleotide sequence in the stem may be the same or different. Vectors, methods of regulating target expression, methods of providing a cell, and methods of treating conditions comprising the nucleotide sequence are also disclosed.
Van Kreijl, C F; Bos, J L
1977-01-01
The repeating nucleotide sequence of 68 base pairs in the mtDNA from an ethidium-induced cytoplasmic petite mutant of yeast has been determined. For sequence analysis specifically primed and terminated RNA copies, obtained by in vitro transcription of the separated strands, were use. The sequence consists of 66 consecutive AT base pairs flanked by two GC pairs and comprises nearly all of the mutant mitochondrial genome. The sequence, moreover, also represents the first part of wild-type mtDNA sequence so far. Images PMID:198740
Diversity of the P2 protein among nontypeable Haemophilus influenzae isolates.
Bell, J; Grass, S; Jeanteur, D; Munson, R S
1994-01-01
The genes for outer membrane protein P2 of four nontypeable Haemophilus influenzae strains were cloned and sequenced. The derived amino acid sequences were compared with the outer membrane protein P2 sequence from H. influenzae type b MinnA and the sequences of P2 from three additional nontypeable H. influenzae strains. The sequences were 76 to 94% identical. The sequences had regions with considerable variability separated by regions which were highly conserved. The variable regions mapped to putative surface-exposed loops of the protein. PMID:8188390
Jones, Darryl R; Thomas, Dallas; Alger, Nicholas; Ghavidel, Ata; Inglis, G Douglas; Abbott, D Wade
2018-01-01
Deposition of new genetic sequences in online databases is expanding at an unprecedented rate. As a result, sequence identification continues to outpace functional characterization of carbohydrate active enzymes (CAZymes). In this paradigm, the discovery of enzymes with novel functions is often hindered by high volumes of uncharacterized sequences particularly when the enzyme sequence belongs to a family that exhibits diverse functional specificities (i.e., polyspecificity). Therefore, to direct sequence-based discovery and characterization of new enzyme activities we have developed an automated in silico pipeline entitled: Sequence Analysis and Clustering of CarboHydrate Active enzymes for Rapid Informed prediction of Specificity (SACCHARIS). This pipeline streamlines the selection of uncharacterized sequences for discovery of new CAZyme or CBM specificity from families currently maintained on the CAZy website or within user-defined datasets. SACCHARIS was used to generate a phylogenetic tree of a GH43, a CAZyme family with defined subfamily designations. This analysis confirmed that large datasets can be organized into sequence clusters of manageable sizes that possess related functions. Seeding this tree with a GH43 sequence from Bacteroides dorei DSM 17855 (BdGH43b, revealed it partitioned as a single sequence within the tree. This pattern was consistent with it possessing a unique enzyme activity for GH43 as BdGH43b is the first described α-glucanase described for this family. The capacity of SACCHARIS to extract and cluster characterized carbohydrate binding module sequences was demonstrated using family 6 CBMs (i.e., CBM6s). This CBM family displays a polyspecific ligand binding profile and contains many structurally determined members. Using SACCHARIS to identify a cluster of divergent sequences, a CBM6 sequence from a unique clade was demonstrated to bind yeast mannan, which represents the first description of an α-mannan binding CBM. Additionally, we have performed a CAZome analysis of an in-house sequenced bacterial genome and a comparative analysis of B. thetaiotaomicron VPI-5482 and B. thetaiotaomicron 7330, to demonstrate that SACCHARIS can generate "CAZome fingerprints", which differentiate between the saccharolytic potential of two related strains in silico. Establishing sequence-function and sequence-structure relationships in polyspecific CAZyme families are promising approaches for streamlining enzyme discovery. SACCHARIS facilitates this process by embedding CAZyme and CBM family trees generated from biochemically to structurally characterized sequences, with protein sequences that have unknown functions. In addition, these trees can be integrated with user-defined datasets (e.g., genomics, metagenomics, and transcriptomics) to inform experimental characterization of new CAZymes or CBMs not currently curated, and for researchers to compare differential sequence patterns between entire CAZomes. In this light, SACCHARIS provides an in silico tool that can be tailored for enzyme bioprospecting in datasets of increasing complexity and for diverse applications in glycobiotechnology.
Lu, Chaoxia; Wu, Wei; Xiao, Jifang; Meng, Yan; Zhang, Shuyang; Zhang, Xue
2013-06-01
To detect pathogenic mutations in Marfan syndrome (MFS) using an Ion Torrent Personal Genome Machine (PGM) and to validate the result of targeted next-generation semiconductor sequencing for the diagnosis of genetic disorders. Peripheral blood samples were collected from three MFS patients and a normal control with informed consent. Genomic DNA was isolated by standard method and then subjected to targeted sequencing using an Ion Ampliseq(TM) Inherited Disease Panel. Three multiplex PCR reactions were carried out to amplify the coding exons of 328 genes including FBN1, TGFBR1 and TGFBR2. DNA fragments from different samples were ligated with barcoded sequencing adaptors. Template preparation and emulsion PCR, and Ion Sphere Particles enrichment were carried out using an Ion One Touch system. The ion sphere particles were sequenced on a 318 chip using the PGM platform. Data from the PGM runs were processed using an Ion Torrent Suite 3.2 software to generate sequence reads. After sequence alignment and extraction of SNPs and indels, all the variants were filtered against dbSNP137. DNA sequences were visualized with an Integrated Genomics Viewer. The most likely disease-causing variants were analyzed by Sanger sequencing. The PGM sequencing has yielded an output of 855.80 Mb, with a > 100 × median sequencing depth and a coverage of > 98% for the targeted regions in all the four samples. After data analysis and database filtering, one known missense mutation (p.E1811K) and two novel premature termination mutations (p.E2264X and p.L871FfsX23) in the FBN1 gene were identified in the three MFS patients. All mutations were verified by conventional Sanger sequencing. Pathogenic FBN1 mutations have been identified in all patients with MFS, indicating that the targeted next-generation sequencing on the PGM sequencers can be applied for accurate and high-throughput testing of genetic disorders.
Faragher, S G; Dalgarno, L
1986-07-20
The 3' untranslated (UT) sequences of the genomic RNAs of five geographic variants of the alphavirus Ross River virus (RRV) were determined and compared with the 3' UT sequence of RRV T48, the prototype strain. Part of the 3' UT region of Getah virus, a close serological relative of RRV, was also sequenced. The RRV 3' UT region varies markedly in length between variants. Large deletions or insertions, sequence rearrangements and single nucleotide substitutions are observed. A sequence tract of 49 to 58 nucleotides, which is repeated as four blocks in the RRV T48 3' UT region, occurs only once in the 3' UT region of one RRV strain (NB5092), indicating that the existence of repeat sequence blocks is not essential for RRV replication. However, the precise sequence of the 3' proximal copy of the repeat block and its position relative to the poly(A) tail were identical in all RRV isolates examined, suggesting that it has an important role in RRV replication. Nucleotide substitutions between RRV variants are distributed non-randomly along the length of the 3' UT region. The sequence of 120 to 130 nucleotides adjacent to the poly(A) tail is strongly conserved. Getah virus RNA contains three repeat sequence blocks in the 3' UT region. These are similar in sequence to those in RRV RNA but differ in their arrangement. Homology between the RRV and Getah 3' UT sequences is greatest in the 3' proximal repeat sequence block that shows three differences in 49 nucleotides. The 3' proximal repeat in Getah RNA occurs at the same position, relative to the poly(A) tail, as in all RRV variants. The RRV and Getah virus 3' UT sequences show extensive homology in the region between the 3' proximal repeat and the poly(A) tail but, apart from the repeat blocks themselves, they show no significant homology elsewhere.
Bashir, Ali; Bansal, Vikas; Bafna, Vineet
2010-06-18
Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The multitude of sequencing platforms, each with their unique characteristics, pose a number of design challenges, regarding the technology to be used and the depth of sequencing required for a particular sequencing application. Here we describe a number of analytical and empirical results to address design questions for two applications: detection of structural variations from paired-end sequencing and estimating mRNA transcript abundance. For structural variation, our results provide explicit trade-offs between the detection and resolution of rearrangement breakpoints, and the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short read data, these predictions show good concordance with Illumina 200 bp and 2 Kbp insert length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. With only 1 Million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect low expressed genes with greater than 95% probability. Together, our results form a generic framework for many design considerations related to high-throughput sequencing. We provide software tools http://bix.ucsd.edu/projects/NGS-DesignTools to derive platform independent guidelines for designing sequencing experiments (amount of sequencing, choice of insert length, mix of libraries) for novel applications of next generation sequencing.
Yokoyama, Naoaki; Sivakumar, Thillaiampalam; Tuvshintulga, Bumduuren; Hayashida, Kyoko; Igarashi, Ikuo; Inoue, Noboru; Long, Phung Thang; Lan, Dinh Thi Bich
2015-03-01
The genes that encode merozoite surface antigens (MSAs) in Babesia bovis are genetically diverse. In this study, we analyzed the genetic diversity of B. bovis MSA-1, MSA-2b, and MSA-2c genes in Vietnamese cattle and water buffaloes. Blood DNA samples from 258 cattle and 49 water buffaloes reared in the Thua Thien Hue province of Vietnam were screened with a B. bovis-specific diagnostic PCR assay. The B. bovis-positive DNA samples (23 cattle and 16 water buffaloes) were then subjected to PCR assays to amplify the MSA-1, MSA-2b, and MSA-2c genes. Sequencing analyses showed that the Vietnamese MSA-1 and MSA-2b sequences are genetically diverse, whereas MSA-2c is relatively conserved. The nucleotide identity values for these MSA gene sequences were similar in the cattle and water buffaloes. Consistent with the sequencing data, the Vietnamese MSA-1 and MSA-2b sequences were dispersed across several clades in the corresponding phylogenetic trees, whereas the MSA-2c sequences occurred in a single clade. Cattle- and water-buffalo-derived sequences also often clustered together on the phylogenetic trees. The Vietnamese MSA-1, MSA-2b, and MSA-2c sequences were then screened for recombination with automated methods. Of the seven recombination events detected, five and two were associated with the MSA-2b and MSA-2c recombinant sequences, respectively, whereas no MSA-1 recombinants were detected among the sequences analyzed. Recombination between the sequences derived from cattle and water buffaloes was very common, and the resultant recombinant sequences were found in both host animals. These data indicate that the genetic diversity of the MSA sequences does not differ between cattle and water buffaloes in Vietnam. They also suggest that recombination between the B. bovis MSA sequences in both cattle and water buffaloes might contribute to the genetic variation in these genes in Vietnam. Copyright © 2015 Elsevier B.V. All rights reserved.
Nowrousian, Minou; Stajich, Jason E.; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D.; Pöggeler, Stefanie; Read, Nick D.; Seiler, Stephan; Smith, Kristina M.; Zickler, Denise; Kück, Ulrich; Freitag, Michael
2010-01-01
Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30–90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in ∼4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for comparative studies to address basic questions of fungal biology. PMID:20386741
Hodzic, Jasin; Gurbeta, Lejla; Omanovic-Miklicanin, Enisa; Badnjevic, Almir
2017-01-01
Introduction: Major advancements in DNA sequencing methods introduced in the first decade of the new millennium initiated a rapid expansion of sequencing studies, which yielded a tremendous amount of DNA sequence data, including whole sequenced genomes of various species, including plants. A set of novel sequencing platforms, often collectively named as “next-generation sequencing” (NGS) completely transformed the life sciences, by allowing extensive throughput, while greatly reducing the necessary time, labor and cost of any sequencing endeavor. Purpose: of this paper is to present an overview NGS platforms used to produce the current compendium of published draft genomes of various plants, namely the Roche/454, ABI/SOLiD, and Solexa/Illumina, and to determine the most frequently used platform for the whole genome sequencing of plants in light of genotypization of immortelle plant. Materials and methods: 45 papers were selected (with 47 presented plant genome draft sequences), and utilized sequencing techniques and NGS platforms (Roche/454, ABI/SOLiD and Illumina/Solexa) in selected papers were determined. Subsequently, frequency of usage of each platform or combination of platforms was calculated. Results: Illumina/Solexa platforms are by used either as sole sequencing tool in 40.42% of published genomes, or in combination with other platforms - additional 48.94% of published genomes, followed by Roche/454 platforms, used in combination with traditional Sanger sequencing method (10.64%), and never as a sole tool. ABI/SOLiD was only used in combination with Illumina/Solexa and Roche/454 in 4.25% of publications. Conclusions: Illumina/Solexa platforms are by far most preferred by researchers, most probably due to most affordable sequencing costs. Taking into consideration the current economic situation in the Balkans region, Illumina Solexa is the best (if not the only) platform choice if the sequencing of immortelle plant (Helichrysium arenarium) is to be performed by the researchers in this region. PMID:28974852
NASA Astrophysics Data System (ADS)
Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.
2007-12-01
Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct Camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel ( Camelus bactrianus); dromedary camel ( Camelus dromedarius) and guanaco ( Llama guanacoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translation modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade within Artiodactyla. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes. Our preliminary analysis of osteocalcin evolution represents an initial step towards a complete character analysis aimed at determining the evolutionary history of this functionally significant protein. We emphasize that ancient protein sequencing and phylogenetic analyses using amino acid sequences must pay close attention to post-translational modifications, amino acid substitutions due to diagenetic alteration and the impacts of isobaric amino acids on mass shifts and sequence alignments.
Sequence-similar, structure-dissimilar protein pairs in the PDB.
Kosloff, Mickey; Kolodny, Rachel
2008-05-01
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such proteins pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a data base of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities
2011-01-01
Background Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/. PMID:21385349
Ketchum, Myles J; Weyand, Theodore G; Weed, Peter F; Winsauer, Peter J
2016-05-01
Learning is believed to be reflected in the activity of the hippocampus. However, neural correlates of learning have been difficult to characterize because hippocampal activity is integrated with ongoing behavior. To address this issue, male rats (n = 5) implanted with electrodes (n = 14) in the CA1 subfield responded during two tasks within a single test session. In one task, subjects acquired a new 3-response sequence (acquisition), whereas in the other task, subjects completed a well-rehearsed 3-response sequence (performance). Both tasks though could be completed using an identical response topography and used the same sensory stimuli and schedule of reinforcement. More important, comparing neural patterns during sequence acquisition to those during sequence performance allows for a subtractive approach whereby activity associated with learning could potentially be dissociated from the activity associated with ongoing behavior. At sites where CA1 activity was closely associated with behavior, the patterns of activity were differentially modulated by key position and the serial position of a response within the schedule of reinforcement. Temporal shifts between peak activity and responding on particular keys also occurred during sequence acquisition, but not during sequence performance. Ethanol disrupted CA1 activity while producing rate-decreasing effects in both tasks and error-increasing effects that were more selective for sequence acquisition than sequence performance. Ethanol also produced alterations in the magnitude of modulations and temporal pattern of CA1 activity, although these effects were not selective for sequence acquisition. Similar to ethanol, hippocampal micro-stimulation decreased response rate in both tasks and selectively increased the percentage of errors during sequence acquisition, and provided a more direct demonstration of hippocampal involvement during sequence acquisition. Together, these results strongly support the notion that ethanol disrupts sequence acquisition by disrupting hippocampal activity and that the hippocampus is necessary for the conditioned associations required for sequence acquisition. © 2015 Wiley Periodicals, Inc.
Ketchum, Myles J.; Weyand, Theodore G.; Weed, Peter F.; Winsauer, Peter J.
2015-01-01
Learning is believed to be reflected in the activity of the hippocampus. However, neural correlates of learning have been difficult to characterize because hippocampal activity is integrated with ongoing behavior. To address this issue, male rats (n=5) implanted with electrodes (n=14) in the CA1 subfield responded during two tasks within a single test session. In one task, subjects acquired a new 3-response sequence (acquisition), whereas in the other task, subjects completed a well-rehearsed 3-response sequence (performance). Both tasks though could be completed using an identical response topography and used the same sensory stimuli and schedule of reinforcement. More important, comparing neural patterns during sequence acquisition to those during sequence performance allows for a subtractive approach whereby activity associated with learning could potentially be dissociated from the activity associated with ongoing behavior. At sites where CA1 activity was closely associated with behavior, the patterns of activity were differentially modulated by key position and the serial position of a response within the schedule of reinforcement. Temporal shifts between peak activity and responding on particular keys also occurred during sequence acquisition, but not during sequence performance. Ethanol disrupted CA1 activity while producing rate-decreasing effects in both tasks and error-increasing effects that were more selective for sequence acquisition than sequence performance. Ethanol also produced alterations in the magnitude of modulations and temporal pattern of CA1 activity, although these effects were not selective for sequence acquisition. Similar to ethanol, hippocampal micro-stimulation decreased response rate in both tasks and selectively increased the percentage of errors during sequence acquisition, and provided a more direct demonstration of hippocampal involvement during sequence acquisition. Together, these results strongly support the notion that ethanol disrupts sequence acquisition by disrupting hippocampal activity and that the hippocampus is necessary for the conditioned associations required for sequence acquisition. PMID:26482846
Nowrousian, Minou; Stajich, Jason E; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D; Pöggeler, Stefanie; Read, Nick D; Seiler, Stephan; Smith, Kristina M; Zickler, Denise; Kück, Ulrich; Freitag, Michael
2010-04-08
Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for comparative studies to address basic questions of fungal biology.
Adamiak, Paul; Vanderkooi, Otto G; Kellner, James D; Schryvers, Anthony B; Bettinger, Julie A; Alcantara, Joenel
2014-06-03
Multi-locus sequence typing (MLST) is a portable, broadly applicable method for classifying bacterial isolates at an intra-species level. This methodology provides clinical and scientific investigators with a standardized means of monitoring evolution within bacterial populations. MLST uses the DNA sequences from a set of genes such that each unique combination of sequences defines an isolate's sequence type. In order to reliably determine the sequence of a typing gene, matching sequence reads for both strands of the gene must be obtained. This study assesses the ability of both the standard, and an alternative set of, Streptococcus pneumoniae MLST primers to completely sequence, in both directions, the required typing alleles. The results demonstrated that for five (aroE, recP, spi, xpt, ddl) of the seven S. pneumoniae typing alleles, the standard primers were unable to obtain the complete forward and reverse sequences. This is due to the standard primers annealing too closely to the target regions, and current sequencing technology failing to sequence the bases that are too close to the primer. The alternative primer set described here, which includes a combination of primers proposed by the CDC and several designed as part of this study, addresses this limitation by annealing to highly conserved segments further from the target region. This primer set was subsequently employed to sequence type 105 S. pneumoniae isolates collected by the Canadian Immunization Monitoring Program ACTive (IMPACT) over a period of 18 years. The inability of several of the standard S. pneumoniae MLST primers to fully sequence the required region was consistently observed and is the result of a shift in sequencing technology occurring after the original primers were designed. The results presented here introduce clear documentation describing this phenomenon into the literature, and provide additional guidance, through the introduction of a widely validated set of alternative primers, to research groups seeking to undertake S. pneumoniae MLST based studies.
Increased fMRI Sensitivity at Equal Data Burden Using Averaged Shifted Echo Acquisition
Witt, Suzanne T.; Warntjes, Marcel; Engström, Maria
2016-01-01
There is growing evidence as to the benefits of collecting BOLD fMRI data with increased sampling rates. However, many of the newly developed acquisition techniques developed to collect BOLD data with ultra-short TRs require hardware, software, and non-standard analytic pipelines that may not be accessible to all researchers. We propose to incorporate the method of shifted echo into a standard multi-slice, gradient echo EPI sequence to achieve a higher sampling rate with a TR of <1 s with acceptable spatial resolution. We further propose to incorporate temporal averaging of consecutively acquired EPI volumes to both ameliorate the reduced temporal signal-to-noise inherent in ultra-fast EPI sequences and reduce the data burden. BOLD data were collected from 11 healthy subjects performing a simple, event-related visual-motor task with four different EPI sequences: (1) reference EPI sequence with TR = 1440 ms, (2) shifted echo EPI sequence with TR = 700 ms, (3) shifted echo EPI sequence with every two consecutively acquired EPI volumes averaged and effective TR = 1400 ms, and (4) shifted echo EPI sequence with every four consecutively acquired EPI volumes averaged and effective TR = 2800 ms. Both the temporally averaged sequences exhibited increased temporal signal-to-noise over the shifted echo EPI sequence. The shifted echo sequence with every two EPI volumes averaged also had significantly increased BOLD signal change compared with the other three sequences, while the shifted echo sequence with every four EPI volumes averaged had significantly decreased BOLD signal change compared with the other three sequences. The results indicated that incorporating the method of shifted echo into a standard multi-slice EPI sequence is a viable method for achieving increased sampling rate for collecting event-related BOLD data. Further, consecutively averaging every two consecutively acquired EPI volumes significantly increased the measured BOLD signal change and the subsequently calculated activation map statistics. PMID:27932947
Lovell, Charles R; Decker, Peter V; Bagwell, Christopher E; Thompson, Shelly; Matsui, George Y
2008-05-01
Methods to assess the diversity of the diazotroph assemblage in the rhizosphere of the salt marsh cordgrass, Spartina alterniflora were examined. The effectiveness of nifH PCR-denaturing gradient gel electrophoresis (DGGE) was compared to that of nifH clone library analysis. Seventeen DGGE gel bands were sequenced and yielded 58 nonidentical nifH sequences from a total of 67 sequences determined. A clone library constructed using the GC-clamp nifH primers that were employed in the PCR-DGGE (designated the GC-Library) yielded 83 nonidentical sequences from a total of 257 nifH sequences. A second library constructed using an alternate set of nifH primers (N-Library) yielded 83 nonidentical sequences from a total of 138 nifH sequences. Rarefaction curves for the libraries did not reach saturation, although the GC-Library curve was substantially dampened and appeared to be closer to saturation than the N-Library curve. Phylogenetic analyses showed that DGGE gel band sequencing recovered nifH sequences that were frequently sampled in the GC-Library, as well as sequences that were infrequently sampled, and provided a species composition assessment that was robust, efficient, and relatively inexpensive to obtain. Further, the DGGE method permits a large number of samples to be examined for differences in banding patterns, after which bands of interest can be sampled for sequence determination.
The genome-wide DNA sequence specificity of the anti-tumour drug bleomycin in human cells.
Murray, Vincent; Chen, Jon K; Tanaka, Mark M
2016-07-01
The cancer chemotherapeutic agent, bleomycin, cleaves DNA at specific sites. For the first time, the genome-wide DNA sequence specificity of bleomycin breakage was determined in human cells. Utilising Illumina next-generation DNA sequencing techniques, over 200 million bleomycin cleavage sites were examined to elucidate the bleomycin genome-wide DNA selectivity. The genome-wide bleomycin cleavage data were analysed by four different methods to determine the cellular DNA sequence specificity of bleomycin strand breakage. For the most highly cleaved DNA sequences, the preferred site of bleomycin breakage was at 5'-GT* dinucleotide sequences (where the asterisk indicates the bleomycin cleavage site), with lesser cleavage at 5'-GC* dinucleotides. This investigation also determined longer bleomycin cleavage sequences, with preferred cleavage at 5'-GT*A and 5'- TGT* trinucleotide sequences, and 5'-TGT*A tetranucleotides. For cellular DNA, the hexanucleotide DNA sequence 5'-RTGT*AY (where R is a purine and Y is a pyrimidine) was the most highly cleaved DNA sequence. It was striking that alternating purine-pyrimidine sequences were highly cleaved by bleomycin. The highest intensity cleavage sites in cellular and purified DNA were very similar although there were some minor differences. Statistical nucleotide frequency analysis indicated a G nucleotide was present at the -3 position (relative to the cleavage site) in cellular DNA but was absent in purified DNA.
Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction
Laehnemann, David; Borkhardt, Arndt
2016-01-01
Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159
Korber, B T; Osmanov, S; Esparza, J; Myers, G
1994-11-01
The World Health Organization Global Programme on AIDS (WHO/GPA) is conducting a large-scale collaborative study of human immunodeficiency virus type 1 (HIV-1) variation, based in four potential vaccine-trial site countries: Brazil, Rwanda, Thailand, and Uganda. Through the course of this study, it was crucial to keep track of certain attributes of the samples from which the viral nucleotide sequences were derived (e.g., country of origin and viral culture characterization), so that meaningful sequence comparisons could be made. Here we describe a system developed in the context of the WHO/GPA study that summarizes such critical attributes by representing them as standardized characters directly incorporated into sequence names. This nomenclature allows linkage of clinical, phenotypic, and geographic information with molecular data. We propose that other investigators involved in human immunodeficiency virus (HIV) nucleotide sequencing efforts adopt a similar standardized sequence nomenclature to facilitate cross-study sequence comparison. HIV sequence data are being generated at an ever-increasing rate; directly coupled to this increase is our deepening understanding of biological parameters that influence or result from sequence variability. A standardized sequence nomenclature that includes relevant biological information would enable researchers to better utilize the growing body of sequence data, and enhance their ability to interpret the biological implications of their own data through facilitating comparisons with previously published work.
Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi
2014-09-18
Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong
2015-01-01
Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.
Rattei, Thomas; Tischler, Patrick; Götz, Stefan; Jehl, Marc-André; Hoser, Jonathan; Arnold, Roland; Conesa, Ana; Mewes, Hans-Werner
2010-01-01
The prediction of protein function as well as the reconstruction of evolutionary genesis employing sequence comparison at large is still the most powerful tool in sequence analysis. Due to the exponential growth of the number of known protein sequences and the subsequent quadratic growth of the similarity matrix, the computation of the Similarity Matrix of Proteins (SIMAP) becomes a computational intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP and the data access and query functions have been improved. SIMAP assists biologists to query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and for programmatic access through DAS (http://webclu.bio.wzw.tum.de/das/) and Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
Kimura, Tomohiro; Nakano, Toshiki; Yamaguchi, Toshiyasu; Sato, Minoru; Ogawa, Tomohisa; Muramoto, Koji; Yokoyama, Takehiko; Kan-No, Nobuhiro; Nagahisa, Eizou; Janssen, Frank; Grieshaber, Manfred K
2004-01-01
The complete complementary DNA sequences of genes presumably coding for opine dehydrogenases from Arabella iricolor (sandworm), Haliotis discus hannai (abalone), and Patinopecten yessoensis (scallop) were determined, and partial cDNA sequences were derived for Meretrix lusoria (Japanese hard clam) and Spisula sachalinensis (Sakhalin surf clam). The primers ODH-9F and ODH-11R proved useful for amplifying the sequences for opine dehydrogenases from the 4 mollusk species investigated in this study. The sequence of the sandworm was obtained using primers constructed from the amino acid sequence of tauropine dehydrogenase, the main opine dehydrogenase in A. iricolor. The complete cDNA sequence of A. iricolor, H. discus hannai, and P. yessoensis encode 397, 400, and 405 amino acids, respectively. All sequences were aligned and compared with published databank sequences of Loligo opalescens, Loligo vulgaris (squid), Sepia officinalis (cuttlefish), and Pecten maximus (scallop). As expected, a high level of homology was observed for the cDNA from closely related species, such as for cephalopods or scallops, whereas cDNA from the other species showed lower-level homologies. A similar trend was observed when the deduced amino acid sequences were compared. Furthermore, alignment of these sequences revealed some structural motifs that are possibly related to the binding sites of the substrates. The phylogenetic trees derived from the nucleotide and amino acid sequences were consistent with the classification of species resulting from classical taxonomic analyses.
Lim, Yan Wei; Cuevas, Daniel A.; Silva, Genivaldo Gueiros Z.; Aguinaldo, Kristen; Dinsdale, Elizabeth A.; Haas, Andreas F.; Hatay, Mark; Sanchez, Savannah E.; Wegley-Kelly, Linda; Dutilh, Bas E.; Harkins, Timothy T.; Lee, Clarence C.; Tom, Warren; Sandin, Stuart A.; Smith, Jennifer E.; Zgliczynski, Brian; Vermeij, Mark J.A.; Rohwer, Forest
2014-01-01
Genomics and metagenomics have revolutionized our understanding of marine microbial ecology and the importance of microbes in global geochemical cycles. However, the process of DNA sequencing has always been an abstract extension of the research expedition, completed once the samples were returned to the laboratory. During the 2013 Southern Line Islands Research Expedition, we started the first effort to bring next generation sequencing to some of the most remote locations on our planet. We successfully sequenced twenty six marine microbial genomes, and two marine microbial metagenomes using the Ion Torrent PGM platform on the Merchant Yacht Hanse Explorer. Onboard sequence assembly, annotation, and analysis enabled us to investigate the role of the microbes in the coral reef ecology of these islands and atolls. This analysis identified phosphonate as an important phosphorous source for microbes growing in the Line Islands and reinforced the importance of L-serine in marine microbial ecosystems. Sequencing in the field allowed us to propose hypotheses and conduct experiments and further sampling based on the sequences generated. By eliminating the delay between sampling and sequencing, we enhanced the productivity of the research expedition. By overcoming the hurdles associated with sequencing on a boat in the middle of the Pacific Ocean we proved the flexibility of the sequencing, annotation, and analysis pipelines. PMID:25177534
Pulseq: A rapid and hardware-independent pulse sequence prototyping framework.
Layton, Kelvin J; Kroboth, Stefan; Jia, Feng; Littin, Sebastian; Yu, Huijun; Leupold, Jochen; Nielsen, Jon-Fredrik; Stöcker, Tony; Zaitsev, Maxim
2017-04-01
Implementing new magnetic resonance experiments, or sequences, often involves extensive programming on vendor-specific platforms, which can be time consuming and costly. This situation is exacerbated when research sequences need to be implemented on several platforms simultaneously, for example, at different field strengths. This work presents an alternative programming environment that is hardware-independent, open-source, and promotes rapid sequence prototyping. A novel file format is described to efficiently store the hardware events and timing information required for an MR pulse sequence. Platform-dependent interpreter modules convert the file to appropriate instructions to run the sequence on MR hardware. Sequences can be designed in high-level languages, such as MATLAB, or with a graphical interface. Spin physics simulation tools are incorporated into the framework, allowing for comparison between real and virtual experiments. Minimal effort is required to implement relatively advanced sequences using the tools provided. Sequences are executed on three different MR platforms, demonstrating the flexibility of the approach. A high-level, flexible and hardware-independent approach to sequence programming is ideal for the rapid development of new sequences. The framework is currently not suitable for large patient studies or routine scanning although this would be possible with deeper integration into existing workflows. Magn Reson Med 77:1544-1552, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
Zhu, X; Naz, R K
1999-03-01
The deduced ZP3 amino acid (aa) sequences of 13 vertebrate species namely mouse, hamster, rabbit, pig, porcine, cow, dog, cat, human, bonnet, marmoset, carp, and frog were compared using the PILEUP and PRETTY alignment programs (GCG, Wisconsin, USA). The published aa sequences obtained from 13 vertebrate species indicated the overall evolutionarily conservation in the N-terminus, central region, and C-terminus of the ZP3 polypeptide. More variations of ZP3 polypeptide sequences were seen in the alignments of carp and frog from the 11 mammalian species making the leader sequence more prominent. The canonical furin proteolytic processing signal at the C-terminus was found in all the ZP3 polypeptide sequences except of carp and frog. In the central region, the ZP3 deduced aa sequences of all the 13 vertebrate species aligned well, and six relatively conserved sequences were found. There are 11 conserved cysteine residues in the central region across all species including carp and frog, indicating that these residues have longer evolutionary history. The ZP3 aa sequence similarities were examined using the GAP program (GCG). The highest aa similarities are observed between the members of the same order within the class mammalia, and also (95.4%) between pig (ungulata) and rabbit (lagomorpha). The deduced ZP3 aa sequences per se may not be enough to build a phylogenetic tree.
Lim, Yan Wei; Cuevas, Daniel A; Silva, Genivaldo Gueiros Z; Aguinaldo, Kristen; Dinsdale, Elizabeth A; Haas, Andreas F; Hatay, Mark; Sanchez, Savannah E; Wegley-Kelly, Linda; Dutilh, Bas E; Harkins, Timothy T; Lee, Clarence C; Tom, Warren; Sandin, Stuart A; Smith, Jennifer E; Zgliczynski, Brian; Vermeij, Mark J A; Rohwer, Forest; Edwards, Robert A
2014-01-01
Genomics and metagenomics have revolutionized our understanding of marine microbial ecology and the importance of microbes in global geochemical cycles. However, the process of DNA sequencing has always been an abstract extension of the research expedition, completed once the samples were returned to the laboratory. During the 2013 Southern Line Islands Research Expedition, we started the first effort to bring next generation sequencing to some of the most remote locations on our planet. We successfully sequenced twenty six marine microbial genomes, and two marine microbial metagenomes using the Ion Torrent PGM platform on the Merchant Yacht Hanse Explorer. Onboard sequence assembly, annotation, and analysis enabled us to investigate the role of the microbes in the coral reef ecology of these islands and atolls. This analysis identified phosphonate as an important phosphorous source for microbes growing in the Line Islands and reinforced the importance of L-serine in marine microbial ecosystems. Sequencing in the field allowed us to propose hypotheses and conduct experiments and further sampling based on the sequences generated. By eliminating the delay between sampling and sequencing, we enhanced the productivity of the research expedition. By overcoming the hurdles associated with sequencing on a boat in the middle of the Pacific Ocean we proved the flexibility of the sequencing, annotation, and analysis pipelines.
MIPS: a database for protein sequences, homology data and yeast genome information.
Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F
1997-01-01
The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database (,). MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/ ) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program () are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure () developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome (), the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser' (). A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498
The advantages of SMRT sequencing.
Roberts, Richard J; Carneiro, Mauricio O; Schatz, Michael C
2013-07-03
Of the current next-generation sequencing technologies, SMRT sequencing is sometimes overlooked. However, attributes such as long reads, modified base detection and high accuracy make SMRT a useful technology and an ideal approach to the complete sequencing of small genomes.
Metagenomic ventures into outer sequence space.
Dutilh, Bas E
Sequencing DNA or RNA directly from the environment often results in many sequencing reads that have no homologs in the database. These are referred to as "unknowns," and reflect the vast unexplored microbial sequence space of our biosphere, also known as "biological dark matter." However, unknowns also exist because metagenomic datasets are not optimally mined. There is a pressure on researchers to publish and move on, and the unknown sequences are often left for what they are, and conclusions drawn based on reads with annotated homologs. This can cause abundant and widespread genomes to be overlooked, such as the recently discovered human gut bacteriophage crAssphage. The unknowns may be enriched for bacteriophage sequences, the most abundant and genetically diverse component of the biosphere and of sequence space. However, it remains an open question, what is the actual size of biological sequence space? The de novo assembly of shotgun metagenomes is the most powerful tool to address this question.
Growth and splitting of neural sequences in songbird vocal development
Okubo, Tatsuo S.; Mackevicius, Emily L.; Payne, Hannah L.; Lynch, Galen F.; Fee, Michale S.
2015-01-01
Neural sequences are a fundamental feature of brain dynamics underlying diverse behaviors, but the mechanisms by which they develop during learning remain unknown. Songbirds learn vocalizations composed of syllables; in adult birds, each syllable is produced by a different sequence of action potential bursts in the premotor cortical area HVC. Here we carried out recordings of large populations of HVC neurons in singing juvenile birds throughout learning to examine the emergence of neural sequences. Early in vocal development, HVC neurons begin producing rhythmic bursts, temporally locked to a ‘prototype’ syllable. Different neurons are active at different latencies relative to syllable onset to form a continuous sequence. Through development, as new syllables emerge from the prototype syllable, initially highly overlapping burst sequences become increasingly distinct. We propose a mechanistic model in which multiple neural sequences can emerge from the growth and splitting of a common precursor sequence. PMID:26618871
Analysis of Illumina Microbial Assemblies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clum, Alicia; Foster, Brian; Froula, Jeff
2010-05-28
Since the emerging of second generation sequencing technologies, the evaluation of different sequencing approaches and their assembly strategies for different types of genomes has become an important undertaken. Next generation sequencing technologies dramatically increase sequence throughput while decreasing cost, making them an attractive tool for whole genome shotgun sequencing. To compare different approaches for de-novo whole genome assembly, appropriate tools and a solid understanding of both quantity and quality of the underlying sequence data are crucial. Here, we performed an in-depth analysis of short-read Illumina sequence assembly strategies for bacterial and archaeal genomes. Different types of Illumina libraries as wellmore » as different trim parameters and assemblers were evaluated. Results of the comparative analysis and sequencing platforms will be presented. The goal of this analysis is to develop a cost-effective approach for the increased throughput of the generation of high quality microbial genomes.« less
Effects of Sequences of Cognitions on Group Performance Over Time
Molenaar, Inge; Chiu, Ming Ming
2017-01-01
Extending past research showing that sequences of low cognitions (low-level processing of information) and high cognitions (high-level processing of information through questions and elaborations) influence the likelihoods of subsequent high and low cognitions, this study examines whether sequences of cognitions are related to group performance over time; 54 primary school students (18 triads) discussed and wrote an essay about living in another country (32,375 turns of talk). Content analysis and statistical discourse analysis showed that within each lesson, groups with more low cognitions or more sequences of low cognition followed by high cognition added more essay words. Groups with more high cognitions, sequences of low cognition followed by low cognition, or sequences of high cognition followed by an action followed by low cognition, showed different words and sequences, suggestive of new ideas. The links between cognition sequences and group performance over time can inform facilitation and assessment of student discussions. PMID:28490854
Effects of Sequences of Cognitions on Group Performance Over Time.
Molenaar, Inge; Chiu, Ming Ming
2017-04-01
Extending past research showing that sequences of low cognitions (low-level processing of information) and high cognitions (high-level processing of information through questions and elaborations) influence the likelihoods of subsequent high and low cognitions, this study examines whether sequences of cognitions are related to group performance over time; 54 primary school students (18 triads) discussed and wrote an essay about living in another country (32,375 turns of talk). Content analysis and statistical discourse analysis showed that within each lesson, groups with more low cognitions or more sequences of low cognition followed by high cognition added more essay words. Groups with more high cognitions, sequences of low cognition followed by low cognition, or sequences of high cognition followed by an action followed by low cognition, showed different words and sequences, suggestive of new ideas. The links between cognition sequences and group performance over time can inform facilitation and assessment of student discussions.
Construction of a map-based reference genome sequence for barley, Hordeum vulgare L.
Beier, Sebastian; Himmelbach, Axel; Colmsee, Christian; Zhang, Xiao-Qi; Barrero, Roberto A.; Zhang, Qisen; Li, Lin; Bayer, Micha; Bolser, Daniel; Taudien, Stefan; Groth, Marco; Felder, Marius; Hastie, Alex; Šimková, Hana; Staňková, Helena; Vrána, Jan; Chan, Saki; Muñoz-Amatriaín, María; Ounit, Rachid; Wanamaker, Steve; Schmutzer, Thomas; Aliyeva-Schnorr, Lala; Grasso, Stefano; Tanskanen, Jaakko; Sampath, Dharanya; Heavens, Darren; Cao, Sujie; Chapman, Brett; Dai, Fei; Han, Yong; Li, Hua; Li, Xuan; Lin, Chongyun; McCooke, John K.; Tan, Cong; Wang, Songbo; Yin, Shuya; Zhou, Gaofeng; Poland, Jesse A.; Bellgard, Matthew I.; Houben, Andreas; Doležel, Jaroslav; Ayling, Sarah; Lonardi, Stefano; Langridge, Peter; Muehlbauer, Gary J.; Kersey, Paul; Clark, Matthew D.; Caccamo, Mario; Schulman, Alan H.; Platzer, Matthias; Close, Timothy J.; Hansson, Mats; Zhang, Guoping; Braumann, Ilka; Li, Chengdao; Waugh, Robbie; Scholz, Uwe; Stein, Nils; Mascher, Martin
2017-01-01
Barley (Hordeum vulgare L.) is a cereal grass mainly used as animal fodder and raw material for the malting industry. The map-based reference genome sequence of barley cv. ‘Morex’ was constructed by the International Barley Genome Sequencing Consortium (IBSC) using hierarchical shotgun sequencing. Here, we report the experimental and computational procedures to (i) sequence and assemble more than 80,000 bacterial artificial chromosome (BAC) clones along the minimum tiling path of a genome-wide physical map, (ii) find and validate overlaps between adjacent BACs, (iii) construct 4,265 non-redundant sequence scaffolds representing clusters of overlapping BACs, and (iv) order and orient these BAC clusters along the seven barley chromosomes using positional information provided by dense genetic maps, an optical map and chromosome conformation capture sequencing (Hi-C). Integrative access to these sequence and mapping resources is provided by the barley genome explorer (BARLEX). PMID:28448065
Church, George M.; Kieffer-Higgins, Stephen
1992-01-01
This invention features vectors and a method for sequencing DNA. The method includes the steps of: a) ligating the DNA into a vector comprising a tag sequence, the tag sequence includes at least 15 bases, wherein the tag sequence will not hybridize to the DNA under stringent hybridization conditions and is unique in the vector, to form a hybrid vector, b) treating the hybrid vector in a plurality of vessels to produce fragments comprising the tag sequence, wherein the fragments differ in length and terminate at a fixed known base or bases, wherein the fixed known base or bases differs in each vessel, c) separating the fragments from each vessel according to their size, d) hybridizing the fragments with an oligonucleotide able to hybridize specifically with the tag sequence, and e) detecting the pattern of hybridization of the tag sequence, wherein the pattern reflects the nucleotide sequence of the DNA.
Perceived ambiguity as a barrier to intentions to learn genome sequencing results
Taber, Jennifer M.; Klein, William M.P.; Ferrer, Rebecca A.; Han, Paul K. J.; Lewis, Katie L.; Biesecker, Leslie G.; Biesecker, Barbara B.
2015-01-01
Many variants that could be returned from genome sequencing may be perceived as ambiguous—lacking reliability, credibility, or adequacy. Little is known about how perceived ambiguity influences thoughts about sequencing results. Participants (n=494) in an NIH genome sequencing study completed a baseline survey before sequencing results were available. We examined how perceived ambiguity regarding sequencing results and individual differences in medical ambiguity aversion and tolerance for uncertainty were associated with cognitions and intentions concerning sequencing results. Perceiving sequencing results as more ambiguous was associated with less favorable cognitions about results and lower intentions to learn and share results. Among participants low in tolerance for uncertainty or optimism, greater perceived ambiguity was associated with lower intentions to learn results for non-medically actionable diseases; medical ambiguity aversion did not moderate any associations. Results are consistent with the phenomenon of “ambiguity aversion” and may influence whether people learn and communicate genomic information. PMID:26003053
SeqCompress: an algorithm for biological sequence compression.
Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz; Bajwa, Hassan
2014-10-01
The growth of Next Generation Sequencing technologies presents significant research challenges, specifically to design bioinformatics tools that handle massive amount of data efficiently. Biological sequence data storage cost has become a noticeable proportion of total cost in the generation and analysis. Particularly increase in DNA sequencing rate is significantly outstripping the rate of increase in disk storage capacity, which may go beyond the limit of storage capacity. It is essential to develop algorithms that handle large data sets via better memory management. This article presents a DNA sequence compression algorithm SeqCompress that copes with the space complexity of biological sequences. The algorithm is based on lossless data compression and uses statistical model as well as arithmetic coding to compress DNA sequences. The proposed algorithm is compared with recent specialized compression tools for biological sequences. Experimental results show that proposed algorithm has better compression gain as compared to other existing algorithms. Copyright © 2014 Elsevier Inc. All rights reserved.
Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization.
Bauer, Markus; Klau, Gunnar W; Reinert, Knut
2007-07-27
The discovery of functional non-coding RNA sequences has led to an increasing interest in algorithms related to RNA analysis. Traditional sequence alignment algorithms, however, fail at computing reliable alignments of low-homology RNA sequences. The spatial conformation of RNA sequences largely determines their function, and therefore RNA alignment algorithms have to take structural information into account. We present a graph-based representation for sequence-structure alignments, which we model as an integer linear program (ILP). We sketch how we compute an optimal or near-optimal solution to the ILP using methods from combinatorial optimization, and present results on a recently published benchmark set for RNA alignments. The implementation of our algorithm yields better alignments in terms of two published scores than the other programs that we tested: This is especially the case with an increasing number of input sequences. Our program LARA is freely available for academic purposes from http://www.planet-lisa.net.
Studier, F. William
1995-04-18
Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.
BiQ Analyzer HT: locus-specific analysis of DNA methylation by high-throughput bisulfite sequencing
Lutsik, Pavlo; Feuerbach, Lars; Arand, Julia; Lengauer, Thomas; Walter, Jörn; Bock, Christoph
2011-01-01
Bisulfite sequencing is a widely used method for measuring DNA methylation in eukaryotic genomes. The assay provides single-base pair resolution and, given sufficient sequencing depth, its quantitative accuracy is excellent. High-throughput sequencing of bisulfite-converted DNA can be applied either genome wide or targeted to a defined set of genomic loci (e.g. using locus-specific PCR primers or DNA capture probes). Here, we describe BiQ Analyzer HT (http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de/), a user-friendly software tool that supports locus-specific analysis and visualization of high-throughput bisulfite sequencing data. The software facilitates the shift from time-consuming clonal bisulfite sequencing to the more quantitative and cost-efficient use of high-throughput sequencing for studying locus-specific DNA methylation patterns. In addition, it is useful for locus-specific visualization of genome-wide bisulfite sequencing data. PMID:21565797
Li, Wen Hui; Jia, Wan Zhong; Qu, Zi Gang; Xie, Zhi Zhou; Luo, Jian Xun; Yin, Hong; Sun, Xiao Lin; Blaga, Radu; Fu, Bao Quan
2013-04-01
A total of 16 Taenia multiceps isolates collected from naturally infected sheep or goats in Gansu Province, China were characterized by sequences of mitochondrial cytochrome c oxidase subunit 1 (cox1) gene. The complete cox1 gene was amplified for individual T. multiceps isolates by PCR, ligated to pMD18T vector, and sequenced. Sequence analysis indicated that out of 16 T. multiceps isolates 10 unique cox1 gene sequences of 1,623 bp were obtained with sequence variation of 0.12-0.68%. The results showed that the cox1 gene sequences were highly conserved among the examined T. multiceps isolates. However, they were quite different from those of the other Taenia species. Phylogenetic analysis based on complete cox1 gene sequences revealed that T. multiceps isolates were composed of 3 genotypes and distinguished from the other Taenia species.
Li, Wen Hui; Jia, Wan Zhong; Qu, Zi Gang; Xie, Zhi Zhou; Luo, Jian Xun; Yin, Hong; Sun, Xiao Lin; Blaga, Radu
2013-01-01
A total of 16 Taenia multiceps isolates collected from naturally infected sheep or goats in Gansu Province, China were characterized by sequences of mitochondrial cytochrome c oxidase subunit 1 (cox1) gene. The complete cox1 gene was amplified for individual T. multiceps isolates by PCR, ligated to pMD18T vector, and sequenced. Sequence analysis indicated that out of 16 T. multiceps isolates 10 unique cox1 gene sequences of 1,623 bp were obtained with sequence variation of 0.12-0.68%. The results showed that the cox1 gene sequences were highly conserved among the examined T. multiceps isolates. However, they were quite different from those of the other Taenia species. Phylogenetic analysis based on complete cox1 gene sequences revealed that T. multiceps isolates were composed of 3 genotypes and distinguished from the other Taenia species. PMID:23710087
Gopal, J; Yebra, M J; Bhagwat, A S
1994-01-01
The methyltransferase (MTase) in the DsaV restriction--modification system methylates within 5'-CCNGG sequences. We have cloned the gene for this MTase and determined its sequence. The predicted sequence of the MTase protein contains sequence motifs conserved among all cytosine-5 MTases and is most similar to other MTases that methylate CCNGG sequences, namely M.ScrFI and M.SsoII. All three MTases methylate the internal cytosine within their recognition sequence. The 'variable' region within the three enzymes that methylate CCNGG can be aligned with the sequences of two enzymes that methylate CCWGG sequences. Remarkably, two segments within this region contain significant similarity with the region of M.HhaI that is known to contact DNA bases. These alignments suggest that many cytosine-5 MTases are likely to interact with DNA using a similar structural framework. Images PMID:7971279
Studier, F.W.
1995-04-18
Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.
NCBI-compliant genome submissions: tips and tricks to save time and money.
Pirovano, Walter; Boetzer, Marten; Derks, Martijn F L; Smit, Sandra
2017-03-01
Genome sequences nowadays play a central role in molecular biology and bioinformatics. These sequences are shared with the scientific community through sequence databases. The sequence repositories of the International Nucleotide Sequence Database Collaboration (INSDC, comprising GenBank, ENA and DDBJ) are the largest in the world. Preparing an annotated sequence in such a way that it will be accepted by the database is challenging because many validation criteria apply. In our opinion, it is an undesirable situation that researchers who want to submit their sequence need either a lot of experience or help from partners to get the job done. To save valuable time and money, we list a number of recommendations for people who want to submit an annotated genome to a sequence database, as well as for tool developers, who could help to ease the process. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Shideler, G.L.
1994-01-01
Middle Miocene siliciclastic deposits comprising the Calvert Cliffs section at the Baltimore Gas and Electric Company's (BG&E) nuclear power plant site in southern Maryland were analyzed in terms of lithostratigraphy, sedimentary structures, and granulometric parameters, to interprete paleo-environments within a sequence-stratigraphic framework. In terms of sequence-stratigraphic models, the BG&E section can be interpreted as consisting of two genetic stratigraphic sequences (Galloway model), namely, a shelf sequence and an overlying deltaic sequence. Using the Exxon model, the section consists of two third-order (1-5 m.y. duration) depositional sequences. The stratigraphic sequences of the BG&E section reflect both relatively short-term eustatic transgressive events, as well as a long-term regressive trend with associated local deltation and coastal progradation. The regression probably signified a regional basinward shift of depocenters within the Salisbury embayment during Miocene time. -from Author