Sample records for sequencing error rates

  1. Correcting for sequencing error in maximum likelihood phylogeny inference.

    PubMed

    Kuhner, Mary K; McGill, James

    2014-11-04

    Accurate phylogenies are critical to taxonomy as well as studies of speciation processes and other evolutionary patterns. Accurate branch lengths in phylogenies are critical for dating and rate measurements. Such accuracy may be jeopardized by unacknowledged sequencing error. We use simulated data to test a correction for DNA sequencing error in maximum likelihood phylogeny inference. Over a wide range of data polymorphism and true error rate, we found that correcting for sequencing error improves recovery of the branch lengths, even if the assumed error rate is up to twice the true error rate. Low error rates have little effect on recovery of the topology. When error is high, correction improves topological inference; however, when error is extremely high, using an assumed error rate greater than the true error rate leads to poor recovery of both topology and branch lengths. The error correction approach tested here was proposed in 2004 but has not been widely used, perhaps because researchers do not want to commit to an estimate of the error rate. This study shows that correction with an approximate error rate is generally preferable to ignoring the issue. Copyright © 2014 Kuhner and McGill.
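
    The correction evaluated above amounts to acknowledging, at each tip of the tree, that an observed base may be wrong. A minimal sketch of that idea (not the authors' implementation; the function names and the uniform-error assumption are illustrative) folds an assumed error rate into the tip partial likelihoods used by standard pruning-based likelihood calculations:

    ```python
    # Sketch: folding an assumed sequencing error rate into the tip partial
    # likelihoods used by maximum likelihood phylogeny inference (names and
    # structure are illustrative, not the authors' implementation).

    BASES = "ACGT"

    def tip_partials(observed_base, error_rate):
        """Return P(observed | true base) for each possible true base.

        With probability (1 - error_rate) the base is read correctly; an error
        is assumed to produce each of the 3 other bases equally often.
        """
        partials = []
        for true_base in BASES:
            if true_base == observed_base:
                partials.append(1.0 - error_rate)
            else:
                partials.append(error_rate / 3.0)
        return partials

    if __name__ == "__main__":
        # Without error correction an observed 'A' forces P(true=A) = 1;
        # with a 1% assumed error rate some likelihood remains for other bases.
        print(tip_partials("A", 0.0))   # [1.0, 0.0, 0.0, 0.0]
        print(tip_partials("A", 0.01))  # [0.99, 0.0033..., 0.0033..., 0.0033...]
    ```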

  2. Estimating genotype error rates from high-coverage next-generation sequence data.

    PubMed

    Wall, Jeffrey D; Tang, Ling Fung; Zerbe, Brandon; Kvale, Mark N; Kwok, Pui-Yan; Schaefer, Catherine; Risch, Neil

    2014-11-01

    Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found (1) no difference in the error profiles or rates between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences had; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with genotype quality (GQ) score, but in a nonlinear fashion for the Illumina data, likely due to loss of specificity of GQ scores greater than 60; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)-(5), suggest that caution should be taken in interpreting the results of next-generation sequencing-based association studies, and even more so in clinical application of this technology in the absence of validation by other more robust sequencing or genotyping methods. © 2014 Wall et al.; Published by Cold Spring Harbor Laboratory Press.
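
    A hedged sketch of the kind of calculation that yields such lower bounds, assuming replicate genotype calls are available as simple site-to-genotype maps (the encoding and the halving heuristic are illustrative, not the study's estimator):

    ```python
    # Sketch: a lower bound on the genotype error rate from replicate calls.
    # The genotype encoding and field names are assumptions for illustration.

    def discordance_rate(replicate_a, replicate_b):
        """Fraction of shared sites where two replicate genotype calls disagree.

        Each replicate is a dict mapping site -> genotype string (e.g. "A/G").
        Discordance at a site implies at least one of the two calls is wrong,
        so discordance/2 is a rough lower bound on the per-call error rate.
        """
        shared = [s for s in replicate_a if s in replicate_b]
        if not shared:
            return 0.0
        mismatches = sum(1 for s in shared if replicate_a[s] != replicate_b[s])
        return mismatches / len(shared)

    if __name__ == "__main__":
        rep1 = {"chr1:100": "A/G", "chr1:200": "C/C", "chr1:300": "T/T"}
        rep2 = {"chr1:100": "A/G", "chr1:200": "C/T", "chr1:300": "T/T"}
        d = discordance_rate(rep1, rep2)
        print(f"discordance: {d:.3f}, error-rate lower bound ~ {d / 2:.3f}")
    ```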

  3. Mapping DNA polymerase errors by single-molecule sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, David F.; Lu, Jenny; Chang, Seungwoo

    Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replication product is tagged with a unique nucleotide sequence before amplification. This allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases.
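
    A minimal sketch of the barcoding idea described above, assuming reads tagged with the same barcode derive from one replication product so that a per-position majority vote removes independent sequencing errors (illustrative only, not the published assay's pipeline):

    ```python
    # Sketch: collapsing reads that share a barcode into a majority consensus,
    # so random sequencing errors in individual reads drop out. Illustrative only.

    from collections import Counter, defaultdict

    def consensus(reads):
        """Per-position majority vote over equal-length reads."""
        return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

    def collapse_by_barcode(tagged_reads):
        """tagged_reads: iterable of (barcode, read) pairs -> {barcode: consensus}."""
        groups = defaultdict(list)
        for barcode, read in tagged_reads:
            groups[barcode].append(read)
        return {bc: consensus(rs) for bc, rs in groups.items()}

    if __name__ == "__main__":
        reads = [("TTAG", "ACGTACGT"),
                 ("TTAG", "ACGTACCT"),   # one sequencing error
                 ("TTAG", "ACGTACGT"),
                 ("GCAA", "TTTTGGGG")]
        print(collapse_by_barcode(reads))  # the error vanishes in the TTAG consensus
    ```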

  4. Mapping DNA polymerase errors by single-molecule sequencing

    DOE PAGES

    Lee, David F.; Lu, Jenny; Chang, Seungwoo; ...

    2016-05-16

    Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replication product is tagged with a unique nucleotide sequence before amplification. This allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases.

  5. Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats.

    PubMed

    Fungtammasan, Arkarachai; Tomaszkiewicz, Marta; Campos-Sánchez, Rebeca; Eckert, Kristin A; DeGiorgio, Michael; Makova, Kateryna D

    2016-10-01

    Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA-DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  6. Primer ID Validates Template Sampling Depth and Greatly Reduces the Error Rate of Next-Generation Sequencing of HIV-1 Genomic RNA Populations

    PubMed Central

    Zhou, Shuntai; Jones, Corbin; Mieczkowski, Piotr

    2015-01-01

    ABSTRACT Validating the sampling depth and reducing sequencing errors are critical for studies of viral populations using next-generation sequencing (NGS). We previously described the use of Primer ID to tag each viral RNA template with a block of degenerate nucleotides in the cDNA primer. We now show that low-abundance Primer IDs (offspring Primer IDs) are generated due to PCR/sequencing errors. These artifactual Primer IDs can be removed using a cutoff model for the number of reads required to make a template consensus sequence. We have modeled the fraction of sequences lost due to Primer ID resampling. For a typical sequencing run, less than 10% of the raw reads are lost to offspring Primer ID filtering and resampling. The remaining raw reads are used to correct for PCR resampling and sequencing errors. We also demonstrate that Primer ID reveals bias intrinsic to PCR, especially at low template input or utilization. cDNA synthesis and PCR convert ca. 20% of RNA templates into recoverable sequences, and 30-fold sequence coverage recovers most of these template sequences. We have directly measured the residual error rate to be around 1 in 10,000 nucleotides. We use this error rate and the Poisson distribution to define the cutoff to identify preexisting drug resistance mutations at low abundance in an HIV-infected subject. Collectively, these studies show that >90% of the raw sequence reads can be used to validate template sampling depth and to dramatically reduce the error rate in assessing a genetically diverse viral population using NGS. IMPORTANCE Although next-generation sequencing (NGS) has revolutionized sequencing strategies, it suffers from serious limitations in defining sequence heterogeneity in a genetically diverse population, such as HIV-1 due to PCR resampling and PCR/sequencing errors. The Primer ID approach reveals the true sampling depth and greatly reduces errors. Knowing the sampling depth allows the construction of a model of how to maximize the recovery of sequences from input templates and to reduce resampling of the Primer ID so that appropriate multiplexing can be included in the experimental design. With the defined sampling depth and measured error rate, we are able to assign cutoffs for the accurate detection of minority variants in viral populations. This approach allows the power of NGS to be realized without having to guess about sampling depth or to ignore the problem of PCR resampling, while also being able to correct most of the errors in the data set. PMID:26041299
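
    A hedged sketch of how a measured residual error rate and the Poisson distribution can set a detection cutoff for low-abundance variants, as outlined above; the template count, error rate and significance level below are illustrative, not the study's values:

    ```python
    # Sketch: choosing a detection cutoff for minority variants from a residual
    # per-base error rate using the Poisson distribution. Numbers are illustrative.

    from math import exp, factorial

    def poisson_sf(k, lam):
        """P(X >= k) for X ~ Poisson(lam)."""
        return 1.0 - sum(exp(-lam) * lam ** i / factorial(i) for i in range(k))

    def min_count_cutoff(n_templates, error_rate=1e-4, alpha=0.001):
        """Smallest count k such that observing >= k errors at one position is
        less likely than alpha under the error-only (null) model."""
        lam = n_templates * error_rate   # expected error count at a position
        k = 1
        while poisson_sf(k, lam) > alpha:
            k += 1
        return k

    if __name__ == "__main__":
        # e.g. 5,000 template consensus sequences, residual error ~1 in 10,000
        print(min_count_cutoff(5000))  # minimum observations to call a real variant
    ```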

  7. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems

    PubMed Central

    2011-01-01

    Background The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. Results We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range. Conclusions The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low abundance polymorphisms. PMID:22067484

  8. Error and Error Mitigation in Low-Coverage Genome Assemblies

    PubMed Central

    Hubisz, Melissa J.; Lin, Michael F.; Kellis, Manolis; Siepel, Adam

    2011-01-01

    The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ∼2× coverage. Here we examine the extent of sequencing error in these 2× assemblies, and its potential impact in downstream analyses. By comparing 2× assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1–4 errors per kilobase. While this error rate is fairly modest, sequencing error can still have surprising effects. For example, an apparent lineage-specific insertion in a coding region is more likely to reflect sequencing error than a true biological event, and the length distribution of coding indels is strongly distorted by error. We find that most errors are contributed by a small fraction of bases with low quality scores, in particular, by the ends of reads in regions of single-read coverage in the assembly. We explore several approaches for automatic sequencing error mitigation (SEM), making use of the localized nature of sequencing error, the fact that it is well predicted by quality scores, and information about errors that comes from comparisons across species. Our automatic methods for error mitigation cannot replace the need for additional sequencing, but they do allow substantial fractions of errors to be masked or eliminated at the cost of modest amounts of over-correction, and they can reduce the impact of error in downstream phylogenomic analyses. Our error-mitigated alignments are available for download. PMID:21340033
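
    A minimal sketch of one simple error-mitigation step consistent with the observations above: masking bases whose quality scores fall below a threshold rather than trusting them (the threshold and mask character are assumptions, and this is far simpler than the SEM approaches the authors describe):

    ```python
    # Sketch: masking assembly bases whose quality score falls below a threshold,
    # a simplified form of the error-mitigation idea described above.

    def mask_low_quality(sequence, quality_scores, min_q=20, mask_char="N"):
        """Replace bases with quality < min_q by mask_char (phred-like scores)."""
        return "".join(
            base if q >= min_q else mask_char
            for base, q in zip(sequence, quality_scores)
        )

    if __name__ == "__main__":
        seq = "ACGTACGT"
        qual = [40, 38, 35, 12, 9, 30, 33, 41]   # two suspect bases mid-read
        print(mask_low_quality(seq, qual))        # ACGNNCGT
    ```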

  9. Efficient error correction for next-generation sequencing of viral amplicons

    PubMed Central

    2012-01-01

    Background Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm PMID:22759430
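
    A hedged sketch of the k-mer intuition behind KEC: k-mers that appear only rarely in a deep amplicon data set are likely to contain errors. The k and count threshold below are illustrative, not the calibrated values from the paper:

    ```python
    # Sketch: flagging likely-erroneous reads via rare k-mers, the core idea behind
    # k-mer-based error correction. Thresholds are illustrative, not KEC's calibration.

    from collections import Counter

    def kmer_counts(reads, k=15):
        counts = Counter()
        for read in reads:
            for i in range(len(read) - k + 1):
                counts[read[i:i + k]] += 1
        return counts

    def has_rare_kmer(read, counts, k=15, min_count=3):
        """True if the read contains a k-mer seen fewer than min_count times."""
        return any(counts[read[i:i + k]] < min_count
                   for i in range(len(read) - k + 1))

    if __name__ == "__main__":
        reads = ["ACGTACGTACGTACGTACGT"] * 50 + ["ACGTACGTACCTACGTACGT"]  # 1 error
        counts = kmer_counts(reads)
        flagged = [r for r in reads if has_rare_kmer(r, counts)]
        print(len(flagged), "read(s) flagged as likely erroneous")
    ```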

  10. Efficient error correction for next-generation sequencing of viral amplicons.

    PubMed

    Skums, Pavel; Dimitrova, Zoya; Campo, David S; Vaughan, Gilberto; Rossi, Livia; Forbi, Joseph C; Yokosawa, Jonny; Zelikovsky, Alex; Khudyakov, Yury

    2012-06-25

    Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm.

  11. DNA/RNA transverse current sequencing: intrinsic structural noise from neighboring bases

    PubMed Central

    Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.

    2015-01-01

    Nanopore DNA sequencing via transverse current has emerged as a promising candidate for third-generation sequencing technology. It produces long read lengths which could alleviate problems with assembly errors inherent in current technologies. However, the high error rates of nanopore sequencing have to be addressed. A very important source of the error is the intrinsic noise in the current arising from carrier dispersion along the chain of the molecule, i.e., from the influence of neighboring bases. In this work we perform calculations of the transverse current within an effective multi-orbital tight-binding model derived from first-principles calculations of the DNA/RNA molecules, to study the effect of this structural noise on the error rates in DNA/RNA sequencing via transverse current in nanopores. We demonstrate that a statistical technique, utilizing not only the currents through the nucleotides but also the correlations in the currents, can in principle reduce the error rate below any desired precision. PMID:26150827

  12. Estimating population genetic parameters and comparing model goodness-of-fit using DNA sequences with error

    PubMed Central

    Liu, Xiaoming; Fu, Yun-Xin; Maxwell, Taylor J.; Boerwinkle, Eric

    2010-01-01

    It is known that sequencing error can bias estimation of evolutionary or population genetic parameters. This problem is more prominent in deep resequencing studies because of their large sample size n, and a higher probability of error at each nucleotide site. We propose a new method based on the composite likelihood of the observed SNP configurations to infer population mutation rate θ = 4Neμ, population exponential growth rate R, and error rate ɛ, simultaneously. Using simulation, we show the combined effects of the parameters, θ, n, ɛ, and R on the accuracy of parameter estimation. We compared our maximum composite likelihood estimator (MCLE) of θ with other θ estimators that take into account the error. The results show the MCLE performs well when the sample size is large or the error rate is high. Using parametric bootstrap, composite likelihood can also be used as a statistic for testing the model goodness-of-fit of the observed DNA sequences. The MCLE method is applied to sequence data on the ANGPTL4 gene in 1832 African American and 1045 European American individuals. PMID:19952140

  13. ADEPT, a dynamic next generation sequencing data error-detection program with trimming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feng, Shihai; Lo, Chien-Chi; Li, Po-E

    Illumina is the most widely used next generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors constitute a major problem in applications such as de novo genome assembly, metagenomics analysis and single nucleotide polymorphism discovery. In this study, we present ADEPT, a dynamic error-detection method that uses the quality scores of each nucleotide and its neighboring nucleotides, together with their positions within the read, and compares these to the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in terms of the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads. We conclude that ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed. The result is a method that is less prone to position-dependent under-prediction, which is one of the most prominent issues in error prediction. The outcome is that ADEPT improves upon prior efforts in identifying true errors, primarily within the middle of reads, while reducing the false positive rate.
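
    A minimal sketch of the position-specific comparison described above, assuming per-read quality score lists for a run; the run-mean statistic and the fixed quality drop are simplifications, not ADEPT's actual model:

    ```python
    # Sketch: flagging bases whose quality is unusually low for their cycle
    # (position in the read) compared with the run-wide per-position distribution.
    # A simplified stand-in for the position-specific comparison ADEPT describes.

    def per_position_stats(quality_lists):
        """Mean quality per read position across the whole run."""
        n_pos = max(len(q) for q in quality_lists)
        means = []
        for pos in range(n_pos):
            vals = [q[pos] for q in quality_lists if pos < len(q)]
            means.append(sum(vals) / len(vals))
        return means

    def flag_suspect_bases(quality, position_means, drop=10):
        """Positions where this read's quality is >= `drop` below the run mean."""
        return [i for i, q in enumerate(quality) if position_means[i] - q >= drop]

    if __name__ == "__main__":
        run = [[38, 37, 36, 35, 34, 30], [39, 38, 36, 34, 33, 29],
               [38, 36, 35, 12, 34, 31]]           # third read dips at cycle 4
        means = per_position_stats(run)
        print(flag_suspect_bases(run[2], means))   # -> [3]
    ```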

  14. ADEPT, a dynamic next generation sequencing data error-detection program with trimming

    DOE PAGES

    Feng, Shihai; Lo, Chien-Chi; Li, Po-E; ...

    2016-02-29

    Illumina is the most widely used next generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors constitute a major problem in applications such as de novo genome assembly, metagenomics analysis and single nucleotide polymorphism discovery. In this study, we present ADEPT, a dynamic error-detection method that uses the quality scores of each nucleotide and its neighboring nucleotides, together with their positions within the read, and compares these to the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in terms of the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads. We conclude that ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed. The result is a method that is less prone to position-dependent under-prediction, which is one of the most prominent issues in error prediction. The outcome is that ADEPT improves upon prior efforts in identifying true errors, primarily within the middle of reads, while reducing the false positive rate.

  15. Improve homology search sensitivity of PacBio data by correcting frameshifts.

    PubMed

    Du, Nan; Sun, Yanni

    2016-09-01

    Single-molecule real-time (SMRT) sequencing, developed by Pacific Biosciences, produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data have a high sequencing error rate, and most of the errors are insertion or deletion errors. During alignment-based homology search, insertion or deletion errors in genes will cause frameshifts and may only lead to marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random alignments, and the ambiguity will incur errors in structural and functional annotation. Existing frameshift correction tools are designed for data with much lower error rates and are not optimized for PacBio data. As an increasing number of groups are using SMRT, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets of low sequencing coverage. In addition, we can correct more errors than a popular error correction tool that does not rely on hybrid sequencing. The source code is freely available at https://sourceforge.net/projects/frame-pro/. Contact: yannisun@msu.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  16. Reproducibility of Illumina platform deep sequencing errors allows accurate determination of DNA barcodes in cells.

    PubMed

    Beltman, Joost B; Urbanus, Jos; Velds, Arno; van Rooij, Nienke; Rohr, Jan C; Naik, Shalin H; Schumacher, Ton N

    2016-04-02

    Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations that can both be used to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue, for instance, applies to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells, by supplying these with unique heritable DNA tags. Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences. Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets.
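
    A hedged sketch of a simplified spurious-barcode filter in the same spirit: a low-abundance barcode one mismatch away from a far more abundant barcode is treated as a likely PCR/sequencing-error derivative. The abundance ratio is an assumption, and this is a stand-in for, not a reimplementation of, the cross-sample method the authors develop:

    ```python
    # Sketch: a simplified spurious-barcode filter. A low-abundance barcode that is
    # one mismatch away from a much more abundant barcode is treated as a likely
    # PCR/sequencing error of that barcode. Illustrative only.

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def filter_spurious(barcode_counts, ratio=0.01):
        """Drop barcodes with a one-mismatch neighbour at least (1/ratio)x more abundant."""
        kept = {}
        for bc, n in sorted(barcode_counts.items(), key=lambda kv: -kv[1]):
            spurious = any(hamming(bc, parent) == 1 and n <= ratio * m
                           for parent, m in kept.items())
            if not spurious:
                kept[bc] = n
        return kept

    if __name__ == "__main__":
        counts = {"ACGTAC": 12000, "ACGTAA": 35, "TTGGCA": 8000, "GGGGGG": 900}
        print(filter_spurious(counts))
        # ACGTAA (35 reads, 1 mismatch from ACGTAC) is removed as spurious
    ```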

  17. A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS

    PubMed Central

    Jiao, Xiaoli; Zheng, Xin; Ma, Liang; Kutty, Geetha; Gogineni, Emile; Sun, Qiang; Sherman, Brad T.; Hu, Xiaojun; Jones, Kristine; Raley, Castle; Tran, Bao; Munroe, David J.; Stephens, Robert; Liang, Dun; Imamichi, Tomozumi; Kovacs, Joseph A.; Lempicki, Richard A.; Huang, Da Wei

    2013-01-01

    PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20-kb) in contrast to the shorter reads produced by the first and second generation sequencing technologies. As a new platform, it is important to assess the sequencing error rate, as well as the quality control (QC) parameters associated with the PacBio sequence data. In this study, a mixture of 10 prior known, closely related DNA amplicons were sequenced using the PacBio RS sequencing platform. After aligning Circular Consensus Sequence (CCS) reads derived from the above sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC, and improved to 1.3% with an SVM based multi-parameter QC method. In addition, a De Novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are post error-corrected it is still necessary to perform appropriate QC on CCS reads in order to produce successful downstream bioinformatics analytical results. PMID:24179701
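
    The error rates quoted above come from aligning CCS reads to known references; a toy sketch of that measurement (edit distance divided by alignment length, using a small dynamic program rather than a production aligner) is:

    ```python
    # Sketch: a per-read error rate measured as edit distance to a known reference
    # divided by alignment length. A minimal stand-in for the alignment-based
    # measurement behind the error rates quoted above; real CCS reads would be
    # handled by a production aligner, not this toy DP.

    def edit_distance(a, b):
        """Classic Levenshtein distance (substitutions, insertions, deletions)."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution / match
            prev = cur
        return prev[-1]

    def read_error_rate(read, reference):
        return edit_distance(read, reference) / max(len(read), len(reference))

    if __name__ == "__main__":
        ref = "ACGTACGTACGTACGTACGT"
        ccs = "ACGTACCTACGTACGTACG"       # one substitution, one missing base
        print(f"{read_error_rate(ccs, ref):.3f}")   # 0.100 for this toy example
    ```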

  18. A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS.

    PubMed

    Jiao, Xiaoli; Zheng, Xin; Ma, Liang; Kutty, Geetha; Gogineni, Emile; Sun, Qiang; Sherman, Brad T; Hu, Xiaojun; Jones, Kristine; Raley, Castle; Tran, Bao; Munroe, David J; Stephens, Robert; Liang, Dun; Imamichi, Tomozumi; Kovacs, Joseph A; Lempicki, Richard A; Huang, Da Wei

    2013-07-31

    PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20-kb) in contrast to the shorter reads produced by the first and second generation sequencing technologies. As a new platform, it is important to assess the sequencing error rate, as well as the quality control (QC) parameters associated with the PacBio sequence data. In this study, a mixture of 10 prior known, closely related DNA amplicons were sequenced using the PacBio RS sequencing platform. After aligning Circular Consensus Sequence (CCS) reads derived from the above sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC, and improved to 1.3% with an SVM based multi-parameter QC method. In addition, a De Novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are post error-corrected it is still necessary to perform appropriate QC on CCS reads in order to produce successful downstream bioinformatics analytical results.

  19. ChromatoGate: A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection

    PubMed Central

    Alachiotis, Nikolaos; Vogiatzi, Emmanouella; Pavlidis, Pavlos; Stamatakis, Alexandros

    2013-01-01

    Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translates the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, a manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis becomes an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), an open-source software that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To provide users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors. PMID:24688709
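
    A minimal sketch of the pre-filtering step described above: scanning a multiple sequence alignment for columns whose minor state is rare enough to be a plausible mis-call. The minor-count threshold is illustrative, not ChromatoGate's sensitivity parameter:

    ```python
    # Sketch: scanning a multiple sequence alignment for polymorphic columns that
    # merit chromatogram inspection. Threshold is illustrative.

    from collections import Counter

    def polymorphic_columns(alignment, max_minor=1):
        """Columns whose minor-state count is <= max_minor (possible mis-calls).

        alignment: list of equal-length sequences. Gaps and N are ignored.
        """
        suspects = []
        for col_idx, column in enumerate(zip(*alignment)):
            counts = Counter(c for c in column if c not in "-N")
            if len(counts) > 1:
                minor = sum(counts.values()) - max(counts.values())
                if minor <= max_minor:
                    suspects.append(col_idx)
        return suspects

    if __name__ == "__main__":
        msa = ["ACGTACGT",
               "ACGTACGT",
               "ACCTACGT",     # single-sequence difference at column 2
               "ACGTACGT"]
        print(polymorphic_columns(msa))   # -> [2]
    ```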

  20. ChromatoGate: A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection.

    PubMed

    Alachiotis, Nikolaos; Vogiatzi, Emmanouella; Pavlidis, Pavlos; Stamatakis, Alexandros

    2013-01-01

    Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translates the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, a manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis becomes an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), an open-source software that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To provide users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors.

  21. Single-variant and multi-variant trend tests for genetic association with next-generation sequencing that are robust to sequencing error.

    PubMed

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Alejandro Q; Musolf, Anthony; Matise, Tara C; Finch, Stephen J; Gordon, Derek

    2012-01-01

    As with any new technology, next-generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to those data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power for all three methods is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single-variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies; first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p value, no matter how many loci. Copyright © 2013 S. Karger AG, Basel.
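
    For orientation, a hedged sketch of the basic (error-free) linear trend statistic that LTTae,NGS extends is shown below; the paper's differential-misclassification adjustment is not reproduced here, and the genotype counts are invented for illustration:

    ```python
    # Sketch: the basic Cochran-Armitage-style linear trend test that LTTae,NGS
    # builds on. The extension for differential misclassification error is NOT
    # reproduced here; this only shows the underlying trend statistic.

    def linear_trend_test(case_counts, control_counts, scores=(0, 1, 2)):
        """Chi-square (1 df) trend statistic for genotype counts in cases/controls."""
        r = list(case_counts)                          # cases per genotype
        n = [c + s for c, s in zip(case_counts, control_counts)]
        R, N = sum(r), sum(n)
        sum_rx = sum(x * ri for x, ri in zip(scores, r))
        sum_nx = sum(x * ni for x, ni in zip(scores, n))
        sum_nx2 = sum(x * x * ni for x, ni in zip(scores, n))
        num = N * (N * sum_rx - R * sum_nx) ** 2
        den = R * (N - R) * (N * sum_nx2 - sum_nx ** 2)
        return num / den

    if __name__ == "__main__":
        # genotype counts for 0, 1, 2 copies of the putative causal allele
        cases = (120, 60, 20)
        controls = (150, 40, 10)
        print(f"trend chi-square (1 df): {linear_trend_test(cases, controls):.2f}")
    ```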

  22. Single variant and multi-variant trend tests for genetic association with next generation sequencing that are robust to sequencing error

    PubMed Central

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Andrew; Musolf, Anthony; Matise, Tara C.; Finch, Stephen J.; Gordon, Derek

    2013-01-01

    As with any new technology, next generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model, based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to that data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power for all three methods is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies; first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p-value, no matter how many loci. PMID:23594495

  23. Fundamental Bounds for Sequence Reconstruction from Nanopore Sequencers.

    PubMed

    Magner, Abram; Duda, Jarosław; Szpankowski, Wojciech; Grama, Ananth

    2016-06-01

    Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to accurately reconstruct the true sequence with high probability? Our results provide a number of important insights: (i) the probability of accurate reconstruction of a sequence from a single sample in the presence of indel errors tends quickly (i.e., exponentially) to zero as the length of the sequence increases; and (ii) replicated extrusion is an effective technique for accurate reconstruction. We show that for typical distributions of indel errors, the required number of replicas is a slow function (polylogarithmic) of sequence length - implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Moreover, we show that in certain cases, the required number of replicas can be related to information-theoretic parameters of the indel error distributions.

  24. Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.

    PubMed

    Otto, Thomas D; Sanders, Mandy; Berriman, Matthew; Newbold, Chris

    2010-07-15

    The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications. The software is available at http://icorn.sourceforge.net

  25. Error Rate Comparison during Polymerase Chain Reaction by DNA Polymerase

    DOE PAGES

    McInerney, Peter; Adams, Paul; Hadi, Masood Z.

    2014-01-01

    As larger-scale cloning projects become more prevalent, there is an increasing need for comparisons among high fidelity DNA polymerases used for PCR amplification. All polymerases marketed for PCR applications are tested for fidelity properties (i.e., error rate determination) by vendors, and numerous literature reports have addressed PCR enzyme fidelity. Nonetheless, it is often difficult to make direct comparisons among different enzymes due to numerous methodological and analytical differences from study to study. We have measured the error rates for 6 DNA polymerases commonly used in PCR applications, including 3 polymerases typically used for cloning applications requiring high fidelity. Error rate measurement values reported here were obtained by direct sequencing of cloned PCR products. The strategy employed here allows interrogation of error rate across a very large DNA sequence space, since 94 unique DNA targets were used as templates for PCR cloning. Among the six enzymes included in the study (Taq polymerase, AccuPrime-Taq High Fidelity, KOD Hot Start, cloned Pfu polymerase, Phusion Hot Start, and Pwo polymerase), we find the lowest error rates with Pfu, Phusion, and Pwo polymerases. Error rates are comparable for these 3 enzymes and are >10x lower than the error rate observed with Taq polymerase. Mutation spectra are reported, with the 3 high fidelity enzymes displaying broadly similar types of mutations. For these enzymes, transition mutations predominate, with little bias observed for type of transition.
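
    A hedged sketch of the usual arithmetic for turning mutations observed in sequenced clones into a polymerase error rate per base per template doubling; the numbers are invented and the exact accounting in the study may differ:

    ```python
    # Sketch: back-of-the-envelope arithmetic for a PCR fidelity assay, converting
    # observed mutations in sequenced clones into an error rate per base per
    # template doubling. Numbers are illustrative only.

    from math import log2

    def pcr_error_rate(mutations, bases_sequenced, fold_amplification):
        """Errors per base per template doubling."""
        doublings = log2(fold_amplification)
        return mutations / (bases_sequenced * doublings)

    if __name__ == "__main__":
        # Illustrative: 25 mutations found in 1.5 Mb of sequenced clone inserts
        # after a ~10,000-fold amplification (~13.3 template doublings).
        rate = pcr_error_rate(mutations=25, bases_sequenced=1_500_000,
                              fold_amplification=10_000)
        print(f"~{rate:.2e} errors per base per doubling")
    ```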

  26. Ultra-deep mutant spectrum profiling: improving sequencing accuracy using overlapping read pairs.

    PubMed

    Chen-Harris, Haiyin; Borucki, Monica K; Torres, Clinton; Slezak, Tom R; Allen, Jonathan E

    2013-02-12

    High throughput sequencing is beginning to make a transformative impact in the area of viral evolution. Deep sequencing has the potential to reveal the mutant spectrum within a viral sample at high resolution, thus enabling the close examination of viral mutational dynamics both within- and between-hosts. The challenge however, is to accurately model the errors in the sequencing data and differentiate real viral mutations, particularly those that exist at low frequencies, from sequencing errors. We demonstrate that overlapping read pairs (ORP) -- generated by combining short fragment sequencing libraries and longer sequencing reads -- significantly reduce sequencing error rates and improve rare variant detection accuracy. Using this sequencing protocol and an error model optimized for variant detection, we are able to capture a large number of genetic mutations present within a viral population at ultra-low frequency levels (<0.05%). Our rare variant detection strategies have important implications beyond viral evolution and can be applied to any basic and clinical research area that requires the identification of rare mutations.
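
    A minimal sketch of the ORP idea, assuming the overlapping portions of a read pair are already oriented and aligned: keep a base only where both reads agree, so independent sequencing errors are masked (real pipelines merge full fragments and use quality scores):

    ```python
    # Sketch: the overlapping-read-pair (ORP) idea in its simplest form. Within the
    # region covered by both reads of a pair, keep a base only when the two reads
    # agree and mask disagreements. Orientation/alignment is assumed done already.

    def merge_overlap(read1_overlap, read2_overlap, mask_char="N"):
        """Consensus of the overlapping region of a read pair."""
        return "".join(a if a == b else mask_char
                       for a, b in zip(read1_overlap, read2_overlap))

    if __name__ == "__main__":
        r1 = "ACGTACGTAC"
        r2 = "ACGTACCTAC"              # one read has an error at position 6
        print(merge_overlap(r1, r2))   # ACGTACNTAC -> the suspect base is masked
    ```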

  27. Accurate typing of short tandem repeats from genome-wide sequencing data and its applications.

    PubMed

    Fungtammasan, Arkarachai; Ananda, Guruprasad; Hile, Suzanne E; Su, Marcia Shu-Wei; Sun, Chen; Harris, Robert; Medvedev, Paul; Eckert, Kristin; Makova, Kateryna D

    2015-05-01

    Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution. © 2015 Fungtammasan et al.; Published by Cold Spring Harbor Laboratory Press.

  28. Illusory conjunctions of pitch and duration in unfamiliar tone sequences.

    PubMed

    Thompson, W F; Hall, M D; Pressing, J

    2001-02-01

    In 3 experiments, the authors examined short-term memory for pitch and duration in unfamiliar tone sequences. Participants were presented a target sequence consisting of 2 tones (Experiment 1) or 7 tones (Experiments 2 and 3) and then a probe tone. Participants indicated whether the probe tone matched 1 of the target tones in both pitch and duration. Error rates were relatively low if the probe tone matched 1 of the target tones or if it differed from target tones in pitch, duration, or both. Error rates were remarkably high, however, if the probe tone combined the pitch of 1 target tone with the duration of a different target tone. The results suggest that illusory conjunctions of these dimensions frequently occur. A mathematical model is presented that accounts for the relative contribution of pitch errors, duration errors, and illusory conjunctions of pitch and duration.

  29. Rapid and accurate pyrosequencing of angiosperm plastid genomes

    PubMed Central

    Moore, Michael J; Dhingra, Amit; Soltis, Pamela S; Shaw, Regina; Farmerie, William G; Folta, Kevin M; Soltis, Douglas E

    2006-01-01

    Background Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae). Results More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, the high accuracy observed in the GS 20 plastid genome sequence was generated for a significant reduction in time and cost over traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically. PMID:16934154

  30. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

    PubMed

    Schirmer, Melanie; Ijaz, Umer Z; D'Amore, Rosalinda; Hall, Neil; Sloan, William T; Quince, Christopher

    2015-03-31

    With read lengths of currently up to 2 × 300 bp, high throughput and low sequencing costs Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  31. RECKONER: read error corrector based on KMC.

    PubMed

    Dlugosz, Maciej; Deorowicz, Sebastian

    2017-04-01

    The presence of sequencing errors in data produced by next-generation sequencers affects the quality of downstream analyses. Their accuracy can be improved by performing error correction of the sequencing reads. We introduce a new correction algorithm capable of processing eukaryotic data sets with genome sizes close to 500 Mbp and high error rates, using less than 4 GB of RAM in about 35 min on a 16-core computer. The program is freely available at http://sun.aei.polsl.pl/REFRESH/reckoner. Contact: sebastian.deorowicz@polsl.pl. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  32. A Simple Exact Error Rate Analysis for DS-CDMA with Arbitrary Pulse Shape in Flat Nakagami Fading

    NASA Astrophysics Data System (ADS)

    Rahman, Mohammad Azizur; Sasaki, Shigenobu; Kikuchi, Hisakazu; Harada, Hiroshi; Kato, Shuzo

    A simple exact error rate analysis is presented for random binary direct sequence code division multiple access (DS-CDMA) considering a general pulse shape and flat Nakagami fading channel. First of all, a simple model is developed for the multiple access interference (MAI). Based on this, a simple exact expression of the characteristic function (CF) of MAI is developed in a straightforward manner. Finally, an exact expression of error rate is obtained following the CF method of error rate analysis. The exact error rate so obtained can be much more easily evaluated than the only reliable approximate error rate expression currently available, which is based on the Improved Gaussian Approximation (IGA).

  33. Error Rate Comparison during Polymerase Chain Reaction by DNA Polymerase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McInerney, Peter; Adams, Paul; Hadi, Masood Z.

    As larger-scale cloning projects become more prevalent, there is an increasing need for comparisons among high fidelity DNA polymerases used for PCR amplification. All polymerases marketed for PCR applications are tested for fidelity properties (i.e., error rate determination) by vendors, and numerous literature reports have addressed PCR enzyme fidelity. Nonetheless, it is often difficult to make direct comparisons among different enzymes due to numerous methodological and analytical differences from study to study. We have measured the error rates for 6 DNA polymerases commonly used in PCR applications, including 3 polymerases typically used for cloning applications requiring high fidelity. Error rate measurement values reported here were obtained by direct sequencing of cloned PCR products. The strategy employed here allows interrogation of error rate across a very large DNA sequence space, since 94 unique DNA targets were used as templates for PCR cloning. Among the six enzymes included in the study (Taq polymerase, AccuPrime-Taq High Fidelity, KOD Hot Start, cloned Pfu polymerase, Phusion Hot Start, and Pwo polymerase), we find the lowest error rates with Pfu, Phusion, and Pwo polymerases. Error rates are comparable for these 3 enzymes and are >10x lower than the error rate observed with Taq polymerase. Mutation spectra are reported, with the 3 high fidelity enzymes displaying broadly similar types of mutations. For these enzymes, transition mutations predominate, with little bias observed for type of transition.

  14. Draft versus finished sequence data for DNA and protein diagnostic signature development

    PubMed Central

    Gardner, Shea N.; Lam, Marisa W.; Smith, Jason R.; Torres, Clinton L.; Slezak, Tom R.

    2005-01-01

    Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. We use SAP to assess whether draft data are sufficient or finished sequencing is required using Marburg and variola virus sequences. Simulations indicate that intermediate to high-quality draft with error rates of 10−3–10−5 (∼8× coverage) of target organisms is suitable for DNA signature prediction. Low-quality draft with error rates of ∼1% (3× to 6× coverage) of target isolates is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient, as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target and low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures. PMID:16243783

  15. AfterQC: automatic filtering, trimming, error removing and quality control for fastq data.

    PubMed

    Chen, Shifu; Huang, Tanxiao; Zhou, Yanqing; Han, Yue; Xu, Mingyan; Gu, Jia

    2017-03-14

    Some applications, especially those clinical applications requiring high accuracy of sequencing data, usually have to face the problems caused by unavoidable sequencing errors. Several tools have been proposed to profile the sequencing quality, but few of them can quantify or correct the sequencing errors. This unmet requirement motivated us to develop AfterQC, a tool with functions to profile sequencing errors and correct most of them, plus highly automated quality control and data filtering features. Different from most tools, AfterQC analyses the overlapping of paired sequences for pair-end sequencing data. Based on overlapping analysis, AfterQC can detect and cut adapters, and it furthermore provides a novel function to correct wrong bases in the overlapping regions. Another new feature is to detect and visualise sequencing bubbles, which can be commonly found on the flowcell lanes and may cause sequencing errors. Besides normal per cycle quality and base content plotting, AfterQC also provides features like polyX (a long sub-sequence of the same base X) filtering, automatic trimming and K-MER based strand bias profiling. For each single or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates sequencer's bubble effects, trims reads at front and tail, detects the sequencing errors and corrects part of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support; it can run with a single FastQ file, a single pair of FastQ files (for pair-end sequencing), or a folder for all included FastQ files to be processed automatically. Based on overlapping analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent. Much more than just another new quality control (QC) tool, AfterQC is able to perform quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC can help to eliminate the sequencing errors for pair-end sequencing data to provide much cleaner outputs, and consequently help to reduce the false-positive variants, especially for the low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all the options automatically and requires no arguments in most cases.
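
    The overlap-based correction described above rests on the observation that, for short inserts, read 1 and the reverse complement of read 2 cover the same bases, so a disagreement inside the overlap can be resolved in favor of the higher-quality base. A toy version of that idea (not AfterQC's actual implementation; the thresholds and function names are illustrative) might look like:

```python
def revcomp(seq):
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def find_overlap(r1, r2rc, min_len=8, max_mismatch=2):
    """Longest L such that r1[-L:] and r2rc[:L] disagree at <= max_mismatch positions."""
    for L in range(min(len(r1), len(r2rc)), min_len - 1, -1):
        mism = sum(1 for a, b in zip(r1[-L:], r2rc[:L]) if a != b)
        if mism <= max_mismatch:
            return L
    return 0

def correct_overlap(r1, q1, r2, q2):
    """Resolve disagreements in the overlap toward the higher-quality base."""
    r2rc, q2rc = revcomp(r2), q2[::-1]
    L = find_overlap(r1, r2rc)
    r1, r2rc = list(r1), list(r2rc)
    for i in range(L):
        p1, p2 = len(r1) - L + i, i
        if r1[p1] != r2rc[p2]:
            if q1[p1] >= q2rc[p2]:
                r2rc[p2] = r1[p1]
            else:
                r1[p1] = r2rc[p2]
    return "".join(r1), revcomp("".join(r2rc))

# Toy example: one low-quality error injected into read 2 inside the overlap.
r1 = "ACGTACGTACGTAAGG"
q1 = [30] * len(r1)
r2 = revcomp(r1)
q2 = [30] * len(r2)
r2 = r2[:5] + "T" + r2[6:]   # inject a mismatch...
q2[5] = 5                    # ...with a low quality score
print(correct_overlap(r1, q1, r2, q2))
```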

  16. Factors That Affect Large Subunit Ribosomal DNA Amplicon Sequencing Studies of Fungal Communities: Classification Method, Primer Choice, and Error

    PubMed Central

    Porter, Teresita M.; Golding, G. Brian

    2012-01-01

    Nuclear large subunit ribosomal DNA is widely used in fungal phylogenetics and, to an increasing extent, also in amplicon-based environmental sequencing. The relatively short reads produced by next-generation sequencing, however, make primer choice and sequence error important variables for obtaining accurate taxonomic classifications. In this simulation study we tested the performance of three classification methods: 1) a similarity-based method (BLAST + Metagenomic Analyzer, MEGAN); 2) a composition-based method (Ribosomal Database Project naïve Bayesian classifier, NBC); and, 3) a phylogeny-based method (Statistical Assignment Package, SAP). We also tested the effects of sequence length, primer choice, and sequence error on classification accuracy and perceived community composition. Using a leave-one-out cross validation approach, results for classifications to the genus rank were as follows: BLAST + MEGAN had the lowest error rate and was particularly robust to sequence error; SAP accuracy was highest when long LSU query sequences were classified; and, NBC ran significantly faster than the other tested methods. All methods performed poorly with the shortest 50–100 bp sequences. Increasing simulated sequence error reduced classification accuracy. Community shifts were detected due to sequence error and primer selection even though there was no change in the underlying community composition. Short read datasets from individual primers, as well as pooled datasets, appear to only approximate the true community composition. We hope this work informs investigators of some of the factors that affect the quality and interpretation of their environmental gene surveys. PMID:22558215

  17. Designing robust watermark barcodes for multiplex long-read sequencing.

    PubMed

    Ezpeleta, Joaquín; Krsticevic, Flavia J; Bulacio, Pilar; Tapia, Elizabeth

    2017-03-15

    To attain acceptable sample misassignment rates, current approaches to multiplex single-molecule real-time sequencing require upstream quality improvement, which is obtained from multiple passes over the sequenced insert and significantly reduces the effective read length. In order to fully exploit the raw read length on multiplex applications, robust barcodes capable of dealing with the full single-pass error rates are needed. We present a method for designing sequencing barcodes that can withstand a large number of insertion, deletion and substitution errors and are suitable for use in multiplex single-molecule real-time sequencing. The manuscript focuses on the design of barcodes for full-length single-pass reads, impaired by challenging error rates in the order of 11%. The proposed barcodes can multiplex hundreds or thousands of samples while achieving sample misassignment probabilities as low as 10−7 under the above conditions, and are designed to be compatible with chemical constraints imposed by the sequencing process. Software tools for constructing watermark barcode sets and demultiplexing barcoded reads, together with example sets of barcodes and synthetic barcoded reads, are freely available at www.cifasis-conicet.gov.ar/ezpeleta/NS-watermark . ezpeleta@cifasis-conicet.gov.ar. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
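
    The watermark-code construction itself is more elaborate, but the underlying design criterion — keeping barcodes far apart under an edit distance that counts insertions, deletions and substitutions, so that error-laden reads still demultiplex correctly — can be illustrated with a simple greedy selection sketch (chemistry constraints such as GC content and homopolymer runs, which the paper also handles, are ignored here):

```python
import itertools

def edit_distance(a, b):
    """Classic Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def greedy_barcodes(length=6, min_dist=3, limit=None):
    """Greedily pick barcodes that are pairwise >= min_dist apart under edit
    distance (illustration only; real watermark barcodes are constructed differently)."""
    chosen = []
    for cand in itertools.product("ACGT", repeat=length):
        cand = "".join(cand)
        if all(edit_distance(cand, b) >= min_dist for b in chosen):
            chosen.append(cand)
            if limit and len(chosen) >= limit:
                break
    return chosen

print(greedy_barcodes(limit=10))
```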

  18. Portable and Error-Free DNA-Based Data Storage.

    PubMed

    Yazdi, S M Hossein Tabatabaei; Gabrys, Ryan; Milenkovic, Olgica

    2017-07-10

    DNA-based data storage is an emerging nonvolatile memory technology of potentially unprecedented density, durability, and replication efficiency. The basic system implementation steps include synthesizing DNA strings that contain user information and subsequently retrieving them via high-throughput sequencing technologies. Existing architectures enable reading and writing but do not offer random-access and error-free data recovery from low-cost, portable devices, which is crucial for making the storage technology competitive with classical recorders. Here we show for the first time that a portable, random-access platform may be implemented in practice using nanopore sequencers. The novelty of our approach is to design an integrated processing pipeline that encodes data to avoid costly synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable sequencing via new iterative alignment and deletion error-correcting codes. Our work represents the only known random access DNA-based data storage system that uses error-prone nanopore sequencers, while still producing error-free readouts with the highest reported information rate/density. As such, it represents a crucial step towards practical employment of DNA molecules as storage media.

  19. Error baseline rates of five sample preparation methods used to characterize RNA virus populations.

    PubMed

    Kugelman, Jeffrey R; Wiley, Michael R; Nagle, Elyse R; Reyes, Daniel; Pfeffer, Brad P; Kuhn, Jens H; Sanchez-Lockhart, Mariano; Palacios, Gustavo F

    2017-01-01

    Individual RNA viruses typically occur as populations of genomes that differ slightly from each other due to mutations introduced by the error-prone viral polymerase. Understanding the variability of RNA virus genome populations is critical for understanding virus evolution because individual mutant genomes may gain evolutionary selective advantages and give rise to dominant subpopulations, possibly even leading to the emergence of viruses resistant to medical countermeasures. Reverse transcription of virus genome populations followed by next-generation sequencing is the only available method to characterize variation for RNA viruses. However, both steps may lead to the introduction of artificial mutations, thereby skewing the data. To better understand how such errors are introduced during sample preparation, we determined and compared error baseline rates of five different sample preparation methods by analyzing in vitro transcribed Ebola virus RNA from an artificial plasmid-based system. These methods included: shotgun sequencing from plasmid DNA or in vitro transcribed RNA as a basic "no amplification" method, amplicon sequencing from the plasmid DNA or in vitro transcribed RNA as a "targeted" amplification method, sequence-independent single-primer amplification (SISPA) as a "random" amplification method, rolling circle reverse transcription sequencing (CirSeq) as an advanced "no amplification" method, and Illumina TruSeq RNA Access as a "targeted" enrichment method. The measured error frequencies indicate that RNA Access offers the best tradeoff between sensitivity and sample preparation error (1.4 × 10−5) of all compared methods.

  20. Error baseline rates of five sample preparation methods used to characterize RNA virus populations

    PubMed Central

    Kugelman, Jeffrey R.; Wiley, Michael R.; Nagle, Elyse R.; Reyes, Daniel; Pfeffer, Brad P.; Kuhn, Jens H.; Sanchez-Lockhart, Mariano; Palacios, Gustavo F.

    2017-01-01

    Individual RNA viruses typically occur as populations of genomes that differ slightly from each other due to mutations introduced by the error-prone viral polymerase. Understanding the variability of RNA virus genome populations is critical for understanding virus evolution because individual mutant genomes may gain evolutionary selective advantages and give rise to dominant subpopulations, possibly even leading to the emergence of viruses resistant to medical countermeasures. Reverse transcription of virus genome populations followed by next-generation sequencing is the only available method to characterize variation for RNA viruses. However, both steps may lead to the introduction of artificial mutations, thereby skewing the data. To better understand how such errors are introduced during sample preparation, we determined and compared error baseline rates of five different sample preparation methods by analyzing in vitro transcribed Ebola virus RNA from an artificial plasmid-based system. These methods included: shotgun sequencing from plasmid DNA or in vitro transcribed RNA as a basic “no amplification” method, amplicon sequencing from the plasmid DNA or in vitro transcribed RNA as a “targeted” amplification method, sequence-independent single-primer amplification (SISPA) as a “random” amplification method, rolling circle reverse transcription sequencing (CirSeq) as an advanced “no amplification” method, and Illumina TruSeq RNA Access as a “targeted” enrichment method. The measured error frequencies indicate that RNA Access offers the best tradeoff between sensitivity and sample preparation error (1.4 × 10−5) of all compared methods. PMID:28182717

  1. Method and Apparatus for Evaluating the Visual Quality of Processed Digital Video Sequences

    NASA Technical Reports Server (NTRS)

    Watson, Andrew B. (Inventor)

    2002-01-01

    A Digital Video Quality (DVQ) apparatus and method that incorporate a model of human visual sensitivity to predict the visibility of artifacts. The DVQ method and apparatus are used for the evaluation of the visual quality of processed digital video sequences and for adaptively controlling the bit rate of the processed digital video sequences without compromising the visual quality. The DVQ apparatus minimizes the required amount of memory and computation. The input to the DVQ apparatus is a pair of color image sequences: an original (R) non-compressed sequence, and a processed (T) sequence. Both sequences (R) and (T) are sampled, cropped, and subjected to color transformations. The sequences are then subjected to blocking and discrete cosine transformation, and the results are transformed to local contrast. The next step is a time filtering operation which implements the human sensitivity to different time frequencies. The results are converted to threshold units by dividing each discrete cosine transform coefficient by its respective visual threshold. At the next stage the two sequences are subtracted to produce an error sequence. The error sequence is subjected to a contrast masking operation, which also depends upon the reference sequence (R). The masked errors can be pooled in various ways to illustrate the perceptual error over various dimensions, and the pooled error can be converted to a visual quality measure.

  2. Haplotype estimation using sequencing reads.

    PubMed

    Delaneau, Olivier; Howie, Bryan; Cox, Anthony J; Zagury, Jean-François; Marchini, Jonathan

    2013-10-03

    High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb reads with 4%-15% error per base), phasing performance was substantially improved. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
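
    Phasing accuracy above is summarized via the mean distance between switch errors. Given true and inferred haplotypes over the same ordered heterozygous sites, a switch error is a site at which the inferred phase flips relative to the truth; a minimal sketch of that metric (site positions in bp are assumed known; the example numbers are invented) is:

```python
def switch_errors(true_hap, inferred_hap):
    """Count switch errors between two phased haplotypes encoded as 0/1 alleles
    at the same ordered heterozygous sites."""
    agree = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch error occurs wherever agreement flips between consecutive sites.
    return sum(1 for a, b in zip(agree, agree[1:]) if a != b)

def mean_distance_between_switches(positions, true_hap, inferred_hap):
    n_switch = switch_errors(true_hap, inferred_hap)
    span = positions[-1] - positions[0]
    return span / n_switch if n_switch else float("inf")

# Toy example: the inferred phase flips once in the middle of the region.
pos      = [1_000, 50_000, 120_000, 200_000, 275_000]
truth    = [0, 1, 0, 0, 1]
inferred = [0, 1, 1, 1, 0]     # flipped from the third site onward
print(switch_errors(truth, inferred))                        # 1
print(mean_distance_between_switches(pos, truth, inferred))  # 274000.0
```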

  3. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    DOE PAGES

    McNair, Katelyn; Edwards, Robert A.

    2015-06-16

    As increases in prokaryotic sequencing take place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  4. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    PubMed Central

    Matochko, Wadim L.; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the i-th sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N × N matrix and a stochastic sampling operator (S_a). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of S_a and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = S_a I_N, where I_N is an N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
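
    The vector/operator framework described above translates directly into a few lines of linear algebra. A toy version with a tiny diversity N and arbitrary sampling probabilities (purely illustrative, not the paper's parameters), showing a frequency vector acted on by a stochastic sampling operator S_a and a diagonal censorship matrix CEN, could be:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6                                    # toy theoretical diversity
n = np.array([50, 30, 10, 5, 4, 1.0])    # frequency vector n = ||n_i||

# Stochastic sampling operator S_a: a random diagonal matrix whose entries
# model how deeply each sequence happens to be sampled.
S_a = np.diag(rng.binomial(100, 0.5, size=N) / 100)

# Censorship matrix CEN: diagonal, with near-zero entries for sequences that
# are systematically lost or downsampled during sequencing.
CEN = np.diag([1, 1, 1, 0.05, 1, 1.0])

I_N = np.eye(N)
unbiased = S_a @ I_N @ n          # sequencing with sampling noise only
censored = S_a @ CEN @ n          # sequencing with censorship of the 4th sequence
print(np.round(unbiased, 2))
print(np.round(censored, 2))
```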

  5. DNA Barcoding through Quaternary LDPC Codes

    PubMed Central

    Tapia, Elizabeth; Spetale, Flavio; Krsticevic, Flavia; Angelone, Laura; Bulacio, Pilar

    2015-01-01

    For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10−2 per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10−9 at the expense of a rate of read losses just in the order of 10−6. PMID:26492348

  6. DNA Barcoding through Quaternary LDPC Codes.

    PubMed

    Tapia, Elizabeth; Spetale, Flavio; Krsticevic, Flavia; Angelone, Laura; Bulacio, Pilar

    2015-01-01

    For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10(-2) per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10(-9) at the expense of a rate of read losses just in the order of 10(-6).

  7. Characterization of Hepatitis C Virus (HCV) Envelope Diversification from Acute to Chronic Infection within a Sexually Transmitted HCV Cluster by Using Single-Molecule, Real-Time Sequencing

    PubMed Central

    Ho, Cynthia K. Y.; Raghwani, Jayna; Koekkoek, Sylvie; Liang, Richard H.; Van der Meer, Jan T. M.; Van Der Valk, Marc; De Jong, Menno; Pybus, Oliver G.

    2016-01-01

    ABSTRACT In contrast to other available next-generation sequencing platforms, PacBio single-molecule, real-time (SMRT) sequencing has the advantage of generating long reads albeit with a relatively higher error rate in unprocessed data. Using this platform, we longitudinally sampled and sequenced the hepatitis C virus (HCV) envelope genome region (1,680 nucleotides [nt]) from individuals belonging to a cluster of sexually transmitted cases. All five subjects were coinfected with HIV-1 and a closely related strain of HCV genotype 4d. In total, 50 samples were analyzed by using SMRT sequencing. By using 7 passes of circular consensus sequencing, the error rate was reduced to 0.37%, and the median number of sequences was 612 per sample. A further reduction of insertions was achieved by alignment against a sample-specific reference sequence. However, in vitro recombination during PCR amplification could not be excluded. Phylogenetic analysis supported close relationships among HCV sequences from the four male subjects and subsequent transmission from one subject to his female partner. Transmission was characterized by a strong genetic bottleneck. Viral genetic diversity was low during acute infection and increased upon progression to chronicity but subsequently fluctuated during chronic infection, caused by the alternate detection of distinct coexisting lineages. SMRT sequencing combines long reads with sufficient depth for many phylogenetic analyses and can therefore provide insights into within-host HCV evolutionary dynamics without the need for haplotype reconstruction using statistical algorithms. IMPORTANCE Next-generation sequencing has revolutionized the study of genetically variable RNA virus populations, but for phylogenetic and evolutionary analyses, longer sequences than those generated by most available platforms, while minimizing the intrinsic error rate, are desired. Here, we demonstrate for the first time that PacBio SMRT sequencing technology can be used to generate full-length HCV envelope sequences at the single-molecule level, providing a data set with large sequencing depth for the characterization of intrahost viral dynamics. The selection of consensus reads derived from at least 7 full circular consensus sequencing rounds significantly reduced the intrinsic high error rate of this method. We used this method to genetically characterize a unique transmission cluster of sexually transmitted HCV infections, providing insight into the distinct evolutionary pathways in each patient over time and identifying the transmission-associated genetic bottleneck as well as fluctuations in viral genetic diversity over time, accompanied by dynamic shifts in viral subpopulations. PMID:28077634

  8. Accounting for uncertainty in DNA sequencing data.

    PubMed

    O'Rawe, Jason A; Ferson, Scott; Lyon, Gholson J

    2015-02-01

    Science is defined in part by an honest exposition of the uncertainties that arise in measurements and propagate through calculations and inferences, so that the reliabilities of its conclusions are made apparent. The recent rapid development of high-throughput DNA sequencing technologies has dramatically increased the number of measurements made at the biochemical and molecular level. These data come from many different DNA-sequencing technologies, each with their own platform-specific errors and biases, which vary widely. Several statistical studies have tried to measure error rates for basic determinations, but there are no general schemes to project these uncertainties so as to assess the surety of the conclusions drawn about genetic, epigenetic, and more general biological questions. We review here the state of uncertainty quantification in DNA sequencing applications, describe sources of error, and propose methods that can be used for accounting and propagating these errors and their uncertainties through subsequent calculations. Copyright © 2014 Elsevier Ltd. All rights reserved.

  9. Choice of Reference Sequence and Assembler for Alignment of Listeria monocytogenes Short-Read Sequence Data Greatly Influences Rates of Error in SNP Analyses

    PubMed Central

    Pightling, Arthur W.; Petronella, Nicholas; Pagotto, Franco

    2014-01-01

    The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers should test a variety of conditions to achieve optimal results. PMID:25144537

  10. Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing.

    PubMed

    Hargreaves, Adam D; Mulley, John F

    2015-01-01

    Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately characterises venom toxin-encoding genes in the painted saw-scaled viper, Echis coloratus. We find the raw sequencing error rate to be around 12%, improved to 0-2% with hybrid error correction and 3% with de novo error correction. Our corrected data provides full coding sequences and 5' and 3' UTRs for 29 of 33 candidate venom toxins detected, far superior to Illumina data (13/40 complete) and Sanger-based ESTs (15/29). We suggest that, should the current pace of improvement continue, the MinION will become the default approach for cDNA sequencing in a variety of species.

  11. Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing

    PubMed Central

    Hargreaves, Adam D.

    2015-01-01

    Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately characterises venom toxin-encoding genes in the painted saw-scaled viper, Echis coloratus. We find the raw sequencing error rate to be around 12%, improved to 0–2% with hybrid error correction and 3% with de novo error correction. Our corrected data provides full coding sequences and 5′ and 3′ UTRs for 29 of 33 candidate venom toxins detected, far superior to Illumina data (13/40 complete) and Sanger-based ESTs (15/29). We suggest that, should the current pace of improvement continue, the MinION will become the default approach for cDNA sequencing in a variety of species. PMID:26623194

  12. Evaluation of 16S Rrna amplicon sequencing using two next-generation sequencing technologies for phylogenetic analysis of the rumen bacterial community in steers

    USDA-ARS?s Scientific Manuscript database

    Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...

  13. Evaluation of 16S rRNA amplicon sequencing using two next-generation sequencing technologies for phylogenetic analysis of the rumen bacterial community in steers

    USDA-ARS?s Scientific Manuscript database

    Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...

  14. Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions.

    PubMed

    Senol Cali, Damla; Kim, Jeremie S; Ghose, Saugata; Alkan, Can; Mutlu, Onur

    2018-04-02

    Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) the choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) Read-to-read overlap finding tools, GraphMap and Minimap, perform similarly in terms of accuracy. However, Minimap has a lower memory usage, and it is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability. We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of the nanopore sequencing technology.

  15. Quantizing and sampling considerations in digital phased-locked loops

    NASA Technical Reports Server (NTRS)

    Hurst, G. T.; Gupta, S. C.

    1974-01-01

    The quantizer problem is first considered. The conditions under which the uniform white sequence model for the quantizer error is valid are established independent of the sampling rate. An equivalent spectral density is defined for the quantizer error resulting in an effective SNR value. This effective SNR may be used to determine quantized performance from infinitely fine quantized results. Attention is given to sampling rate considerations. Sampling rate characteristics of the digital phase-locked loop (DPLL) structure are investigated for the infinitely fine quantized system. The predicted phase error variance equation is examined as a function of the sampling rate. Simulation results are presented and a method is described which enables the minimum required sampling rate to be determined from the predicted phase error variance equations.
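
    Under the uniform white-sequence model for quantizer error mentioned above, the error variance is Δ²/12 for step size Δ, and this can be folded into an effective SNR. A small numerical illustration follows (a full-scale sinusoidal input is assumed so that the familiar 6.02b + 1.76 dB rule of thumb applies; this is not the paper's DPLL analysis):

```python
import math
import numpy as np

def quantize(x, bits, full_scale=1.0):
    """Uniform mid-tread quantizer over [-full_scale, +full_scale]."""
    step = 2 * full_scale / (2 ** bits)
    return np.clip(np.round(x / step) * step, -full_scale, full_scale - step)

bits = 8
t = np.arange(100_000)
x = np.sin(2 * np.pi * 0.01234 * t)          # full-scale sinusoid
e = quantize(x, bits) - x                    # quantization error sequence

step = 2.0 / 2 ** bits
print("error variance     :", np.var(e), " (model step^2/12 =", step ** 2 / 12, ")")
snr_db = 10 * math.log10(np.mean(x ** 2) / np.mean(e ** 2))
print("effective SNR (dB) :", snr_db, " (rule of thumb:", 6.02 * bits + 1.76, ")")
```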

  16. Optimization of High-Throughput Sequencing Kinetics for determining enzymatic rate constants of thousands of RNA substrates

    PubMed Central

    Niland, Courtney N.; Jankowsky, Eckhard; Harris, Michael E.

    2016-01-01

    Quantification of the specificity of RNA binding proteins and RNA processing enzymes is essential to understanding their fundamental roles in biological processes. High Throughput Sequencing Kinetics (HTS-Kin) uses high throughput sequencing and internal competition kinetics to simultaneously monitor the processing rate constants of thousands of substrates by RNA processing enzymes. This technique has provided unprecedented insight into the substrate specificity of the tRNA processing endonuclease ribonuclease P. Here, we investigate the accuracy and robustness of measurements associated with each step of the HTS-Kin procedure. We examine the effect of substrate concentration on the observed rate constant, determine the optimal kinetic parameters, and provide guidelines for reducing error in amplification of the substrate population. Importantly, we find that high-throughput sequencing and experimental reproducibility contribute their own sources of error, and these are the main sources of imprecision in the quantified results when otherwise optimized guidelines are followed. PMID:27296633

  17. Metagenomic and near full-length 16S rRNA sequence data in support of the phylogenetic analysis of the rumen bacterial community in steers

    USDA-ARS?s Scientific Manuscript database

    Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...

  18. Hybrid error correction and de novo assembly of single-molecule sequencing reads

    PubMed Central

    Koren, Sergey; Schatz, Michael C.; Walenz, Brian P.; Martin, Jeffrey; Howard, Jason; Ganapathy, Ganeshkumar; Wang, Zhong; Rasko, David A.; McCombie, W. Richard; Jarvis, Erich D.; Phillippy, Adam M.

    2012-01-01

    Emerging single-molecule sequencing instruments can generate multi-kilobase sequences with the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of single-molecule reads is challenging, and has limited their use to resequencing bacteria. To address this limitation, we introduce a novel correction algorithm and assembly strategy that utilizes shorter, high-identity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on Pacbio RS reads of phage, prokaryotic, and eukaryotic whole genomes, including the novel genome of the parrot Melopsittacus undulatus, as well as for RNA-seq reads of the corn (Zea mays) transcriptome. Our approach achieves over 99.9% read correction accuracy and produces substantially better assemblies than current sequencing strategies: in the best example, quintupling the median contig size relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly. PMID:22750884

  19. Error catastrophe and phase transition in the empirical fitness landscape of HIV

    NASA Astrophysics Data System (ADS)

    Hart, Gregory R.; Ferguson, Andrew L.

    2015-03-01

    We have translated clinical sequence databases of the p6 HIV protein into an empirical fitness landscape quantifying viral replicative capacity as a function of the amino acid sequence. We show that the viral population resides close to a phase transition in sequence space corresponding to an "error catastrophe" beyond which there is lethal accumulation of mutations. Our model predicts that the phase transition may be induced by drug therapies that elevate the mutation rate, or by forcing mutations at particular amino acids. Applying immune pressure to any combination of killer T-cell targets cannot induce the transition, providing a rationale for why the viral protein can exist close to the error catastrophe without sustaining fatal fitness penalties due to adaptive immunity.
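
    The error catastrophe invoked above is conceptually related to Eigen's quasispecies error threshold, in which a master sequence of length L with fitness advantage σ is lost once the per-base copying error rate exceeds roughly ln(σ)/L. A short numerical illustration of that classical threshold (not the paper's empirical HIV landscape model) is:

```python
import math

def error_threshold(L, sigma):
    """Approximate per-base mutation rate above which the master sequence is lost
    in the classic quasispecies model: sigma * q^L > 1  =>  mu < ln(sigma) / L."""
    return math.log(sigma) / L

for L in (100, 1_000, 10_000):
    for sigma in (2, 10):
        print(f"L={L:6d} sigma={sigma:2d}  threshold mu ~ {error_threshold(L, sigma):.2e}")
```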

  20. A systematic comparison of error correction enzymes by next-generation sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lubock, Nathan B.; Zhang, Di; Sidore, Angus M.

    Gene synthesis, the process of assembling gene-length fragments from shorter groups of oligonucleotides (oligos), is becoming an increasingly important tool in molecular and synthetic biology. The length, quality and cost of gene synthesis are limited by errors produced during oligo synthesis and subsequent assembly. Enzymatic error correction methods are cost-effective means to ameliorate errors in gene synthesis. Previous analyses of these methods relied on cloning and Sanger sequencing to evaluate their efficiencies, limiting quantitative assessment. Here, we develop a method to quantify errors in synthetic DNA by next-generation sequencing. We analyzed errors in model gene assemblies and systematically compared six different error correction enzymes across 11 conditions. We find that ErrASE and T7 Endonuclease I are the most effective at decreasing average error rates (up to 5.8-fold relative to the input), whereas MutS is the best for increasing the number of perfect assemblies (up to 25.2-fold). We are able to quantify differential specificities, such as that ErrASE preferentially corrects C/G transversions whereas T7 Endonuclease I preferentially corrects A/T transversions. More generally, this experimental and computational pipeline is a fast, scalable and extensible way to analyze errors in gene assemblies, to profile error correction methods, and to benchmark DNA synthesis methods.

  1. A systematic comparison of error correction enzymes by next-generation sequencing

    DOE PAGES

    Lubock, Nathan B.; Zhang, Di; Sidore, Angus M.; ...

    2017-08-01

    Gene synthesis, the process of assembling gene-length fragments from shorter groups of oligonucleotides (oligos), is becoming an increasingly important tool in molecular and synthetic biology. The length, quality and cost of gene synthesis are limited by errors produced during oligo synthesis and subsequent assembly. Enzymatic error correction methods are cost-effective means to ameliorate errors in gene synthesis. Previous analyses of these methods relied on cloning and Sanger sequencing to evaluate their efficiencies, limiting quantitative assessment. Here, we develop a method to quantify errors in synthetic DNA by next-generation sequencing. We analyzed errors in model gene assemblies and systematically compared six different error correction enzymes across 11 conditions. We find that ErrASE and T7 Endonuclease I are the most effective at decreasing average error rates (up to 5.8-fold relative to the input), whereas MutS is the best for increasing the number of perfect assemblies (up to 25.2-fold). We are able to quantify differential specificities, such as that ErrASE preferentially corrects C/G transversions whereas T7 Endonuclease I preferentially corrects A/T transversions. More generally, this experimental and computational pipeline is a fast, scalable and extensible way to analyze errors in gene assemblies, to profile error correction methods, and to benchmark DNA synthesis methods.

  2. Machine Learned Replacement of N-Labels for Basecalled Sequences in DNA Barcoding.

    PubMed

    Ma, Eddie Y T; Ratnasingham, Sujeevan; Kremer, Stefan C

    2018-01-01

    This study presents a machine learning method that increases the number of identified bases in Sanger Sequencing. The system post-processes a KB basecalled chromatogram. It selects a recoverable subset of N-labels in the KB-called chromatogram to replace with basecalls (A,C,G,T). An N-label correction is defined given an additional read of the same sequence, and a human finished sequence. Corrections are added to the dataset when an alignment determines the additional read and human agree on the identity of the N-label. KB must also rate the replacement with quality value of in the additional read. Corrections are only available during system training. Developing the system, nearly 850,000 N-labels are obtained from Barcode of Life Datasystems, the premier database of genetic markers called DNA Barcodes. Increasing the number of correct bases improves reference sequence reliability, increases sequence identification accuracy, and assures analysis correctness. Keeping with barcoding standards, our system maintains an error rate of percent. Our system only applies corrections when it estimates low rate of error. Tested on this data, our automation selects and recovers: 79 percent of N-labels from COI (animal barcode); 80 percent from matK and rbcL (plant barcodes); and 58 percent from non-protein-coding sequences (across eukaryotes).

  3. AMPLISAS: a web server for multilocus genotyping using next-generation amplicon sequencing data.

    PubMed

    Sebastian, Alvaro; Herdegen, Magdalena; Migalska, Magdalena; Radwan, Jacek

    2016-03-01

    Next-generation sequencing (NGS) technologies are revolutionizing the fields of biology and medicine as powerful tools for amplicon sequencing (AS). Using combinations of primers and barcodes, it is possible to sequence targeted genomic regions with deep coverage for hundreds, even thousands, of individuals in a single experiment. This is extremely valuable for the genotyping of gene families in which locus-specific primers are often difficult to design, such as the major histocompatibility complex (MHC). The utility of AS is, however, limited by the high intrinsic sequencing error rates of NGS technologies and other sources of error such as polymerase amplification or chimera formation. Correcting these errors requires extensive bioinformatic post-processing of NGS data. Amplicon Sequence Assignment (AMPLISAS) is a tool that performs analysis of AS results in a simple and efficient way, while offering customization options for advanced users. AMPLISAS is designed as a three-step pipeline consisting of (i) read demultiplexing, (ii) unique sequence clustering and (iii) erroneous sequence filtering. Allele sequences and frequencies are retrieved in excel spreadsheet format, making them easy to interpret. AMPLISAS performance has been successfully benchmarked against previously published genotyped MHC data sets obtained with various NGS technologies. © 2015 John Wiley & Sons Ltd.
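
    The first of AMPLISAS's three steps, read demultiplexing, amounts to matching the leading bases of each read against the expected sample barcodes. A toy demultiplexer allowing one mismatch (not AMPLISAS's actual code, which also handles primers, length variants and frequency-based filtering; all names here are illustrative) could look like:

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def demultiplex(reads, barcodes, max_mismatch=1):
    """Assign each read to the sample whose barcode best matches the read start;
    reads matching no barcode (or matching ambiguously) go to 'unassigned'."""
    bins = {sample: [] for sample in barcodes}
    bins["unassigned"] = []
    for read in reads:
        hits = sorted((hamming(read[:len(bc)], bc), sample)
                      for sample, bc in barcodes.items())
        best_d, best_sample = hits[0]
        ambiguous = len(hits) > 1 and hits[1][0] == best_d
        if best_d <= max_mismatch and not ambiguous:
            bins[best_sample].append(read)
        else:
            bins["unassigned"].append(read)
    return bins

barcodes = {"sampleA": "ACGTAC", "sampleB": "TTGCAA"}
reads = ["ACGTACGGGTTT", "ACGAACGGGTTT", "TTGCAACCCAAA", "GGGGGGCCCAAA"]
print({k: len(v) for k, v in demultiplex(reads, barcodes).items()})
```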

  4. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3.

    PubMed

    Han, Mira V; Thomas, Gregg W C; Lugo-Martinez, Jose; Hahn, Matthew W

    2013-08-01

    Current sequencing methods produce large amounts of data, but genome assemblies constructed from these data are often fragmented and incomplete. Incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. This means that methods attempting to estimate rates of gene duplication and loss often will be misled by such errors and that rates of gene family evolution will be consistently overestimated. Here, we present a method that takes these errors into account, allowing one to accurately infer rates of gene gain and loss among genomes even with low assembly and annotation quality. The method is implemented in the newest version of the software package CAFE, along with several other novel features. We demonstrate the accuracy of the method with extensive simulations and reanalyze several previously published data sets. Our results show that errors in genome annotation do lead to higher inferred rates of gene gain and loss but that CAFE 3 sufficiently accounts for these errors to provide accurate estimates of important evolutionary parameters.

  5. On the value of Mendelian laws of segregation in families: data quality control, imputation and beyond

    PubMed Central

    Blue, Elizabeth Marchani; Sun, Lei; Tintle, Nathan L.; Wijsman, Ellen M.

    2014-01-01

    When analyzing family data, we dream of perfectly informative data, even whole genome sequences (WGS) for all family members. Reality intervenes, and we find next-generation sequence (NGS) data have error, and are often too expensive or impossible to collect on everyone. Genetic Analysis Workshop 18 groups “Quality Control” and “Dropping WGS through families using GWAS framework” focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single nucleotide polymorphisms, NGS, and imputed data are generally concordant, but that errors are particularly likely at rare variants, homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelateds. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Both genotype and pedigree errors had an adverse effect on subsequent analyses. Computationally fast rules-based imputation was accurate, but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods, and suggest possible future directions. Topics include improving communication between those performing data collection and analysis, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models. PMID:25112184

  6. Mapping DNA methylation by transverse current sequencing: Reduction of noise from neighboring nucleotides

    NASA Astrophysics Data System (ADS)

    Alvarez, Jose; Massey, Steven; Kalitsov, Alan; Velev, Julian

    Nanopore sequencing via transverse current has emerged as a competitive candidate for mapping DNA methylation without the need for bisulfite treatment, fluorescent tags, or PCR amplification. By eliminating the error-producing amplification step, long read lengths become feasible, which greatly simplifies the assembly process and reduces the time and the cost inherent in current technologies. However, due to the large error rates of nanopore sequencing, single base resolution has not been reached. A very important source of noise is the intrinsic structural noise in the electric signature of the nucleotide arising from the influence of neighboring nucleotides. In this work we perform calculations of the tunneling current through DNA molecules in nanopores using the non-equilibrium electron transport method within an effective multi-orbital tight-binding model derived from first-principles calculations. We develop a base-calling algorithm accounting for the correlations of the current through neighboring bases, which in principle can reduce the error rate below any desired precision. Using this method we show that we can clearly distinguish DNA methylation and other base modifications based on the reading of the tunneling current.

  7. A short note on dynamic programming in a band.

    PubMed

    Gibrat, Jean-François

    2018-06-15

    Third generation sequencing technologies generate long reads that exhibit high error rates, in particular for insertions and deletions, which are usually the most difficult errors to cope with. The only exact algorithm capable of aligning sequences with insertions and deletions is a dynamic programming algorithm. In this note, for the sake of efficiency, we consider dynamic programming in a band. We show how to choose the band width as a function of the long reads' error rates, thus obtaining an [Formula: see text] algorithm in space and time. We also propose a procedure to decide whether this algorithm, when applied to semi-global alignments, provides the optimal score. We suggest that dynamic programming in a band is well suited to the problem of aligning long reads between themselves and can be used as a core component of methods for obtaining a consensus sequence from the long reads alone. The function implementing the dynamic programming algorithm in a band is available, as a standalone program, at: https://forgemia.inra.fr/jean-francois.gibrat/BAND_DYN_PROG.git.
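
    To make the banding idea concrete, here is a small, self-contained edit-distance computation restricted to a diagonal band whose half-width grows with an assumed read error rate. This is only an illustration of the general approach, not the note's implementation (which is available at the repository above), and the error-rate and slack parameters are arbitrary.

```python
def banded_edit_distance(a, b, error_rate=0.15, slack=10):
    """Edit distance restricted to a diagonal band. The half-width grows with the
    expected number of indels (error_rate * length) plus a safety slack; if the
    optimal alignment leaves the band, the result is only an upper bound."""
    n, m = len(a), len(b)
    w = int(error_rate * max(n, m)) + abs(n - m) + slack   # band half-width
    INF = float("inf")
    prev = {j: j for j in range(0, min(m, w) + 1)}         # row 0 inside the band
    for i in range(1, n + 1):
        cur = {}
        lo, hi = max(0, i - w), min(m, i + w)
        for j in range(lo, hi + 1):
            best = INF
            if j > 0 and (j - 1) in cur:
                best = min(best, cur[j - 1] + 1)                     # insertion
            if j in prev:
                best = min(best, prev[j] + 1)                        # deletion
            if j > 0 and (j - 1) in prev:
                best = min(best, prev[j - 1] + (a[i - 1] != b[j - 1]))  # (mis)match
            if j == 0:
                best = min(best, i)                                  # first column
            cur[j] = best
        prev = cur
    return prev[m]

print(banded_edit_distance("ACGTACGTACGT", "ACGTACCGTACT"))   # -> 2
```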

  8. Specificity control for read alignments using an artificial reference genome-guided false discovery rate.

    PubMed

    Giese, Sven H; Zickmann, Franziska; Renard, Bernhard Y

    2014-01-01

    Accurate estimation, comparison and evaluation of read mapping error rates is a crucial step in the processing of next-generation sequencing data, as further analysis steps and interpretation assume the correctness of the mapping results. Current approaches are either focused on sensitivity estimation and thereby disregard specificity or are based on read simulations. Although continuously improving, read simulations are still prone to introduce a bias into the mapping error quantitation and cannot capture all characteristics of an individual dataset. We introduce ARDEN (artificial reference driven estimation of false positives in next-generation sequencing data), a novel benchmark method that estimates error rates of read mappers based on real experimental reads, using an additionally generated artificial reference genome. It allows a dataset-specific computation of error rates and the construction of a receiver operating characteristic curve. Thereby, it can be used for optimization of parameters for read mappers, selection of read mappers for a specific problem or for filtering alignments based on quality estimation. The use of ARDEN is demonstrated in a general read mapper comparison, a parameter optimization for one read mapper and an application example in single-nucleotide polymorphism discovery with a significant reduction in the number of false positive identifications. The ARDEN source code is freely available at http://sourceforge.net/projects/arden/.

  9. Ultraaccurate genome sequencing and haplotyping of single human cells.

    PubMed

    Chu, Wai Keung; Edge, Peter; Lee, Ho Suk; Bansal, Vikas; Bafna, Vineet; Huang, Xiaohua; Zhang, Kun

    2017-11-21

    Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10−8 and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs. Copyright © 2017 the Author(s). Published by PNAS.

  10. Whole exome sequencing for familial bicuspid aortic valve identifies putative variants.

    PubMed

    Martin, Lisa J; Pilipenko, Valentina; Kaufman, Kenneth M; Cripe, Linda; Kottyan, Leah C; Keddache, Mehdi; Dexheimer, Phillip; Weirauch, Matthew T; Benson, D Woodrow

    2014-10-01

    Bicuspid aortic valve (BAV) is the most common congenital cardiovascular malformation. Although highly heritable, few causal variants have been identified. The purpose of this study was to identify genetic variants underlying BAV by whole exome sequencing of a multiplex BAV kindred. Whole exome sequencing was performed on 17 individuals from a single family (BAV, 3; other cardiovascular malformation, 3). Post-variant-calling error control metrics were established after examining the relationship between Mendelian inheritance error rate and coverage, quality score, and call rate. To determine the most effective approach to identifying susceptibility variants from among the 54,674 variants passing error control metrics, we evaluated 3 variant selection strategies frequently used in whole exome sequencing studies plus extended family linkage. No putative rare, high-effect variants were identified that were present in all affected individuals but absent from unaffected individuals. Eight high-effect variants were identified by ≥2 of the commonly used selection strategies; however, these were either common in the general population (>10%) or present in the majority of the unaffected family members. In contrast, using extended family linkage, 3 synonymous variants were identified; all 3 variants were also identified by at least one other strategy. These results suggest that traditional whole exome sequencing approaches, which assume causal variants alter coding sense, may be insufficient for BAV and other complex traits. Identification of disease-associated variants is facilitated by the use of segregation within families. © 2014 American Heart Association, Inc.

  11. Speech serial control in healthy speakers and speakers with hypokinetic or ataxic dysarthria: effects of sequence length and practice

    PubMed Central

    Reilly, Kevin J.; Spencer, Kristie A.

    2013-01-01

    The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, 5 adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absence of length effects in the speakers with ataxic dysarthria is consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involve processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121

  12. Monitoring Error Rates In Illumina Sequencing.

    PubMed

    Manley, Leigh J; Ma, Duanduan; Levine, Stuart S

    2016-12-01

    Guaranteeing high-quality next-generation sequencing data in a rapidly changing environment is an ongoing challenge. The introduction of the Illumina NextSeq 500 and the deprecation of specific metrics from Illumina's Sequencing Analysis Viewer (SAV; Illumina, San Diego, CA, USA) have made it more difficult to determine directly the baseline error rate of sequencing runs. To improve our ability to measure base quality, we have created an open-source tool to construct the Percent Perfect Reads (PPR) plot, previously provided by the Illumina sequencers. The PPR program is compatible with HiSeq 2000/2500, MiSeq, and NextSeq 500 instruments and provides an alternative to Illumina's quality value (Q) scores for determining run quality. Whereas Q scores are representative of run quality, they are often overestimated and are sourced from different look-up tables for each platform. The PPR's unique capabilities as a cross-instrument comparison device, as a troubleshooting tool, and as a tool for monitoring instrument performance can provide an increase in clarity over SAV metrics that is often crucial for maintaining instrument health. These capabilities are highlighted.
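
    A minimal sketch of what a percent-perfect-reads style metric computes is given below; it is not the published PPR tool, and it assumes reads have already been paired with the matching stretch of a known control reference (such as PhiX).

        # Fraction of reads that match the control reference perfectly through
        # each sequencing cycle (simplified; no alignment or trimming handled).
        def percent_perfect_by_cycle(read_ref_pairs, n_cycles):
            perfect = [0] * n_cycles
            for read, ref in read_ref_pairs:
                clean = True
                for cycle in range(min(n_cycles, len(read), len(ref))):
                    if read[cycle] != ref[cycle]:
                        clean = False
                    if clean:
                        perfect[cycle] += 1
            n_reads = len(read_ref_pairs)
            return [100.0 * p / n_reads for p in perfect]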

  13. Nucleotide-Specific Contrast for DNA Sequencing by Electron Spectroscopy.

    PubMed

    Mankos, Marian; Persson, Henrik H J; N'Diaye, Alpha T; Shadman, Khashayar; Schmid, Andreas K; Davis, Ronald W

    2016-01-01

    DNA sequencing by imaging in an electron microscope is an approach that holds promise to deliver long reads with low error rates and without the need for amplification. Earlier work using transmission electron microscopes, which use high electron energies on the order of 100 keV, has shown that low contrast and radiation damage necessitate the use of heavy atom labeling of individual nucleotides, which increases the read error rates. Other prior work has shown that scattering electrons of much lower energy suppress beam damage to DNA. Here we explore possibilities to increase contrast by employing two methods, X-ray photoelectron and Auger electron spectroscopy. Using bulk DNA samples with monomers of each base, both methods are shown to provide contrast mechanisms that can distinguish individual nucleotides without labels. Both spectroscopic techniques can be readily implemented in a low energy electron microscope, which may enable label-free DNA sequencing by direct imaging.

  14. Type I error rates of rare single nucleotide variants are inflated in tests of association with non-normally distributed traits using simple linear regression methods.

    PubMed

    Schwantes-An, Tae-Hwi; Sung, Heejong; Sabourin, Jeremy A; Justice, Cristina M; Sorant, Alexa J M; Wilson, Alexander F

    2016-01-01

    In this study, the effects of (a) the minor allele frequency of the single nucleotide variant (SNV), (b) the degree of departure from normality of the trait, and (c) the position of the SNVs on type I error rates were investigated in the Genetic Analysis Workshop (GAW) 19 whole exome sequence data. To test the distribution of the type I error rate, 5 simulated traits were considered: standard normal and gamma distributed traits; 2 transformed versions of the gamma trait (log10 and rank-based inverse normal transformations); and trait Q1 provided by GAW 19. Each trait was tested with 313,340 SNVs. Tests of association were performed with simple linear regression and average type I error rates were determined for minor allele frequency classes. Rare SNVs (minor allele frequency < 0.05) showed inflated type I error rates for non-normally distributed traits that increased as the minor allele frequency decreased. The inflation of average type I error rates increased as the significance threshold decreased. Normally distributed traits did not show inflated type I error rates with respect to the minor allele frequency for rare SNVs. There was no consistent effect of transformation on the uniformity of the distribution of the location of SNVs with a type I error.
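
    The inflation described above can be reproduced in a toy simulation (an illustration under stated assumptions, not the GAW 19 analysis): simulate a rare variant and a skewed trait that are truly unassociated, test with simple linear regression, and compare the fraction of p-values below the nominal level to that level.

        # Toy null simulation: rare SNV, gamma-distributed trait, simple linear
        # regression. The empirical rejection rate tends to drift away from the
        # nominal alpha as the minor allele frequency decreases.
        import numpy as np
        from scipy.stats import linregress

        rng = np.random.default_rng(1)
        n, maf, alpha, n_sims = 1000, 0.01, 0.05, 2000
        rejections, tests = 0, 0
        for _ in range(n_sims):
            geno = rng.binomial(2, maf, size=n)              # variant under the null
            trait = rng.gamma(shape=1.0, scale=1.0, size=n)  # non-normal trait
            if geno.std() == 0:                              # skip monomorphic draws
                continue
            tests += 1
            rejections += linregress(geno, trait).pvalue < alpha
        print("empirical type I error:", rejections / tests, "nominal:", alpha)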

  15. FMLRC: Hybrid long read error correction using an FM-index.

    PubMed

    Wang, Jeremy R; Holt, James; McMillan, Leonard; Jones, Corbin D

    2018-02-09

    Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limits their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging "hybrid" assemblies that use long reads for scaffolding and short reads for accuracy. We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read-only de novo assembly methods. Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. The improved throughput and computational efficiency relative to existing methods will help make more economical use of emerging long read sequencing technologies.
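
    FMLRC itself builds a multi-string BWT over the entire short-read set; the toy FM-index below only illustrates the backward-search counting primitive such a corrector relies on when asking how many short reads support a candidate k-mer (all names here are illustrative, not FMLRC's code).

        # Minimal FM-index: suffix-array construction by sorting (fine for toy
        # inputs), plus backward search to count occurrences of a pattern.
        def build_fm_index(text):
            text += "$"
            sa = sorted(range(len(text)), key=lambda i: text[i:])
            bwt = "".join(text[i - 1] for i in sa)
            C, total = {}, 0
            for c in sorted(set(bwt)):
                C[c] = total
                total += bwt.count(c)
            occ = {c: [0] * (len(bwt) + 1) for c in C}
            for i, ch in enumerate(bwt):
                for c in occ:
                    occ[c][i + 1] = occ[c][i] + (ch == c)
            return C, occ, len(bwt)

        def fm_count(index, pattern):
            C, occ, n = index
            lo, hi = 0, n
            for c in reversed(pattern):      # backward search, one symbol at a time
                if c not in C:
                    return 0
                lo, hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
                if lo >= hi:
                    return 0
            return hi - lo

        idx = build_fm_index("ACGTACGTTACG")
        print(fm_count(idx, "ACG"))          # 3 occurrences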

  16. Joint Estimation of Contamination, Error and Demography for Nuclear DNA from Ancient Humans

    PubMed Central

    Slatkin, Montgomery

    2016-01-01

    When sequencing an ancient DNA sample from a hominin fossil, DNA from present-day humans involved in excavation and extraction will be sequenced along with the endogenous material. This type of contamination is problematic for downstream analyses as it will introduce a bias towards the population of the contaminating individual(s). Quantifying the extent of contamination is a crucial step as it allows researchers to account for possible biases that may arise in downstream genetic analyses. Here, we present an MCMC algorithm to co-estimate the contamination rate, sequencing error rate and demographic parameters—including drift times and admixture rates—for an ancient nuclear genome obtained from human remains, when the putative contaminating DNA comes from present-day humans. We assume we have a large panel representing the putative contaminant population (e.g. European, East Asian or African). The method is implemented in a C++ program called ‘Demographic Inference with Contamination and Error’ (DICE). We applied it to simulations and genome data from ancient Neanderthals and modern humans. With reasonable levels of genome sequence coverage (>3X), we find we can recover accurate estimates of all these parameters, even when the contamination rate is as high as 50%. PMID:27049965

  17. Reliability of a Longitudinal Sequence of Scale Ratings

    ERIC Educational Resources Information Center

    Laenen, Annouschka; Alonso, Ariel; Molenberghs, Geert; Vangeneugden, Tony

    2009-01-01

    Reliability captures the influence of error on a measurement and, in the classical setting, is defined as one minus the ratio of the error variance to the total variance. Laenen, Alonso, and Molenberghs ("Psychometrika" 73:443-448, 2007) proposed an axiomatic definition of reliability and introduced the R_T coefficient, a measure of…

  18. An effective solution to the nonlinear, nonstationary Navier-Stokes equations for two dimensions

    NASA Technical Reports Server (NTRS)

    Gabrielsen, R. E.

    1975-01-01

    A sequence of approximate solutions for the nonlinear, nonstationary Navier-Stokes equations for a two-dimensional domain, from which explicit error estimates and rates of convergence are obtained, is described. This sequence of approximate solutions is based primarily on the Newton-Kantorovich method.

  19. Fast imputation using medium- or low-coverage sequence data

    USDA-ARS?s Scientific Manuscript database

    Direct imputation from raw sequence reads can be more accurate than calling genotypes first and then imputing, especially if read depth is low or error rates high, but different imputation strategies are required than those used for data from genotyping chips. A fast algorithm to impute from lower t...

  20. Determinants of the rate of protein sequence evolution

    PubMed Central

    Zhang, Jianzhi; Yang, Jian-Rong

    2015-01-01

    The rate and mechanism of protein sequence evolution have been central questions in evolutionary biology since the 1960s. Although the rate of protein sequence evolution depends primarily on the level of functional constraint, exactly what constitutes functional constraint has remained unclear. The increasing availability of genomic data has allowed for much needed empirical examinations on the nature of functional constraint. These studies found that the evolutionary rate of a protein is predominantly influenced by its expression level rather than functional importance. A combination of theoretical and empirical analyses has identified multiple mechanisms behind these observations and demonstrated a prominent role that selection against errors in molecular and cellular processes plays in protein evolution. PMID:26055156

  1. Bit error rate tester using fast parallel generation of linear recurring sequences

    DOEpatents

    Pierson, Lyndon G.; Witzke, Edward L.; Maestas, Joseph H.

    2003-05-06

    A fast method for generating linear recurring sequences by parallel linear recurring sequence generators (LRSGs) with a feedback circuit optimized to balance minimum propagation delay against maximal sequence period. Parallel generation of linear recurring sequences requires decimating the sequence (creating small contiguous sections of the sequence in each LRSG). A companion matrix form is selected depending on whether the LFSR is right-shifting or left-shifting. The companion matrix is completed by selecting a primitive irreducible polynomial with 1's most closely grouped in a corner of the companion matrix. A decimation matrix is created by raising the companion matrix to the (n*k)th power, where k is the number of parallel LRSGs and n is the number of bits to be generated at a time by each LRSG. Companion matrices with 1's closely grouped in a corner will yield sparse decimation matrices. A feedback circuit comprised of XOR logic gates implements the decimation matrix in hardware. Sparse decimation matrices can be implemented with a minimum number of XOR gates, and therefore a minimum propagation delay through the feedback circuit. The LRSG of the invention is particularly well suited to use as a bit error rate tester on high speed communication lines because it permits the receiver to synchronize to the transmitted pattern within 2n bits.
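
    A compact way to see the decimation idea is to raise an LFSR companion matrix to the (n*k)th power over GF(2); each parallel generator can then jump ahead n*k states per clock. The sketch below uses a small example polynomial (x^4 + x + 1), not the polynomial of the patented design.

        # Companion matrix of a 4-bit LFSR and its decimation matrix over GF(2).
        import numpy as np

        def companion(taps, degree):
            # state update s' = M @ s (mod 2); taps give the feedback column
            M = np.zeros((degree, degree), dtype=np.uint8)
            M[1:, :-1] = np.eye(degree - 1, dtype=np.uint8)
            for t in taps:
                M[t, -1] = 1
            return M

        def gf2_matpow(M, exponent):
            R = np.eye(M.shape[0], dtype=np.uint8)
            while exponent:
                if exponent & 1:
                    R = (R @ M) & 1
                M = (M @ M) & 1
                exponent >>= 1
            return R

        C = companion(taps=[0, 1], degree=4)   # x^4 + x + 1 (primitive)
        n, k = 8, 4                            # bits per step, parallel LRSGs
        D = gf2_matpow(C, n * k)               # decimation matrix
        print(D)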

  2. Physical layer one-time-pad data encryption through synchronized semiconductor laser networks

    NASA Astrophysics Data System (ADS)

    Argyris, Apostolos; Pikasis, Evangelos; Syvridis, Dimitris

    2016-02-01

    Semiconductor lasers (SL) have been proven to be key devices in the generation of ultrafast true random bit streams. Their potential to emit chaotic signals with desirable statistics establishes them as a low-cost solution covering various needs, from large-volume key generation to real-time encrypted communications. Usually, only undemanding post-processing is needed to convert the acquired analog time series into digital sequences that pass all established tests of randomness. A novel architecture that can generate and exploit these true random sequences is a fiber network in which the nodes are semiconductor lasers coupled and synchronized to a central hub laser. In this work we show experimentally that laser nodes in such a star network topology can synchronize with each other through complex broadband signals that seed true random bit sequences (TRBS) generated at several Gb/s. The potential for each node to access, through the fiber optic network, random bit streams generated in real time and synchronized with the rest of the nodes allows a one-time-pad encryption protocol to be implemented that mixes the synchronized true random bit sequence with real data at Gb/s rates. Forward-error correction methods are used to reduce the errors in the TRBS and the final error rate at the data decoding level. An appropriate selection of the sampling methodology and of the physical properties of the chaotic seed signal through which the network locks into synchronization allows error-free performance.
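
    The core mixing step of a one-time pad is a plain XOR of the data with a pad of equal length; in the system described above the pad would be the synchronized true random bit sequence available at every node. The sketch below shows only that step (synchronization and forward-error correction are omitted, and the pad here is just a locally generated stand-in).

        # One-time-pad mixing: XOR with the pad encrypts, XOR again decrypts.
        import secrets

        def otp_xor(data: bytes, pad: bytes) -> bytes:
            assert len(pad) >= len(data), "pad must cover the data"
            return bytes(d ^ p for d, p in zip(data, pad))

        pad = secrets.token_bytes(16)             # stand-in for the shared TRBS
        ciphertext = otp_xor(b"encrypt me here!", pad)
        print(otp_xor(ciphertext, pad))           # recovers the plaintext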

  3. Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data

    PubMed Central

    Soraggi, Samuele; Wiuf, Carsten; Albrechtsen, Anders

    2017-01-01

    The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat the problems of sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixtures. The power gain is most pronounced for low and medium sequencing depth (1–10×), and performances are as good as with perfectly called genotypes at a sequencing depth of 2×. We show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates. PMID:29196497
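
    For reference, the single-base form of the D-statistic that the method generalizes is usually written, for populations H1, H2, H3 and an outgroup, as

        D = \frac{n_{ABBA} - n_{BABA}}{n_{ABBA} + n_{BABA}}

    where n_ABBA counts sites at which H2 and H3 share the derived allele and n_BABA counts sites at which H1 and H3 share it; under the null hypothesis of no admixture the two patterns are equally frequent and D is centred on zero.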

  4. Dynamic Sensorimotor Planning during Long-Term Sequence Learning: The Role of Variability, Response Chunking and Planning Errors

    PubMed Central

    Verstynen, Timothy; Phillips, Jeff; Braun, Emily; Workman, Brett; Schunn, Christian; Schneider, Walter

    2012-01-01

    Many everyday skills are learned by binding otherwise independent actions into a unified sequence of responses across days or weeks of practice. Here we looked at how the dynamics of action planning and response binding change across such long timescales. Subjects (N = 23) were trained on a bimanual version of the serial reaction time task (32-item sequence) for two weeks (10 days total). Response times and accuracy both showed improvement with time, but appeared to be learned at different rates. Changes in response speed across training were associated with dynamic changes in response time variability, with faster learners expanding their variability during the early training days and then contracting response variability late in training. Using a novel measure of response chunking, we found that individual responses became temporally correlated across trials and asymptoted to set sizes of approximately 7 bound responses at the end of the first week of training. Finally, we used a state-space model of the response planning process to look at how predictive (i.e., response anticipation) and error-corrective (i.e., post-error slowing) processes correlated with learning rates for speed, accuracy and chunking. This analysis yielded non-monotonic association patterns between the state-space model parameters and learning rates, suggesting that different parts of the response planning process are relevant at different stages of long-term learning. These findings highlight the dynamic modulation of response speed, variability, accuracy and chunking as multiple movements become bound together into a larger set of responses during sequence learning. PMID:23056630

  5. Partial bisulfite conversion for unique template sequencing

    PubMed Central

    Kumar, Vijay; Rosenbaum, Julie; Wang, Zihua; Forcier, Talitha; Ronemus, Michael; Wigler, Michael

    2018-01-01

    We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate count and long-range assembly of initial template molecules from short-read sequence data. We explore count and low-error sequencing by profiling 135 000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone. PMID:29161423
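
    The clustering step can be pictured with a small sketch (an illustration only, not the muSeq software): reads are keyed by the positions at which a reference C was read as T, and reads sharing a key are treated as copies of the same template molecule. Real data would also need the G-to-A patterns of the opposite strand, alignment, and tolerance for sequencer error.

        # Group reads by their C-to-T conversion signature against a reference.
        from collections import defaultdict

        def conversion_signature(read, ref):
            return tuple(i for i, (r, q) in enumerate(zip(ref, read))
                         if r == "C" and q == "T")

        def cluster_by_signature(reads, ref):
            clusters = defaultdict(list)
            for read in reads:
                clusters[conversion_signature(read, ref)].append(read)
            return clusters        # one cluster ~ one initial template molecule

        ref = "ACCGTCCATG"
        reads = ["ACTGTCCATG", "ACTGTCCATG", "ACCGTTCATG"]
        for signature, members in cluster_by_signature(reads, ref).items():
            print(signature, len(members))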

  6. Nucleotide-Specific Contrast for DNA Sequencing by Electron Spectroscopy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mankos, Marian; Persson, Henrik H. J.; N’Diaye, Alpha T.

    DNA sequencing by imaging in an electron microscope is an approach that holds promise to deliver long reads with low error rates and without the need for amplification. Earlier work using transmission electron microscopes, which use high electron energies on the order of 100 keV, has shown that low contrast and radiation damage necessitate the use of heavy atom labeling of individual nucleotides, which increases the read error rates. Other prior work has shown that scattering electrons of much lower energy suppress beam damage to DNA. Here we explore possibilities to increase contrast by employing two methods, X-ray photoelectron and Auger electron spectroscopy. Using bulk DNA samples with monomers of each base, both methods are shown to provide contrast mechanisms that can distinguish individual nucleotides without labels. In conclusion, both spectroscopic techniques can be readily implemented in a low energy electron microscope, which may enable label-free DNA sequencing by direct imaging.

  7. Nucleotide-Specific Contrast for DNA Sequencing by Electron Spectroscopy

    DOE PAGES

    Mankos, Marian; Persson, Henrik H. J.; N’Diaye, Alpha T.; ...

    2016-05-05

    DNA sequencing by imaging in an electron microscope is an approach that holds promise to deliver long reads with low error rates and without the need for amplification. Earlier work using transmission electron microscopes, which use high electron energies on the order of 100 keV, has shown that low contrast and radiation damage necessitate the use of heavy atom labeling of individual nucleotides, which increases the read error rates. Other prior work has shown that scattering electrons of much lower energy suppress beam damage to DNA. Here we explore possibilities to increase contrast by employing two methods, X-ray photoelectron and Auger electron spectroscopy. Using bulk DNA samples with monomers of each base, both methods are shown to provide contrast mechanisms that can distinguish individual nucleotides without labels. In conclusion, both spectroscopic techniques can be readily implemented in a low energy electron microscope, which may enable label-free DNA sequencing by direct imaging.

  8. Computationally mapping sequence space to understand evolutionary protein engineering.

    PubMed

    Armstrong, Kathryn A; Tidor, Bruce

    2008-01-01

    Evolutionary protein engineering has been dramatically successful, producing a wide variety of new proteins with altered stability, binding affinity, and enzymatic activity. However, the success of such procedures is often unreliable, and the impact of the choice of protein, engineering goal, and evolutionary procedure is not well understood. We have created a framework for understanding aspects of the protein engineering process by computationally mapping regions of feasible sequence space for three small proteins using structure-based design protocols. We then tested the ability of different evolutionary search strategies to explore these sequence spaces. The results point to a non-intuitive relationship between the error-prone PCR mutation rate and the number of rounds of replication. The evolutionary relationships among feasible sequences reveal hub-like sequences that serve as particularly fruitful starting sequences for evolutionary search. Moreover, genetic recombination procedures were examined, and tradeoffs relating sequence diversity and search efficiency were identified. This framework allows us to consider the impact of protein structure on the allowed sequence space and therefore on the challenges that each protein presents to error-prone PCR and genetic recombination procedures.

  9. Error control techniques for satellite and space communications

    NASA Technical Reports Server (NTRS)

    Costello, D. J., Jr.

    1986-01-01

    High rate concatenated coding systems with trellis inner codes and Reed-Solomon (RS) outer codes for application in satellite communication systems are considered. Two types of inner codes are studied: high rate punctured binary convolutional codes which result in overall effective information rates between 1/2 and 1 bit per channel use; and bandwidth efficient signal space trellis codes which can achieve overall effective information rates greater than 1 bit per channel use. Channel capacity calculations with and without side information were performed for the concatenated coding system. Two concatenated coding schemes are investigated. In Scheme 1, the inner code is decoded with the Viterbi algorithm and the outer RS code performs error-correction only (decoding without side information). In Scheme 2, the inner code is decoded with a modified Viterbi algorithm which produces reliability information along with the decoded output. In this algorithm, path metrics are used to estimate the entire information sequence, while branch metrics are used to provide the reliability information on the decoded sequence. This information is used to erase unreliable bits in the decoded output. An errors-and-erasures RS decoder is then used for the outer code. These two schemes are proposed for use on NASA satellite channels. Results indicate that high system reliability can be achieved with little or no bandwidth expansion.

  10. Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data.

    PubMed

    Soraggi, Samuele; Wiuf, Carsten; Albrechtsen, Anders

    2018-02-02

    The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat the problems of sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixtures. The power gain is most pronounced for low and medium sequencing depth (1-10×), and performances are as good as with perfectly called genotypes at a sequencing depth of 2×. We show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates. Copyright © 2018 Soraggi et al.

  11. Genotyping-by-sequencing for estimating relatedness in nonmodel organisms: Avoiding the trap of precise bias.

    PubMed

    Attard, Catherine R M; Beheregaray, Luciano B; Möller, Luciana M

    2018-05-01

    There has been remarkably little attention to using the high resolution provided by genotyping-by-sequencing (i.e., RADseq and similar methods) for assessing relatedness in wildlife populations. A major hurdle is the genotyping error, especially allelic dropout, often found in this type of data that could lead to downward-biased, yet precise, estimates of relatedness. Here, we assess the applicability of genotyping-by-sequencing for relatedness inferences given its relatively high genotyping error rate. Individuals of known relatedness were simulated under genotyping error, allelic dropout and missing data scenarios based on an empirical ddRAD data set, and their true relatedness was compared to that estimated by seven relatedness estimators. We found that an estimator chosen through such analyses can circumvent the influence of genotyping error, with the estimator of Ritland (Genetics Research, 67, 175) shown to be unaffected by allelic dropout and to be the most accurate when there is genotyping error. We also found that the choice of estimator should not rely solely on the strength of correlation between estimated and true relatedness as a strong correlation does not necessarily mean estimates are close to true relatedness. We also demonstrated how even a large SNP data set with genotyping error (allelic dropout or otherwise) or missing data still performs better than a perfectly genotyped microsatellite data set of tens of markers. The simulation-based approach used here can be easily implemented by others on their own genotyping-by-sequencing data sets to confirm the most appropriate and powerful estimator for their data. © 2017 John Wiley & Sons Ltd.

  12. Identification and correction of systematic error in high-throughput sequence data

    PubMed Central

    2011-01-01

    Background A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed "next-gen" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. Results We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that they are highly replicable across experiments. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq), and can be used with single-end datasets. Conclusions Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments. PMID:22099972

  13. Assessing the performance of the Oxford Nanopore Technologies MinION

    PubMed Central

    Laver, T.; Harrison, J.; O’Neill, P.A.; Moore, K.; Farbos, A.; Paszkiewicz, K.; Studholme, D.J.

    2015-01-01

    The Oxford Nanopore Technologies (ONT) MinION is a new sequencing technology that potentially offers read lengths of tens of kilobases (kb) limited only by the length of DNA molecules presented to it. The device has a low capital cost, is by far the most portable DNA sequencer available, and can produce data in real-time. It has numerous prospective applications including improving genome sequence assemblies and resolution of repeat-rich regions. Before such a technology is widely adopted, it is important to assess its performance and limitations in respect of throughput and accuracy. In this study we assessed the performance of the MinION by re-sequencing three bacterial genomes with very different G + C contents, ranging from 28.6% to 70.7%; the high G + C strain was underrepresented in the sequencing reads. We estimate the error rate of the MinION (after base calling) to be 38.2%. Mean and median read lengths were 2 kb and 1 kb respectively, while the longest single read was 98 kb. The whole length of a 5 kb rRNA operon was covered by a single read. As the first nanopore-based single molecule sequencer available to researchers, the MinION is an exciting prospect; however, the current error rate limits its ability to compete with existing sequencing technologies, though we do show that MinION sequence reads can enhance contiguity of de novo assembly when used in conjunction with Illumina MiSeq data. PMID:26753127

  14. Data driven CAN node reliability assessment for manufacturing system

    NASA Astrophysics Data System (ADS)

    Zhang, Leiming; Yuan, Yong; Lei, Yong

    2017-01-01

    The reliability of the Controller Area Network (CAN) is critical to the performance and safety of the system. However, direct bus-off time assessment tools are lacking in practice due to inaccessibility of the node information and the complexity of the node interactions upon errors. In order to measure the mean time to bus-off (MTTB) of all the nodes, a novel data driven node bus-off time assessment method for CAN networks is proposed that directly uses network error information. First, the corresponding network error event sequence for each node is constructed using multiple-layer network error information. Then, a generalized zero-inflated Poisson process (GZIP) model is established for each node based on the error event sequence. Finally, the stochastic model is constructed to predict the MTTB of the node. The accelerated case studies with different error injection rates are conducted on a laboratory network to demonstrate the proposed method, where the network errors are generated by a computer controlled error injection system. Experiment results show that the MTTB of nodes predicted by the proposed method agree well with observations in the case studies. The proposed data driven node time to bus-off assessment method for CAN networks can successfully predict the MTTB of nodes by directly using network error event data.

  15. Neuropsychological analysis of a typewriting disturbance following cerebral damage.

    PubMed

    Boyle, M; Canter, G J

    1987-01-01

    Following a left CVA, a skilled professional typist sustained a disturbance of typing disproportionate to her handwriting disturbance. Typing errors were predominantly of the sequencing type, with spatial errors much less frequent, suggesting that the impairment was based on a relatively early (premotor) stage of processing. Depriving the subject of visual feedback during handwriting greatly increased her error rate. Similarly, interfering with auditory feedback during speech substantially reduced her self-correction of speech errors. These findings suggested that impaired ability to utilize somesthetic information--probably caused by the subject's parietal lobe lesion--may have been the basis of the typing disorder.

  16. Genotyping-by-sequencing for Populus population genomics: An assessment of genome sampling patterns and filtering approaches

    Treesearch

    Martin P. Schilling; Paul G. Wolf; Aaron M. Duffy; Hardeep S. Rai; Carol A. Rowe; Bryce A. Richardson; Karen E. Mock

    2014-01-01

    Continuing advances in nucleotide sequencing technology are inspiring a suite of genomic approaches in studies of natural populations. Researchers are faced with data management and analytical scales that are increasing by orders of magnitude. With such dramatic advances comes a need to understand biases and error rates, which can be propagated and magnified in large-...

  17. Base modifications affecting RNA polymerase and reverse transcriptase fidelity.

    PubMed

    Potapov, Vladimir; Fu, Xiaoqing; Dai, Nan; Corrêa, Ivan R; Tanner, Nathan A; Ong, Jennifer L

    2018-06-20

    Ribonucleic acid (RNA) is capable of hosting a variety of chemically diverse modifications, in both naturally-occurring post-transcriptional modifications and artificial chemical modifications used to expand the functionality of RNA. However, few studies have addressed how base modifications affect RNA polymerase and reverse transcriptase activity and fidelity. Here, we describe the fidelity of RNA synthesis and reverse transcription of modified ribonucleotides using an assay based on Pacific Biosciences Single Molecule Real-Time sequencing. Several modified bases, including methylated (m6A, m5C and m5U), hydroxymethylated (hm5U) and isomeric bases (pseudouridine), were examined. By comparing each modified base to the equivalent unmodified RNA base, we can determine how the modification affected cumulative RNA polymerase and reverse transcriptase fidelity. 5-hydroxymethyluridine and N6-methyladenosine both increased the combined error rate of T7 RNA polymerase and reverse transcriptases, while pseudouridine specifically increased the error rate of RNA synthesis by T7 RNA polymerase. In addition, we examined the frequency, mutational spectrum and sequence context of reverse transcription errors on DNA templates from an analysis of second strand DNA synthesis.

  18. Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data

    PubMed Central

    Kosugi, Shunichi; Natsume, Satoshi; Yoshida, Kentaro; MacLean, Daniel; Cano, Liliana; Kamoun, Sophien; Terauchi, Ryohei

    2013-01-01

    Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. PMID:24116042

  19. High rate concatenated coding systems using bandwidth efficient trellis inner codes

    NASA Technical Reports Server (NTRS)

    Deng, Robert H.; Costello, Daniel J., Jr.

    1989-01-01

    High-rate concatenated coding systems with bandwidth-efficient trellis inner codes and Reed-Solomon (RS) outer codes are investigated for application in high-speed satellite communication systems. Two concatenated coding schemes are proposed. In one the inner code is decoded with soft-decision Viterbi decoding, and the outer RS code performs error-correction-only decoding (decoding without side information). In the other, the inner code is decoded with a modified Viterbi algorithm, which produces reliability information along with the decoded output. In this algorithm, path metrics are used to estimate the entire information sequence, whereas branch metrics are used to provide reliability information on the decoded sequence. This information is used to erase unreliable bits in the decoded output. An errors-and-erasures RS decoder is then used for the outer code. The two schemes have been proposed for high-speed data communication on NASA satellite channels. The rates considered are at least double those used in current NASA systems, and the results indicate that high system reliability can still be achieved.

  20. Improved Quality in Aerospace Testing Through the Modern Design of Experiments

    NASA Technical Reports Server (NTRS)

    DeLoach, R.

    2000-01-01

    This paper illustrates how, in the presence of systematic error, the quality of an experimental result can be influenced by the order in which the independent variables are set. It is suggested that in typical experimental circumstances in which systematic errors are significant, the common practice of organizing the set point order of independent variables to maximize data acquisition rate results in a test matrix that fails to produce the highest quality research result. With some care to match the volume of data required to satisfy inference error risk tolerances, it is possible to accept a lower rate of data acquisition and still produce results of higher technical quality (lower experimental error) with less cost and in less time than conventional test procedures, simply by optimizing the sequence in which independent variable levels are set.

  1. Performance characteristics of the AmpliSeq Cancer Hotspot panel v2 in combination with the Ion Torrent Next Generation Sequencing Personal Genome Machine.

    PubMed

    Butler, Kimberly S; Young, Megan Y L; Li, Zhihua; Elespuru, Rosalie K; Wood, Steven C

    2016-02-01

    Next-Generation Sequencing is a rapidly advancing technology that has research and clinical applications. For many cancers, it is important to know the precise mutation(s) present, as specific mutations could indicate or contra-indicate certain treatments as well as be indicative of prognosis. Using the Ion Torrent Personal Genome Machine and the AmpliSeq Cancer Hotspot panel v2, we sequenced two pancreatic cancer cell lines, BxPC-3 and HPAF-II, alone or in mixtures, to determine the error rate, sensitivity, and reproducibility of this system. The system resulted in coverage averaging 2000× across the various amplicons and was able to reliably and reproducibly identify mutations present at a rate of 5%. Identification of mutations present at a lower rate was possible by altering the parameters by which calls were made, but with an increase in erroneous, low-level calls. The panel was able to identify known mutations in these cell lines that are present in the COSMIC database. In addition, other, novel mutations were also identified that may prove clinically useful. The system was assessed for systematic errors such as homopolymer effects, end of amplicon effects and patterns in NO CALL sequence. Overall, the system is adequate at identifying the known, targeted mutations in the panel. Published by Elsevier Inc.

  2. Evaluation of logistic regression models and effect of covariates for case-control study in RNA-Seq analysis.

    PubMed

    Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L

    2017-02-06

    Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads, using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.
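
    The two modeling directions compared above can be sketched with statsmodels (a toy contrast on simulated data, not the authors' pipeline; the Firth and Bayes variants need dedicated implementations and are omitted): negative binomial regression treats the read count as the outcome, while logistic regression treats case/control status as the outcome with the (transformed) count as a predictor.

        # Toy case-control RNA-Seq contrast: NB regression vs. classical logistic.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        n = 200
        status = rng.integers(0, 2, size=n)                          # case/control
        counts = rng.negative_binomial(5, 0.3, size=n) + 3 * status  # toy counts

        nb = sm.GLM(counts, sm.add_constant(status),
                    family=sm.families.NegativeBinomial()).fit()
        logit = sm.Logit(status, sm.add_constant(np.log1p(counts))).fit(disp=False)
        print("NB p-value:", nb.pvalues[1], "logistic p-value:", logit.pvalues[1])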

  3. Partial bisulfite conversion for unique template sequencing.

    PubMed

    Kumar, Vijay; Rosenbaum, Julie; Wang, Zihua; Forcier, Talitha; Ronemus, Michael; Wigler, Michael; Levy, Dan

    2018-01-25

    We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate count and long-range assembly of initial template molecules from short-read sequence data. We explore count and low-error sequencing by profiling 135 000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Impacts of visuomotor sequence learning methods on speed and accuracy: Starting over from the beginning or from the point of error.

    PubMed

    Tanaka, Kanji; Watanabe, Katsumi

    2016-02-01

    The present study examined whether sequence learning led to more accurate and shorter performance time if people who are learning a sequence start over from the beginning when they make an error (i.e., practice the whole sequence) or only from the point of error (i.e., practice a part of the sequence). We used a visuomotor sequence learning paradigm with a trial-and-error procedure. In Experiment 1, we found fewer errors, and shorter performance time for those who restarted their performance from the beginning of the sequence as compared to those who restarted from the point at which an error occurred, indicating better learning of spatial and motor representations of the sequence. This might be because the learned elements were repeated when the next performance started over from the beginning. In subsequent experiments, we increased the occasions for the repetitions of learned elements by modulating the number of fresh start points in the sequence after errors. The results showed that fewer fresh start points were likely to lead to fewer errors and shorter performance time, indicating that the repetitions of learned elements enabled participants to develop stronger spatial and motor representations of the sequence. Thus, a single or two fresh start points in the sequence (i.e., starting over only from the beginning or from the beginning or midpoint of the sequence after errors) is likely to lead to more accurate and faster performance. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. FRAGS: estimation of coding sequence substitution rates from fragmentary data

    PubMed Central

    Swart, Estienne C; Hide, Winston A; Seoighe, Cathal

    2004-01-01

    Background Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account. Results We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper-bounds on the amount of sequencing error in the datasets that we have analysed. Conclusion We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data is available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics our system enables the user to manage and query alignment and substitution data. PMID:15005802

  6. Design of association studies with pooled or un-pooled next-generation sequencing data.

    PubMed

    Kim, Su Yeon; Li, Yingrui; Guo, Yiran; Li, Ruiqiang; Holmkvist, Johan; Hansen, Torben; Pedersen, Oluf; Wang, Jun; Nielsen, Rasmus

    2010-07-01

    Most common hereditary diseases in humans are complex and multifactorial. Large-scale genome-wide association studies based on SNP genotyping have only identified a small fraction of the heritable variation of these diseases. One explanation may be that many rare variants (a minor allele frequency, MAF <5%), which are not included in the common genotyping platforms, may contribute substantially to the genetic variation of these diseases. Next-generation sequencing, which would allow the analysis of rare variants, is now becoming so cheap that it provides a viable alternative to SNP genotyping. In this paper, we present cost-effective protocols for using next-generation sequencing in association mapping studies based on pooled and un-pooled samples, and identify optimal designs with respect to total number of individuals, number of individuals per pool, and the sequencing coverage. We perform a small empirical study to evaluate the pooling variance in a realistic setting where pooling is combined with exon-capturing. To test for associations, we develop a likelihood ratio statistic that accounts for the high error rate of next-generation sequencing data. We also perform extensive simulations to determine the power and accuracy of this method. Overall, our findings suggest that with a fixed cost, sequencing many individuals at a more shallow depth with larger pool size achieves higher power than sequencing a small number of individuals in higher depth with smaller pool size, even in the presence of high error rates. Our results provide guidelines for researchers who are developing association mapping studies based on next-generation sequencing. (c) 2010 Wiley-Liss, Inc.

  7. Local neutral networks help maintain inaccurately replicating ribozymes.

    PubMed

    Szilágyi, András; Kun, Ádám; Szathmáry, Eörs

    2014-01-01

    The error threshold of replication limits the selectively maintainable genome size against recurrent deleterious mutations for most fitness landscapes. In the context of RNA replication a distinction between the genotypic and the phenotypic error threshold has been made; where the latter concerns the maintenance of secondary structure rather than sequence. RNA secondary structure is treated as a proxy for function. The phenotypic error threshold allows higher per digit mutation rates than its genotypic counterpart, and is known to increase with the frequency of neutral mutations in sequence space. Here we show that the degree of neutrality, i.e. the frequency of nearest-neighbour (one-step) neutral mutants is a remarkably accurate proxy for the overall frequency of such mutants in an experimentally verifiable formula for the phenotypic error threshold; this we achieve by the full numerical solution for the concentration of all sequences in mutation-selection balance up to length 16. We reinforce our previous result that currently known ribozymes could be selectively maintained by the accuracy known from the best available polymerase ribozymes. Furthermore, we show that in silico stabilizing selection can increase the mutational robustness of ribozymes due to the fact that they were produced by artificial directional selection in the first place. Our finding offers a better understanding of the error threshold and provides further insight into the plausibility of an ancient RNA world.
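
    For orientation, the classical genotypic error threshold states that a master sequence of length L with per-digit copying fidelity q and selective superiority σ is maintained only while

        \sigma \, q^{L} > 1 ,

    and a commonly used way to fold the degree of neutrality λ (the frequency of one-step neutral mutants) into a phenotypic threshold is to treat neutral miscopies as effectively correct,

        \sigma \, \bigl[ q + \lambda (1 - q) \bigr]^{L} > 1 .

    These are standard approximations given here for context; the exact expression verified numerically in the study may differ in detail.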

  8. Quasispecies in population of compositional assemblies.

    PubMed

    Gross, Renan; Fouxon, Itzhak; Lancet, Doron; Markovitch, Omer

    2014-12-30

    The quasispecies model refers to information carriers that undergo self-replication with errors. A quasispecies is a steady-state population of biopolymer sequence variants generated by mutations from a master sequence. A quasispecies error threshold is a minimal replication accuracy below which the population structure breaks down. Theory and experimentation of this model often refer to biopolymers, e.g. RNA molecules or viral genomes, while its prebiotic context is often associated with an RNA world scenario. Here, we study the possibility that compositional entities which code for compositional information, intrinsically different from biopolymers coding for sequential information, could show quasispecies dynamics. We employed a chemistry-based model, graded autocatalysis replication domain (GARD), which simulates the network dynamics within compositional molecular assemblies. In GARD, a compotype represents a population of similar assemblies that constitute a quasi-stationary state in compositional space. A compotype's center-of-mass is found to be analogous to a master sequence for a sequential quasispecies. Using single-cycle GARD dynamics, we measured the quasispecies transition matrix (Q) for the probabilities of transition from one center-of-mass Euclidean distance to another. Similarly, the quasispecies' growth rate vector (A) was obtained. This allowed computing a steady state distribution of distances to the center of mass, as derived from the quasispecies equation. In parallel, a steady state distribution was obtained via the GARD equation kinetics. Rewardingly, a significant correlation was observed between the distributions obtained by these two methods. This was only seen for distances to the compotype center-of-mass, and not to randomly selected compositions. A similar correspondence was found when comparing the quasispecies time dependent dynamics towards steady state. Further, changing the error rate by modifying basal assembly joining rate of GARD kinetics was found to display an error catastrophe, similar to the standard quasispecies model. Additional augmentation of compositional mutations leads to the complete disappearance of the master-like composition. Our results show that compositional assemblies, as simulated by the GARD formalism, portray significant attributes of quasispecies dynamics. This expands the applicability of the quasispecies model beyond sequence-based entities, and potentially enhances validity of GARD as a model for prebiotic evolution.
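
    The quasispecies equation referred to above is, in its standard form,

        \dot{x}_i = \sum_j A_j \, Q_{ji} \, x_j - x_i \, \bar{A}, \qquad \bar{A} = \sum_j A_j x_j ,

    where x_i are relative frequencies, A_j are growth rates and Q_{ji} is the probability that replication of type j yields type i. In the GARD application described here, the indices run over classes of center-of-mass distance rather than over polymer sequences, which is how the measured transition matrix Q and growth rate vector A enter the comparison.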

  9. Determinants of spontaneous mutation in the bacterium Escherichia coli as revealed by whole-genome sequencing

    PubMed Central

    Foster, Patricia L.; Lee, Heewook; Popodi, Ellen; Townes, Jesse P.; Tang, Haixu

    2015-01-01

    A complete understanding of evolutionary processes requires that factors determining spontaneous mutation rates and spectra be identified and characterized. Using mutation accumulation followed by whole-genome sequencing, we found that the mutation rates of three widely diverged commensal Escherichia coli strains differ only by about 50%, suggesting that a rate of 1–2 × 10⁻³ mutations per generation per genome is common for this bacterium. Four major forces are postulated to contribute to spontaneous mutations: intrinsic DNA polymerase errors, endogenously induced DNA damage, DNA damage caused by exogenous agents, and the activities of error-prone polymerases. To determine the relative importance of these factors, we studied 11 strains, each defective for a major DNA repair pathway. The striking result was that only loss of the ability to prevent or repair oxidative DNA damage significantly impacted mutation rates or spectra. These results suggest that, with the exception of oxidative damage, endogenously induced DNA damage does not perturb the overall accuracy of DNA replication in normally growing cells and that repair pathways may exist primarily to defend against exogenously induced DNA damage. The thousands of mutations caused by oxidative damage recovered across the entire genome revealed strong local-sequence biases of these mutations. Specifically, we found that the identity of the 3′ base can affect the mutability of a purine by oxidative damage by as much as eightfold. PMID:26460006

  10. Viral quasispecies inference from 454 pyrosequencing

    PubMed Central

    2013-01-01

    Background Many potentially life-threatening infectious viruses are highly mutable in nature. Characterizing the fittest variants within a quasispecies from infected patients is expected to allow unprecedented opportunities to investigate the relationship between quasispecies diversity and disease epidemiology. The advent of next-generation sequencing technologies has allowed the study of virus diversity with high-throughput sequencing, although these methods come with higher error rates, which can artificially inflate diversity. Results Here we introduce a novel computational approach that incorporates base quality scores from next-generation sequencers to reconstruct viral genome sequences while simultaneously inferring the number of variants present within the quasispecies. Comparisons on simulated and clinical data on dengue virus suggest that the novel approach provides a more accurate inference of the underlying number of variants within the quasispecies, which is vital for clinical efforts in mapping the within-host viral diversity. Sequence alignments generated by our approach are also found to exhibit lower rates of error. Conclusions The ability to infer the viral quasispecies colony that is present within a human host provides the potential for a more accurate classification of the viral phenotype. Understanding the genomics of viruses will be relevant not just to studying how to control or even eradicate these viral infectious diseases, but also to learning about the innate protection in the human host against the viruses. PMID:24308284

  11. Learning by subtraction: Hippocampal activity and effects of ethanol during the acquisition and performance of response sequences.

    PubMed

    Ketchum, Myles J; Weyand, Theodore G; Weed, Peter F; Winsauer, Peter J

    2016-05-01

    Learning is believed to be reflected in the activity of the hippocampus. However, neural correlates of learning have been difficult to characterize because hippocampal activity is integrated with ongoing behavior. To address this issue, male rats (n = 5) implanted with electrodes (n = 14) in the CA1 subfield responded during two tasks within a single test session. In one task, subjects acquired a new 3-response sequence (acquisition), whereas in the other task, subjects completed a well-rehearsed 3-response sequence (performance). Both tasks though could be completed using an identical response topography and used the same sensory stimuli and schedule of reinforcement. More important, comparing neural patterns during sequence acquisition to those during sequence performance allows for a subtractive approach whereby activity associated with learning could potentially be dissociated from the activity associated with ongoing behavior. At sites where CA1 activity was closely associated with behavior, the patterns of activity were differentially modulated by key position and the serial position of a response within the schedule of reinforcement. Temporal shifts between peak activity and responding on particular keys also occurred during sequence acquisition, but not during sequence performance. Ethanol disrupted CA1 activity while producing rate-decreasing effects in both tasks and error-increasing effects that were more selective for sequence acquisition than sequence performance. Ethanol also produced alterations in the magnitude of modulations and temporal pattern of CA1 activity, although these effects were not selective for sequence acquisition. Similar to ethanol, hippocampal micro-stimulation decreased response rate in both tasks and selectively increased the percentage of errors during sequence acquisition, and provided a more direct demonstration of hippocampal involvement during sequence acquisition. Together, these results strongly support the notion that ethanol disrupts sequence acquisition by disrupting hippocampal activity and that the hippocampus is necessary for the conditioned associations required for sequence acquisition. © 2015 Wiley Periodicals, Inc.

  12. Learning by Subtraction: Hippocampal Activity and Effects of Ethanol during the Acquisition and Performance of Response Sequences

    PubMed Central

    Ketchum, Myles J.; Weyand, Theodore G.; Weed, Peter F.; Winsauer, Peter J.

    2015-01-01

    Learning is believed to be reflected in the activity of the hippocampus. However, neural correlates of learning have been difficult to characterize because hippocampal activity is integrated with ongoing behavior. To address this issue, male rats (n=5) implanted with electrodes (n=14) in the CA1 subfield responded during two tasks within a single test session. In one task, subjects acquired a new 3-response sequence (acquisition), whereas in the other task, subjects completed a well-rehearsed 3-response sequence (performance). Both tasks though could be completed using an identical response topography and used the same sensory stimuli and schedule of reinforcement. More important, comparing neural patterns during sequence acquisition to those during sequence performance allows for a subtractive approach whereby activity associated with learning could potentially be dissociated from the activity associated with ongoing behavior. At sites where CA1 activity was closely associated with behavior, the patterns of activity were differentially modulated by key position and the serial position of a response within the schedule of reinforcement. Temporal shifts between peak activity and responding on particular keys also occurred during sequence acquisition, but not during sequence performance. Ethanol disrupted CA1 activity while producing rate-decreasing effects in both tasks and error-increasing effects that were more selective for sequence acquisition than sequence performance. Ethanol also produced alterations in the magnitude of modulations and temporal pattern of CA1 activity, although these effects were not selective for sequence acquisition. Similar to ethanol, hippocampal micro-stimulation decreased response rate in both tasks and selectively increased the percentage of errors during sequence acquisition, and provided a more direct demonstration of hippocampal involvement during sequence acquisition. Together, these results strongly support the notion that ethanol disrupts sequence acquisition by disrupting hippocampal activity and that the hippocampus is necessary for the conditioned associations required for sequence acquisition. PMID:26482846

  13. Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

    PubMed Central

    Laehnemann, David; Borkhardt, Arndt

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance on which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159

  14. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycete) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~10k genes. ~12% of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~9% rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also, >90% of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. We conclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  15. English speech sound development in preschool-aged children from bilingual English-Spanish environments.

    PubMed

    Gildersleeve-Neumann, Christina E; Kester, Ellen S; Davis, Barbara L; Peña, Elizabeth D

    2008-07-01

    English speech acquisition by typically developing 3- to 4-year-old children with monolingual English was compared to English speech acquisition by typically developing 3- to 4-year-old children with bilingual English-Spanish backgrounds. We predicted that exposure to Spanish would not affect the English phonetic inventory but would increase error frequency and type in bilingual children. Single-word speech samples were collected from 33 children. Phonetically transcribed samples for the 3 groups (monolingual English children, English-Spanish bilingual children who were predominantly exposed to English, and English-Spanish bilingual children with relatively equal exposure to English and Spanish) were compared at 2 time points and for change over time for phonetic inventory, phoneme accuracy, and error pattern frequencies. Children demonstrated similar phonetic inventories. Some bilingual children produced Spanish phonemes in their English and produced few consonant cluster sequences. Bilingual children with relatively equal exposure to English and Spanish averaged more errors than did bilingual children who were predominantly exposed to English. Both bilingual groups showed higher error rates than English-only children overall, particularly for syllable-level error patterns. All language groups decreased in some error patterns, although the ones that decreased were not always the same across language groups. Some group differences of error patterns and accuracy were significant. Vowel error rates did not differ by language group. Exposure to English and Spanish may result in a higher English error rate in typically developing bilinguals, including the application of Spanish phonological properties to English. Slightly higher error rates are likely typical for bilingual preschool-aged children. Change over time at these time points for all 3 groups was similar, suggesting that all will reach an adult-like system in English with exposure and practice.

  16. Context and meter enhance long-range planning in music performance

    PubMed Central

    Mathias, Brian; Pfordresher, Peter Q.; Palmer, Caroline

    2015-01-01

    Neural responses demonstrate evidence of resonance, or oscillation, during the production of periodic auditory events. Music contains periodic auditory events that give rise to a sense of beat, which in turn generates a sense of meter on the basis of multiple periodicities. Metrical hierarchies may aid memory for music by facilitating similarity-based associations among sequence events at different periodic distances that unfold in longer contexts. A fundamental question is how metrical associations arising from a musical context influence memory during music performance. Longer contexts may facilitate metrical associations at higher hierarchical levels more than shorter contexts, a prediction of the range model, a formal model of planning processes in music performance (Palmer and Pfordresher, 2003; Pfordresher et al., 2007). Serial ordering errors, in which intended sequence events are produced in incorrect sequence positions, were measured as skilled pianists performed musical pieces that contained excerpts embedded in long or short musical contexts. Pitch errors arose from metrically similar positions and further sequential distances more often when the excerpt was embedded in long contexts compared to short contexts. Musicians’ keystroke intensities and error rates also revealed influences of metrical hierarchies, which differed for performances in long and short contexts. The range model accounted for contextual effects and provided better fits to empirical findings when metrical associations between sequence events were included. Longer sequence contexts may facilitate planning during sequence production by increasing conceptual similarity between hierarchically associated events. These findings are consistent with the notion that neural oscillations at multiple periodicities may strengthen metrical associations across sequence events during planning. PMID:25628550

  17. The Spectrum of Replication Errors in the Absence of Error Correction Assayed Across the Whole Genome of Escherichia coli.

    PubMed

    Niccum, Brittany A; Lee, Heewook; MohammedIsmail, Wazim; Tang, Haixu; Foster, Patricia L

    2018-06-15

    When the DNA polymerase that replicates the Escherichia coli chromosome, DNA Pol III, makes an error, there are two primary defenses against mutation: proofreading by the epsilon subunit of the holoenzyme and mismatch repair. In proofreading deficient strains, mismatch repair is partially saturated and the cell's response to DNA damage, the SOS response, may be partially induced. To investigate the nature of replication errors, we used mutation accumulation experiments and whole genome sequencing to determine mutation rates and mutational spectra across the entire chromosome of strains deficient in proofreading, mismatch repair, and the SOS response. We report that a proofreading-deficient strain has a mutation rate 4,000-fold greater than that of wild-type strains. While the SOS response may be induced in these cells, it does not contribute to the mutational load. Inactivating mismatch repair in a proofreading-deficient strain increases the mutation rate another 1.5-fold. DNA polymerase has a bias for converting G:C to A:T base pairs, but proofreading reduces the impact of these mutations, helping to maintain the genomic G:C content. These findings give an unprecedented view of how polymerase and error-correction pathways work together to maintain E. coli's low mutation rate of about 1 mutation per thousand generations. Copyright © 2018, Genetics.

  18. SOPRA: Scaffolding algorithm for paired reads via statistical optimization.

    PubMed

    Dayarian, Adel; Michael, Todd P; Sengupta, Anirvan M

    2010-06-24

    High throughput sequencing (HTS) platforms produce gigabases of short read (<100 bp) data per run. While these short reads are adequate for resequencing applications, de novo assembly of moderate size genomes from such reads remains a significant challenge. These limitations could be partially overcome by utilizing mate pair technology, which provides pairs of short reads separated by a known distance along the genome. We have developed SOPRA, a tool designed to exploit the mate pair/paired-end information for assembly of short reads. The main focus of the algorithm is selecting a sufficiently large subset of simultaneously satisfiable mate pair constraints to achieve a balance between the size and the quality of the output scaffolds. Scaffold assembly is presented as an optimization problem for variables associated with vertices and with edges of the contig connectivity graph. Vertices of this graph are individual contigs with edges drawn between contigs connected by mate pairs. Similar graph problems have been invoked in the context of shotgun sequencing and scaffold building for previous generations of sequencing projects. However, given the error-prone nature of HTS data and the fundamental limitations from the shortness of the reads, the ad hoc greedy algorithms used in the earlier studies are likely to lead to poor quality results in the current context. SOPRA circumvents this problem by treating all the constraints on equal footing for solving the optimization problem, the solution itself indicating the problematic constraints (chimeric/repetitive contigs, etc.) to be removed. The process of solving and removing constraints is iterated until a core set of consistent constraints is reached. For SOLiD sequencer data, SOPRA uses a dynamic programming approach to robustly translate the color-space assembly to base-space. For assessing the quality of an assembly, we report the no-match/mismatch error rate as well as the rates of various rearrangement errors. Applying SOPRA to real data from bacterial genomes, we were able to assemble contigs into scaffolds of significant length (N50 up to 200 Kb) with very few errors introduced in the process. In general, the methodology presented here will allow better scaffold assemblies of any type of mate pair sequencing data.

  19. Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

    PubMed

    Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

    2013-01-01

    Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.
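
    The read budget behind a target depth such as 50X follows from the identity depth = (number of reads × read length) / genome size. A quick planning sketch, assuming 100 bp reads and the genome sizes quoted above (the helper name reads_for_depth is ours, not from the study):

```python
# Back-of-the-envelope planning sketch: how many reads give a target depth?
#   depth = (num_reads * read_length) / genome_size
def reads_for_depth(target_depth: float, genome_size_bp: float,
                    read_length_bp: float = 100.0) -> float:
    return target_depth * genome_size_bp / read_length_bp

if __name__ == "__main__":
    genomes = {"E. coli": 4.6e6, "S. kudriavzevii": 11.18e6, "C. elegans": 100e6}
    for name, size in genomes.items():
        n = reads_for_depth(50, size)       # 50X with 100 bp reads (illustrative)
        print(f"{name}: ~{n / 1e6:.1f} million reads for 50X")
```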

  20. An accurate algorithm for the detection of DNA fragments from dilution pool sequencing experiments.

    PubMed

    Bansal, Vikas

    2018-01-01

    The short read lengths of current high-throughput sequencing technologies limit the ability to recover long-range haplotype information. Dilution pool methods for preparing DNA sequencing libraries from high molecular weight DNA fragments enable the recovery of long DNA fragments from short sequence reads. These approaches require computational methods for identifying the DNA fragments using aligned sequence reads and assembling the fragments into long haplotypes. Although a number of computational methods have been developed for haplotype assembly, the problem of identifying DNA fragments from dilution pool sequence data has not received much attention. We formulate the problem of detecting DNA fragments from dilution pool sequencing experiments as a genome segmentation problem and develop an algorithm that uses dynamic programming to optimize a likelihood function derived from a generative model for the sequence reads. This algorithm uses an iterative approach to automatically infer the mean background read depth and the number of fragments in each pool. Using simulated data, we demonstrate that our method, FragmentCut, has 25-30% greater sensitivity compared with an HMM based method for fragment detection and can also detect overlapping fragments. On a whole-genome human fosmid pool dataset, the haplotypes assembled using the fragments identified by FragmentCut had greater N50 length, 16.2% lower switch error rate and 35.8% lower mismatch error rate compared with two existing methods. We further demonstrate the greater accuracy of our method using two additional dilution pool datasets. FragmentCut is available from https://bansal-lab.github.io/software/FragmentCut. vibansal@ucsd.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  1. Sequence data - Magnitude and implications of some ambiguities.

    NASA Technical Reports Server (NTRS)

    Holmquist, R.; Jukes, T. H.

    1972-01-01

    A stochastic model is applied to the divergence of the horse and pig lineages from a common ancestor in terms of the alpha and beta chains of hemoglobin and fibrinopeptides. The results are compared with those based on the minimum mutation distance model of Fitch (1972). Buckwheat and cauliflower cytochrome c sequences are analyzed to demonstrate their ambiguities. A comparative analysis of evolutionary rates for various proteins of horses and pigs shows that errors of considerable magnitude are introduced by Glx and Asx ambiguities into evolutionary conclusions drawn from sequences of incompletely analyzed proteins.

  2. Local alignment of two-base encoded DNA sequence

    PubMed Central

    Homer, Nils; Merriman, Barry; Nelson, Stanley F

    2009-01-01

    Background DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732
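
    The two-base encoding discussed here can be written, for the standard SOLiD di-base code, as the XOR of 2-bit base codes. The sketch below (assuming that standard code) also shows why a single color error corrupts every later base of a naively decoded read, which is the motivation for decoding and aligning jointly.

```python
# Sketch of two-base ("color space") encoding/decoding, assuming the standard
# SOLiD di-base code, equivalent to XOR of 2-bit base codes (A=0, C=1, G=2, T=3).
# A single wrong color shifts every later decoded base, which is why decoding
# and alignment are best done jointly.
BASE2BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS2BASE = {v: k for k, v in BASE2BITS.items()}

def encode(seq: str) -> list:
    return [BASE2BITS[a] ^ BASE2BITS[b] for a, b in zip(seq, seq[1:])]

def decode(first_base: str, colors: list) -> str:
    bases = [first_base]
    for c in colors:
        bases.append(BITS2BASE[BASE2BITS[bases[-1]] ^ c])
    return "".join(bases)

if __name__ == "__main__":
    read = "ACGTTGCA"
    colors = encode(read)
    assert decode(read[0], colors) == read
    corrupted = colors.copy()
    corrupted[2] ^= 1                      # one measurement error...
    print(decode(read[0], corrupted))      # ...garbles every base after it
```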

  3. A study of high density bit transition requirements versus the effects on BCH error correcting coding

    NASA Technical Reports Server (NTRS)

    Ingels, F.; Schoggen, W. O.

    1981-01-01

    Several methods for increasing bit transition densities in a data stream are summarized, discussed in detail, and compared against constraints imposed by the 2 MHz data link of the space shuttle high rate multiplexer unit. These methods include use of alternate pulse code modulation waveforms, data stream modification by insertion, alternate bit inversion, differential encoding, error encoding, and use of bit scramblers. The pseudo-random cover sequence generator was chosen for application to the 2 MHz data link of the space shuttle high rate multiplexer unit. This method is fully analyzed and a design implementation proposed.
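
    A pseudo-random cover sequence generator of the kind selected here amounts to XOR-ing the data with a known PN stream, which breaks up long runs of identical bits while remaining exactly reversible at the receiver. The sketch below uses an illustrative 7-bit LFSR; it is not the actual generator specified for the shuttle link.

```python
# Minimal sketch of a pseudo-random cover-sequence (additive scrambler):
# XOR the data with an LFSR-generated PN sequence to raise bit-transition
# density; XOR-ing with the same sequence again recovers the data.
# The 7-bit LFSR taps below (x^7 + x^6 + 1) are an illustrative choice.
def lfsr_bits(state: int = 0x7F, taps=(7, 6), nbits: int = 7):
    while True:
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1
        state = ((state << 1) | fb) & ((1 << nbits) - 1)
        yield state & 1

def scramble(data_bits, pn=None):
    pn = pn or lfsr_bits()                 # same seed -> same PN stream
    return [b ^ next(pn) for b in data_bits]

if __name__ == "__main__":
    data = [0] * 16                        # worst case: no bit transitions
    covered = scramble(data)               # now rich in transitions
    recovered = scramble(covered)          # descramble with identical PN stream
    print(covered)
    print(recovered == data)
```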

  4. Thermodynamic Basis for the Emergence of Genomes during Prebiotic Evolution

    PubMed Central

    Woo, Hyung-June; Vijaya Satya, Ravi; Reifman, Jaques

    2012-01-01

    The RNA world hypothesis views modern organisms as descendants of RNA molecules. The earliest RNA molecules must have been random sequences, from which the first genomes that coded for polymerase ribozymes emerged. The quasispecies theory by Eigen predicts the existence of an error threshold limiting genomic stability during such transitions, but does not address the spontaneity of changes. Following a recent theoretical approach, we applied the quasispecies theory combined with kinetic/thermodynamic descriptions of RNA replication to analyze the collective behavior of RNA replicators based on known experimental kinetics data. We find that, with increasing fidelity (relative rate of base-extension for Watson-Crick versus mismatched base pairs), replications without enzymes, with ribozymes, and with protein-based polymerases are above, near, and below a critical point, respectively. The prebiotic evolution therefore must have crossed this critical region. Over large regions of the phase diagram, fitness increases with increasing fidelity, biasing random drifts in sequence space toward ‘crystallization.’ This region encloses the experimental nonenzymatic fidelity value, favoring evolutions toward polymerase sequences with ever higher fidelity, despite error rates above the error catastrophe threshold. Our work shows that experimentally characterized kinetics and thermodynamics of RNA replication allow us to determine the physicochemical conditions required for the spontaneous crystallization of biological information. Our findings also suggest that among many potential oligomers capable of templated replication, RNAs may have evolved to form prebiotic genomes due to the value of their nonenzymatic fidelity. PMID:22693440

  5. Genome-Wide Spectra of Transcription Insertions and Deletions Reveal That Slippage Depends on RNA:DNA Hybrid Complementarity

    PubMed Central

    Traverse, Charles C.

    2017-01-01

    Advances in sequencing technologies have enabled direct quantification of genome-wide errors that occur during RNA transcription. These errors occur at rates that are orders of magnitude higher than rates during DNA replication, but due to technical difficulties such measurements have been limited to single-base substitutions and have not yet quantified the scope of transcription insertions and deletions. Previous reporter gene assay findings suggested that transcription indels are produced exclusively by elongation complex slippage at homopolymeric runs, so we enumerated indels across the protein-coding transcriptomes of Escherichia coli and Buchnera aphidicola, which differ widely in their genomic base compositions and incidence of repeat regions. As anticipated from prior assays, transcription insertions prevailed in homopolymeric runs of A and T; however, transcription deletions arose in much more complex sequences and were rarely associated with homopolymeric runs. By reconstructing the relocated positions of the elongation complex as inferred from the sequences inserted or deleted during transcription, we show that continuation of transcription after slippage hinges on the degree of nucleotide complementarity within the RNA:DNA hybrid at the new DNA template location. PMID:28851848

  6. Effect of lethality on the extinction and on the error threshold of quasispecies.

    PubMed

    Tejero, Hector; Marín, Arturo; Montero, Francisco

    2010-02-21

    In this paper the effect of lethality on error threshold and extinction has been studied in a population of error-prone self-replicating molecules. For given lethality and a simple fitness landscape, three dynamic regimes can be obtained: quasispecies, error catastrophe, and extinction. Using a simple model in which molecules are classified as master, lethal and non-lethal mutants, it is possible to obtain the mutation rates of the transitions between the three regimes analytically. The numerical solution of the extended model, in which molecules are classified depending on their Hamming distance to the master sequence, confirms the results obtained in the simple model and shows how an error catastrophe regime changes when lethality is taken into account. (c) 2009 Elsevier Ltd. All rights reserved.

  7. Genotype calling from next-generation sequencing data using haplotype information of reads

    PubMed Central

    Zhi, Degui; Wu, Jihua; Liu, Nianjun; Zhang, Kui

    2012-01-01

    Motivation: Low coverage sequencing provides an economic strategy for whole genome sequencing. When sequencing a set of individuals, genotype calling can be challenging due to low sequencing coverage. Linkage disequilibrium (LD) based refinement of genotyping calling is essential to improve the accuracy. Current LD-based methods use read counts or genotype likelihoods at individual potential polymorphic sites (PPSs). Reads that span multiple PPSs (jumping reads) can provide additional haplotype information overlooked by current methods. Results: In this article, we introduce a new Hidden Markov Model (HMM)-based method that can take into account jumping reads information across adjacent PPSs and implement it in the HapSeq program. Our method extends the HMM in Thunder and explicitly models jumping reads information as emission probabilities conditional on the states of adjacent PPSs. Our simulation results show that, compared to Thunder, HapSeq reduces the genotyping error rate by 30%, from 0.86% to 0.60%. The results from the 1000 Genomes Project show that HapSeq reduces the genotyping error rate by 12 and 9%, from 2.24% and 2.76% to 1.97% and 2.50% for individuals with European and African ancestry, respectively. We expect our program can improve genotyping qualities of the large number of ongoing and planned whole genome sequencing projects. Contact: dzhi@ms.soph.uab.edu; kzhang@ms.soph.uab.edu Availability: The software package HapSeq and its manual can be found and downloaded at www.ssg.uab.edu/hapseq/. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22285565
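
    The per-site genotype likelihoods that LD-based refiners such as Thunder and HapSeq build on can be written with a simple binomial read-count model. The sketch below is a generic illustration with an assumed per-base error rate; it omits the haplotype/jumping-read information that is the paper's contribution.

```python
# Minimal sketch of per-site genotype likelihoods from read counts under a
# binomial error model: P(ref_count, alt_count | genotype). LD-based callers
# (Thunder, HapSeq) refine these with haplotype information from nearby sites.
from math import comb

def genotype_likelihoods(ref: int, alt: int, err: float = 0.01):
    """Return likelihoods for genotypes RR, RA, AA given read counts."""
    n = ref + alt
    def binom(p_alt):                      # P(alt reads | per-read alt prob)
        return comb(n, alt) * p_alt**alt * (1 - p_alt)**ref
    return {"RR": binom(err), "RA": binom(0.5), "AA": binom(1 - err)}

if __name__ == "__main__":
    lik = genotype_likelihoods(ref=3, alt=1)       # a low-coverage site
    best = max(lik, key=lik.get)
    print(lik, "->", best)   # with only 4 reads the call stays uncertain
```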

  8. Proteomic Identification of Monoclonal Antibodies from Serum

    PubMed Central

    2015-01-01

    Characterizing the in vivo dynamics of the polyclonal antibody repertoire in serum, such as that which might arise in response to stimulation with an antigen, is difficult due to the presence of many highly similar immunoglobulin proteins, each specified by distinct B lymphocytes. These challenges have precluded the use of conventional mass spectrometry for antibody identification based on peptide mass spectral matches to a genomic reference database. Recently, progress has been made using bottom-up analysis of serum antibodies by nanoflow liquid chromatography/high-resolution tandem mass spectrometry combined with a sample-specific antibody sequence database generated by high-throughput sequencing of individual B cell immunoglobulin variable domains (V genes). Here, we describe how intrinsic features of antibody primary structure, most notably the interspersed segments of variable and conserved amino acid sequences, generate recurring patterns in the corresponding peptide mass spectra of V gene peptides, greatly complicating the assignment of correct sequences to mass spectral data. We show that the standard method of decoy-based error modeling fails to account for the error introduced by these highly similar sequences, leading to a significant underestimation of the false discovery rate. Because of these effects, antibody-derived peptide mass spectra require increased stringency in their interpretation. The use of filters based on the mean precursor ion mass accuracy of peptide-spectrum matches is shown to be particularly effective in distinguishing between “true” and “false” identifications. These findings highlight important caveats associated with the use of standard database search and error-modeling methods with nonstandard data sets and custom sequence databases. PMID:24684310

  9. Accounting for Errors in Low Coverage High-Throughput Sequencing Data When Constructing Genetic Maps Using Biparental Outcrossed Populations

    PubMed Central

    Bilton, Timothy P.; Schofield, Matthew R.; Black, Michael A.; Chagné, David; Wilcox, Phillip L.; Dodds, Ken G.

    2018-01-01

    Next-generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high-density genetic linkage maps, which facilitate the development of nonmodel species’ genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology (e.g., genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sibling family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sibling populations of diploid species, implemented in a package called GUSMap. Our model is based on the Lander–Green hidden Markov model but extended to account for errors present in sequencing data. We were able to obtain accurate estimates of the recombination fractions and overall map distance using GUSMap, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model. PMID:29487138

  10. Accounting for Errors in Low Coverage High-Throughput Sequencing Data When Constructing Genetic Maps Using Biparental Outcrossed Populations.

    PubMed

    Bilton, Timothy P; Schofield, Matthew R; Black, Michael A; Chagné, David; Wilcox, Phillip L; Dodds, Ken G

    2018-05-01

    Next-generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high-density genetic linkage maps, which facilitate the development of nonmodel species' genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology ( e.g. , genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sibling family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sibling populations of diploid species, implemented in a package called GUSMap. Our model is based on the Lander-Green hidden Markov model but extended to account for errors present in sequencing data. We were able to obtain accurate estimates of the recombination fractions and overall map distance using GUSMap, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model. Copyright © 2018 Bilton et al.

  11. Qualitative and quantitative assessment of Illumina's forensic STR and SNP kits on MiSeq FGx™.

    PubMed

    Sharma, Vishakha; Chow, Hoi Yan; Siegel, Donald; Wurmbach, Elisa

    2017-01-01

    Massively parallel sequencing (MPS) is a powerful tool transforming DNA analysis in multiple fields ranging from medicine, to environmental science, to evolutionary biology. In forensic applications, MPS offers the ability to significantly increase the discriminatory power of human identification as well as aid in mixture deconvolution. However, before the benefits of any new technology can be realized, its quality, consistency, sensitivity, and specificity must be rigorously evaluated in order to gain a detailed understanding of the technique including sources of error, error rates, and other restrictions/limitations. This extensive study assessed the performance of Illumina's MiSeq FGx MPS system and ForenSeq™ kit in nine experimental runs including 314 reaction samples. In-depth data analysis evaluated the consequences of different assay conditions on test results. Variables included: sample numbers per run, targets per run, DNA input per sample, and replications. Results are presented as heat maps revealing patterns for each locus. Data analysis focused on read numbers (allele coverage), drop-outs, drop-ins, and sequence analysis. The study revealed that loci with high read numbers performed better and resulted in fewer drop-outs and well balanced heterozygous alleles. Several loci were prone to drop-outs, which led to falsely typed homozygotes and therefore to genotype errors. Sequence analysis of allele drop-in typically revealed a single nucleotide change (deletion, insertion, or substitution). Analyses of sequences, no template controls, and spurious alleles suggest no contamination during library preparation, pooling, and sequencing, but indicate that sequencing or PCR errors may have occurred due to DNA polymerase infidelities. Finally, we found that utilizing Illumina's FGx System at recommended conditions does not guarantee complete results for all samples tested, including the positive control, and that manual editing was required due to low read numbers and/or allele drop-in. These findings are important for progressing towards implementation of MPS in forensic DNA testing.

  12. Temporal regularization of ultrasound-based liver motion estimation for image-guided radiation therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O’Shea, Tuathan P., E-mail: tuathan.oshea@icr.ac.uk; Bamber, Jeffrey C.; Harris, Emma J.

    Purpose: Ultrasound-based motion estimation is an expanding subfield of image-guided radiation therapy. Although ultrasound can detect tissue motion that is a fraction of a millimeter, its accuracy is variable. For controlling linear accelerator tracking and gating, ultrasound motion estimates must remain highly accurate throughout the imaging sequence. This study presents a temporal regularization method for correlation-based template matching which aims to improve the accuracy of motion estimates. Methods: Liver ultrasound sequences (15–23 Hz imaging rate, 2.5–5.5 min length) from ten healthy volunteers under free breathing were used. Anatomical features (blood vessels) in each sequence were manually annotated for comparison with normalized cross-correlation based template matching. Five sequences from a Siemens Acuson™ scanner were used for algorithm development (training set). Results from incremental tracking (IT) were compared with a temporal regularization method, which included a highly specific similarity metric and state observer, known as the α–β filter/similarity threshold (ABST). A further five sequences from an Elekta Clarity™ system were used for validation, without alteration of the tracking algorithm (validation set). Results: Overall, the ABST method produced marked improvements in vessel tracking accuracy. For the training set, the mean and 95th percentile (95%) errors (defined as the difference from manual annotations) were 1.6 and 1.4 mm, respectively (compared to 6.2 and 9.1 mm, respectively, for IT). For each sequence, the use of the state observer leads to improvement in the 95% error. For the validation set, the mean and 95% errors for the ABST method were 0.8 and 1.5 mm, respectively. Conclusions: Ultrasound-based motion estimation has potential to monitor liver translation over long time periods with high accuracy. Nonrigid motion (strain) and the quality of the ultrasound data are likely to have an impact on tracking performance. A future study will investigate spatial uniformity of motion and its effect on the motion estimation errors.
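
    The α–β filter named above is a fixed-gain predictor–corrector. The sketch below shows a generic 1-D version smoothing per-frame displacement measurements (such as normalized cross-correlation matches); the gains and the crude outlier gate are illustrative stand-ins and do not reproduce the paper's similarity-threshold logic.

```python
# Minimal 1-D alpha-beta filter sketch for regularizing per-frame displacement
# estimates (e.g. from normalized cross-correlation template matching).
# Gains alpha/beta and the simple gate are illustrative only.
def alpha_beta_track(measurements, dt=1.0, alpha=0.5, beta=0.1, gate=5.0):
    x, v = measurements[0], 0.0          # position (mm) and velocity estimates
    out = [x]
    for z in measurements[1:]:
        x_pred = x + v * dt              # predict
        r = z - x_pred                   # innovation (residual)
        if abs(r) > gate:                # crude outlier gate: skip the update
            x = x_pred
        else:
            x = x_pred + alpha * r       # correct position
            v = v + (beta / dt) * r      # correct velocity
        out.append(x)
    return out

if __name__ == "__main__":
    raw = [0.0, 0.4, 0.9, 7.5, 1.8, 2.3]     # 7.5 mm is a spurious NCC match
    print(alpha_beta_track(raw))
```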

  13. Detection of microRNAs in color space.

    PubMed

    Marco, Antonio; Griffiths-Jones, Sam

    2012-02-01

    Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color-space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3′ end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/. Contact: antonio.marco@manchester.ac.uk. Supplementary data are available at Bioinformatics online.


  14. Comparing errors in ED computer-assisted vs conventional pediatric drug dosing and administration.

    PubMed

    Yamamoto, Loren; Kanemori, Joan

    2010-06-01

    Compared to fixed-dose single-vial drug administration in adults, pediatric drug dosing and administration requires a series of calculations, all of which are potentially error prone. The purpose of this study is to compare error rates and task completion times for common pediatric medication scenarios using computer program assistance vs conventional methods. Two versions of a 4-part paper-based test were developed. Each part consisted of a set of medication administration and/or dosing tasks. Emergency department and pediatric intensive care unit nurse volunteers completed these tasks using both methods (sequence assigned to start with a conventional or a computer-assisted approach). Completion times, errors, and the reason for the error were recorded. Thirty-eight nurses completed the study. Summing the completion of all 4 parts, the mean conventional total time was 1243 seconds vs the mean computer program total time of 879 seconds (P < .001). The conventional manual method had a mean of 1.8 errors vs the computer program with a mean of 0.7 errors (P < .001). Of the 97 total errors, 36 were due to misreading the drug concentration on the label, 34 were due to calculation errors, and 8 were due to misplaced decimals. Of the 36 label interpretation errors, 18 (50%) occurred with digoxin or insulin. Computerized assistance reduced errors and the time required for drug administration calculations. A pattern of errors emerged, noting that reading/interpreting certain drug labels were more error prone. Optimizing the layout of drug labels could reduce the error rate for error-prone labels. Copyright (c) 2010 Elsevier Inc. All rights reserved.

  15. High-Frame-Rate Doppler Ultrasound Using a Repeated Transmit Sequence

    PubMed Central

    Podkowa, Anthony S.; Oelze, Michael L.; Ketterling, Jeffrey A.

    2018-01-01

    The maximum detectable velocity of high-frame-rate color flow Doppler ultrasound is limited by the imaging frame rate when using coherent compounding techniques. Traditionally, high quality ultrasonic images are produced at a high frame rate via coherent compounding of steered plane wave reconstructions. However, this compounding operation results in an effective downsampling of the slow-time signal, thereby artificially reducing the frame rate. To alleviate this effect, a new transmit sequence is introduced where each transmit angle is repeated in succession. This transmit sequence allows for direct comparison between low resolution, pre-compounded frames at a short time interval in ways that are resistant to sidelobe motion. Use of this transmit sequence increases the maximum detectable velocity by a factor equal to the transmit sequence length. The performance of this new transmit sequence was evaluated using a rotating cylindrical phantom and compared with traditional methods using a 15-MHz linear array transducer. Axial velocity estimates were recorded for a range of ±300 mm/s and compared to the known ground truth. Using these new techniques, the root mean square error was reduced from over 400 mm/s to below 50 mm/s in the high-velocity regime compared to traditional techniques. The standard deviation of the velocity estimate in the same velocity range was reduced from 250 mm/s to 30 mm/s. This result demonstrates the viability of the repeated transmit sequence methods in detecting and quantifying high-velocity flow. PMID:29910966
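
    The aliasing limit behind the maximum detectable velocity is the pulsed-Doppler Nyquist bound v_max = c·f_slow/(4·f0), where f_slow is the rate at which comparable (same-angle) frames arrive. The sketch below contrasts comparing compounded frames with comparing repeated-angle pairs; the probe frequency, PRF and angle count are illustrative values, not the paper's acquisition parameters.

```python
# Sketch of the pulsed-Doppler aliasing bound v_max = c * f_slow / (4 * f0),
# where f_slow is the slow-time sampling rate of comparable frames.
# Parameter values below are illustrative only.
C = 1540.0                                   # speed of sound in tissue, m/s

def v_max(f_slow_hz: float, f0_hz: float) -> float:
    return C * f_slow_hz / (4.0 * f0_hz)     # m/s

if __name__ == "__main__":
    prf, f0, n_angles = 5_000.0, 15e6, 9
    compounded = v_max(prf / n_angles, f0)   # compare fully compounded frames
    repeated = v_max(prf, f0)                # compare same-angle pairs, 1 PRI apart
    print(f"compounded: {compounded * 1e3:.0f} mm/s, repeated: {repeated * 1e3:.0f} mm/s")
```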

  16. Kinetics and thermodynamics of exonuclease-deficient DNA polymerases

    NASA Astrophysics Data System (ADS)

    Gaspard, Pierre

    2016-04-01

    A kinetic theory is developed for exonuclease-deficient DNA polymerases, based on the experimental observation that the rates depend not only on the newly incorporated nucleotide, but also on the previous one, leading to the growth of Markovian DNA sequences from a Bernoullian template. The dependencies on nucleotide concentrations and template sequence are explicitly taken into account. In this framework, the kinetic and thermodynamic properties of DNA replication, in particular, the mean growth velocity, the error probability, and the entropy production are calculated analytically in terms of the rate constants and the concentrations. Theory is compared with numerical simulations for the DNA polymerases of T7 viruses and human mitochondria.
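
    In the simplest competitive-incorporation picture, the error probability referred to above is the fraction of incorporation flux carried by the mismatched nucleotide. The sketch below evaluates that ratio for illustrative rate constants and concentrations; it ignores the previous-nucleotide dependence that the full theory includes.

```python
# Simplest competitive-incorporation sketch: the per-site error probability is
# the fraction of incorporation flux going to the mismatched nucleotide,
#   eta = k_wrong*c_wrong / (k_wrong*c_wrong + k_right*c_right).
# Rate constants and concentrations below are illustrative only.
def error_probability(k_right, c_right, k_wrong, c_wrong):
    right_flux = k_right * c_right
    wrong_flux = k_wrong * c_wrong
    return wrong_flux / (right_flux + wrong_flux)

if __name__ == "__main__":
    # correct dNTP incorporated ~1e5 times faster, equal concentrations
    eta = error_probability(k_right=1e2, c_right=100e-6,
                            k_wrong=1e-3, c_wrong=100e-6)
    print(f"error probability per site ~ {eta:.1e}")
```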

  17. Elucidation of cross-species proteomic effects in human and hominin bone proteome identification through a bioinformatics experiment.

    PubMed

    Welker, F

    2018-02-20

    The study of ancient protein sequences is increasingly focused on the analysis of older samples, including those of ancient hominins. The analysis of such ancient proteomes thereby potentially suffers from "cross-species proteomic effects": the loss of peptide and protein identifications at increased evolutionary distances due to a larger number of protein sequence differences between the database sequence and the analyzed organism. Error-tolerant proteomic search algorithms should theoretically overcome this problem at both the peptide and protein level; however, this has not been demonstrated. If error-tolerant searches do not overcome the cross-species proteomic issue then there might be inherent biases in the identified proteomes. Here, a bioinformatics experiment is performed to test this using a set of modern human bone proteomes and three independent searches against sequence databases at increasing evolutionary distances: the human (0 Ma), chimpanzee (6-8 Ma) and orangutan (16-17 Ma) reference proteomes, respectively. Incorrectly suggested amino acid substitutions are absent when employing adequate filtering criteria for mutable Peptide Spectrum Matches (PSMs), but roughly half of the mutable PSMs were not recovered. As a result, peptide and protein identification rates are higher in error-tolerant mode compared to non-error-tolerant searches but did not recover protein identifications completely. Data indicates that peptide length and the number of mutations between the target and database sequences are the main factors influencing mutable PSM identification. The error-tolerant results suggest that the cross-species proteomics problem is not overcome at increasing evolutionary distances, even at the protein level. Peptide and protein loss has the potential to significantly impact divergence dating and proteome comparisons when using ancient samples as there is a bias towards the identification of conserved sequences and proteins. Effects are minimized between moderately divergent proteomes, as indicated by almost complete recovery of informative positions in the search against the chimpanzee proteome (≈90%, 6-8 Ma). This provides a bioinformatic background to future phylogenetic and proteomic analysis of ancient hominin proteomes, including the future description of novel hominin amino acid sequences, but also has negative implications for the study of fast-evolving proteins in hominins, non-hominin animals, and ancient bacterial proteins in evolutionary contexts.

  18. Processing Dynamic Image Sequences from a Moving Sensor.

    DTIC Science & Technology

    1984-02-01

    Abstract not available; only list-of-figures fragments remain in the record, referring to error values and local-search values for selected features in road-sign and industrial image sequences.

  19. Are special read alignment strategies necessary and cost-effective when handling sequencing reads from patient-derived tumor xenografts?

    PubMed

    Tso, Kai-Yuen; Lee, Sau Dan; Lo, Kwok-Wai; Yip, Kevin Y

    2014-12-23

    Patient-derived tumor xenografts in mice are widely used in cancer research and have become important in developing personalized therapies. When these xenografts are subject to DNA sequencing, the samples could contain various amounts of mouse DNA. It has been unclear how the mouse reads would affect data analyses. We conducted comprehensive simulations to compare three alignment strategies at different mutation rates, read lengths, sequencing error rates, human-mouse mixing ratios and sequenced regions. We also sequenced a nasopharyngeal carcinoma xenograft and a cell line to test how the strategies work on real data. We found the "filtering" and "combined reference" strategies performed better than aligning reads directly to human reference in terms of alignment and variant calling accuracies. The combined reference strategy was particularly good at reducing false negative variants calls without significantly increasing the false positive rate. In some scenarios the performance gain of these two special handling strategies was too small for special handling to be cost-effective, but it was found crucial when false non-synonymous SNVs should be minimized, especially in exome sequencing. Our study systematically analyzes the effects of mouse contamination in the sequencing data of human-in-mouse xenografts. Our findings provide information for designing data analysis pipelines for these data.

  20. Sequencing artifacts in the type A influenza databases and attempts to correct them.

    PubMed

    Suarez, David L; Chester, Nikki; Hatfield, Jason

    2014-07-01

    There are over 276 000 influenza gene sequences in public databases, with the quality of the sequences determined by the contributor. As part of a high school class project, influenza sequences with possible errors were identified in the public databases based on the size of the gene being longer than expected, with the hypothesis that these sequences would have an error. Students contacted sequence submitters, alerting them to the possible sequence issue(s) and requesting that the suspect sequence(s) be corrected as appropriate. Type A influenza viruses were screened, and gene segments longer than the accepted size were identified for further analysis. Attention was placed on sequences with additional nucleotides upstream or downstream of the highly conserved non-coding ends of the viral segments. A total of 1081 sequences were identified that met this criterion. Three types of errors were commonly observed: non-influenza primer sequence was not removed from the sequence; PCR product was cloned and plasmid sequence was included in the sequence; and Taq polymerase added an adenine at the end of the PCR product. Internal insertions of nucleotide sequence were also commonly observed, but in many cases it was unclear if the sequence was correct or actually contained an error. A total of 215 sequences, or 22.8% of the suspect sequences, were corrected in the public databases in the first year of the student project. Unfortunately, 138 additional sequences with possible errors were added to the databases in the second year. Additional awareness of the need for data integrity of sequences submitted to public databases is needed to fully reap the benefits of these large data sets. © 2014 The Authors. Influenza and Other Respiratory Viruses Published by John Wiley & Sons Ltd.

  1. Evolution at increased error rate leads to the coexistence of multiple adaptive pathways in an RNA virus.

    PubMed

    Cabanillas, Laura; Arribas, María; Lázaro, Ester

    2013-01-16

    When beneficial mutations present in different genomes spread simultaneously in an asexual population, their fixation can be delayed due to competition among them. This interference among mutations is mainly determined by the rate of beneficial mutations, which in turn depends on the population size, the total error rate, and the degree of adaptation of the population. RNA viruses, with their large population sizes and high error rates, are good candidates to present a great extent of interference. To test this hypothesis, in the current study we investigated whether competition among beneficial mutations was responsible for the prolonged presence of polymorphisms in the mutant spectrum of an RNA virus, the bacteriophage Qβ, evolved during a large number of generations in the presence of the mutagenic nucleoside analogue 5-azacytidine. The analysis of the mutant spectra of bacteriophage Qβ populations evolved at artificially increased error rate shows a large number of polymorphic mutations, some of them with demonstrated selective value. The polymorphisms were distributed across several evolutionary lines that compete with one another, hindering the emergence of a defined consensus sequence. The presence of accompanying deleterious mutations, the high degree of recurrence of the polymorphic mutations, and the occurrence of epistatic interactions generate highly complex interference dynamics. Interference among beneficial mutations in bacteriophage Qβ evolved at increased error rate permits the coexistence of multiple adaptive pathways that can provide selective advantages by different molecular mechanisms. In this way, interference can be seen as a positive factor that allows the exploration of the different local maxima that exist in rugged fitness landscapes.

  2. Word and frame synchronization with verification for PPM optical communications

    NASA Technical Reports Server (NTRS)

    Marshall, William K.

    1986-01-01

    A method for obtaining word and frame synchronization in pulse position modulated optical communication systems is described. The method uses a short sync sequence inserted at the beginning of each data frame and a verification procedure to distinguish between inserted and randomly occurring sequences at the receiver. This results in an easy-to-implement sync system that provides reliable synchronization even at high symbol error rates. Results are given for the application of this approach to a highly energy-efficient 256-ary PPM test system.
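
    A minimal sketch of the sync-with-verification idea follows: a candidate sync position is accepted only after the sync word reappears at the expected frame spacing for several consecutive frames, which suppresses random matches even at high symbol error rates. The frame length, sync word, and error tolerance below are illustrative assumptions, not the parameters of the 256-ary PPM test system.

    ```python
    # Toy sync-with-verification: a candidate position is accepted only after the
    # sync word is seen again at the expected frame period for `verify_frames`
    # additional frames, which rejects randomly occurring look-alike patterns.
    def find_frame_sync(symbols, sync_word, frame_len, verify_frames=2, max_errs=1):
        n, m = len(symbols), len(sync_word)

        def matches(pos):
            if pos + m > n:
                return False
            errs = sum(1 for a, b in zip(symbols[pos:pos + m], sync_word) if a != b)
            return errs <= max_errs            # tolerate a few symbol errors

        for start in range(frame_len):
            if all(matches(start + k * frame_len) for k in range(verify_frames + 1)):
                return start                   # verified sync position
        return None                            # no lock

    if __name__ == "__main__":
        sync = [0, 3, 0, 3]
        frame = sync + [1, 2, 2, 1, 0, 2, 3, 1]      # 12-symbol frames
        stream = [2, 1] + frame * 4                  # arbitrary offset of 2
        print(find_frame_sync(stream, sync, frame_len=len(frame)))  # -> 2
    ```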

  3. An approach enabling adaptive FEC for OFDM in fiber-VLLC system

    NASA Astrophysics Data System (ADS)

    Wei, Yiran; He, Jing; Deng, Rui; Shi, Jin; Chen, Shenghai; Chen, Lin

    2017-12-01

    In this paper, we propose an orthogonal circulant matrix transform (OCT)-based adaptive frame-level forward error correction (FEC) scheme for a fiber-visible laser light communication (VLLC) system and demonstrate it experimentally with Reed-Solomon (RS) codes. In this method, no extra bits are spent on adaptation messages beyond the training sequence (TS), which is used simultaneously for synchronization and channel estimation. Therefore, RS coding can be adapted frame by frame via the codeword-error-rate (CER) feedback estimated from the TSs of the previous few OFDM frames. In addition, the experimental results show that over 20 km of standard single-mode fiber (SSMF) and 8 m of visible light transmission, the costs of the RS codewords are at most 14.12% lower than those of conventional adaptive subcarrier-RS-code based 16-QAM OFDM at a bit error rate (BER) of 10⁻⁵.

  4. High-fidelity target sequencing of individual molecules identified using barcode sequences: de novo detection and absolute quantitation of mutations in plasma cell-free DNA from cancer patients.

    PubMed

    Kukita, Yoji; Matoba, Ryo; Uchida, Junji; Hamakawa, Takuya; Doki, Yuichiro; Imamura, Fumio; Kato, Kikuya

    2015-08-01

    Circulating tumour DNA (ctDNA) is an emerging field of cancer research. However, current ctDNA analysis is usually restricted to one or a few mutation sites due to technical limitations. In the case of massively parallel DNA sequencers, the number of false positives caused by a high read error rate is a major problem. In addition, the final sequence reads do not represent the original DNA population due to the global amplification step during template preparation. We established a high-fidelity target sequencing system for individual molecules identified in plasma cell-free DNA using barcode sequences; this system consists of the following two steps. (i) A novel target sequencing method that adds barcode sequences by adaptor ligation. This method uses linear amplification to eliminate the errors introduced during the early cycles of the polymerase chain reaction. (ii) The monitoring and removal of erroneous barcode tags. This process enables the identification of the individual molecules that were sequenced and the absolute quantitation of the mutations they carry. Using plasma cell-free DNA from patients with gastric or lung cancer, we demonstrated that the system achieved near-complete elimination of false positives and enabled de novo detection and absolute quantitation of mutations in plasma cell-free DNA. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
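
    The general barcode idea, grouping reads that share a molecular tag and taking a per-position consensus so that random sequencer errors are outvoted while variants carried by the original molecule survive, can be sketched as below. This is a generic illustration with made-up inputs, not the authors' linear-amplification and tag-monitoring pipeline.

    ```python
    # Illustrative consensus over reads sharing the same molecular barcode.
    # A random sequencing error appears in only one copy and is outvoted; a true
    # variant present in the original molecule appears in every copy and survives.
    from collections import Counter, defaultdict

    def consensus(reads):
        """Per-position majority vote over equal-length reads."""
        return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

    def collapse_by_barcode(tagged_reads, min_family_size=3):
        families = defaultdict(list)
        for barcode, read in tagged_reads:
            families[barcode].append(read)
        return {bc: consensus(rs) for bc, rs in families.items()
                if len(rs) >= min_family_size}

    if __name__ == "__main__":
        reads = [("AACGT", "ACGTACGT"), ("AACGT", "ACGTACGA"),  # one sequencer error
                 ("AACGT", "ACGTACGT"),
                 ("TTGCA", "ACGAACGT"), ("TTGCA", "ACGAACGT"), ("TTGCA", "ACGAACGT")]
        print(collapse_by_barcode(reads))
        # {'AACGT': 'ACGTACGT', 'TTGCA': 'ACGAACGT'} -- the second family keeps its true T>A variant
    ```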

  5. Effect of Pointing Error on the BER Performance of an Optical CDMA FSO Link with SIK Receiver

    NASA Astrophysics Data System (ADS)

    Nazrul Islam, A. K. M.; Majumder, S. P.

    2017-12-01

    An analytical approach is presented for an optical code division multiple access (OCDMA) system over a free space optical (FSO) channel considering the effect of pointing error between the transmitter and the receiver. Analysis is carried out with an optical sequence inverse keying (SIK) correlator receiver with intensity modulation and direct detection (IM/DD) to find the bit error rate (BER) with pointing error. The results are evaluated numerically in terms of signal-to-noise plus multi-access interference (MAI) ratio, BER and power penalty due to pointing error. It is observed that the OCDMA FSO system is highly affected by pointing error, with significant power penalty at BERs of 10⁻⁶ and 10⁻⁹. For example, the penalty at a BER of 10⁻⁹ is found to be 9 dB corresponding to a normalized pointing error of 1.4 for 16 users with a processing gain of 256, and is reduced to 6.9 dB when the processing gain is increased to 1,024.

  6. Prefrontal neural correlates of memory for sequences.

    PubMed

    Averbeck, Bruno B; Lee, Daeyeol

    2007-02-28

    The sequence of actions appropriate to solve a problem often needs to be discovered by trial and error and recalled in the future when faced with the same problem. Here, we show that when monkeys had to discover and then remember a sequence of decisions across trials, ensembles of prefrontal cortex neurons reflected the sequence of decisions the animal would make throughout the interval between trials. This signal could reflect either an explicit memory process or a sequence-planning process that begins far in advance of the actual sequence execution. This finding extended to error trials such that, when the neural activity during the intertrial interval specified the wrong sequence, the animal also attempted to execute an incorrect sequence. More specifically, we used a decoding analysis to predict the sequence the monkey was planning to execute at the end of the fore-period, just before sequence execution. When this analysis was applied to error trials, we were able to predict where in the sequence the error would occur, up to three movements into the future. This suggests that prefrontal neural activity can retain information about sequences between trials, and that regardless of whether information is remembered correctly or incorrectly, the prefrontal activity veridically reflects the animal's action plan.

  7. New architecture for dynamic frame-skipping transcoder.

    PubMed

    Fung, Kai-Tat; Chan, Yui-Lam; Siu, Wan-Chi

    2002-01-01

    Transcoding is a key technique for reducing the bit rate of a previously compressed video signal. A high transcoding ratio may result in an unacceptable picture quality when the full frame rate of the incoming video bitstream is used. Frame skipping is often used as an efficient scheme to allocate more bits to the representative frames, so that an acceptable quality for each frame can be maintained. However, the skipped frame must be decompressed completely, which might act as a reference frame to nonskipped frames for reconstruction. The newly quantized discrete cosine transform (DCT) coefficients of the prediction errors need to be re-computed for the nonskipped frame with reference to the previous nonskipped frame; this can create undesirable complexity as well as introduce re-encoding errors. In this paper, we propose new algorithms and a novel architecture for frame-rate reduction to improve picture quality and to reduce complexity. The proposed architecture is mainly performed on the DCT domain to achieve a transcoder with low complexity. With the direct addition of DCT coefficients and an error compensation feedback loop, re-encoding errors are reduced significantly. Furthermore, we propose a frame-rate control scheme which can dynamically adjust the number of skipped frames according to the incoming motion vectors and re-encoding errors due to transcoding such that the decoded sequence can have a smooth motion as well as better transcoded pictures. Experimental results show that, as compared to the conventional transcoder, the new architecture for frame-skipping transcoder is more robust, produces fewer requantization errors, and has reduced computational complexity.

  8. Evaluation of exome variants using the Ion Proton Platform to sequence error-prone regions.

    PubMed

    Seo, Heewon; Park, Yoomi; Min, Byung Joo; Seo, Myung Eui; Kim, Ju Han

    2017-01-01

    The Ion Proton sequencer from Thermo Fisher accurately determines sequence variants from target regions with a rapid turnaround time at a low cost. However, misleading variant-calling errors can occur. We performed a systematic evaluation and manual curation of read-level alignments for the 675 ultrarare variants reported by the Ion Proton sequencer from 27 whole-exome sequencing data sets but absent from both the 1000 Genomes Project and the Exome Aggregation Consortium. We classified positive variant calls into 393 highly likely false positives, 126 likely false positives, and 156 likely true positives, which comprised 58.2%, 18.7%, and 23.1% of the variants, respectively. We identified four distinct error patterns of variant calling that may be bioinformatically corrected using different strategies: simplicity region, SNV cluster, peripheral sequence read, and base inversion. Local de novo assembly successfully corrected 201 (38.7%) of the 519 highly likely or likely false positives. We also demonstrate that the two sequencing kits from Thermo Fisher (the Ion PI Sequencing 200 kit V3 and the Ion PI Hi-Q kit) exhibit different error profiles across different error types. A refined calling algorithm together with an improved polymerase may improve the performance of the Ion Proton sequencing platform.

  9. Sequence-structure mapping errors in the PDB: OB-fold domains

    PubMed Central

    Venclovas, Česlovas; Ginalski, Krzysztof; Kang, Chulhee

    2004-01-01

    The Protein Data Bank (PDB) is the single most important repository of structural data for proteins and other biologically relevant molecules. Therefore, it is critically important to keep the PDB data, as much as possible, error-free. In this study, we have analyzed PDB crystal structures possessing oligonucleotide/oligosaccharide binding (OB)-fold, one of the highly populated folds, for the presence of sequence-structure mapping errors. Using energy-based structure quality assessment coupled with sequence analyses, we have found that there are at least five OB-structures in the PDB that have regions where sequences have been incorrectly mapped onto the structure. We have demonstrated that the combination of these computation techniques is effective not only in detecting sequence-structure mapping errors, but also in providing guidance to correct them. Namely, we have used results of computational analysis to direct a revision of X-ray data for one of the PDB entries containing a fairly inconspicuous sequence-structure mapping error. The revised structure has been deposited with the PDB. We suggest use of computational energy assessment and sequence analysis techniques to facilitate structure determination when homologs having known structure are available to use as a reference. Such computational analysis may be useful in either guiding the sequence-structure assignment process or verifying the sequence mapping within poorly defined regions. PMID:15133161

  10. Mutation at a distance caused by homopolymeric guanine repeats in Saccharomyces cerevisiae

    PubMed Central

    McDonald, Michael J.; Yu, Yen-Hsin; Guo, Jheng-Fen; Chong, Shin Yen; Kao, Cheng-Fu; Leu, Jun-Yi

    2016-01-01

    Mutation provides the raw material from which natural selection shapes adaptations. The rate at which new mutations arise is therefore a key factor that determines the tempo and mode of evolution. However, an accurate assessment of the mutation rate of a given organism is difficult because mutation rate varies on a fine scale within a genome. A central challenge of evolutionary genetics is to determine the underlying causes of this variation. In earlier work, we had shown that repeat sequences not only are prone to a high rate of expansion and contraction but also can cause an increase in mutation rate (on the order of kilobases) of the sequence surrounding the repeat. We perform experiments that show that simple guanine repeats 13 bp (base pairs) in length or longer (G13+) increase the substitution rate 4- to 18-fold in the downstream DNA sequence, and this correlates with DNA replication timing (R = 0.89). We show that G13+ mutagenicity results from the interplay of both error-prone translesion synthesis and homologous recombination repair pathways. The mutagenic repeats that we study have the potential to be exploited for the artificial elevation of mutation rate in systems biology and synthetic biology applications. PMID:27386516

  11. OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis.

    PubMed

    Verzotto, Davide; M Teo, Audrey S; Hillmer, Axel M; Nagarajan, Niranjan

    2016-01-01

    Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilo base pairs (kbp) to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient and sensitive map-alignment algorithms for robustly aligning error-prone maps to sequences. We introduce a novel seed-and-extend glocal (short for global-local) alignment method, OPTIMA (and a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic with respect to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests. We show that OPTIMA and OPTIMA-Overlap outperform other state-of-the-art approaches (1.6-2 times more sensitive) and are more efficient (170-200 %) and precise in their alignments (nearly 99 % precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust, provide improved sensitivity and guarantee high precision.

  12. Genome-Wide Spectra of Transcription Insertions and Deletions Reveal That Slippage Depends on RNA:DNA Hybrid Complementarity.

    PubMed

    Traverse, Charles C; Ochman, Howard

    2017-08-29

    Advances in sequencing technologies have enabled direct quantification of genome-wide errors that occur during RNA transcription. These errors occur at rates that are orders of magnitude higher than rates during DNA replication, but due to technical difficulties such measurements have been limited to single-base substitutions and have not yet quantified the scope of transcription insertions and deletions. Previous reporter gene assay findings suggested that transcription indels are produced exclusively by elongation complex slippage at homopolymeric runs, so we enumerated indels across the protein-coding transcriptomes of Escherichia coli and Buchnera aphidicola, which differ widely in their genomic base compositions and incidence of repeat regions. As anticipated from prior assays, transcription insertions prevailed in homopolymeric runs of A and T; however, transcription deletions arose in much more complex sequences and were rarely associated with homopolymeric runs. By reconstructing the relocated positions of the elongation complex as inferred from the sequences inserted or deleted during transcription, we show that continuation of transcription after slippage hinges on the degree of nucleotide complementarity within the RNA:DNA hybrid at the new DNA template location. IMPORTANCE: The high level of mistakes generated during transcription can result in the accumulation of malfunctioning and misfolded proteins, which can alter global gene regulation, and in the expenditure of energy to degrade these nonfunctional proteins. The transcriptome-wide occurrence of base substitutions has been elucidated in bacteria, but information on transcription insertions and deletions, errors that potentially have more dire effects on protein function, is limited to reporter gene constructs. Here, we capture the transcriptome-wide spectrum of insertions and deletions in Escherichia coli and Buchnera aphidicola and show that they occur at rates approaching those of base substitutions. Knowledge of the full extent of sequences subject to transcription indels supports a new model of bacterial transcription slippage, one that relies on the number of complementary bases between the transcript and the DNA template to which it slipped. Copyright © 2017 Traverse and Ochman.

  13. Discrimination of germline V genes at different sequencing lengths and mutational burdens: A new tool for identifying and evaluating the reliability of V gene assignment.

    PubMed

    Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri

    2015-12-01

    Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), Diversity (D) (in the case of heavy chains), and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences. Copyright © 2015 Elsevier B.V. All rights reserved.
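
    As a rough illustration of the underlying idea, two germline V sequences can only be discriminated when their distance over the sequenced window is comfortably larger than the expected mutation load. The sketch below uses truncated Hamming distance with a fixed margin as a stand-in for the paper's likelihood-based measure; the allele names are real IMGT-style labels but the sequences and thresholds are toy assumptions.

    ```python
    # Rough proxy for V-gene discriminability: pairs whose truncated Hamming
    # distance is not comfortably larger than the expected mutation load are
    # flagged as ambiguous and would be bundled into one group.
    from itertools import combinations

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def ambiguous_pairs(germlines, read_len, expected_mutations, margin=2):
        """germlines: dict name -> sequence (assumed aligned over the same window)."""
        flagged = []
        for (n1, s1), (n2, s2) in combinations(germlines.items(), 2):
            d = hamming(s1[:read_len], s2[:read_len])
            if d <= expected_mutations + margin:   # too close to call apart
                flagged.append((n1, n2, d))
        return flagged

    if __name__ == "__main__":
        toy = {"IGHV1-69*01": "CAGGTGCAGCTGGTGCAG",   # toy sequences, not the real alleles
               "IGHV1-69*02": "CAGGTCCAGCTGGTGCAG",
               "IGHV3-23*01": "GAGGTGGAGTTGTTGGAG"}
        print(ambiguous_pairs(toy, read_len=18, expected_mutations=2))
        print(ambiguous_pairs(toy, read_len=12, expected_mutations=2))  # shorter reads blur more pairs
    ```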

  14. Time-resolved fluorescence imaging of slab gels for lifetime base-calling in DNA sequencing applications.

    PubMed

    Lassiter, S J; Stryjewski, W; Legendre, B L; Erdmann, R; Wahl, M; Wurm, J; Peterson, R; Middendorf, L; Soper, S A

    2000-11-01

    A compact time-resolved near-IR fluorescence imager was constructed to obtain lifetime and intensity images of DNA sequencing slab gels. The scanner consisted of a microscope body with f/1.2 relay optics onto which was mounted a pulsed diode laser (repetition rate 80 MHz, lasing wavelength 680 nm, average power 5 mW), filtering optics, and a large photoactive area (diameter 500 microns) single-photon avalanche diode that was actively quenched to provide a large dynamic operating range. The time-resolved data were processed using electronics configured in a conventional time-correlated single-photon-counting format with all of the counting hardware situated on a PC card resident on the computer bus. The microscope head produced a timing response of 450 ps (fwhm) in a scanning mode, allowing the measurement of subnanosecond lifetimes. The time-resolved microscope head was placed in an automated DNA sequencer and translated across a 21-cm-wide gel plate in approximately 6 s (scan rate 3.5 cm/s) with an accumulation time per pixel of 10 ms. The sampling frequency was 0.17 Hz (duty cycle 0.0017), sufficient to prevent signal aliasing during the electrophoresis separation. Software (written in Visual Basic) allowed acquisition of both the intensity image and lifetime analysis of DNA bands migrating through the gel in real time. Using a dual-labeling (IRD700 and Cy5.5 labeling dyes)/two-lane sequencing strategy, we successfully read 670 bases of a control M13mp18 ssDNA template using lifetime identification. Comparison of the reconstructed sequence with the known sequence of the phage indicated the number of miscalls was only 2, producing an error rate of approximately 0.3% (identification accuracy 99.7%). The lifetimes were calculated using maximum likelihood estimators and allowed on-line determinations with high precision, even when short integration times were used to construct the decay profiles. Comparison of the lifetime base calling to a single-dye/four-lane sequencing strategy indicated similar results in terms of miscalls, but reduced insertion and deletion errors using lifetime identification methods, improving the overall read accuracy.

  15. Neural Representations Used by Brain Regions Underlying Speech Production

    ERIC Educational Resources Information Center

    Segawa, Jennifer Anne

    2013-01-01

    Speech utterances are phoneme sequences but may not always be represented as such in the brain. For instance, electropalatography evidence indicates that as speaking rate increases, gestures within syllables are manipulated separately but those within consonant clusters act as one motor unit. Moreover, speech error data suggest that a syllable's…

  16. Mutagenesis during plant responses to UVB radiation.

    PubMed

    Holá, M; Vágnerová, R; Angelis, K J

    2015-08-01

    We tested the idea that induced mutagenesis due to unrepaired DNA lesions, here the UV photoproducts, underlies the impact of UVB irradiation on plant phenotype. For this purpose we used protonemal culture of the moss Physcomitrella patens with 50% of apical cells, which mimics actively growing tissue, the most vulnerable stage for the induction of mutations. We measured the UVB mutation rate of various moss lines with defects in DNA repair (pplig4, ppku70, pprad50, ppmre11), and in selected clones resistant to 2-fluoroadenine, which carried mutations in the adenine phosphoribosyltransferase gene (APT), we analysed the induced mutations by sequencing. In parallel we followed DNA break repair and removal of cyclobutane pyrimidine dimers, with a half-life τ = 4 h 14 min, determined by comet assay combined with UV dimer-specific T4 endonuclease V. We show that UVB induces massive, sequence-specific, error-prone bypass repair that is responsible for a high mutation rate owing to the relatively slow, though error-free, removal of photoproducts by nucleotide excision repair (NER). Copyright © 2014 Elsevier Masson SAS. All rights reserved.

  17. Model parameter estimation approach based on incremental analysis for lithium-ion batteries without using open circuit voltage

    NASA Astrophysics Data System (ADS)

    Wu, Hongjie; Yuan, Shifei; Zhang, Xi; Yin, Chengliang; Ma, Xuerui

    2015-08-01

    To improve the suitability of a lithium-ion battery model under varying scenarios, such as fluctuating temperature and SoC variation, a dynamic model with parameters updated in real time should be developed. In this paper, an incremental analysis-based autoregressive exogenous (I-ARX) modeling method is proposed to eliminate the modeling error caused by the OCV effect and improve the accuracy of parameter estimation. Then, its numerical stability, modeling error, and parametric sensitivity are analyzed at different sampling rates (0.02, 0.1, 0.5 and 1 s). To identify the model parameters recursively, a bias-correction recursive least squares (CRLS) algorithm is applied. Finally, the pseudo random binary sequence (PRBS) and urban dynamic driving sequence (UDDS) profiles are applied to verify the real-time performance and robustness of the newly proposed model and algorithm. Different sampling rates (1 Hz and 10 Hz) and multiple temperature points (5, 25, and 45 °C) are covered in our experiments. The experimental and simulation results indicate that the proposed I-ARX model can provide high accuracy and suitability for parameter identification without using the open circuit voltage.
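
    For orientation, a standard recursive least squares (RLS) update for a first-order ARX model is sketched below. It is a simplified stand-in for the bias-corrected CRLS and incremental I-ARX formulation used in the paper; the "true" parameters and PRBS-like excitation are made up for the demonstration.

    ```python
    # Standard RLS identification of y[k] = a*y[k-1] + b*u[k-1] + e[k].
    import numpy as np

    def rls_arx(u, y, lam=0.999, delta=1e3):
        theta = np.zeros(2)                 # parameter estimate [a, b]
        P = delta * np.eye(2)               # inverse-correlation matrix
        for k in range(1, len(y)):
            phi = np.array([y[k - 1], u[k - 1]])
            K = P @ phi / (lam + phi @ P @ phi)          # gain vector
            theta = theta + K * (y[k] - phi @ theta)     # correct by prediction error
            P = (P - np.outer(K, phi) @ P) / lam
        return theta

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        a_true, b_true = 0.95, 0.4
        u = rng.choice([-1.0, 1.0], size=2000)           # PRBS-like excitation
        y = np.zeros_like(u)
        for k in range(1, len(u)):
            y[k] = a_true * y[k - 1] + b_true * u[k - 1] + 0.01 * rng.standard_normal()
        print(rls_arx(u, y))                             # converges near [0.95, 0.4]
    ```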

  18. Anticipatory activity in primary motor cortex codes memorized movement sequences.

    PubMed

    Lu, Xiaofeng; Ashe, James

    2005-03-24

    Movement sequences, defined both by the component movements and by the serial order in which they are produced, are fundamental building blocks of motor behavior. The serial order of sequence production is strongly encoded in medial motor areas. It is not known to what extent sequences are further elaborated or encoded in primary motor cortex. Here, we describe cells in the primary motor cortex of the monkey that show anticipatory activity exclusively related to a specific memorized sequence of upcoming movements. In addition, the injection of muscimol, a GABA agonist, into motor cortex resulted in an increase in the error rate during sequence production, without concomitant effects on nonsequenced motor performance. Our results challenge the role of medial motor areas in the control of well-practiced movement sequences and suggest that motor cortex contains a complete apparatus for the planning and production of this complex behavior.

  19. Sensorimotor synchronization with tempo-changing auditory sequences: Modeling temporal adaptation and anticipation.

    PubMed

    van der Steen, M C Marieke; Jacoby, Nori; Fairhurst, Merle T; Keller, Peter E

    2015-11-11

    The current study investigated the human ability to synchronize movements with event sequences containing continuous tempo changes. This capacity is evident, for example, in ensemble musicians who maintain precise interpersonal coordination while modulating the performance tempo for expressive purposes. Here we tested an ADaptation and Anticipation Model (ADAM) that was developed to account for such behavior by combining error correction processes (adaptation) with a predictive temporal extrapolation process (anticipation). While previous computational models of synchronization incorporate error correction, they do not account for prediction during tempo-changing behavior. The fit between behavioral data and computer simulations based on four versions of ADAM was assessed. These versions included a model with adaptation only, one in which adaptation and anticipation act in combination (error correction is applied on the basis of predicted tempo changes), and two models in which adaptation and anticipation were linked in a joint module that corrects for predicted discrepancies between the outcomes of adaptive and anticipatory processes. The behavioral experiment required participants to tap their finger in time with three auditory pacing sequences containing tempo changes that differed in the rate of change and the number of turning points. Behavioral results indicated that sensorimotor synchronization accuracy and precision, while generally high, decreased with increases in the rate of tempo change and number of turning points. Simulations and model-based parameter estimates showed that adaptation mechanisms alone could not fully explain the observed precision of sensorimotor synchronization. Including anticipation in the model increased the precision of simulated sensorimotor synchronization and improved the fit of model to behavioral data, especially when adaptation and anticipation mechanisms were linked via a joint module based on the notion of joint internal models. Overall results suggest that adaptation and anticipation mechanisms both play an important role during sensorimotor synchronization with tempo-changing sequences. This article is part of a Special Issue entitled SI: Prediction and Attention. Copyright © 2015 Elsevier B.V. All rights reserved.
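
    The interplay of adaptation and anticipation can be illustrated with a toy tapping simulation: phase correction removes a fraction of the last asynchrony, while anticipation linearly extrapolates the ongoing tempo change. This is not the published ADAM model; the correction gain, motor noise, and pacing sequence are illustrative assumptions.

    ```python
    # Toy tapper synchronizing with a tempo-changing metronome. "Adaptation"
    # corrects a fraction alpha of the last asynchrony; "anticipation" predicts
    # the next inter-onset interval by extrapolating the last two observed ones.
    import numpy as np

    def simulate_tapper(onsets, alpha=0.5, anticipate=True, motor_sd=5.0, seed=0):
        rng = np.random.default_rng(seed)
        taps = [onsets[0]]                                # assume the first tap is aligned
        for k in range(1, len(onsets)):
            if anticipate and k >= 3:
                last = onsets[k - 1] - onsets[k - 2]
                prev = onsets[k - 2] - onsets[k - 3]
                ioi_pred = last + (last - prev)           # extrapolate the tempo trend
            elif k >= 2:
                ioi_pred = onsets[k - 1] - onsets[k - 2]  # assume constant tempo
            else:
                ioi_pred = onsets[1] - onsets[0]
            asyn = taps[-1] - onsets[k - 1]               # last asynchrony (tap minus beat)
            taps.append(taps[-1] + ioi_pred - alpha * asyn + rng.normal(0, motor_sd))
        return np.asarray(taps)

    if __name__ == "__main__":
        beats = np.cumsum(np.concatenate(([0.0], np.linspace(600, 450, 30))))  # accelerando, ms
        for flag in (False, True):
            taps = simulate_tapper(beats, anticipate=flag)
            asyn = np.abs(taps - beats)[1:]
            print(f"anticipation={flag}: mean |asynchrony| = {asyn.mean():.1f} ms")
    ```

    With these assumed settings the anticipating tapper tracks the accelerando with roughly half the asynchrony of the adaptation-only tapper, mirroring the qualitative conclusion of the study.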

  20. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy

    PubMed Central

    2017-01-01

    Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalized. In particular, sequencing errors in the UMI sequence are often ignored or else resolved in an ad hoc manner. We show that errors in the UMI sequence are common and introduce network-based methods to account for these errors when identifying PCR duplicates. Using these methods, we demonstrate improved quantification accuracy both under simulated conditions and on real iCLIP and single-cell RNA-seq data sets. Reproducibility between iCLIP replicates and single-cell RNA-seq clustering are both improved using our proposed network-based method, demonstrating the value of properly accounting for errors in UMIs. These methods are implemented in the open source UMI-tools software package. PMID:28100584
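
    A sketch of the network-based grouping idea follows, in the spirit of the directional method: a more abundant UMI absorbs a neighbour that differs at one base when its count is at least twice the neighbour's count minus one, and each resulting group is counted as one molecule. The threshold is quoted from memory of the published description, and the greedy single pass is a simplification, so treat both as assumptions rather than the UMI-tools implementation.

    ```python
    # Greedy sketch of directional UMI collapsing: abundant UMIs absorb one-mismatch
    # neighbours that are much rarer, and each connected group counts as one molecule.
    from collections import Counter

    def hamming1(a, b):
        return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

    def directional_groups(umi_counts):
        umis = sorted(umi_counts, key=umi_counts.get, reverse=True)
        parent = {}
        for u in umis:                       # most abundant UMIs absorb neighbours first
            if u in parent:
                continue
            parent[u] = u
            for v in umis:
                if v not in parent and hamming1(u, v) \
                        and umi_counts[u] >= 2 * umi_counts[v] - 1:
                    parent[v] = u
        groups = {}
        for v, u in parent.items():
            groups.setdefault(u, []).append(v)
        return groups

    if __name__ == "__main__":
        counts = Counter({"ATCG": 120, "ATCA": 3, "TTCG": 2, "GGGG": 50})
        print(directional_groups(counts))
        # {'ATCG': ['ATCG', 'ATCA', 'TTCG'], 'GGGG': ['GGGG']} -> 2 molecules, not 4
    ```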

  1. Simultaneous message framing and error detection

    NASA Technical Reports Server (NTRS)

    Frey, A. H., Jr.

    1968-01-01

    Circuitry simultaneously inserts message framing information and detects noise errors in binary code data transmissions. Separate message groups are framed without requiring both framing bits and error-checking bits, and predetermined message sequences are separated from other message sequences without being hampered by intervening noise.

  2. On the error probability of general tree and trellis codes with applications to sequential decoding

    NASA Technical Reports Server (NTRS)

    Johannesson, R.

    1973-01-01

    An upper bound on the average error probability for maximum-likelihood decoding of the ensemble of random binary tree codes is derived and shown to be independent of the length of the tree. An upper bound on the average error probability for maximum-likelihood decoding of the ensemble of random L-branch binary trellis codes of rate R = 1/n is derived which separates the effects of the tail length T and the memory length M of the code. It is shown that the bound is independent of the length L of the information sequence. This implication is investigated by computer simulations of sequential decoding utilizing the stack algorithm. These simulations confirm the implication and further suggest an empirical formula for the true undetected decoding error probability with sequential decoding.

  3. Simulated Assessment of Interference Effects in Direct Sequence Spread Spectrum (DSSS) QPSK Receiver

    DTIC Science & Technology

    2014-03-27

    Excerpt (front-matter abbreviation list): BER, bit error rate; BPSK, binary phase shift keying; CDMA, code division multiple access; CSI, comb spectrum interference; CW, continuous wave; DPSK, differential... ... CDMA) and GPS systems, which is a Gold code. This code is generated by a modulo-2 operation between two different preferred m-sequences. The preferred m... [Figure 3.26 (plot residue; axes SNR Sim (dB) vs SNR Out (dB), curves SNR RF and SNR DS): Comparison of input SNR_Sim and SNR_Out of the band-pass RF filter (SNR_RF) and ...]
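
    The quoted Gold-code construction, a modulo-2 (XOR) combination of two preferred m-sequences, can be sketched as below. The degree-5 feedback taps are a commonly cited primitive pair for length-31 sequences; whether they match the preferred pair of a specific system (e.g., the report's DSSS design) should be checked against its specification.

    ```python
    # Gold code as the XOR (modulo-2 sum) of two m-sequences from Fibonacci LFSRs.
    # Tap sets [5, 2] and [5, 4, 3, 2] both yield maximal-length (period-31) sequences;
    # treat the claim that they are a preferred pair as an assumption to verify.
    def m_sequence(taps, degree, seed=1):
        state = [(seed >> i) & 1 for i in range(degree)]   # non-zero initial fill
        out = []
        for _ in range(2 ** degree - 1):
            out.append(state[-1])
            fb = 0
            for t in taps:
                fb ^= state[t - 1]                         # modulo-2 feedback
            state = [fb] + state[:-1]
        return out

    def gold_code(shift=0, degree=5):
        a = m_sequence([5, 2], degree)
        b = m_sequence([5, 4, 3, 2], degree)
        b = b[shift:] + b[:shift]                          # cyclic shift selects the code
        return [x ^ y for x, y in zip(a, b)]

    if __name__ == "__main__":
        codes = {tuple(gold_code(s)) for s in range(31)}
        print(len(codes), "distinct length-31 Gold codes from shifts 0..30")
    ```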

  4. Use of four next-generation sequencing platforms to determine HIV-1 coreceptor tropism.

    PubMed

    Archer, John; Weber, Jan; Henry, Kenneth; Winner, Dane; Gibson, Richard; Lee, Lawrence; Paxinos, Ellen; Arts, Eric J; Robertson, David L; Mimms, Larry; Quiñones-Mateu, Miguel E

    2012-01-01

    HIV-1 coreceptor tropism assays are required to rule out the presence of CXCR4-tropic (non-R5) viruses prior to treatment with CCR5 antagonists. Phenotypic (e.g., Trofile™, Monogram Biosciences) and genotypic (e.g., population sequencing linked to bioinformatic algorithms) assays are the most widely used. Although several next-generation sequencing (NGS) platforms are available, to date all published deep sequencing HIV-1 tropism studies have used the 454™ Life Sciences/Roche platform. In this study, HIV-1 co-receptor usage was predicted for twelve patients scheduled to start a maraviroc-based antiretroviral regimen. The V3 region of the HIV-1 env gene was sequenced using four NGS platforms: 454™, PacBio® RS (Pacific Biosciences), Illumina®, and Ion Torrent™ (Life Technologies). Cross-platform variation was evaluated, including number of reads, read length and error rates. HIV-1 tropism was inferred using Geno2Pheno, Web PSSM, and the 11/24/25 rule and compared with Trofile™ and virologic response to antiretroviral therapy. Error rates related to insertions/deletions (indels) and nucleotide substitutions introduced by the four NGS platforms were low compared to the actual HIV-1 sequence variation. Each platform detected all major virus variants within the HIV-1 population with similar frequencies. Identification of non-R5 viruses was comparable among the four platforms, with minor differences attributable to the algorithms used to infer HIV-1 tropism. All NGS platforms showed similar concordance with virologic response to the maraviroc-based regimen (75% to 80% range depending on the algorithm used), compared to Trofile (80%) and population sequencing (70%). In conclusion, all four NGS platforms were able to detect minority non-R5 variants at comparable levels, suggesting that any NGS-based method can be used to predict HIV-1 coreceptor usage.
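
    As a minimal illustration of the genotypic side, rules in the 11/25 family flag a V3 loop as non-R5 when a basic residue (R or K) occupies position 11 or 25 of the aligned 35-residue loop. The sketch below implements only that heuristic; the 11/24/25 variant, Geno2Pheno, and Web PSSM used in the study are far more elaborate, and the example sequence is a consensus-like illustration rather than patient data.

    ```python
    # Illustrative genotypic check in the spirit of the "11/25" family of rules:
    # a basic residue (R or K) at V3 position 11 or 25 (1-based, on a gap-free
    # 35-residue aligned V3 loop) is taken to predict CXCR4 use (non-R5).
    def predict_non_r5(v3_aa, positions=(11, 25), basic="RK"):
        v3_aa = v3_aa.upper()
        return any(len(v3_aa) >= p and v3_aa[p - 1] in basic for p in positions)

    if __name__ == "__main__":
        # Consensus-like R5 V3 loop (illustrative, not a real patient sequence)
        r5 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
        x4 = r5[:10] + "R" + r5[11:24] + "K" + r5[25:]   # force basic residues at 11 and 25
        print(predict_non_r5(r5), predict_non_r5(x4))    # False True
    ```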

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lapidus, Alla L.

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  6. Choosing appropriate analysis methods for cluster randomised cross-over trials with a binary outcome.

    PubMed

    Morgan, Katy E; Forbes, Andrew B; Keogh, Ruth H; Jairath, Vipul; Kahan, Brennan C

    2017-01-30

    In cluster randomised cross-over (CRXO) trials, clusters receive multiple treatments in a randomised sequence over time. In such trials, there is usually correlation between patients in the same cluster. In addition, within a cluster, patients in the same period may be more similar to each other than to patients in other periods. We demonstrate that it is necessary to account for these correlations in the analysis to obtain correct Type I error rates. We then use simulation to compare different methods of analysing a binary outcome from a two-period CRXO design. Our simulations demonstrated that hierarchical models without random effects for period-within-cluster, which do not account for any extra within-period correlation, performed poorly with greatly inflated Type I errors in many scenarios. In scenarios where extra within-period correlation was present, a hierarchical model with random effects for cluster and period-within-cluster only had correct Type I errors when there were large numbers of clusters; with small numbers of clusters, the error rate was inflated. We also found that generalised estimating equations did not give correct error rates in any scenarios considered. An unweighted cluster-level summary regression performed best overall, maintaining an error rate close to 5% for all scenarios, although it lost power when extra within-period correlation was present, especially for small numbers of clusters. Results from our simulation study show that it is important to model both levels of clustering in CRXO trials, and that any extra within-period correlation should be accounted for. Copyright © 2016 John Wiley & Sons, Ltd.
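
    The best-performing approach in these simulations, an unweighted cluster-level summary analysis, reduces each cluster to one treatment-minus-control difference in event proportions and tests those differences against zero. A minimal sketch on made-up data is shown below; it ignores period effects and other refinements a full analysis would include.

    ```python
    # Unweighted cluster-level summary for a two-period cross-over with a binary
    # outcome: one treatment-minus-control proportion difference per cluster,
    # then a one-sample t-test on the differences. Data below are invented.
    import numpy as np
    from scipy import stats

    def cluster_summary_test(events, totals):
        """events/totals: dicts cluster -> {"treat": x, "control": y}."""
        diffs = [events[c]["treat"] / totals[c]["treat"]
                 - events[c]["control"] / totals[c]["control"] for c in events]
        res = stats.ttest_1samp(diffs, 0.0)
        return np.mean(diffs), res.statistic, res.pvalue

    if __name__ == "__main__":
        events = {f"c{i}": {"treat": t, "control": c}
                  for i, (t, c) in enumerate([(12, 8), (15, 9), (10, 11), (18, 12),
                                              (9, 7), (14, 10), (11, 6), (16, 13)])}
        totals = {c: {"treat": 40, "control": 40} for c in events}
        print(cluster_summary_test(events, totals))
    ```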

  7. Dissociable effects of surprising rewards on learning and memory.

    PubMed

    Rouhani, Nina; Norman, Kenneth A; Niv, Yael

    2018-03-19

    Reward-prediction errors track the extent to which rewards deviate from expectations, and aid in learning. How do such errors in prediction interact with memory for the rewarding episode? Existing findings point to both cooperative and competitive interactions between learning and memory mechanisms. Here, we investigated whether learning about rewards in a high-risk context, with frequent, large prediction errors, would give rise to higher fidelity memory traces for rewarding events than learning in a low-risk context. Experiment 1 showed that recognition was better for items associated with larger absolute prediction errors during reward learning. Larger prediction errors also led to higher rates of learning about rewards. Interestingly we did not find a relationship between learning rate for reward and recognition-memory accuracy for items, suggesting that these two effects of prediction errors were caused by separate underlying mechanisms. In Experiment 2, we replicated these results with a longer task that posed stronger memory demands and allowed for more learning. We also showed improved source and sequence memory for items within the high-risk context. In Experiment 3, we controlled for the difficulty of reward learning in the risk environments, again replicating the previous results. Moreover, this control revealed that the high-risk context enhanced item-recognition memory beyond the effect of prediction errors. In summary, our results show that prediction errors boost both episodic item memory and incremental reward learning, but the two effects are likely mediated by distinct underlying systems. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  8. Single Versus Multiple Events Error Potential Detection in a BCI-Controlled Car Game With Continuous and Discrete Feedback.

    PubMed

    Kreilinger, Alex; Hiebel, Hannah; Müller-Putz, Gernot R

    2016-03-01

    This work aimed to find and evaluate a new method for detecting errors in continuous brain-computer interface (BCI) applications. Instead of classifying errors on a single-trial basis, the new method was based on multiple events (MEs) analysis to increase the accuracy of error detection. In a BCI-driven car game, based on motor imagery (MI), discrete events were triggered whenever subjects collided with coins and/or barriers. Coins counted as correct events, whereas barriers were errors. This new method, termed ME method, combined and averaged the classification results of single events (SEs) and determined the correctness of MI trials, which consisted of event sequences instead of SEs. The benefit of this method was evaluated in an offline simulation. In an online experiment, the new method was used to detect erroneous MI trials. Such MI trials were discarded and could be repeated by the users. We found that, even with low SE error potential (ErrP) detection rates, feasible accuracies can be achieved when combining MEs to distinguish erroneous from correct MI trials. Online, all subjects reached higher scores with error detection than without, at the cost of longer times needed for completing the game. Findings suggest that ErrP detection may become a reliable tool for monitoring continuous states in BCI applications when combining MEs. This paper demonstrates a novel technique for detecting errors in online continuous BCI applications, which yields promising results even with low single-trial detection rates.
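
    The multiple-events idea can be reduced to a very small sketch: average the per-event error-potential probabilities within a motor-imagery trial and flag the trial when the mean crosses a threshold, so that unreliable single-event decisions are pooled. The probabilities and threshold below are illustrative assumptions, not the study's classifier outputs.

    ```python
    # Sketch of multiple-events (ME) combination: average per-event ErrP
    # probabilities across a trial and flag the trial if the mean is high.
    def trial_is_erroneous(event_error_probs, threshold=0.5):
        if not event_error_probs:
            return False
        return sum(event_error_probs) / len(event_error_probs) > threshold

    if __name__ == "__main__":
        correct_trial = [0.35, 0.45, 0.30, 0.55]   # mostly coin-like events
        error_trial   = [0.62, 0.58, 0.71, 0.49]   # mostly barrier-like events
        print(trial_is_erroneous(correct_trial), trial_is_erroneous(error_trial))
        # False True -- single events near 0.5 are unreliable, but the average is not
    ```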

  9. The Effects of Spatial Diversity and Imperfect Channel Estimation on Wideband MC-DS-CDMA and MC-CDMA

    DTIC Science & Technology

    2009-10-01

    In our previous work, we compared the theoretical bit error rates of multi-carrier direct sequence code division multiple access (MC-DS-CDMA) and... consider only those cases where MC-CDMA has higher frequency diversity than MC-DS-CDMA. Since increases in diversity yield diminishing gains, we conclude

  10. Automatic mouse ultrasound detector (A-MUD): A new tool for processing rodent vocalizations.

    PubMed

    Zala, Sarah M; Reitschmidt, Doris; Noll, Anton; Balazs, Peter; Penn, Dustin J

    2017-01-01

    House mice (Mus musculus) emit complex ultrasonic vocalizations (USVs) during social and sexual interactions, which have features similar to bird song (i.e., they are composed of several different types of syllables, uttered in succession over time to form a pattern of sequences). Manually processing complex vocalization data is time-consuming and potentially subjective, and therefore, we developed an algorithm that automatically detects mouse ultrasonic vocalizations (Automatic Mouse Ultrasound Detector or A-MUD). A-MUD is a script that runs on STx acoustic software (S_TOOLS-STx version 4.2.2), which is free for scientific use. This algorithm improved the efficiency of processing USV files, as it was 4-12 times faster than manual segmentation, depending upon the size of the file. We evaluated A-MUD error rates using manually segmented sound files as a 'gold standard' reference, and compared them to a commercially available program. A-MUD had lower error rates than the commercial software, as it detected significantly more correct positives, and fewer false positives and false negatives. The errors generated by A-MUD were mainly false negatives, rather than false positives. This study is the first to systematically compare error rates for automatic ultrasonic vocalization detection methods, and A-MUD and subsequent versions will be made available for the scientific community.

  11. High-order noise filtering in nontrivial quantum logic gates.

    PubMed

    Green, Todd; Uys, Hermann; Biercuk, Michael J

    2012-07-13

    Treating the effects of a time-dependent classical dephasing environment during quantum logic operations poses a theoretical challenge, as the application of noncommuting control operations gives rise to both dephasing and depolarization errors that must be accounted for in order to understand total average error rates. We develop a treatment based on effective Hamiltonian theory that allows us to efficiently model the effect of classical noise on nontrivial single-bit quantum logic operations composed of arbitrary control sequences. We present a general method to calculate the ensemble-averaged entanglement fidelity to arbitrary order in terms of noise filter functions, and provide explicit expressions to fourth order in the noise strength. In the weak noise limit we derive explicit filter functions for a broad class of piecewise-constant control sequences, and use them to study the performance of dynamically corrected gates, yielding good agreement with brute-force numerics.

  12. Antigenic Variation in the Lyme Spirochete: Insights into Recombinational Switching with a Suggested Role for Error-Prone Repair.

    PubMed

    Verhey, Theodore B; Castellanos, Mildred; Chaconas, George

    2018-05-29

    The Lyme disease spirochete, Borrelia burgdorferi, uses antigenic variation as a strategy to evade the host's acquired immune response. New variants of surface-localized VlsE are generated efficiently by unidirectional recombination from 15 unexpressed vls cassettes into the vlsE locus. Using algorithms to analyze switching from vlsE sequencing data, we characterize a population of over 45,000 inferred recombination events generated during mouse infection. We present evidence for clustering of these recombination events within the population and along the vlsE gene, a role for the direct repeats flanking the variable region in vlsE, and the importance of sequence homology in determining the location of recombination, despite RecA's dispensability. Finally, we report that non-templated sequence variation is strongly associated with recombinational switching and occurs predominantly at the 5' end of conversion tracts. This likely results from an error-prone repair mechanism operational during recombinational switching that elevates the mutation rate > 5,000-fold in switched regions. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.

  13. INTERDISCIPLINARY PHYSICS AND RELATED AREAS OF SCIENCE AND TECHNOLOGY: Relaxation Property and Stability Analysis of the Quasispecies Models

    NASA Astrophysics Data System (ADS)

    Feng, Xiao-Li; Li, Yu-Xiao; Gu, Jian-Zhong; Zhuo, Yi-Zhong

    2009-10-01

    The relaxation property of both Eigen model and Crow-Kimura model with a single peak fitness landscape is studied from phase transition point of view. We first analyze the eigenvalue spectra of the replication mutation matrices. For sufficiently long sequences, the almost crossing point between the largest and second-largest eigenvalues locates the error threshold at which critical slowing down behavior appears. We calculate the critical exponent in the limit of infinite sequence lengths and compare it with the result from numerical curve fittings at sufficiently long sequences. We find that for both models the relaxation time diverges with exponent 1 at the error (mutation) threshold point. Results obtained from both methods agree quite well. From the unlimited correlation length feature, the first order phase transition is further confirmed. Finally with linear stability theory, we show that the two model systems are stable for all ranges of mutation rate. The Eigen model is asymptotically stable in terms of mutant classes, and the Crow-Kimura model is completely stable.

  14. Optimized scheduling technique of null subcarriers for peak power control in 3GPP LTE downlink.

    PubMed

    Cho, Soobum; Park, Sang Kyu

    2014-01-01

    Orthogonal frequency division multiple access (OFDMA) is a key multiple access technique for the long term evolution (LTE) downlink. However, high peak-to-average power ratio (PAPR) can cause the degradation of power efficiency. The well-known PAPR reduction technique, dummy sequence insertion (DSI), can be a realistic solution because of its structural simplicity. However, the large usage of subcarriers for the dummy sequences may decrease the transmitted data rate in the DSI scheme. In this paper, a novel DSI scheme is applied to the LTE system. Firstly, we obtain the null subcarriers in single-input single-output (SISO) and multiple-input multiple-output (MIMO) systems, respectively; then, optimized dummy sequences are inserted into the obtained null subcarrier. Simulation results show that Walsh-Hadamard transform (WHT) sequence is the best for the dummy sequence and the ratio of 16 to 20 for the WHT and randomly generated sequences has the maximum PAPR reduction performance. The number of near optimal iteration is derived to prevent exhausted iterations. It is also shown that there is no bit error rate (BER) degradation with the proposed technique in LTE downlink system.
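
    For reference, the quantity being reduced, the PAPR of an OFDM symbol, is the peak instantaneous power of the time-domain signal after the IFFT divided by its average power. The sketch below computes it for an arbitrary QPSK symbol vector; it does not implement the proposed WHT-based dummy-sequence insertion itself, and the subcarrier count and oversampling factor are arbitrary choices.

    ```python
    # PAPR of one OFDM symbol: peak over mean instantaneous power after the IFFT,
    # in dB, with simple zero-padding used for time-domain oversampling.
    import numpy as np

    def papr_db(freq_symbols, oversample=4):
        n = len(freq_symbols)
        padded = np.concatenate([freq_symbols, np.zeros((oversample - 1) * n)])
        x = np.fft.ifft(padded)
        p = np.abs(x) ** 2
        return 10 * np.log10(p.max() / p.mean())

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        qpsk = (rng.choice([1, -1], 64) + 1j * rng.choice([1, -1], 64)) / np.sqrt(2)
        print(f"PAPR = {papr_db(qpsk):.2f} dB")
    ```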

  15. Optimized Scheduling Technique of Null Subcarriers for Peak Power Control in 3GPP LTE Downlink

    PubMed Central

    Park, Sang Kyu

    2014-01-01

    Orthogonal frequency division multiple access (OFDMA) is a key multiple access technique for the long term evolution (LTE) downlink. However, high peak-to-average power ratio (PAPR) can cause the degradation of power efficiency. The well-known PAPR reduction technique, dummy sequence insertion (DSI), can be a realistic solution because of its structural simplicity. However, the large usage of subcarriers for the dummy sequences may decrease the transmitted data rate in the DSI scheme. In this paper, a novel DSI scheme is applied to the LTE system. Firstly, we obtain the null subcarriers in single-input single-output (SISO) and multiple-input multiple-output (MIMO) systems, respectively; then, optimized dummy sequences are inserted into the obtained null subcarrier. Simulation results show that Walsh-Hadamard transform (WHT) sequence is the best for the dummy sequence and the ratio of 16 to 20 for the WHT and randomly generated sequences has the maximum PAPR reduction performance. The number of near optimal iteration is derived to prevent exhausted iterations. It is also shown that there is no bit error rate (BER) degradation with the proposed technique in LTE downlink system. PMID:24883376

  16. Cognitive control adjustments in healthy older and younger adults: Conflict adaptation, the error-related negativity (ERN), and evidence of generalized decline with age.

    PubMed

    Larson, Michael J; Clayson, Peter E; Keith, Cierra M; Hunt, Isaac J; Hedges, Dawson W; Nielsen, Brent L; Call, Vaughn R A

    2016-03-01

    Older adults display alterations in neural reflections of conflict-related processing. We examined response times (RTs), error rates, and event-related potential (ERP; N2 and P3 components) indices of conflict adaptation (i.e., congruency sequence effects), a cognitive control process wherein previous-trial congruency influences current-trial performance, along with post-error slowing, correct-related negativity (CRN), error-related negativity (ERN) and error positivity (Pe) amplitudes in 65 healthy older adults and 94 healthy younger adults. Older adults showed generalized slowing, had decreased post-error slowing, and committed more errors than younger adults. Both older and younger adults showed conflict adaptation effects; the magnitude of conflict adaptation did not differ by age. N2 amplitudes were similar between groups; younger, but not older, adults showed conflict adaptation effects for P3 component amplitudes. CRN and Pe, but not ERN, amplitudes differed between groups. Data support generalized declines in cognitive control processes in older adults without specific deficits in conflict adaptation. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.

    PubMed

    Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao

    2014-08-08

    Next-generation sequencers (NGSs) have become one of the main tools for current biology. To obtain useful insights from the NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. We developed SUGAR (subtile-based GUI-assisted refiner), a software tool that can handle ultra-high-throughput data with a user-friendly graphical user interface (GUI) and interactive analysis capability. SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during the sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to public whole human genome sequencing data, and we showed that the overall mapping quality was improved. The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analyses that require high-quality sequence data and mapping results. Therefore, the software will be especially useful for controlling the quality of variant calls from low-abundance cell populations, e.g., cancer cells, in samples affected by technical errors in the sequencing procedure.

  18. [Transposition errors during learning to reproduce a sequence by the right- and the left-hand movements: simulation of positional and movement coding].

    PubMed

    Liakhovetskiĭ, V A; Bobrova, E V; Skopin, G N

    2012-01-01

    Transposition errors during the reproduction of a hand movement sequence provide important information about the internal representation of that sequence in motor working memory. Analysis of such errors showed that learning to reproduce sequences of left-hand movements improves the system of positional coding (coding of positions), while learning of right-hand movements improves the system of vector coding (coding of movements). Learning of right-hand movements after left-hand performance involved the system of positional coding "imposed" by the left hand. Learning of left-hand movements after right-hand performance activated the system of vector coding. Transposition errors during learning to reproduce movement sequences can be explained by a neural network using either vector coding or both vector and positional coding.

  19. Covariance Matrix Estimation for Massive MIMO

    NASA Astrophysics Data System (ADS)

    Upadhya, Karthik; Vorobyov, Sergiy A.

    2018-04-01

    We propose a novel pilot structure for covariance matrix estimation in massive multiple-input multiple-output (MIMO) systems in which each user transmits two pilot sequences, with the second pilot sequence multiplied by a random phase-shift. The covariance matrix of a particular user is obtained by computing the sample cross-correlation of the channel estimates obtained from the two pilot sequences. This approach relaxes the requirement that all the users transmit their uplink pilots over the same set of symbols. We derive expressions for the achievable rate and the mean-squared error of the covariance matrix estimate when the proposed method is used with staggered pilots. The performance of the proposed method is compared with existing methods through simulations.
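
    The core of the estimator, that the sample cross-correlation of two independent channel estimates is unbiased by noise while the auto-correlation of a single estimate is not, can be seen in a toy simulation. The sketch below ignores pilot contamination and most system details; the antenna count, SNR, number of coherence blocks, and the handling of the known random phase are simplified assumptions.

    ```python
    # Toy check: averaging outer products of two independently-noisy channel
    # estimates converges to the true covariance R, while using one estimate
    # twice is biased upward by the noise power.
    import numpy as np

    rng = np.random.default_rng(0)
    M, blocks, noise_var = 16, 20000, 0.5

    # A simple low-rank-dominated "true" user covariance
    A = rng.standard_normal((M, 4)) + 1j * rng.standard_normal((M, 4))
    R = A @ A.conj().T / 4
    L = np.linalg.cholesky(R + 1e-9 * np.eye(M))

    R_cross = np.zeros((M, M), complex)
    R_auto = np.zeros((M, M), complex)
    for _ in range(blocks):
        h = L @ (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
        phi = np.exp(1j * 2 * np.pi * rng.random())          # known random phase on pilot 2
        n1 = np.sqrt(noise_var / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
        n2 = np.sqrt(noise_var / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
        h1 = h + n1                       # estimate from pilot 1
        h2 = (phi * h + n2) / phi         # estimate from pilot 2, de-rotated
        R_cross += np.outer(h1, h2.conj())
        R_auto += np.outer(h1, h1.conj())
    R_cross /= blocks
    R_auto /= blocks

    err = lambda X: np.linalg.norm(X - R) / np.linalg.norm(R)
    print(f"cross-correlation estimate error: {err(R_cross):.3f}")
    print(f"single-estimate (biased) error:   {err(R_auto):.3f}")
    ```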

  20. Highly multiplexed targeted DNA sequencing from single nuclei.

    PubMed

    Leung, Marco L; Wang, Yong; Kim, Charissa; Gao, Ruli; Jiang, Jerry; Sei, Emi; Navin, Nicholas E

    2016-02-01

    Single-cell DNA sequencing methods are challenged by poor physical coverage, high technical error rates and low throughput. To address these issues, we developed a single-cell DNA sequencing protocol that combines flow-sorting of single nuclei, time-limited multiple-displacement amplification (MDA), low-input library preparation, DNA barcoding, targeted capture and next-generation sequencing (NGS). This approach represents a major improvement over our previous single nucleus sequencing (SNS) Nature Protocols paper in terms of generating higher-coverage data (>90%), thereby enabling the detection of genome-wide variants in single mammalian cells at base-pair resolution. Furthermore, by pooling 48-96 single-cell libraries together for targeted capture, this approach can be used to sequence many single-cell libraries in parallel in a single reaction. This protocol greatly reduces the cost of single-cell DNA sequencing, and it can be completed in 5-6 d by advanced users. This single-cell DNA sequencing protocol has broad applications for studying rare cells and complex populations in diverse fields of biological research and medicine.

  1. MeCorS: Metagenome-enabled error correction of single cell sequencing reads

    DOE PAGES

    Bremges, Andreas; Singer, Esther; Woyke, Tanja; ...

    2016-03-15

    Here we present a new tool, MeCorS, to correct chimeric reads and sequencing errors in Illumina data generated from single amplified genomes (SAGs). It uses sequence information derived from accompanying metagenome sequencing to accurately correct errors in SAG reads, even from ultra-low coverage regions. In evaluations on real data, we show that MeCorS outperforms BayesHammer, the most widely used state-of-the-art approach. MeCorS performs particularly well in correcting chimeric reads, which greatly improves both accuracy and contiguity of de novo SAG assemblies.

  2. European external quality control study on the competence of laboratories to recognize rare sequence variants resulting in unusual genotyping results.

    PubMed

    Márki-Zay, János; Klein, Christoph L; Gancberg, David; Schimmel, Heinz G; Dux, László

    2009-04-01

    Depending on the method used, rare sequence variants adjacent to the single nucleotide polymorphism (SNP) of interest may cause unusual or erroneous genotyping results. Because such rare variants are known for many genes commonly tested in diagnostic laboratories, we organized a proficiency study to assess their influence on the accuracy of reported laboratory results. Four external quality control materials were processed and sent to 283 laboratories through 3 EQA organizers for analysis of the prothrombin 20210G>A mutation. Two of these quality control materials contained sequence variants introduced by site-directed mutagenesis. One hundred eighty-nine laboratories participated in the study. When samples gave a usual result with the method applied, the error rate was 5.1%. Detailed analysis showed that more than 70% of the failures were reported from only 9 laboratories. Allele-specific amplification-based PCR had a much higher error rate than other methods (18.3% vs 2.9%). The variants 20209C>T and [20175T>G; 20179_20180delAC] resulted in unusual genotyping results in 67 and 85 laboratories, respectively. Eighty-three (54.6%) of these unusual results were not recognized, 32 (21.1%) were attributed to technical issues, and only 37 (24.3%) were recognized as another sequence variant. Our findings revealed that some of the participating laboratories were not able to recognize and correctly interpret unusual genotyping results caused by rare SNPs. Our study indicates that the majority of the failures could be avoided by improved training and careful selection and validation of the methods applied.

  3. A Swiss cheese error detection method for real-time EPID-based quality assurance and error prevention.

    PubMed

    Passarge, Michelle; Fix, Michael K; Manser, Peter; Stampanoni, Marco F M; Siebers, Jeffrey V

    2017-04-01

    To develop a robust and efficient process that detects relevant dose errors (dose errors of ≥5%) in external beam radiation therapy and directly indicates the origin of the error. The process is illustrated in the context of electronic portal imaging device (EPID)-based angle-resolved volumetric-modulated arc therapy (VMAT) quality assurance (QA), particularly as would be implemented in a real-time monitoring program. A Swiss cheese error detection (SCED) method was created as a paradigm for a cine EPID-based during-treatment QA. For VMAT, the method compares a treatment plan-based reference set of EPID images with images acquired over each 2° gantry angle interval. The process utilizes a sequence of independent consecutively executed error detection tests: an aperture check that verifies in-field radiation delivery and ensures no out-of-field radiation; output normalization checks at two different stages; global image alignment check to examine if rotation, scaling, and translation are within tolerances; pixel intensity check containing the standard gamma evaluation (3%, 3 mm) and pixel intensity deviation checks including and excluding high dose gradient regions. Tolerances for each check were determined. To test the SCED method, 12 different types of errors were selected to modify the original plan. A series of angle-resolved predicted EPID images were artificially generated for each test case, resulting in a sequence of precalculated frames for each modified treatment plan. The SCED method was applied multiple times for each test case to assess the ability to detect introduced plan variations. To compare the performance of the SCED process with that of a standard gamma analysis, both error detection methods were applied to the generated test cases with realistic noise variations. Averaged over ten test runs, 95.1% of all plan variations that resulted in relevant patient dose errors were detected within 2° and 100% within 14° (<4% of patient dose delivery). Including cases that led to slightly modified but clinically equivalent plans, 89.1% were detected by the SCED method within 2°. Based on the type of check that detected the error, determination of error sources was achieved. With noise ranging from no random noise to four times the established noise value, the averaged relevant dose error detection rate of the SCED method was between 94.0% and 95.8% and that of gamma between 82.8% and 89.8%. An EPID-frame-based error detection process for VMAT deliveries was successfully designed and tested via simulations. The SCED method was inspected for robustness with realistic noise variations, demonstrating that it has the potential to detect a large majority of relevant dose errors. Compared to a typical (3%, 3 mm) gamma analysis, the SCED method produced a higher detection rate for all introduced dose errors, identified errors in an earlier stage, displayed a higher robustness to noise variations, and indicated the error source. © 2017 American Association of Physicists in Medicine.
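
    The layered structure lends itself to a simple sketch (illustrative only; the tolerances and per-check implementations below are invented stand-ins, not the published ones): each acquired EPID frame passes through the independent checks in order, and the first failing layer names the likely error source.

        import numpy as np

        def aperture_check(ref, img, tol=0.02):
            in_field = ref > 0.1 * ref.max()                 # radiation expected here only
            return img[~in_field].sum() / img.sum() < tol

        def output_check(ref, img, tol=0.03):
            return abs(img.sum() / ref.sum() - 1.0) < tol    # integral output normalization

        def alignment_check(ref, img, tol_px=2.0):
            def centroid(a):                                 # crude translation surrogate
                ys, xs = np.indices(a.shape)
                return np.array([(ys * a).sum(), (xs * a).sum()]) / a.sum()
            return np.all(np.abs(centroid(img) - centroid(ref)) < tol_px)

        def intensity_check(ref, img, tol=0.05):
            return np.abs(img - ref).max() / ref.max() < tol # stand-in for gamma / deviation checks

        CHECKS = [("aperture", aperture_check), ("output", output_check),
                  ("alignment", alignment_check), ("pixel intensity", intensity_check)]

        def evaluate_frame(ref, img):
            """Return None if all layers pass, else the name of the first failing check."""
            for name, check in CHECKS:
                if not check(ref, img):
                    return name
            return None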

  4. Chaos-on-a-chip secures data transmission in optical fiber links.

    PubMed

    Argyris, Apostolos; Grivas, Evangellos; Hamacher, Michael; Bogris, Adonis; Syvridis, Dimitris

    2010-03-01

    Security in information exchange plays a central role in the deployment of modern communication systems. Besides algorithms, chaos is exploited as a real-time high-speed data encryption technique which enhances the security at the hardware level of optical networks. In this work, compact, fully controllable and stably operating monolithic photonic integrated circuits (PICs) that generate broadband chaotic optical signals are incorporated in chaos-encoded optical transmission systems. Data sequences with rates up to 2.5 Gb/s with small amplitudes are completely encrypted within these chaotic carriers. Only authorized counterparts, supplied with identical chaos generating PICs that are able to synchronize and reproduce the same carriers, can benefit from data exchange with bit-rates up to 2.5 Gb/s with error rates below 10^-12. Eavesdroppers with access to the communication link have a 0.5 probability of correctly detecting each bit by direct signal detection, while eavesdroppers supplied with even slightly unmatched hardware receivers are restricted to data extraction error rates well above 10^-3.

  5. Magnetic Resonance–Based Automatic Air Segmentation for Generation of Synthetic Computed Tomography Scans in the Head Region

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zheng, Weili; Kim, Joshua P.; Kadbi, Mo

    2015-11-01

    Purpose: To incorporate a novel imaging sequence for robust air and tissue segmentation using ultrashort echo time (UTE) phase images and to implement an innovative synthetic CT (synCT) solution as a first step toward MR-only radiation therapy treatment planning for brain cancer. Methods and Materials: Ten brain cancer patients were scanned with a UTE/Dixon sequence and other clinical sequences on a 1.0 T open magnet with simulation capabilities. Bone-enhanced images were generated from a weighted combination of water/fat maps derived from Dixon images and inverted UTE images. Automated air segmentation was performed using unwrapped UTE phase maps. Segmentation accuracy was assessed by calculating segmentation errors (true-positive rate, false-positive rate, and Dice similarity indices) using CT simulation (CT-SIM) as ground truth. The synCTs were generated using a voxel-based, weighted summation method incorporating T2, fluid attenuated inversion recovery (FLAIR), UTE1, and bone-enhanced images. Mean absolute error (MAE) characterized Hounsfield unit (HU) differences between synCT and CT-SIM. A dosimetry study was conducted, and differences were quantified using γ-analysis and dose-volume histogram analysis. Results: On average, true-positive rate and false-positive rate for the CT and MR-derived air masks were 80.8% ± 5.5% and 25.7% ± 6.9%, respectively. Dice similarity indices values were 0.78 ± 0.04 (range, 0.70-0.83). Full field of view MAE between synCT and CT-SIM was 147.5 ± 8.3 HU (range, 138.3-166.2 HU), with the largest errors occurring at bone–air interfaces (MAE 422.5 ± 33.4 HU for bone and 294.53 ± 90.56 HU for air). Gamma analysis revealed pass rates of 99.4% ± 0.04%, with acceptable treatment plan quality for the cohort. Conclusions: A hybrid MRI phase/magnitude UTE image processing technique was introduced that significantly improved bone and air contrast in MRI. Segmented air masks and bone-enhanced images were integrated into our synCT pipeline for brain, and results agreed well with clinical CTs, thereby supporting MR-only radiation therapy treatment planning in the brain.

  6. Magnetic Resonance-Based Automatic Air Segmentation for Generation of Synthetic Computed Tomography Scans in the Head Region.

    PubMed

    Zheng, Weili; Kim, Joshua P; Kadbi, Mo; Movsas, Benjamin; Chetty, Indrin J; Glide-Hurst, Carri K

    2015-11-01

    To incorporate a novel imaging sequence for robust air and tissue segmentation using ultrashort echo time (UTE) phase images and to implement an innovative synthetic CT (synCT) solution as a first step toward MR-only radiation therapy treatment planning for brain cancer. Ten brain cancer patients were scanned with a UTE/Dixon sequence and other clinical sequences on a 1.0 T open magnet with simulation capabilities. Bone-enhanced images were generated from a weighted combination of water/fat maps derived from Dixon images and inverted UTE images. Automated air segmentation was performed using unwrapped UTE phase maps. Segmentation accuracy was assessed by calculating segmentation errors (true-positive rate, false-positive rate, and Dice similarity indices) using CT simulation (CT-SIM) as ground truth. The synCTs were generated using a voxel-based, weighted summation method incorporating T2, fluid attenuated inversion recovery (FLAIR), UTE1, and bone-enhanced images. Mean absolute error (MAE) characterized Hounsfield unit (HU) differences between synCT and CT-SIM. A dosimetry study was conducted, and differences were quantified using γ-analysis and dose-volume histogram analysis. On average, true-positive rate and false-positive rate for the CT and MR-derived air masks were 80.8% ± 5.5% and 25.7% ± 6.9%, respectively. Dice similarity indices values were 0.78 ± 0.04 (range, 0.70-0.83). Full field of view MAE between synCT and CT-SIM was 147.5 ± 8.3 HU (range, 138.3-166.2 HU), with the largest errors occurring at bone-air interfaces (MAE 422.5 ± 33.4 HU for bone and 294.53 ± 90.56 HU for air). Gamma analysis revealed pass rates of 99.4% ± 0.04%, with acceptable treatment plan quality for the cohort. A hybrid MRI phase/magnitude UTE image processing technique was introduced that significantly improved bone and air contrast in MRI. Segmented air masks and bone-enhanced images were integrated into our synCT pipeline for brain, and results agreed well with clinical CTs, thereby supporting MR-only radiation therapy treatment planning in the brain. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. Does the cost function matter in Bayes decision rule?

    PubMed

    Schlüter, Ralf; Nussbaum-Thom, Markus; Ney, Hermann

    2012-02-01

    In many tasks in pattern recognition, such as automatic speech recognition (ASR), optical character recognition (OCR), part-of-speech (POS) tagging, and other string recognition tasks, we are faced with a well-known inconsistency: The Bayes decision rule is usually used to minimize string (symbol sequence) error, whereas, in practice, we want to minimize symbol (word, character, tag, etc.) error. When comparing different recognition systems, we do indeed use symbol error rate as an evaluation measure. The topic of this work is to analyze the relation between string (i.e., 0-1) and symbol error (i.e., metric, integer valued) cost functions in the Bayes decision rule, for which fundamental analytic results are derived. Simple conditions are derived for which the Bayes decision rule with integer-valued metric cost function and with 0-1 cost gives the same decisions or leads to classes with limited cost. The corresponding conditions can be tested with complexity linear in the number of classes. The results obtained do not make any assumption w.r.t. the structure of the underlying distributions or the classification problem. Nevertheless, the general analytic results are analyzed via simulations of string recognition problems with Levenshtein (edit) distance cost function. The results support earlier findings that considerable improvements are to be expected when initial error rates are high.
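
    For orientation, in standard minimum-Bayes-risk notation (a reminder of the general setup, not an excerpt from the paper), the two decision rules being compared are, for an observation X and candidate symbol sequences W and V,

        \hat{W}_{0\text{-}1} = \arg\max_{W} \, p(W \mid X),
        \qquad
        \hat{W}_{\text{cost}} = \arg\min_{W} \, \sum_{V} p(V \mid X)\,\mathcal{L}(W, V),

    where \mathcal{L} is the integer-valued metric cost, e.g. the Levenshtein distance. With the 0-1 cost \mathcal{L}(W, V) = 1 - \delta(W, V) the second rule reduces to the first; the paper derives conditions under which both rules give the same decision.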

  8. High Degree of Interlaboratory Reproducibility of Human Immunodeficiency Virus Type 1 Protease and Reverse Transcriptase Sequencing of Plasma Samples from Heavily Treated Patients

    PubMed Central

    Shafer, Robert W.; Hertogs, Kurt; Zolopa, Andrew R.; Warford, Ann; Bloor, Stuart; Betts, Bradley J.; Merigan, Thomas C.; Harrigan, Richard; Larder, Brendon A.

    2001-01-01

    We assessed the reproducibility of human immunodeficiency virus type 1 (HIV-1) reverse transcriptase (RT) and protease sequencing using cryopreserved plasma aliquots obtained from 46 heavily treated HIV-1-infected individuals in two laboratories using dideoxynucleotide sequencing. The rates of complete sequence concordance between the two laboratories were 99.1% for the protease sequence and 99.0% for the RT sequence. Approximately 90% of the discordances were partial, defined as one laboratory detecting a mixture and the second laboratory detecting only one of the mixture's components. Only 0.1% of the nucleotides were completely discordant between the two laboratories, and these were significantly more likely to occur in plasma samples with lower plasma HIV-1 RNA levels. Nucleotide mixtures were detected at approximately 1% of the nucleotide positions, and in every case in which one laboratory detected a mixture, the second laboratory either detected the same mixture or detected one of the mixture's components. The high rate of concordance in detecting mixtures and the fact that most discordances between the two laboratories were partial suggest that most discordances were caused by variation in sampling of the HIV-1 quasispecies by PCR rather than by technical errors in the sequencing process itself. PMID:11283081

  9. A better sequence-read simulator program for metagenomics.

    PubMed

    Johnson, Stephen; Trost, Brett; Long, Jeffrey R; Pittet, Vanessa; Kusalik, Anthony

    2014-01-01

    There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their use to the authors' original intentions. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. To our knowledge, there are no programs that allow a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirically-derived information from metagenomics sequencing data. We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR can emulate reads from various sequencing platforms, including Illumina, 454, and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and extracts useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology; however, BEAR is not restricted in this way. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which no dedicated sequencing simulators are currently available. BEAR is also the first metagenomic sequencing simulator program that automates the process of generating abundances, which can be an arduous task. BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work.

  10. At least some errors are randomly generated (Freud was wrong)

    NASA Technical Reports Server (NTRS)

    Sellen, A. J.; Senders, J. W.

    1986-01-01

    An experiment was carried out to expose something about human error generating mechanisms. In the context of the experiment, an error was made when a subject pressed the wrong key on a computer keyboard or pressed no key at all in the time allotted. These might be considered, respectively, errors of substitution and errors of omission. Each of seven subjects saw a sequence of three digital numbers, made an easily learned binary judgement about each, and was to press the appropriate one of two keys. Each session consisted of 1,000 presentations of randomly permuted, fixed numbers broken into 10 blocks of 100. One of two keys should have been pressed within one second of the onset of each stimulus. These data were subjected to statistical analyses in order to probe the nature of the error generating mechanisms. Goodness of fit tests for a Poisson distribution for the number of errors per 50 trial interval and for an exponential distribution of the length of the intervals between errors were carried out. There is evidence for an endogenous mechanism that may best be described as a random error generator. Furthermore, an item analysis of the number of errors produced per stimulus suggests the existence of a second mechanism operating on task driven factors producing exogenous errors. Some errors, at least, are the result of constant probability generating mechanisms with error rate idiosyncratically determined for each subject.
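
    A chi-square goodness-of-fit test of the kind described (error counts per block against a Poisson law with the observed mean rate) can be sketched as follows; the block counts and bin pooling here are made-up illustrations, not the study's data:

        import numpy as np
        from scipy.stats import poisson, chisquare

        def poisson_gof(counts_per_block, max_k=3):
            """Chi-square GOF of per-block error counts vs Poisson(lambda = sample mean)."""
            lam = np.mean(counts_per_block)
            obs = np.bincount(np.minimum(counts_per_block, max_k), minlength=max_k + 1)
            probs = np.append(poisson.pmf(np.arange(max_k), lam), poisson.sf(max_k - 1, lam))
            expected = probs * len(counts_per_block)
            return chisquare(obs, expected, ddof=1)   # one extra dof lost for estimating lambda

        # made-up example: errors per 50-trial block for one subject
        counts = np.array([1, 0, 2, 1, 3, 0, 1, 2, 0, 1, 4, 1, 0, 2, 1, 1, 0, 3, 2, 1])
        print(poisson_gof(counts))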

  11. Discrimination of plant-parasitic nematodes from complex soil communities using ecometagenetics.

    PubMed

    Porazinska, Dorota L; Morgan, Matthew J; Gaspar, John M; Court, Leon N; Hardy, Christopher M; Hodda, Mike

    2014-07-01

    Many plant pathogens are microscopic, cryptic, and difficult to diagnose. The new approach of ecometagenetics, involving ultrasequencing, bioinformatics, and biostatistics, has the potential to improve diagnoses of plant pathogens such as nematodes from the complex mixtures found in many agricultural and biosecurity situations. We tested this approach on a gradient of complexity ranging from a few individuals from a few species of known nematode pathogens in a relatively defined substrate to a complex and poorly known suite of nematode pathogens in a complex forest soil, including its associated biota of unknown protists, fungi, and other microscopic eukaryotes. We added three known but contrasting species (Pratylenchus neglectus, the closely related P. thornei, and Heterodera avenae) to half the set of substrates, leaving the other half without them. We then tested whether all nematode pathogens (known and unknown, indigenous and experimentally added) were consistently detected as present or absent. We always detected the Pratylenchus spp. correctly and with the number of sequence reads proportional to the numbers added. However, a single cyst of H. avenae was only identified approximately half the time it was present. Other plant-parasitic nematodes and nematodes from other trophic groups were detected well, but other eukaryotes were detected less consistently. DNA sampling errors or informatic errors or both were involved in misidentification of H. avenae; however, the proportions of each varied in the different bioinformatic pipelines and with different parameters used. To a large extent, false-positive and false-negative errors were complementary: pipelines and parameters with the highest false-positive rates had the lowest false-negative rates and vice versa. Sources of error identified included assumptions in the bioinformatic pipelines, slight differences in primer regions, the number of sequence reads regarded as the minimum threshold for inclusion in analysis, and inaccessible DNA in resistant life stages. Identification of the sources of error allows us to suggest ways to improve identification using ecometagenetics.

  12. In Vitro vs In Silico Detected SNPs for the Development of a Genotyping Array: What Can We Learn from a Non-Model Species?

    PubMed Central

    Lepoittevin, Camille; Frigerio, Jean-Marc; Garnier-Géré, Pauline; Salin, Franck; Cervera, María-Teresa; Vornam, Barbara; Harvengt, Luc; Plomion, Christophe

    2010-01-01

    Background There is considerable interest in the high-throughput discovery and genotyping of single nucleotide polymorphisms (SNPs) to accelerate genetic mapping and enable association studies. This study provides an assessment of EST-derived and resequencing-derived SNP quality in maritime pine (Pinus pinaster Ait.), a conifer characterized by a huge genome size (∼23.8 Gb/C). Methodology/Principal Findings A 384-SNPs GoldenGate genotyping array was built from i/ 184 SNPs originally detected in a set of 40 re-sequenced candidate genes (in vitro SNPs), chosen on the basis of functionality scores, presence of neighboring polymorphisms, minor allele frequencies and linkage disequilibrium and ii/ 200 SNPs screened from ESTs (in silico SNPs) selected based on the number of ESTs used for SNP detection, the SNP minor allele frequency and the quality of SNP flanking sequences. The global success rate of the assay was 66.9%, and a conversion rate (considering only polymorphic SNPs) of 51% was achieved. In vitro SNPs showed significantly higher genotyping-success and conversion rates than in silico SNPs (+11.5% and +18.5%, respectively). The reproducibility was 100%, and the genotyping error rate very low (0.54%, dropping down to 0.06% when removing four SNPs showing elevated error rates). Conclusions/Significance This study demonstrates that ESTs provide a resource for SNP identification in non-model species, which do not require any additional bench work and little bio-informatics analysis. However, the time and cost benefits of in silico SNPs are counterbalanced by a lower conversion rate than in vitro SNPs. This drawback is acceptable for population-based experiments, but could be dramatic in experiments involving samples from narrow genetic backgrounds. In addition, we showed that both the visual inspection of genotyping clusters and the estimation of a per SNP error rate should help identify markers that are not suitable to the GoldenGate technology in species characterized by a large and complex genome. PMID:20543950

  13. Equalization of nonlinear transmission impairments by maximum-likelihood-sequence estimation in digital coherent receivers.

    PubMed

    Khairuzzaman, Md; Zhang, Chao; Igarashi, Koji; Katoh, Kazuhiro; Kikuchi, Kazuro

    2010-03-01

    We describe a successful introduction of maximum-likelihood-sequence estimation (MLSE) into digital coherent receivers together with finite-impulse response (FIR) filters in order to equalize both linear and nonlinear fiber impairments. The MLSE equalizer based on the Viterbi algorithm is implemented in the offline digital signal processing (DSP) core. We transmit 20-Gbit/s quadrature phase-shift keying (QPSK) signals through a 200-km-long standard single-mode fiber. The bit-error rate performance shows that the MLSE equalizer outperforms the conventional adaptive FIR filter, especially when nonlinear impairments are predominant.
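
    The trellis search at the heart of an MLSE equalizer can be sketched for the simplest possible case, a known two-tap ISI channel with BPSK symbols (the paper's equalizer operates on QPSK after FIR pre-equalization, so treat this purely as an illustration of the Viterbi algorithm):

        import numpy as np

        def viterbi_mlse(y, h, symbols=(-1.0, 1.0)):
            """MLSE over a memory-1 channel: y[n] ~ h[0]*x[n] + h[1]*x[n-1] + noise."""
            h0, h1 = h
            n_sym = len(symbols)
            metric = np.zeros(n_sym)                 # path metric per state (= previous symbol)
            back = []
            for yn in y:
                new_metric = np.full(n_sym, np.inf)
                choice = np.zeros(n_sym, dtype=int)
                for cur in range(n_sym):             # symbol hypothesised at this time step
                    for prev in range(n_sym):
                        m = metric[prev] + (yn - h0 * symbols[cur] - h1 * symbols[prev]) ** 2
                        if m < new_metric[cur]:
                            new_metric[cur], choice[cur] = m, prev
                metric = new_metric
                back.append(choice)
            state = int(np.argmin(metric))           # trace back the survivor path
            decided = []
            for choice in reversed(back):
                decided.append(symbols[state])
                state = int(choice[state])
            return np.array(decided[::-1])

        # toy usage: BPSK through h = (1.0, 0.5) with additive noise
        rng = np.random.default_rng(2)
        x = rng.choice([-1.0, 1.0], size=200)
        y = x + 0.5 * np.concatenate(([0.0], x[:-1])) + 0.1 * rng.standard_normal(200)
        print("symbol errors:", int(np.sum(viterbi_mlse(y, (1.0, 0.5)) != x)))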

  14. Repeat-aware modeling and correction of short read errors.

    PubMed

    Yang, Xiao; Aluru, Srinivas; Dorman, Karin S

    2011-02-15

    High-throughput short read sequencing is revolutionizing genomics and systems biology research by enabling cost-effective deep coverage sequencing of genomes and transcriptomes. Error detection and correction are crucial to many short read sequencing applications including de novo genome sequencing, genome resequencing, and digital gene expression analysis. Short read error detection is typically carried out by counting the observed frequencies of kmers in reads and validating those with frequencies exceeding a threshold. In case of genomes with high repeat content, an erroneous kmer may be frequently observed if it has few nucleotide differences with valid kmers with multiple occurrences in the genome. Error detection and correction were mostly applied to genomes with low repeat content and this remains a challenging problem for genomes with high repeat content. We develop a statistical model and a computational method for error detection and correction in the presence of genomic repeats. We propose a method to infer genomic frequencies of kmers from their observed frequencies by analyzing the misread relationships among observed kmers. We also propose a method to estimate the threshold useful for validating kmers whose estimated genomic frequency exceeds the threshold. We demonstrate that superior error detection is achieved using these methods. Furthermore, we break away from the common assumption of uniformly distributed errors within a read, and provide a framework to model position-dependent error occurrence frequencies common to many short read platforms. Lastly, we achieve better error correction in genomes with high repeat content. The software is implemented in C++ and is freely available under GNU GPL3 license and Boost Software V1.0 license at "http://aluru-sun.ece.iastate.edu/doku.php?id = redeem". We introduce a statistical framework to model sequencing errors in next-generation reads, which led to promising results in detecting and correcting errors for genomes with high repeat content.
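
    The k-mer counting and thresholding step that the paper builds on is easy to sketch; the repeat-aware estimation of genomic k-mer frequencies and of the threshold itself, which is the paper's actual contribution, is not reproduced here:

        from collections import Counter

        def kmer_counts(reads, k=21):
            counts = Counter()
            for r in reads:
                for i in range(len(r) - k + 1):
                    counts[r[i:i + k]] += 1
            return counts

        def suspect_kmers(counts, threshold=3):
            """k-mers observed fewer than `threshold` times are treated as error candidates."""
            return {kmer for kmer, c in counts.items() if c < threshold}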

  15. Cavemen Were Better at Depicting Quadruped Walking than Modern Artists: Erroneous Walking Illustrations in the Fine Arts from Prehistory to Today

    PubMed Central

    Horvath, Gabor; Farkas, Etelka; Boncz, Ildiko; Blaho, Miklos; Kriska, Gyorgy

    2012-01-01

    Experts in animal locomotion have known the characteristics of quadruped walking well since the pioneering work of Eadweard Muybridge in the 1880s. Most quadrupeds advance their legs in the same lateral sequence when walking, and only the timing of their supporting feet differs more or less. How did this scientific knowledge influence the correctness of quadruped walking depictions in the fine arts? Did the proportion of erroneous quadruped walking illustrations relative to their total number (i.e., the error rate) decrease after Muybridge? How correctly did cavemen (upper palaeolithic Homo sapiens) illustrate the walking of their quadruped prey in prehistoric times? The aim of this work is to answer these questions. We analyzed 1000 prehistoric and modern artistic quadruped walking depictions and determined whether they are correct or not with respect to the limb attitudes presented, assuming that the other aspects of the depictions used to determine the animal's gait are illustrated correctly. The error rate of modern pre-Muybridgean quadruped walking illustrations was 83.5%, much higher than the 73.3% error rate expected by mere chance. It decreased to 57.9% after 1887, that is, in the post-Muybridgean period. Most surprisingly, the prehistoric quadruped walking depictions had the lowest error rate, 46.2%. All these differences were statistically significant. Thus, cavemen were more keenly aware of the slower motion of their prey animals and illustrated quadruped walking more precisely than later artists. PMID:23227149

  16. Cavemen were better at depicting quadruped walking than modern artists: erroneous walking illustrations in the fine arts from prehistory to today.

    PubMed

    Horvath, Gabor; Farkas, Etelka; Boncz, Ildiko; Blaho, Miklos; Kriska, Gyorgy

    2012-01-01

    Experts in animal locomotion have known the characteristics of quadruped walking well since the pioneering work of Eadweard Muybridge in the 1880s. Most quadrupeds advance their legs in the same lateral sequence when walking, and only the timing of their supporting feet differs more or less. How did this scientific knowledge influence the correctness of quadruped walking depictions in the fine arts? Did the proportion of erroneous quadruped walking illustrations relative to their total number (i.e., the error rate) decrease after Muybridge? How correctly did cavemen (upper palaeolithic Homo sapiens) illustrate the walking of their quadruped prey in prehistoric times? The aim of this work is to answer these questions. We analyzed 1000 prehistoric and modern artistic quadruped walking depictions and determined whether they are correct or not with respect to the limb attitudes presented, assuming that the other aspects of the depictions used to determine the animal's gait are illustrated correctly. The error rate of modern pre-Muybridgean quadruped walking illustrations was 83.5%, much higher than the 73.3% error rate expected by mere chance. It decreased to 57.9% after 1887, that is, in the post-Muybridgean period. Most surprisingly, the prehistoric quadruped walking depictions had the lowest error rate, 46.2%. All these differences were statistically significant. Thus, cavemen were more keenly aware of the slower motion of their prey animals and illustrated quadruped walking more precisely than later artists.

  17. Evaluating the effectiveness of SW-only video coding for real-time video transmission over low-rate wireless networks

    NASA Astrophysics Data System (ADS)

    Bartolini, Franco; Pasquini, Cristina; Piva, Alessandro

    2001-04-01

    The recent development of video compression algorithms has allowed the diffusion of systems for the transmission of video sequences over data networks. However, transmission over error-prone mobile communication channels is still an open issue. In this paper, a system developed for the real-time transmission of H.263-coded video sequences over TETRA mobile networks is presented. TETRA is an open digital trunked radio standard defined by the European Telecommunications Standardization Institute for professional mobile radio users, providing full integration of voice and data services. Experimental tests demonstrate that, in spite of the low frame rate allowed by the SW-only implementation of the decoder and by the low channel rate, a video compression technique such as one complying with the H.263 standard is still preferable to a simpler but less effective frame-based compression system.

  18. Volcanic Eruption Forecasts From Accelerating Rates of Drumbeat Long-Period Earthquakes

    NASA Astrophysics Data System (ADS)

    Bell, Andrew F.; Naylor, Mark; Hernandez, Stephen; Main, Ian G.; Gaunt, H. Elizabeth; Mothes, Patricia; Ruiz, Mario

    2018-02-01

    Accelerating rates of quasiperiodic "drumbeat" long-period earthquakes (LPs) are commonly reported before eruptions at andesite and dacite volcanoes, and promise insights into the nature of fundamental preeruptive processes and improved eruption forecasts. Here we apply a new Bayesian Markov chain Monte Carlo gamma point process methodology to investigate an exceptionally well-developed sequence of drumbeat LPs preceding a recent large vulcanian explosion at Tungurahua volcano, Ecuador. For more than 24 hr, LP rates increased according to the inverse power law trend predicted by material failure theory, and with a retrospectively forecast failure time that agrees with the eruption onset within error. LPs resulted from repeated activation of a single characteristic source driven by accelerating loading, rather than a distributed failure process, showing that similar precursory trends can emerge from quite different underlying physics. Nevertheless, such sequences have clear potential for improving forecasts of eruptions at Tungurahua and analogous volcanoes.

  19. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics

    PubMed Central

    Nesvizhskii, Alexey I.

    2010-01-01

    This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide-to-spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from peptide to protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues. PMID:20816881
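
    One of the global error-rate procedures the review covers, the false discovery rate, is commonly estimated with a target-decoy search; a minimal sketch of that estimate follows (the exact correction terms vary between tools, so this is only indicative):

        def fdr_at_threshold(psms, score_cut):
            """psms: list of (score, is_decoy) peptide-spectrum matches."""
            targets = sum(1 for s, d in psms if s >= score_cut and not d)
            decoys = sum(1 for s, d in psms if s >= score_cut and d)
            return decoys / targets if targets else 0.0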

  20. Two-dimensional optoelectronic interconnect-processor and its operational bit error rate

    NASA Astrophysics Data System (ADS)

    Liu, J. Jiang; Gollsneider, Brian; Chang, Wayne H.; Carhart, Gary W.; Vorontsov, Mikhail A.; Simonis, George J.; Shoop, Barry L.

    2004-10-01

    A two-dimensional (2-D) multi-channel 8x8 optical interconnect and processor system was designed and developed using complementary metal-oxide-semiconductor (CMOS) driven 850-nm vertical-cavity surface-emitting laser (VCSEL) arrays and photodetector (PD) arrays with corresponding wavelengths. We performed operation and bit-error-rate (BER) analysis on this free-space integrated 8x8 VCSEL optical interconnect driven by silicon-on-sapphire (SOS) circuits. A pseudo-random bit stream (PRBS) data sequence was used to operate the interconnects. Eye diagrams were measured from individual channels and analyzed using a digital oscilloscope at data rates from 155 Mb/s to 1.5 Gb/s. Using a statistical model with a Gaussian distribution for the random noise in the transmission, we developed a method to compute the BER instantaneously from the digital eye diagrams. Direct measurements on this interconnect were also taken on a standard BER tester for verification. We found that the results of the two methods were of the same order and agreed to within 50%. The integrated interconnects were investigated in an optoelectronic processing architecture for a digital halftoning image processor. Error diffusion networks implemented through the inherently parallel nature of photonics promise to provide high-quality digital halftoned images.
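
    Under the Gaussian-noise assumption mentioned in the abstract, the textbook way to turn eye-diagram statistics into a BER estimate is the Q-factor formula (a generic sketch; the authors' exact procedure is not spelled out in the abstract):

        import math

        def ber_from_eye(mu1, sigma1, mu0, sigma0):
            """Gaussian-noise BER estimate from the means/std-devs of the two eye levels."""
            q = (mu1 - mu0) / (sigma1 + sigma0)
            return 0.5 * math.erfc(q / math.sqrt(2))

        # example: a wide-open eye gives a vanishingly small estimated BER
        print(ber_from_eye(mu1=1.0, sigma1=0.08, mu0=0.0, sigma0=0.06))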

  1. Protective gloves for use in high-risk patients: how much do they affect the dexterity of the surgeon?

    PubMed Central

    Phillips, A. M.; Birch, N. C.; Ribbans, W. J.

    1997-01-01

    Twenty-five orthopaedic surgeons underwent eight motor and sensory tests while using four different glove combinations and without gloves. As well as single and double latex, surgeons wore a simple Kevlar glove with latex inside and outside and then wore a Kevlar and Medak glove with latex inside and outside, as recommended by the manufacturers. The effect of learning with each sequence was neutralised by randomising the glove order. The time taken to complete each test was recorded and, where appropriate, error rates were noted. Simple sensory tests took progressively longer to perform so that using the thickest glove combination led to the completion times being doubled. Error rates increased significantly. Tests of stereognosis also took longer and use of the thickest glove combination caused these tests to take three times as long on average. Error rates again increased significantly. However, prolongation of motor tasks was less marked. We conclude that, armed with this quantitative analysis of sensitivity and dexterity impairment, surgeons can judge the relative difficulties that may be incurred as a result of wearing the gloves against the benefits that they offer in protection. PMID:9135240

  2. Higher-level phylogeny of paraneopteran insects inferred from mitochondrial genome sequences

    PubMed Central

    Li, Hu; Shao, Renfu; Song, Nan; Song, Fan; Jiang, Pei; Li, Zhihong; Cai, Wanzhi

    2015-01-01

    Mitochondrial (mt) genome data have been proven to be informative for animal phylogenetic studies but may also suffer from systematic errors, due to the effects of accelerated substitution rate and compositional heterogeneity. We analyzed the mt genomes of 25 insect species from the four paraneopteran orders, aiming to better understand how accelerated substitution rate and compositional heterogeneity affect the inferences of the higher-level phylogeny of this diverse group of hemimetabolous insects. We found substantial heterogeneity in base composition and contrasting rates in nucleotide substitution among these paraneopteran insects, which complicate the inference of higher-level phylogeny. The phylogenies inferred with concatenated sequences of mt genes using maximum likelihood and Bayesian methods and homogeneous models failed to recover Psocodea and Hemiptera as monophyletic groups but grouped, instead, the taxa that had accelerated substitution rates together, including Sternorrhyncha (a suborder of Hemiptera), Thysanoptera, Phthiraptera and Liposcelididae (a family of Psocoptera). Bayesian inference with nucleotide sequences and heterogeneous models (CAT and CAT + GTR), however, recovered Psocodea, Thysanoptera and Hemiptera each as a monophyletic group. Within Psocodea, Liposcelididae is more closely related to Phthiraptera than to other species of Psocoptera. Furthermore, Thysanoptera was recovered as the sister group to Hemiptera. PMID:25704094

  3. Curated eutherian third party data gene data sets.

    PubMed

    Premzl, Marko

    2016-03-01

    The freely available eutherian genomic sequence data sets have advanced the scientific field of genomics. Of note, future revisions of gene data sets are expected, owing to the incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance for protecting against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applied in updates of 7 major eutherian gene data sets, comprising 812 complete coding sequences deposited in the European Nucleotide Archive as curated third party data gene data sets.

  4. QualComp: a new lossy compressor for quality scores based on rate distortion theory

    PubMed Central

    2013-01-01

    Background Next Generation Sequencing technologies have revolutionized many fields in biology by reducing the time and cost required for sequencing. As a result, large amounts of sequencing data are being generated. A typical sequencing data file may occupy tens or even hundreds of gigabytes of disk space, prohibitively large for many users. This data consists of both the nucleotide sequences and per-base quality scores that indicate the level of confidence in the readout of these sequences. Quality scores account for about half of the required disk space in the commonly used FASTQ format (before compression), and therefore the compression of the quality scores can significantly reduce storage requirements and speed up analysis and transmission of sequencing data. Results In this paper, we present a new scheme for the lossy compression of the quality scores, to address the problem of storage. Our framework allows the user to specify the rate (bits per quality score) prior to compression, independent of the data to be compressed. Our algorithm can work at any rate, unlike other lossy compression algorithms. We envisage our algorithm as being part of a more general compression scheme that works with the entire FASTQ file. Numerical experiments show that we can achieve a better mean squared error (MSE) for small rates (bits per quality score) than other lossy compression schemes. For the organism PhiX, whose assembled genome is known and assumed to be correct, we show that it is possible to achieve a significant reduction in size with little compromise in performance on downstream applications (e.g., alignment). Conclusions QualComp is an open source software package, written in C and freely available for download at https://sourceforge.net/projects/qualcomp. PMID:23758828
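
    To make the rate/distortion trade-off concrete (a toy illustration only; QualComp itself models correlations between positions and allocates bits accordingly, which a plain uniform quantizer does not), one can quantize Phred scores to 2^b levels and measure the MSE at each rate b:

        import numpy as np

        def quantize(quals, bits, qmin=0.0, qmax=41.0):
            levels = 2 ** bits
            step = (qmax - qmin) / levels
            idx = np.clip(((quals - qmin) / step).astype(int), 0, levels - 1)
            return qmin + (idx + 0.5) * step              # mid-point reconstruction values

        quals = np.random.default_rng(1).integers(2, 41, size=10_000).astype(float)
        for bits in (1, 2, 3, 4):
            mse = np.mean((quals - quantize(quals, bits)) ** 2)
            print(f"{bits} bit(s)/score -> MSE {mse:.2f}")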

  5. Viterbi equalization for long-distance, high-speed underwater laser communication

    NASA Astrophysics Data System (ADS)

    Hu, Siqi; Mi, Le; Zhou, Tianhua; Chen, Weibiao

    2017-07-01

    In long-distance, high-speed underwater laser communication, because of the strong absorption and scattering processes, the laser pulse is stretched with the increase in communication distance and the decrease in water clarity. The maximum communication bandwidth is limited by laser-pulse stretching. Improving the communication rate increases the intersymbol interference (ISI). To reduce the effect of ISI, the Viterbi equalization (VE) algorithm is used to estimate the maximum-likelihood receiving sequence. The Monte Carlo method is used to simulate the stretching of the received laser pulse and the maximum communication rate at a wavelength of 532 nm in Jerlov IB and Jerlov II water channels with communication distances of 80, 100, and 130 m, respectively. The high-data rate communication performance for the VE and hard-decision algorithms is compared. The simulation results show that the VE algorithm can be used to reduce the ISI by selecting the minimum error path. The trade-off between the high-data rate communication performance and minor bit-error rate performance loss makes VE a promising option for applications in long-distance, high-speed underwater laser communication systems.

  6. Model-based quality assessment and base-calling for second-generation sequencing data.

    PubMed

    Bravo, Héctor Corrada; Irizarry, Rafael A

    2010-09-01

    Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T, between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base-calling, allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base-calling performance. © 2009, The International Biometric Society.

  7. Continuous- and discrete-time stimulus sequences for high stimulus rate paradigm in evoked potential studies.

    PubMed

    Wang, Tao; Huang, Jiang-hua; Lin, Lin; Zhan, Chang'an A

    2013-01-01

    To obtain reliable transient auditory evoked potentials (AEPs) from EEGs recorded using high stimulus rate (HSR) paradigm, it is critical to design the stimulus sequences of appropriate frequency properties. Traditionally, the individual stimulus events in a stimulus sequence occur only at discrete time points dependent on the sampling frequency of the recording system and the duration of stimulus sequence. This dependency likely causes the implementation of suboptimal stimulus sequences, sacrificing the reliability of resulting AEPs. In this paper, we explicate the use of continuous-time stimulus sequence for HSR paradigm, which is independent of the discrete electroencephalogram (EEG) recording system. We employ simulation studies to examine the applicability of the continuous-time stimulus sequences and the impacts of sampling frequency on AEPs in traditional studies using discrete-time design. Results from these studies show that the continuous-time sequences can offer better frequency properties and improve the reliability of recovered AEPs. Furthermore, we find that the errors in the recovered AEPs depend critically on the sampling frequencies of experimental systems, and their relationship can be fitted using a reciprocal function. As such, our study contributes to the literature by demonstrating the applicability and advantages of continuous-time stimulus sequences for HSR paradigm and by revealing the relationship between the reliability of AEPs and sampling frequencies of the experimental systems when discrete-time stimulus sequences are used in traditional manner for the HSR paradigm.

  8. Rapid and accurate taxonomic classification of insect (class Insecta) cytochrome c oxidase subunit 1 (COI) DNA barcode sequences using a naïve Bayesian classifier

    PubMed Central

    Porter, Teresita M; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J; Golding, G Brian; Hajibabaei, Mehrdad

    2014-01-01

    Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the blast-based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates, incorrect taxonomic assignments with a high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.
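
    The classification idea (following Wang et al. 2007, which the authors adopt) can be sketched as a k-mer "word" naive Bayes with bootstrap support; this is a simplification under assumed defaults (8-mers, subsampling one-eighth of the words per bootstrap round), not the trained classifier evaluated in the paper:

        import math, random
        from collections import defaultdict

        def words(seq, k=8):
            return {seq[i:i + k] for i in range(len(seq) - k + 1)}

        def train(references):
            """references: dict mapping taxon -> list of reference sequences."""
            model = {}
            for taxon, seqs in references.items():
                counts = defaultdict(int)
                for s in seqs:
                    for w in words(s):
                        counts[w] += 1
                n = len(seqs)
                # word probabilities with the (count + 0.5) / (n + 1) prior of Wang et al.
                model[taxon] = defaultdict(lambda n=n: 0.5 / (n + 1),
                                           {w: (c + 0.5) / (n + 1) for w, c in counts.items()})
            return model

        def classify(seq, model, subsample=None):
            ws = list(words(seq))
            if subsample:
                ws = random.choices(ws, k=subsample)
            return max(model, key=lambda t: sum(math.log(model[t][w]) for w in ws))

        def bootstrap_support(seq, model, n_rounds=100):
            """Fraction of subsampled-word trials that agree with the full assignment."""
            full = classify(seq, model)
            k = max(1, len(words(seq)) // 8)
            return sum(classify(seq, model, subsample=k) == full for _ in range(n_rounds)) / n_rounds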

  9. Automatic mouse ultrasound detector (A-MUD): A new tool for processing rodent vocalizations

    PubMed Central

    Reitschmidt, Doris; Noll, Anton; Balazs, Peter; Penn, Dustin J.

    2017-01-01

    House mice (Mus musculus) emit complex ultrasonic vocalizations (USVs) during social and sexual interactions, which have features similar to bird song (i.e., they are composed of several different types of syllables, uttered in succession over time to form a pattern of sequences). Manually processing complex vocalization data is time-consuming and potentially subjective, and therefore, we developed an algorithm that automatically detects mouse ultrasonic vocalizations (Automatic Mouse Ultrasound Detector or A-MUD). A-MUD is a script that runs on STx acoustic software (S_TOOLS-STx version 4.2.2), which is free for scientific use. This algorithm improved the efficiency of processing USV files, as it was 4–12 times faster than manual segmentation, depending upon the size of the file. We evaluated A-MUD error rates using manually segmented sound files as a ‘gold standard’ reference, and compared them to a commercially available program. A-MUD had lower error rates than the commercial software, as it detected significantly more correct positives, and fewer false positives and false negatives. The errors generated by A-MUD were mainly false negatives, rather than false positives. This study is the first to systematically compare error rates for automatic ultrasonic vocalization detection methods, and A-MUD and subsequent versions will be made available for the scientific community. PMID:28727808

  10. Word-Synchronous Optical Sampling of Periodically Repeated OTDM Data Words for True Waveform Visualization

    NASA Astrophysics Data System (ADS)

    Benkler, Erik; Telle, Harald R.

    2007-06-01

    An improved phase-locked loop (PLL) for versatile synchronization of a sampling pulse train to an optical data stream is presented. It enables optical sampling of the true waveform of repetitive high bit-rate optical time division multiplexed (OTDM) data words such as pseudorandom bit sequences. Visualization of the true waveform can reveal details, which cause systematic bit errors. Such errors cannot be inferred from eye diagrams and require word-synchronous sampling. The programmable direct-digital-synthesis circuit used in our novel PLL approach allows flexible adaption of virtually any problem-specific synchronization scenario, including those required for waveform sampling, for jitter measurements by slope detection, and for classical eye-diagrams. Phase comparison of the PLL is performed at 10-GHz OTDM base clock rate, leading to a residual synchronization jitter of less than 70 fs.

  11. Pattern recognition of electronic bit-sequences using a semiconductor mode-locked laser and spatial light modulators

    NASA Astrophysics Data System (ADS)

    Bhooplapur, Sharad; Akbulut, Mehmetkan; Quinlan, Franklyn; Delfyett, Peter J.

    2010-04-01

    A novel scheme for recognition of electronic bit-sequences is demonstrated. Two electronic bit-sequences that are to be compared are each mapped to a unique code from a set of Walsh-Hadamard codes. The codes are then encoded in parallel on the spectral phase of the frequency comb lines from a frequency-stabilized mode-locked semiconductor laser. Phase encoding is achieved by using two independent spatial light modulators based on liquid crystal arrays. Encoded pulses are compared using interferometric pulse detection and differential balanced photodetection. Orthogonal codes eight bits long are compared, and matched codes are successfully distinguished from mismatched codes with very low error rates, of around 10^-18. This technique has potential for high-speed, high-accuracy recognition of bit-sequences, with applications in keyword searches and internet protocol packet routing.
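
    The code-matching step is straightforward to sketch in software (the optical spectral-phase encoding and interferometric detection are not modelled here, and the mapping of a bit pattern to a Hadamard row is an illustrative assumption): two sequences match exactly when their Walsh-Hadamard codes correlate to 1, while orthogonal codes correlate to 0.

        import numpy as np

        def hadamard(n):
            """Sylvester construction; n must be a power of two."""
            H = np.array([[1.0]])
            while H.shape[0] < n:
                H = np.block([[H, H], [H, -H]])
            return H

        def walsh_code(bits, H):
            """Map a short bit pattern to one row (code) of the Hadamard matrix."""
            return H[int("".join(str(b) for b in bits), 2)]

        H8 = hadamard(8)                            # eight orthogonal 8-chip codes
        a = walsh_code([0, 1, 1], H8)
        b = walsh_code([1, 0, 1], H8)
        print(np.dot(a, a) / 8, np.dot(a, b) / 8)   # 1.0 for matched codes, 0.0 otherwise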

  12. High density bit transition requirements versus the effects on BCH error correcting code. [bit synchronization

    NASA Technical Reports Server (NTRS)

    Ingels, F. M.; Schoggen, W. O.

    1982-01-01

    The design to achieve the required bit transition density for the Space Shuttle high rate multiplexer (HRM) data stream of the Space Laboratory Vehicle is reviewed. It contained a recommended circuit approach, specified the pseudo-random (PN) sequence to be used, and detailed the properties of the sequence. Calculations showing the probability of failing to meet the required transition density were included. A computer simulation of the data stream and PN cover sequence was provided. All worst-case situations were simulated, and the bit transition density exceeded that required. The Preliminary Design Review and the Critical Design Review are documented. The Cover Sequence Generator (CSG) encoder/decoder design was constructed and demonstrated. The demonstrations were successful. All HRM and HRDM units incorporate the CSG encoder or CSG decoder as appropriate.
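
    The idea of a PN cover sequence is illustrated below (the polynomial, seed, and bit widths are generic stand-ins; the actual sequence specified in the report is not reproduced): XOR-ing the data with a maximal-length LFSR sequence guarantees transitions even for all-zero data, and the receiver strips the cover by XOR-ing with the same sequence.

        def lfsr_pn(taps=(7, 6), nbits=7, seed=0b1111111):
            """Fibonacci LFSR; taps (7, 6) over 7 bits yield a 127-bit m-sequence."""
            state = seed
            while True:
                fb = 0
                for t in taps:
                    fb ^= (state >> (t - 1)) & 1
                yield state & 1
                state = (state >> 1) | (fb << (nbits - 1))

        def cover(bits):
            pn = lfsr_pn()
            return [b ^ next(pn) for b in bits]

        data = [0] * 32                      # worst case: a run with no transitions
        covered = cover(data)
        assert cover(covered) == data        # the same PN sequence removes the cover
        print(sum(covered[i] != covered[i + 1] for i in range(len(covered) - 1)), "transitions")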

  13. Theories of Lethal Mutagenesis: From Error Catastrophe to Lethal Defection.

    PubMed

    Tejero, Héctor; Montero, Francisco; Nuño, Juan Carlos

    2016-01-01

    RNA viruses go extinct in a process called lethal mutagenesis when subjected to an increase in their mutation rate, for instance, by the action of mutagenic drugs. Several approaches have been proposed to understand this phenomenon. The extinction of RNA viruses by increased mutational pressure was inspired by the concept of the error threshold. The now classic quasispecies model predicts the existence of a limit to the mutation rate beyond which the genetic information of the wild type can no longer be efficiently transmitted to the next generation. This limit was called the error threshold, and for mutation rates larger than this threshold the quasispecies was said to enter into error catastrophe. This transition has been assumed to foster the extinction of the whole population. Alternative explanations of lethal mutagenesis have been proposed recently. In the first place, a distinction is made between the error threshold and the extinction threshold, the mutation rate beyond which a population goes extinct. Extinction is explained by the effect that the mutation rate has, through the mutational load, on the reproductive ability of the whole population. Secondly, lethal defection also takes into account the effect of interactions within mutant spectra, which have been shown to be determinant for understanding the extinction of RNA viruses under augmented mutational pressure. Nonetheless, some relevant issues concerning lethal mutagenesis are not yet completely understood, such as survival of the flattest, i.e., the development of resistance to lethal mutagenesis by evolving towards mutationally more robust regions of sequence space, or sublethal mutagenesis, i.e., an increase of the mutation rate below the extinction threshold, which may boost the adaptability of RNA viruses and increase their ability to develop resistance to drugs (including mutagens). A better design of antiviral therapies will still require an improvement of our knowledge about lethal mutagenesis.
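
    For reference, the classical error-threshold relation from quasispecies theory (standard background rather than a result of this paper) bounds the genome length L that can be maintained at per-site copying fidelity q when the master sequence has selective superiority sigma:

        \sigma \, q^{L} > 1
        \;\Longleftrightarrow\;
        L < \frac{\ln\sigma}{-\ln q} \approx \frac{\ln\sigma}{1-q} \quad (q \to 1),

    so that, for a fixed genome length, there is a maximum tolerable per-site error rate 1 - q beyond which the wild-type information is lost.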

  14. Enhanced intercarrier interference mitigation based on encoded bit-sequence distribution inside optical superchannels

    NASA Astrophysics Data System (ADS)

    Torres, Jhon James Granada; Soto, Ana María Cárdenas; González, Neil Guerrero

    2016-10-01

    In the context of gridless optical multicarrier systems, we propose a method for intercarrier interference (ICI) mitigation which allows bit error correction in scenarios of nonspectral flatness between the subcarriers composing the multicarrier system and sub-Nyquist carrier spacing. We propose a hybrid ICI mitigation technique which exploits the advantages of signal equalization at both levels: the physical level for any digital and analog pulse shaping, and the bit-data level and its ability to incorporate advanced correcting codes. The concatenation of these two complementary techniques consists of a nondata-aided equalizer applied to each optical subcarrier, and a hard-decision forward error correction applied to the sequence of bits distributed along the optical subcarriers regardless of prior subchannel quality assessment as performed in orthogonal frequency-division multiplexing modulations for the implementation of the bit-loading technique. The impact of the ICI is systematically evaluated in terms of bit-error-rate as a function of the carrier frequency spacing and the roll-off factor of the digital pulse-shaping filter for a simulated 3×32-Gbaud single-polarization quadrature phase shift keying Nyquist-wavelength division multiplexing system. After the ICI mitigation, a back-to-back error-free decoding was obtained for sub-Nyquist carrier spacings of 28.5 and 30 GHz and roll-off values of 0.1 and 0.4, respectively.

  15. Quick, sensitive and specific detection and evaluation of quantification of minor variants by high-throughput sequencing.

    PubMed

    Leung, Ross Ka-Kit; Dong, Zhi Qiang; Sa, Fei; Chong, Cheong Meng; Lei, Si Wan; Tsui, Stephen Kwok-Wing; Lee, Simon Ming-Yuen

    2014-02-01

    Minor variants have significant implications in quasispecies evolution, early cancer detection and non-invasive fetal genotyping, but their accurate detection by next-generation sequencing (NGS) is hampered by sequencing errors. We generated sequencing data from mixtures at predetermined ratios in order to provide insight into sequencing errors and variations that can arise, for which simulation cannot be performed. The information also enables better parameterization of depth of coverage, read quality and heterogeneity, library preparation techniques and technical repeatability for mathematical modeling, theory development and simulation experimental design. We devised minor variant authentication rules that achieved 100% accuracy in both testing and validation experiments. The rules are free from tedious inspection of alignment accuracy, sequencing read quality or errors introduced by homopolymers. The authentication process only requires minor variants to: (1) have a minimum depth of coverage larger than 30; and (2) be reported by (a) four or more variant callers, or (b) DiBayes or LoFreq, plus SNVer (or BWA when no results are returned by SNVer), with an interassay coefficient of variation (CV) no larger than 0.1. Quantification accuracy undermined by sequencing errors could be overcome neither by ultra-deep sequencing nor by recruiting more variant callers to reach a consensus, such that consistent underestimation and overestimation (i.e., low CV) were observed. To accommodate stochastic error and adjust the observed ratio within a specified accuracy, we present a proof of concept for the use of a double calibration curve for quantification, which provides an important reference towards potential industrial-scale fabrication of calibrants for NGS.
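
    The authentication rules quoted above translate almost directly into a small decision function. The sketch below is one possible reading of them, with the input fields and the handling of the CV check treated as assumptions rather than a reproduction of the authors' implementation.

        def authenticate(variant):
            """One reading of the minor-variant authentication rules: depth > 30, and
            either four or more callers agree, or DiBayes/LoFreq plus SNVer (BWA
            standing in when SNVer returns nothing), with inter-assay CV <= 0.1."""
            if variant["depth"] <= 30:                    # rule 1: minimum coverage
                return False
            callers = set(variant["callers"])
            if len(callers) >= 4:                         # rule 2a: broad consensus
                return True
            snver_or_bwa = ("SNVer" in callers) or ("BWA" in callers and not variant["snver_ran"])
            rule_2b = bool(callers & {"DiBayes", "LoFreq"}) and snver_or_bwa
            return rule_2b and variant["cv"] <= 0.1       # rule 2b plus the CV check

        example = {"depth": 120, "callers": ["LoFreq", "SNVer"], "snver_ran": True, "cv": 0.05}
        print(authenticate(example))                      # True under this reading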

  16. Muroid rodent phylogenetics: 900-species tree reveals increasing diversification rates

    PubMed Central

    Schenk, John J.

    2017-01-01

    We combined new sequence data for more than 300 muroid rodent species with our previously published sequences for up to five nuclear and one mitochondrial genes to generate the most widely and densely sampled hypothesis of evolutionary relationships across Muroidea. An exhaustive screening procedure for publicly available sequences was implemented to avoid the propagation of taxonomic errors that are common to supermatrix studies. The combined data set of carefully screened sequences derived from all available sequences on GenBank with our new data resulted in a robust maximum likelihood phylogeny for 900 of the approximately 1,620 muroids. Several regions that were equivocally resolved in previous studies are now more decisively resolved, and we estimated a chronogram using 28 fossil calibrations for the most integrated age and topological estimates to date. The results were used to update muroid classification and highlight questions needing additional data. We also compared the results of multigene supermatrix studies like this one with the principal published supertrees and concluded that the latter are unreliable for any comparative study in muroids. In addition, we explored diversification patterns as an explanation for why muroid rodents represent one of the most species-rich groups of mammals by detecting evidence for increasing net diversification rates through time across the muroid tree. We suggest the observation of increasing rates may be due to a combination of parallel increases in rate across clades and high average extinction rates. Five shifts to increased diversification rates were inferred, suggesting that multiple, but perhaps not independent, events have led to the remarkable species diversity in the superfamily. Our results provide a phylogenetic framework for comparative studies that is not highly dependent upon the signal from any one gene. PMID:28813483

  17. Lip-reading enhancement for law enforcement

    NASA Astrophysics Data System (ADS)

    Theobald, Barry J.; Harvey, Richard; Cox, Stephen J.; Lewis, Colin; Owen, Gari P.

    2006-09-01

    Accurate lip-reading techniques would be of enormous benefit for agencies involved in counter-terrorism and other law-enforcement areas. Unfortunately, there are very few skilled lip-readers, and it is apparently a difficult skill to transmit, so the area is under-resourced. In this paper we investigate the possibility of making the lip-reading task more amenable to a wider range of operators by enhancing lip movements in video sequences using active appearance models. These are generative, parametric models commonly used to track faces in images and video sequences. The parametric nature of the model allows a face in an image to be encoded in terms of a few tens of parameters, while the generative nature allows faces to be re-synthesised using the parameters. The aim of this study is to determine if exaggerating lip-motions in video sequences by amplifying the parameters of the model improves lip-reading ability. We also present results of lip-reading tests undertaken by experienced (but non-expert) adult subjects who claim to use lip-reading in their speech recognition process. The results, which compare word error rates on unprocessed and processed video, are mixed. We find that there appears to be potential to improve the word error rate, but for the method to improve intelligibility, more sophisticated tracking and visual modelling are needed. Our technique can also act as an expression or visual-gesture amplifier and so has applications to animation and the presentation of information via avatars or synthetic humans.

  18. Quantifying the effect of disruptions to temporal coherence on the intelligibility of compressed American Sign Language video

    NASA Astrophysics Data System (ADS)

    Ciaramello, Frank M.; Hemami, Sheila S.

    2009-02-01

    Communication of American Sign Language (ASL) over mobile phones would be very beneficial to the Deaf community. ASL video encoded to achieve the rates provided by current cellular networks must be heavily compressed, and appropriate assessment techniques are required to analyze the intelligibility of the compressed video. As an extension to a purely spatial measure of intelligibility, this paper quantifies the effect of temporal compression artifacts on sign language intelligibility. These artifacts can be the result of motion-compensation errors that distract the observer or of frame rate reductions. They reduce the perception of smooth motion and disrupt the temporal coherence of the video. Motion-compensation errors that affect temporal coherence are identified by measuring the block-level correlation between co-located macroblocks in adjacent frames. The impact of frame rate reductions was quantified through experimental testing. A subjective study was performed in which fluent ASL participants rated the intelligibility of sequences encoded at 5 different frame rates and with 3 different levels of distortion. The subjective data is used to parameterize an objective intelligibility measure which is highly correlated with subjective ratings at multiple frame rates.
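
    The identification step described above, block-level correlation between co-located macroblocks in adjacent frames, can be sketched as below. The 16x16 block size, normalized correlation and 0.5 threshold are illustrative assumptions, not the parameters used in the paper.

        import numpy as np

        def block_correlations(prev_frame, cur_frame, block=16):
            """Normalized correlation between co-located macroblocks of two frames;
            low values flag blocks whose temporal coherence has been disrupted."""
            h, w = cur_frame.shape
            corrs = {}
            for y in range(0, h - block + 1, block):
                for x in range(0, w - block + 1, block):
                    a = prev_frame[y:y+block, x:x+block].astype(float).ravel()
                    b = cur_frame[y:y+block, x:x+block].astype(float).ravel()
                    a -= a.mean()
                    b -= b.mean()
                    denom = np.linalg.norm(a) * np.linalg.norm(b)
                    corrs[(y, x)] = float(a @ b / denom) if denom > 0 else 1.0
            return corrs

        prev = np.random.randint(0, 256, (48, 64))
        cur = prev.copy()
        cur[16:32, 16:32] = np.random.randint(0, 256, (16, 16))   # disrupt one block
        flagged = [pos for pos, c in block_correlations(prev, cur).items() if c < 0.5]
        print(flagged)                                             # expect [(16, 16)]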

  19. Is a Genome a Codeword of an Error-Correcting Code?

    PubMed Central

    Kleinschmidt, João H.; Silva-Filho, Márcio C.; Bim, Edson; Herai, Roberto H.; Yamagishi, Michel E. B.; Palazzo, Reginaldo

    2012-01-01

    Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction. PMID:22649495
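
    A minimal way to reproduce the kind of test described above is a syndrome check: under some binary mapping of the four bases, a block of DNA is a codeword exactly when its syndrome with respect to the code's parity-check matrix is zero. The (7,4) Hamming code and the purine/pyrimidine mapping below are illustrative assumptions; the article works with longer cyclic Hamming codes and other mappings.

        import numpy as np

        # Parity-check matrix of the standard (7,4) Hamming code.
        H = np.array([[1, 0, 1, 0, 1, 0, 1],
                      [0, 1, 1, 0, 0, 1, 1],
                      [0, 0, 0, 1, 1, 1, 1]])

        # Illustrative one-bit-per-base mapping (purine = 0, pyrimidine = 1).
        BASE_TO_BIT = {"A": 0, "G": 0, "C": 1, "T": 1}

        def is_hamming_codeword(dna_block: str) -> bool:
            """True when the 7-base block maps to a binary vector with zero syndrome."""
            bits = np.array([BASE_TO_BIT[b] for b in dna_block])
            return not (H.dot(bits) % 2).any()

        print(is_hamming_codeword("AAAAAAA"))   # the all-zero vector is a codeword
        print(is_hamming_codeword("CAAAAAA"))   # a single flip gives a nonzero syndrome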

  20. BBMerge – Accurate paired shotgun read merging via overlap

    DOE PAGES

    Bushnell, Brian; Rood, Jonathan; Singer, Esther

    2017-10-26

    Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools becomes ever-more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets, and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.

  1. BBMerge – Accurate paired shotgun read merging via overlap

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bushnell, Brian; Rood, Jonathan; Singer, Esther

    Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools becomes ever-more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets, and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.

  2. Complete chloroplast genome sequence of MD-2 pineapple and its comparative analysis among nine other plants from the subclass Commelinidae.

    PubMed

    Redwan, R M; Saidin, A; Kumar, S V

    2015-08-12

    Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple, sequenced using PacBio sequencing technology. In this study, the high error rate of PacBio long reads of A. comosus total genomic DNA was mitigated by leveraging the highly accurate but short Illumina reads for error correction via the latest error-correction module from Novocraft. Error-corrected long PacBio reads were assembled with a single tool to produce a contig representing the pineapple chloroplast genome. The genome, 159,636 bp in length, features the conserved quadripartite chloroplast structure, containing a large single copy region (LSC) of 87,482 bp, a small single copy region (SSC) of 18,622 bp and two inverted repeat regions (IRA and IRB) of 26,766 bp each. Overall, the genome contains 117 unique coding regions, 30 of which are repeated in the IR regions, with gene content, structure and arrangement similar to those of its sister taxon, Typha latifolia. A total of 35 repeat structures were detected in both coding and non-coding regions, the majority being tandem repeats. In addition, 205 SSRs were detected in the genome, with six protein-coding genes containing more than two SSRs. Comparison of chloroplast genomes from the subclass Commelinidae revealed conserved protein-coding genes, albeit located in a highly divergent region. Analysis of selection pressure on protein-coding genes using the Ka/Ks ratio showed significant positive selection on the rps7 gene of the pineapple chloroplast (P less than 0.05). Phylogenetic analysis confirmed the recent taxonomic relationships among the members of the commelinids, supporting the monophyletic relationship between Arecales and Dasypogonaceae and between Zingiberales and the Poales, which includes A. comosus. The complete chloroplast sequence of pineapple provides insights into the divergence of genic chloroplast sequences among members of the subclass Commelinidae. The complete pineapple chloroplast will serve as a reference for in-depth taxonomic studies in the Bromeliaceae family when more species in the family are sequenced in the future. The genetic sequence information will also make feasible other molecular applications of the pineapple chloroplast for plant genetic improvement.

  3. A statistical method for the detection of variants from next-generation resequencing of DNA pools.

    PubMed

    Bansal, Vikas

    2010-06-15

    Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.
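
    The second CRISP criterion, judging whether the observed non-reference base calls could plausibly be explained by sequencing errors alone, can be sketched with a binomial tail probability. The per-base error rate and the decision threshold below are placeholders for illustration, not the values or the exact statistic used by CRISP.

        from math import comb

        def prob_from_errors_alone(depth, alt_count, error_rate=0.01):
            """Binomial tail probability of seeing at least alt_count non-reference
            calls among `depth` reads if every mismatch were a sequencing error."""
            return sum(comb(depth, k) * error_rate**k * (1 - error_rate)**(depth - k)
                       for k in range(alt_count, depth + 1))

        # 20 alternate calls in 400 reads are very hard to explain by error alone,
        # so a pool-level variant is the better explanation at any sensible threshold.
        print(f"{prob_from_errors_alone(depth=400, alt_count=20):.2e}")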

  4. Identification and characterization of mutant clones with enhanced propagation rates from phage-displayed peptide libraries.

    PubMed

    Nguyen, Kieu T H; Adamkiewicz, Marta A; Hebert, Lauren E; Zygiel, Emily M; Boyle, Holly R; Martone, Christina M; Meléndez-Ríos, Carola B; Noren, Karen A; Noren, Christopher J; Hall, Marilena Fitzsimons

    2014-10-01

    A target-unrelated peptide (TUP) can arise in phage display selection experiments as a result of a propagation advantage exhibited by the phage clone displaying the peptide. We previously characterized HAIYPRH, from the M13-based Ph.D.-7 phage display library, as a propagation-related TUP resulting from a G→A mutation in the Shine-Dalgarno sequence of gene II. This mutant was shown to propagate in Escherichia coli at a dramatically faster rate than phage bearing the wild-type Shine-Dalgarno sequence. We now report 27 additional fast-propagating clones displaying 24 different peptides and carrying 14 unique mutations. Most of these mutations are found either in or upstream of the gene II Shine-Dalgarno sequence, but still within the mRNA transcript of gene II. All 27 clones propagate at significantly higher rates than normal library phage, most within experimental error of wild-type M13 propagation, suggesting that mutations arise to compensate for the reduced virulence caused by the insertion of a lacZα cassette proximal to the replication origin of the phage used to construct the library. We also describe an efficient and convenient assay to diagnose propagation-related TUPs among peptide sequences selected by phage display. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  5. Critical Mutation Rate Has an Exponential Dependence on Population Size in Haploid and Diploid Populations

    PubMed Central

    Aston, Elizabeth; Channon, Alastair; Day, Charles; Knight, Christopher G.

    2013-01-01

    Understanding the effect of population size on the key parameters of evolution is particularly important for populations nearing extinction. There are evolutionary pressures to evolve sequences that are both fit and robust. At high mutation rates, individuals with greater mutational robustness can outcompete those with higher fitness. This is survival-of-the-flattest, and has been observed in digital organisms, theoretically, in simulated RNA evolution, and in RNA viruses. We introduce an algorithmic method capable of determining the relationship between population size, the critical mutation rate at which individuals with greater robustness to mutation are favoured over individuals with greater fitness, and the error threshold. Verification for this method is provided against analytical models for the error threshold. We show that the critical mutation rate for increasing haploid population sizes can be approximated by an exponential function, with much lower mutation rates tolerated by small populations. This is in contrast to previous studies which identified that critical mutation rate was independent of population size. The algorithm is extended to diploid populations in a system modelled on the biological process of meiosis. The results confirm that the relationship remains exponential, but show that both the critical mutation rate and error threshold are lower for diploids, rather than higher as might have been expected. Analyzing the transition from critical mutation rate to error threshold provides an improved definition of critical mutation rate. Natural populations with their numbers in decline can be expected to lose genetic material in line with the exponential model, accelerating and potentially irreversibly advancing their decline, and this could potentially affect extinction, recovery and population management strategy. The effect of population size is particularly strong in small populations with 100 individuals or less; the exponential model has significant potential in aiding population management to prevent local (and global) extinction events. PMID:24386200

  6. Using expected sequence features to improve basecalling accuracy of amplicon pyrosequencing data.

    PubMed

    Rask, Thomas S; Petersen, Bent; Chen, Donald S; Day, Karen P; Pedersen, Anders Gorm

    2016-04-22

    Amplicon pyrosequencing targets a known genetic region and thus inherently produces reads highly anticipated to have certain features, such as conserved nucleotide sequence, and in the case of protein coding DNA, an open reading frame. Pyrosequencing errors, consisting mainly of nucleotide insertions and deletions, are on the other hand likely to disrupt open reading frames. Such an inverse relationship between errors and expectation based on prior knowledge can be used advantageously to guide the process known as basecalling, i.e. the inference of nucleotide sequence from raw sequencing data. The new basecalling method described here, named Multipass, implements a probabilistic framework for working with the raw flowgrams obtained by pyrosequencing. For each sequence variant, Multipass calculates the likelihood and nucleotide sequence of the several most likely sequences given the flowgram data. This probabilistic approach enables integration of basecalling into a larger model where other parameters can be incorporated, such as the likelihood of observing a full-length open reading frame at the targeted region. We apply the method to 454 amplicon pyrosequencing data obtained from a malaria virulence gene family, where Multipass generates 20% more error-free sequences than current state-of-the-art methods, and provides sequence characteristics that allow generation of a set of high-confidence error-free sequences. This novel method can be used to increase the accuracy of existing and future amplicon sequencing data, particularly where extensive prior knowledge is available about the obtained sequences, for example in analysis of the immunoglobulin VDJ region, where Multipass can be combined with a model for the known recombining germline genes. Multipass is available for Roche 454 data at http://www.cbs.dtu.dk/services/MultiPass-1.0, and the concept can potentially be implemented for other sequencing technologies as well.

  7. Source-Adaptation-Based Wireless Video Transport: A Cross-Layer Approach

    NASA Astrophysics Data System (ADS)

    Qu, Qi; Pei, Yong; Modestino, James W.; Tian, Xusheng

    2006-12-01

    Real-time packet video transmission over wireless networks is expected to experience bursty packet losses that can cause substantial degradation to the transmitted video quality. In wireless networks, channel state information is hard to obtain in a reliable and timely manner due to the rapid change of wireless environments. However, the source motion information is always available and can be obtained easily and accurately from video sequences. Therefore, in this paper, we propose a novel cross-layer framework that exploits only the motion information inherent in video sequences and efficiently combines a packetization scheme, a cross-layer forward error correction (FEC)-based unequal error protection (UEP) scheme, an intracoding rate selection scheme as well as a novel intraframe interleaving scheme. Our objective and subjective results demonstrate that the proposed approach is very effective in dealing with the bursty packet losses occurring on wireless networks without incurring any additional implementation complexity or delay. Thus, the simplicity of our proposed system has important implications for the implementation of a practical real-time video transmission system.

  8. Empirical Validation of Pooled Whole Genome Population Re-Sequencing in Drosophila melanogaster

    PubMed Central

    Zhu, Yuan; Bergland, Alan O.; González, Josefa; Petrov, Dmitri A.

    2012-01-01

    The sequencing of pooled non-barcoded individuals is an inexpensive and efficient means of assessing genome-wide population allele frequencies, yet its accuracy has not been thoroughly tested. We assessed the accuracy of this approach on whole, complex eukaryotic genomes by resequencing pools of largely isogenic, individually sequenced Drosophila melanogaster strains. We called SNPs in the pooled data and estimated false positive and false negative rates using the SNPs called in the individual strains as a reference. We also estimated allele frequencies of the SNPs using “pooled” data and compared them with “true” frequencies taken from the estimates in the individual strains. We demonstrate that pooled sequencing provides a faithful estimate of population allele frequency with the error well approximated by binomial sampling, and is a reliable means of novel SNP discovery with low false positive rates. However, a sufficient number of strains should be used in the pooling because variation in the amount of DNA derived from individual strains is a substantial source of noise when the number of pooled strains is low. Our results and analysis confirm that pooled sequencing is a very powerful and cost-effective technique for assessing patterns of sequence variation in populations on genome-wide scales, and is applicable to any dataset where sequencing individuals or individual cells is impossible, difficult, time consuming, or expensive. PMID:22848651
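
    The statement that the pooled allele-frequency error is well approximated by binomial sampling can be made concrete with a one-line standard-error estimate; the depths and frequency below are arbitrary illustrative numbers, and the extra noise from unequal DNA contributions per strain is deliberately not modeled.

        from math import sqrt

        def binomial_se(freq: float, depth: int) -> float:
            """Standard error of a pooled allele-frequency estimate when each of
            `depth` reads independently samples an allele at frequency `freq`."""
            return sqrt(freq * (1 - freq) / depth)

        for depth in (50, 200, 1000):
            print(depth, round(binomial_se(0.10, depth), 4))
        # The sampling error shrinks roughly as 1/sqrt(depth); unequal amounts of
        # DNA from individual strains add noise on top of this binomial floor.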

  9. Perceptually tuned low-bit-rate video codec for ATM networks

    NASA Astrophysics Data System (ADS)

    Chou, Chun-Hsien

    1996-02-01

    In order to maintain high visual quality in transmitting low bit-rate video signals over asynchronous transfer mode (ATM) networks, a layered coding scheme that incorporates the human visual system (HVS), motion compensation (MC), and conditional replenishment (CR) is presented in this paper. An empirical perceptual model is proposed to estimate the spatio-temporal just-noticeable distortion (STJND) profile for each frame, by which perceptually important (PI) prediction-error signals can be located. Because of the limited channel capacity of the base layer, only coded data of motion vectors, the PI signals within a small strip of the prediction-error image and, if there are remaining bits, the PI signals outside the strip are transmitted by the cells of the base-layer channel. The rest of the coded data are transmitted by the second-layer cells, which may be lost due to channel error or network congestion. Simulation results show that visual quality of the reconstructed CIF sequence is acceptable when the capacity of the base-layer channel is allocated 2 × 64 kbps and the cells of the second layer are all lost.

  10. Consistency of gene starts among Burkholderia genomes

    PubMed Central

    2011-01-01

    Background Evolutionary divergence in the position of the translational start site among orthologous genes can have significant functional impacts. Divergence can alter the translation rate, degradation rate, subcellular location, and function of the encoded proteins. Results Existing Genbank gene maps for Burkholderia genomes suggest that extensive divergence has occurred--53% of ortholog sets based on Genbank gene maps had inconsistent gene start sites. However, most of these inconsistencies appear to be gene-calling errors. Evolutionary divergence was the most plausible explanation for only 17% of the ortholog sets. Correcting probable errors in the Genbank gene maps decreased the percentage of ortholog sets with inconsistent starts by 68%, increased the percentage of ortholog sets with extractable upstream intergenic regions by 32%, increased the sequence similarity of intergenic regions and predicted proteins, and increased the number of proteins with identifiable signal peptides. Conclusions Our findings highlight an emerging problem in comparative genomics: single-digit percent errors in gene predictions can lead to double-digit percentages of inconsistent ortholog sets. The work demonstrates a simple approach to evaluate and improve the quality of gene maps. PMID:21342528

  11. BAUM: improving genome assembly by adaptive unique mapping and local overlap-layout-consensus approach.

    PubMed

    Wang, Anqi; Wang, Zhanyu; Li, Zheng; Li, Lei M

    2018-06-15

    It is highly desirable to assemble genomes of high continuity and consistency at low cost. The current bottleneck in draft genome continuity using second generation sequencing (SGS) reads is primarily caused by uncertainty among repetitive sequences. Even though single-molecule real-time sequencing technology is very promising for overcoming this uncertainty, its relatively high cost and error rate place a burden on budget or computation. Many long-read assemblers take the overlap-layout-consensus (OLC) paradigm, which is less sensitive to sequencing errors, heterozygosity and variability of coverage. However, current assemblers of SGS data do not sufficiently take advantage of the OLC approach. Aiming at minimizing uncertainty, the proposed method, BAUM, breaks the whole genome into regions by adaptive unique mapping; local OLC is then used to assemble each region in parallel. BAUM can (i) perform reference-assisted assembly based on the genome of a closely related species or (ii) improve the results of existing assemblies that were obtained from short or long sequencing reads. Tests on two eukaryote genomes, the wild rice Oryza longistaminata and the parrot Melopsittacus undulatus, show that BAUM achieved substantial improvement in genome size and continuity. In addition, BAUM reconstructed a considerable number of repetitive regions that existing short-read assemblers failed to assemble. We also propose statistical approaches to control the uncertainty in different steps of BAUM. http://www.zhanyuwang.xin/wordpress/index.php/2017/07/21/baum. Supplementary data are available at Bioinformatics online.

  12. Customisation of the exome data analysis pipeline using a combinatorial approach.

    PubMed

    Pattnaik, Swetansu; Vaidyanathan, Srividya; Pooja, Durgad G; Deepak, Sa; Panda, Binay

    2012-01-01

    The advent of next generation sequencing (NGS) technologies has revolutionised the way biologists produce, analyse and interpret data. Although NGS platforms provide a cost-effective way to discover genome-wide variants from a single experiment, variants discovered by NGS need follow-up validation due to the high error rates associated with various sequencing chemistries. Recently, whole exome sequencing has been proposed as an affordable option compared to whole genome runs, but it still requires follow-up validation of all the novel exomic variants. Customarily, a consensus approach is used to overcome the systematic errors inherent to the sequencing technology, alignment and post-alignment variant detection algorithms. However, the aforementioned approach warrants the use of multiple sequencing chemistries, multiple alignment tools and multiple variant callers, which may not be viable in terms of time and money for individual investigators with limited informatics know-how. Biologists often lack the requisite training to deal with the huge amount of data produced by NGS runs and face difficulty in choosing from the list of freely available analytical tools for NGS data analysis. Hence, there is a need to customise the NGS data analysis pipeline to preferentially retain true variants by minimising the incidence of false positives, and to make the choice of the right analytical tools easier. To this end, we have sampled different freely available tools used at the alignment and post-alignment stages, suggesting the most suitable combination, determined by a simple framework of pre-existing metrics, to create significant datasets.

  13. Cyclostratigraphy, sequence stratigraphy and organic matter accumulation mechanism

    NASA Astrophysics Data System (ADS)

    Cong, F.; Li, J.

    2016-12-01

    The first member of the Maokou Formation of the Sichuan basin is composed of well-preserved carbonate ramp couplets of limestone and marlstone/shale. It is one of the potential shale gas source rocks and is suitable for time-series analysis. We conducted time-series analysis to identify high-frequency sequences, reconstruct high-resolution sedimentation rates, estimate detailed primary productivity for the first time in the study interval, and discuss the organic matter accumulation mechanism of the source rock within a sequence stratigraphic framework. Using the theory of cyclostratigraphy and sequence stratigraphy, the high-frequency sequences of one outcrop profile and one drilling well were identified. Two third-order sequences and eight fourth-order sequences are distinguished on the outcrop profile based on cycle stacking patterns. For the drilling well, the sequence boundaries and four system tracts are distinguished by "integrated prediction error filter analysis" (INPEFA) of gamma-ray logging data, and eight fourth-order sequences are identified by the 405 ka long eccentricity curve in the depth domain, which is quantified and filtered by integrated MTM spectral analysis, evolutive harmonic analysis (EHA), evolutive average spectral misfit (eASM) and band-pass filtering. This suggests that high-frequency sequences correlate well with Milankovitch orbital signals recorded in sediments, and that cyclostratigraphic theory is applicable to dividing high-frequency (fourth- to sixth-order) sequence stratigraphy. A high-resolution sedimentation rate is reconstructed through the study interval by tracking the highly statistically significant short eccentricity component (123 ka) revealed by EHA. Based on the sedimentation rate, measured TOC and density data, the burial flux, delivery flux and primary productivity of organic carbon were estimated. By integrating redox proxies, we discuss the controls of primary production and preservation on organic matter accumulation within the high-resolution sequence stratigraphic framework. Results show that the high average organic carbon contents in the study interval are mainly attributed to high primary production. The results also show a good correlation between high organic carbon accumulation and intervals of transgression.

  14. Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes. Part 3; The MAP and Related Decoding Algorithms

    NASA Technical Reports Server (NTRS)

    Lin, Shu; Fossorier, Marc

    1998-01-01

    In a coded communication system with equiprobable signaling, MLD minimizes the word error probability and delivers the most likely codeword associated with the corresponding received sequence. This decoding has two drawbacks. First, minimization of the word error probability is not equivalent to minimization of the bit error probability. Therefore, MLD becomes suboptimum with respect to the bit error probability. Second, MLD delivers a hard-decision estimate of the received sequence, so that information is lost between the input and output of the ML decoder. This information is important in coded schemes where the decoded sequence is further processed, such as concatenated coding schemes and multi-stage or iterative decoding schemes. In this chapter, we first present a decoding algorithm which both minimizes the bit error probability and provides the corresponding soft information at the output of the decoder. This algorithm is referred to as the MAP (maximum a posteriori probability) decoding algorithm.
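
    To make the word-MAP versus bit-MAP distinction concrete, here is a toy bit-MAP decoder for a rate-1/3 repetition code over a binary symmetric channel: for one information bit it returns the a posteriori probability of a 1, exactly the soft output that hard-decision MLD discards. It is not the trellis-based algorithm of the chapter, only the same decision rule applied to the simplest possible code.

        from math import prod

        def bit_map_repetition(received, p=0.1, prior1=0.5):
            """A posteriori P(bit = 1) for one information bit sent as three copies
            over a binary symmetric channel with crossover probability p."""
            def likelihood(bit):
                # Channel likelihood of the received triple given the transmitted bit.
                return prod((1 - p) if r == bit else p for r in received)
            num = prior1 * likelihood(1)
            return num / (num + (1 - prior1) * likelihood(0))

        print(bit_map_repetition([1, 1, 0]))   # ~0.9: soft confidence in a 1
        print(bit_map_repetition([1, 0, 0]))   # ~0.1: the decoder leans toward 0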

  15. BETASEQ: a powerful novel method to control type-I error inflation in partially sequenced data for rare variant association testing.

    PubMed

    Yan, Song; Li, Yun

    2014-02-15

    Despite its great capability to detect rare variant associations, next-generation sequencing is still prohibitively expensive when applied to large samples. In case-control studies, it is thus appealing to sequence only a subset of cases to discover variants and genotype the identified variants in controls and the remaining cases, under the reasonable assumption that causal variants are usually enriched among cases. However, this approach leads to inflated type-I error if analyzed naively for rare variant association. Several methods have been proposed in the recent literature to control type-I error at the cost of either excluding some sequenced cases or correcting the genotypes of discovered rare variants. All of these approaches thus suffer from some degree of information loss and are consequently underpowered. We propose a novel method (BETASEQ), which corrects inflation of type-I error by supplementing pseudo-variants while keeping the original sequence and genotype data intact. Extensive simulations and real data analysis demonstrate that, in most practical situations, BETASEQ leads to higher testing power than existing approaches with guaranteed (controlled or conservative) type-I error. BETASEQ and associated R files, including documentation and examples, are available at http://www.unc.edu/~yunmli/betaseq

  16. Accurate Sample Assignment in a Multiplexed, Ultrasensitive, High-Throughput Sequencing Assay for Minimal Residual Disease.

    PubMed

    Bartram, Jack; Mountjoy, Edward; Brooks, Tony; Hancock, Jeremy; Williamson, Helen; Wright, Gary; Moppett, John; Goulden, Nick; Hubank, Mike

    2016-07-01

    High-throughput sequencing (HTS) (next-generation sequencing) of the rearranged Ig and T-cell receptor genes promises to be less expensive and more sensitive than current methods of monitoring minimal residual disease (MRD) in patients with acute lymphoblastic leukemia. However, the adoption of new approaches by clinical laboratories requires careful evaluation of all potential sources of error and the development of strategies to ensure the highest accuracy. Timely and efficient clinical use of HTS platforms will depend on combining multiple samples (multiplexing) in each sequencing run. Here we examine Ig heavy-chain gene HTS on the Illumina MiSeq platform for MRD. We identify errors associated with multiplexing that could potentially impact the accuracy of MRD analysis. We optimize a strategy that combines high-purity, sequence-optimized oligonucleotides, dual indexing, and an error-aware demultiplexing approach to minimize errors and maximize sensitivity. We present a probability-based demultiplexing pipeline, Error-Aware Demultiplexer, that is suitable for all MiSeq strategies and accurately assigns samples to the correct identifier without excessive loss of data. Finally, using controls quantified by digital PCR, we show that HTS-MRD can accurately detect as few as 1 in 10^6 copies of specific leukemic MRD. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.
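
    At its simplest, error-aware assignment of reads to samples can be approximated by nearest-barcode matching with a cap on tolerated mismatches and rejection of ambiguous calls. The sketch below is a generic illustration of that idea with hypothetical barcodes, not the probability model implemented in Error-Aware Demultiplexer.

        def hamming(a: str, b: str) -> int:
            return sum(x != y for x, y in zip(a, b))

        def assign_sample(read_index: str, sample_barcodes: dict, max_mismatch=1):
            """Assign a read to the unique sample barcode within max_mismatch errors;
            return None (leave unassigned) when the call is ambiguous or too distant."""
            hits = sorted((hamming(read_index, bc), name) for name, bc in sample_barcodes.items())
            best_d, best_name = hits[0]
            if best_d > max_mismatch:
                return None                          # too far from every barcode
            if len(hits) > 1 and hits[1][0] == best_d:
                return None                          # ambiguous between two samples
            return best_name

        barcodes = {"patient_A": "ACGTACGT", "patient_B": "TGCATGCA"}   # hypothetical
        print(assign_sample("ACGTACGA", barcodes))   # patient_A (one sequencing error)
        print(assign_sample("AGGAAGGA", barcodes))   # None: rejected rather than misassigned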

  17. Usefulness of Guided Breathing for Dose Rate-Regulated Tracking

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Han-Oh, Sarah; Department of Radiation Oncology, University of Maryland Medical System, Baltimore, MD; Yi, Byong Yong

    2009-02-01

    Purpose: To evaluate the usefulness of guided breathing for dose rate-regulated tracking (DRRT), a new technique to compensate for intrafraction tumor motion. Methods and Materials: DRRT uses a preprogrammed multileaf collimator sequence that tracks the tumor motion derived from four-dimensional computed tomography and the corresponding breathing signals measured before treatment. Because the multileaf collimator speed can be controlled by adjusting the dose rate, the multileaf collimator positions are adjusted in real time during treatment by dose rate regulation, thereby maintaining synchrony with the tumor motion. DRRT treatment was simulated with free, audio-guided, and audiovisual-guided breathing signals acquired from 23 lung cancer patients. The tracking error and duty cycle for each patient were determined as a function of the system time delay (range, 0-1.0 s). Results: The tracking error and duty cycle averaged over all 23 patients were 1.9 ± 0.8 mm and 92% ± 5%, 1.9 ± 1.0 mm and 93% ± 6%, and 1.8 ± 0.7 mm and 92% ± 6% for the free, audio-guided, and audiovisual-guided breathing, respectively, for a time delay of 0.35 s. The small differences in both the tracking error and the duty cycle with guided breathing were not statistically significant. Conclusion: DRRT by its nature adapts well to variations in breathing frequency, which is also the motivation for guided-breathing techniques. Because of this redundancy, guided breathing does not result in significant improvements for either the tracking error or the duty cycle when DRRT is used for real-time tumor tracking.

  18. Multi-modulus algorithm based on global artificial fish swarm intelligent optimization of DNA encoding sequences.

    PubMed

    Guo, Y C; Wang, H; Wu, H P; Zhang, M Q

    2015-12-21

    To address the defects of the constant modulus algorithm (CMA) in equalizing multi-modulus signals, namely its large mean square error (MSE) and slow convergence speed, a multi-modulus algorithm (MMA) based on global artificial fish swarm (GAFS) intelligent optimization of DNA encoding sequences (GAFS-DNA-MMA) was proposed. To improve the convergence rate and reduce the MSE, the proposed algorithm adopted an encoding method based on DNA nucleotide chains to provide a possible solution to the problem. Furthermore, the GAFS algorithm, with its fast convergence and global search ability, was used to find the best sequence. The real and imaginary parts of the initial optimal weight vector of the MMA were obtained through DNA coding of the best sequence. The simulation results show that the proposed algorithm has a faster convergence speed and smaller MSE in comparison with the CMA, the MMA, and the AFS-DNA-MMA.

  19. Mapping RNA-seq Reads with STAR

    PubMed Central

    Dobin, Alexander; Gingeras, Thomas R.

    2015-01-01

    Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, signal visualization, and so forth. In this unit we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is Open Source software that can be run on Unix, Linux or Mac OS X systems. PMID:26334920

  20. Mapping RNA-seq Reads with STAR.

    PubMed

    Dobin, Alexander; Gingeras, Thomas R

    2015-09-03

    Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates, providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, and signal visualization. In this unit, we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is open source software that can be run on Unix, Linux, or Mac OS X systems. Copyright © 2015 John Wiley & Sons, Inc.

  1. Evaluating approaches to find exon chains based on long reads.

    PubMed

    Kuosmanen, Anna; Norri, Tuukka; Mäkinen, Veli

    2018-05-01

    Transcript prediction can be modeled as a graph problem where exons are modeled as nodes and reads spanning two or more exons are modeled as exon chains. Pacific Biosciences third-generation sequencing technology produces significantly longer reads than earlier second-generation sequencing technologies, which gives valuable information about longer exon chains in a graph. However, with the high error rates of third-generation sequencing, aligning long reads correctly around the splice sites is a challenging task. Incorrect alignments lead to spurious nodes and arcs in the graph, which in turn lead to incorrect transcript predictions. We survey several approaches to find the exon chains corresponding to long reads in a splicing graph, and experimentally study the performance of these methods using simulated data to allow for sensitivity/precision analysis. Our experiments show that short reads from second-generation sequencing can be used to significantly improve exon chain correctness either by error-correcting the long reads before splicing graph creation, or by using them to create a splicing graph on which the long-read alignments are then projected. We also study the memory and time consumption of various modules, and show that accurate exon chains lead to significantly increased transcript prediction accuracy. The simulated data and in-house scripts used for this article are available at http://www.cs.helsinki.fi/group/gsa/exon-chains/exon-chains-bib.tar.bz2.

  2. Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases.

    PubMed

    Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M

    2018-06-01

    Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess the error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men having sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than source attribution methods for identifying transmission risk factors, but neither approach provides robust estimates of transmission risk ratios. Source attribution methods can alleviate the drawbacks of phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Heng, E-mail: hengli@mdanderson.org; Zhu, X. Ronald; Zhang, Xiaodong

    Purpose: To develop and validate a novel delivery strategy for reducing the respiratory motion–induced dose uncertainty of spot-scanning proton therapy. Methods and Materials: The spot delivery sequence was optimized to reduce dose uncertainty. The effectiveness of the delivery sequence optimization was evaluated using measurements and patient simulation. One hundred ninety-one 2-dimensional measurements using different delivery sequences of a single-layer uniform pattern were obtained with a detector array on a 1-dimensional moving platform. Intensity modulated proton therapy plans were generated for 10 lung cancer patients, and dose uncertainties for different delivery sequences were evaluated by simulation. Results: Without delivery sequence optimization, the maximum absolute dose error can be up to 97.2% in a single measurement, whereas the optimized delivery sequence results in a maximum absolute dose error of ≤11.8%. In patient simulation, the optimized delivery sequence reduces the mean of fractional maximum absolute dose error compared with the regular delivery sequence by 3.3% to 10.6% (32.5-68.0% relative reduction) for different patients. Conclusions: Optimizing the delivery sequence can reduce dose uncertainty due to respiratory motion in spot-scanning proton therapy, assuming the 4-dimensional CT is a true representation of the patients' breathing patterns.

  4. Iterative Overlap FDE for Multicode DS-CDMA

    NASA Astrophysics Data System (ADS)

    Takeda, Kazuaki; Tomeba, Hiromichi; Adachi, Fumiyuki

    Recently, a new frequency-domain equalization (FDE) technique, called overlap FDE, that requires no guard interval (GI) insertion was proposed. However, the residual inter/intra-block interference (IBI) cannot be completely removed. In addition, for multicode direct sequence code division multiple access (DS-CDMA), the presence of residual interchip interference (ICI) after FDE distorts orthogonality among the spreading codes. In this paper, we propose an iterative overlap FDE for multicode DS-CDMA to suppress both the residual IBI and the residual ICI. In the iterative overlap FDE, joint minimum mean square error (MMSE)-FDE and ICI cancellation is repeated a sufficient number of times. The bit error rate (BER) performance with the iterative overlap FDE is evaluated by computer simulation.

  5. Single molecule counting and assessment of random molecular tagging errors with transposable giga-scale error-correcting barcodes.

    PubMed

    Lau, Billy T; Ji, Hanlee P

    2017-09-21

    RNA-Seq measures gene expression by counting sequence reads belonging to unique cDNA fragments. Molecular barcodes, commonly in the form of random nucleotides, were recently introduced to improve gene expression measures by detecting amplification duplicates, but they are susceptible to errors generated during PCR and sequencing. This results in false positive counts, leading to inaccurate transcriptome quantification, especially at low-input and single-cell RNA amounts where the total number of molecules present is minuscule. To address this issue, we demonstrated the systematic identification of molecular species using transposable error-correcting barcodes that are exponentially expanded to tens of billions of unique labels. We experimentally showed that random-mer molecular barcodes suffer from substantial and persistent errors that are difficult to resolve. To assess our method's performance, we applied it to the analysis of known reference RNA standards. By including an inline random-mer molecular barcode, we systematically characterized the presence of sequence errors in random-mer molecular barcodes. We observed that such errors are extensive and become more dominant at low input amounts. We describe the first study to use transposable molecular barcodes and to apply them to studying random-mer molecular barcode errors. The extensive errors found in random-mer molecular barcodes may warrant the use of error-correcting barcodes for transcriptome analysis as input amounts decrease.

  6. Single molecule sequencing-guided scaffolding and correction of draft assemblies.

    PubMed

    Zhu, Shenglong; Chen, Danny Z; Emrich, Scott J

    2017-12-06

    Although single molecule sequencing is still improving, the lengths of the generated sequences are an inherent advantage in genome assembly. Prior work that utilizes long reads for genome assembly has mostly focused on correcting sequencing errors and improving the contiguity of de novo assemblies. We propose a disassembling-reassembling approach for both correcting structural errors in the draft assembly and scaffolding a target assembly based on error-corrected single molecule sequences. To achieve this goal, we formulate a maximum alternating path cover problem. We prove that this problem is NP-hard, and solve it by a 2-approximation algorithm. Our experimental results show that our approach can improve the structural correctness of target assemblies at the cost of some contiguity, even with smaller amounts of long reads. In addition, our reassembling process can also serve as a competitive scaffolder relative to well-established assembly benchmarks.

  7. Spatial serial order processing in schizophrenia.

    PubMed

    Fraser, David; Park, Sohee; Clark, Gina; Yohanna, Daniel; Houk, James C

    2004-10-01

    The aim of this study was to examine serial order processing deficits in 21 schizophrenia patients and 16 age- and education-matched healthy controls. In a spatial serial order working memory task, one to four spatial targets were presented in a randomized sequence. Subjects were required to remember the locations and the order in which the targets were presented. Patients showed a marked deficit in ability to remember the sequences compared with controls. Increasing the number of targets within a sequence resulted in poorer memory performance for both control and schizophrenia subjects, but the effect was much more pronounced in the patients. Targets presented at the end of a long sequence were more vulnerable to memory error in schizophrenia patients. Performance deficits were not attributable to motor errors, but to errors in target choice. The results support the idea that the memory errors seen in schizophrenia patients may be due to saturating the working memory network at relatively low levels of memory load.

  8. The performance of the Congruence Among Distance Matrices (CADM) test in phylogenetic analysis

    PubMed Central

    2011-01-01

    Background CADM is a statistical test used to estimate the level of Congruence Among Distance Matrices. It has been shown in previous studies to have a correct rate of type I error and good power when applied to dissimilarity matrices and to ultrametric distance matrices. Contrary to most other tests of incongruence used in phylogenetic analysis, the null hypothesis of the CADM test assumes complete incongruence of the phylogenetic trees instead of congruence. In this study, we performed computer simulations to assess the type I error rate and power of the test. It was applied to additive distance matrices representing phylogenies and to genetic distance matrices obtained from nucleotide sequences of different lengths that were simulated on randomly generated trees of varying sizes, and under different evolutionary conditions. Results Our results showed that the test has an accurate type I error rate and good power. As expected, power increased with the number of objects (i.e., taxa), the number of partially or completely congruent matrices and the level of congruence among distance matrices. Conclusions Based on our results, we suggest that CADM is an excellent candidate to test for congruence and, when present, to estimate its level in phylogenomic studies where numerous genes are analysed simultaneously. PMID:21388552

  9. Mutagenic cost of ribonucleotides in bacterial DNA

    PubMed Central

    Schroeder, Jeremy W.; Randall, Justin R.; Hirst, William G.; O’Donnell, Michael E.; Simmons, Lyle A.

    2017-01-01

    Replicative DNA polymerases misincorporate ribonucleoside triphosphates (rNTPs) into DNA approximately once every 2,000 base pairs synthesized. Ribonucleotide excision repair (RER) removes ribonucleoside monophosphates (rNMPs) from genomic DNA, replacing the error with the appropriate deoxyribonucleoside triphosphate (dNTP). Ribonucleotides represent a major threat to genome integrity with the potential to cause strand breaks. Furthermore, it has been shown in the bacterium Bacillus subtilis that loss of RER increases spontaneous mutagenesis. Despite the high rNTP error rate and the effect on genome integrity, the mechanism underlying mutagenesis in RER-deficient bacterial cells remains unknown. We performed mutation accumulation lines and genome-wide mutational profiling of B. subtilis lacking RNase HII, the enzyme that incises at single rNMP residues initiating RER. We show that loss of RER in B. subtilis causes strand- and sequence-context–dependent GC → AT transitions. Using purified proteins, we show that the replicative polymerase DnaE is mutagenic within the sequence context identified in RER-deficient cells. We also found that DnaE does not perform strand displacement synthesis. Given the use of nucleotide excision repair (NER) as a backup pathway for RER in RNase HII-deficient cells and the known mutagenic profile of DnaE, we propose that misincorporated ribonucleotides are removed by NER followed by error-prone resynthesis with DnaE. PMID:29078353

  10. High-Throughput Identification of Loss-of-Function Mutations for Anti-Interferon Activity in the Influenza A Virus NS Segment

    PubMed Central

    Wu, Nicholas C.; Young, Arthur P.; Al-Mawsawi, Laith Q.; Olson, C. Anders; Feng, Jun; Qi, Hangfei; Luan, Harding H.; Li, Xinmin; Wu, Ting-Ting

    2014-01-01

    ABSTRACT Viral proteins often display several functions which require multiple assays to dissect their genetic basis. Here, we describe a systematic approach to screen for loss-of-function mutations that confer a fitness disadvantage under a specified growth condition. Our methodology was achieved by genetically monitoring a mutant library under two growth conditions, with and without interferon, by deep sequencing. We employed a molecular tagging technique to distinguish true mutations from sequencing error. This approach enabled us to identify mutations that were negatively selected against, in addition to those that were positively selected for. Using this technique, we identified loss-of-function mutations in the influenza A virus NS segment that were sensitive to type I interferon in a high-throughput fashion. Mechanistic characterization further showed that a single substitution, D92Y, resulted in the inability of NS to inhibit RIG-I ubiquitination. The approach described in this study can be applied under any specified condition for any virus that can be genetically manipulated. IMPORTANCE Traditional genetics focuses on a single genotype-phenotype relationship, whereas high-throughput genetics permits phenotypic characterization of numerous mutants in parallel. High-throughput genetics often involves monitoring of a mutant library with deep sequencing. However, deep sequencing suffers from a high error rate (∼0.1 to 1%), which is usually higher than the occurrence frequency for individual point mutations within a mutant library. Therefore, only mutations that confer a fitness advantage can be identified with confidence due to an enrichment in the occurrence frequency. In contrast, it is impossible to identify deleterious mutations using most next-generation sequencing techniques. In this study, we have applied a molecular tagging technique to distinguish true mutations from sequencing errors. It enabled us to identify mutations that underwent negative selection, in addition to mutations that experienced positive selection. This study provides a proof of concept by screening for loss-of-function mutations on the influenza A virus NS segment that are involved in its anti-interferon activity. PMID:24965464
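
    The core of the molecular-tagging idea is that reads sharing a tag derive from the same template molecule, so a true mutation recurs across the whole tag family while a sequencing error does not. The sketch below is a minimal consensus-by-tag illustration under that assumption; the function, thresholds, and toy reads are hypothetical and are not the authors' pipeline.

```python
from collections import Counter, defaultdict

def consensus_by_tag(tagged_reads, min_reads=3, min_agreement=0.9):
    """tagged_reads: iterable of (tag, read) pairs; reads with the same tag come
    from the same template and are assumed aligned and equal length.
    Positions where fewer than `min_agreement` of a tag family agree are masked
    with 'N', suppressing random sequencing errors while keeping true mutations
    shared by the whole family."""
    families = defaultdict(list)
    for tag, read in tagged_reads:
        families[tag].append(read)

    consensi = {}
    for tag, reads in families.items():
        if len(reads) < min_reads:
            continue  # too few reads to separate error from mutation
        consensus = []
        for column in zip(*reads):
            base, count = Counter(column).most_common(1)[0]
            consensus.append(base if count / len(column) >= min_agreement else "N")
        consensi[tag] = "".join(consensus)
    return consensi

# Example: a one-off sequencing error is masked; a family-wide base is kept.
reads = [("AAGT", "ACGTAC"), ("AAGT", "ACGTAC"), ("AAGT", "ACGAAC"),
         ("CCTA", "ACTTAC"), ("CCTA", "ACTTAC"), ("CCTA", "ACTTAC")]
print(consensus_by_tag(reads))  # {'AAGT': 'ACGNAC', 'CCTA': 'ACTTAC'}
```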

  11. Error reduction and parameter optimization of the TAPIR method for fast T1 mapping.

    PubMed

    Zaitsev, M; Steinhoff, S; Shah, N J

    2003-06-01

    A methodology is presented for the reduction of both systematic and random errors in T1 determination using TAPIR, a Look-Locker-based fast T1 mapping technique. The relations between various sequence parameters were carefully investigated in order to develop recipes for choosing optimal sequence parameters. Theoretical predictions for the optimal flip angle were verified experimentally. Inversion pulse imperfections were identified as the main source of systematic errors in T1 determination with TAPIR. An effective remedy is demonstrated, in which the measurement protocol is extended with a special sequence for mapping the inversion efficiency itself. Copyright 2003 Wiley-Liss, Inc.
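
    For orientation, the sketch below shows the generic three-parameter Look-Locker fit and the standard apparent-T1 correction that techniques in this family rely on; it is not the TAPIR sequence or its error model, and all names, sampling times, and noise levels are illustrative. Phase-corrected (signed) signal values are assumed.

```python
import numpy as np
from scipy.optimize import curve_fit

def ll_model(t, a, b, t1_star):
    """Three-parameter Look-Locker recovery: S(t) = A - B * exp(-t / T1*)."""
    return a - b * np.exp(-t / t1_star)

def fit_t1(times_ms, signal):
    """Fit the apparent relaxation time T1* and apply the usual Look-Locker
    correction T1 = T1* * (B/A - 1)."""
    p0 = (signal.max(), signal.max() - signal.min(), times_ms.mean())
    (a, b, t1_star), _ = curve_fit(ll_model, times_ms, signal, p0=p0, maxfev=5000)
    return t1_star * (b / a - 1.0)

# Synthetic example: true T1 = 900 ms sampled at 20 time points with noise.
rng = np.random.default_rng(0)
t = np.linspace(50, 3000, 20)
true_a, true_b, true_t1 = 1.0, 1.9, 900.0
t1_star_true = true_t1 / (true_b / true_a - 1.0)
data = ll_model(t, true_a, true_b, t1_star_true) + rng.normal(0, 0.01, t.size)
print(round(fit_t1(t, data)))  # close to 900
```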

  12. An Activation-Based Model of Routine Sequence Errors

    DTIC Science & Technology

    2015-04-01

    part of the ACT-R framework (e.g., Anderson, 1983), we adopt a newer, richer notion of priming as part of our approach (Harrison & Trafton, 2010...2014). Other models of routine sequence errors, such as the interactive activation network (IAN) model (Cooper & Shallice, 2006) and the simple...error patterns that results from an interface layout shift. The ideas behind our expanded priming approach, however, could apply to IAN, which uses

  13. Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions

    PubMed Central

    2014-01-01

    Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920
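
    One of the bioinformatic approaches alluded to above is to ask, per site, whether the observed count of non-reference reads exceeds what the platform's error rate alone would generate. The sketch below is a minimal one-sided binomial test of that idea; the error rate, threshold, and counts are hypothetical, and real variant callers add many refinements (strand bias, quality weighting, haplotype context).

```python
from scipy.stats import binomtest

def call_low_frequency_variant(alt_count, depth, error_rate=0.002, alpha=1e-6):
    """Test whether the non-reference read count at a site is larger than
    expected from sequencing error alone. `error_rate` is the assumed per-base
    miscall rate for the platform; the stringent alpha compensates (crudely)
    for testing every site in the genome."""
    result = binomtest(alt_count, depth, error_rate, alternative="greater")
    return result.pvalue < alpha, result.pvalue

# Example: 25 alternate reads out of 5000 at an assumed 0.2% error rate.
is_variant, p = call_low_frequency_variant(25, 5000)
print(is_variant, p)
```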

  14. Evaluation of Magnetic Resonance Imaging-Compatible Needles and Interactive Sequences for Musculoskeletal Interventions Using an Open High-Field Magnetic Resonance Imaging Scanner

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wonneberger, Uta, E-mail: uta.wonneberger@charite.d; Schnackenburg, Bernhard, E-mail: bernhard.schnackenburg@philips.co; Streitparth, Florian, E-mail: florian.streitparth@charite.de

    2010-04-15

    This article presents an in vitro evaluation of needle artefacts and image quality for musculoskeletal laser interventions in an open high-field magnetic resonance imaging (MRI) scanner at 1.0T with vertical field orientation. Five commercially available MRI-compatible puncture needles were assessed based on artefact characteristics in a CuSO4 phantom (0.1%) and in human cadaveric lumbar spines. First, six different interventional sequences were evaluated with varying needle orientation to the main magnetic field B0 (0° to 90°) in a sequence test. Artefact width, needle-tip error, and contrast-to-noise ratio (CNR) were calculated. Second, a gradient-echo sequence used for thermometric monitoring was assessed and, at varying echo times, artefact width, tip error, and signal-to-noise ratio (SNR) were measured. Artefact width and needle-tip error correlated with needle material, instrument orientation to B0, and sequence type. Fast spin-echo sequences produced the smallest needle artefacts for all needles, except for the carbon fibre needle (width <3.5 mm, tip error <2 mm), at 45° to B0. Overall, the proton density-weighted spin-echo sequences had the best CNR (CNR Muscle/Needle > 16.8). Concerning the thermometric gradient-echo sequence, artefacts remained <5 mm, and the SNR reached its maximum at an echo time of 15 ms. If needle materials and sequences are combined accordingly, guidance and monitoring of musculoskeletal laser interventions may be feasible in a vertical magnetic field at 1.0T.

  15. Surrogate oracles, generalized dependency and simpler models

    NASA Technical Reports Server (NTRS)

    Wilson, Larry

    1990-01-01

    Software reliability models require the sequence of interfailure times from the debugging process as input. Previous work illustrated that using data from replicated debugging could greatly improve reliability predictions. However, inexpensive replication of the debugging process requires the existence of a cheap, fast error detector. Laboratory experiments can be designed around a gold version which is used as an oracle, or around an n-version error detector. Unfortunately, software developers cannot be expected to have an oracle or to bear the expense of n-versions. A generic technique is being investigated for approximating replicated data by using the partially debugged software as a difference detector. It is believed that the failure rate of each fault depends significantly on the presence or absence of other faults. Thus, in order to discuss a failure rate for a known fault, the presence or absence of each of the other known faults needs to be specified. Simpler models that use shorter input sequences without sacrificing accuracy are also of interest; in fact, a possible gain in performance is conjectured. To investigate these propositions, NASA computers running LIC (RTI) versions are used to generate data. These data will be used to label the debugging graph associated with each version. The labeled graphs will be used to test the utility of a surrogate oracle, to analyze the dependent nature of fault failure rates, and to explore the feasibility of reliability models which use the data of only the most recent failures.

  16. Determinants of Base-Pair Substitution Patterns Revealed by Whole-Genome Sequencing of DNA Mismatch Repair Defective Escherichia coli.

    PubMed

    Foster, Patricia L; Niccum, Brittany A; Popodi, Ellen; Townes, Jesse P; Lee, Heewook; MohammedIsmail, Wazim; Tang, Haixu

    2018-06-15

    Mismatch repair (MMR) is a major contributor to replication fidelity, but its impact varies with sequence context and the nature of the mismatch. Mutation accumulation experiments followed by whole-genome sequencing of MMR-defective E. coli strains yielded ≈30,000 base-pair substitutions, revealing mutational patterns across the entire chromosome. The base-pair substitution spectrum was dominated by A:T > G:C transitions, which occurred predominantly at the center base of 5'NAC3'/5'GTN3' triplets. Surprisingly, growth on minimal medium or at low temperature attenuated these mutations. Mononucleotide runs were also hotspots for base-pair substitutions, and the rate at which these occurred increased with run length. Comparison with ≈2000 base-pair substitutions accumulated in MMR-proficient strains revealed that both kinds of hotspots appeared in the wild-type spectrum and so are likely to be sites of frequent replication errors. In MMR-defective strains, transitions were strand-biased, occurring twice as often when A and C rather than T and G were on the lagging-strand template. Loss of nucleotide diphosphate kinase increases the cellular concentration of dCTP, which resulted in increased rates of mutations due to misinsertion of C opposite A and T. In an mmr ndk double mutant strain, these mutations were more frequent when the template A and T were on the leading strand, suggesting that lagging-strand synthesis was more error-prone or less well corrected by proofreading than was leading-strand synthesis. Copyright © 2018, Genetics.

  17. The genomic structure: proof of the role of non-coding DNA.

    PubMed

    Bouaynaya, Nidhal; Schonfeld, Dan

    2006-01-01

    We prove that the introns play the role of a decoy in absorbing mutations, in the same way that hollow, uninhabited structures are used by the military to protect important installations. Our approach is based on a probability of error analysis, where errors are mutations which occur in the exon sequences. We derive the optimal exon length distribution, which minimizes the probability of error in the genome. Furthermore, to understand how Nature can generate the optimal distribution, we propose a diffusive random walk model for exon generation throughout evolution. This model results in an alpha-stable exon length distribution, which is asymptotically equivalent to the optimal distribution. Experimental results show that both distributions accurately fit the real data. Given that introns also drive biological evolution by increasing the rate of unequal crossover between genes, we conclude that the role of introns is to maintain a genius balance between stability and adaptability in eukaryotic genomes.

  18. Intonation and dialog context as constraints for speech recognition.

    PubMed

    Taylor, P; King, S; Isard, S; Wright, H

    1998-01-01

    This paper describes a way of using intonation and dialog context to improve the performance of an automatic speech recognition (ASR) system. Our experiments were run on the DCIEM Maptask corpus, a corpus of spontaneous task-oriented dialog speech. This corpus has been tagged according to a dialog analysis scheme that assigns each utterance to one of 12 "move types," such as "acknowledge," "query-yes/no" or "instruct." Most ASR systems use a bigram language model to constrain the possible sequences of words that might be recognized. Here we use a separate bigram language model for each move type. We show that when the "correct" move-specific language model is used for each utterance in the test set, the word error rate of the recognizer drops. Of course when the recognizer is run on previously unseen data, it cannot know in advance what move type the speaker has just produced. To determine the move type we use an intonation model combined with a dialog model that puts constraints on possible sequences of move types, as well as the speech recognizer likelihoods for the different move-specific models. In the full recognition system, the combination of automatic move type recognition with the move specific language models reduces the overall word error rate by a small but significant amount when compared with a baseline system that does not take intonation or dialog acts into account. Interestingly, the word error improvement is restricted to "initiating" move types, where word recognition is important. In "response" move types, where the important information is conveyed by the move type itself--for example, positive versus negative response--there is no word error improvement, but recognition of the response types themselves is good. The paper discusses the intonation model, the language models, and the dialog model in detail and describes the architecture in which they are combined.
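
    As a rough illustration of the combination described above, the sketch below trains a small add-k-smoothed bigram language model per move type and rescores recognizer hypotheses with the sum of the acoustic score, the move-specific language model score, and a move-type log prior (standing in for the intonation/dialog model). Everything here, including the move labels, smoothing, and toy data, is an assumption for illustration, not the Maptask system.

```python
import math
from collections import defaultdict

def train_bigram(sentences, vocab, smoothing=0.5):
    """Add-k smoothed bigram log-probability function estimated from word lists."""
    counts = defaultdict(lambda: defaultdict(float))
    for words in sentences:
        for prev, word in zip(["<s>"] + words, words + ["</s>"]):
            counts[prev][word] += 1.0
    def logprob(words):
        lp = 0.0
        for prev, word in zip(["<s>"] + words, words + ["</s>"]):
            total = sum(counts[prev].values()) + smoothing * (len(vocab) + 1)
            lp += math.log((counts[prev][word] + smoothing) / total)
        return lp
    return logprob

def rescore(hypotheses, move_lms, move_log_prior):
    """hypotheses: list of (words, acoustic_logprob) from the recognizer.
    Returns the jointly best (move_type, word_string)."""
    best = None
    for move, lm in move_lms.items():
        for words, ac in hypotheses:
            score = ac + lm(words) + move_log_prior[move]
            if best is None or score > best[0]:
                best = (score, move, words)
    return best[1], best[2]

# Toy example with two move types and two recognizer hypotheses.
vocab = {"yes", "no", "go", "left"}
lms = {"reply-y": train_bigram([["yes"], ["yes", "yes"]], vocab),
       "instruct": train_bigram([["go", "left"], ["go"]], vocab)}
prior = {"reply-y": math.log(0.7), "instruct": math.log(0.3)}
hyps = [(["yes"], -2.0), (["go", "left"], -2.2)]
print(rescore(hyps, lms, prior))  # ('reply-y', ['yes'])
```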

  19. Dynamically correcting two-qubit gates against any systematic logical error

    NASA Astrophysics Data System (ADS)

    Calderon Vargas, Fernando Antonio

    The reliability of quantum information processing depends on the ability to deal with noise and error in an efficient way. A significant source of error in many settings is coherent, systematic gate error. This work introduces a set of composite pulse sequences that generate maximally entangling gates and correct all systematic errors within the logical subspace to arbitrary order. These sequences are applicable for any two-qubit interaction Hamiltonian, and make no assumptions about the underlying noise mechanism except that it is constant on the timescale of the operation. The prime use for our results will be in cases where one has limited knowledge of the underlying physical noise and control mechanisms, highly constrained control, or both. In particular, we apply these composite pulse sequences to the quantum system formed by two capacitively coupled singlet-triplet qubits, which is characterized by having constrained control and noise sources that are low frequency and of a non-Markovian nature.

  20. FACS single cell index sorting is highly reliable and determines immune phenotypes of clonally expanded T cells.

    PubMed

    Penter, Livius; Dietze, Kerstin; Bullinger, Lars; Westermann, Jörg; Rahn, Hans-Peter; Hansmann, Leo

    2018-03-14

    FACS index sorting allows the isolation of single cells with retrospective identification of each single cell's high-dimensional immune phenotype. We experimentally determine the error rate of index sorting and combine the technology with T cell receptor sequencing to identify clonal T cell expansion in aplastic anemia bone marrow as an example. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Breaking Lander-Waterman’s Coverage Bound

    PubMed Central

    Nashta-ali, Damoun; Motahari, Seyed Abolfazl; Hosseinkhalaj, Babak

    2016-01-01

    Lander-Waterman’s coverage bound establishes the total number of reads required to cover a whole genome of size G bases. In fact, their bound is a direct consequence of the well-known solution to the coupon collector’s problem, which shows that for such a genome, the total number of bases to be sequenced should be O(G ln G). Although the result leads to a tight bound, it is based on the tacit assumption that the set of reads is first collected through a sequencing process and then processed through a computation process, i.e., there are two different machines: one for sequencing and one for processing. In this paper, we present a significant improvement compared to Lander-Waterman’s result and prove that by combining the sequencing and computing processes, one can re-sequence the whole genome with as few as O(G) sequenced bases in total. Our approach also dramatically reduces the required computational power for the combined process. Simulations are performed on real genomes with different sequencing error rates. The results support our theory predicting the log G improvement on the coverage bound and the corresponding reduction in the total number of bases required to be sequenced. PMID:27806058
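
    The coupon-collector behaviour behind the classical bound is easy to reproduce by simulation: draw uniform random read positions on a circular genome until every base is covered. The sketch below does exactly that; the genome size, read length, and comparison line are illustrative and are not the paper's combined sequencing-and-computing scheme.

```python
import numpy as np

def reads_to_cover(genome_len, read_len, rng=None):
    """Draw uniformly random read start positions on a circular genome until
    every base is covered; return the number of reads that was needed."""
    rng = np.random.default_rng(rng)
    covered = np.zeros(genome_len, dtype=bool)
    n_reads = 0
    while not covered.all():
        start = rng.integers(genome_len)
        covered[(start + np.arange(read_len)) % genome_len] = True
        n_reads += 1
    return n_reads

# With G = 100 kb and L = 100 bp, full coverage typically needs on the order of
# (G / L) * ln(G) reads, i.e. O(G ln G) sequenced bases, matching the bound.
G, L = 100_000, 100
n = reads_to_cover(G, L, rng=1)
print(n, n * L, int(G / L * np.log(G)))
```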

  2. Atropos: specific, sensitive, and speedy trimming of sequencing reads.

    PubMed

    Didion, John P; Martin, Marcel; Collins, Francis S

    2017-01-01

    A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos makes it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos.

  3. Atropos: specific, sensitive, and speedy trimming of sequencing reads

    PubMed Central

    Collins, Francis S.

    2017-01-01

    A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos makes it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos. PMID:28875074

  4. Interuser Interference Analysis for Direct-Sequence Spread-Spectrum Systems Part I: Partial-Period Cross-Correlation

    NASA Technical Reports Server (NTRS)

    Ni, Jianjun (David)

    2012-01-01

    This presentation discusses an analysis approach to evaluate the interuser interference for Direct-Sequence Spread-Spectrum (DSSS) Systems for Space Network (SN) Users. Part I of this analysis shows that the correlation property of pseudo-noise (PN) sequences is the critical factor which determines the interuser interference performance of the DSSS system. For non-standard DSSS systems in which the PN sequence's period is much larger than one data symbol duration, it is the partial-period cross-correlation that determines the system performance. This study reveals through an example that a well-designed PN sequence set (e.g. a Gold sequence set, in which the cross-correlation over a whole period is well controlled) may have uncontrolled partial-period cross-correlation, which could cause severe interuser interference for a DSSS system. Since the analytical derivation of performance metrics (bit error rate or signal-to-noise ratio) based on partial-period cross-correlation is prohibitive, the performance degradation due to partial-period cross-correlation will be evaluated using simulation in Part II of this analysis in the future.
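
    To make the whole-period versus partial-period distinction concrete, the sketch below generates two length-31 m-sequences with simple LFSRs and compares the full-period cross-correlation with the cross-correlation computed over a short window (one data-symbol duration) at every code offset. The tap choices, window length, and normalization are illustrative assumptions, not the sequences analyzed in the presentation, and the two codes are not a designed preferred pair.

```python
import numpy as np

def lfsr_msequence(taps, length):
    """Generate a +/-1 m-sequence from a Fibonacci LFSR with the given taps."""
    state = [1] * max(taps)
    out = []
    for _ in range(length):
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]
    return 1 - 2 * np.array(out)   # map {0,1} -> {+1,-1}

def partial_period_crosscorrelation(a, b, window):
    """Normalized cross-correlation over the first `window` chips (one symbol)
    for every relative code offset; a fuller analysis would also slide the
    window start across the period."""
    n = len(a)
    return np.array([np.dot(a[:window], np.roll(b, shift)[:window]) / window
                     for shift in range(n)])

a = lfsr_msequence([5, 3], 31)
b = lfsr_msequence([5, 2], 31)
pp = partial_period_crosscorrelation(a, b, window=8)
full = np.dot(a, b) / 31
# Partial-period extremes are typically much larger in magnitude than the
# full-period value, which is the effect discussed above.
print(full, pp.min(), pp.max())
```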

  5. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

    PubMed

    Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos; Read, Timothy; Konstantinidis, Konstantinos T

    2012-01-01

    Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities, but whether different NGS platforms recover the same diversity from a sample, and whether their assembled sequences are of comparable quality, remains unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R(2)>0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.

  6. Research on wind field algorithm of wind lidar based on BP neural network and grey prediction

    NASA Astrophysics Data System (ADS)

    Chen, Yong; Chen, Chun-Li; Luo, Xiong; Zhang, Yan; Yang, Ze-hou; Zhou, Jie; Shi, Xiao-ding; Wang, Lei

    2018-01-01

    This paper uses a BP neural network and a grey algorithm to forecast and study the radar wind field. To reduce the residual error of the grey-algorithm wind field prediction, the residual error function is minimized: a BP neural network is trained on the residuals of the grey algorithm, the trained network model is used to forecast the residual sequence, and the predicted residual sequence is then used to correct the forecast sequence of the grey algorithm. The test data show that the grey algorithm modified by the BP neural network can effectively reduce the residual values and improve the prediction precision.
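
    A minimal sketch of this residual-corrected grey prediction scheme is given below: a classical GM(1,1) grey model produces the base forecast, a small MLP (standing in for the BP network) is fitted to the residual sequence, and the predicted residuals correct the grey forecast. The series, lag structure, and network size are made-up assumptions, not the authors' configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def gm11_fit_predict(x0, n_forecast):
    """Classical GM(1,1) grey model: fit the positive series x0 and return
    fitted values for the training range plus n_forecast steps ahead."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)
    z1 = 0.5 * (x1[1:] + x1[:-1])                 # background values
    B = np.column_stack([-z1, np.ones_like(z1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
    k = np.arange(len(x0) + n_forecast)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    return np.concatenate([[x0[0]], np.diff(x1_hat)])

def grey_bp_forecast(series, n_forecast, lags=3, seed=0):
    """Grey forecast corrected by a small network trained on the residuals."""
    series = np.asarray(series, dtype=float)
    grey = gm11_fit_predict(series, n_forecast)
    resid = series - grey[:len(series)]
    X = np.array([resid[i:i + lags] for i in range(len(resid) - lags)])
    y = resid[lags:]
    net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                       random_state=seed).fit(X, y)
    history = list(resid)
    for _ in range(n_forecast):                   # roll the residual forecast forward
        history.append(net.predict(np.array([history[-lags:]]))[0])
    return grey + np.array(history)

# Illustrative radial wind speed series (m/s); the values are made up.
wind = [5.1, 5.4, 5.9, 6.3, 6.2, 6.8, 7.1, 7.5, 7.4, 7.9, 8.3, 8.6]
print(grey_bp_forecast(wind, n_forecast=3).round(2))
```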

  7. Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

    PubMed Central

    Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim

    2007-01-01

    While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434

  8. How good are indirect tests at detecting recombination in human mtDNA?

    PubMed

    White, Daniel James; Bryant, David; Gemmell, Neil John

    2013-07-08

    Empirical proof of human mitochondrial DNA (mtDNA) recombination in somatic tissues was obtained in 2004; however, a lack of irrefutable evidence exists for recombination in human mtDNA at the population level. Our inability to demonstrate convincingly a signal of recombination in population data sets of human mtDNA sequence may be due, in part, to the ineffectiveness of current indirect tests. Previously, we tested some well-established indirect tests of recombination (linkage disequilibrium vs. distance using D' and r(2), Homoplasy Test, Pairwise Homoplasy Index, Neighborhood Similarity Score, and Max χ(2)) on sequence data derived from the only empirically confirmed case of human mtDNA recombination thus far and demonstrated that some methods were unable to detect recombination. Here, we assess the performance of these six well-established tests and explore what characteristics specific to human mtDNA sequence may affect their efficacy by simulating sequence under various parameters with levels of recombination (ρ) that vary around an empirically derived estimate for human mtDNA (population parameter ρ = 5.492). No test performed infallibly under any of our scenarios, and error rates varied across tests, whereas detection rates increased substantially with ρ values > 5.492. Under a model of evolution that incorporates parameters specific to human mtDNA, including rate heterogeneity, population expansion, and ρ = 5.492, successful detection rates are limited to a range of 7-70% across tests with an acceptable level of false-positive results: the neighborhood similarity score incompatibility test performed best overall under these parameters. Population growth seems to have the greatest impact on recombination detection probabilities across all models tested, likely due to its impact on sequence diversity. The implications of our findings on our current understanding of mtDNA recombination in humans are discussed.

  9. How Good Are Indirect Tests at Detecting Recombination in Human mtDNA?

    PubMed Central

    White, Daniel James; Bryant, David; Gemmell, Neil John

    2013-01-01

    Empirical proof of human mitochondrial DNA (mtDNA) recombination in somatic tissues was obtained in 2004; however, a lack of irrefutable evidence exists for recombination in human mtDNA at the population level. Our inability to demonstrate convincingly a signal of recombination in population data sets of human mtDNA sequence may be due, in part, to the ineffectiveness of current indirect tests. Previously, we tested some well-established indirect tests of recombination (linkage disequilibrium vs. distance using D′ and r2, Homoplasy Test, Pairwise Homoplasy Index, Neighborhood Similarity Score, and Max χ2) on sequence data derived from the only empirically confirmed case of human mtDNA recombination thus far and demonstrated that some methods were unable to detect recombination. Here, we assess the performance of these six well-established tests and explore what characteristics specific to human mtDNA sequence may affect their efficacy by simulating sequence under various parameters with levels of recombination (ρ) that vary around an empirically derived estimate for human mtDNA (population parameter ρ = 5.492). No test performed infallibly under any of our scenarios, and error rates varied across tests, whereas detection rates increased substantially with ρ values > 5.492. Under a model of evolution that incorporates parameters specific to human mtDNA, including rate heterogeneity, population expansion, and ρ = 5.492, successful detection rates are limited to a range of 7−70% across tests with an acceptable level of false-positive results: the neighborhood similarity score incompatibility test performed best overall under these parameters. Population growth seems to have the greatest impact on recombination detection probabilities across all models tested, likely due to its impact on sequence diversity. The implications of our findings on our current understanding of mtDNA recombination in humans are discussed. PMID:23665874

  10. Wideband Arrhythmia-Insensitive-Rapid (AIR) Pulse Sequence for Cardiac T1 mapping without Image Artifacts induced by ICD

    PubMed Central

    Hong, KyungPyo; Jeong, Eun-Kee; Wall, T. Scott; Drakos, Stavros G.; Kim, Daniel

    2015-01-01

    Purpose To develop and evaluate a wideband arrhythmia-insensitive-rapid (AIR) pulse sequence for cardiac T1 mapping without image artifacts induced by implantable-cardioverter-defibrillator (ICD). Methods We developed a wideband AIR pulse sequence by incorporating a saturation pulse with wide frequency bandwidth (8.9 kHz), in order to achieve uniform T1 weighting in the heart with ICD. We tested the performance of original and “wideband” AIR cardiac T1 mapping pulse sequences in phantom and human experiments at 1.5T. Results In 5 phantoms representing native myocardium and blood and post-contrast blood/tissue T1 values, compared with the control T1 values measured with an inversion-recovery pulse sequence without ICD, T1 values measured with original AIR with ICD were considerably lower (absolute percent error >29%), whereas T1 values measured with wideband AIR with ICD were similar (absolute percent error <5%). Similarly, in 11 human subjects, compared with the control T1 values measured with original AIR without ICD, T1 measured with original AIR with ICD was significantly lower (absolute percent error >10.1%), whereas T1 measured with wideband AIR with ICD was similar (absolute percent error <2.0%). Conclusion This study demonstrates the feasibility of a wideband pulse sequence for cardiac T1 mapping without significant image artifacts induced by ICD. PMID:25975192

  11. Turtle: identifying frequent k-mers with cache-efficient algorithms.

    PubMed

    Roy, Rajat Shuvro; Bhattacharya, Debashish; Schliep, Alexander

    2014-07-15

    Counting the frequencies of k-mers in read libraries is often a first step in the analysis of high-throughput sequencing data. Infrequent k-mers are assumed to be a result of sequencing errors. The frequent k-mers constitute a reduced but error-free representation of the experiment, which can inform read error correction or serve as the input to de novo assembly methods. Ideally, the memory requirement for counting should be linear in the number of frequent k-mers and not in the, typically much larger, total number of k-mers in the read library. We present a novel method that balances time, space and accuracy requirements to efficiently extract frequent k-mers even for high-coverage libraries and large genomes such as human. Our method is designed to minimize cache misses by using a pattern-blocked Bloom filter to remove infrequent k-mers from consideration, in combination with a novel sort-and-compact scheme, instead of a hash, for the actual counting. Although this increases theoretical complexity, the savings in cache misses reduce the empirical running times. A variant of the method can resort to a counting Bloom filter for even larger savings in memory, at the expense of false-negative rates in addition to the false-positive rates common to all Bloom filter-based approaches. A comparison with the state-of-the-art shows reduced memory requirements and running times. The tools are freely available for download at http://bioinformatics.rutgers.edu/Software/Turtle and http://figshare.com/articles/Turtle/791582. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
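
    The general trick of not spending memory on once-seen (likely erroneous) k-mers can be illustrated with a plain Bloom filter: the first occurrence of a k-mer only sets filter bits, and exact counts are kept only for k-mers the filter has already seen. This is a simplified sketch of the idea, not Turtle's cache-blocked filter or its sort-and-compact counter; the filter size, hashing, and toy reads are assumptions.

```python
import hashlib
from collections import Counter

class BloomFilter:
    """Small, illustrative Bloom filter (not pattern-blocked like Turtle's)."""
    def __init__(self, n_bits=1 << 20, n_hashes=3):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.n_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def count_frequent_kmers(reads, k):
    """Count only k-mers seen more than once: the first occurrence goes into the
    Bloom filter, and exact counts start only when the filter reports the k-mer
    as already seen (so most error k-mers never enter the count table)."""
    seen_once = BloomFilter()
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
            elif kmer in seen_once:
                counts[kmer] = 2          # first + current occurrence
            else:
                seen_once.add(kmer)
    return counts

reads = ["ACGTACGTGA", "ACGTACGTGA", "ACGTACCTGA"]  # last read has one error
print(count_frequent_kmers(reads, k=5))  # error-containing k-mers are absent
```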

  12. 2-Step Maximum Likelihood Channel Estimation for Multicode DS-CDMA with Frequency-Domain Equalization

    NASA Astrophysics Data System (ADS)

    Kojima, Yohei; Takeda, Kazuaki; Adachi, Fumiyuki

    Frequency-domain equalization (FDE) based on the minimum mean square error (MMSE) criterion can provide better downlink bit error rate (BER) performance of direct sequence code division multiple access (DS-CDMA) than the conventional rake combining in a frequency-selective fading channel. FDE requires accurate channel estimation. In this paper, we propose a new 2-step maximum likelihood channel estimation (MLCE) for DS-CDMA with FDE in a very slow frequency-selective fading environment. The 1st step uses the conventional pilot-assisted MMSE-CE and the 2nd step carries out the MLCE using decision feedback from the 1st step. The BER performance improvement achieved by 2-step MLCE over pilot assisted MMSE-CE is confirmed by computer simulation.
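
    For context, the sketch below shows generic per-bin MMSE frequency-domain equalization of a single-carrier block with a known channel and known Es/N0; the paper's contribution is precisely the two-step maximum likelihood estimation of that channel, which is not reproduced here. The block length, channel taps, and SNR are illustrative, and a cyclic prefix is assumed so the channel acts as a circular convolution.

```python
import numpy as np

def mmse_fde(received_block, channel_freq_response, es_n0_linear):
    """Per-bin MMSE frequency-domain equalization:
    W(k) = conj(H(k)) / (|H(k)|^2 + 1/(Es/N0)), applied in the frequency domain."""
    R = np.fft.fft(received_block)
    W = np.conj(channel_freq_response) / (
        np.abs(channel_freq_response) ** 2 + 1.0 / es_n0_linear)
    return np.fft.ifft(W * R)

# Toy QPSK block over a known 3-tap frequency-selective channel.
rng = np.random.default_rng(0)
n, es_n0_db = 64, 20
symbols = (rng.choice([-1, 1], n) + 1j * rng.choice([-1, 1], n)) / np.sqrt(2)
h = np.array([0.8, 0.5, 0.3]); h = h / np.linalg.norm(h)
H = np.fft.fft(h, n)
noise_std = np.sqrt(10 ** (-es_n0_db / 10) / 2)
received = np.fft.ifft(H * np.fft.fft(symbols))
received = received + rng.normal(0, noise_std, n) + 1j * rng.normal(0, noise_std, n)
equalized = mmse_fde(received, H, 10 ** (es_n0_db / 10))
print(np.mean(np.abs(equalized - symbols) ** 2))   # small residual error
```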

  13. [Memorization of Sequences of Movements of the Right and the Left Hand by Right- and Left-Handers].

    PubMed

    Bobrova, E V; Bogacheva, I N; Lyakhovetskii, V A; Fabinskaja, A A; Fomina, E V

    2015-01-01

    We analyzed the errors of right- and left-handers when performing memorized sequences with the left or the right hand in a task that activates positional coding: after 6-10 repetitions the order of movements changed (the positions remained the same throughout the task). The task was first performed by one ("initial") hand and then by the other ("continuing") hand; there were 2 groups of right-handers and 2 groups of left-handers. It was found that the pattern of errors during task performance by the initial hand is similar in right- and left-handers, both for the dominant and the non-dominant hand. The information about the previous positions after changing the order of elements is used in the sequences for subdominant hands and not used in the sequences for dominant ones. After changing the hand, right- and left-handers show different ("non-symmetrical") patterns of errors. Thus, the errors of right- and left-handers are "symmetrical" at the early stages of task performance, while the transfer of this motor skill in right- and left-handers occurs in different ways.

  14. The Representation of Prediction Error in Auditory Cortex

    PubMed Central

    Rubin, Jonathan; Ulanovsky, Nachum; Tishby, Naftali

    2016-01-01

    To survive, organisms must extract information from the past that is relevant for their future. How this process is expressed at the neural level remains unclear. We address this problem by developing a novel approach from first principles. We show here how to generate low-complexity representations of the past that produce optimal predictions of future events. We then illustrate this framework by studying the coding of ‘oddball’ sequences in auditory cortex. We find that for many neurons in primary auditory cortex, trial-by-trial fluctuations of neuronal responses correlate with the theoretical prediction error calculated from the short-term past of the stimulation sequence, under constraints on the complexity of the representation of this past sequence. In some neurons, the effect of prediction error accounted for more than 50% of response variability. Reliable predictions often depended on a representation of the sequence of the last ten or more stimuli, although the representation kept only few details of that sequence. PMID:27490251

  15. Error correcting code with chip kill capability and power saving enhancement

    DOEpatents

    Gara, Alan G [Mount Kisco, NY; Chen, Dong [Croton On Husdon, NY; Coteus, Paul W [Yorktown Heights, NY; Flynn, William T [Rochester, MN; Marcella, James A [Rochester, MN; Takken, Todd [Brewster, NY; Trager, Barry M [Yorktown Heights, NY; Winograd, Shmuel [Scarsdale, NY

    2011-08-30

    A method and system are disclosed for detecting memory chip failure in a computer memory system. The method comprises the steps of accessing user data from a set of user data chips, and testing the user data for errors using data from a set of system data chips. This testing is done by generating a sequence of check symbols from the user data, grouping the user data into a sequence of data symbols, and computing a specified sequence of syndromes. If all the syndromes are zero, the user data has no errors. If one of the syndromes is non-zero, then a set of discriminator expressions are computed, and used to determine whether a single or double symbol error has occurred. In the preferred embodiment, less than two full system data chips are used for testing and correcting the user data.
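
    The "all syndromes zero means no error; a nonzero syndrome locates the error" logic can be illustrated with a far simpler code than the patented chipkill scheme. The sketch below uses a binary Hamming(7,4) parity-check matrix whose columns are the numbers 1..7, so a nonzero syndrome directly names the position of a single-bit error; the matrix, codeword, and correction rule are textbook material, not the patent's symbol-level code or discriminator expressions.

```python
import numpy as np

# Parity-check matrix of the Hamming(7,4) code; column j is the binary
# representation of j, so the syndrome of a single-bit error equals j.
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

def syndrome(codeword):
    return (H @ codeword) % 2

def correct_single_error(codeword):
    """If all syndrome bits are zero the word is accepted; otherwise the
    syndrome is read as the 1-based position of a single-bit error."""
    s = syndrome(codeword)
    if not s.any():
        return codeword, None
    pos = int("".join(map(str, s)), 2) - 1
    corrected = codeword.copy()
    corrected[pos] ^= 1
    return corrected, pos

word = np.array([1, 0, 1, 1, 0, 1, 0])
assert not syndrome(word).any()            # a valid codeword
received = word.copy(); received[4] ^= 1   # flip one bit in "memory"
fixed, pos = correct_single_error(received)
print(pos, np.array_equal(fixed, word))    # 4 True
```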

  16. Ultra-Deep Sequencing Analysis of the Hepatitis A Virus 5'-Untranslated Region among Cases of the Same Outbreak from a Single Source

    PubMed Central

    Wu, Shuang; Nakamoto, Shingo; Kanda, Tatsuo; Jiang, Xia; Nakamura, Masato; Miyamura, Tatsuo; Shirasawa, Hiroshi; Sugiura, Nobuyuki; Takahashi-Nakaguchi, Azusa; Gonoi, Tohru; Yokosuka, Osamu

    2014-01-01

    Hepatitis A virus (HAV) is a causative agent of acute viral hepatitis for which an effective vaccine has been developed. Here we describe ultra-deep pyrosequences (UDPSs) of the HAV 5'-untranslated region (5'UTR) among cases of the same outbreak, which arose from a single source associated with a revolving sushi bar. We determined the reference sequence from an HAV-derived clone from an attendant by the Sanger method. Sixteen UDPSs from this outbreak and one from another, sporadic case were compared with this reference. Nucleotide errors yielded a UDPS error rate of <1%. This study confirmed that nucleotide substitutions in this region are transition mutations in outbreak cases, that insertions were observed only in non-severe cases, and that these nucleotide substitutions differed from those of the sporadic case. Analysis of UDPSs detected low-prevalence HAV variations in the 5'UTR, but no specific mutations associated with severity in these outbreak cases. To our surprise, HAV strains in this outbreak conserved the HAV IRES sequence even in the UDPS analysis. UDPS analysis of the HAV 5'UTR thus revealed no association between the disease severity of hepatitis A and 5'UTR substitutions. Ultra-deep sequencing of the full-length HAV genome may reveal as-yet-unknown genomic determinants associated with disease severity; further studies will be needed. PMID:24396287

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Allen, J; Velsko, S

    This report explores the question of whether meaningful conclusions can be drawn regarding the transmission relationship between two microbial samples on the basis of differences observed between the two samples' respective genomes. Unlike similar forensic applications using human DNA, the rapid rate of microbial genome evolution combined with the dynamics of infectious disease requires a shift in thinking on what it means for two samples to 'match' in support of a forensic hypothesis. Previous outbreaks of SARS-CoV, FMDV and HIV were examined to investigate how microbial sequence data can be used to draw inferences that link two infected individuals by direct transmission. The results are counterintuitive with respect to human DNA forensic applications, in that some genetic change, rather than exact matching, improves confidence in inferring direct transmission links; however, too much genetic change poses challenges, which can weaken confidence in inferred links. High rates of infection coupled with the relatively weak selective pressure observed in the SARS-CoV and FMDV data led to fairly low confidence for direct transmission links. Confidence values for forensic hypotheses increased when testing for the possibility that samples are separated by at most a few intermediate hosts. Moreover, the observed outbreak conditions support the potential to provide high confidence values for hypotheses that exclude direct transmission links. Transmission inferences are based on the total number of observed or inferred genetic changes separating two sequences rather than uniquely weighing the importance of any one genetic mismatch. Thus, inferences are surprisingly robust in the presence of sequencing errors, provided the error rates are randomly distributed across all samples in the reference outbreak database and the novel sequence samples in question. When the number of observed nucleotide mutations is limited due to characteristics of the outbreak or the availability of only partial rather than whole genome sequencing, indel information was shown to have the potential to improve performance, but only for select outbreak conditions. In the examined HIV transmission cases, extended evolution proved to be the limiting factor in assigning high confidence to transmission links; however, the potential to correct for extended evolution not associated with transmission events is demonstrated. Outbreak-specific conditions, such as selective pressure (in the form of varying mutation rate), are shown to impact the strength of the inferences made, and a Monte Carlo simulation tool is introduced, which is used to provide upper and lower bounds on the confidence values associated with a forensic hypothesis.

  18. SeqLib: a C++ API for rapid BAM manipulation, sequence alignment and sequence assembly

    PubMed Central

    Wala, Jeremiah; Beroukhim, Rameen

    2017-01-01

    We present SeqLib, a C++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment. Availability and Implementation: SeqLib is available on Linux and OSX for the C++98 standard and later at github.com/walaj/SeqLib. SeqLib is released under the Apache2 license. Additional capabilities for BLAT alignment are available under the BLAT license. Contact: jwala@broadinstitue.org; rameen@broadinstitute.org PMID:28011768

  19. SeqLib: a C++ API for rapid BAM manipulation, sequence alignment and sequence assembly.

    PubMed

    Wala, Jeremiah; Beroukhim, Rameen

    2017-03-01

    We present SeqLib, a C++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment. SeqLib is available on Linux and OSX for the C++98 standard and later at github.com/walaj/SeqLib. SeqLib is released under the Apache2 license. Additional capabilities for BLAT alignment are available under the BLAT license. jwala@broadinstitue.org ; rameen@broadinstitute.org. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  20. Prediction of protein tertiary structure from sequences using a very large back-propagation neural network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, X.; Wilcox, G.L.

    1993-12-31

    We have implemented large-scale back-propagation neural networks on a 544-node Connection Machine CM-5, using the C language in MIMD mode. The program running on 512 processors performs back-propagation learning at 0.53 Gflops, which provides 76 million connection updates per second. We have applied the network to the prediction of protein tertiary structure from sequence information alone. A neural network with one hidden layer and 40 million connections is trained to learn the relationship between sequence and tertiary structure. The trained network yields predicted structures of some proteins on which it has not been trained, given only their sequences. Presentation of the Fourier transform of the sequences accentuates periodicity in the sequence and yields good generalization with greatly increased training efficiency. Training simulations with a large, heterologous set of protein structures (111 proteins) converge to solutions with under 2% RMS residual error within the training set (random responses give an RMS error of about 20%). Presentation of 15 sequences of related proteins in a testing set of 24 proteins yields predicted structures with less than 8% RMS residual error, indicating good apparent generalization.

  1. Significantly improved precision of cell migration analysis in time-lapse video microscopy through use of a fully automated tracking system

    PubMed Central

    2010-01-01

    Background Cell motility is a critical parameter in many physiological as well as pathophysiological processes. In time-lapse video microscopy, manual cell tracking remains the most common method of analyzing migratory behavior of cell populations. In addition to being labor-intensive, this method is susceptible to user-dependent errors regarding the selection of "representative" subsets of cells and manual determination of precise cell positions. Results We have quantitatively analyzed these error sources, demonstrating that manual cell tracking of pancreatic cancer cells led to miscalculation of migration rates by up to 410%. In order to provide objective measurements of cell migration rates, we have employed multi-target tracking technologies commonly used in radar applications to develop a fully automated cell identification and tracking system suitable for high-throughput screening of video sequences of unstained living cells. Conclusion We demonstrate that our automatic multi-target tracking system identifies cell objects, follows individual cells and computes migration rates with high precision, clearly outperforming manual procedures. PMID:20377897

  2. DNA assembly with error correction on a droplet digital microfluidics platform.

    PubMed

    Khilko, Yuliya; Weyman, Philip D; Glass, John I; Adams, Mark D; McNeil, Melanie A; Griffin, Peter B

    2018-06-01

    Custom synthesized DNA is in high demand for synthetic biology applications. However, current technologies to produce these sequences using assembly from DNA oligonucleotides are costly and labor-intensive. The automation and reduced sample volumes afforded by microfluidic technologies could significantly decrease materials and labor costs associated with DNA synthesis. The purpose of this study was to develop a gene assembly protocol utilizing a digital microfluidic device. Toward this goal, we adapted bench-scale oligonucleotide assembly methods followed by enzymatic error correction to the Mondrian™ digital microfluidic platform. We optimized Gibson assembly, polymerase chain reaction (PCR), and enzymatic error correction reactions in a single protocol to assemble 12 oligonucleotides into a 339-bp double-stranded DNA sequence encoding part of the human influenza virus hemagglutinin (HA) gene. The reactions were scaled down to 0.6-1.2 μL. Initial microfluidic assembly methods were successful and had an error frequency of approximately 4 errors/kb, with errors originating from the original oligonucleotide synthesis. Relative to conventional benchtop procedures, PCR optimization required additional amounts of MgCl2, Phusion polymerase, and PEG 8000 to achieve amplification of the assembly and error correction products. After one round of error correction, error frequency was reduced to an average of 1.8 errors/kb. We demonstrated that DNA assembly from oligonucleotides and error correction could be completely automated on a digital microfluidic (DMF) platform. The results demonstrate that enzymatic reactions in droplets show a strong dependence on surface interactions, and successful on-chip implementation required supplementation with surfactants, molecular crowding agents, and an excess of enzyme. Enzymatic error correction of assembled fragments improved sequence fidelity by 2-fold, which was a significant improvement but somewhat lower than expected compared to bench-top assays, suggesting an additional capacity for optimization.

  3. The role of consolidation in learning context-dependent phonotactic patterns in speech and digital sequence production.

    PubMed

    Anderson, Nathaniel D; Dell, Gary S

    2018-04-03

    Speakers implicitly learn novel phonotactic patterns by producing strings of syllables. The learning is revealed in their speech errors. First-order patterns, such as "/f/ must be a syllable onset," can be distinguished from contingent, or second-order, patterns, such as "/f/ must be an onset if the vowel is /a/, but a coda if the vowel is /o/." A meta-analysis of 19 experiments clearly demonstrated that first-order patterns affect speech errors to a very great extent in a single experimental session, but second-order vowel-contingent patterns only affect errors on the second day of testing, suggesting the need for a consolidation period. Two experiments tested an analogue to these studies involving sequences of button pushes, with fingers as "consonants" and thumbs as "vowels." The button-push errors revealed two of the key speech-error findings: first-order patterns are learned quickly, but second-order thumb-contingent patterns are only strongly revealed in the errors on the second day of testing. The influence of computational complexity on the implicit learning of phonotactic patterns in speech production may be a general feature of sequence production.

  4. Simulated rRNA/DNA Ratios Show Potential To Misclassify Active Populations as Dormant

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Steven, Blaire; Hesse, Cedar; Soghigian, John

    The use of rRNA/DNA ratios derived from surveys of rRNA sequences in RNA and DNA extracts is an appealing but poorly validated approach to infer the activity status of environmental microbes. To improve the interpretation of rRNA/DNA ratios, we performed simulations to investigate the effects of community structure, rRNA amplification, and sampling depth on the accuracy of rRNA/DNA ratios in classifying bacterial populations as “active” or “dormant.” Community structure was an insignificant factor. In contrast, the extent of rRNA amplification that occurs as cells transition from dormant to growing had a significant effect (P < 0.0001) on classification accuracy, with misclassification errors ranging from 16 to 28%, depending on the rRNA amplification model. The error rate increased to 47% when communities included a mixture of rRNA amplification models, but most of the inflated error was false negatives (i.e., active populations misclassified as dormant). Sampling depth also affected error rates (P < 0.001). Inadequate sampling depth produced various artifacts that are characteristic of rRNA/DNA ratios generated from real communities. These data show important constraints on the use of rRNA/DNA ratios to infer activity status. Whereas classification of populations as active based on rRNA/DNA ratios appears generally valid, classification of populations as dormant is potentially far less accurate.

  5. Simulated rRNA/DNA Ratios Show Potential To Misclassify Active Populations as Dormant

    DOE PAGES

    Steven, Blaire; Hesse, Cedar; Soghigian, John; ...

    2017-03-31

    The use of rRNA/DNA ratios derived from surveys of rRNA sequences in RNA and DNA extracts is an appealing but poorly validated approach to infer the activity status of environmental microbes. To improve the interpretation of rRNA/DNA ratios, we performed simulations to investigate the effects of community structure, rRNA amplification, and sampling depth on the accuracy of rRNA/DNA ratios in classifying bacterial populations as “active” or “dormant.” Community structure was an insignificant factor. In contrast, the extent of rRNA amplification that occurs as cells transition from dormant to growing had a significant effect (P < 0.0001) on classification accuracy, with misclassification errors ranging from 16 to 28%, depending on the rRNA amplification model. The error rate increased to 47% when communities included a mixture of rRNA amplification models, but most of the inflated error was false negatives (i.e., active populations misclassified as dormant). Sampling depth also affected error rates (P < 0.001). Inadequate sampling depth produced various artifacts that are characteristic of rRNA/DNA ratios generated from real communities. These data show important constraints on the use of rRNA/DNA ratios to infer activity status. Whereas classification of populations as active based on rRNA/DNA ratios appears generally valid, classification of populations as dormant is potentially far less accurate.

  6. A family-based probabilistic method for capturing de novo mutations from high-throughput short-read sequencing data.

    PubMed

    Cartwright, Reed A; Hussin, Julie; Keebler, Jonathan E M; Stone, Eric A; Awadalla, Philip

    2012-01-06

    Recent advances in high-throughput DNA sequencing technologies and associated statistical analyses have enabled in-depth analysis of whole-genome sequences. As this technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes. The presence of a de novo mutation within the pedigree is indicated by a violation of Mendelian inheritance laws. Here, we present a method for probabilistically inferring genotypes across a pedigree using high-throughput sequencing data and producing the posterior probability of de novo mutation at each genomic site examined. This framework can be used to disentangle the effects of germline and somatic mutational processes and to simultaneously estimate the effect of sequencing error and the initial genetic variation in the population from which the founders of the pedigree arise. This approach is examined in detail through simulations and areas for method improvement are noted. By applying this method to data from members of a well-defined nuclear family with accurate pedigree information, the stage is set to make the most direct estimates of the human mutation rate to date.
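
    A heavily simplified sketch of the general idea, for a single biallelic site in a mother-father-child trio: genotype likelihoods come from a binomial read-count model, parental genotype priors from Hardy-Weinberg equilibrium, and the posterior probability of de novo mutation is the share of the total likelihood carried by transmission histories that include a mutation. The error rate, mutation rate, allele frequency and function names are assumptions for illustration; the published framework is considerably richer (germline versus somatic processes, full pedigrees, joint estimation of error and population variation).

```python
import numpy as np
from scipy.stats import binom

ERR, MU, AF = 0.01, 1e-8, 0.2   # sequencing error, per-allele mutation rate, population alt-allele frequency (assumed)

def genotype_likelihoods(alt, depth):
    """P(read data | genotype) for genotypes with 0, 1 or 2 alt alleles (binomial error model)."""
    p_alt = np.array([ERR, 0.5, 1.0 - ERR])
    return binom.pmf(alt, depth, p_alt)

def hwe_prior(af):
    """Hardy-Weinberg genotype prior for the founders."""
    return np.array([(1 - af) ** 2, 2 * af * (1 - af), af ** 2])

def p_transmit_alt(g):
    """Probability that a parent with genotype g transmits the alt allele (before mutation)."""
    return g / 2.0

def dnm_posterior(mom, dad, kid):
    """mom, dad, kid are (alt_read_count, depth) pairs; returns P(de novo mutation | data)."""
    lik_m, lik_d, lik_k = (genotype_likelihoods(*x) for x in (mom, dad, kid))
    prior = hwe_prior(AF)
    p_dnm = p_total = 0.0
    for gm in range(3):
        for gd in range(3):
            w_parents = prior[gm] * lik_m[gm] * prior[gd] * lik_d[gd]
            for a_m in (0, 1):                  # allele each parent transmits before mutation
                for a_d in (0, 1):
                    p_trans = (p_transmit_alt(gm) if a_m else 1 - p_transmit_alt(gm)) * \
                              (p_transmit_alt(gd) if a_d else 1 - p_transmit_alt(gd))
                    for mut_m in (0, 1):        # whether each transmitted allele mutated
                        for mut_d in (0, 1):
                            p_mut = (MU if mut_m else 1 - MU) * (MU if mut_d else 1 - MU)
                            g_kid = (a_m ^ mut_m) + (a_d ^ mut_d)
                            w = w_parents * p_trans * p_mut * lik_k[g_kid]
                            p_total += w
                            if mut_m or mut_d:
                                p_dnm += w
    return p_dnm / p_total

# Both parents look homozygous reference while the child looks heterozygous: a candidate DNM.
print(dnm_posterior(mom=(0, 30), dad=(1, 28), kid=(14, 30)))
```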

  7. Quantitative Functional Imaging Using Dynamic Positron Computed Tomography and Rapid Parameter Estimation Techniques

    NASA Astrophysics Data System (ADS)

    Koeppe, Robert Allen

    Positron computed tomography (PCT) is a diagnostic imaging technique that provides both three dimensional imaging capability and quantitative measurements of local tissue radioactivity concentrations in vivo. This allows the development of non-invasive methods that employ the principles of tracer kinetics for determining physiological properties such as mass specific blood flow, tissue pH, and rates of substrate transport or utilization. A physiologically based, two-compartment tracer kinetic model was derived to mathematically describe the exchange of a radioindicator between blood and tissue. The model was adapted for use with dynamic sequences of data acquired with a positron tomograph. Rapid estimation techniques were implemented to produce functional images of the model parameters by analyzing each individual pixel sequence of the image data. A detailed analysis of the performance characteristics of three different parameter estimation schemes was performed. The analysis included examination of errors caused by statistical uncertainties in the measured data, errors in the timing of the data, and errors caused by violation of various assumptions of the tracer kinetic model. Two specific radioindicators were investigated. (18)F-fluoromethane, an inert freely diffusible gas, was used for local quantitative determinations of both cerebral blood flow and tissue:blood partition coefficient. A method was developed that did not require direct sampling of arterial blood for the absolute scaling of flow values. The arterial input concentration time course was obtained by assuming that the alveolar or end-tidal expired breath radioactivity concentration is proportional to the arterial blood concentration. The scale of the input function was obtained from a series of venous blood concentration measurements. The method of absolute scaling using venous samples was validated in four studies, performed on normal volunteers, in which directly measured arterial concentrations were compared to those predicted from the expired air and venous blood samples. The glucose analog (18)F-3-deoxy-3-fluoro-D-glucose (3-FDG) was used for quantitating the membrane transport rate of glucose. The measured data indicated that the phosphorylation rate of 3-FDG was low enough to allow accurate estimation of the transport rate using a two compartment model.
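
    The blood-tissue exchange model referred to above is, in its simplest form, the one-tissue compartment equation dCt/dt = K1*Ca(t) - k2*Ct(t), whose solution is a convolution of the arterial input Ca(t) with an exponential. A minimal sketch of per-pixel parameter estimation by nonlinear least squares is shown below; the input function, noise level and parameter values are assumed for illustration, and the rapid estimation schemes analyzed in the dissertation are not reproduced.

```python
import numpy as np
from scipy.optimize import curve_fit

t = np.linspace(0, 10, 200)          # minutes
ca = t * np.exp(-t)                  # toy arterial input function (an assumption)
dt = t[1] - t[0]

def tissue_curve(t, K1, k2):
    """Ct(t) = K1 * [Ca convolved with exp(-k2*t)], evaluated by discrete convolution."""
    irf = np.exp(-k2 * t)
    return K1 * np.convolve(ca, irf)[: len(t)] * dt

# Simulate one noisy pixel time-activity curve and recover (K1, k2) by least squares.
rng = np.random.default_rng(1)
true_K1, true_k2 = 0.6, 0.3          # illustrative values, in mL/min/mL and 1/min
measured = tissue_curve(t, true_K1, true_k2) + rng.normal(0, 0.002, t.size)

(K1_hat, k2_hat), _ = curve_fit(tissue_curve, t, measured, p0=(0.5, 0.5))
print(f"K1 = {K1_hat:.3f}, k2 = {k2_hat:.3f}, partition coefficient K1/k2 = {K1_hat / k2_hat:.3f}")
```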

  8. A matter of emphasis: Linguistic stress habits modulate serial recall.

    PubMed

    Taylor, John C; Macken, Bill; Jones, Dylan M

    2015-04-01

    Models of short-term memory for sequential information rely on item-level, feature-based descriptions to account for errors in serial recall. Transposition errors within alternating similar/dissimilar letter sequences derive from interactions between overlapping features. However, in two experiments, we demonstrated that the characteristics of the sequence are what determine the fates of items, rather than the properties ascribed to the items themselves. Performance in alternating sequences is determined by the way that the sequences themselves induce particular prosodic rehearsal patterns, and not by the nature of the items per se. In a serial recall task, the shapes of the canonical "saw-tooth" serial position curves and transposition error probabilities at successive input-output distances were modulated by subvocal rehearsal strategies, despite all item-based parameters being held constant. We replicated this finding using nonalternating lists, thus demonstrating that transpositions are substantially influenced by prosodic features-such as stress-that emerge during subvocal rehearsal.

  9. snpAD: An ancient DNA genotype caller.

    PubMed

    Prüfer, Kay

    2018-06-21

    The study of ancient genomes can elucidate the evolutionary past. However, analyses are complicated by base-modifications in ancient DNA molecules that result in errors in DNA sequences. These errors are particularly common near the ends of sequences and pose a challenge for genotype calling. I describe an iterative method that estimates genotype frequencies and errors along sequences to allow for accurate genotype calling from ancient sequences. The implementation of this method, called snpAD, performs well on high-coverage ancient data, as shown by simulations and by subsampling the data of a high-coverage Neandertal genome. Although estimates for low-coverage genomes are less accurate, I am able to derive approximate estimates of heterozygosity from several low-coverage Neandertals. These estimates show that low heterozygosity, compared to modern humans, was common among Neandertals. The C++ code of snpAD is freely available at http://bioinf.eva.mpg.de/snpAD/. Supplementary data are available at Bioinformatics online.

  10. Investigating the limitations of single breath-hold renal artery blood flow measurements using spiral phase contrast MR with R-R interval averaging.

    PubMed

    Steeden, Jennifer A; Muthurangu, Vivek

    2015-04-01

    1) To validate an R-R interval averaged golden angle spiral phase contrast magnetic resonance (RAGS PCMR) sequence against conventional cine PCMR for assessment of renal blood flow (RBF) in normal volunteers; and 2) To investigate the effects of motion and heart rate on the accuracy of flow measurements using an in silico simulation. In 20 healthy volunteers RAGS (∼6 sec breath-hold) and respiratory-navigated cine (∼5 min) PCMR were performed in both renal arteries to assess RBF. A simulation of RAGS PCMR was used to assess the effect of heart rate (30-105 bpm), vessel expandability (0-150%) and translational motion (x1.0-4.0) on the accuracy of RBF measurements. There was good agreement between RAGS and cine PCMR in the volunteer study (bias: 0.01 L/min, limits of agreement: -0.04 to +0.06 L/min, P = 0.0001). The simulation demonstrated a positive linear relationship between heart rate and error (r = 0.9894, P < 0.0001), a negative linear relationship between vessel expansion and error (r = -0.9484, P < 0.0001), and a nonlinear, heart rate-dependent relationship between vessel translation and error. We have demonstrated that RAGS PCMR accurately measures RBF in vivo. However, the simulation reveals limitations in this technique at extreme heart rates (<40 bpm, >100 bpm), or when there is significant motion (vessel expandability: >80%, vessel translation: >x2.2). © 2014 Wiley Periodicals, Inc.

  11. Auto-tracking system for human lumbar motion analysis.

    PubMed

    Sui, Fuge; Zhang, Da; Lam, Shing Chun Benny; Zhao, Lifeng; Wang, Dongjun; Bi, Zhenggang; Hu, Yong

    2011-01-01

    Previous lumbar motion analyses suggest the usefulness of quantitatively characterizing spine motion. However, the application of such measurements is still limited by the lack of user-friendly automatic spine motion analysis systems. This paper describes an automatic analysis system to measure lumbar spine disorders that consists of a spine motion guidance device, an X-ray imaging modality to acquire digitized video fluoroscopy (DVF) sequences and an automated tracking module with a graphical user interface (GUI). DVF sequences of the lumbar spine are recorded during flexion-extension under a guidance device. The automatic tracking software utilizing a particle filter locates the vertebra-of-interest in every frame of the sequence, and the tracking result is displayed on the GUI. Kinematic parameters are also extracted from the tracking results for motion analysis. We observed that, in a bone model test, the maximum fiducial error was 3.7%, and the maximum repeatability error in translation and rotation was 1.2% and 2.6%, respectively. In our simulated DVF sequence study, the automatic tracking was not successful when the noise intensity was greater than 0.50. In a noisy situation, the maximal difference was 1.3 mm in translation and 1° in the rotation angle. The errors were calculated in translation (fiducial error: 2.4%, repeatability error: 0.5%) and in the rotation angle (fiducial error: 1.0%, repeatability error: 0.7%). However, the automatic tracking software could successfully track simulated sequences contaminated by noise at a density ≤ 0.5 with very high accuracy, providing good reliability and robustness. A clinical trial enrolled 10 healthy subjects and 2 lumbar spondylolisthesis patients. Automatic tracking of the DVF sequences provided information not seen in conventional X-ray images. These results suggest the potential of the proposed system for clinical applications.

  12. ECHO: A reference-free short-read error correction algorithm

    PubMed Central

    Kao, Wei-Chun; Chan, Andrew H.; Song, Yun S.

    2011-01-01

    Developing accurate, scalable algorithms to improve data quality is an important computational challenge associated with recent advances in high-throughput sequencing technology. In this study, a novel error-correction algorithm, called ECHO, is introduced for correcting base-call errors in short-reads, without the need of a reference genome. Unlike most previous methods, ECHO does not require the user to specify parameters of which optimal values are typically unknown a priori. ECHO automatically sets the parameters in the assumed model and estimates error characteristics specific to each sequencing run, while maintaining a running time that is within the range of practical use. ECHO is based on a probabilistic model and is able to assign a quality score to each corrected base. Furthermore, it explicitly models heterozygosity in diploid genomes and provides a reference-free method for detecting bases that originated from heterozygous sites. On both real and simulated data, ECHO is able to improve the accuracy of previous error-correction methods by severalfold to an order of magnitude, depending on the sequence coverage depth and the position in the read. The improvement is most pronounced toward the end of the read, where previous methods become noticeably less effective. Using a whole-genome yeast data set, it is demonstrated here that ECHO is capable of coping with nonuniform coverage. Also, it is shown that using ECHO to perform error correction as a preprocessing step considerably facilitates de novo assembly, particularly in the case of low-to-moderate sequence coverage depth. PMID:21482625

  13. Reducing false-positive incidental findings with ensemble genotyping and logistic regression based variant filtering methods.

    PubMed

    Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choe, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B; Gupta, Neha; Kohane, Isaac S; Green, Robert C; Kong, Sek Won

    2014-08-01

    As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but this is costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false-positive DNM candidates. © 2014 WILEY PERIODICALS, INC.
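
    A minimal sketch of the logistic-regression filtering idea: per-variant features are used to predict, from a labeled training set, the probability that a call is a true variant, and candidates below a chosen probability cut-off are filtered out. The features, training labels and threshold below are invented for illustration and are not the paper's actual feature set or tuning.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training set: per-variant features with labels from an orthogonal, validated call set
# (1 = true variant, 0 = artifact). The label rule here is synthetic.
n = 2000
gq = rng.integers(10, 100, n)                 # genotype quality
depth = rng.integers(5, 80, n)                # read depth
allele_balance = rng.beta(5, 5, n)            # alt-allele fraction at heterozygous calls
labels = (((gq > 40) & (np.abs(allele_balance - 0.5) < 0.25)) | (rng.random(n) < 0.05)).astype(int)

X = np.column_stack([gq, depth, allele_balance])
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Filter new candidates: keep calls whose predicted probability of being real exceeds a cut-off
# chosen to reach the desired false discovery rate on held-out validated variants.
candidates = np.array([[25, 12, 0.15],
                       [85, 40, 0.48]])
print(clf.predict_proba(candidates)[:, 1] > 0.9)
```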

  14. Reducing false positive incidental findings with ensemble genotyping and logistic regression-based variant filtering methods

    PubMed Central

    Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choi, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B.; Gupta, Neha; Kohane, Isaac S.; Green, Robert C.; Kong, Sek Won

    2014-01-01

    As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but this is costly. Here we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous SNVs; 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false positive DNM candidates. PMID:24829188

  15. A Novel Genome-Information Content-Based Statistic for Genome-Wide Association Analysis Designed for Next-Generation Sequencing Data

    PubMed Central

    Luo, Li; Zhu, Yun

    2012-01-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T2, collapsing method, multivariate and collapsing (CMC) method, individual χ2 test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic achieves significantly better type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets. PMID:22651812

  16. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

    PubMed

    Luo, Li; Zhu, Yun; Xiong, Momiao

    2012-06-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T(2), collapsing method, multivariate and collapsing (CMC) method, individual χ(2) test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic achieves significantly better type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.

  17. Characterization of 32 microsatellite loci for the Pacific red snapper, Lutjanus peru, through next generation sequencing.

    PubMed

    Paz-García, David A; Munguía-Vega, Adrián; Plomozo-Lugo, Tomas; Weaver, Amy Hudson

    2017-04-01

    We developed a set of hypervariable microsatellite markers for the Pacific red snapper (Lutjanus peru), an economically important marine fish for small-scale fisheries on the west coast of Mexico. We performed shotgun genome sequencing with the 454 XL titanium chemistry and used bioinformatic tools to search for perfect microsatellite loci. We selected 66 primer pairs that were synthesized and genotyped in an ABI PRISM 3730XL DNA sequencer in 32 individuals from the Gulf of California. We estimated levels of genetic diversity, deviations from linkage and Hardy-Weinberg equilibrium, estimated the frequency of null alleles and the probability of individual identity for the new markers. We reanalyzed 16 loci in 16 individuals to estimate genotyping error rates. Eighteen loci failed to amplify, 16 loci were discarded due to unspecific amplifications and 32 loci (14 tetranucleotide and 18 dinucleotide) were successfully scored. The average number of alleles per locus was 21 (±6.87, SD) and ranged from 8 to 34. The average observed and expected heterozygosities were 0.787 (±0.144 SD, range 0.250-0.935) and 0.909 (±0.122 SD, range 0.381-0.965), respectively. No significant linkage was detected. Eight loci showed deviations from Hardy-Weinberg equilibrium, and from these, four loci showed moderate null allele frequencies (0.104-0.220). The probability of individual identity for the new loci was 1.46 × 10^-62. Genotyping error rates averaged 9.58%. The new markers will be useful to investigate patterns of larval dispersal, metapopulation dynamics, fine-scale genetic structure and diversity aimed to inform the implementation of spatially explicit fisheries management strategies in the Gulf of California.
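
    Estimating a genotyping error rate by re-genotyping, as done for 16 loci above, amounts to comparing allele calls between the two runs. A small sketch of a per-allele mismatch rate, with made-up genotype data and a hypothetical helper name:

```python
from collections import Counter

def allele_error_rate(first_calls, repeat_calls):
    """Each argument maps (individual, locus) -> (allele1, allele2). Returns the fraction of
    compared alleles that disagree between the original and repeat genotyping runs."""
    mismatches = compared = 0
    for key, g1 in first_calls.items():
        g2 = repeat_calls.get(key)
        if g2 is None:                 # locus failed to amplify in one of the runs; skip it
            continue
        compared += 2
        mismatches += sum((Counter(g1) - Counter(g2)).values())
    return mismatches / compared if compared else float("nan")

# Toy data: one of four compared alleles differs between runs -> 25% per-allele error rate.
run1 = {("ind1", "Lpe01"): (212, 216), ("ind2", "Lpe01"): (208, 212)}
run2 = {("ind1", "Lpe01"): (212, 216), ("ind2", "Lpe01"): (208, 216)}
print(allele_error_rate(run1, run2))
```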

  18. Comparison of phasing strategies for whole human genomes

    PubMed Central

    Kirkness, Ewen; Schork, Nicholas J.

    2018-01-01

    Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes individuals possess (i.e., they do not ‘phase’ the genome) often because of the costs and complexities of doing so. We compared 11 different widely-used approaches to phasing human genomes using the publicly available ‘Genome-In-A-Bottle’ (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs or population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages: 1. population-based phasing using the SHAPEIT software suite, 2. either genome-wide sequencing read data or parental genotypes, and 3. a large reference panel of variant and haplotype frequencies, provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced with the addition of genome-wide read data; e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that the inclusion of parental genotype data within a population-based phasing strategy can provide as much as a ten-fold reduction in phasing errors. We also considered a majority voting scheme for the construction of a consensus haplotype combining multiple predictions for enhanced performance and site coverage. Finally, we also identified DNA sequence signatures associated with the genomic regions harboring phasing switch errors, which included regions of low polymorphism or SNV density. PMID:29621242
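
    One of the metrics listed above, the switch error rate, counts how often the predicted phase flips relative to the gold standard between consecutive heterozygous sites. A small illustrative implementation (not taken from any of the compared tools):

```python
def switch_error_rate(truth, predicted):
    """truth and predicted are lists of phased genotypes at heterozygous sites, e.g. ('A', 'G')
    meaning allele A on haplotype 1 and allele G on haplotype 2."""
    orientation = []
    for t, p in zip(truth, predicted):
        if p == t:
            orientation.append(True)            # same orientation as the gold standard
        elif p == (t[1], t[0]):
            orientation.append(False)           # flipped relative to the gold standard
        else:
            raise ValueError("genotype mismatch; switch error is defined for matching het calls")
    switches = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    return switches / (len(orientation) - 1)

truth = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C")]
pred  = [("A", "G"), ("T", "C"), ("A", "G"), ("C", "T")]   # phase flips after the first site
print(switch_error_rate(truth, pred))                      # 1 switch over 3 intervals ≈ 0.33
```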

  19. Identification and correction of abnormal, incomplete and mispredicted proteins in public databases.

    PubMed

    Nagy, Alinda; Hegyi, Hédi; Farkas, Krisztina; Tordai, Hedvig; Kozma, Evelin; Bányai, László; Patthy, László

    2008-08-27

    Despite significant improvements in computational annotation of genomes, sequences of abnormal, incomplete or incorrectly predicted genes and proteins remain abundant in public databases. Since the majority of incomplete, abnormal or mispredicted entries are not annotated as such, these errors seriously affect the reliability of these databases. Here we describe the MisPred approach that may provide an efficient means for the quality control of databases. The current version of the MisPred approach uses five distinct routines for identifying abnormal, incomplete or mispredicted entries based on the principle that a sequence is likely to be incorrect if some of its features conflict with our current knowledge about protein-coding genes and proteins: (i) conflict between the predicted subcellular localization of proteins and the absence of the corresponding sequence signals; (ii) presence of extracellular and cytoplasmic domains and the absence of transmembrane segments; (iii) co-occurrence of extracellular and nuclear domains; (iv) violation of domain integrity; (v) chimeras encoded by two or more genes located on different chromosomes. Analyses of predicted EnsEMBL protein sequences of nine deuterostome (Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Fugu rubripes, Danio rerio and Ciona intestinalis) and two protostome species (Caenorhabditis elegans and Drosophila melanogaster) have revealed that the absence of expected signal peptides and violation of domain integrity account for the majority of mispredictions. Analyses of sequences predicted by NCBI's GNOMON annotation pipeline show that the rates of mispredictions are comparable to those of EnsEMBL. Interestingly, even the manually curated UniProtKB/Swiss-Prot dataset is contaminated with mispredicted or abnormal proteins, although to a much lesser extent than UniProtKB/TrEMBL or the EnsEMBL or GNOMON-predicted entries. MisPred works efficiently in identifying errors in predictions generated by the most reliable gene prediction tools such as the EnsEMBL and NCBI's GNOMON pipelines and also guides the correction of errors. We suggest that application of the MisPred approach will significantly improve the quality of gene predictions and the associated databases.

  20. Nonuniformity correction for an infrared focal plane array based on diamond search block matching.

    PubMed

    Sheng-Hui, Rong; Hui-Xin, Zhou; Han-Lin, Qin; Rui, Lai; Kun, Qian

    2016-05-01

    In scene-based nonuniformity correction algorithms, artificial ghosting and image blurring degrade the correction quality severely. In this paper, an improved algorithm based on the diamond search block matching algorithm and the adaptive learning rate is proposed. First, accurate transform pairs between two adjacent frames are estimated by the diamond search block matching algorithm. Then, based on the error between the corresponding transform pairs, the gradient descent algorithm is applied to update correction parameters. During the process of gradient descent, the local standard deviation and a threshold are utilized to control the learning rate to avoid the accumulation of matching error. Finally, the nonuniformity correction would be realized by a linear model with updated correction parameters. The performance of the proposed algorithm is thoroughly studied with four real infrared image sequences. Experimental results indicate that the proposed algorithm can reduce the nonuniformity with less ghosting artifacts in moving areas and can also overcome the problem of image blurring in static areas.
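
    The parameter update described above can be sketched as follows: matched pixel pairs from block matching define an error between the corrected current frame and the corrected previous frame, gain and offset are adjusted by gradient descent on that error, and the learning rate is scaled down when the local standard deviation is large so that matching errors do not accumulate. The function below is an assumed simplification (it uses a single frame-level standard deviation as a stand-in for the local statistic), not the published algorithm.

```python
import numpy as np

def update_correction(gain, offset, frame_prev, frame_curr, matches,
                      base_lr=0.01, std_threshold=20.0):
    """One gradient-descent refinement of a per-pixel linear correction y = gain*x + offset.

    matches is a list of ((r0, c0), (r1, c1)) pairs from block matching: pixel (r0, c0) in the
    previous frame and pixel (r1, c1) in the current frame observe the same scene point."""
    corrected_prev = gain * frame_prev + offset
    frame_std = frame_curr.std()      # stand-in for the local standard deviation statistic
    lr = base_lr if frame_std < std_threshold else base_lr * std_threshold / frame_std
    for (r0, c0), (r1, c1) in matches:
        err = (gain[r1, c1] * frame_curr[r1, c1] + offset[r1, c1]) - corrected_prev[r0, c0]
        gain[r1, c1] -= lr * err * frame_curr[r1, c1]   # gradient of err**2 w.r.t. gain, up to a factor of 2
        offset[r1, c1] -= lr * err                      # gradient of err**2 w.r.t. offset, up to a factor of 2
    return gain, offset
```

    Suppressing the learning rate where the image statistics indicate strong detail or motion is what limits ghosting: the correction parameters are then driven mainly by smooth, reliably matched regions.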

  1. 50 Mbps free space direct detection laser diode optical communication system with Q = 4 PPM signaling

    NASA Technical Reports Server (NTRS)

    Sun, Xiaoli; Davidson, Frederic; Field, Christopher

    1990-01-01

    A 50 Mbps direct detection optical communication system for use in an intersatellite link was constructed with an AlGaAs laser diode transmitter and a silicon avalanche photodiode photodetector. The system used a Q = 4 PPM format. The receiver consisted of a maximum likelihood PPM detector and a timing recovery subsystem. The PPM slot clock was recovered at the receiver by using a transition detector followed by a PLL. The PPM word clock was recovered by using a second PLL whose input was derived from the presence of back-to-back PPM pulses contained in the received random PPM pulse sequences. The system achieved a bit error rate of 0.000001 at less than 50 detected signal photons/information bit. The receiver was capable of acquiring and maintaining slot and word synchronization for received signal levels greater than 20 photons/information bit, at which the receiver bit error rate was about 0.01.
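
    For Poisson photon counting, the maximum likelihood PPM receiver described above reduces to choosing the slot with the largest detected count in each word. A toy simulation of Q = 4 PPM detection and the resulting bit error rate is sketched below; the photon numbers are deliberately small so that errors appear in a short run, and do not reflect the paper's link budget.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words = 200_000
signal_photons_per_bit = 5        # kept low so that errors show up in a short simulation (assumed)
background_photons = 0.2          # mean background count per slot (assumed)

ks = signal_photons_per_bit * 2   # Q = 4 PPM carries 2 bits per word
sent = rng.integers(0, 4, n_words)

counts = rng.poisson(background_photons, size=(n_words, 4))
counts[np.arange(n_words), sent] += rng.poisson(ks, n_words)
received = counts.argmax(axis=1)  # ML decision for Poisson counts: pick the largest slot

# Two bits per word under a natural binary mapping of the slot index.
diff = np.bitwise_xor(sent, received)
ber = ((diff & 1).sum() + (diff >> 1).sum()) / (2 * n_words)
print(f"simulated bit error rate ≈ {ber:.2e}")
```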

  2. Identity-by-Descent-Based Phasing and Imputation in Founder Populations Using Graphical Models

    PubMed Central

    Palin, Kimmo; Campbell, Harry; Wright, Alan F; Wilson, James F; Durbin, Richard

    2011-01-01

    Accurate knowledge of haplotypes, the combination of alleles co-residing on a single copy of a chromosome, enables powerful gene mapping and sequence imputation methods. Since humans are diploid, haplotypes must be derived from genotypes by a phasing process. In this study, we present a new computational model for haplotype phasing based on pairwise sharing of haplotypes inferred to be Identical-By-Descent (IBD). We apply the Bayesian network based model in a new phasing algorithm, called systematic long-range phasing (SLRP), that can capitalize on the close genetic relationships in isolated founder populations, and show with simulated and real genome-wide genotype data that SLRP substantially reduces the rate of phasing errors compared to previous phasing algorithms. Furthermore, the method accurately identifies regions of IBD, enabling linkage-like studies without pedigrees, and can be used to impute most genotypes with very low error rate. Genet. Epidemiol. 35:853-860, 2011. © 2011 Wiley Periodicals, Inc. PMID:22006673

  3. Impact of increased mutagenesis on adaptation to high temperature in bacteriophage Qβ.

    PubMed

    Arribas, María; Cabanillas, Laura; Kubota, Kirina; Lázaro, Ester

    2016-10-01

    RNA viruses replicate with very high error rates, which makes them more sensitive to additional increases in this parameter. This fact has inspired an antiviral strategy named lethal mutagenesis, which is based on the artificial increase of the error rate above a threshold incompatible with virus infectivity. A relevant issue concerning lethal mutagenesis is whether incomplete treatments might enhance the adaptive possibilities of viruses. We have addressed this question by subjecting an RNA virus, the bacteriophage Qβ, to different transmission regimes in the presence or the absence of sublethal concentrations of the mutagenic nucleoside analogue 5-azacytidine (AZC). Populations obtained were subsequently exposed to a non-optimal temperature and analyzed to determine their consensus sequences. Our results show that previously mutagenized populations rapidly fixed a specific set of mutations upon propagation at the new temperature, suggesting that the expansion of the mutant spectrum caused by AZC has an influence on later evolutionary behavior. Copyright © 2016 Elsevier Inc. All rights reserved.

  4. Accurate read-based metagenome characterization using a hierarchical suite of unique signatures

    PubMed Central

    Freitas, Tracey Allen K.; Li, Po-E; Scholz, Matthew B.; Chain, Patrick S. G.

    2015-01-01

    A major challenge in the field of shotgun metagenomics is the accurate identification of organisms present within a microbial community, based on classification of short sequence reads. Though existing microbial community profiling methods have attempted to rapidly classify the millions of reads output from modern sequencers, the combination of incomplete databases, similarity among otherwise divergent genomes, errors and biases in sequencing technologies, and the large volumes of sequencing data required for metagenome sequencing has led to unacceptably high false discovery rates (FDR). Here, we present the application of a novel, gene-independent and signature-based metagenomic taxonomic profiling method with significantly and consistently smaller FDR than any other available method. Our algorithm circumvents false positives using a series of non-redundant signature databases and examines Genomic Origins Through Taxonomic CHAllenge (GOTTCHA). GOTTCHA was tested and validated on 20 synthetic and mock datasets ranging in community composition and complexity, was applied successfully to data generated from spiked environmental and clinical samples, and robustly demonstrates superior performance compared with other available tools. PMID:25765641

  5. Position-specific automated processing of V3 env ultra-deep pyrosequencing data for predicting HIV-1 tropism

    PubMed Central

    Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre

    2015-01-01

    HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows detection of low levels of CXCR4-using variants that current genotypic approaches miss. However, processing the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing is rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used, but the errors generated during ultra-deep pyrosequencing are sequence-dependent rather than random. We have developed an automated processing of HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It retains authentic sequences with point mutations at V3 positions of interest and discards artifactual ones with accurate sensitivity thresholds. PMID:26585833
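
    The contrast between fixed and position-specific cut-offs can be illustrated with a short sketch: if the sequencing error rate at each V3 position has been estimated (for example from control amplicons), the minimum variant frequency reported at that position can be set to an upper binomial quantile of the error distribution at the observed depth. The error rates, depth and significance level below are assumptions, and the statistical filters of the published pipeline are more elaborate.

```python
from scipy.stats import binom

def position_thresholds(error_rates, coverage, alpha=1e-3):
    """For each position, the smallest variant frequency that is unlikely (probability < alpha)
    to be reached by sequencing error alone at the given read depth."""
    thresholds = []
    for err in error_rates:
        min_reads = int(binom.ppf(1 - alpha, coverage, err)) + 1
        thresholds.append(min_reads / coverage)
    return thresholds

per_position_error = [0.001, 0.004, 0.0008, 0.012]   # e.g. estimated from control amplicons (assumed)
print(position_thresholds(per_position_error, coverage=5000))
```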

  6. Position-specific automated processing of V3 env ultra-deep pyrosequencing data for predicting HIV-1 tropism.

    PubMed

    Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre

    2015-11-20

    HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows detection of low levels of CXCR4-using variants that current genotypic approaches miss. However, processing the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing is rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used, but the errors generated during ultra-deep pyrosequencing are sequence-dependent rather than random. We have developed an automated processing of HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It retains authentic sequences with point mutations at V3 positions of interest and discards artifactual ones with accurate sensitivity thresholds.

  7. Wireless visual sensor network resource allocation using cross-layer optimization

    NASA Astrophysics Data System (ADS)

    Bentley, Elizabeth S.; Matyjas, John D.; Medley, Michael J.; Kondi, Lisimachos P.

    2009-01-01

    In this paper, we propose an approach to manage network resources for a Direct Sequence Code Division Multiple Access (DS-CDMA) visual sensor network where nodes monitor scenes with varying levels of motion. It uses cross-layer optimization across the physical layer, the link layer and the application layer. Our technique simultaneously assigns a source coding rate, a channel coding rate, and a power level to all nodes in the network based on one of two criteria that maximize the quality of video of the entire network as a whole, subject to a constraint on the total chip rate. One criterion results in the minimal average end-to-end distortion amongst all nodes, while the other criterion minimizes the maximum distortion of the network. Our approach allows one to determine the capacity of the visual sensor network based on the number of nodes and the quality of video that must be transmitted. For bandwidth-limited applications, one can also determine the minimum bandwidth needed to accommodate a number of nodes with a specific target chip rate. Video captured by a sensor node camera is encoded and decoded using the H.264 video codec by a centralized control unit at the network layer. To reduce the computational complexity of the solution, Universal Rate-Distortion Characteristics (URDCs) are obtained experimentally to relate bit error probabilities to the distortion of corrupted video. Bit error rates are found first by using Viterbi's upper bounds on the bit error probability and second, by simulating nodes transmitting data spread by Total Square Correlation (TSC) codes over a Rayleigh-faded DS-CDMA channel and receiving that data using Auxiliary Vector (AV) filtering.

  8. Parasitic plants have increased rates of molecular evolution across all three genomes

    PubMed Central

    2013-01-01

    Background: Theoretical models and experimental evidence suggest that rates of molecular evolution could be raised in parasitic organisms compared to non-parasitic taxa. Parasitic plants provide an ideal test for these predictions, as there are at least a dozen independent origins of the parasitic lifestyle in angiosperms. Studies of a number of parasitic plant lineages have suggested faster rates of molecular evolution, but the results of some studies have been mixed. Comparative analysis of all parasitic plant lineages, including sequences from all three genomes, is needed to examine the generality of the relationship between rates of molecular evolution and parasitism in plants. Results: We analysed DNA sequence data from the mitochondrial, nuclear and chloroplast genomes for 12 independent evolutionary origins of parasitism in angiosperms. We demonstrated that parasitic lineages have a faster rate of molecular evolution than their non-parasitic relatives in sequences for all three genomes, for both synonymous and nonsynonymous substitutions. Conclusions: Our results prove that raised rates of molecular evolution are a general feature of parasitic plants, not confined to a few taxa or specific genes. We discuss possible causes for this relationship, including increased positive selection associated with host-parasite arms races, relaxed selection, reduced population size or repeated bottlenecks, increased mutation rates, and indirect causal links with generation time and body size. We find no evidence that faster rates are due to smaller effective population sizes or changes in selection pressure. Instead, our results suggest that parasitic plants have a higher mutation rate than their close non-parasitic relatives. This may be due to a direct connection, where some aspect of the parasitic lifestyle drives the evolution of raised mutation rates. Alternatively, this pattern may be driven by an indirect connection between rates and parasitism: for example, parasitic plants tend to be smaller than their non-parasitic relatives, which may result in more cell generations per year, thus a higher rate of mutations arising from DNA copy errors per unit time. Demonstration that adoption of a parasitic lifestyle influences the rate of genomic evolution is relevant to attempts to infer molecular phylogenies of parasitic plants and to estimate their evolutionary divergence times using sequence data. PMID:23782527

  9. Parasitic plants have increased rates of molecular evolution across all three genomes.

    PubMed

    Bromham, Lindell; Cowman, Peter F; Lanfear, Robert

    2013-06-19

    Theoretical models and experimental evidence suggest that rates of molecular evolution could be raised in parasitic organisms compared to non-parasitic taxa. Parasitic plants provide an ideal test for these predictions, as there are at least a dozen independent origins of the parasitic lifestyle in angiosperms. Studies of a number of parasitic plant lineages have suggested faster rates of molecular evolution, but the results of some studies have been mixed. Comparative analysis of all parasitic plant lineages, including sequences from all three genomes, is needed to examine the generality of the relationship between rates of molecular evolution and parasitism in plants. We analysed DNA sequence data from the mitochondrial, nuclear and chloroplast genomes for 12 independent evolutionary origins of parasitism in angiosperms. We demonstrated that parasitic lineages have a faster rate of molecular evolution than their non-parasitic relatives in sequences for all three genomes, for both synonymous and nonsynonymous substitutions. Our results prove that raised rates of molecular evolution are a general feature of parasitic plants, not confined to a few taxa or specific genes. We discuss possible causes for this relationship, including increased positive selection associated with host-parasite arms races, relaxed selection, reduced population size or repeated bottlenecks, increased mutation rates, and indirect causal links with generation time and body size. We find no evidence that faster rates are due to smaller effective population sizes or changes in selection pressure. Instead, our results suggest that parasitic plants have a higher mutation rate than their close non-parasitic relatives. This may be due to a direct connection, where some aspect of the parasitic lifestyle drives the evolution of raised mutation rates. Alternatively, this pattern may be driven by an indirect connection between rates and parasitism: for example, parasitic plants tend to be smaller than their non-parasitic relatives, which may result in more cell generations per year, thus a higher rate of mutations arising from DNA copy errors per unit time. Demonstration that adoption of a parasitic lifestyle influences the rate of genomic evolution is relevant to attempts to infer molecular phylogenies of parasitic plants and to estimate their evolutionary divergence times using sequence data.

  10. A graph edit dictionary for correcting errors in roof topology graphs reconstructed from point clouds

    NASA Astrophysics Data System (ADS)

    Xiong, B.; Oude Elberink, S.; Vosselman, G.

    2014-07-01

    In the task of 3D building model reconstruction from point clouds we face the problem of recovering a roof topology graph in the presence of noise, small roof faces and low point densities. Errors in roof topology graphs will seriously affect the final modelling results. The aim of this research is to automatically correct these errors. We define the graph correction as a graph-to-graph problem, similar to the spelling correction problem (also called the string-to-string problem). The graph correction is more complex than string correction, as the graphs are 2D while strings are only 1D. We design a strategy based on a dictionary of graph edit operations to automatically identify and correct the errors in the input graph. For each type of error the graph edit dictionary stores a representative erroneous subgraph as well as the corrected version. As an erroneous roof topology graph may contain several errors, a heuristic search is applied to find the optimum sequence of graph edits to correct the errors one by one. The graph edit dictionary can be expanded to include entries needed to cope with errors that were previously not encountered. Experiments show that the dictionary with only fifteen entries already properly corrects one quarter of erroneous graphs in about 4500 buildings, and even half of the erroneous graphs in one test area, achieving as high as a 95% acceptance rate of the reconstructed models.

  11. Software for pre-processing Illumina next-generation sequencing short read sequences

    PubMed Central

    2014-01-01

    Background: When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods: We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157 H7. Results: Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacterial genomic short read sequences with the focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference-based assembly as measured by assembly contiguity and correctness. Conclusions: Trimming of short read sequences can improve the quality of de novo and reference-based assembly and assembler performance. The parallel processing capability of ngsShoRT reduces trimming time and improves the memory efficiency when dealing with large datasets. We recommend combining sequencing artifacts removal, and quality score based read filtering and base trimming as the most consistent method for improving sequence quality and downstream assemblies. ngsShoRT source code, user guide and tutorial are available at http://research.bioinformatics.udel.edu/genomics/ngsShoRT/. ngsShoRT can be incorporated as a pre-processing step in genome and transcriptome assembly projects. PMID:24955109
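
    As a generic illustration of the quality-based trimming and filtering steps compared above: a read can be cut at the first base whose Phred quality drops below a threshold (removing everything downstream) and then discarded if it becomes too short or its mean quality is too low. This sketch uses assumed thresholds and is not ngsShoRT's algorithm.

```python
def trim_read(seq, quals, min_qual=20, min_len=35, min_mean_qual=25):
    """seq: bases as a string, quals: per-base Phred scores. Cuts the read at the first base
    below min_qual and returns (trimmed_seq, trimmed_quals), or None if the read is discarded."""
    cut = len(seq)
    for i, q in enumerate(quals):
        if q < min_qual:
            cut = i
            break
    seq, quals = seq[:cut], quals[:cut]
    if len(seq) < min_len or (quals and sum(quals) / len(quals) < min_mean_qual):
        return None
    return seq, quals

read = "ACGTACGTACGT" * 5
quals = [38] * 50 + [8] * 10          # quality collapses near the 3' end
print(trim_read(read, quals))
```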

  12. Synchronized moving aperture radiation therapy (SMART): superimposing tumor motion on IMRT MLC leaf sequences under realistic delivery conditions

    NASA Astrophysics Data System (ADS)

    Xu, Jun; Papanikolaou, Nikos; Shi, Chengyu; Jiang, Steve B.

    2009-08-01

    Synchronized moving aperture radiation therapy (SMART) has been proposed to account for tumor motions during radiotherapy in prior work. The basic idea of SMART is to synchronize the moving radiation beam aperture formed by a dynamic multileaf collimator (DMLC) with the tumor motion induced by respiration. In this paper, a two-dimensional (2D) superimposing leaf sequencing method is presented for SMART. A leaf sequence optimization strategy was generated to assure the SMART delivery under realistic delivery conditions. The study of delivery performance using the Varian LINAC and the Millennium DMLC showed that clinical factors such as collimator angle, dose rate, initial phase and machine tolerance affect the delivery accuracy and efficiency. An in-house leaf sequencing software was developed to implement the 2D superimposing leaf sequencing method and optimize the motion-corrected leaf sequence under realistic clinical conditions. The analysis of dynamic log (Dynalog) files showed that optimization of the leaf sequence for various clinical factors can avoid beam hold-offs which break the synchronization of SMART and fail the SMART dose delivery. Through comparison between the simulated delivered fluence map and the planned fluence map, it was shown that the motion-corrected leaf sequence can greatly reduce the dose error.
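
    The core superimposing step can be pictured with a one-dimensional sketch: at every control point of the leaf sequence, both leaf banks are shifted by the tumor displacement sampled at that moment, subject to leaf travel limits. The array shapes, limits and breathing trace below are illustrative assumptions; the paper's 2D superimposing method and the leaf sequence optimization against beam hold-offs are not reproduced here.

```python
import numpy as np

def superimpose_motion(leaf_banks, displacement, max_position=20.0):
    """leaf_banks: array (n_control_points, n_leaf_pairs, 2) of left/right leaf positions in cm;
    displacement: tumor offset in cm along the leaf travel direction at each control point."""
    shifted = leaf_banks + displacement[:, None, None]
    return np.clip(shifted, -max_position, max_position)

leaf_banks = np.zeros((4, 3, 2))
leaf_banks[..., 0], leaf_banks[..., 1] = -2.0, 2.0     # a static 4 cm opening at every control point
breathing = np.array([0.0, 0.8, 1.5, 0.8])             # respiration-induced displacement (cm), assumed
print(superimpose_motion(leaf_banks, breathing)[2])    # shifted aperture at the third control point
```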

  13. Synchronized moving aperture radiation therapy (SMART): superimposing tumor motion on IMRT MLC leaf sequences under realistic delivery conditions.

    PubMed

    Xu, Jun; Papanikolaou, Nikos; Shi, Chengyu; Jiang, Steve B

    2009-08-21

    Synchronized moving aperture radiation therapy (SMART) has been proposed to account for tumor motions during radiotherapy in prior work. The basic idea of SMART is to synchronize the moving radiation beam aperture formed by a dynamic multileaf collimator (DMLC) with the tumor motion induced by respiration. In this paper, a two-dimensional (2D) superimposing leaf sequencing method is presented for SMART. A leaf sequence optimization strategy was generated to assure the SMART delivery under realistic delivery conditions. The study of delivery performance using the Varian LINAC and the Millennium DMLC showed that clinical factors such as collimator angle, dose rate, initial phase and machine tolerance affect the delivery accuracy and efficiency. An in-house leaf sequencing software was developed to implement the 2D superimposing leaf sequencing method and optimize the motion-corrected leaf sequence under realistic clinical conditions. The analysis of dynamic log (Dynalog) files showed that optimization of the leaf sequence for various clinical factors can avoid beam hold-offs which break the synchronization of SMART and fail the SMART dose delivery. Through comparison between the simulated delivered fluence map and the planned fluence map, it was shown that the motion-corrected leaf sequence can greatly reduce the dose error.

  14. On site DNA barcoding by nanopore sequencing

    PubMed Central

    Menegon, Michele; Cantaloni, Chiara; Rodriguez-Prieto, Ana; Centomo, Cesare; Abdelfattah, Ahmed; Rossato, Marzia; Bernardi, Massimo; Xumerle, Luciano; Loader, Simon; Delledonne, Massimo

    2017-01-01

    Biodiversity research is becoming increasingly dependent on genomics, which allows the unprecedented digitization and understanding of the planet’s biological heritage. The use of genetic markers i.e. DNA barcoding, has proved to be a powerful tool in species identification. However, full exploitation of this approach is hampered by the high sequencing costs and the absence of equipped facilities in biodiversity-rich countries. In the present work, we developed a portable sequencing laboratory based on the portable DNA sequencer from Oxford Nanopore Technologies, the MinION. Complementary laboratory equipment and reagents were selected to be used in remote and tough environmental conditions. The performance of the MinION sequencer and the portable laboratory was tested for DNA barcoding in a mimicking tropical environment, as well as in a remote rainforest of Tanzania lacking electricity. Despite the relatively high sequencing error-rate of the MinION, the development of a suitable pipeline for data analysis allowed the accurate identification of different species of vertebrates including amphibians, reptiles and mammals. In situ sequencing of a wild frog allowed us to rapidly identify the species captured, thus confirming that effective DNA barcoding in the field is possible. These results open new perspectives for real-time, on-site DNA sequencing thus potentially increasing opportunities for the understanding of biodiversity in areas lacking conventional laboratory facilities. PMID:28977016

  15. Pattern of eyelid motion predictive of decision errors during drowsiness: oculomotor indices of altered states.

    PubMed

    Lobb, M L; Stern, J A

    1986-08-01

    Sequential patterns of eye and eyelid motion were identified in seven subjects performing a modified serial probe recognition task under drowsy conditions. Using simultaneous EOG and video recordings, eyelid motion was divided into components above, within, and below the pupil and the durations in sequence were recorded. A serial probe recognition task was modified to allow for distinguishing decision errors from attention errors. Decision errors were found to be more frequent following a downward shift in the gaze angle, during which the eyelid closing sequence was reduced from a five-element to a three-element sequence. The velocity of the eyelid moving over the pupil during decision errors was slow in the closing and fast in the reopening phase, while on decision correct trials it was fast in closing and slower in reopening. Due to the high variability of eyelid motion under drowsy conditions these findings were only marginally significant. When a five element blink occurred, the velocity of the lid over pupil motion component of these endogenous eye blinks was significantly faster on decision correct than on decision error trials. Furthermore, the highly variable, long duration closings associated with the decision response produced slow eye movements in the horizontal plane (SEM) which were more frequent and significantly longer in duration on decision error versus decision correct responses.

  16. Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads.

    PubMed

    Song, Li; Florea, Liliana

    2015-01-01

    Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing. We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read. Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.
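
    The local-threshold idea can be sketched in a few lines: k-mers are counted across all reads, and a position in a read is flagged as a candidate error when every k-mer covering it has a count far below the strongest k-mer in that same read, so that the trust threshold scales with the read's own expression level. This is an assumed simplification, not Rcorrector's actual statistic or correction step.

```python
from collections import Counter

def kmer_counts(reads, k):
    counts = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            counts[r[i:i + k]] += 1
    return counts

def untrusted_positions(read, counts, k, fraction=0.25):
    """Flag positions covered only by k-mers whose count is far below the strongest k-mer in
    this read; the threshold therefore scales with the read's own expression level."""
    kmer_hits = [counts[read[i:i + k]] for i in range(len(read) - k + 1)]
    local_threshold = fraction * max(kmer_hits)
    flagged = []
    for pos in range(len(read)):
        covering = kmer_hits[max(0, pos - k + 1): pos + 1]
        if covering and all(c < local_threshold for c in covering):
            flagged.append(pos)
    return flagged

reads = ["ACGTACGTACGT"] * 50 + ["ACGTACGAACGT"]         # one read carries a likely error at position 7
counts = kmer_counts(reads, k=5)
print(untrusted_positions("ACGTACGAACGT", counts, k=5))  # a window of positions covering the error
```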

  17. Using comparative genome analysis to identify problems in annotated microbial genomes.

    PubMed

    Poptsova, Maria S; Gogarten, J Peter

    2010-07-01

    Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

  18. Research on parallel combinatory spread spectrum communication system with double information matching

    NASA Astrophysics Data System (ADS)

    Xue, Wei; Wang, Qi; Wang, Tianyu

    2018-04-01

    This paper presents an improved parallel combinatory spread spectrum (PC/SS) communication system with the method of double information matching (DIM). Compared with the conventional PC/SS system, the new model inherits the advantages of high transmission speed, large information capacity, and high security. The traditional system, however, suffers from a high bit error rate (BER) because of its data-sequence mapping algorithm. The new model achieves a lower BER and higher efficiency by optimizing the mapping algorithm.

  19. Digital visual communications using a Perceptual Components Architecture

    NASA Technical Reports Server (NTRS)

    Watson, Andrew B.

    1991-01-01

    The next era of space exploration will generate extraordinary volumes of image data, and management of this image data is beyond current technical capabilities. We propose a strategy for coding visual information that exploits the known properties of early human vision. This Perceptual Components Architecture codes images and image sequences in terms of discrete samples from limited bands of color, spatial frequency, orientation, and temporal frequency. This spatiotemporal pyramid offers efficiency (low bit rate), variable resolution, device independence, error-tolerance, and extensibility.

  20. Ensemble codes involving hippocampal neurons are at risk during delayed performance tests.

    PubMed

    Hampson, R E; Deadwyler, S A

    1996-11-26

    Multielectrode recording techniques were used to record ensemble activity from 10 to 16 simultaneously active CA1 and CA3 neurons in the rat hippocampus during performance of a spatial delayed-nonmatch-to-sample task. Extracted sources of variance were used to assess the nature of two different types of errors that accounted for 30% of total trials. The two types of errors included ensemble "miscodes" of sample phase information and errors associated with delay-dependent corruption or disappearance of sample information at the time of the nonmatch response. Statistical assessment of trial sequences and associated "strength" of hippocampal ensemble codes revealed that miscoded error trials always followed delay-dependent error trials in which encoding was "weak," indicating that the two types of errors were "linked." It was determined that the occurrence of weakly encoded, delay-dependent error trials initiated an ensemble encoding "strategy" that increased the chances of being correct on the next trial and avoided the occurrence of further delay-dependent errors. Unexpectedly, the strategy involved "strongly" encoding response position information from the prior (delay-dependent) error trial and carrying it forward to the sample phase of the next trial. This produced a miscode type error on trials in which the "carried over" information obliterated encoding of the sample phase response on the next trial. Application of this strategy, irrespective of outcome, was sufficient to reorient the animal to the proper between trial sequence of response contingencies (nonmatch-to-sample) and boost performance to 73% correct on subsequent trials. The capacity for ensemble analyses of strength of information encoding combined with statistical assessment of trial sequences therefore provided unique insight into the "dynamic" nature of the role hippocampus plays in delay type memory tasks.

  1. SeqMule: automated pipeline for analysis of human exome/genome sequencing data.

    PubMed

    Guo, Yunfei; Ding, Xiaolei; Shen, Yufeng; Lyon, Gholson J; Wang, Kai

    2015-09-18

    Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users are often faced with issues such as software compatibility, complicated configuration, and no access to a high-performance computing facility. Discrepancies exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers and facilitates normalization/intersection of variant calls to generate a consensus set with high confidence. SeqMule integrates 5 alignment tools and 5 variant calling algorithms and accepts various combinations through a one-line command, therefore allowing highly flexible yet fully automated variant calling. On a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turn-around is needed, SeqMule generates annotated VCF files in a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates a consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers a turn-key solution for deployment on Amazon Web Services, and provides quality checks, Mendelian error checks, consistency evaluation, and HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.
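
    As a rough illustration of the normalization/intersection step mentioned above, the sketch below keeps only variants reported by a minimum number of callers. It is a toy consensus over (chrom, pos, ref, alt) tuples, not SeqMule's actual pipeline logic; the caller names and the `min_support` parameter are invented for the example.

```python
from collections import Counter

def consensus_calls(callsets, min_support=2):
    """Keep variants (chrom, pos, ref, alt) reported by at least `min_support` callers."""
    tally = Counter()
    for calls in callsets:
        tally.update(set(calls))            # de-duplicate within each caller
    return {variant for variant, n in tally.items() if n >= min_support}

caller_a = {("chr1", 12345, "A", "G"), ("chr1", 22222, "C", "T")}
caller_b = {("chr1", 12345, "A", "G"), ("chr2", 33333, "G", "A")}
caller_c = {("chr1", 12345, "A", "G"), ("chr1", 22222, "C", "T")}

print(consensus_calls([caller_a, caller_b, caller_c], min_support=2))
```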

  2. Fast converging minimum probability of error neural network receivers for DS-CDMA communications.

    PubMed

    Matyjas, John D; Psaromiligkos, Ioannis N; Batalama, Stella N; Medley, Michael J

    2004-03-01

    We consider a multilayer perceptron neural network (NN) receiver architecture for the recovery of the information bits of a direct-sequence code-division-multiple-access (DS-CDMA) user. We develop a fast converging adaptive training algorithm that minimizes the bit-error rate (BER) at the output of the receiver. The adaptive algorithm has three key features: i) it incorporates the BER, i.e., the ultimate performance evaluation measure, directly into the learning process, ii) it utilizes constraints that are derived from the properties of the optimum single-user decision boundary for additive white Gaussian noise (AWGN) multiple-access channels, and iii) it embeds importance sampling (IS) principles directly into the receiver optimization process. Simulation studies illustrate the BER performance of the proposed scheme.

  3. A channel estimation scheme for MIMO-OFDM systems

    NASA Astrophysics Data System (ADS)

    He, Chunlong; Tian, Chu; Li, Xingquan; Zhang, Ce; Zhang, Shiqi; Liu, Chaowen

    2017-08-01

    In view of the trade-off between the performance of time-domain least squares (LS) channel estimation and its practical implementation complexity, a reduced-complexity pilot-based channel estimation method for multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) is obtained. This approach transforms the MIMO-OFDM channel estimation problem into a simple single input single output-orthogonal frequency division multiplexing (SISO-OFDM) channel estimation problem, so no large matrix pseudo-inverse is needed, which greatly reduces the complexity of the algorithm. Simulation results show that the bit error rate (BER) performance of the obtained method with time-orthogonal training sequences and the linear minimum mean square error (LMMSE) criterion is better than that of the time-domain LS estimator and is close to optimal.
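
    The reduction described above ultimately rests on per-subcarrier least-squares estimation, where no matrix pseudo-inverse is required. The toy sketch below estimates a frequency-domain channel from known pilots for a single antenna pair under these assumptions; it does not reproduce the paper's MIMO decoupling or the LMMSE refinement, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subcarriers = 64

# Known QPSK pilot symbols and a toy frequency-selective channel.
pilots = (rng.choice([1, -1], n_subcarriers) + 1j * rng.choice([1, -1], n_subcarriers)) / np.sqrt(2)
h_true = (rng.normal(size=n_subcarriers) + 1j * rng.normal(size=n_subcarriers)) / np.sqrt(2)
noise = 0.05 * (rng.normal(size=n_subcarriers) + 1j * rng.normal(size=n_subcarriers))

received = h_true * pilots + noise

# Per-subcarrier least-squares estimate: H_hat = Y / X (no matrix inversion needed).
h_ls = received / pilots

print("mean squared estimation error:", np.mean(np.abs(h_ls - h_true) ** 2))
```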

  4. Ancient DNA sequence revealed by error-correcting codes.

    PubMed

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-07-10

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.

  5. Ancient DNA sequence revealed by error-correcting codes

    PubMed Central

    Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  6. Rapid Measurement and Correction of Phase Errors from B0 Eddy Currents: Impact on Image Quality for Non-Cartesian Imaging

    PubMed Central

    Brodsky, Ethan K.; Klaers, Jessica L.; Samsonov, Alexey A.; Kijowski, Richard; Block, Walter F.

    2014-01-01

    Non-Cartesian imaging sequences and navigational methods can be more sensitive to scanner imperfections that have little impact on conventional clinical sequences, an issue which has repeatedly complicated the commercialization of these techniques by frustrating transitions to multi-center evaluations. One such imperfection is phase errors caused by resonant frequency shifts from eddy currents induced in the cryostat by time-varying gradients, a phenomenon known as B0 eddy currents. These phase errors can have a substantial impact on sequences that use ramp sampling, bipolar gradients, and readouts at varying azimuthal angles. We present a method for measuring and correcting phase errors from B0 eddy currents and examine the results on two different scanner models. This technique yields significant improvements in image quality for high-resolution joint imaging on certain scanners. The results suggest that correction of short-time B0 eddy currents in manufacturer-provided service routines would simplify adoption of non-Cartesian sampling methods. PMID:22488532

  7. Error propagation in eigenimage filtering.

    PubMed

    Soltanian-Zadeh, H; Windham, J P; Jenkins, J M

    1990-01-01

    Mathematical derivation of error (noise) propagation in eigenimage filtering is presented. Based on the mathematical expressions, a method for decreasing the propagated noise given a sequence of images is suggested. The signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) of the final composite image are compared to the SNRs and CNRs of the images in the sequence. The consistency of the assumptions and accuracy of the mathematical expressions are investigated using sequences of simulated and real magnetic resonance (MR) images of an agarose phantom and a human brain.

  8. The detection error of thermal test low-frequency cable based on M sequence correlation algorithm

    NASA Astrophysics Data System (ADS)

    Wu, Dongliang; Ge, Zheyang; Tong, Xin; Du, Chunlin

    2018-04-01

    The low accuracy and low efficiency of off-line detection of thermal test low-frequency cable faults can be addressed by designing a cable fault detection system based on an FPGA-exported M-sequence code (linear feedback shift register sequence) as the pulse signal source. The design principle of the SSTDR (spread spectrum time-domain reflectometry) reflection method and the hardware on-line monitoring setup are discussed in this paper. Testing data show that the detection error increases with the fault location along the thermal test low-frequency cable.

  9. Describing Phonological Paraphasias in Three Variants of Primary Progressive Aphasia.

    PubMed

    Dalton, Sarah Grace Hudspeth; Shultz, Christine; Henry, Maya L; Hillis, Argye E; Richardson, Jessica D

    2018-03-01

    The purpose of this study was to describe the linguistic environment of phonological paraphasias in 3 variants of primary progressive aphasia (semantic, logopenic, and nonfluent) and to describe the profiles of paraphasia production for each of these variants. Discourse samples of 26 individuals diagnosed with primary progressive aphasia were investigated for phonological paraphasias using the criteria established for the Philadelphia Naming Test (Moss Rehabilitation Research Institute, 2013). Phonological paraphasias were coded for paraphasia type, part of speech of the target word, target word frequency, type of segment in error, word position of consonant errors, type of error, and degree of change in consonant errors. Eighteen individuals across the 3 variants produced phonological paraphasias. Most paraphasias were nonword, followed by formal, and then mixed, with errors primarily occurring on nouns and verbs, with relatively few on function words. Most errors were substitutions, followed by addition and deletion errors, and few sequencing errors. Errors were evenly distributed across vowels, consonant singletons, and clusters, with more errors occurring in initial and medial positions of words than in the final position of words. Most consonant errors consisted of only a single-feature change, with few 2- or 3-feature changes. Importantly, paraphasia productions by variant differed from these aggregate results, with unique production patterns for each variant. These results suggest that a system where paraphasias are coded as present versus absent may be insufficient to adequately distinguish between the 3 subtypes of PPA. The 3 variants demonstrate patterns that may be used to improve phenotyping and diagnostic sensitivity. These results should be integrated with recent findings on phonological processing and speech rate. Future research should attempt to replicate these results in a larger sample of participants with longer speech samples and varied elicitation tasks. https://doi.org/10.23641/asha.5558107.

  10. Rapid evaluation for dielectronic recombination rate coefficients of the H-like isoelectronic sequence.

    NASA Astrophysics Data System (ADS)

    Teng, H.; Xu, Z.

    1996-09-01

    The authors present a set of accurate formulae for the rapid calculation of dielectronic recombination rate coefficients of H-like ions from Ne (Z = 10) to Ni (Z = 29) over an electron temperature range from 0.6 to 10 keV. This set of formulae is obtained by directly fitting the dielectronic recombination rate coefficients calculated on the basis of the intermediate-coupling multi-configuration Hartree-Fock model of Karim and Bhalla (1988). The dielectronic recombination rate coefficients from these formulae are in close agreement with the original results of Karim et al.; the errors are generally less than 0.1%. The results are also compared with those obtained from a set of new rate formulae developed by Hahn. These formulae can be used to generate dielectronic recombination rate coefficients for H-like ions where explicit calculations are unavailable. The detailed results are tabulated and discussed.

  11. Fully automatic hp-adaptivity for acoustic and electromagnetic scattering in three dimensions

    NASA Astrophysics Data System (ADS)

    Kurtz, Jason Patrick

    We present an algorithm for fully automatic hp-adaptivity for finite element approximations of elliptic and Maxwell boundary value problems in three dimensions. The algorithm automatically generates a sequence of coarse grids, and a corresponding sequence of fine grids, such that the energy norm of the error decreases exponentially with respect to the number of degrees of freedom in either sequence. At each step, we employ a discrete optimization algorithm to determine the refinements for the current coarse grid such that the projection-based interpolation error for the current fine grid solution decreases with an optimal rate with respect to the number of degrees of freedom added by the refinement. The refinements are restricted only by the requirement that the resulting mesh is at most 1-irregular, but they may be anisotropic in both element size h and order of approximation p. While we cannot prove that our method converges at all, we present numerical evidence of exponential convergence for a diverse suite of model problems from acoustic and electromagnetic scattering. In particular we show that our method is well suited to the automatic resolution of exterior problems truncated by the introduction of a perfectly matched layer. To enable and accelerate the solution of these problems on commodity hardware, we include a detailed account of three critical aspects of our implementation, namely an efficient implementation of sum factorization, several efficient interfaces to the direct multi-frontal solver MUMPS, and some fast direct solvers for the computation of a sequence of nested projections.

  12. A complete high-quality MinION nanopore assembly of an extensively drug-resistant Mycobacterium tuberculosis Beijing lineage strain identifies novel variation in repetitive PE/PPE gene regions.

    PubMed

    Bainomugisa, Arnold; Duarte, Tania; Lavu, Evelyn; Pandey, Sushil; Coulter, Chris; Marais, Ben J; Coin, Lachlan M

    2018-06-15

    A better understanding of the genomic changes that facilitate the emergence and spread of drug-resistant Mycobacterium tuberculosis strains is currently required. Here, we report the use of the MinION nanopore sequencer (Oxford Nanopore Technologies) to sequence and assemble an extensively drug-resistant (XDR) isolate, which is part of a modern Beijing sub-lineage strain, prevalent in Western Province, Papua New Guinea. Using 238-fold coverage obtained from a single flow-cell, de novo assembly of nanopore reads resulted in a single contiguous assembly with 99.92 % assembly accuracy. Incorporation of complementary short-read sequences (Illumina) as part of consensus error correction resulted in a 4 404 064 bp genome with 99.98 % assembly accuracy. This assembly had an average nucleotide identity of 99.7 % relative to the reference genome, H37Rv. We assembled nearly all GC-rich repetitive PE/PPE family genes (166/168) and identified variants within these genes. With an estimated genotypic error rate of 5.3 % from MinION data, we demonstrated identification of variants, including the conventional drug resistance mutations as well as those that contribute to the resistance phenotype (efflux pumps/transporters) and virulence. Reference-based alignment of the assembly allowed detection of deletions and insertions. MinION sequencing provided a fully annotated assembly of a transmissible XDR strain from an endemic setting and showed its utility in providing further understanding of genomic processes within Mycobacterium tuberculosis.

  13. Learning by observation: insights from Williams syndrome.

    PubMed

    Foti, Francesca; Menghini, Deny; Mandolesi, Laura; Federico, Francesca; Vicari, Stefano; Petrosini, Laura

    2013-01-01

    Observing another person performing a complex action accelerates the observer's acquisition of the same action and limits the time-consuming process of learning by trial and error. Observational learning makes an interesting and potentially important topic in the developmental domain, especially when disorders are considered. The implications of studies aimed at clarifying whether and how this form of learning is spared by pathology are manifold. We focused on a specific population with learning and intellectual disabilities, the individuals with Williams syndrome. The performance of twenty-eight individuals with Williams syndrome was compared with that of thirty-two mental age- and gender-matched typically developing children on tasks of learning a visuo-motor sequence by observation or by trial and error. Regardless of the learning modality, acquiring the correct sequence involved three main phases: a detection phase, in which participants discovered the correct sequence and learned how to perform the task; an exercise phase, in which they reproduced the sequence until performance was error-free; and an automatization phase, in which, by repeating the error-free sequence, they became accurate and speedy. Participants with Williams syndrome benefited from observational training (in which they observed an actor detecting the visuo-motor sequence) in the detection phase, while they performed worse than typically developing children in the exercise and automatization phases. Thus, by exploiting competencies learned by observation, individuals with Williams syndrome detected the visuo-motor sequence, putting into action the appropriate procedural strategies. Conversely, their impaired performance in the exercise phase appeared linked to impaired spatial working memory, while their deficits in the automatization phase appeared linked to deficits in the processes that increase efficiency and speed of the response. Overall, observational experience was advantageous for acquiring competencies, since it primed subjects' interest in the actions to be performed and functioned as a catalyst for executed action.

  14. Characterization of the uncertainty of divergence time estimation under relaxed molecular clock models using multiple loci.

    PubMed

    Zhu, Tianqi; Dos Reis, Mario; Yang, Ziheng

    2015-03-01

    Genetic sequence data provide information about the distances between species or branch lengths in a phylogeny, but not about the absolute divergence times or the evolutionary rates directly. Bayesian methods for dating species divergences estimate times and rates by assigning priors on them. In particular, the prior on times (node ages on the phylogeny) incorporates information in the fossil record to calibrate the molecular tree. Because times and rates are confounded, our posterior time estimates will not approach point values even if an infinite amount of sequence data are used in the analysis. In a previous study we developed a finite-sites theory to characterize the uncertainty in Bayesian divergence time estimation in analysis of large but finite sequence data sets under a strict molecular clock. As most modern clock dating analyses use more than one locus and are conducted under relaxed clock models, here we extend the theory to the case of relaxed clock analysis of data from multiple loci (site partitions). Uncertainty in posterior time estimates is partitioned into three sources: Sampling errors in the estimates of branch lengths in the tree for each locus due to limited sequence length, variation of substitution rates among lineages and among loci, and uncertainty in fossil calibrations. Using a simple but analogous estimation problem involving the multivariate normal distribution, we predict that as the number of loci (L) goes to infinity, the variance in posterior time estimates decreases and approaches the infinite-data limit at the rate of 1/L, and the limit is independent of the number of sites in the sequence alignment. We then confirmed the predictions by using computer simulation on phylogenies of two or three species, and by analyzing a real genomic data set for six primate species. Our results suggest that with the fossil calibrations fixed, analyzing multiple loci or site partitions is the most effective way for improving the precision of posterior time estimation. However, even if a huge amount of sequence data is analyzed, considerable uncertainty will persist in time estimates. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society of Systematic Biologists.
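
    As a loose numerical analogy to the predicted 1/L behaviour, the toy simulation below shows how the spread of an average over L independent noisy locus-level estimates shrinks roughly in proportion to 1/L. This is generic sampling statistics, not the authors' relaxed-clock posterior; the noise level and the number of replicates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def spread_of_average(n_loci, n_reps=2000, rate_sd=0.2):
    """Variance of an average over n_loci noisy locus-level estimates (toy illustration)."""
    locus_estimates = 1.0 + rate_sd * rng.normal(size=(n_reps, n_loci))
    return np.var(locus_estimates.mean(axis=1))

for n_loci in (1, 4, 16, 64):
    print(n_loci, round(spread_of_average(n_loci), 5))
# The spread shrinks roughly in proportion to 1/n_loci.
```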

  15. Error correction and statistical analyses for intra-host comparisons of feline immunodeficiency virus diversity from high-throughput sequencing data.

    PubMed

    Liu, Yang; Chiaromonte, Francesca; Ross, Howard; Malhotra, Raunaq; Elleder, Daniel; Poss, Mary

    2015-06-30

    Infection with feline immunodeficiency virus (FIV) causes an immunosuppressive disease whose consequences are less severe if cats are co-infected with an attenuated FIV strain (PLV). We use virus diversity measurements, which reflect replication ability and the virus response to various conditions, to test whether diversity of virulent FIV in lymphoid tissues is altered in the presence of PLV. Our data consisted of the 3' half of the FIV genome from three tissues of animals infected with FIV alone, or with FIV and PLV, sequenced by 454 technology. Since rare variants dominate virus populations, we had to carefully distinguish sequence variation from errors due to experimental protocols and sequencing. We considered an exponential-normal convolution model used for background correction of microarray data, and modified it to formulate an error correction approach for minor allele frequencies derived from high-throughput sequencing. Similar to accounting for over-dispersion in counts, this accounts for error-inflated variability in frequencies - and quite effectively reproduces empirically observed distributions. After obtaining error-corrected minor allele frequencies, we applied ANalysis Of VAriance (ANOVA) based on a linear mixed model and found that conserved sites and transition frequencies in FIV genes differ among tissues of dual and single infected cats. Furthermore, analysis of minor allele frequencies at individual FIV genome sites revealed 242 sites significantly affected by infection status (dual vs. single) or infection status by tissue interaction. All together, our results demonstrated a decrease in FIV diversity in bone marrow in the presence of PLV. Importantly, these effects were weakened or undetectable when error correction was performed with other approaches (thresholding of minor allele frequencies; probabilistic clustering of reads). We also queried the data for cytidine deaminase activity on the viral genome, which causes an asymmetric increase in G to A substitutions, but found no evidence for this host defense strategy. Our error correction approach for minor allele frequencies (more sensitive and computationally efficient than other algorithms) and our statistical treatment of variation (ANOVA) were critical for effective use of high-throughput sequencing data in understanding viral diversity. We found that co-infection with PLV shifts FIV diversity from bone marrow to lymph node and spleen.

  16. Retention-error patterns in complex alphanumeric serial-recall tasks.

    PubMed

    Mathy, Fabien; Varré, Jean-Stéphane

    2013-01-01

    We propose a new method based on an algorithm usually dedicated to DNA sequence alignment in order to both reliably score short-term memory performance on immediate serial-recall tasks and analyse retention-error patterns. There can be considerable confusion on how performance on immediate serial list recall tasks is scored, especially when the to-be-remembered items are sampled with replacement. We discuss the utility of sequence-alignment algorithms to compare the stimuli to the participants' responses. The idea is that deletion, substitution, translocation, and insertion errors, which are typical in DNA, are also typical putative errors in short-term memory (respectively omission, confusion, permutation, and intrusion errors). We analyse four data sets in which alphanumeric lists included a few (or many) repetitions. After examining the method on two simple data sets, we show that sequence alignment offers 1) a compelling method for measuring capacity in terms of chunks when many regularities are introduced in the material (third data set) and 2) a reliable estimator of individual differences in short-term memory capacity. This study illustrates the difficulty of arriving at a good measure of short-term memory performance, and also attempts to characterise the primary factors underpinning remembering and forgetting.
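
    A minimal version of the alignment idea can be written as a standard edit-distance dynamic program whose backtrace classifies omissions (deletions), intrusions (insertions), and confusions (substitutions). The sketch below is our own simplification: unlike the DNA-alignment approach described in the paper, it does not model translocations (permutation errors) or chunk structure.

```python
def recall_error_profile(target, response):
    """Classic edit-distance DP; the backtrace classifies omissions, intrusions, confusions."""
    n, m = len(target), len(response)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if target[i - 1] == response[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # omission (deletion)
                           dp[i][j - 1] + 1,         # intrusion (insertion)
                           dp[i - 1][j - 1] + cost)  # match / confusion
    i, j = n, m
    profile = {"omission": 0, "intrusion": 0, "confusion": 0}
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (target[i - 1] != response[j - 1]):
            if target[i - 1] != response[j - 1]:
                profile["confusion"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            profile["omission"] += 1
            i -= 1
        else:
            profile["intrusion"] += 1
            j -= 1
    return profile

print(recall_error_profile(list("73514"), list("7354")))
# {'omission': 1, 'intrusion': 0, 'confusion': 0}
```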

  17. Learning time-dependent noise to reduce logical errors: real time error rate estimation in quantum error correction

    NASA Astrophysics Data System (ADS)

    Huo, Ming-Xia; Li, Ying

    2017-12-01

    Quantum error correction is important to quantum information processing, which allows us to reliably process information encoded in quantum error correction codes. Efficient quantum error correction benefits from the knowledge of error rates. We propose a protocol for monitoring error rates in real time without interrupting the quantum error correction. Any adaptation of the quantum error correction code or its implementation circuit is not required. The protocol can be directly applied to the most advanced quantum error correction techniques, e.g. surface code. A Gaussian processes algorithm is used to estimate and predict error rates based on error correction data in the past. We find that using these estimated error rates, the probability of error correction failures can be significantly reduced by a factor increasing with the code distance.
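
    A bare-bones illustration of regressing time-varying error rates with a Gaussian process is sketched below using scikit-learn. The data are synthetic and the kernel choice is arbitrary; the protocol in the paper derives its error-rate observations from error-correction data and runs in real time, neither of which is modelled here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)

# Hypothetical time series of per-round error-rate estimates.
t = np.linspace(0, 10, 40).reshape(-1, 1)
true_rate = 0.01 + 0.004 * np.sin(0.6 * t).ravel()
observed = true_rate + 0.001 * rng.normal(size=t.shape[0])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0) + WhiteKernel(1e-6),
                              normalize_y=True)
gp.fit(t, observed)

t_future = np.array([[10.5], [11.0]])
mean, std = gp.predict(t_future, return_std=True)
print(mean, std)   # predicted error rates, with uncertainty, for upcoming rounds
```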

  18. TU-H-CAMPUS-JeP3-05: Adaptive Determination of Needle Sequence HDR Prostate Brachytherapy with Divergent Needle-By-Needle Delivery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Borot de Battisti, M; Maenhout, M; Lagendijk, J J W

    Purpose: To develop a new method which adaptively determines the optimal needle insertion sequence for HDR prostate brachytherapy involving divergent needle-by-needle dose delivery by e.g. a robotic device. A needle insertion sequence is calculated at the beginning of the intervention and updated after each needle insertion with feedback on needle positioning errors. Methods: Needle positioning errors and anatomy changes may occur during HDR brachytherapy which can lead to errors in the delivered dose. A novel strategy was developed to calculate and update the needle sequence and the dose plan after each needle insertion with feedback on needle positioning errors. The dose plan optimization was performed by numerical simulations. The proposed needle sequence determination optimizes the final dose distribution based on the dose coverage impact of each needle. This impact is predicted stochastically by needle insertion simulations. HDR procedures were simulated with varying number of needle insertions (4 to 12) using 11 patient MR data-sets with PTV, prostate, urethra, bladder and rectum delineated. Needle positioning errors were modeled by random normally distributed angulation errors (standard deviation of 3 mm at the needle’s tip). The final dose parameters were compared in the situations where the needle with the largest vs. the smallest dose coverage impact was selected at each insertion. Results: Over all scenarios, the percentage of clinically acceptable final dose distribution improved when the needle selected had the largest dose coverage impact (91%) compared to the smallest (88%). The differences were larger for few (4 to 6) needle insertions (maximum difference scenario: 79% vs. 60%). The computation time of the needle sequence optimization was below 60s. Conclusion: A new adaptive needle sequence determination for HDR prostate brachytherapy was developed. Coupled to adaptive planning, the selection of the needle with the largest dose coverage impact increases chances of reaching the clinical constraints. M. Borot de Battisti is funded by Philips Medical Systems Nederland B.V.; M. Moerland is principal investigator on a contract funded by Philips Medical Systems Nederland B.V.; G. Hautvast and D. Binnekamp are fulltime employees of Philips Medical Systems Nederland B.V.

  19. Frame sequences analysis technique of linear objects movement

    NASA Astrophysics Data System (ADS)

    Oshchepkova, V. Y.; Berg, I. A.; Shchepkin, D. V.; Kopylova, G. V.

    2017-12-01

    Obtaining data by noninvasive methods is often needed in many fields of science and engineering. This is achieved through video recording at various frame rates and in various light spectra. In doing so, quantitative analysis of the movement of the objects being studied becomes an important component of the research. This work discusses the analysis of motion of linear objects in the two-dimensional plane. The complexity of this problem increases when the frame contains numerous objects whose images may overlap. This study uses a sequence containing 30 frames at a resolution of 62 × 62 pixels and a frame rate of 2 Hz. It was required to determine the average velocity of the objects' motion. This velocity was found as an average velocity for 8-12 objects with an error of 15%. After processing, dependencies of the average velocity on the control parameters were found. The processing was performed in the software environment GMimPro with subsequent approximation of the obtained data using the Hill equation.

  20. ABI Base Recall: Automatic Correction and Ends Trimming of DNA Sequences.

    PubMed

    Elyazghi, Zakaria; Yazouli, Loubna El; Sadki, Khalid; Radouani, Fouzia

    2017-12-01

    Automated DNA sequencers produce chromatogram files in ABI format. When viewing chromatograms, some ambiguities appear at various sites along the DNA sequences, because the program implemented in the sequencing machine and used to call bases cannot always precisely determine the right nucleotide, especially when it is represented by either a broad peak or a set of overlapping peaks. In such cases, a letter other than A, C, G, or T is recorded, most commonly N. Thus, DNA sequencing chromatograms need manual examination: checking for mis-calls and truncating the sequence when errors become too frequent. The purpose of this paper is to develop a program allowing the automatic correction of these ambiguities. This application is a Web-based program powered by Shiny and runs on the R platform for easy use. As part of the interface, we added an automatic end-trimming option, alignment against reference sequences, and BLAST. To develop and test our tool, we collected several bacterial DNA sequences from different laboratories within Institut Pasteur du Maroc, performed both manual and automatic correction, and compared the two methods. As a result, we note that our program, ABI Base Recall, accomplishes good correction with high accuracy. Indeed, it increases the rate of identity and coverage and minimizes the number of mismatches and gaps; hence it provides a solution to sequencing ambiguities and saves biologists' time and labor.
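
    End trimming of the kind mentioned above is commonly done by dropping low-quality bases from both ends of a read. The sketch below is a generic quality-threshold trimmer, not the algorithm implemented in ABI Base Recall; the threshold and the example qualities are made up.

```python
def trim_ends(seq, quals, min_q=20):
    """Drop leading and trailing bases whose Phred quality is below min_q."""
    start = 0
    while start < len(quals) and quals[start] < min_q:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]

seq = "NNACGTACGTACGTNN"
quals = [2, 3, 30, 32, 35, 33, 31, 34, 36, 35, 33, 32, 30, 31, 4, 3]
print(trim_ends(seq, quals))   # ('ACGTACGTACGT', [30, ..., 31])
```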

  1. Alignment methods: strategies, challenges, benchmarking, and comparative overview.

    PubMed

    Löytynoja, Ari

    2012-01-01

    Comparative evolutionary analyses of molecular sequences are solely based on the identities and differences detected between homologous characters. Errors in this homology statement, that is errors in the alignment of the sequences, are likely to lead to errors in the downstream analyses. Sequence alignment and phylogenetic inference are tightly connected and many popular alignment programs use the phylogeny to divide the alignment problem into smaller tasks. They then neglect the phylogenetic tree, however, and produce alignments that are not evolutionarily meaningful. The use of phylogeny-aware methods reduces the error but the resulting alignments, with evolutionarily correct representation of homology, can challenge the existing practices and methods for viewing and visualising the sequences. The inter-dependency of alignment and phylogeny can be resolved by joint estimation of the two; methods based on statistical models allow for inferring the alignment parameters from the data and correctly take into account the uncertainty of the solution but remain computationally challenging. Widely used alignment methods are based on heuristic algorithms and unlikely to find globally optimal solutions. The whole concept of one correct alignment for the sequences is questionable, however, as there typically exist vast numbers of alternative, roughly equally good alignments that should also be considered. This uncertainty is hidden by many popular alignment programs and is rarely correctly taken into account in the downstream analyses. The quest for finding and improving the alignment solution is complicated by the lack of suitable measures of alignment goodness. The difficulty of comparing alternative solutions also affects benchmarks of alignment methods and the results strongly depend on the measure used. As the effects of alignment error cannot be predicted, comparing the alignments' performance in downstream analyses is recommended.

  2. Performance analysis of an all-digital BPSK direct sequence spread-spectrum IF receiver architecture

    NASA Astrophysics Data System (ADS)

    Chung, Bong-Young; Chien, Charles; Samueli, Henry; Jain, Rajeev

    1993-09-01

    A VLSI architecture for an all-digital binary phase shift keyed (BPSK) direct-sequence (DS) spread spectrum (SS) IF receiver is presented, and an in-depth performance analysis is given. The all-digital architecture incorporates a Costas loop for carrier recovery and a delay-locked loop for clock recovery. For the PN acquisition block, a robust energy detection scheme is proposed to reduce false PN locks over a broad range of signal-to-noise ratios. The proposed architecture is intended for use in the 902-928 MHz unlicensed spread spectrum radio band. A 100 kb/s information rate and a 12.7 Mchips/second PN code rate are assumed. The IF center frequency is 12.7 MHz and the IF sampling rate is 50.8 Msamples/second, which is the Nyquist rate for the 25.4 MHz bandwidth signal. Finite wordlength effects have been simulated to optimize the architecture, thereby minimizing the chip area, and results of the finite wordlength simulations demonstrate that the chip architecture achieves a bit error rate performance within 1 dB of theory in an additive white Gaussian noise channel. The probability of PN acquisition within 5 ms is approximately 56% at -17 dB IF input SNR and 82% at -11 dB IF input SNR.
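
    The "within 1 dB of theory" claim is measured against the standard BPSK bit error rate in AWGN, BER = Q(sqrt(2 Eb/N0)). The short Monte Carlo below reproduces that reference value at a single Eb/N0 point; it models neither the spreading/despreading nor the Costas-loop and clock-recovery stages of the receiver, and the bit count and SNR are arbitrary.

```python
import numpy as np
from scipy.special import erfc

rng = np.random.default_rng(3)
n_bits = 200_000
ebn0_db = 6.0

bits = rng.integers(0, 2, n_bits)
symbols = 1 - 2 * bits                      # BPSK mapping: 0 -> +1, 1 -> -1
ebn0 = 10 ** (ebn0_db / 10)
noise = rng.normal(scale=np.sqrt(1 / (2 * ebn0)), size=n_bits)

decisions = (symbols + noise) < 0           # decide bit 1 when the sample is negative
ber_sim = np.mean(decisions != bits.astype(bool))
ber_theory = 0.5 * erfc(np.sqrt(ebn0))      # Q(sqrt(2*Eb/N0)) for BPSK in AWGN

print(f"simulated BER {ber_sim:.5f}, theoretical {ber_theory:.5f}")
```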

  3. Task planning with uncertainty for robotic systems. Thesis

    NASA Technical Reports Server (NTRS)

    Cao, Tiehua

    1993-01-01

    In a practical robotic system, it is important to represent and plan sequences of operations and to be able to choose an efficient sequence from them for a specific task. During the generation and execution of task plans, different kinds of uncertainty may occur and erroneous states need to be handled to ensure the efficiency and reliability of the system. An approach to task representation, planning, and error recovery for robotic systems is demonstrated. Our approach to task planning is based on an AND/OR net representation, which is then mapped to a Petri net representation of all feasible geometric states and associated feasibility criteria for net transitions. Task decomposition of robotic assembly plans based on this representation is performed on the Petri net for robotic assembly tasks, and the inheritance of the properties of liveness, safeness, and reversibility at all levels of decomposition is explored. This approach provides a framework for robust execution of tasks through the properties of traceability and viability. Uncertainty in robotic systems is modeled by local fuzzy variables, fuzzy marking variables, and global fuzzy variables, which are incorporated into fuzzy Petri nets. Analysis of properties and reasoning about uncertainty are investigated using fuzzy reasoning structures built into the net. Two applications of fuzzy Petri nets, robot task sequence planning and sensor-based error recovery, are explored. In the first application, the search space for feasible and complete task sequences with correct precedence relationships is reduced via the use of global fuzzy variables in reasoning about subgoals. In the second application, sensory verification operations are modeled by mutually exclusive transitions to reason about local and global fuzzy variables on-line and automatically select a retry or an alternative error recovery sequence when errors occur. Task sequencing and task execution with error recovery capability for one and multiple soft components in robotic systems are investigated.

  4. [Comparative studies of serological typing and HLA-A, B antigen genotyping with PCR using sequence-specific primers].

    PubMed

    Wu, Da-lin; Ling, Han-xin; Tang, Hao

    2004-11-01

    To evaluate the accuracy of PCR with sequence-specific primers (PCR-SSP) for HLA-I genotyping and to analyze the causes of the errors occurring in the typing, DNA samples were obtained from 34 clinical patients, and serological typing with a monoclonal antibody (mAb) and HLA-A, B antigen genotyping with PCR-SSP were performed. HLA-A and B alleles were successfully typed in the 34 clinical samples by mAb and by PCR-SSP. No false positive or false negative results were found with PCR-SSP, whereas the erroneous and missed diagnosis rates were markedly higher in serological detection, being 23.5% for HLA-A and 26.5% for HLA-B. Error or confusion was more likely to occur for the antigens A2 and A68, A32 and A33, and B5, B60 and B61. DNA typing of HLA class I (A, B antigens) by PCR-SSP has high resolution, high specificity, and good reproducibility, and is therefore more suitable for clinical application than serological typing. PCR-SSP may accurately detect the alleles that are easily missed or mistaken in serological typing.

  5. Combination of Multiple Spectral Libraries Improves the Current Search Methods Used to Identify Missing Proteins in the Chromosome-Centric Human Proteome Project.

    PubMed

    Cho, Jin-Young; Lee, Hyoung-Joo; Jeong, Seul-Ki; Kim, Kwang-Youl; Kwon, Kyung-Hoon; Yoo, Jong Shin; Omenn, Gilbert S; Baker, Mark S; Hancock, William S; Paik, Young-Ki

    2015-12-04

    The approximately 2.9 billion base pairs of the human reference genome sequence are known to encode some 20 000 representative proteins. However, 3000 proteins, that is, ~15% of all proteins, have no or very weak proteomic evidence and are still missing. Missing proteins may be present in rare samples in very low abundance or be only temporarily expressed, causing problems in their detection and protein profiling. In particular, some technical limitations cause missing proteins to remain unassigned. For example, current mass spectrometry techniques have high detection limits and error rates for complex biological samples. An insufficient proteome coverage in a reference sequence database and spectral library also raises major issues. Thus, the development of a better strategy that results in greater sensitivity and accuracy in the search for missing proteins is necessary. To this end, we used a new strategy, which combines a reference spectral library search and a simulated spectral library search, to identify missing proteins. We built the human iRefSPL, which contains the original human reference spectral library and additional peptide sequence-spectrum match entries from other species. We also constructed the human simSPL, which contains the simulated spectra of 173 907 human tryptic peptides determined by MassAnalyzer (version 2.3.1). To prove the enhanced analytical performance of the combination of the human iRefSPL and simSPL methods for the identification of missing proteins, we reanalyzed the placental tissue data set (PXD000754). The data from each experiment were analyzed using PeptideProphet, and the results were combined using iProphet. For quality control, we applied the class-specific false-discovery rate filtering method. All of the results were filtered at a false-discovery rate of <1% at the peptide and protein levels. The quality-controlled results were then cross-checked with the neXtProt DB (2014-09-19 release). The two spectral libraries, iRefSPL and simSPL, were designed to ensure no overlap of the proteome coverage. They were shown to be complementary in spectral library searching and significantly increased the number of matches. From this trial, 12 new missing proteins were identified that passed the following criterion: at least 2 peptides of 7 or more amino acids in length, or one peptide of 9 or more amino acids in length, with one or more unique sequences. Thus, the iRefSPL and simSPL combination can be used to help identify peptides that have not been detected by conventional sequence database searches, with improved sensitivity and a low error rate.

  6. Accuracy of maximum likelihood estimates of a two-state model in single-molecule FRET

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gopich, Irina V.

    2015-01-21

    Photon sequences from single-molecule Förster resonance energy transfer (FRET) experiments can be analyzed using a maximum likelihood method. Parameters of the underlying kinetic model (FRET efficiencies of the states and transition rates between conformational states) are obtained by maximizing the appropriate likelihood function. In addition, the errors (uncertainties) of the extracted parameters can be obtained from the curvature of the likelihood function at the maximum. We study the standard deviations of the parameters of a two-state model obtained from photon sequences with recorded colors and arrival times. The standard deviations can be obtained analytically in a special case when the FRET efficiencies of the states are 0 and 1 and in the limiting cases of fast and slow conformational dynamics. These results are compared with the results of numerical simulations. The accuracy and, therefore, the ability to predict model parameters depend on how fast the transition rates are compared to the photon count rate. In the limit of slow transitions, the key parameters that determine the accuracy are the number of transitions between the states and the number of independent photon sequences. In the fast transition limit, the accuracy is determined by the small fraction of photons that are correlated with their neighbors. The relative standard deviation of the relaxation rate has a “chevron” shape as a function of the transition rate in the log-log scale. The location of the minimum of this function dramatically depends on how well the FRET efficiencies of the states are separated.

  7. Accuracy of maximum likelihood estimates of a two-state model in single-molecule FRET

    PubMed Central

    Gopich, Irina V.

    2015-01-01

    Photon sequences from single-molecule Förster resonance energy transfer (FRET) experiments can be analyzed using a maximum likelihood method. Parameters of the underlying kinetic model (FRET efficiencies of the states and transition rates between conformational states) are obtained by maximizing the appropriate likelihood function. In addition, the errors (uncertainties) of the extracted parameters can be obtained from the curvature of the likelihood function at the maximum. We study the standard deviations of the parameters of a two-state model obtained from photon sequences with recorded colors and arrival times. The standard deviations can be obtained analytically in a special case when the FRET efficiencies of the states are 0 and 1 and in the limiting cases of fast and slow conformational dynamics. These results are compared with the results of numerical simulations. The accuracy and, therefore, the ability to predict model parameters depend on how fast the transition rates are compared to the photon count rate. In the limit of slow transitions, the key parameters that determine the accuracy are the number of transitions between the states and the number of independent photon sequences. In the fast transition limit, the accuracy is determined by the small fraction of photons that are correlated with their neighbors. The relative standard deviation of the relaxation rate has a “chevron” shape as a function of the transition rate in the log-log scale. The location of the minimum of this function dramatically depends on how well the FRET efficiencies of the states are separated. PMID:25612692

  8. Estimating residual fault hitting rates by recapture sampling

    NASA Technical Reports Server (NTRS)

    Lee, Larry; Gupta, Rajan

    1988-01-01

    For the recapture debugging design introduced by Nayak (1988) the problem of estimating the hitting rates of the faults remaining in the system is considered. In the context of a conditional likelihood, moment estimators are derived and are shown to be asymptotically normal and fully efficient. Fixed sample properties of the moment estimators are compared, through simulation, with those of the conditional maximum likelihood estimators. Properties of the conditional model are investigated such as the asymptotic distribution of linear functions of the fault hitting frequencies and a representation of the full data vector in terms of a sequence of independent random vectors. It is assumed that the residual hitting rates follow a log linear rate model and that the testing process is truncated when the gaps between the detection of new errors exceed a fixed amount of time.

  9. Errors Detection by 5- to 8-Year-Olds Listening to a Wrong French Sequence of Number Words: Music before Lyrics?

    ERIC Educational Resources Information Center

    Gauderat-Bagault, Laurence; Lehalle, Henri

    Children, ages 5 to 8 years (n=71), were required to listen and detect errors out of a partly wrong sequence of tape-recorded French number words from 1 to 100. Children (from several schools near Montpellier, France) were from preschool, grade 1, and grade 2. Results show that wrong syntactic rules were better detected than omissions, whereas…

  10. Basecalling with LifeTrace

    PubMed Central

    Walther, Dirk; Bartha, Gábor; Morris, Macdonald

    2001-01-01

    A pivotal step in electrophoresis sequencing is the conversion of the raw, continuous chromatogram data into the actual sequence of discrete nucleotides, a process referred to as basecalling. We describe a novel algorithm for basecalling implemented in the program LifeTrace. Like Phred, currently the most widely used basecalling software program, LifeTrace takes processed trace data as input. It was designed to be tolerant to variable peak spacing by means of an improved peak-detection algorithm that emphasizes local chromatogram information over global properties. LifeTrace is shown to generate high-quality basecalls and reliable quality scores. It proved particularly effective when applied to MegaBACE capillary sequencing machines. In a benchmark test of 8372 dye-primer MegaBACE chromatograms, LifeTrace generated 17% fewer substitution errors, 16% fewer insertion/deletion errors, and 2.4% more aligned bases to the finished sequence than did Phred. For two sets totaling 6624 dye-terminator chromatograms, the performance improvement was 15% fewer substitution errors, 10% fewer insertion/deletion errors, and 2.1% more aligned bases. The processing time required by LifeTrace is comparable to that of Phred. The predicted quality scores were in line with observed quality scores, permitting direct use for quality clipping and in silico single nucleotide polymorphism (SNP) detection. Furthermore, we introduce a new type of quality score associated with every basecall: the gap-quality. It estimates the probability of a deletion error between the current and the following basecall. This additional quality score improves detection of single basepair deletions when used for locating potential basecalling errors during the alignment. We also describe a new protocol for benchmarking that we believe better discerns basecaller performance differences than methods previously published. PMID:11337481
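
    The peak-detection step that LifeTrace emphasizes can be mimicked, very roughly, with an off-the-shelf local-maximum finder applied to one dye channel of a processed trace. The sketch below uses a synthetic trace and scipy.signal.find_peaks with a minimum spacing and prominence; it is not LifeTrace's algorithm and performs no basecalling or quality scoring.

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical single-channel trace: four Gaussian peaks with slightly uneven spacing,
# standing in for one dye channel of a processed chromatogram.
x = np.arange(400)
centers = [50, 145, 255, 350]
trace = sum(np.exp(-0.5 * ((x - c) / 10.0) ** 2) for c in centers)
trace += 0.02 * np.random.default_rng(4).normal(size=x.size)

# Local-maximum detection with a minimum spacing and prominence,
# rather than a fixed global peak grid.
peaks, _ = find_peaks(trace, distance=40, prominence=0.3)
print(peaks)   # approximate peak (basecall) positions
```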

  11. Comparing K-mer based methods for improved classification of 16S sequences.

    PubMed

    Vinje, Hilde; Liland, Kristian Hovde; Almøy, Trygve; Snipen, Lars

    2015-07-01

    The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Parallel to the explosion in the amount of sequence data accessible, there has also been a shift in focus for classification methods. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers by sliding windows are the most interesting classification approach with respect to both speed and accuracy. Here, we present a systematic comparison of five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other both in data usage and modelling strategies. We have based our study on the commonly known and well-used naïve Bayes classifier from the RDP project, and four other methods were implemented and tested on two different data sets, on full-length sequences as well as fragments of typical read-length. The differences in classification error between the methods were small but stable across both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets explored, and on both full-length and fragmented sequences, all five methods reached an error plateau. We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau, indicating that improved training data are needed to improve classification from here. Classification errors occur most frequently for genera with few sequences present. For improving the taxonomy and testing new classification methods, a better, more universal and robust training data set is crucial.
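
    One of the compared approaches, a multinomial naive Bayes classifier over k-mer counts, is easy to prototype. The sketch below builds 3-mer count vectors and trains scikit-learn's MultinomialNB on a tiny made-up training set; real 16S classification uses much longer sequences, larger k, and curated taxonomy, none of which is reflected here.

```python
from itertools import product
import numpy as np
from sklearn.naive_bayes import MultinomialNB

K = 3
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}

def kmer_vector(seq):
    """Sliding-window k-mer counts for one sequence."""
    v = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if kmer in INDEX:
            v[INDEX[kmer]] += 1
    return v

# Toy training set: two made-up 'genera' with different sequence composition.
train_seqs = ["ACGTACGTACGTACGT", "ACGTACGAACGTACGT", "TTGGCCTTGGCCTTGG", "TTGGCATTGGCCTTGG"]
labels = ["GenusA", "GenusA", "GenusB", "GenusB"]

X = np.array([kmer_vector(s) for s in train_seqs])
clf = MultinomialNB(alpha=1.0).fit(X, labels)

print(clf.predict([kmer_vector("ACGTACGTACTTACGT")]))   # expected: GenusA
```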

  12. Synthetic Spike-in Standards Improve Run-Specific Systematic Error Analysis for DNA and RNA Sequencing

    PubMed Central

    Zook, Justin M.; Samarov, Daniel; McDaniel, Jennifer; Sen, Shurjo K.; Salit, Marc

    2012-01-01

    While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set used to calculate association of SSEs with various features in the reads and sequence context. This data set is typically either from a part of the data set being “recalibrated” (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has less dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration. PMID:22859977
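
    The underlying check that motivates recalibration, comparing reported base quality scores with the error rates actually observed on reads whose true sequence is known (such as spike-ins), can be sketched without GATK. The snippet below bins control-read bases by reported Phred score and converts the observed error fraction back to a Phred value; the input data are invented and the pseudocount is an arbitrary smoothing choice.

```python
import math
from collections import defaultdict

def empirical_quality(reported_quals, is_error):
    """Group control-read bases by reported Phred score and compute the observed score."""
    counts = defaultdict(lambda: [0, 0])          # qual -> [errors, total]
    for q, err in zip(reported_quals, is_error):
        counts[q][0] += int(err)
        counts[q][1] += 1
    table = {}
    for q, (errors, total) in sorted(counts.items()):
        p = (errors + 1) / (total + 2)            # small pseudocount to avoid log(0)
        table[q] = round(-10 * math.log10(p), 1)
    return table

# Hypothetical spike-in bases: reported Q30 bases err slightly more often than claimed.
reported = [30] * 1000 + [20] * 1000
errors = [i < 3 for i in range(1000)] + [i < 10 for i in range(1000)]
print(empirical_quality(reported, errors))       # e.g. {20: 19.6, 30: 24.0}
```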

  13. Dynamically corrected gates for singlet-triplet spin qubits with control-dependent errors

    NASA Astrophysics Data System (ADS)

    Jacobson, N. Tobias; Witzel, Wayne M.; Nielsen, Erik; Carroll, Malcolm S.

    2013-03-01

    Magnetic field inhomogeneity due to random polarization of quasi-static local magnetic impurities is a major source of environmentally induced error for singlet-triplet double quantum dot (DQD) spin qubits. Moreover, for singlet-triplet qubits this error may depend on the applied controls. This effect is significant when a static magnetic field gradient is applied to enable full qubit control. Through a configuration interaction analysis, we observe that the dependence of the field inhomogeneity-induced error on the DQD bias voltage can vary systematically as a function of the controls for certain experimentally relevant operating regimes. To account for this effect, we have developed a straightforward prescription for adapting dynamically corrected gate sequences that assume control-independent errors into sequences that compensate for systematic control-dependent errors. We show that accounting for such errors may lead to a substantial increase in gate fidelities. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. DOE's National Nuclear Security Administration under contract DE-AC04-94AL85000.

  14. Experimental demonstration of tunable multiple optical orthogonal codes sequences-based optical label for optical packets switching

    NASA Astrophysics Data System (ADS)

    Zhang, Chongfu; Qiu, Kun; Zhou, Heng; Ling, Yun; Wang, Yawei; Xu, Bo

    2010-03-01

    In this paper, the tunable multiple optical orthogonal codes sequences (MOOCS)-based optical label for optical packet switching (OPS) (MOOCS-OPS) is experimentally demonstrated for the first time. The tunable MOOCS-based optical label is implemented using a fiber Bragg grating (FBG)-based optical en/decoder group and optical switches configured by a field-programmable gate array (FPGA), and the optical label is erased using a semiconductor optical amplifier (SOA). Waveforms of the MOOCS-based optical label and of optical packets comprising the MOOCS-based optical label and the payloads are presented; the switching control mechanism and the switching matrix are discussed; and the bit error rate (BER) performance of the system is also studied. These experimental results show that the tunable MOOCS-OPS scheme is effective.

  15. Linear reduction method for predictive and informative tag SNP selection.

    PubMed

    He, Jingwu; Westbrooks, Kelly; Zelikovsky, Alexander

    2005-01-01

    Constructing a complete human haplotype map is helpful when associating complex diseases with their related SNPs. Unfortunately, the number of SNPs is very large and it is costly to sequence many individuals. Therefore, it is desirable to reduce the number of SNPs that should be sequenced to a small number of informative representatives called tag SNPs. In this paper, we propose a new linear algebra-based method for selecting and using tag SNPs. We measure the quality of our tag SNP selection algorithm by comparing actual SNPs with SNPs predicted from selected linearly independent tag SNPs. Our experiments show that for sufficiently long haplotypes, knowing only 0.4% of all SNPs, the proposed linear reduction method predicts an unknown haplotype with an error rate below 2%, based on 10% of the population.
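
    One reading of the linear-algebra approach is that, over a reference sample of haplotypes, each non-tag SNP column can be approximated as a linear combination of the linearly independent tag SNP columns, so untyped SNPs are predicted by a least-squares fit followed by rounding. The sketch below follows that reading with an invented haplotype matrix and tag choice; the paper's actual tag-selection procedure is not reproduced.

      # Rough sketch: predicting non-tag SNPs as a linear function of tag SNPs.
      # Haplotype matrix, tag columns, and names are invented for illustration.
      import numpy as np

      # Rows = haplotypes from the reference population, columns = SNPs (0/1).
      H = np.array([[0, 1, 1, 0, 1],
                    [1, 0, 1, 1, 0],
                    [0, 0, 0, 0, 0],
                    [1, 1, 0, 1, 1]], dtype=float)

      tag_idx = [0, 1]      # assume these columns were selected as tag SNPs
      rest_idx = [2, 3, 4]  # SNPs to be predicted rather than sequenced

      # Fit a linear map from tag SNPs to the remaining SNPs on the reference data.
      W, *_ = np.linalg.lstsq(H[:, tag_idx], H[:, rest_idx], rcond=None)

      # Predict the untyped SNPs of a new individual from its tag SNPs alone.
      new_tags = np.array([1.0, 0.0])
      predicted = np.clip(np.rint(new_tags @ W), 0, 1)
      print(predicted)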

  16. Newborn screening: a review of history, recent advancements, and future perspectives in the era of next generation sequencing.

    PubMed

    Almannai, Mohammed; Marom, Ronit; Sutton, V Reid

    2016-12-01

    The purpose of this review is to summarize the development and recent advancements of newborn screening. Early initiation of medical care has modified the outcome for many disorders that were previously associated with high morbidity (such as cystic fibrosis, primary immune deficiencies, and inborn errors of metabolism) or with significant neurodevelopmental disabilities (such as phenylketonuria and congenital hypothyroidism). The new era of mass spectrometry and next generation sequencing enables the expansion of the newborn screen panel, and will help to address technical issues such as turnaround time, and decreasing false-positive and false-negative rates for the testing. The newborn screening program is a successful public health initiative that facilitates early diagnosis of treatable disorders to reduce long-term morbidity and mortality.

  17. Sequencing artifacts in the type A influenza database and attempts to correct them

    USDA-ARS?s Scientific Manuscript database

    Currently over 300,000 Type A influenza gene sequences representing over 50,000 strains are available in publicly available databases. However, the quality of the submitted sequences is determined by the contributor, and many sequence errors are present in the databases, which can affect the result...

  18. Pilot-Assisted Channel Estimation for Orthogonal Multi-Carrier DS-CDMA with Frequency-Domain Equalization

    NASA Astrophysics Data System (ADS)

    Shima, Tomoyuki; Tomeba, Hiromichi; Adachi, Fumiyuki

    Orthogonal multi-carrier direct sequence code division multiple access (orthogonal MC DS-CDMA) is a combination of time-domain spreading and orthogonal frequency division multiplexing (OFDM). In orthogonal MC DS-CDMA, a frequency diversity gain can be obtained by applying frequency-domain equalization (FDE) based on the minimum mean square error (MMSE) criterion to a block of OFDM symbols, which can improve the bit error rate (BER) performance in a severe frequency-selective fading channel. FDE requires an accurate estimate of the channel gain. The channel gain can be estimated by removing the pilot modulation in the frequency domain. In this paper, we propose a pilot-assisted channel estimation suitable for orthogonal MC DS-CDMA with FDE and evaluate, by computer simulation, the BER performance in a frequency-selective Rayleigh fading channel.

  19. Combinatorial pulse position modulation for power-efficient free-space laser communications

    NASA Technical Reports Server (NTRS)

    Budinger, James M.; Vanderaar, M.; Wagner, P.; Bibyk, Steven

    1993-01-01

    A new modulation technique called combinatorial pulse position modulation (CPPM) is presented as a power-efficient alternative to quaternary pulse position modulation (QPPM) for direct-detection, free-space laser communications. The special case of 16C4PPM is compared to QPPM in terms of data throughput and bit error rate (BER) performance for similar laser power and pulse duty cycle requirements. The increased throughput from CPPM enables the use of forward error corrective (FEC) encoding for a net decrease in the amount of laser power required for a given data throughput compared to uncoded QPPM. A specific, practical case of coded CPPM is shown to reduce the amount of power required to transmit and receive a given data sequence by at least 4.7 dB. Hardware techniques for maximum likelihood detection and symbol timing recovery are presented.
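
    The throughput advantage of CPPM over QPPM follows from simple combinatorics: a 16-slot symbol containing 4 pulses can represent floor(log2 C(16,4)) = 10 bits, versus 2 bits for a 4-slot QPPM symbol at the same 25% pulse duty cycle. The quick check below verifies that arithmetic; it does not model the coding overheads or detection scheme discussed in the paper.

      # Quick arithmetic check of the CPPM vs. QPPM throughput argument,
      # assuming the 16C4 PPM case named in the abstract.
      from math import comb, log2, floor

      cppm_symbols = comb(16, 4)             # 1820 distinct 4-pulse patterns
      cppm_bits = floor(log2(cppm_symbols))  # 10 bits per 16-slot symbol
      qppm_bits = 2                          # 1 of 4 positions -> 2 bits per 4-slot symbol

      print(cppm_symbols, cppm_bits)         # 1820, 10
      # Per-slot throughput: CPPM carries 10/16 bits, QPPM carries 2/4 bits.
      print(cppm_bits / 16, qppm_bits / 4)   # 0.625 vs. 0.5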

  20. SU-E-T-418: Explore the Sensitive of the Planar Quality Assurance to the MLC Error with Different Beam Complexity in Intensity-Modulate Radiation Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, J; Peng, J; Xie, J

    2015-06-15

    Purpose: The purpose of this study is to investigate the sensitivity of planar quality assurance to MLC errors with different beam complexities in intensity-modulated radiation therapy. Methods: Sixteen patients' planar quality assurance (QA) plans at our institution were enrolled in this study, including 10 dynamic MLC (DMLC) IMRT plans measured by Portal Dosimetry and 6 static MLC (SMLC) IMRT plans measured by Mapcheck. The gamma pass rate was calculated using the vendor's software. The field numbers were 74 and 40 for DMLC and SMLC, respectively. A random error was generated and introduced into these fields. The modified gamma pass rate was calculated by comparing the original measured fluence with the modified fields' fluence. The decrease in gamma pass rate was obtained by subtracting the modified gamma pass rate from the original gamma pass rate. Eight complexity scores were calculated in MATLAB based on the fluence and MLC sequence of these fields. The complexity scores include fractal dimension, monitor units of the field, modulation index, fluence map complexity, weighted average of field area, weighted average of field perimeter, and small aperture ratio (<5 cm² and <50 cm²). Spearman's rank correlation coefficient was used to analyze the correlation between these scores and the decrease in gamma pass rate. Results: The relation between the decrease in gamma pass rate and field complexity was insignificant for most complexity scores. The most significant complexity score for SMLC was fluence map complexity, with ρ = 0.4274 (p-value = 0.0063). For DMLC, the most significant complexity score was fractal dimension, with ρ = −0.3068 (p-value = 0.0081). Conclusions: According to the preliminary results of this study, the sensitivity of the gamma pass rate was not strongly related to field complexity.
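
    The statistical step described above, correlating each field's decrease in gamma pass rate with its complexity scores using Spearman's rank correlation, can be reproduced with standard tools. A minimal sketch follows, assuming the per-field arrays are already available; the numbers are placeholders, not data from the study.

      # Spearman's rank correlation between the per-field decrease in gamma
      # pass rate and one field complexity score. Placeholder data only.
      import numpy as np
      from scipy.stats import spearmanr

      gamma_decrease = np.array([1.2, 0.4, 2.1, 0.9, 1.7])  # original minus modified pass rate (%)
      fluence_map_complexity = np.array([0.31, 0.12, 0.45, 0.22, 0.40])

      rho, p_value = spearmanr(gamma_decrease, fluence_map_complexity)
      print(f"Spearman rho = {rho:.4f}, p = {p_value:.4f}")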

  1. AutoFACT: An Automatic Functional Annotation and Classification Tool

    PubMed Central

    Koski, Liisa B; Gray, Michael W; Lang, B Franz; Burger, Gertraud

    2005-01-01

    Background Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets. Results We present AutoFACT, a fully automated and customizable annotation tool that assigns biologically informative functions to a sequence. Key features of this tool are that it (1) analyzes nucleotide and protein sequence data; (2) determines the most informative functional description by combining multiple BLAST reports from several user-selected databases; (3) assigns putative metabolic pathways, functional classes, enzyme classes, GeneOntology terms and locus names; and (4) generates output in HTML, text and GFF formats for the user's convenience. We have compared AutoFACT to four well-established annotation pipelines. The error rate of functional annotation is estimated to be only between 1–2%. Comparison of AutoFACT to the traditional top-BLAST-hit annotation method shows that our procedure increases the number of functionally informative annotations by approximately 50%. Conclusion AutoFACT will serve as a useful annotation tool for smaller sequencing groups lacking dedicated bioinformatics staff. It is implemented in PERL and runs on LINUX/UNIX platforms. AutoFACT is available at . PMID:15960857

  2. Impact of gradient timing error on the tissue sodium concentration bioscale measured using flexible twisted projection imaging

    NASA Astrophysics Data System (ADS)

    Lu, Aiming; Atkinson, Ian C.; Vaughn, J. Thomas; Thulborn, Keith R.

    2011-12-01

    The rapid biexponential transverse relaxation of the sodium MR signal from brain tissue requires efficient k-space sampling for quantitative imaging in a time that is acceptable for human subjects. The flexible twisted projection imaging (flexTPI) sequence has been shown to be suitable for quantitative sodium imaging with an ultra-short echo time to minimize signal loss. The fidelity of the k-space center location is affected by the readout gradient timing errors on the three physical axes, which is known to cause image distortion for projection-based acquisitions. This study investigated the impact of these timing errors on the voxel-wise accuracy of the tissue sodium concentration (TSC) bioscale measured with the flexTPI sequence. Our simulations show greater than 20% spatially varying quantification errors when the gradient timing errors are larger than 10 μs on all three axes. The quantification is more tolerant of gradient timing errors on the Z-axis. An existing method was used to measure the gradient timing errors with <1 μs error. The gradient timing error measurement is shown to be RF coil dependent, and timing error differences of up to ˜16 μs have been observed between different RF coils used on the same scanner. The measured timing errors can be corrected prospectively or retrospectively to obtain accurate TSC values.

  3. Optical Processing Techniques For Pseudorandom Sequence Prediction

    NASA Astrophysics Data System (ADS)

    Gustafson, Steven C.

    1983-11-01

    Pseudorandom sequences are series of apparently random numbers generated, for example, by linear or nonlinear feedback shift registers. An important application of these sequences is in spread spectrum communication systems, in which, for example, the transmitted carrier phase is digitally modulated rapidly and pseudorandomly and in which the information to be transmitted is incorporated as a slow modulation in the pseudorandom sequence. In this case the transmitted information can be extracted only by a receiver that uses for demodulation the same pseudorandom sequence used by the transmitter, and thus this type of communication system has a very high immunity to third-party interference. However, if a third party can predict in real time the probable future course of the transmitted pseudorandom sequence given past samples of this sequence, then interference immunity can be significantly reduced. In this application, effective pseudorandom sequence prediction techniques should be (1) applicable in real time to rapid (e.g., megahertz) sequence generation rates, (2) applicable to both linear and nonlinear pseudorandom sequence generation processes, and (3) applicable to error-prone past sequence samples of limited number and continuity. Certain optical processing techniques that may meet these requirements are discussed in this paper. In particular, techniques based on incoherent optical processors that perform general linear transforms or (more specifically) matrix-vector multiplications are considered. Computer simulation examples are presented which indicate that significant prediction accuracy can be obtained using these transforms for simple pseudorandom sequences. However, the useful prediction of more complex pseudorandom sequences will probably require the application of more sophisticated optical processing techniques.

  4. Kinematic Analysis of Speech Sound Sequencing Errors Induced by Delayed Auditory Feedback.

    PubMed

    Cler, Gabriel J; Lee, Jackson C; Mittelman, Talia; Stepp, Cara E; Bohland, Jason W

    2017-06-22

    Delayed auditory feedback (DAF) causes speakers to become disfluent and make phonological errors. Methods for assessing the kinematics of speech errors are lacking, with most DAF studies relying on auditory perceptual analyses, which may be problematic, as errors judged to be categorical may actually represent blends of sounds or articulatory errors. Eight typical speakers produced nonsense syllable sequences under normal and DAF (200 ms). Lip and tongue kinematics were captured with electromagnetic articulography. Time-locked acoustic recordings were transcribed, and the kinematics of utterances with and without perceived errors were analyzed with existing and novel quantitative methods. New multivariate measures showed that for 5 participants, kinematic variability for productions perceived to be error free was significantly increased under delay; these results were validated by using the spatiotemporal index measure. Analysis of error trials revealed both typical productions of a nontarget syllable and productions with articulatory kinematics that incorporated aspects of both the target and the perceived utterance. This study is among the first to characterize articulatory changes under DAF and provides evidence for different classes of speech errors, which may not be perceptually salient. New methods were developed that may aid visualization and analysis of large kinematic data sets. https://doi.org/10.23641/asha.5103067.

  5. Kinematic Analysis of Speech Sound Sequencing Errors Induced by Delayed Auditory Feedback

    PubMed Central

    Lee, Jackson C.; Mittelman, Talia; Stepp, Cara E.; Bohland, Jason W.

    2017-01-01

    Purpose Delayed auditory feedback (DAF) causes speakers to become disfluent and make phonological errors. Methods for assessing the kinematics of speech errors are lacking, with most DAF studies relying on auditory perceptual analyses, which may be problematic, as errors judged to be categorical may actually represent blends of sounds or articulatory errors. Method Eight typical speakers produced nonsense syllable sequences under normal and DAF (200 ms). Lip and tongue kinematics were captured with electromagnetic articulography. Time-locked acoustic recordings were transcribed, and the kinematics of utterances with and without perceived errors were analyzed with existing and novel quantitative methods. Results New multivariate measures showed that for 5 participants, kinematic variability for productions perceived to be error free was significantly increased under delay; these results were validated by using the spatiotemporal index measure. Analysis of error trials revealed both typical productions of a nontarget syllable and productions with articulatory kinematics that incorporated aspects of both the target and the perceived utterance. Conclusions This study is among the first to characterize articulatory changes under DAF and provides evidence for different classes of speech errors, which may not be perceptually salient. New methods were developed that may aid visualization and analysis of large kinematic data sets. Supplemental Material https://doi.org/10.23641/asha.5103067 PMID:28655038

  6. High temporal resolution aberrometry in a 50-eye population and implications for adaptive optics error budget.

    PubMed

    Jarosz, Jessica; Mecê, Pedro; Conan, Jean-Marc; Petit, Cyril; Paques, Michel; Meimon, Serge

    2017-04-01

    We formed a database gathering the wavefront aberrations of 50 healthy eyes measured with an original custom-built Shack-Hartmann aberrometer at a temporal frequency of 236 Hz, with 22 lenslets across a 7-mm diameter pupil, for a duration of 20 s. With this database, we draw statistics on the spatial and temporal behavior of the dynamic aberrations of the eye. Dynamic aberrations were studied on a 5-mm diameter pupil and on a 3.4 s sequence between blinks. We noted that, on average, temporal wavefront variance exhibits an n^(-2) power-law with radial order n, and temporal spectra follow an f^(-1.5) power-law with temporal frequency f. From these statistics, we then extract guidelines for designing an adaptive optics system. For instance, we show the residual wavefront error evolution as a function of the number of corrected modes and of the adaptive optics loop frame rate. In particular, we infer that adaptive optics performance rapidly increases with the loop frequency up to 50 Hz, with gain being more limited at higher rates.

  7. High temporal resolution aberrometry in a 50-eye population and implications for adaptive optics error budget

    PubMed Central

    Jarosz, Jessica; Mecê, Pedro; Conan, Jean-Marc; Petit, Cyril; Paques, Michel; Meimon, Serge

    2017-01-01

    We formed a database gathering the wavefront aberrations of 50 healthy eyes measured with an original custom-built Shack-Hartmann aberrometer at a temporal frequency of 236 Hz, with 22 lenslets across a 7-mm diameter pupil, for a duration of 20 s. With this database, we draw statistics on the spatial and temporal behavior of the dynamic aberrations of the eye. Dynamic aberrations were studied on a 5-mm diameter pupil and on a 3.4 s sequence between blinks. We noted that, on average, temporal wavefront variance exhibits a n−2 power-law with radial order n and temporal spectra follow a f−1.5 power-law with temporal frequency f. From these statistics, we then extract guidelines for designing an adaptive optics system. For instance, we show the residual wavefront error evolution as a function of the number of corrected modes and of the adaptive optics loop frame rate. In particular, we infer that adaptive optics performance rapidly increases with the loop frequency up to 50 Hz, with gain being more limited at higher rates. PMID:28736657

  8. Microsatellite Interruptions Stabilize Primate Genomes and Exist as Population-Specific Single Nucleotide Polymorphisms within Individual Human Genomes

    PubMed Central

    Ananda, Guruprasad; Hile, Suzanne E.; Breski, Amanda; Wang, Yanli; Kelkar, Yogeshwar; Makova, Kateryna D.; Eckert, Kristin A.

    2014-01-01

    Interruptions of microsatellite sequences impact genome evolution and can alter disease manifestation. However, human polymorphism levels at interrupted microsatellites (iMSs) are not known at a genome-wide scale, and the pathways for gaining interruptions are poorly understood. Using the 1000 Genomes Phase-1 variant call set, we interrogated mono-, di-, tri-, and tetranucleotide repeats up to 10 units in length. We detected ∼26,000–40,000 iMSs within each of four human population groups (African, European, East Asian, and American). We identified population-specific iMSs within exonic regions, and discovered that known disease-associated iMSs contain alleles present at differing frequencies among the populations. By analyzing longer microsatellites in primate genomes, we demonstrate that single interruptions result in a genome-wide average two- to six-fold reduction in microsatellite mutability, as compared with perfect microsatellites. Centrally located interruptions lowered mutability dramatically, by two to three orders of magnitude. Using a biochemical approach, we tested directly whether the mutability of a specific iMS is lower because of decreased DNA polymerase strand slippage errors. Modeling the adenomatous polyposis coli tumor suppressor gene sequence, we observed that a single base substitution interruption reduced strand slippage error rates five- to 50-fold, relative to a perfect repeat, during synthesis by DNA polymerases α, β, or η. Computationally, we demonstrate that iMSs arise primarily by base substitution mutations within individual human genomes. Our biochemical survey of human DNA polymerase α, β, δ, κ, and η error rates within certain microsatellites suggests that interruptions are created most frequently by low fidelity polymerases. Our combined computational and biochemical results demonstrate that iMSs are abundant in human genomes and are sources of population-specific genetic variation that may affect genome stability. The genome-wide identification of iMSs in human populations presented here has important implications for current models describing the impact of microsatellite polymorphisms on gene expression. PMID:25033203

  9. Modal processing for acoustic communications in shallow water experiment.

    PubMed

    Morozov, Andrey K; Preisig, James C; Papp, Joseph

    2008-09-01

    Acoustical array data from the Shallow Water Acoustics experiment were processed to show the feasibility of broadband mode decomposition as a preprocessing method to reduce the effective channel delay spread and concentrate received signal energy in a small number of independent channels. The data were collected by a vertical array designed at the Woods Hole Oceanographic Institution. Phase-shift keying (PSK) m-sequence modulated signals with different carrier frequencies were transmitted at a distance of 19.2 km from the array. Even during strong internal wave activity, a low bit error rate was achieved.

  10. Associations among measures of sequential processing in motor and linguistics tasks in adults with and without a family history of childhood apraxia of speech: a replication study.

    PubMed

    Button, Le; Peter, Beate; Stoel-Gammon, Carol; Raskind, Wendy H

    2013-03-01

    The purpose of this study was to address the hypothesis that childhood apraxia of speech (CAS) is influenced by an underlying deficit in sequential processing that is also expressed in other modalities. In a sample of 21 adults from five multigenerational families, 11 with histories of various familial speech sound disorders, 3 biologically related adults from a family with familial CAS showed motor sequencing deficits in an alternating motor speech task. Compared with the other adults, these three participants showed deficits in tasks requiring high loads of sequential processing, including nonword imitation, nonword reading and spelling. Qualitative error analyses in real word and nonword imitations revealed group differences in phoneme sequencing errors. Motor sequencing ability was correlated with phoneme sequencing errors during real word and nonword imitation, reading and spelling. Correlations were characterized by extremely high scores in one family and extremely low scores in another. Results are consistent with a central deficit in sequential processing in CAS of familial origin.

  11. Associations among measures of sequential processing in motor and linguistics tasks in adults with and without a family history of childhood apraxia of speech: A replication study

    PubMed Central

    BUTTON, LE; PETER, BEATE; STOEL-GAMMON, CAROL; RASKIND, WENDY H.

    2013-01-01

    The purpose of this study was to address the hypothesis that childhood apraxia of speech (CAS) is influenced by an underlying deficit in sequential processing that is also expressed in other modalities. In a sample of 21 adults from five multigenerational families, 11 with histories of various familial speech sound disorders, 3 biologically related adults from a family with familial CAS showed motor sequencing deficits in an alternating motor speech task. Compared with the other adults, these three participants showed deficits in tasks requiring high loads of sequential processing, including nonword imitation, nonword reading and spelling. Qualitative error analyses in real word and nonword imitations revealed group differences in phoneme sequencing errors. Motor sequencing ability was correlated with phoneme sequencing errors during real word and nonword imitation, reading and spelling. Correlations were characterized by extremely high scores in one family and extremely low scores in another. Results are consistent with a central deficit in sequential processing in CAS of familial origin. PMID:23339292

  12. Contour Tracking with a Spatio-Temporal Intensity Moment.

    PubMed

    Demi, Marcello

    2016-06-01

    Standard edge detection operators such as the Laplacian of Gaussian and the gradient of Gaussian can be used to track contours in image sequences. When using edge operators, a contour, which is determined on a frame of the sequence, is simply used as a starting contour to locate the nearest contour on the subsequent frame. However, the strategy used to look for the nearest edge points may not work when tracking contours of non isolated gray level discontinuities. In these cases, strategies derived from the optical flow equation, which look for similar gray level distributions, appear to be more appropriate since these can work with a lower frame rate than that needed for strategies based on pure edge detection operators. However, an optical flow strategy tends to propagate the localization errors through the sequence and an additional edge detection procedure is essential to compensate for such a drawback. In this paper a spatio-temporal intensity moment is proposed which integrates the two basic functions of edge detection and tracking.

  13. Fast and efficient search for MPEG-4 video using adjacent pixel intensity difference quantization histogram feature

    NASA Astrophysics Data System (ADS)

    Lee, Feifei; Kotani, Koji; Chen, Qiu; Ohmi, Tadahiro

    2010-02-01

    In this paper, a fast search algorithm for MPEG-4 video clips from a video database is proposed. An adjacent pixel intensity difference quantization (APIDQ) histogram, which had previously been applied reliably to human face recognition, is utilized as the feature vector of a VOP (video object plane). Instead of the fully decompressed video sequence, partially decoded data, namely the DC sequence of the video object, are extracted from the video sequence. Combined with active search, a temporal pruning algorithm, fast and robust video search can be realized. The proposed search algorithm has been evaluated on a total of 15 hours of video containing TV programs such as drama, talk, and news, searching for 200 given MPEG-4 video clips, each 15 seconds long. Experimental results show that the proposed algorithm can detect a similar video clip in merely 80 ms, and an equal error rate (EER) of 2% is achieved in the drama and news categories, which is more accurate and robust than the conventional fast video search algorithm.

  14. Nurses' behaviors and visual scanning patterns may reduce patient identification errors.

    PubMed

    Marquard, Jenna L; Henneman, Philip L; He, Ze; Jo, Junghee; Fisher, Donald L; Henneman, Elizabeth A

    2011-09-01

    Patient identification (ID) errors occurring during the medication administration process can be fatal. The aim of this study is to determine whether differences in nurses' behaviors and visual scanning patterns during the medication administration process influence their capacities to identify patient ID errors. Nurse participants (n = 20) administered medications to 3 patients in a simulated clinical setting, with 1 patient having an embedded ID error. Error-identifying nurses tended to complete more process steps in a similar amount of time compared with non-error-identifying nurses and tended to scan information across artifacts (e.g., ID band, patient chart, medication label) rather than fixating on several pieces of information on a single artifact before fixating on another artifact. Non-error-identifying nurses tended to increase their durations of off-topic conversations (a type of process interruption) over the course of the trials; the difference between groups was significant in the trial with the embedded ID error. Error-identifying nurses tended to have their most fixations in a row on the patient's chart, whereas non-error-identifying nurses did not tend to have a single artifact on which they consistently fixated. Finally, error-identifying nurses tended to have predictable eye fixation sequences across artifacts, whereas non-error-identifying nurses tended to have seemingly random eye fixation sequences. This finding has implications for nurse training and the design of tools and technologies that support nurses as they complete the medication administration process. (c) 2011 APA, all rights reserved.

  15. Representation of item position in immediate serial recall: Evidence from intrusion errors.

    PubMed

    Fischer-Baum, Simon; McCloskey, Michael

    2015-09-01

    In immediate serial recall, participants are asked to recall novel sequences of items in the correct order. Theories of the representations and processes required for this task differ in how order information is maintained; some have argued that order is represented through item-to-item associations, while others have argued that each item is coded for its position in a sequence, with position being defined either by distance from the start of the sequence, or by distance from both the start and the end of the sequence. Previous researchers have used error analyses to adjudicate between these different proposals. However, these previous attempts have not allowed researchers to examine the full set of alternative proposals. In the current study, we analyzed errors produced in 2 immediate serial recall experiments that differ in the modality of input (visual vs. aural presentation of words) and the modality of output (typed vs. spoken responses), using new analysis methods that allow for a greater number of alternative hypotheses to be considered. We find evidence that sequence positions are represented relative to both the start and the end of the sequence, and show a contribution of the end-based representation beyond the final item in the sequence. We also find limited evidence for item-to-item associations, suggesting that both a start-end positional scheme and item-to-item associations play a role in representing item order in immediate serial recall. (c) 2015 APA, all rights reserved.

  16. Accuracy and Reproducibility of Adipose Tissue Measurements in Young Infants by Whole Body Magnetic Resonance Imaging

    PubMed Central

    Bauer, Jan Stefan; Noël, Peter Benjamin; Vollhardt, Christiane; Much, Daniela; Degirmenci, Saliha; Brunner, Stefanie; Rummeny, Ernst Josef; Hauner, Hans

    2015-01-01

    Purpose MR might be well suited to obtain reproducible and accurate measures of fat tissues in infants. This study evaluates MR-measurements of adipose tissue in young infants in vitro and in vivo. Material and Methods MR images of ten phantoms simulating subcutaneous fat of an infant’s torso were obtained using a 1.5T MR scanner with and without simulated breathing. Scans consisted of a cartesian water-suppression turbo spin echo (wsTSE) sequence, and a PROPELLER wsTSE sequence. Fat volume was quantified directly and by MR imaging using k-means clustering and threshold-based segmentation procedures to calculate accuracy in vitro. Whole body MR was obtained in sleeping young infants (average age 67±30 days). This study was approved by the local review board. All parents gave written informed consent. To obtain reproducibility in vivo, cartesian and PROPELLER wsTSE sequences were repeated in seven and four young infants, respectively. Overall, 21 repetitions were performed for the cartesian sequence and 13 repetitions for the PROPELLER sequence. Results In vitro accuracy errors depended on the chosen segmentation procedure, ranging from 5.4% to 76%, while the sequence showed no significant influence. Artificial breathing increased the minimal accuracy error to 9.1%. In vivo reproducibility errors for total fat volume of the sleeping infants ranged from 2.6% to 3.4%. Neither segmentation nor sequence significantly influenced reproducibility. Conclusion With both cartesian and PROPELLER sequences an accurate and reproducible measure of body fat was achieved. Adequate segmentation was mandatory for high accuracy. PMID:25706876

  17. Identifying and reducing error in cluster-expansion approximations of protein energies.

    PubMed

    Hahn, Seungsoo; Ashenberg, Orr; Grigoryan, Gevorg; Keating, Amy E

    2010-12-01

    Protein design involves searching a vast space for sequences that are compatible with a defined structure. This can pose significant computational challenges. Cluster expansion is a technique that can accelerate the evaluation of protein energies by generating a simple functional relationship between sequence and energy. The method consists of several steps. First, for a given protein structure, a training set of sequences with known energies is generated. Next, this training set is used to expand energy as a function of clusters consisting of single residues, residue pairs, and higher order terms, if required. The accuracy of the sequence-based expansion is monitored and improved using cross-validation testing and iterative inclusion of additional clusters. As a trade-off for evaluation speed, the cluster-expansion approximation causes prediction errors, which can be reduced by including more training sequences, including higher order terms in the expansion, and/or reducing the sequence space described by the cluster expansion. This article analyzes the sources of error and introduces a method whereby accuracy can be improved by judiciously reducing the described sequence space. The method is applied to describe the sequence-stability relationship for several protein structures: coiled-coil dimers and trimers, a PDZ domain, and T4 lysozyme as examples with computationally derived energies, and SH3 domains in amphiphysin-1 and endophilin-1 as examples where the expanded pseudo-energies are obtained from experiments. Our open-source software package Cluster Expansion Version 1.0 allows users to expand their own energy function of interest and thereby apply cluster expansion to custom problems in protein design. © 2010 Wiley Periodicals, Inc.
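
    At its core, cluster expansion fits energy as a linear function of indicator features for the clusters (single residues, residue pairs, and higher-order terms if required), using the training set of sequences with known energies, and the fitted expansion is then reused as a fast sequence-to-energy map. The sketch below shows only that fitting step, with invented sequences, energies, and clusters; it omits the cross-validation and sequence-space reduction the article describes.

      # Bare-bones sketch of the cluster-expansion fitting step: energies are
      # regressed on indicator features for single-residue and residue-pair
      # clusters. Sequences, energies, and the feature set are invented.
      import numpy as np
      from itertools import combinations

      train_seqs = ["AVLI", "AVLL", "GVLI", "GVLL"]
      energies = np.array([-12.1, -11.4, -10.9, -10.2])  # e.g., from a physical energy function

      # Enumerate point clusters (position, aa) and pair clusters observed in training.
      points = sorted({(i, s[i]) for s in train_seqs for i in range(len(s))})
      pairs = sorted({((i, s[i]), (j, s[j])) for s in train_seqs
                      for i, j in combinations(range(len(s)), 2)})

      def featurize(seq):
          f = [1.0]  # constant (empty-cluster) term
          f += [1.0 if seq[i] == aa else 0.0 for i, aa in points]
          f += [1.0 if seq[i] == a and seq[j] == b else 0.0
                for (i, a), (j, b) in pairs]
          return f

      X = np.array([featurize(s) for s in train_seqs])
      coeffs, *_ = np.linalg.lstsq(X, energies, rcond=None)

      # Fast sequence-to-energy evaluation within the described sequence space.
      print(np.dot(featurize("AVLI"), coeffs))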

  18. Accuracy and reproducibility of adipose tissue measurements in young infants by whole body magnetic resonance imaging.

    PubMed

    Bauer, Jan Stefan; Noël, Peter Benjamin; Vollhardt, Christiane; Much, Daniela; Degirmenci, Saliha; Brunner, Stefanie; Rummeny, Ernst Josef; Hauner, Hans

    2015-01-01

    MR might be well suited to obtain reproducible and accurate measures of fat tissues in infants. This study evaluates MR-measurements of adipose tissue in young infants in vitro and in vivo. MR images of ten phantoms simulating subcutaneous fat of an infant's torso were obtained using a 1.5T MR scanner with and without simulated breathing. Scans consisted of a cartesian water-suppression turbo spin echo (wsTSE) sequence, and a PROPELLER wsTSE sequence. Fat volume was quantified directly and by MR imaging using k-means clustering and threshold-based segmentation procedures to calculate accuracy in vitro. Whole body MR was obtained in sleeping young infants (average age 67±30 days). This study was approved by the local review board. All parents gave written informed consent. To obtain reproducibility in vivo, cartesian and PROPELLER wsTSE sequences were repeated in seven and four young infants, respectively. Overall, 21 repetitions were performed for the cartesian sequence and 13 repetitions for the PROPELLER sequence. In vitro accuracy errors depended on the chosen segmentation procedure, ranging from 5.4% to 76%, while the sequence showed no significant influence. Artificial breathing increased the minimal accuracy error to 9.1%. In vivo reproducibility errors for total fat volume of the sleeping infants ranged from 2.6% to 3.4%. Neither segmentation nor sequence significantly influenced reproducibility. With both cartesian and PROPELLER sequences an accurate and reproducible measure of body fat was achieved. Adequate segmentation was mandatory for high accuracy.

  19. TH-EF-BRA-06: A Novel Retrospective 3D K-Space Sorting 4D-MRI Technique Using a Radial K-Space Acquisition MRI Sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Y; Subashi, E; Yin, F

    Purpose: Current retrospective 4D-MRI provides superior tumor-to-tissue contrast and accurate respiratory motion information for radiotherapy motion management. The developed 4D-MRI techniques based on 2D-MRI image sorting require a high frame rate of the MR sequences. However, several MRI sequences provide excellent image quality but have a low frame rate. This study aims at developing a novel retrospective 3D k-space sorting 4D-MRI technique using radial k-space acquisition MRI sequences to improve 4D-MRI image quality and temporal resolution for imaging irregular organ/tumor respiratory motion. Methods: The method is based on an RF-spoiled, steady-state, gradient-recalled sequence with minimal echo time. A 3D radial k-space data acquisition trajectory was used for sampling the datasets. Each radial spoke readout data line starts from the 3D center of the field of view. A respiratory signal can be extracted from the k-space center data point of each spoke. The spoke data were sorted based on this self-synchronized respiratory signal using phase sorting. Subsequently, 3D reconstruction was conducted to generate the time-resolved 4D-MRI images. As a feasibility study, this technique was implemented on the digital human phantom XCAT, with the respiratory motion controlled by an irregular motion profile. To validate the use of the k-space center data as a respiratory surrogate, we compared it with the XCAT input controlling breathing profile. Tumor motion trajectories measured on the reconstructed 4D-MRI were compared to the average input trajectory, and the mean absolute amplitude difference (D) was calculated. Results: The signal extracted from the k-space center data matches well with the input controlling respiratory profile of XCAT. The relative amplitude error was 8.6% and the relative phase error was 3.5%. The XCAT 4D-MRI demonstrated a clear motion pattern with few serrated artifacts. D of the tumor trajectories was 0.21 mm, 0.23 mm and 0.23 mm in the SI, AP and ML directions, respectively. Conclusion: A novel retrospective 3D k-space sorting 4D-MRI technique has been developed and evaluated on a digital human phantom. NIH (1R21CA165384-01A1)

  20. Indels, structural variation, and recombination drive genomic diversity in Plasmodium falciparum

    PubMed Central

    Miles, Alistair; Iqbal, Zamin; Vauterin, Paul; Pearson, Richard; Campino, Susana; Theron, Michel; Gould, Kelda; Mead, Daniel; Drury, Eleanor; O'Brien, John; Ruano Rubio, Valentin; MacInnis, Bronwyn; Mwangi, Jonathan; Samarakoon, Upeka; Ranford-Cartwright, Lisa; Ferdig, Michael; Hayton, Karen; Su, Xin-zhuan; Wellems, Thomas; Rayner, Julian; McVean, Gil; Kwiatkowski, Dominic

    2016-01-01

    The malaria parasite Plasmodium falciparum has a great capacity for evolutionary adaptation to evade host immunity and develop drug resistance. Current understanding of parasite evolution is impeded by the fact that a large fraction of the genome is either highly repetitive or highly variable and thus difficult to analyze using short-read sequencing technologies. Here, we describe a resource of deep sequencing data on parents and progeny from genetic crosses, which has enabled us to perform the first genome-wide, integrated analysis of SNP, indel and complex polymorphisms, using Mendelian error rates as an indicator of genotypic accuracy. These data reveal that indels are exceptionally abundant, being more common than SNPs and thus the dominant mode of polymorphism within the core genome. We use the high density of SNP and indel markers to analyze patterns of meiotic recombination, confirming a high rate of crossover events and providing the first estimates for the rate of non-crossover events and the length of conversion tracts. We observe several instances of meiotic recombination within copy number variants associated with drug resistance, demonstrating a mechanism whereby fitness costs associated with resistance mutations could be compensated and greater phenotypic plasticity could be acquired. PMID:27531718

  1. ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data

    PubMed Central

    2012-01-01

    Background Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results. Results Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and do a better job of discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy. Conclusion ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration. PMID:22946927
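
    ReQON's central step is a regression of the empirical error status of each aligned base on covariates such as its reported quality score, with the fitted error probabilities then mapped back to the Phred scale. The sketch below conveys that idea only; ReQON itself is an R/Bioconductor package operating on BAM files, and this scikit-learn fragment with made-up per-base data is not its implementation.

      # Simplified sketch of quality-score recalibration by logistic regression.
      # Hypothetical per-base data; not the ReQON package.
      import numpy as np
      from sklearn.linear_model import LogisticRegression

      # Reported quality score and whether the base mismatched the reference
      # (1 = sequencing error), for a handful of made-up aligned bases.
      reported_q = np.array([[10], [15], [20], [25], [30], [35], [40], [40]])
      is_error = np.array([1, 1, 0, 1, 0, 0, 0, 1])

      model = LogisticRegression().fit(reported_q, is_error)

      # Recalibrated Phred score: -10 * log10(predicted error probability).
      p_err = model.predict_proba(reported_q)[:, 1]
      recalibrated_q = -10.0 * np.log10(p_err)
      print(np.column_stack([reported_q.ravel(), recalibrated_q.round(1)]))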

  2. Golay sequences coded coherent optical OFDM for long-haul transmission

    NASA Astrophysics Data System (ADS)

    Qin, Cui; Ma, Xiangrong; Hua, Tao; Zhao, Jing; Yu, Huilong; Zhang, Jian

    2017-09-01

    We propose to use binary Golay sequences in coherent optical orthogonal frequency division multiplexing (CO-OFDM) to improve the long-haul transmission performance. The Golay sequences are generated by binary Reed-Muller codes, which have low peak-to-average power ratio and certain error correction capability. A low-complexity decoding algorithm for the Golay sequences is then proposed to recover the signal. Under same spectral efficiency, the QPSK modulated OFDM with binary Golay sequences coding with and without discrete Fourier transform (DFT) spreading (DFTS-QPSK-GOFDM and QPSK-GOFDM) are compared with the normal BPSK modulated OFDM with and without DFT spreading (DFTS-BPSK-OFDM and BPSK-OFDM) after long-haul transmission. At a 7% forward error correction code threshold (Q2 factor of 8.5 dB), it is shown that DFTS-QPSK-GOFDM outperforms DFTS-BPSK-OFDM by extending the transmission distance by 29% and 18%, in non-dispersion managed and dispersion managed links, respectively.

  3. Design and simulation of a 800 Mbit/s data link for magnetic resonance imaging wearables.

    PubMed

    Vogt, Christian; Buthe, Lars; Petti, Luisa; Cantarella, Giuseppe; Munzenrieder, Niko; Daus, Alwin; Troster, Gerhard

    2015-08-01

    This paper presents the optimization of electronic circuitry for operation in the harsh electromagnetic (EM) environment during a magnetic resonance imaging (MRI) scan. As a demonstrator, a device small enough to be worn during the scan is optimized. Based on finite element method (FEM) simulations, the current densities induced by magnetic field changes of 200 T s⁻¹ were reduced from 1 × 10¹⁰ A m⁻² by one order of magnitude, predicting error-free operation of the 1.8 V logic employed. The simulations were validated using a bit error rate test, which showed no bit errors during an MRI scan sequence. Therefore, neither the logic nor the utilized 800 Mbit s⁻¹ low voltage differential swing (LVDS) data link of the optimized wearable device was significantly influenced by the EM interference. Next, the influence of ferromagnetic components on the static magnetic field, and consequently on the image quality, was simulated, showing an MRI image loss of approximately 2 cm radius around a commercial integrated circuit of 1 × 1 cm². This was subsequently validated by a conventional MRI scan.

  4. Fault-tolerance thresholds for the surface code with fabrication errors

    NASA Astrophysics Data System (ADS)

    Auger, James M.; Anwar, Hussain; Gimeno-Segovia, Mercedes; Stace, Thomas M.; Browne, Dan E.

    2017-10-01

    The construction of topological error correction codes requires the ability to fabricate a lattice of physical qubits embedded on a manifold with a nontrivial topology such that the quantum information is encoded in the global degrees of freedom (i.e., the topology) of the manifold. However, the manufacturing of large-scale topological devices will undoubtedly suffer from fabrication errors—permanent faulty components such as missing physical qubits or failed entangling gates—introducing permanent defects into the topology of the lattice and hence significantly reducing the distance of the code and the quality of the encoded logical qubits. In this work we investigate how fabrication errors affect the performance of topological codes, using the surface code as the test bed. A known approach to mitigate defective lattices involves the use of primitive swap gates in a long sequence of syndrome extraction circuits. Instead, we show that in the presence of fabrication errors the syndrome can be determined using the supercheck operator approach and the outcome of the defective gauge stabilizer generators without any additional computational overhead or use of swap gates. We report numerical fault-tolerance thresholds in the presence of both qubit fabrication and gate fabrication errors using a circuit-based noise model and the minimum-weight perfect-matching decoder. Our numerical analysis is most applicable to two-dimensional chip-based technologies, but the techniques presented here can be readily extended to other topological architectures. We find that in the presence of 8 % qubit fabrication errors, the surface code can still tolerate a computational error rate of up to 0.1 % .

  5. Simultaneous Control of Error Rates in fMRI Data Analysis

    PubMed Central

    Kang, Hakmook; Blume, Jeffrey; Ombao, Hernando; Badre, David

    2015-01-01

    The key idea of statistical hypothesis testing is to fix, and thereby control, the Type I error (false positive) rate across samples of any size. Multiple comparisons inflate the global (family-wise) Type I error rate, and the traditional solution to maintaining control of the error rate is to increase the local (comparison-wise) Type II error (false negative) rates. However, in the analysis of human brain imaging data, the number of comparisons is so large that this solution breaks down: the local Type II error rate ends up being so large that scientifically meaningful analysis is precluded. Here we propose a novel solution to this problem: allow the Type I error rate to converge to zero along with the Type II error rate. It works because when the Type I error rate per comparison is very small, the accumulated (or global) Type I error rate is also small. This solution is achieved by employing the Likelihood paradigm, which uses likelihood ratios to measure the strength of evidence on a voxel-by-voxel basis. In this paper, we provide theoretical and empirical justification for a likelihood approach to the analysis of human brain imaging data. In addition, we present extensive simulations that show the likelihood approach is viable, leading to 'cleaner' looking brain maps and operational superiority (lower average error rate). Finally, we include a case study on cognitive control-related activation in the prefrontal cortex of the human brain. PMID:26272730

  6. MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data.

    PubMed

    Fan, Yu; Xi, Liu; Hughes, Daniel S T; Zhang, Jianjun; Zhang, Jianhua; Futreal, P Andrew; Wheeler, David A; Wang, Wenyi

    2016-08-24

    Subclonal mutations reveal important features of the genetic architecture of tumors. However, accurate detection of mutations in genetically heterogeneous tumor cell populations using next-generation sequencing remains challenging. We develop MuSE ( http://bioinformatics.mdanderson.org/main/MuSE ), Mutation calling using a Markov Substitution model for Evolution, a novel approach for modeling the evolution of the allelic composition of the tumor and normal tissue at each reference base. MuSE adopts a sample-specific error model that reflects the underlying tumor heterogeneity to greatly improve the overall accuracy. We demonstrate the accuracy of MuSE in calling subclonal mutations in the context of large-scale tumor sequencing projects using whole exome and whole genome sequencing.

  7. Model Performance Evaluation and Scenario Analysis ...

    EPA Pesticide Factsheets

    This tool consists of two parts: model performance evaluation and scenario analysis (MPESA). The model performance evaluation consists of two components: model performance evaluation metrics and model diagnostics. These metrics provide modelers with statistical goodness-of-fit measures that capture magnitude-only, sequence-only, and combined magnitude and sequence errors. The performance measures include error analysis, the coefficient of determination, Nash-Sutcliffe efficiency, and a new weighted rank method. These performance metrics only provide useful information about the overall model performance. Note that MPESA is based on the separation of observed and simulated time series into magnitude and sequence components. The separation of time series into magnitude and sequence components, and the reconstruction back to time series, provides diagnostic insights to modelers. For example, traditional approaches lack the capability to identify whether the source of uncertainty in the simulated data is due to the quality of the input data or the way the analyst adjusted the model parameters. This report presents a suite of model diagnostics that identify whether mismatches between observed and simulated data result from magnitude- or sequence-related errors. MPESA offers graphical and statistical options that allow HSPF users to compare observed and simulated time series and identify the parameter values to adjust or the input data to modify. The scenario analysis part of the too
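
    The goodness-of-fit measures named above have standard closed forms; for example, the Nash-Sutcliffe efficiency is NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2). The sketch below computes it, and also illustrates one simple way to isolate a magnitude-only comparison by evaluating the same metric on the sorted series, which removes sequence (timing) errors; this is a simplified stand-in for MPESA's magnitude/sequence separation, not the EPA tool itself.

      # Nash-Sutcliffe efficiency, plus a "magnitude only" comparison obtained
      # by sorting both series so timing (sequence) errors are removed.
      # Simplified illustration; not the MPESA tool.
      import numpy as np

      def nash_sutcliffe(obs, sim):
          obs, sim = np.asarray(obs, float), np.asarray(sim, float)
          return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

      obs = np.array([2.0, 3.5, 5.0, 4.0, 2.5])
      sim = np.array([2.2, 4.8, 3.6, 4.1, 2.4])

      print(nash_sutcliffe(obs, sim))                    # combined magnitude + sequence error
      print(nash_sutcliffe(np.sort(obs), np.sort(sim)))  # magnitude-only component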

  8. Pathway analysis with next-generation sequencing data.

    PubMed

    Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric; Xiong, Momiao

    2015-04-01

    Although pathway analysis methods have been developed and successfully applied to association studies of common variants, the statistical methods for pathway-based association analysis of rare variants have not been well developed. Many investigators have observed highly inflated false-positive rates and low power in pathway-based tests of association of rare variants. The inflated false-positive rates and low true-positive rates of the current methods are mainly due to their inability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic based on smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The developed statistic has the ability to capture position-level variant information and to account for gametic phase disequilibrium. Through intensive simulations, we demonstrate that the SFPCA-based statistic for testing pathway association with either rare or common variants, or both, has the correct type I error rates. We also evaluate the power of the SFPCA-based statistic and of 22 additional existing statistics, and we find that the SFPCA-based statistic has much higher power than the other existing statistics in all the scenarios considered. To further evaluate its performance, the SFPCA-based statistic is applied to pathway analysis of exome sequencing data from the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic yields much smaller P-values for identifying pathway associations than other existing methods.

  9. Classification of G-protein coupled receptors based on a rich generation of convolutional neural network, N-gram transformation and multiple sequence alignments.

    PubMed

    Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang

    2018-02-01

    Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, prediction for the increasingly large-scale and diverse body of sequences has relied heavily on machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach that combines multiple sequence alignment algorithms, an N-gram probabilistic language model and a deep learning technique. The essence of the proposed method is that if each group of sequences can be represented by one feature sequence composed of homologous sites, there should be less loss when that sequence is rebuilt after a more closely related sequence is added to the group. On the basis of this consideration, predicting whether a query sequence belongs to a group of sequences can be transformed into calculating the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCR sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are imported into a convolutional neural network to make a prediction. The experimental results show that the proposed method provides significant performance improvements. The classification error rate of the proposed method is reduced by at least 4.67% (family level I) and 5.75% (family level II), in comparison with the current state-of-the-art methods. The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN .
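
    The N-gram step referred to above turns each protein sequence (or extracted feature sequence) into a fixed-length vector of overlapping n-gram frequencies, which then serves as input to the convolutional network. The sketch below shows that transformation for dipeptides (n = 2); the value of n, the normalization, and the example sequence are illustrative assumptions, not the paper's exact settings.

      # N-gram (dipeptide) frequency vector for a protein sequence.
      # n, normalization, and the example sequence are illustrative.
      from collections import Counter
      from itertools import product

      AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
      N = 2
      VOCAB = ["".join(p) for p in product(AMINO_ACIDS, repeat=N)]  # all 400 dipeptides

      def ngram_vector(seq, n=N):
          counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
          total = max(sum(counts.values()), 1)
          return [counts[g] / total for g in VOCAB]  # frequencies over the fixed vocabulary

      v = ngram_vector("MTEYKLVVVGAGGVGKSALTIQ")
      print(len(v), max(v))  # 400-dimensional input vector for the CNN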

  10. The Performance of the Date-Randomization Test in Phylogenetic Analyses of Time-Structured Virus Data.

    PubMed

    Duchêne, Sebastián; Duchêne, David; Holmes, Edward C; Ho, Simon Y W

    2015-07-01

    Rates and timescales of viral evolution can be estimated using phylogenetic analyses of time-structured molecular sequences. This involves the use of molecular-clock methods, calibrated by the sampling times of the viral sequences. However, the spread of these sampling times is not always sufficient to allow the substitution rate to be estimated accurately. We conducted Bayesian phylogenetic analyses of simulated virus data to evaluate the performance of the date-randomization test, which is sometimes used to investigate whether time-structured data sets have temporal signal. An estimate of the substitution rate passes this test if its mean does not fall within the 95% credible intervals of rate estimates obtained using replicate data sets in which the sampling times have been randomized. We find that the test sometimes fails to detect rate estimates from data with no temporal signal. This error can be minimized by using a more conservative criterion, whereby the 95% credible interval of the estimate with correct sampling times should not overlap with those obtained with randomized sampling times. We also investigated the behavior of the test when the sampling times are not uniformly distributed throughout the tree, which sometimes occurs in empirical data sets. The test performs poorly in these circumstances, such that a modification to the randomization scheme is needed. Finally, we illustrate the behavior of the test in analyses of nucleotide sequences of cereal yellow dwarf virus. Our results validate the use of the date-randomization test and allow us to propose guidelines for interpretation of its results. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
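
    The two decision criteria described above reduce to simple interval comparisons. The sketch below assumes each analysis (correct or randomized sampling dates) is summarized by a posterior mean and a 95% credible interval; the numbers are invented for illustration and do not come from the simulations in the paper.

```python
def passes_standard(true_mean, randomized_cis):
    """Standard criterion: the mean rate from the correctly dated analysis
    must fall outside every credible interval obtained with randomized dates."""
    return all(not (lo <= true_mean <= hi) for lo, hi in randomized_cis)

def passes_conservative(true_ci, randomized_cis):
    """Conservative criterion: the credible interval from the correctly dated
    analysis must not overlap any interval obtained with randomized dates."""
    t_lo, t_hi = true_ci
    return all(t_hi < lo or hi < t_lo for lo, hi in randomized_cis)

# Hypothetical rates in substitutions/site/year
true_mean, true_ci = 1.2e-3, (0.9e-3, 1.5e-3)
randomized_cis = [(0.2e-3, 1.0e-3), (0.1e-3, 0.8e-3), (0.3e-3, 1.1e-3)]
print(passes_standard(true_mean, randomized_cis))    # True
print(passes_conservative(true_ci, randomized_cis))  # False: intervals overlap
```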

  11. Carbapenem Susceptibility Testing Errors Using Three Automated Systems, Disk Diffusion, Etest, and Broth Microdilution and Carbapenem Resistance Genes in Isolates of Acinetobacter baumannii-calcoaceticus Complex

    DTIC Science & Technology

    2011-10-01

    Phoenix, and Vitek 2 systems). Discordant results were categorized as very major errors (VME), major errors (ME), and minor errors (mE). DNA sequences... FDA standards required for device approval (11). The Vitek 2 method was the only automated susceptibility method in our study that satisfied FDA

  12. Effects of learning duration on implicit transfer.

    PubMed

    Tanaka, Kanji; Watanabe, Katsumi

    2015-10-01

    Implicit learning and transfer in sequence acquisition play important roles in daily life. Several previous studies have found that even when participants are not aware that a transfer sequence has been transformed from the learning sequence, they are able to perform the transfer sequence faster and more accurately; this suggests implicit transfer of visuomotor sequences. Here, we investigated whether implicit transfer could be modulated by the number of trials completed in a learning session. Participants learned a sequence through trial and error, known as the m × n task (Hikosaka et al. in J Neurophysiol 74:1652-1661, 1995). In the learning session, participants were required to successfully perform the same sequence 4, 12, 16, or 20 times. In the transfer session, participants then learned one of two other sequences: one where the button configuration Vertically Mirrored the learning sequence, or a randomly generated sequence. Our results show that even when participants did not notice the alternation rule (i.e., vertical mirroring), their total working time was less and their total number of errors was lower in the transfer session compared with those who performed a Random sequence, irrespective of the number of trials completed in the learning session. This result suggests that implicit transfer likely occurs even over a shorter learning duration.

  13. Sources of PCR-induced distortions in high-throughput sequencing data sets

    PubMed Central

    Kebschull, Justus M.; Zador, Anthony M.

    2015-01-01

    PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991
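
    The role of PCR stochasticity is easy to illustrate with a toy branching-process simulation in which every molecule is duplicated with some probability in each cycle. The pool size, cycle count and per-cycle efficiency below are arbitrary assumptions rather than the values used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_pcr(n_amplicons=1000, cycles=20, efficiency=0.6):
    """Each unique amplicon starts as a single molecule; in every cycle each
    existing copy is duplicated with probability `efficiency` (binomial draw),
    so early-cycle luck compounds into large copy-number differences."""
    copies = np.ones(n_amplicons, dtype=np.int64)
    for _ in range(cycles):
        copies += rng.binomial(copies, efficiency)
    return copies

copies = simulate_pcr()
freq = copies / copies.sum()
print("fold range of final copy numbers:", copies.max() / copies.min())
print("coefficient of variation of representation:", freq.std() / freq.mean())
```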

  14. Comparison of the Predictive Accuracy of DNA Array-Based Multigene Classifiers across cDNA Arrays and Affymetrix GeneChips

    PubMed Central

    Stec, James; Wang, Jing; Coombes, Kevin; Ayers, Mark; Hoersch, Sebastian; Gold, David L.; Ross, Jeffrey S; Hess, Kenneth R.; Tirrell, Stephen; Linette, Gerald; Hortobagyi, Gabriel N.; Symmans, W. Fraser; Pusztai, Lajos

    2005-01-01

    We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by basic local alignment tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45 and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene. PMID:16049308
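
    The cross-platform drop in accuracy can be mimicked with a small linear discriminant analysis experiment. The sketch below uses synthetic stand-in expression data and scikit-learn's LinearDiscriminantAnalysis; the class labels, noise model and rescaling are assumptions chosen only to illustrate training on one platform and testing on another, not a reconstruction of the study's data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)

# Synthetic stand-in data: expression of 5 genes for 33 tumours on two platforms.
# Platform B measurements are a noisy, rescaled version of platform A,
# mimicking probe-sequence differences between arrays.
n_samples, n_genes = 33, 5
labels = rng.integers(0, 2, n_samples)
platform_a = rng.normal(0, 1, (n_samples, n_genes)) + labels[:, None] * 1.5
platform_b = 0.7 * platform_a + rng.normal(0, 1.0, platform_a.shape)

clf = LinearDiscriminantAnalysis().fit(platform_a, labels)
err_same = 1 - clf.score(platform_a, labels)    # training-platform error (optimistic)
err_cross = 1 - clf.score(platform_b, labels)   # cross-platform error
print(f"same-platform error {err_same:.2%}, cross-platform error {err_cross:.2%}")
```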

  15. Joint Frequency-Domain Equalization and Despreading for Multi-Code DS-CDMA Using Cyclic Delay Transmit Diversity

    NASA Astrophysics Data System (ADS)

    Yamamoto, Tetsuya; Takeda, Kazuki; Adachi, Fumiyuki

    Frequency-domain equalization (FDE) based on the minimum mean square error (MMSE) criterion can provide a better bit error rate (BER) performance than rake combining. To further improve the BER performance, cyclic delay transmit diversity (CDTD) can be used. CDTD simultaneously transmits the same signal from different antennas after adding different cyclic delays to increase the number of equivalent propagation paths. Although a joint use of CDTD and MMSE-FDE for direct sequence code division multiple access (DS-CDMA) achieves larger frequency diversity gain, the BER performance improvement is limited by the residual inter-chip interference (ICI) after FDE. In this paper, we propose joint FDE and despreading for DS-CDMA using CDTD. Equalization and despreading are simultaneously performed in the frequency-domain to suppress the residual ICI after FDE. A theoretical conditional BER analysis is presented for the given channel condition. The BER analysis is confirmed by computer simulation.
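
    The core operation in MMSE-FDE is a per-frequency-bin weighting of the received block. The sketch below applies the textbook one-tap MMSE weight to a toy single-user transmission over a cyclic channel; it is not the joint FDE-and-despreading scheme proposed in the paper, and the block length, channel taps and noise level are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def mmse_fde(received_block, channel_fft, es_n0):
    """One-tap MMSE frequency-domain equalization of a cyclic-prefixed block.

    Weight per bin: w_k = conj(H_k) / (|H_k|^2 + 1/(Es/N0)).
    """
    r = np.fft.fft(received_block)
    w = np.conj(channel_fft) / (np.abs(channel_fft) ** 2 + 1.0 / es_n0)
    return np.fft.ifft(w * r)

# Toy example: a QPSK block through a 4-path channel with additive noise
n = 64
symbols = (rng.choice([-1, 1], n) + 1j * rng.choice([-1, 1], n)) / np.sqrt(2)
h = np.array([0.8, 0.5, 0.3, 0.1]) * np.exp(1j * rng.uniform(0, 2 * np.pi, 4))
H = np.fft.fft(h, n)
received = np.fft.ifft(H * np.fft.fft(symbols)) \
    + 0.05 * (rng.normal(size=n) + 1j * rng.normal(size=n))
equalized = mmse_fde(received, H, es_n0=100.0)
print("mean squared error after FDE:", np.mean(np.abs(equalized - symbols) ** 2))
```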

  16. RLS Channel Estimation with Adaptive Forgetting Factor for DS-CDMA Frequency-Domain Equalization

    NASA Astrophysics Data System (ADS)

    Kojima, Yohei; Tomeba, Hiromichi; Takeda, Kazuaki; Adachi, Fumiyuki

    Frequency-domain equalization (FDE) based on the minimum mean square error (MMSE) criterion can improve the downlink bit error rate (BER) performance of DS-CDMA beyond that possible with conventional rake combining in a frequency-selective fading channel. FDE requires accurate channel estimation. Recently, we proposed a pilot-assisted channel estimation (CE) scheme based on the MMSE criterion. Using MMSE-CE, the channel estimation accuracy is almost insensitive to the pilot chip sequence, and a good BER performance is achieved. In this paper, we propose a channel estimation scheme using a one-tap recursive least squares (RLS) algorithm, where the forgetting factor is adapted to the changing channel condition by the least mean square (LMS) algorithm, for DS-CDMA with FDE. We evaluate the BER performance using RLS-CE with an adaptive forgetting factor in a frequency-selective fast Rayleigh fading channel by computer simulation.
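
    A scalar (one-tap) RLS tracker of the kind referred to above can be sketched as follows. A fixed forgetting factor is used here; the LMS adaptation of the forgetting factor described in the paper is omitted, and the pilot, fading model and noise level are invented for illustration.

```python
import numpy as np

def one_tap_rls(pilot, observed, forgetting=0.95, delta=1e-2):
    """Track a single complex channel gain H in y[t] = H[t] * pilot[t] + noise
    with an exponentially weighted (forgetting-factor) recursive least squares
    update."""
    h_est = 0.0 + 0.0j
    p_inv = 1.0 / delta          # scalar inverse "covariance"
    estimates = []
    for p, y in zip(pilot, observed):
        gain = p_inv * np.conj(p) / (forgetting + p_inv * abs(p) ** 2)
        h_est = h_est + gain * (y - p * h_est)
        p_inv = (p_inv - gain * p * p_inv) / forgetting
        estimates.append(h_est)
    return np.array(estimates)

# Toy example: a slowly rotating channel observed through a constant pilot chip
rng = np.random.default_rng(3)
t = np.arange(200)
true_h = np.exp(1j * 0.005 * t)                  # slow fading
pilot = np.ones(200, dtype=complex)
observed = true_h * pilot + 0.05 * (rng.normal(size=200) + 1j * rng.normal(size=200))
est = one_tap_rls(pilot, observed)
print("final tracking error:", abs(est[-1] - true_h[-1]))
```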

  17. Error-based Extraction of States and Energy Landscapes from Experimental Single-Molecule Time-Series

    NASA Astrophysics Data System (ADS)

    Taylor, J. Nicholas; Li, Chun-Biu; Cooper, David R.; Landes, Christy F.; Komatsuzaki, Tamiki

    2015-03-01

    Characterization of states, the essential components of the underlying energy landscapes, is one of the most intriguing subjects in single-molecule (SM) experiments due to the existence of noise inherent to the measurements. Here we present a method to extract the underlying state sequences from experimental SM time-series. Taking into account empirical error and the finite sampling of the time-series, the method extracts a steady-state network which provides an approximation of the underlying effective free energy landscape. The core of the method is the application of rate-distortion theory from information theory, allowing the individual data points to be assigned to multiple states simultaneously. We demonstrate the method's proficiency in its application to simulated trajectories as well as to experimental SM fluorescence resonance energy transfer (FRET) trajectories obtained from isolated agonist binding domains of the AMPA receptor, an ionotropic glutamate receptor that is prevalent in the central nervous system.

  18. Large Angle Reorientation of a Solar Sail Using Gimballed Mass Control

    NASA Astrophysics Data System (ADS)

    Sperber, E.; Fu, B.; Eke, F. O.

    2016-06-01

    This paper proposes a control strategy for the large angle reorientation of a solar sail equipped with a gimballed mass. The algorithm consists of a first stage that manipulates the gimbal angle in order to minimize the attitude error about a single principal axis. Once certain termination conditions are reached, a regulator is employed that selects a single gimbal angle for minimizing both the residual attitude error concomitantly with the body rate. Because the force due to the specular reflection of radiation is always directed along a reflector's surface normal, this form of thrust vector control cannot generate torques about an axis normal to the plane of the sail. Thus, in order to achieve three-axis control authority a 1-2-1 or 2-1-2 sequence of rotations about principal axes is performed. The control algorithm is implemented directly in-line with the nonlinear equations of motion and key performance characteristics are identified.

  19. Precision and recall estimates for two-hybrid screens

    PubMed Central

    Huang, Hailiang; Bader, Joel S.

    2009-01-01

    Motivation: Yeast two-hybrid screens are an important method to map pairwise protein interactions. This method can generate spurious interactions (false discoveries), and true interactions can be missed (false negatives). Previously, we reported a capture–recapture estimator for bait-specific precision and recall. Here, we present an improved method that better accounts for heterogeneity in bait-specific error rates. Result: For yeast, worm and fly screens, we estimate the overall false discovery rates (FDRs) to be 9.9%, 13.2% and 17.0% and the false negative rates (FNRs) to be 51%, 42% and 28%. Bait-specific FDRs and the estimated protein degrees are then used to identify protein categories that yield more (or fewer) false positive interactions and more (or fewer) interaction partners. While membrane proteins have been suggested to have elevated FDRs, the current analysis suggests that intrinsic membrane proteins may actually have reduced FDRs. Hydrophobicity is positively correlated with decreased error rates and fewer interaction partners. These methods will be useful for future two-hybrid screens, which could use ultra-high-throughput sequencing for deeper sampling of interacting bait–prey pairs. Availability: All software (C source) and datasets are available as supplemental files and at http://www.baderzone.org under the Lesser GPL v. 3 license. Contact: joel.bader@jhu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19091773

  20. A cascaded coding scheme for error control and its performance analysis

    NASA Technical Reports Server (NTRS)

    Lin, Shu; Kasami, Tadao; Fujiwara, Tohru; Takata, Toyoo

    1986-01-01

    A coding scheme is investigated for error control in data communication systems. The scheme is obtained by cascading two error correcting codes, called the inner and outer codes. The error performance of the scheme is analyzed for a binary symmetric channel with bit error rate epsilon < 1/2. It is shown that if the inner and outer codes are chosen properly, extremely high reliability can be attained even for a high channel bit error rate. Various specific example schemes with inner codes ranging from high rates to very low rates and Reed-Solomon codes as outer codes are considered, and their error probabilities are evaluated. They all provide extremely high reliability even for very high bit error rates. Several example schemes are being considered by NASA for satellite and spacecraft down link error control.
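
    The flavor of such reliability calculations can be shown for a single t-error-correcting block code on a binary symmetric channel (BSC): under bounded-distance decoding, the decoded block error probability is bounded by the chance of more than t bit flips. The (n, t) values and residual error rates below are hypothetical, and the sketch ignores the inner/outer code interaction analyzed in the paper.

```python
from math import comb

def block_error_upper_bound(n, t, p):
    """Probability that more than t of n bits are flipped on a BSC with
    crossover probability p -- an upper bound on the decoded block error
    rate of a t-error-correcting code under bounded-distance decoding."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1, n + 1))

# A hypothetical (255, t=16) outer code fed by an inner decoder that leaves a
# residual bit error rate p_inner:
for p_inner in (1e-1, 1e-2, 1e-3):
    print(p_inner, block_error_upper_bound(255, 16, p_inner))
```

    Pushing the residual bit error rate from 1e-1 down to 1e-3 takes this bound from near-certain block failure to a vanishingly small value, which is the kind of gain a well-chosen inner code hands to the outer code in a cascaded scheme.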

  1. The evolutionary rate dynamically tracks changes in HIV-1 epidemics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maljkovic-berry, Irina; Athreya, Gayathri; Daniels, Marcus

    Large-sequence datasets provide an opportunity to investigate the dynamics of pathogen epidemics. Thus, a fast method to estimate the evolutionary rate from large and numerous phylogenetic trees becomes necessary. Based on minimizing tip height variances, we optimize the root in a given phylogenetic tree to estimate the most homogenous evolutionary rate between samples from at least two different time points. Simulations showed that the method had no bias in the estimation of evolutionary rates and that it was robust to tree rooting and topological errors. We show that the evolutionary rates of HIV-1 subtype B and C epidemics have changed over time, with the rate of evolution inversely correlated to the rate of virus spread. For subtype B, the evolutionary rate slowed down and tracked the start of the HAART era in 1996. Subtype C in Ethiopia showed an increase in the evolutionary rate when the prevalence increase markedly slowed down in 1995. Thus, we show that the evolutionary rate of HIV-1 on the population level dynamically tracks epidemic events.
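
    The rate-estimation step that such a rooting procedure feeds into can be sketched as a simple root-to-tip regression: the slope of root-to-tip divergence against sampling time estimates the evolutionary rate. The sketch below uses made-up sampling times and distances and does not implement the variance-minimizing root optimization described in the abstract.

```python
import numpy as np

def root_to_tip_rate(sampling_times, root_to_tip_distances):
    """Least-squares slope of root-to-tip divergence against sampling time,
    i.e. an estimate of the evolutionary rate in substitutions/site/year."""
    t = np.asarray(sampling_times, dtype=float)
    d = np.asarray(root_to_tip_distances, dtype=float)
    slope, intercept = np.polyfit(t, d, 1)
    return slope, intercept

# Hypothetical serially sampled sequences (years, divergence from the root)
times = [1985, 1990, 1995, 2000, 2005]
dists = [0.021, 0.033, 0.047, 0.058, 0.071]
rate, origin = root_to_tip_rate(times, dists)
print(f"estimated rate: {rate:.4g} subs/site/year")
```

    In the same spirit, candidate root positions could be scored by how much they reduce the variance of the regression residuals, which is the intuition behind the tip-height-variance criterion above.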

  2. Efficiency, error and yield in light-directed maskless synthesis of DNA microarrays

    PubMed Central

    2011-01-01

    Background Light-directed in situ synthesis of DNA microarrays using computer-controlled projection from a digital micromirror device--maskless array synthesis (MAS)--has proved to be successful at both commercial and laboratory scales. The chemical synthetic cycle in MAS is quite similar to that of conventional solid-phase synthesis of oligonucleotides, but the complexity of microarrays and unique synthesis kinetics on the glass substrate require a careful tuning of parameters and unique modifications to the synthesis cycle to obtain optimal deprotection and phosphoramidite coupling. In addition, unintended deprotection due to scattering and diffraction introduces insertion errors that contribute significantly to the overall error rate. Results Stepwise phosphoramidite coupling yields have been greatly improved and are now comparable to those obtained in solid phase synthesis of oligonucleotides. Extended chemical exposure in the synthesis of complex, long oligonucleotide arrays results in lower--but still high--final average yields which approach 99%. The new synthesis chemistry includes elimination of the standard oxidation until the final step, and improved coupling and light deprotection. Coupling insertions due to stray light are the limiting factor in sequence quality for oligonucleotide synthesis for gene assembly. Diffraction and local flare are by far the largest contributors to loss of optical contrast. Conclusions Maskless array synthesis is an efficient and versatile method for synthesizing high density arrays of long oligonucleotides for hybridization- and other molecular binding-based experiments. For applications requiring high sequence purity, such as gene assembly, diffraction and flare remain significant obstacles, but can be significantly reduced with straightforward experimental strategies. PMID:22152062

  3. Outage probability of a relay strategy allowing intra-link errors utilizing Slepian-Wolf theorem

    NASA Astrophysics Data System (ADS)

    Cheng, Meng; Anwar, Khoirul; Matsumoto, Tad

    2013-12-01

    In conventional decode-and-forward (DF) one-way relay systems, a data block received at the relay node is discarded if the information part is found to have errors after decoding. Such errors are referred to as intra-link errors in this article. However, in a setup where the relay forwards data blocks despite possible intra-link errors, the two data blocks, one from the source node and the other from the relay node, are highly correlated because they were transmitted from the same source. In this article, we focus on the outage probability analysis of such a relay transmission system, where source-destination and relay-destination links, Link 1 and Link 2, respectively, are assumed to suffer from the correlated fading variation due to block Rayleigh fading. The intra-link is assumed to be represented by a simple bit-flipping model, where some of the information bits recovered at the relay node are the flipped version of their corresponding original information bits at the source. The correlated bit streams are encoded separately by the source and relay nodes, and transmitted block-by-block to a common destination using different time slots, where the information sequence transmitted over Link 2 may be a noise-corrupted interleaved version of the original sequence. The joint decoding takes place at the destination by exploiting the correlation knowledge of the intra-link (source-relay link). It is shown that the outage probability of the proposed transmission technique can be expressed by a set of double integrals over the admissible rate range, given by the Slepian-Wolf theorem, with respect to the probability density function (pdf) of the instantaneous signal-to-noise power ratios (SNR) of Link 1 and Link 2. It is found that, with the Slepian-Wolf relay technique, so far as the correlation ρ of the complex fading variation satisfies |ρ| < 1, the 2nd order diversity can be achieved only if the two bit streams are fully correlated. This indicates that the diversity order exhibited in the outage curve converges to 1 when the bit streams are not fully correlated. Moreover, the Slepian-Wolf outage probability is proved to be smaller than that of the 2nd order maximum ratio combining (MRC) diversity, if the average SNRs of the two independent links are the same. Exact as well as asymptotic expressions of the outage probability are theoretically derived in the article. In addition, the theoretical outage results are compared with the frame-error-rate (FER) curves, obtained by a series of simulations for the Slepian-Wolf relay system based on bit-interleaved coded modulation with iterative detection (BICM-ID). It is shown that the FER curves exhibit the same tendency as the theoretical results.

  4. HIV populations are large and accumulate high genetic diversity in a nonlinear fashion.

    PubMed

    Maldarelli, Frank; Kearney, Mary; Palmer, Sarah; Stephens, Robert; Mican, JoAnn; Polis, Michael A; Davey, Richard T; Kovacs, Joseph; Shao, Wei; Rock-Kress, Diane; Metcalf, Julia A; Rehm, Catherine; Greer, Sarah E; Lucey, Daniel L; Danley, Kristen; Alter, Harvey; Mellors, John W; Coffin, John M

    2013-09-01

    HIV infection is characterized by rapid and error-prone viral replication resulting in genetically diverse virus populations. The rate of accumulation of diversity and the mechanisms involved are under intense study to provide useful information to understand immune evasion and the development of drug resistance. To characterize the development of viral diversity after infection, we carried out an in-depth analysis of single genome sequences of HIV pro-pol to assess diversity and divergence and to estimate replicating population sizes in a group of treatment-naive HIV-infected individuals sampled at single (n = 22) or multiple, longitudinal (n = 11) time points. Analysis of single genome sequences revealed nonlinear accumulation of sequence diversity during the course of infection. Diversity accumulated in recently infected individuals at rates 30-fold higher than in patients with chronic infection. Accumulation of synonymous changes accounted for most of the diversity during chronic infection. Accumulation of diversity resulted in population shifts, but the rates of change were low relative to estimated replication cycle times, consistent with relatively large population sizes. Analysis of changes in allele frequencies revealed effective population sizes that are substantially higher than previous estimates of approximately 1,000 infectious particles/infected individual. Taken together, these observations indicate that HIV populations are large, diverse, and slow to change in chronic infection and that the emergence of new mutations, including drug resistance mutations, is governed by both selection forces and drift.

  5. 45 CFR 98.100 - Error Rate Report.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 45 Public Welfare 1 2013-10-01 2013-10-01 false Error Rate Report. 98.100 Section 98.100 Public Welfare DEPARTMENT OF HEALTH AND HUMAN SERVICES GENERAL ADMINISTRATION CHILD CARE AND DEVELOPMENT FUND Error Rate Reporting § 98.100 Error Rate Report. (a) Applicability—The requirements of this subpart...

  6. 45 CFR 98.100 - Error Rate Report.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 45 Public Welfare 1 2014-10-01 2014-10-01 false Error Rate Report. 98.100 Section 98.100 Public Welfare Department of Health and Human Services GENERAL ADMINISTRATION CHILD CARE AND DEVELOPMENT FUND Error Rate Reporting § 98.100 Error Rate Report. (a) Applicability—The requirements of this subpart...

  7. 45 CFR 98.100 - Error Rate Report.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 45 Public Welfare 1 2012-10-01 2012-10-01 false Error Rate Report. 98.100 Section 98.100 Public Welfare DEPARTMENT OF HEALTH AND HUMAN SERVICES GENERAL ADMINISTRATION CHILD CARE AND DEVELOPMENT FUND Error Rate Reporting § 98.100 Error Rate Report. (a) Applicability—The requirements of this subpart...

  8. 45 CFR 98.100 - Error Rate Report.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 45 Public Welfare 1 2011-10-01 2011-10-01 false Error Rate Report. 98.100 Section 98.100 Public Welfare DEPARTMENT OF HEALTH AND HUMAN SERVICES GENERAL ADMINISTRATION CHILD CARE AND DEVELOPMENT FUND Error Rate Reporting § 98.100 Error Rate Report. (a) Applicability—The requirements of this subpart...

  9. Neuromorphic learning of continuous-valued mappings from noise-corrupted data. Application to real-time adaptive control

    NASA Technical Reports Server (NTRS)

    Troudet, Terry; Merrill, Walter C.

    1990-01-01

    The ability of feed-forward neural network architectures to learn continuous valued mappings in the presence of noise was demonstrated in relation to parameter identification and real-time adaptive control applications. An error function was introduced to help optimize parameter values such as number of training iterations, observation time, sampling rate, and scaling of the control signal. The learning performance depended essentially on the degree of embodiment of the control law in the training data set and on the degree of uniformity of the probability distribution function of the data that are presented to the net during sequence. When a control law was corrupted by noise, the fluctuations of the training data biased the probability distribution function of the training data sequence. Only if the noise contamination is minimized and the degree of embodiment of the control law is maximized, can a neural net develop a good representation of the mapping and be used as a neurocontroller. A multilayer net was trained with back-error-propagation to control a cart-pole system for linear and nonlinear control laws in the presence of data processing noise and measurement noise. The neurocontroller exhibited noise-filtering properties and was found to operate more smoothly than the teacher in the presence of measurement noise.

  10. Targeted Next Generation Sequencing in Patients with Inborn Errors of Metabolism

    PubMed Central

    Yubero, Dèlia; Brandi, Núria; Ormazabal, Aida; Garcia-Cazorla, Àngels; Pérez-Dueñas, Belén; Campistol, Jaime; Ribes, Antonia; Palau, Francesc

    2016-01-01

    Background Next-generation sequencing (NGS) technology has promoted genetic diagnosis and is becoming increasingly inexpensive and fast. To evaluate the utility of NGS in the clinical field, a targeted genetic panel approach was designed for the diagnosis of a set of inborn errors of metabolism (IEM). The final aim of the study was to compare the diagnostic yield of NGS in patients who presented with consistent clinical and biochemical suspicion of IEM with that obtained for patients who did not have specific biomarkers. Methods The subjects studied (n = 146) were classified into two categories: Group 1 (n = 81), which consisted of patients with clinical and biochemical suspicion of IEM, and Group 2 (n = 65), which consisted of IEM cases with clinical suspicion and unspecific biomarkers. A total of 171 genes were analyzed using a custom targeted panel of genes followed by Sanger validation. Results Genetic diagnosis was achieved in 50% of patients (73/146). The diagnostic yield for Group 1 was 78% (63/81), and this rate decreased to 15.4% (10/65) in Group 2 (X2 = 76.171; p < 0.0001). Conclusions A rapid and effective genetic diagnosis was achieved in our cohort, particularly in the group that had both clinical and biochemical indications for the diagnosis. PMID:27243974

  11. Efficiency of the neighbor-joining method in reconstructing deep and shallow evolutionary relationships in large phylogenies.

    PubMed

    Kumar, S; Gadagkar, S R

    2000-12-01

    The neighbor-joining (NJ) method is widely used in reconstructing large phylogenies because of its computational speed and the high accuracy in phylogenetic inference as revealed in computer simulation studies. However, most computer simulation studies have quantified the overall performance of the NJ method in terms of the percentage of branches inferred correctly or the percentage of replications in which the correct tree is recovered. We have examined other aspects of its performance, such as the relative efficiency in correctly reconstructing shallow (close to the external branches of the tree) and deep branches in large phylogenies; the contribution of zero-length branches to topological errors in the inferred trees; and the influence of increasing the tree size (number of sequences), evolutionary rate, and sequence length on the efficiency of the NJ method. Results show that the correct reconstruction of deep branches is no more difficult than that of shallower branches. The presence of zero-length branches in realized trees contributes significantly to the overall error observed in the NJ tree, especially in large phylogenies or slowly evolving genes. Furthermore, the tree size does not influence the efficiency of NJ in reconstructing shallow and deep branches in our simulation study, in which the evolutionary process is assumed to be homogeneous in all lineages.

  12. Consensus generation and variant detection by Celera Assembler.

    PubMed

    Denisov, Gennady; Walenz, Brian; Halpern, Aaron L; Miller, Jason; Axelrod, Nelson; Levy, Samuel; Sutton, Granger

    2008-04-15

    We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms. Applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the total of 2,033,311 detected regions of sequence variation. In 33,269 out of 460,373 detected regions of size >1 bp, it fixes the constructed errors of a mosaic haploid representation of a diploid locus as produced by the original Celera Assembler consensus algorithm. Using an optimized procedure calibrated against 1,506,344 known SNPs, it detects 438,814 new heterozygous SNPs with a false positive rate of 12%. The open source code is available at: http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/

  13. An educational and audit tool to reduce prescribing error in intensive care.

    PubMed

    Thomas, A N; Boxall, E M; Laha, S K; Day, A J; Grundy, D

    2008-10-01

    To reduce prescribing errors in an intensive care unit by providing prescriber education in tutorials, ward-based teaching and feedback in 3-monthly cycles with each new group of trainee medical staff. Prescribing audits were conducted three times in each 3-month cycle, once pretraining, once post-training and a final audit after 6 weeks. The audit information was fed back to prescribers with their correct prescribing rates, rates for individual error types and total error rates together with anonymised information about other prescribers' error rates. The percentage of prescriptions with errors decreased over each 3-month cycle (pretraining 25%, 19%, (one missing data point), post-training 23%, 6%, 11%, final audit 7%, 3%, 5% (p<0.0005)). The total number of prescriptions and error rates varied widely between trainees (data collection one; cycle two: range of prescriptions written: 1-61, median 18; error rate: 0-100%; median: 15%). Prescriber education and feedback reduce manual prescribing errors in intensive care.

  14. A Six Sigma Trial For Reduction of Error Rates in Pathology Laboratory.

    PubMed

    Tosuner, Zeynep; Gücin, Zühal; Kiran, Tuğçe; Büyükpinarbaşili, Nur; Turna, Seval; Taşkiran, Olcay; Arici, Dilek Sema

    2016-01-01

    A major target of quality assurance is the minimization of error rates in order to enhance patient safety. Six Sigma is a method targeting zero error (3.4 errors per million events) used in industry. The five main principles of Six Sigma are defining, measuring, analysis, improvement and control. Using this methodology, the causes of errors can be examined and process improvement strategies can be identified. The aim of our study was to evaluate the utility of Six Sigma methodology in error reduction in our pathology laboratory. The errors encountered between April 2014 and April 2015 were recorded by the pathology personnel. Error follow-up forms were examined by the quality control supervisor, administrative supervisor and the head of the department. Using Six Sigma methodology, the rate of errors was measured monthly and the distribution of errors at the preanalytic, analytic and postanalytical phases was analysed. Improvement strategies were reviewed in the monthly intradepartmental meetings, and units with high error rates were placed under closer control. Fifty-six (52.4%) of 107 recorded errors in total were at the pre-analytic phase. Forty-five errors (42%) were recorded as analytical and 6 errors (5.6%) as post-analytical. Two of the 45 errors were major irrevocable errors. The error rate was 6.8 per million in the first half of the year and 1.3 per million in the second half, decreasing by 79.77%. The Six Sigma trial in our pathology laboratory enabled a reduction of error rates, mainly in the pre-analytic and analytic phases.

  15. Data Analysis & Statistical Methods for Command File Errors

    NASA Technical Reports Server (NTRS)

    Meshkat, Leila; Waggoner, Bruce; Bryant, Larry

    2014-01-01

    This paper explains current work on modeling for managing the risk of command file errors. It is focused on analyzing actual data from a JPL spaceflight mission to build models for evaluating and predicting error rates as a function of several key variables. We constructed a rich dataset by considering the number of errors, the number of files radiated, including the number of commands and blocks in each file, as well as subjective estimates of workload and operational novelty. We have assessed these data using different curve fitting and distribution fitting techniques, such as multiple regression analysis and maximum likelihood estimation, to see how much of the variability in the error rates can be explained. We have also used goodness-of-fit testing strategies and principal component analysis to further assess our data. Finally, we constructed a model of expected error rates based on what these statistics identified as critical drivers of the error rate. This model allows project management to evaluate the error rate against a theoretically expected rate as well as anticipate future error rates.
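
    The kind of model described above can be prototyped with an ordinary least-squares fit of error counts on candidate drivers. The predictors, coefficients and data below are synthetic stand-ins, not the JPL mission data, and a Poisson regression would arguably be more appropriate for count data; the sketch only illustrates the fitting step.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-ins for the kinds of predictors described above
# (counts of files/commands radiated, workload and novelty scores).
n = 120
files    = rng.poisson(30, n)
commands = files * rng.poisson(8, n)
workload = rng.uniform(0, 1, n)
novelty  = rng.uniform(0, 1, n)
errors   = rng.poisson(0.001 * commands + 2.0 * workload + 1.0 * novelty)

# Ordinary least-squares fit of the error count on the predictors
X = np.column_stack([np.ones(n), commands, workload, novelty])
coef, *_ = np.linalg.lstsq(X, errors, rcond=None)
predicted = X @ coef
print("coefficients (intercept, commands, workload, novelty):", np.round(coef, 4))
print("R^2:", 1 - np.sum((errors - predicted)**2) / np.sum((errors - errors.mean())**2))
```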

  16. Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs.

    PubMed

    Powell, Bradford C; Hutchison, Clyde A

    2006-01-19

    Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes.

  17. Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs

    PubMed Central

    Powell, Bradford C; Hutchison, Clyde A

    2006-01-01

    Background Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. Results "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Conclusion Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes. PMID:16423288

  18. Detecting Signatures of GRACE Sensor Errors in Range-Rate Residuals

    NASA Astrophysics Data System (ADS)

    Goswami, S.; Flury, J.

    2016-12-01

    In order to reach the accuracy of the GRACE baseline, predicted earlier from the design simulations, efforts have been ongoing for a decade. The GRACE error budget is dominated by noise from sensors, dealiasing models and modeling errors. GRACE range-rate residuals contain these errors. Thus, their analysis provides insight into the individual contributions to the error budget. Hence, we analyze the range-rate residuals with a focus on the contribution of sensor errors due to mis-pointing and poor ranging performance in GRACE solutions. For the analysis of pointing errors, we consider two different reprocessed attitude datasets with differences in pointing performance. Range-rate residuals are then computed from these two datasets and analysed. We further compare the system noise of the four K- and Ka-band frequencies of the two spacecraft with the range-rate residuals. Strong signatures of mis-pointing errors can be seen in the range-rate residuals. Correlations between range frequency noise and range-rate residuals are also seen.

  19. Analysis of phase error effects in multishot diffusion-prepared turbo spin echo imaging

    PubMed Central

    Cervantes, Barbara; Kooijman, Hendrik; Karampinos, Dimitrios C.

    2017-01-01

    Background To characterize the effect of phase errors on the magnitude and the phase of the diffusion-weighted (DW) signal acquired with diffusion-prepared turbo spin echo (dprep-TSE) sequences. Methods Motion and eddy currents were identified as the main sources of phase errors. An analytical expression for the effect of phase errors on the acquired signal was derived and verified using Bloch simulations, phantom, and in vivo experiments. Results Simulations and experiments showed that phase errors during the diffusion preparation cause both magnitude and phase modulation on the acquired data. When motion-induced phase error (MiPe) is accounted for (e.g., with motion-compensated diffusion encoding), the signal magnitude modulation due to the leftover eddy-current-induced phase error cannot be eliminated by the conventional phase cycling and sum-of-squares (SOS) method. By employing magnitude stabilizers, the phase-error-induced magnitude modulation, regardless of its cause, was removed but the phase modulation remained. The in vivo comparison between pulsed gradient and flow-compensated diffusion preparations showed that MiPe needed to be addressed in multi-shot dprep-TSE acquisitions employing magnitude stabilizers. Conclusions A comprehensive analysis of phase errors in dprep-TSE sequences showed that magnitude stabilizers are mandatory in removing the phase error induced magnitude modulation. Additionally, when multi-shot dprep-TSE is employed the inconsistent signal phase modulation across shots has to be resolved before shot-combination is performed. PMID:28516049

  20. Model Performance Evaluation and Scenario Analysis (MPESA) Tutorial

    EPA Pesticide Factsheets

    The model performance evaluation consists of metrics and model diagnostics. These metrics provide modelers with statistical goodness-of-fit measures that capture magnitude-only, sequence-only, and combined magnitude and sequence errors.

  1. Sequence polymorphism in an insect RNA virus field population: A snapshot from a single point in space and time reveals stochastic differences among and within individual hosts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stenger, Drake C., E-mail: drake.stenger@ars.usda.

    Population structure of Homalodisca coagulata Virus-1 (HoCV-1) among and within field-collected insects sampled from a single point in space and time was examined. Polymorphism in complete consensus sequences among single-insect isolates was dominated by synonymous substitutions. The mutant spectrum of the C2 helicase region within each single-insect isolate was unique and dominated by nonsynonymous singletons. Bootstrapping was used to correct the within-isolate nonsynonymous:synonymous arithmetic ratio (N:S) for RT-PCR error, yielding an N:S value ~one log-unit greater than that of consensus sequences. Probability of all possible single-base substitutions for the C2 region predicted N:S values within 95% confidence limits of the corrected within-isolate N:S when the only constraint imposed was viral polymerase error bias for transitions over transversions. These results indicate that bottlenecks coupled with strong negative/purifying selection drive consensus sequences toward neutral sequence space, and that most polymorphism within single-insect isolates is composed of newly-minted mutations sampled prior to selection. -- Highlights: •Sampling protocol minimized differential selection/history among isolates. •Polymorphism among consensus sequences dominated by negative/purifying selection. •Within-isolate N:S ratio corrected for RT-PCR error by bootstrapping. •Within-isolate mutant spectrum dominated by new mutations yet to undergo selection.

  2. TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees.

    PubMed

    Mai, Uyen; Mirarab, Siavash

    2018-05-08

    Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. We propose an automatic method to detect such errors. We build a phylogeny including all the data then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled. TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink .
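
    The underlying idea, removing the k leaves whose deletion most reduces the tree diameter, can be illustrated against a simple leaf-to-leaf path-length matrix. The greedy loop below is only an approximation for illustration; TreeShrink solves the k-shrink problem exactly in polynomial time and adds the statistical outlier tests described above. The toy distance matrix is invented.

```python
from itertools import combinations

def diameter(dist, leaves):
    """Largest pairwise path distance among the remaining leaves."""
    return max((dist[a][b] for a, b in combinations(sorted(leaves), 2)), default=0.0)

def greedy_shrink(dist, k):
    """Greedily drop k leaves, each time removing the leaf whose deletion
    most reduces the current diameter (an approximation, not the exact
    polynomial-time algorithm referred to above)."""
    leaves = set(dist)
    removed = []
    for _ in range(k):
        best = min(leaves, key=lambda x: diameter(dist, leaves - {x}))
        leaves.remove(best)
        removed.append(best)
    return removed, diameter(dist, leaves)

# Hypothetical path-length matrix with one long-branch outlier ("D")
dist = {
    "A": {"B": 0.2, "C": 0.3, "D": 1.9},
    "B": {"A": 0.2, "C": 0.25, "D": 1.85},
    "C": {"A": 0.3, "B": 0.25, "D": 1.8},
    "D": {"A": 1.9, "B": 1.85, "C": 1.8},
}
print(greedy_shrink(dist, k=1))   # removes "D"; diameter drops to 0.3
```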

  3. What Information is Stored in DNA: Does it Contain Digital Error Correcting Codes?

    NASA Astrophysics Data System (ADS)

    Liebovitch, Larry

    1998-03-01

    The longest term correlations in living systems are the information stored in DNA which reflects the evolutionary history of an organism. The 4 bases (A,T,G,C) encode sequences of amino acids as well as locations of binding sites for proteins that regulate DNA. The fidelity of this important information is maintained by ANALOG error check mechanisms. When a single strand of DNA is replicated the complementary base is inserted in the new strand. Sometimes the wrong base is inserted that sticks out disrupting the phosphate backbone. The new base is not yet methylated, so repair enzymes, that slide along the DNA, can tear out the wrong base and replace it with the right one. The bases in DNA form a sequence of 4 different symbols and so the information is encoded in a DIGITAL form. All the digital codes in our society (ISBN book numbers, UPC product codes, bank account numbers, airline ticket numbers) use error checking code, where some digits are functions of other digits to maintain the fidelity of transmitted information. Does DNA also utilize a DIGITAL error checking code to maintain the fidelity of its information and increase the accuracy of replication? That is, are some bases in DNA functions of other bases upstream or downstream? This raises the interesting mathematical problem: How does one determine whether some symbols in a sequence of symbols are a function of other symbols? It also bears on the issue of determining algorithmic complexity: What is the function that generates the shortest algorithm for reproducing the symbol sequence? The error checking codes most used in our technology are linear block codes. We developed an efficient method to test for the presence of such codes in DNA. We coded the 4 bases as (0,1,2,3) and used Gaussian elimination, modified for modulus 4, to test if some bases are linear combinations of other bases. We used this method to analyze the base sequence in the genes from the lac operon and cytochrome C. We did not find evidence for such error correcting codes in these genes. However, we analyzed only a small amount of DNA and if digital error correcting schemes are present in DNA, they may be more subtle than such simple linear block codes. The basic issue we raise here is how information is stored in DNA and an appreciation that digital symbol sequences, such as DNA, admit of interesting schemes to store and protect the fidelity of their information content. Liebovitch, Tao, Todorov, Levine. 1996. Biophys. J. 71:1539-1544. Supported by NIH grant EY6234.
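
    A simplified stand-in for the modulus-4 test described above is sketched below: bases are coded 0-3 (the particular A/T/G/C assignment is an arbitrary assumption) and a brute-force search asks whether the last base of every k-base window equals a fixed linear combination (mod 4) of the preceding bases. This replaces the paper's Gaussian elimination with exhaustive search over coefficients, which is only practical for small windows.

```python
from itertools import product

BASE_CODE = {"A": 0, "T": 1, "G": 2, "C": 3}  # assumed coding, matching the (0,1,2,3) idea

def is_linear_check_digit(windows, k):
    """Test whether, across all k-base windows observed, the last base always
    equals a fixed linear combination (mod 4) of the first k-1 bases.
    Brute force over coefficient vectors -- a simplified stand-in for the
    modulus-4 Gaussian elimination described above."""
    coded = [[BASE_CODE[b] for b in w] for w in windows]
    for coeffs in product(range(4), repeat=k - 1):
        if all(sum(c * x for c, x in zip(coeffs, row[:-1])) % 4 == row[-1]
               for row in coded):
            return coeffs
    return None

# A toy "code" in which the 4th base is the mod-4 sum of the first three,
# followed by windows with no such constraint.
coded_windows  = ["ATGC", "TTAG", "GGAA", "CATA"]
random_windows = ["ATGC", "TTAG", "GGAT", "CATC"]
print(is_linear_check_digit(coded_windows, 4))   # (1, 1, 1): a consistent combination exists
print(is_linear_check_digit(random_windows, 4))  # None: no consistent combination
```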

  4. Deep sequencing analysis of viral infection and evolution allows rapid and detailed characterization of viral mutant spectrum.

    PubMed

    Isakov, Ofer; Bordería, Antonio V; Golan, David; Hamenahem, Amir; Celniker, Gershon; Yoffe, Liron; Blanc, Hervé; Vignuzzi, Marco; Shomron, Noam

    2015-07-01

    The study of RNA virus populations is a challenging task. Each population of RNA virus is composed of a collection of different, yet related genomes often referred to as mutant spectra or quasispecies. Virologists using deep sequencing technologies face major obstacles when studying virus population dynamics, both experimentally and in natural settings due to the relatively high error rates of these technologies and the lack of high performance pipelines. In order to overcome these hurdles we developed a computational pipeline, termed ViVan (Viral Variance Analysis). ViVan is a complete pipeline facilitating the identification, characterization and comparison of sequence variance in deep sequenced virus populations. Applying ViVan on deep sequenced data obtained from samples that were previously characterized by more classical approaches, we uncovered novel and potentially crucial aspects of virus populations. With our experimental work, we illustrate how ViVan can be used for studies ranging from the more practical, detection of resistant mutations and effects of antiviral treatments, to the more theoretical temporal characterization of the population in evolutionary studies. Freely available on the web at http://www.vivanbioinfo.org. Contact: nshomron@post.tau.ac.il. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  5. A Generalized Least-Squares Estimate for the Origin of Sporophytic Self-Incompatibility

    PubMed Central

    Uyenoyama, M. K.

    1995-01-01

    Analysis of nucleotide sequences that regulate the expression of self-incompatibility in flowering plants affords a direct means of examining classical hypotheses for the origin and evolution of this major feature of mating systems. Departing from the classical view of monophyly of all forms of self-incompatibility, the current paradigm for the origin of self-incompatibility postulates multiple episodes of recruitment and modification of preexisting genes. In Brassica, the S locus, which regulates sporophytic self-incompatibility, shows homology to a multigene family present both in self-compatible congeners and in groups for which this form of self-incompatibility is atypical. A phylogenetic analysis of S-allele sequences together with homologous sequences that do not cosegregate with self-incompatibility permits dating the change of function that marked the origin of self-incompatibility. A generalized least-squares method is introduced that provides closed-form expressions for estimates and standard errors for function-specific divergence rates and times of divergence among sequences. This analysis suggests that the age of the sporophytic self-incompatibility system expressed in Brassica exceeds species divergence within the genus by four- to fivefold. The extraordinarily high levels of sequence diversity exhibited by S alleles appears to reflect their ancient derivation, with the alternative hypothesis of hypermutability rejected by the analysis. PMID:7713446
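
    Closed-form generalized least-squares estimates of this kind follow the standard textbook form: the coefficient vector is obtained from the design matrix, the error covariance and the observations, and the standard errors come from the resulting coefficient covariance. The sketch below implements that generic estimator on invented numbers; the design matrix and covariance structure are assumptions and do not reproduce the paper's specific parameterization of divergence rates and times.

```python
import numpy as np

def gls(X, y, V):
    """Generalized least-squares estimate and standard errors for y = X b + e,
    cov(e) = V:  b_hat = (X' V^-1 X)^-1 X' V^-1 y."""
    Vinv = np.linalg.inv(V)
    cov_b = np.linalg.inv(X.T @ Vinv @ X)
    b_hat = cov_b @ X.T @ Vinv @ y
    return b_hat, np.sqrt(np.diag(cov_b))

# Toy example: estimate a divergence rate from correlated pairwise distances.
# The covariance structure here is arbitrary; in the paper it arises from the
# shared ancestry of the sequence pairs.
t = np.array([1.0, 2.0, 3.0, 4.0])               # hypothetical divergence times
X = np.column_stack([np.ones(4), t])
V = 0.05 * (np.eye(4) + 0.5 * np.ones((4, 4)))   # correlated "errors"
y = np.array([0.11, 0.19, 0.33, 0.41])           # observed distances
b, se = gls(X, y, V)
print("intercept, rate:", b, "standard errors:", se)
```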

  6. Extended RF shimming: Sequence-level parallel transmission optimization applied to steady-state free precession MRI of the heart.

    PubMed

    Beqiri, Arian; Price, Anthony N; Padormo, Francesco; Hajnal, Joseph V; Malik, Shaihan J

    2017-06-01

    Cardiac magnetic resonance imaging (MRI) at high field presents challenges because of the high specific absorption rate and significant transmit field (B1+) inhomogeneities. Parallel transmission MRI offers the ability to correct for both issues at the level of individual radiofrequency (RF) pulses, but must operate within strict hardware and safety constraints. The constraints are themselves affected by sequence parameters, such as the RF pulse duration and TR, meaning that an overall optimal operating point exists for a given sequence. This work seeks to obtain optimal performance by performing a 'sequence-level' optimization in which pulse sequence parameters are included as part of an RF shimming calculation. The method is applied to balanced steady-state free precession cardiac MRI with the objective of minimizing TR, hence reducing the imaging duration. Results are demonstrated using an eight-channel parallel transmit system operating at 3 T, with an in vivo study carried out on seven male subjects of varying body mass index (BMI). Compared with single-channel operation, a mean-squared-error shimming approach leads to reduced imaging durations of 32 ± 3% with simultaneous improvement in flip angle homogeneity of 32 ± 8% within the myocardium. © 2017 The Authors. NMR in Biomedicine published by John Wiley & Sons Ltd.

  7. Medication errors in chemotherapy preparation and administration: a survey conducted among oncology nurses in Turkey.

    PubMed

    Ulas, Arife; Silay, Kamile; Akinci, Sema; Dede, Didem Sener; Akinci, Muhammed Bulent; Sendur, Mehmet Ali Nahit; Cubukcu, Erdem; Coskun, Hasan Senol; Degirmenci, Mustafa; Utkan, Gungor; Ozdemir, Nuriye; Isikdogan, Abdurrahman; Buyukcelik, Abdullah; Inanc, Mevlude; Bilici, Ahmet; Odabasi, Hatice; Cihan, Sener; Avci, Nilufer; Yalcin, Bulent

    2015-01-01

    Medication errors in oncology may cause severe clinical problems due to low therapeutic indices and high toxicity of chemotherapeutic agents. We aimed to investigate unintentional medication errors and underlying factors during chemotherapy preparation and administration based on a systematic survey of oncology nurses' experience. This study was conducted in 18 adult chemotherapy units with volunteer participation of 206 nurses. A survey was developed by the primary investigators; medication errors (MAEs) were defined as preventable errors during prescription, ordering, preparation or administration of medication. The survey consisted of 4 parts: demographic features of nurses; workload of chemotherapy units; errors and their estimated monthly number during chemotherapy preparation and administration; and evaluation of the possible factors responsible for medication errors. The survey was conducted by face-to-face interviews and data analyses were performed with descriptive statistics. Chi-square or Fisher exact tests were used for a comparative analysis of categorical data. Some 83.4% of the 210 nurses reported one or more errors during chemotherapy preparation and administration. Prescribing or ordering of wrong doses by physicians (65.7%) and noncompliance with administration sequences during chemotherapy administration (50.5%) were the most common errors. The most common estimated average monthly error was not following the administration sequence of the chemotherapeutic agents (4.1 times/month, range 1-20). The most important underlying reasons for medication errors were heavy workload (49.7%) and insufficient number of staff (36.5%). Our findings suggest that the probability of medication error is very high during chemotherapy preparation and administration, the most common involving prescribing and ordering errors. Further studies must address strategies to minimize medication errors in patients receiving chemotherapy, determine sufficient protective measures and establish multistep control mechanisms.

  8. A cascaded coding scheme for error control and its performance analysis

    NASA Technical Reports Server (NTRS)

    Lin, S.

    1986-01-01

    A coding scheme for error control in data communication systems is investigated. The scheme is obtained by cascading two error correcting codes, called the inner and the outer codes. The error performance of the scheme is analyzed for a binary symmetric channel with bit error rate epsilon < 1/2. It is shown that, if the inner and outer codes are chosen properly, extremely high reliability can be attained even for a high channel bit error rate. Various specific example schemes with inner codes ranging from high rates to very low rates and Reed-Solomon outer codes are considered, and their error probabilities are evaluated. They all provide extremely high reliability even for very high bit error rates, say 0.1 to 0.01. Several example schemes are being considered by NASA for satellite and spacecraft downlink error control.
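
    To make the role of the channel bit error rate concrete, the sketch below computes the decoding-failure probability of a generic t-error-correcting block code on a binary symmetric channel; the code parameters are illustrative, and this is not the cascaded inner/outer construction analyzed in the paper:

      # Minimal sketch: block-decoding failure probability on a binary symmetric
      # channel with crossover probability eps. Illustrates the effect of the
      # channel bit error rate only; not the paper's cascaded scheme.
      from math import comb

      def block_failure_prob(n, t, eps):
          """Probability that more than t of n bits are flipped (decoder failure)."""
          p_ok = sum(comb(n, k) * eps**k * (1 - eps)**(n - k) for k in range(t + 1))
          return 1 - p_ok

      for eps in (0.1, 0.01):
          # e.g. a length-31, 3-error-correcting code (parameters are illustrative)
          print(eps, block_failure_prob(31, 3, eps))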

  9. Accurate Typing of Human Leukocyte Antigen Class I Genes by Oxford Nanopore Sequencing.

    PubMed

    Liu, Chang; Xiao, Fangzhou; Hoisington-Lopez, Jessica; Lang, Kathrin; Quenzel, Philipp; Duffy, Brian; Mitra, Robi David

    2018-04-03

    Oxford Nanopore Technologies' MinION has expanded the current DNA sequencing toolkit by delivering long read lengths and extreme portability. The MinION has the potential to enable expedited point-of-care human leukocyte antigen (HLA) typing, an assay routinely used to assess the immunologic compatibility between organ donors and recipients, but the platform's high error rate makes it challenging to type alleles with accuracy. We developed and validated accurate typing of HLA by Oxford nanopore (Athlon), a bioinformatic pipeline that i) maps nanopore reads to a database of known HLA alleles, ii) identifies candidate alleles with the highest read coverage at different resolution levels that are represented as branching nodes and leaves of a tree structure, iii) generates consensus sequences by remapping the reads to the candidate alleles, and iv) calls the final diploid genotype by blasting consensus sequences against the reference database. Using two independent data sets generated on the R9.4 flow cell chemistry, Athlon achieved a 100% accuracy in class I HLA typing at the two-field resolution. Copyright © 2018 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  10. VARiD: a variation detection framework for color-space and letter-space platforms.

    PubMed

    Dalca, Adrian V; Rumble, Stephen M; Levy, Samuel; Brudno, Michael

    2010-06-15

    High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates, and while most HTS technologies sequence the residues of the genome directly, generating base calls for each position, the Applied Biosystems SOLiD platform generates dibase-coded (color space) sequences. While combining data from the various platforms should increase the accuracy of variation detection, to date there are only a few tools that can identify variants from color space data, and none that can analyze color space and regular (letter space) data together. We present VARiD, a probabilistic method for variation detection from both letter- and color-space reads simultaneously. VARiD is based on a hidden Markov model and uses the forward-backward algorithm to accurately identify heterozygous, homozygous and tri-allelic SNPs, as well as micro-indels. Our analysis shows that VARiD performs better than the AB SOLiD toolset at detecting variants from color-space data alone, and improves the calls dramatically when letter- and color-space reads are combined. The toolset is freely available at http://compbio.cs.utoronto.ca/varid.
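
    The forward-backward computation at the core of such HMM-based callers can be written in a few lines; the two-state toy model below is illustrative only and does not reproduce VARiD's letter-/color-space emission model:

      # Toy forward-backward pass for a two-state HMM (illustrative only).
      import numpy as np

      A = np.array([[0.99, 0.01],      # state transition probabilities
                    [0.05, 0.95]])
      E = np.array([[0.9, 0.1],        # emission probabilities: P(observation | state)
                    [0.2, 0.8]])
      pi = np.array([0.5, 0.5])        # initial state distribution
      obs = [0, 0, 1, 1, 0]            # observed symbols

      n, S = len(obs), len(pi)
      fwd = np.zeros((n, S))
      bwd = np.zeros((n, S))
      fwd[0] = pi * E[:, obs[0]]
      for t in range(1, n):
          fwd[t] = (fwd[t - 1] @ A) * E[:, obs[t]]
      bwd[-1] = 1.0
      for t in range(n - 2, -1, -1):
          bwd[t] = A @ (E[:, obs[t + 1]] * bwd[t + 1])

      posterior = fwd * bwd
      posterior /= posterior.sum(axis=1, keepdims=True)
      print(posterior)  # per-position posterior probability of each hidden state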

  11. A Golay complementary TS-based symbol synchronization scheme in variable rate LDPC-coded MB-OFDM UWBoF system

    NASA Astrophysics Data System (ADS)

    He, Jing; Wen, Xuejie; Chen, Ming; Chen, Lin

    2015-09-01

    In this paper, a Golay complementary training sequence (TS)-based symbol synchronization scheme is proposed and experimentally demonstrated in a multiband orthogonal frequency division multiplexing (MB-OFDM) ultra-wideband over fiber (UWBoF) system with a variable-rate low-density parity-check (LDPC) code. Meanwhile, the coding gain and spectral efficiency in the variable-rate LDPC-coded MB-OFDM UWBoF system are investigated. By utilizing the non-periodic auto-correlation property of the Golay complementary pair, the start point of the LDPC-coded MB-OFDM UWB signal can be estimated accurately. After 100 km standard single-mode fiber (SSMF) transmission, at a bit error rate of 1×10^-3, the experimental results show that the short-block-length 64QAM-LDPC coding provides a coding gain of 4.5 dB, 3.8 dB and 2.9 dB for code rates of 62.5%, 75% and 87.5%, respectively.
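
    The non-periodic auto-correlation property exploited for synchronization is easy to verify numerically; the sketch below uses the standard recursive Golay construction with an illustrative length, not the training sequence of the experiment:

      # Minimal sketch: build a Golay complementary pair recursively and verify that
      # the sum of their aperiodic autocorrelations vanishes at all nonzero lags,
      # which is the property that yields a sharp synchronization peak.
      import numpy as np

      def golay_pair(m):
          """Return a +/-1 Golay complementary pair of length 2**m."""
          a, b = np.array([1]), np.array([1])
          for _ in range(m):
              a, b = np.concatenate([a, b]), np.concatenate([a, -b])
          return a, b

      def aperiodic_autocorr(x):
          return np.correlate(x, x, mode="full")

      a, b = golay_pair(6)                       # length-64 pair (illustrative)
      s = aperiodic_autocorr(a) + aperiodic_autocorr(b)
      # All off-peak lags sum to zero; the central lag equals 2 * len(a).
      assert np.all(s[np.arange(len(s)) != len(a) - 1] == 0)
      print(s[len(a) - 1])                       # 128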

  12. Nonadiabatic conditional geometric phase shift with NMR.

    PubMed

    Xiang-Bin, W; Keiji, M

    2001-08-27

    A conditional geometric phase shift gate, which is fault tolerant to certain types of errors due to its geometric nature, was realized recently via nuclear magnetic resonance (NMR) under adiabatic conditions. However, in quantum computation, everything must be completed within the decoherence time. The adiabatic condition makes any fast conditional Berry phase (cyclic adiabatic geometric phase) shift gate impossible. Here we show that by using a newly designed sequence of simple operations with an additional vertical magnetic field, the conditional geometric phase shift gate can be run nonadiabatically. Therefore geometric quantum computation can be done at the same rate as usual quantum computation.

  13. Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing.

    PubMed

    Misra, Sanchit; Agrawal, Ankit; Liao, Wei-keng; Choudhary, Alok

    2011-01-15

    Recently, a number of programs have been proposed for mapping short reads to a reference genome. Many of them are heavily optimized for short-read mapping and hence are very efficient for shorter queries, but that makes them inefficient or not applicable for reads longer than 200 bp. However, many sequencers are already generating longer reads and more are expected to follow. For long read sequence mapping, there are limited options; BLAT, SSAHA2, FANGS and BWA-SW are among the popular ones. However, resequencing and personalized medicine need much faster software to map these long sequencing reads to a reference genome to identify SNPs or rare transcripts. We present AGILE (AliGnIng Long rEads), a hash table based high-throughput sequence mapping algorithm for longer 454 reads that uses diagonal multiple seed-match criteria, customized q-gram filtering and a dynamic incremental search approach among other heuristics to optimize every step of the mapping process. In our experiments, we observe that AGILE is more accurate than BLAT, and comparable to BWA-SW and SSAHA2. For practical error rates (< 5%) and read lengths (200-1000 bp), AGILE is significantly faster than BLAT, SSAHA2 and BWA-SW. Even for the other cases, AGILE is comparable to BWA-SW and several times faster than BLAT and SSAHA2. http://www.ece.northwestern.edu/~smi539/agile.html.
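
    The hash-table seeding and diagonal-accumulation idea underlying such long-read mappers can be sketched briefly; the q-gram length, hit threshold, and sequences below are illustrative and do not reproduce AGILE's actual heuristics:

      # Minimal sketch of hash-table seeding with diagonal accumulation, the general
      # idea behind long-read seed matching (not AGILE's actual implementation).
      from collections import defaultdict

      def build_qgram_index(reference, q):
          index = defaultdict(list)
          for pos in range(len(reference) - q + 1):
              index[reference[pos:pos + q]].append(pos)
          return index

      def candidate_diagonals(read, index, q, min_hits=3):
          """Count seed hits per diagonal (ref_pos - read_pos) and keep dense ones."""
          diag_hits = defaultdict(int)
          for rpos in range(len(read) - q + 1):
              for gpos in index.get(read[rpos:rpos + q], ()):
                  diag_hits[gpos - rpos] += 1
          return {d: h for d, h in diag_hits.items() if h >= min_hits}

      ref = "ACGTACGTTTACGGACGTACGTAACGT" * 10     # toy reference
      idx = build_qgram_index(ref, q=8)
      print(candidate_diagonals("ACGTACGTTTACGGACGTAC", idx, q=8))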

  14. Methods of automatic nucleotide-sequence analysis. Multicomponent spectrophotometric analysis of mixtures of nucleic acid components by a least-squares procedure

    PubMed Central

    Lee, Sheila; McMullen, D.; Brown, G. L.; Stokes, A. R.

    1965-01-01

    1. A theoretical analysis of the errors in multicomponent spectrophotometric analysis of nucleoside mixtures, by a least-squares procedure, has been made to obtain an expression for the error coefficient, relating the error in calculated concentration to the error in extinction measurements. 2. The error coefficients, which depend only on the `library' of spectra used to fit the experimental curves, have been computed for a number of `libraries' containing the following nucleosides found in s-RNA: adenosine, guanosine, cytidine, uridine, 5-ribosyluracil, 7-methylguanosine, 6-dimethylaminopurine riboside, 6-methylaminopurine riboside and thymine riboside. 3. The error coefficients have been used to determine the best conditions for maximum accuracy in the determination of the compositions of nucleoside mixtures. 4. Experimental determinations of the compositions of nucleoside mixtures have been made and the errors found to be consistent with those predicted by the theoretical analysis. 5. It has been demonstrated that, with certain precautions, the multicomponent spectrophotometric method described is suitable as a basis for automatic nucleotide-composition analysis of oligonucleotides containing nine nucleotides. Used in conjunction with continuous chromatography and flow chemical techniques, this method can be applied to the study of the sequence of s-RNA. PMID:14346087
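
    The least-squares procedure described above amounts to solving an overdetermined linear system relating measured extinctions to component concentrations; a minimal numerical sketch follows (the 'library' spectra, wavelengths and concentrations are invented for illustration):

      # Minimal sketch: multicomponent analysis by linear least squares.
      # Columns of S are 'library' spectra (extinction coefficients vs wavelength);
      # E is the measured extinction curve of the mixture. All numbers are invented.
      import numpy as np

      wavelengths = np.linspace(230, 300, 50)
      S = np.column_stack([
          np.exp(-((wavelengths - 260) / 12.0) ** 2),   # component 1 spectrum
          np.exp(-((wavelengths - 275) / 10.0) ** 2),   # component 2 spectrum
      ])
      true_c = np.array([0.7, 0.3])
      E = S @ true_c + np.random.normal(0, 0.01, size=len(wavelengths))  # noisy mixture

      c_hat, *_ = np.linalg.lstsq(S, E, rcond=None)
      # Error coefficients depend only on the library: rows of the pseudo-inverse
      # relate concentration errors to extinction-measurement errors.
      error_coeff = np.linalg.pinv(S)
      print(c_hat, np.abs(error_coeff).max(axis=1))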

  15. Overcoming Species Boundaries in Peptide Identification with Bayesian Information Criterion-driven Error-tolerant Peptide Search (BICEPS)*

    PubMed Central

    Renard, Bernhard Y.; Xu, Buote; Kirchner, Marc; Zickmann, Franziska; Winter, Dominic; Korten, Simone; Brattig, Norbert W.; Tzur, Amit; Hamprecht, Fred A.; Steen, Hanno

    2012-01-01

    Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian information criterion-driven error-tolerant peptide search (BICEPS) and offer an open source implementation based on this statistical criterion to automatically balance the information of each single spectrum and the database, while limiting the run time. We show that BICEPS performs as well as current database search algorithms when such algorithms are applied to sequenced organisms, whereas BICEPS only uses a remotely related organism database. For instance, we use a chicken instead of a human database corresponding to an evolutionary distance of more than 300 million years (International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716). We demonstrate the successful application to cross-species proteomics with a 33% increase in the number of identified proteins for a filarial nematode sample of Litomosoides sigmodontis. PMID:22493179

  16. Single-virion sequencing of lamivudine-treated HBV populations reveals population evolution dynamics and demographic history.

    PubMed

    Zhu, Yuan O; Aw, Pauline P K; de Sessions, Paola Florez; Hong, Shuzhen; See, Lee Xian; Hong, Lewis Z; Wilm, Andreas; Li, Chen Hao; Hue, Stephane; Lim, Seng Gee; Nagarajan, Niranjan; Burkholder, William F; Hibberd, Martin

    2017-10-27

    Viral populations are complex, dynamic, and fast evolving. The evolution of groups of closely related viruses in a competitive environment is termed quasispecies. To fully understand the role that quasispecies play in viral evolution, characterizing the trajectories of viral genotypes in an evolving population is the key. In particular, long-range haplotype information for thousands of individual viruses is critical; yet generating this information is non-trivial. Popular deep sequencing methods generate relatively short reads that do not preserve linkage information, while third generation sequencing methods have higher error rates that make detection of low frequency mutations a bioinformatics challenge. Here we applied BAsE-Seq, an Illumina-based single-virion sequencing technology, to eight samples from four chronic hepatitis B (CHB) patients - once before antiviral treatment and once after viral rebound due to resistance. With single-virion sequencing, we obtained 248-8796 single-virion sequences per sample, which allowed us to find evidence for both hard and soft selective sweeps. We were able to reconstruct population demographic history that was independently verified by clinically collected data. We further verified four of the samples independently through PacBio SMRT and Illumina Pooled deep sequencing. Overall, we showed that single-virion sequencing yields insight into viral evolution and population dynamics in an efficient and high throughput manner. We believe that single-virion sequencing is widely applicable to the study of viral evolution in the context of drug resistance and host adaptation, allows differentiation between soft or hard selective sweeps, and may be useful in the reconstruction of intra-host viral population demographic history.

  17. The effect of sampling rate and lowpass filters on saccades - A modeling approach.

    PubMed

    Mack, David J; Belfanti, Sandro; Schwarz, Urs

    2017-12-01

    The study of eye movements has become popular in many fields of science. However, using the preprocessed output of an eye tracker without scrutiny can lead to low-quality or even erroneous data. For example, the sampling rate of the eye tracker influences saccadic peak velocity, while inadequate filters fail to suppress noise or introduce artifacts. Despite previously published guiding values, most filter choices still seem motivated by a trial-and-error approach, and a thorough analysis of filter effects is missing. Therefore, we developed a simple and easy-to-use saccade model that incorporates measured amplitude-velocity main sequences and produces saccades with a similar frequency content to real saccades. We also derived a velocity divergence measure to rate deviations between velocity profiles. In total, we simulated 155 saccades ranging from 0.5° to 60° and subjected them to different sampling rates, noise compositions, and various filter settings. The final goal was to compile a list with the best filter settings for each of these conditions. Replicating previous findings, we observed reduced peak velocities at lower sampling rates. However, this effect was highly non-linear over amplitudes and increasingly stronger for smaller saccades. Interpolating the data to a higher sampling rate significantly reduced this effect. We hope that our model and the velocity divergence measure will be used to provide a quickly accessible ground truth without the need for recording and manually labeling saccades. The comprehensive list of filters allows one to choose the correct filter for analyzing saccade data without resorting to trial-and-error methods.
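
    The amplitude-velocity main sequence and the sampling-rate effect described above can be illustrated with a small numerical sketch; the waveform shape, main-sequence constants and sampling rates below are illustrative assumptions, not the published model:

      # Illustrative sketch: a synthetic saccade velocity profile, its main-sequence
      # peak velocity, and the bias introduced by sampling at a lower rate.
      import numpy as np

      def peak_velocity(amplitude_deg, v_max=500.0, c=14.0):
          # A commonly used main-sequence form: peak velocity saturates with amplitude.
          return v_max * (1.0 - np.exp(-amplitude_deg / c))

      def velocity_profile(amplitude_deg, duration_s=0.05, fs=10000):
          t = np.arange(0, duration_s, 1.0 / fs)
          shape = np.sin(np.pi * t / duration_s) ** 2          # smooth bell-shaped pulse
          return t, shape * peak_velocity(amplitude_deg)

      t, v = velocity_profile(10.0)                            # 10-degree saccade at 10 kHz
      for fs in (1000, 250, 60):
          step = int(10000 / fs)
          # Sparse sampling tends to miss the true peak, lowering measured peak velocity.
          print(fs, "Hz -> measured peak velocity:", v[::step].max())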

  18. Effect of bar-code technology on the safety of medication administration.

    PubMed

    Poon, Eric G; Keohane, Carol A; Yoon, Catherine S; Ditmore, Matthew; Bane, Anne; Levtzion-Korach, Osnat; Moniz, Thomas; Rothschild, Jeffrey M; Kachalia, Allen B; Hayes, Judy; Churchill, William W; Lipsitz, Stuart; Whittemore, Anthony D; Bates, David W; Gandhi, Tejal K

    2010-05-06

    Serious medication errors are common in hospitals and often occur during order transcription or administration of medication. To help prevent such errors, technology has been developed to verify medications by incorporating bar-code verification technology within an electronic medication-administration system (bar-code eMAR). We conducted a before-and-after, quasi-experimental study in an academic medical center that was implementing the bar-code eMAR. We assessed rates of errors in order transcription and medication administration on units before and after implementation of the bar-code eMAR. Errors that involved early or late administration of medications were classified as timing errors and all others as nontiming errors. Two clinicians reviewed the errors to determine their potential to harm patients and classified those that could be harmful as potential adverse drug events. We observed 14,041 medication administrations and reviewed 3082 order transcriptions. Observers noted 776 nontiming errors in medication administration on units that did not use the bar-code eMAR (an 11.5% error rate) versus 495 such errors on units that did use it (a 6.8% error rate)--a 41.4% relative reduction in errors (P<0.001). The rate of potential adverse drug events (other than those associated with timing errors) fell from 3.1% without the use of the bar-code eMAR to 1.6% with its use, representing a 50.8% relative reduction (P<0.001). The rate of timing errors in medication administration fell by 27.3% (P<0.001), but the rate of potential adverse drug events associated with timing errors did not change significantly. Transcription errors occurred at a rate of 6.1% on units that did not use the bar-code eMAR but were completely eliminated on units that did use it. Use of the bar-code eMAR substantially reduced the rate of errors in order transcription and in medication administration as well as potential adverse drug events, although it did not eliminate such errors. Our data show that the bar-code eMAR is an important intervention to improve medication safety. (ClinicalTrials.gov number, NCT00243373.) 2010 Massachusetts Medical Society
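
    As a check on the relative-reduction arithmetic reported above: (11.5% - 6.8%) / 11.5% ≈ 41%, in line with the stated 41.4%; the small difference presumably reflects rounding of the underlying error counts.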

  19. Development and implementation of a human accuracy program in patient foodservice.

    PubMed

    Eden, S H; Wood, S M; Ptak, K M

    1987-04-01

    For many years, industry has utilized the concept of human error rates to monitor and minimize human errors in the production process. A consistent quality-controlled product increases consumer satisfaction and repeat purchase of product. Administrative dietitians have applied the concepts of using human error rates (the number of errors divided by the number of opportunities for error) at four hospitals, with a total bed capacity of 788, within a tertiary-care medical center. Human error rate was used to monitor and evaluate trayline employee performance and to evaluate layout and tasks of trayline stations, in addition to evaluating employees in patient service areas. Long-term employees initially opposed the error rate system with some hostility and resentment, while newer employees accepted the system. All employees now believe that the constant feedback given by supervisors enhances their self-esteem and productivity. Employee error rates are monitored daily and are used to counsel employees when necessary; they are also utilized during annual performance evaluation. Average daily error rates for a facility staffed by new employees decreased from 7% to an acceptable 3%. In a facility staffed by long-term employees, the error rate increased, reflecting improper error documentation. Patient satisfaction surveys reveal that satisfaction with tray accuracy increased from 88% to 92% in the facility staffed by long-term employees and has remained above the 90% standard in the facility staffed by new employees.
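
    As a worked example of the error-rate definition above (with hypothetical numbers): 21 tray errors out of 700 tray-assembly opportunities in a day gives 21/700 = 3%, the level described above as acceptable.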

  20. Gene calling and bacterial genome annotation with BG7.

    PubMed

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa, but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences, which are the elements directly related to biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. The tool is tolerant of sequencing errors, maintaining its capability for annotating highly fragmented genomes or mixed sequences coming from several genomes (such as those obtained from metagenomic samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  1. Death certificate completion skills of hospital physicians in a developing country.

    PubMed

    Haque, Ahmed Suleman; Shamim, Kanza; Siddiqui, Najm Hasan; Irfan, Muhammad; Khan, Javaid Ahmed

    2013-06-06

    Death certificates (DC) can provide valuable health status data regarding disease incidence, prevalence and mortality in a community. They can guide local health policy and help in setting priorities. Incomplete and inaccurate DC data, on the other hand, can significantly impair the precision of a national health information database. In this study we evaluated the accuracy of death certificates at a tertiary care teaching hospital in Karachi, Pakistan. A retrospective study was conducted at Aga Khan University Hospital, Karachi, Pakistan, over a period of six months. Medical records and death certificates of all patients who died under the adult medical service were studied. The demographic characteristics, administrative details, co-morbidities and cause of death from death certificates were collected using an approved standardized form. The accuracy of this information was validated against the medical records. Errors in the death certificates were classified into six categories, from 0 to 5 according to increasing severity; a grade 0 was assigned if no errors were identified, and 5 if an incorrect cause of death was attributed or placed in an improper sequence. 223 deaths occurred during the study period. 9 certificates were not accessible and 12 patients had incomplete medical records, so 202 certificates were finally analyzed. The most frequent errors pertained to patients' demographics (92%) and cause(s) of death (87%). 156 (77%) certificates had 3 or more errors and 124 (62%) certificates had a combination of errors that significantly changed the death certificate interpretation. Only 1% of certificates were error free. A very high rate of errors was identified in death certificates completed at our academic institution. There is a pressing need for appropriate interventions to resolve this important issue.

  2. Importance of DNA repair in tumor suppression

    NASA Astrophysics Data System (ADS)

    Brumer, Yisroel; Shakhnovich, Eugene I.

    2004-12-01

    The transition from a normal to cancerous cell requires a number of highly specific mutations that affect cell cycle regulation, apoptosis, differentiation, and many other cell functions. One hallmark of cancerous genomes is genomic instability, with mutation rates far greater than those of normal cells. In microsatellite instability (MIN) tumors, these are often caused by damage to mismatch repair genes, allowing further mutation of the genome and tumor progression. These mutation rates may lie near the error catastrophe found in the quasispecies model of adaptive RNA genomes, suggesting that further increasing mutation rates will destroy cancerous genomes. However, recent results have demonstrated that DNA genomes exhibit an error threshold at mutation rates far lower than their conservative counterparts. Furthermore, while the maximum viable mutation rate in conservative systems increases indefinitely with increasing master sequence fitness, the semiconservative threshold plateaus at a relatively low value. This implies a paradox, wherein inaccessible mutation rates are found in viable tumor cells. In this paper, we address this paradox, demonstrating an isomorphism between the conservatively replicating (RNA) quasispecies model and the semiconservative (DNA) model with post-methylation DNA repair mechanisms impaired. Thus, as DNA repair becomes inactivated, the maximum viable mutation rate increases smoothly to that of a conservatively replicating system on a transformed landscape, with an upper bound that is dependent on replication rates. On a specific single fitness peak landscape, the repair-free semiconservative system is shown to mimic a conservative system exactly. We postulate that inactivation of post-methylation repair mechanisms is fundamental to the progression of a tumor cell and hence these mechanisms act as a method for the prevention and destruction of cancerous genomes.

  3. Quality Control of High-Dose-Rate Brachytherapy: Treatment Delivery Analysis Using Statistical Process Control

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Able, Charles M., E-mail: cable@wfubmc.edu; Bright, Megan; Frizzell, Bart

    Purpose: Statistical process control (SPC) is a quality control method used to ensure that a process is well controlled and operates with little variation. This study determined whether SPC was a viable technique for evaluating the proper operation of a high-dose-rate (HDR) brachytherapy treatment delivery system. Methods and Materials: A surrogate prostate patient was developed using Vyse ordnance gelatin. A total of 10 metal oxide semiconductor field-effect transistors (MOSFETs) were placed from prostate base to apex. Computed tomography guidance was used to accurately position the first detector in each train at the base. The plan consisted of 12 needles with 129 dwell positions delivering a prescribed peripheral dose of 200 cGy. Sixteen accurate treatment trials were delivered as planned. Subsequently, a number of treatments were delivered with errors introduced, including wrong patient, wrong source calibration, wrong connection sequence, single needle displaced inferiorly 5 mm, and entire implant displaced 2 mm and 4 mm inferiorly. Two process behavior charts (PBC), an individual and a moving range chart, were developed for each dosimeter location. Results: There were 4 false positives resulting from 160 measurements from 16 accurately delivered treatments. For the inaccurately delivered treatments, the PBC indicated that measurements made at the periphery and apex (regions of high-dose gradient) were much more sensitive to treatment delivery errors. All errors introduced were correctly identified by either the individual or the moving range PBC in the apex region. Measurements at the urethra and base were less sensitive to errors. Conclusions: SPC is a viable method for assessing the quality of HDR treatment delivery. Further development is necessary to determine the most effective dose sampling, to ensure reproducible evaluation of treatment delivery accuracy.

  4. CBrowse: a SAM/BAM-based contig browser for transcriptome assembly visualization and analysis.

    PubMed

    Li, Pei; Ji, Guoli; Dong, Min; Schmidt, Emily; Lenox, Douglas; Chen, Liangliang; Liu, Qi; Liu, Lin; Zhang, Jie; Liang, Chun

    2012-09-15

    To address the impending need for exploring rapidly increased transcriptomics data generated for non-model organisms, we developed CBrowse, an AJAX-based web browser for visualizing and analyzing transcriptome assemblies and contigs. Designed in a standard three-tier architecture with a data pre-processing pipeline, CBrowse is essentially a Rich Internet Application that offers many seamlessly integrated web interfaces and allows users to navigate, sort, filter, search and visualize data smoothly. The pre-processing pipeline takes the contig sequence file in FASTA format and its relevant SAM/BAM file as the input; detects putative polymorphisms, simple sequence repeats and sequencing errors in contigs; and generates image, JSON and database-compatible CSV text files that are directly utilized by different web interfaces. CBrowse is a generic visualization and analysis tool that facilitates close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors in transcriptome sequencing projects. CBrowse is distributed under the GNU General Public License and available at http://bioinfolab.muohio.edu/CBrowse/. Contact: liangc@muohio.edu or liangc.mu@gmail.com; glji@xmu.edu.cn. Supplementary data are available at Bioinformatics online.

  5. Implicit transfer of reversed temporal structure in visuomotor sequence learning.

    PubMed

    Tanaka, Kanji; Watanabe, Katsumi

    2014-04-01

    Some spatio-temporal structures are easier to transfer implicitly in sequential learning. In this study, we investigated whether the consistent reversal of triads of learned components would support the implicit transfer of their temporal structure in visuomotor sequence learning. A triad comprised three sequential button presses ([1][2][3]), and seven consecutive triads comprised a sequence. Participants learned sequences by trial and error until they could complete each sequence 20 times without error. Then, they learned another sequence in which each triad was reversed ([3][2][1]), partially reversed ([2][1][3]), or switched so as not to overlap with the other conditions ([2][3][1] or [3][1][2]). Even when the participants did not notice the alternation rule, the consistent reversal of the temporal structure of each triad led to better implicit transfer; this was confirmed in a subsequent experiment. These results suggest that the implicit transfer of the temporal structure of a learned sequence can be influenced by both the structure and consistency of the change. Copyright © 2013 Cognitive Science Society, Inc.

  6. The influence of the structure and culture of medical group practices on prescription drug errors.

    PubMed

    Kralewski, John E; Dowd, Bryan E; Heaton, Alan; Kaissi, Amer

    2005-08-01

    This project was designed to identify the magnitude of prescription drug errors in medical group practices and to explore the influence of the practice structure and culture on those error rates. Seventy-eight practices serving an upper Midwest managed care (Care Plus) plan during 2001 were included in the study. Using Care Plus claims data, prescription drug error rates were calculated at the enrollee level and then were aggregated to the group practice that each enrollee selected to provide and manage their care. Practice structure and culture data were obtained from surveys of the practices. Data were analyzed using multivariate regression. Both the culture and the structure of these group practices appear to influence prescription drug error rates. Seeing more patients per clinic hour, more prescriptions per patient, and being cared for in a rural clinic were all strongly associated with more errors. Conversely, having a case manager program is strongly related to fewer errors in all of our analyses. The culture of the practices clearly influences error rates, but the findings are mixed. Practices with cohesive cultures have lower error rates but, contrary to our hypothesis, cultures that value physician autonomy and individuality also have lower error rates than those with a more organizational orientation. Our study supports the contention that there are a substantial number of prescription drug errors in the ambulatory care sector. Even by the strictest definition, there were about 13 errors per 100 prescriptions for Care Plus patients in these group practices during 2001. Our study demonstrates that the structure of medical group practices influences prescription drug error rates. In some cases, this appears to be a direct relationship, such as the effects of having a case manager program on fewer drug errors, but in other cases the effect appears to be indirect through the improvement of drug prescribing practices. An important aspect of this study is that it provides insights into the relationships of the structure and culture of medical group practices and prescription drug errors and provides direction for future research. Research focused on the factors influencing the high error rates in rural areas and how the interaction of practice structural and cultural attributes influence error rates would add important insights into our findings. For medical practice directors, our data show that they should focus on patient care coordination to reduce errors.

  7. Robust video super-resolution with registration efficiency adaptation

    NASA Astrophysics Data System (ADS)

    Zhang, Xinfeng; Xiong, Ruiqin; Ma, Siwei; Zhang, Li; Gao, Wen

    2010-07-01

    Super-Resolution (SR) is a technique to construct a high-resolution (HR) frame by fusing a group of low-resolution (LR) frames describing the same scene. The effectiveness of the conventional super-resolution techniques, when applied on video sequences, strongly relies on the efficiency of motion alignment achieved by image registration. Unfortunately, such efficiency is limited by the motion complexity in the video and the capability of adopted motion model. In image regions with severe registration errors, annoying artifacts usually appear in the produced super-resolution video. This paper proposes a robust video super-resolution technique that adapts itself to the spatially-varying registration efficiency. The reliability of each reference pixel is measured by the corresponding registration error and incorporated into the optimization objective function of SR reconstruction. This makes the SR reconstruction highly immune to the registration errors, as outliers with higher registration errors are assigned lower weights in the objective function. In particular, we carefully design a mechanism to assign weights according to registration errors. The proposed superresolution scheme has been tested with various video sequences and experimental results clearly demonstrate the effectiveness of the proposed method.

  8. Comparison of visual and emotional continuous performance test related to sequence of presentation, gender and age.

    PubMed

    Markovska-Simoska, S; Pop-Jordanova, N

    2009-07-01

    (Full text is available at http://www.manu.edu.mk/prilozi). Continuous Performance Tests (CPTs) form a group of paradigms for the evaluation of attention and, to a lesser degree, the response inhibition (or disinhibition) component of executive control. The object of this study was to compare performance on a CPT using both visual and emotional tasks in 46 normal adult subjects. In particular, it was to examine the effects of the type of task (VCPT or ECPT), sequence of presentation, and gender/age influence on performance as measured by errors of omission, errors of commission, reaction time and variation of reaction time. From the results we can assume that there are significantly worse performance parameters for ECPT than VCPT tasks, with a probable explanation being the influence of emotional stimuli on attention and information-processing, and no significant effect of order of presentation or gender on performance. Significant differences with more omission errors for older groups were obtained, showing better attention in younger subjects. Key words: VCPT, ECPT, omission errors, commission errors, reaction time, variation of reaction time, normal adults.

  9. The (not so) immortal strand hypothesis.

    PubMed

    Tomasetti, Cristian; Bozic, Ivana

    2015-03-01

    Non-random segregation of DNA strands during stem cell replication has been proposed as a mechanism to minimize accumulated genetic errors in stem cells of rapidly dividing tissues. According to this hypothesis, an "immortal" DNA strand is passed to the stem cell daughter and not the more differentiated cell, keeping the stem cell lineage replication error-free. After it was introduced, experimental evidence both in favor and against the hypothesis has been presented. Using a novel methodology that utilizes cancer sequencing data we are able to estimate the rate of accumulation of mutations in healthy stem cells of the colon, blood and head and neck tissues. We find that in these tissues mutations in stem cells accumulate at rates strikingly similar to those expected without the protection from the immortal strand mechanism. Utilizing an approach that is fundamentally different from previous efforts to confirm or refute the immortal strand hypothesis, we provide evidence against non-random segregation of DNA during stem cell replication. Our results strongly suggest that parental DNA is passed randomly to stem cell daughters and provides new insight into the mechanism of DNA replication in stem cells. Copyright © 2015. Published by Elsevier B.V.

  10. Cost-effective parallel optical interconnection module based on fully passive-alignment process

    NASA Astrophysics Data System (ADS)

    Son, Dong Hoon; Heo, Young Soon; Park, Hyoung-Jun; Kang, Hyun Seo; Kim, Sung Chang

    2017-11-01

    In optical interconnection technology, high-speed and large data transmission with low error rate and cost reduction are key issues for the upcoming 8K media era. The researchers present notable optical manufacturing structures for a four-channel parallel optical module assembled by fully passive alignment, which are able to reduce manufacturing time and cost. Each of the components, such as the vertical-cavity surface-emitting laser/positive-intrinsic negative-photodiode array, microlens array, fiber array, and receiver (RX)/transmitter (TX) integrated circuit, is integrated successfully using flip-chip bonding, die bonding, and passive alignment with a microscope. Clear eye diagrams are obtained by 25.78-Gb/s (for TX) and 25.7-Gb/s (for RX) nonreturn-to-zero signals of a pseudorandom binary sequence with a pattern length of 2^31 - 1. The measured responsivity and minimum sensitivity of the RX are about 0.5 A/W and ≤-6.5 dBm at a bit error rate (BER) of 10^-12, respectively. The optical power margin at a BER of 10^-12 is 7.5 dB, and the cross talk from the adjacent channel is ≤1 dB.

  11. Two-sided estimates of minimum-error distinguishability of mixed quantum states via generalized Holevo-Curlander bounds

    NASA Astrophysics Data System (ADS)

    Tyson, Jon

    2009-03-01

    We prove a concise factor-of-2 estimate for the failure rate of optimally distinguishing an arbitrary ensemble of mixed quantum states, generalizing work of Holevo [Theor. Probab. Appl. 23, 411 (1978)] and Curlander [Ph.D. Thesis, MIT, 1979]. A modification to the minimal principle of Cocha and Poor [Proceedings of the 6th International Conference on Quantum Communication, Measurement, and Computing (Rinton, Princeton, NJ, 2003)] is used to derive a suboptimal measurement which has an error rate within a factor of 2 of the optimal by construction. This measurement is quadratically weighted and has appeared as the first iterate of a sequence of measurements proposed by Ježek et al. [Phys. Rev. A 65, 060301 (2002)]. Unlike the so-called pretty good measurement, it coincides with Holevo's asymptotically optimal measurement in the case of nonequiprobable pure states. A quadratically weighted version of the measurement bound by Barnum and Knill [J. Math. Phys. 43, 2097 (2002)] is proven. Bounds on the distinguishability of syndromes in the sense of Schumacher and Westmoreland [Phys. Rev. A 56, 131 (1997)] appear as a corollary. An appendix relates our bounds to the trace-Jensen inequality.

  12. The (not so) Immortal Strand Hypothesis

    PubMed Central

    Tomasetti, Cristian; Bozic, Ivana

    2015-01-01

    Background Non-random segregation of DNA strands during stem cell replication has been proposed as a mechanism to minimize accumulated genetic errors in stem cells of rapidly dividing tissues. According to this hypothesis, an “immortal” DNA strand is passed to the stem cell daughter and not the more differentiated cell, keeping the stem cell lineage replication error-free. After it was introduced, experimental evidence both in favor and against the hypothesis has been presented. Principal Findings Using a novel methodology that utilizes cancer sequencing data we are able to estimate the rate of accumulation of mutations in healthy stem cells of the colon, blood and head and neck tissues. We find that in these tissues mutations in stem cells accumulate at rates strikingly similar to those expected without the protection from the immortal strand mechanism. Significance Utilizing an approach that is fundamentally different from previous efforts to confirm or refute the immortal strand hypothesis, we provide strong evidence against non-random segregation of DNA during stem cell replication. Our results strongly suggest that parental DNA is passed randomly to stem cell daughters and provides new insight into the mechanism of DNA replication in stem cells. PMID:25700960

  13. Improved classification accuracy by feature extraction using genetic algorithms

    NASA Astrophysics Data System (ADS)

    Patriarche, Julia; Manduca, Armando; Erickson, Bradley J.

    2003-05-01

    A feature extraction algorithm has been developed for the purposes of improving classification accuracy. The algorithm uses a genetic algorithm / hill-climber hybrid to generate a set of linearly recombined features, which may be of reduced dimensionality compared with the original set. The genetic algorithm performs the global exploration, and a hill climber explores local neighborhoods. Hybridizing the genetic algorithm with a hill climber improves both the rate of convergence and the final overall cost function value; it also reduces the sensitivity of the genetic algorithm to parameter selection. The genetic algorithm includes the operators crossover, mutation, and deletion / reactivation - the last of which effects dimensionality reduction. The feature extractor is supervised, and is capable of deriving a separate feature space for each tissue (which are reintegrated during classification). A non-anatomical digital phantom was developed as a gold standard for testing purposes. In tests with the phantom, and with images of multiple sclerosis patients, classification with features derived by the feature extractor yielded lower error rates than classification using standard pulse sequences or features derived using principal components analysis. Using the multiple sclerosis patient data, the algorithm resulted in a mean 31% reduction in classification error for pure tissues.

  14. Performance of the unique-word-reverse-modulation type demodulator for mobile satellite communications

    NASA Technical Reports Server (NTRS)

    Dohi, Tomohiro; Nitta, Kazumasa; Ueda, Takashi

    1993-01-01

    This paper proposes a new type of coherent demodulator, the unique-word (UW)-reverse-modulation type demodulator, for burst signals controlled by a voice operated transmitter (VOX) in mobile satellite communication channels. The demodulator has three individual circuits: a pre-detection signal combiner, a pre-detection UW detector, and a UW-reverse-modulation type demodulator. The pre-detection signal combiner combines signal sequences received by two antennas and improves the bit energy-to-noise power density ratio (Eb/N0) by 2.5 dB to yield a 10^-3 average bit error rate (BER) when the carrier power-to-multipath power ratio (CMR) is 15 dB. The pre-detection UW detector improves the UW detection probability when the frequency offset is large. The UW-reverse-modulation type demodulator realizes a maximum pull-in frequency of 3.9 kHz, a pull-in time of 2.4 seconds, and a frequency error of less than 20 Hz. The performance of this demodulator is confirmed through computer simulations, and its effect is clarified in real-time experiments at a bit rate of 16.8 kbps using a digital signal processor (DSP).

  15. WE-G-BRD-02: Characterizing Information Loss in a Sparse-Sampling-Based Dynamic MRI Sequence (k-T BLAST) for Lung Motion Monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arai, T; Nofiele, J; Sawant, A

    2015-06-15

    Purpose: Rapid MRI is an attractive, non-ionizing tool for soft-tissue-based monitoring of respiratory motion in thoracic and abdominal radiotherapy. One big challenge is to achieve high temporal resolution while maintaining adequate spatial resolution. K-t BLAST, a sparse-sampling and reconstruction sequence based on a-priori information, represents a potential solution. In this work, we investigated how much “true” motion information is lost as a-priori information is progressively added for faster imaging. Methods: Lung tumor motions in the superior-inferior direction obtained from ten individuals were replayed on an in-house, MRI-compatible, programmable motion platform (50 Hz refresh and 100 micron precision). Six water-filled 1.5 ml tubes were placed on it as fiducial markers. Dynamic marker motion within a coronal slice (FOV: 32×32 cm^2, resolution: 0.67×0.67 mm^2, slice thickness: 5 mm) was collected on a 3.0 T body scanner (Ingenia, Philips). Balanced-FFE (TE/TR: 1.3 ms/2.5 ms, flip angle: 40 degrees) was used in conjunction with k-t BLAST. Each motion was repeated four times as four k-t acceleration factors (1, 2, 5, and 16; corresponding frame rates of 2.5, 4.7, 9.8, and 19.1 Hz, respectively) were compared. For each image set, one average motion trajectory was computed from the six marker displacements. Root mean square (RMS) error was used as a metric of spatial accuracy, with measured trajectories compared to the original data. Results: Tumor motion was approximately 10 mm. The mean (standard deviation) respiratory rate over the ten patients was 0.28 (0.06) Hz. Cumulative distributions of tumor motion frequency spectra (0–25 Hz) obtained from the patients showed that 90% of motion fell at 3.88 Hz or less; therefore, the frame rate must be at least double that for accurate monitoring. The RMS errors over patients for k-t factors of 1, 2, 5, and 16 were 0.10 (0.04), 0.17 (0.04), 0.21 (0.06), and 0.26 (0.06) mm, respectively. Conclusions: A k-t factor of 5 or higher can cover the high-frequency component of tumor respiratory motion, while the estimated error of spatial accuracy was approximately 0.2 mm.

  16. Bidirectional Retroviral Integration Site PCR Methodology and Quantitative Data Analysis Workflow.

    PubMed

    Suryawanshi, Gajendra W; Xu, Song; Xie, Yiming; Chou, Tom; Kim, Namshin; Chen, Irvin S Y; Kim, Sanggu

    2017-06-14

    Integration Site (IS) assays are a critical component of the study of retroviral integration sites and their biological significance. In recent retroviral gene therapy studies, IS assays, in combination with next-generation sequencing, have been used as a cell-tracking tool to characterize clonal stem cell populations sharing the same IS. For the accurate comparison of repopulating stem cell clones within and across different samples, the detection sensitivity, data reproducibility, and high-throughput capacity of the assay are among the most important assay qualities. This work provides a detailed protocol and data analysis workflow for bidirectional IS analysis. The bidirectional assay can simultaneously sequence both upstream and downstream vector-host junctions. Compared to conventional unidirectional IS sequencing approaches, the bidirectional approach significantly improves IS detection rates and the characterization of integration events at both ends of the target DNA. The data analysis pipeline described here accurately identifies and enumerates identical IS sequences through multiple steps of comparison that map IS sequences onto the reference genome and determine sequencing errors. Using an optimized assay procedure, we have recently published the detailed repopulation patterns of thousands of Hematopoietic Stem Cell (HSC) clones following transplant in rhesus macaques, demonstrating for the first time the precise time point of HSC repopulation and the functional heterogeneity of HSCs in the primate system. The following protocol describes the step-by-step experimental procedure and data analysis workflow that accurately identifies and quantifies identical IS sequences.

  17. Image secure transmission for optical orthogonal frequency-division multiplexing visible light communication systems using chaotic discrete cosine transform

    NASA Astrophysics Data System (ADS)

    Wang, Zhongpeng; Zhang, Shaozhong; Chen, Fangni; Wu, Ming-Wei; Qiu, Weiwei

    2017-11-01

    A physical encryption scheme for orthogonal frequency-division multiplexing (OFDM) visible light communication (VLC) systems using chaotic discrete cosine transform (DCT) is proposed. In the scheme, the row of the DCT matrix is permutated by a scrambling sequence generated by a three-dimensional (3-D) Arnold chaos map. Furthermore, two scrambling sequences, which are also generated from a 3-D Arnold map, are employed to encrypt the real and imaginary parts of the transmitted OFDM signal before the chaotic DCT operation. The proposed scheme enhances the physical layer security and improves the bit error rate (BER) performance for OFDM-based VLC. The simulation results prove the efficiency of the proposed encryption method. The experimental results show that the proposed security scheme not only protects image data from eavesdroppers but also keeps the good BER and peak-to-average power ratio performances for image-based OFDM-VLC systems.

  18. A VLSI chip set for real time vector quantization of image sequences

    NASA Technical Reports Server (NTRS)

    Baker, Richard L.

    1989-01-01

    The architecture and implementation of a VLSI chip set that vector quantizes (VQ) image sequences in real time are described. The chip set forms a programmable Single-Instruction, Multiple-Data (SIMD) machine which can implement various vector quantization encoding structures. Its VQ codebook may contain an unlimited number of codevectors, N, having dimension up to K = 64. Under a weighted least-squared-error criterion, the engine locates at video rates the best codevector in full-searched or large tree-searched VQ codebooks. The ability to manipulate tree-structured codebooks, coupled with parallelism and pipelining, permits searches in as few as O(log N) cycles. A full codebook search results in O(N) performance, compared to O(KN) for a Single-Instruction, Single-Data (SISD) machine. With this VLSI chip set, an entire video coder can be built on a single board that permits real-time experimentation with very large codebooks.
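
    A tree-structured codebook search of the kind that yields the O(log N) cycle count mentioned above can be sketched as follows; the binary tree, the unweighted squared-error distance and the toy codevectors are illustrative assumptions:

      # Minimal sketch: binary tree-structured vector quantization search, which
      # visits O(log N) nodes instead of comparing against all N codevectors.
      import numpy as np

      class TSVQNode:
          def __init__(self, codevector, left=None, right=None):
              self.codevector = codevector
              self.left, self.right = left, right

      def tsvq_search(node, x):
          """Descend the tree, at each level picking the closer child centroid."""
          while node.left is not None and node.right is not None:
              d_left = np.sum((x - node.left.codevector) ** 2)
              d_right = np.sum((x - node.right.codevector) ** 2)
              node = node.left if d_left <= d_right else node.right
          return node.codevector

      # Tiny illustrative tree over 2-D vectors.
      leaf = lambda v: TSVQNode(np.array(v))
      root = TSVQNode(np.array([0.5, 0.5]),
                      left=TSVQNode(np.array([0.2, 0.2]), leaf([0.1, 0.1]), leaf([0.3, 0.3])),
                      right=TSVQNode(np.array([0.8, 0.8]), leaf([0.7, 0.7]), leaf([0.9, 0.9])))
      print(tsvq_search(root, np.array([0.65, 0.7])))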

  19. Emergency department discharge prescription errors in an academic medical center

    PubMed Central

    Belanger, April; Devine, Lauren T.; Lane, Aaron; Condren, Michelle E.

    2017-01-01

    This study described discharge prescription medication errors written for emergency department patients. This study used content analysis in a cross-sectional design to systematically categorize prescription errors found in a report of 1000 discharge prescriptions submitted in the electronic medical record in February 2015. Two pharmacy team members reviewed the discharge prescription list for errors. Open-ended data were coded by an additional rater for agreement on coding categories. Coding was based upon majority rule. Descriptive statistics were used to address the study objective. Categories evaluated were patient age, provider type, drug class, and type and time of error. The discharge prescription error rate out of 1000 prescriptions was 13.4%, with “incomplete or inadequate prescription” being the most commonly detected error (58.2%). The adult and pediatric error rates were 11.7% and 22.7%, respectively. The antibiotics reviewed had the highest number of errors. The highest within-class error rates were with antianginal medications, antiparasitic medications, antacids, appetite stimulants, and probiotics. Emergency medicine residents wrote the highest percentage of prescriptions (46.7%) and had an error rate of 9.2%. Residents of other specialties wrote 340 prescriptions and had an error rate of 20.9%. Errors occurred most often between 10:00 am and 6:00 pm. PMID:28405061

  20. SU-F-E-02: A Feasibility Study for Application of Metal Artifact Reduction Techniques in MR-Guided Brachytherapy Gynecological Cancer with Titanium Applicators

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kadbi, M

    Purpose: Utilization of Titanium Tandem and Ring (T&R) applicators in MR-guided brachytherapy has become widespread for gynecological cancer treatment. However, Titanium causes magnetic field disturbance and susceptibility artifact, which complicate image interpretation. In this study, metal artifact reduction techniques were employed to improve the image quality and reduce the metal-related artifacts. Methods: Several techniques were employed to reduce the metal artifact caused by the titanium T&R applicator. These techniques include the Metal Artifact Reduction Sequence (MARS), View Angle Tilting (VAT) to correct in-plane distortion, and Slice Encoding for Metal Artifact Correction (SEMAC) for through-plane artifact correction. Moreover, MARS can be combined with VAT to further reduce the in-plane artifact by reapplying the selection gradients during the readout (MARS+VAT). SEMAC uses a slice-selective excitation but acquires additional z-encodings in order to resolve off-resonant signal and to reduce through-plane distortions. Results: Comparison between the clinical sequences revealed that increasing the bandwidth reduces the error in the measured diameter of the T&R. However, the error is larger than 4 mm for the best case with the highest bandwidth and spatial resolution. MARS+VAT with an isotropic resolution of 1 mm reduced the error to 1.9 mm, which is the least among the examined 2D sequences. The measured diameter of the tandem from SEMAC+VAT has the closest value to the actual diameter of the tandem (3.2 mm), and the error was reduced to less than 1 mm. In addition, SEMAC+VAT significantly reduces the blooming artifact in the ring compared to clinical sequences. Conclusion: A higher-bandwidth, higher-spatial-resolution sequence reduces the artifact and the measured diameter of the applicator with a slight compromise in SNR. Metal artifact reduction sequences decrease the distortion associated with the titanium applicator. The SEMAC sequence in combination with VAT revealed promising results for titanium imaging and can be utilized for MR-guided brachytherapy in gynecological cancer. The author is an employee of Philips Healthcare.

  1. Robust dynamical decoupling for quantum computing and quantum memory.

    PubMed

    Souza, Alexandre M; Alvarez, Gonzalo A; Suter, Dieter

    2011-06-17

    Dynamical decoupling (DD) is a popular technique for protecting qubits from the environment. However, unless special care is taken, experimental errors in the control pulses used in this technique can destroy the quantum information instead of preserving it. Here, we investigate techniques for making DD sequences robust against different types of experimental errors while retaining good decoupling efficiency in a fluctuating environment. We present experimental data from solid-state nuclear spin qubits and introduce a new DD sequence that is suitable for quantum computing and quantum memory.

  2. Leveraging Distant Relatedness to Quantify Human Mutation and Gene-Conversion Rates

    PubMed Central

    Palamara, Pier Francesco; Francioli, Laurent C.; Wilton, Peter R.; Genovese, Giulio; Gusev, Alexander; Finucane, Hilary K.; Sankararaman, Sriram; Sunyaev, Shamil R.; de Bakker, Paul I.W.; Wakeley, John; Pe’er, Itsik; Price, Alkes L.

    2015-01-01

    The rate at which human genomes mutate is a central biological parameter that has many implications for our ability to understand demographic and evolutionary phenomena. We present a method for inferring mutation and gene-conversion rates by using the number of sequence differences observed in identical-by-descent (IBD) segments together with a reconstructed model of recent population-size history. This approach is robust to, and can quantify, the presence of substantial genotyping error, as validated in coalescent simulations. We applied the method to 498 trio-phased sequenced Dutch individuals and inferred a point mutation rate of 1.66 × 10−8 per base per generation and a rate of 1.26 × 10−9 for <20 bp indels. By quantifying how estimates varied as a function of allele frequency, we inferred the probability that a site is involved in non-crossover gene conversion as 5.99 × 10−6. We found that recombination does not have observable mutagenic effects after gene conversion is accounted for and that local gene-conversion rates reflect recombination rates. We detected a strong enrichment of recent deleterious variation among mismatching variants found within IBD regions and observed summary statistics of local sharing of IBD segments to closely match previously proposed metrics of background selection; however, we found no significant effects of selection on our mutation-rate estimates. We detected no evidence of strong variation of mutation rates in a number of genomic annotations obtained from several recent studies. Our analysis suggests that a mutation-rate estimate higher than that reported by recent pedigree-based studies should be adopted in the context of DNA-based demographic reconstruction. PMID:26581902
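
    The core idea, counting sequence differences within IBD segments to estimate the per-generation mutation rate, can be illustrated with a deliberately simplified calculation; the actual method additionally models genotyping error and reconstructed population-size history, and the numbers below are hypothetical:

      # Simplified illustration: point estimate of the per-base, per-generation
      # mutation rate from mismatches observed in an IBD segment.
      def mutation_rate_estimate(mismatches, segment_length_bp, tmrca_generations):
          # Each IBD segment accumulates mutations along two lineages of length TMRCA.
          return mismatches / (2.0 * tmrca_generations * segment_length_bp)

      # Hypothetical numbers: 3 mismatches in a 5 Mb segment with a TMRCA of 18 generations.
      print(mutation_rate_estimate(3, 5_000_000, 18))   # ~1.7e-8 per base per generation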

  3. Muver, a computational framework for accurately calling accumulated mutations.

    PubMed

    Burkholder, Adam B; Lujan, Scott A; Lavender, Christopher A; Grimm, Sara A; Kunkel, Thomas A; Fargo, David C

    2018-05-09

    Identification of mutations from next-generation sequencing data typically requires a balance between sensitivity and accuracy. This is particularly true of DNA insertions and deletions (indels), which can impart significant phenotypic consequences on cells but are harder to call than substitution mutations from whole-genome mutation accumulation experiments. To overcome these difficulties, we present muver, a computational framework that integrates established bioinformatics tools with novel analytical methods to generate mutation calls with the extremely low false-positive rates and high sensitivity required for accurate mutation rate determination and comparison. Muver uses statistical comparison of ancestral and descendant allelic frequencies to identify variant loci and assigns genotypes with models that include per-sample assessments of sequencing errors by mutation type and repeat context. Muver identifies maximally parsimonious mutation pathways that connect these genotypes, differentiating potential allelic conversion events and delineating ambiguities in mutation location, type, and size. Benchmarking with a human gold-standard father-son pair demonstrates muver's sensitivity and low false-positive rates. In DNA mismatch repair (MMR)-deficient Saccharomyces cerevisiae, muver detects multi-base deletions in homopolymers longer than the replicative polymerase footprint at rates greater than predicted for sequential single-base deletions, implying a novel multi-repeat-unit slippage mechanism. Benchmarking results demonstrate the high accuracy and sensitivity achieved with muver, particularly for indels, relative to available tools. Applied to an MMR-deficient Saccharomyces cerevisiae system, muver mutation calls facilitate mechanistic insights into DNA replication fidelity.
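
    The core of such an ancestor-versus-descendant comparison can be sketched with a simple contingency-table test on read counts (an illustration of the general idea with a hypothetical function and threshold, not the muver code, which adds per-sample error models and repeat-context corrections):

        from scipy.stats import fisher_exact

        def candidate_variant(anc_ref, anc_alt, desc_ref, desc_alt, alpha=1e-6):
            """Flag a locus where descendant allele frequencies differ from the ancestor.

            anc_ref/anc_alt: reference and alternate read counts in the ancestral sample.
            desc_ref/desc_alt: the same counts in the descendant sample.
            Returns True when the difference is unlikely to be sequencing noise alone.
            """
            _, p = fisher_exact([[anc_ref, anc_alt], [desc_ref, desc_alt]])
            return p < alpha

        # Example: ancestor looks cleanly homozygous reference, descendant looks heterozygous.
        print(candidate_variant(anc_ref=60, anc_alt=1, desc_ref=32, desc_alt=28))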

  4. Overcoming bias and systematic errors in next generation sequencing data.

    PubMed

    Taub, Margaret A; Corrada Bravo, Hector; Irizarry, Rafael A

    2010-12-10

    Considerable time and effort has been spent in developing analysis and quality assessment methods to allow the use of microarrays in a clinical setting. As is the case for microarrays and other high-throughput technologies, data from new high-throughput sequencing technologies are subject to technological and biological biases and systematic errors that can impact downstream analyses. Only when these issues can be readily identified and reliably adjusted for will clinical applications of these new technologies be feasible. Although much work remains to be done in this area, we describe consistently observed biases that should be taken into account when analyzing high-throughput sequencing data. In this article, we review current knowledge about these biases, discuss their impact on analysis results, and propose solutions.

  5. Effects of Mitochondrial DNA Rate Variation on Reconstruction of Pleistocene Demographic History in a Social Avian Species, Pomatostomus superciliosus

    PubMed Central

    Norman, Janette A.; Blackmore, Caroline J.; Rourke, Meaghan; Christidis, Les

    2014-01-01

    Mitochondrial sequence data is often used to reconstruct the demographic history of Pleistocene populations in an effort to understand how species have responded to past climate change events. However, departures from neutral equilibrium conditions can confound evolutionary inference in species with structured populations or those that have experienced periods of population expansion or decline. Selection can affect patterns of mitochondrial DNA variation and variable mutation rates among mitochondrial genes can compromise inferences drawn from single markers. We investigated the contribution of these factors to patterns of mitochondrial variation and estimates of time to most recent common ancestor (TMRCA) for two clades in a co-operatively breeding avian species, the white-browed babbler Pomatostomus superciliosus. Both the protein-coding ND3 gene and hypervariable domain I control region sequences showed departures from neutral expectations within the superciliosus clade, and a two-fold difference in TMRCA estimates. Bayesian phylogenetic analysis provided evidence of departure from a strict clock model of molecular evolution in domain I, leading to an over-estimation of TMRCA for the superciliosus clade at this marker. Our results suggest mitochondrial studies that attempt to reconstruct Pleistocene demographic histories should rigorously evaluate data for departures from neutral equilibrium expectations, including variation in evolutionary rates across multiple markers. Failure to do so can lead to serious errors in the estimation of evolutionary parameters and subsequent demographic inferences concerning the role of climate as a driver of evolutionary change. These effects may be especially pronounced in species with complex social structures occupying heterogeneous environments. We propose that environmentally driven differences in social structure may explain observed differences in evolutionary rate of domain I sequences, resulting from longer than expected retention times for matriarchal lineages in the superciliosus clade. PMID:25181547

  6. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting.

    PubMed

    Khan, Tarik A; Friedensohn, Simon; Gorter de Vries, Arthur R; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T

    2016-03-01

    High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion-the intraclonal diversity index-which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology.
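
    The error-correction half of a UID/UMI-based pipeline can be illustrated with a minimal consensus-collapsing sketch (hypothetical input format and function name; the published MAF pipeline additionally models per-molecule amplification efficiency for bias correction):

        from collections import Counter, defaultdict

        def collapse_by_umi(reads):
            """Group reads by their UMI and take a per-position majority vote.

            reads: iterable of (umi, sequence) pairs, all sequences the same length.
            Returns a dict mapping each UMI to its consensus sequence, so PCR and
            sequencing errors present in only a minority of copies are removed.
            """
            groups = defaultdict(list)
            for umi, seq in reads:
                groups[umi].append(seq)
            consensus = {}
            for umi, seqs in groups.items():
                consensus[umi] = "".join(
                    Counter(bases).most_common(1)[0][0] for bases in zip(*seqs)
                )
            return consensus

        reads = [("ACGT", "TTGCA"), ("ACGT", "TTGCA"), ("ACGT", "TTGGA"), ("GGAT", "TTACA")]
        print(collapse_by_umi(reads))  # {'ACGT': 'TTGCA', 'GGAT': 'TTACA'}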

  7. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting

    PubMed Central

    Khan, Tarik A.; Friedensohn, Simon; de Vries, Arthur R. Gorter; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T.

    2016-01-01

    High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion—the intraclonal diversity index—which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology. PMID:26998518

  8. Cognitive tests predict real-world errors: the relationship between drug name confusion rates in laboratory-based memory and perception tests and corresponding error rates in large pharmacy chains

    PubMed Central

    Schroeder, Scott R; Salomon, Meghan M; Galanter, William L; Schiff, Gordon D; Vaida, Allen J; Gaunt, Michael J; Bryson, Michelle L; Rash, Christine; Falck, Suzanne; Lambert, Bruce L

    2017-01-01

    Background Drug name confusion is a common type of medication error and a persistent threat to patient safety. In the USA, roughly one per thousand prescriptions results in the wrong drug being filled, and most of these errors involve drug names that look or sound alike. Prior to approval, drug names undergo a variety of tests to assess their potential for confusability, but none of these preapproval tests has been shown to predict real-world error rates. Objectives We conducted a study to assess the association between error rates in laboratory-based tests of drug name memory and perception and real-world drug name confusion error rates. Methods Eighty participants, comprising doctors, nurses, pharmacists, technicians and lay people, completed a battery of laboratory tests assessing visual perception, auditory perception and short-term memory of look-alike and sound-alike drug name pairs (eg, hydroxyzine/hydralazine). Results Laboratory test error rates (and other metrics) significantly predicted real-world error rates obtained from a large, outpatient pharmacy chain, with the best-fitting model accounting for 37% of the variance in real-world error rates. Cross-validation analyses confirmed these results, showing that the laboratory tests also predicted errors from a second pharmacy chain, with 45% of the variance being explained by the laboratory test data. Conclusions Across two distinct pharmacy chains, there is a strong and significant association between drug name confusion error rates observed in the real world and those observed in laboratory-based tests of memory and perception. Regulators and drug companies seeking a validated preapproval method for identifying confusing drug names ought to consider using these simple tests. By using a standard battery of memory and perception tests, it should be possible to reduce the number of confusing look-alike and sound-alike drug name pairs that reach the market, which will help protect patients from potentially harmful medication errors. PMID:27193033

  9. The Essential Component in DNA-Based Information Storage System: Robust Error-Tolerating Module

    PubMed Central

    Yim, Aldrin Kay-Yuen; Yu, Allen Chi-Shing; Li, Jing-Woei; Wong, Ada In-Chun; Loo, Jacky F. C.; Chan, King Ming; Kong, S. K.; Yip, Kevin Y.; Chan, Ting-Fung

    2014-01-01

    The size of digital data is ever increasing and is expected to grow to 40,000 EB by 2020, yet the estimated global information storage capacity in 2011 is <300 EB, indicating that most of the data are transient. DNA, as a very stable nano-molecule, is an ideal massive storage device for long-term data archive. The two most notable illustrations are from Church et al. and Goldman et al., whose approaches are well-optimized for most sequencing platforms – short synthesized DNA fragments without homopolymer. Here, we suggested improvements on error handling methodology that could enable the integration of DNA-based computational process, e.g., algorithms based on self-assembly of DNA. As a proof of concept, a picture of size 438 bytes was encoded to DNA with low-density parity-check error-correction code. We salvaged a significant portion of sequencing reads with mutations generated during DNA synthesis and sequencing and successfully reconstructed the entire picture. A modular-based programing framework – DNAcodec with an eXtensible Markup Language-based data format was also introduced. Our experiments demonstrated the practicability of long DNA message recovery with high error tolerance, which opens the field to biocomputing and synthetic biology. PMID:25414846
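
    The elementary bit-to-base mapping that such schemes build on can be sketched in a few lines (a plain 2-bits-per-nucleotide toy; the actual DNAcodec format adds LDPC error correction and avoids homopolymer runs, as described above):

        BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
        BASE_TO_BITS = {bits: base for base, bits in ((v, k) for k, v in BITS_TO_BASE.items())}

        def encode(data: bytes) -> str:
            """Map every two bits of the payload to one nucleotide."""
            bits = "".join(f"{byte:08b}" for byte in data)
            return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

        def decode(dna: str) -> bytes:
            """Invert the mapping back to bytes."""
            bits = "".join(BASE_TO_BITS[b] for b in dna)
            return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

        message = b"Hi"
        strand = encode(message)          # 'CAGACGGC' for the two bytes of "Hi"
        assert decode(strand) == message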

  10. The microprocessor-based synthesizer controller

    NASA Technical Reports Server (NTRS)

    Wick, M. R.

    1980-01-01

    Implementation and performance of the microprocessor-based controllers and Dana Digiphase Synthesizer (DCO) installed in the Deep Space Network exciter in the 64-meter and 34-meter subnets to support uplink tuning required for the Voyager-Saturn Encounter are discussed. Tests conducted during production of the controllers verified the design objective for phase control accuracy of 10⁻¹² cycles in eight hours during ramping. The tests required a phase error between the theoretically calculated value and the actual phase of no greater than ±1 cycle. Tests included (1) a ramp over a period of eight hours using a ramp rate which covers the synthesizer tuning range (40-51 MHz) and (2) a ramp sequence using the maximum rate (± kHz/s) over the tuning range.

  11. Peak-to-average power ratio reduction in orthogonal frequency division multiplexing-based visible light communication systems using a modified partial transmit sequence technique

    NASA Astrophysics Data System (ADS)

    Liu, Yan; Deng, Honggui; Ren, Shuang; Tang, Chengying; Qian, Xuewen

    2018-01-01

    We propose an efficient partial transmit sequence technique based on a genetic algorithm and a peak-value optimization algorithm (GAPOA) to reduce the high peak-to-average power ratio (PAPR) in visible light communication systems based on orthogonal frequency division multiplexing (VLC-OFDM). After analyzing the pros and cons of the hill-climbing algorithm, we propose the POA, with strong local search ability, to further process the signals whose PAPR is still over the threshold after being processed by the genetic algorithm (GA). To verify the effectiveness of the proposed technique and algorithm, we evaluate the PAPR and bit error rate (BER) performance and compare them with the partial transmit sequence (PTS) technique based on GA (GA-PTS), the PTS technique based on genetic and hill-climbing algorithms (GH-PTS), and PTS based on the shuffled frog leaping algorithm and hill-climbing algorithm (SFLAHC-PTS). The results show that our technique and algorithm have not only better PAPR performance but also lower computational complexity and BER than the GA-PTS, GH-PTS, and SFLAHC-PTS techniques.
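
    The quantity being minimized, and the basic partial transmit sequence idea of rotating sub-blocks of subcarriers before the IFFT, can be illustrated numerically as follows (a generic random-phase PTS search over QPSK subcarriers, not the GAPOA algorithm itself):

        import numpy as np

        rng = np.random.default_rng(0)
        N, V = 256, 4                      # subcarriers, number of adjacent PTS sub-blocks

        def papr_db(x):
            """Peak-to-average power ratio of a time-domain signal, in dB."""
            return 10 * np.log10(np.max(np.abs(x) ** 2) / np.mean(np.abs(x) ** 2))

        symbols = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=N)  # QPSK subcarriers
        blocks = symbols.reshape(V, N // V)

        best = None
        for _ in range(64):                # random search over per-block phase rotations
            phases = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(V, 1)))
            x = np.fft.ifft((blocks * phases).reshape(N))
            p = papr_db(x)
            best = p if best is None or p < best else best

        print(f"PAPR after PTS phase search: {best:.2f} dB")

    In a real transmitter the chosen phase vector is sent as side information so the receiver can undo the rotations; the GA and POA steps described in the abstract replace the random search used here.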

  12. Preliminary report for analysis of genome wide mutations from four ciprofloxacin resistant B. anthracis Sterne isolates generated by Illumina, 454 sequencing and microarrays for DHS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jaing, Crystal; Vergez, Lisa; Hinckley, Aubree

    2011-06-21

    The objective of this project is to provide DHS with a comprehensive evaluation of the current genomic technologies including genotyping, Taqman PCR, multiple locus variable tandem repeat analysis (MLVA), microarray and high-throughput DNA sequencing in the analysis of biothreat agents from complex environmental samples. As the result of a different DHS project, we have selected for and isolated a large number of ciprofloxacin-resistant B. anthracis Sterne isolates. These isolates vary in the concentrations of ciprofloxacin that they can tolerate, suggesting multiple mutations in the samples. In collaboration with the University of Houston, Eureka Genomics and Oak Ridge National Laboratory, we analyzed the ciprofloxacin-resistant B. anthracis Sterne isolates by microarray hybridization, Illumina and Roche 454 sequencing to understand the error rates and sensitivity of the different methods. The report provides an assessment of the results and a complete set of all protocols used and all data generated along with information to interpret the protocols and data sets.

  13. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis

    PubMed Central

    Bradley, Phelim; Gordon, N. Claire; Walker, Timothy M.; Dunn, Laura; Heys, Simon; Huang, Bill; Earle, Sarah; Pankhurst, Louise J.; Anson, Luke; de Cesare, Mariateresa; Piazza, Paolo; Votintseva, Antonina A.; Golubchik, Tanya; Wilson, Daniel J.; Wyllie, David H.; Diel, Roland; Niemann, Stefan; Feuerriegel, Silke; Kohl, Thomas A.; Ismail, Nazir; Omar, Shaheed V.; Smith, E. Grace; Buck, David; McVean, Gil; Walker, A. Sarah; Peto, Tim E. A.; Crook, Derrick W.; Iqbal, Zamin

    2015-01-01

    The rise of antibiotic-resistant bacteria has led to an urgent need for rapid detection of drug resistance in clinical samples, and improvements in global surveillance. Here we show how de Bruijn graph representation of bacterial diversity can be used to identify species and resistance profiles of clinical isolates. We implement this method for Staphylococcus aureus and Mycobacterium tuberculosis in a software package (‘Mykrobe predictor') that takes raw sequence data as input, and generates a clinician-friendly report within 3 minutes on a laptop. For S. aureus, the error rates of our method are comparable to gold-standard phenotypic methods, with sensitivity/specificity of 99.1%/99.6% across 12 antibiotics (using an independent validation set, n=470). For M. tuberculosis, our method predicts resistance with sensitivity/specificity of 82.6%/98.5% (independent validation set, n=1,609); sensitivity is lower here, probably because of limited understanding of the underlying genetic mechanisms. We give evidence that minor alleles improve detection of extremely drug-resistant strains, and demonstrate feasibility of the use of emerging single-molecule nanopore sequencing techniques for these purposes. PMID:26686880

  14. ViVaMBC: estimating viral sequence variation in complex populations from illumina deep-sequencing data using model-based clustering.

    PubMed

    Verbist, Bie; Clement, Lieven; Reumers, Joke; Thys, Kim; Vapirev, Alexander; Talloen, Willem; Wetzels, Yves; Meys, Joris; Aerssens, Jeroen; Bijnens, Luc; Thas, Olivier

    2015-02-22

    Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology-associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores, which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second-best base, i.e., the base with the second-highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second-best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets), which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has superb sensitivity and specificity for variants with frequencies above 0.4%. Compared with these tools, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4%, which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second-best base calls appeared very promising in our data exploration phase, their utility was limited. They provided a slight increase in sensitivity, which however does not warrant the additional computational cost of running the offline base caller. Apparently a lot of information is already contained in the quality scores, enabling the model-based clustering procedure to correct the majority of the sequencing errors.
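
    The error model underlying such quality-aware callers rests on the standard Phred relationship between a reported quality score and the probability that the base call is wrong; a minimal illustration (not the ViVaMBC code itself):

        def phred_to_error_prob(q: int) -> float:
            """Convert a Phred quality score Q to the probability that the base call is wrong."""
            return 10 ** (-q / 10)

        # A Q30 base is expected to be miscalled about once in a thousand reads,
        # so at the 25,000x coverage quoted above roughly 25 reads per position
        # carry a base-calling error before PCR artifacts are even considered.
        for q in (20, 30, 40):
            print(q, phred_to_error_prob(q))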

  15. Dispensing error rate after implementation of an automated pharmacy carousel system.

    PubMed

    Oswald, Scott; Caldwell, Richard

    2007-07-01

    A study was conducted to determine filling and dispensing error rates before and after the implementation of an automated pharmacy carousel system (APCS). The study was conducted in a 613-bed acute and tertiary care university hospital. Before the implementation of the APCS, filling and dispensing rates were recorded during October through November 2004 and January 2005. Postimplementation data were collected during May through June 2006. Errors were recorded in three areas of pharmacy operations: first-dose or missing medication fill, automated dispensing cabinet fill, and interdepartmental request fill. A filling error was defined as an error caught by a pharmacist during the verification step. A dispensing error was defined as an error caught by a pharmacist observer after verification by the pharmacist. Before implementation of the APCS, 422 first-dose or missing medication orders were observed between October 2004 and January 2005. Independent data collected in December 2005, approximately six weeks after the introduction of the APCS, found that filling and error rates had increased. The filling rate for automated dispensing cabinets was associated with the largest decrease in errors. Filling and dispensing error rates had decreased by December 2005. In terms of interdepartmental request fill, no dispensing errors were noted in 123 clinic orders dispensed before the implementation of the APCS. One dispensing error out of 85 clinic orders was identified after implementation of the APCS. The implementation of an APCS at a university hospital decreased medication filling errors related to automated cabinets only and did not affect other filling and dispensing errors.

  16. Effects of learning with explicit elaboration on implicit transfer of visuomotor sequence learning.

    PubMed

    Tanaka, Kanji; Watanabe, Katsumi

    2013-08-01

    Intervals between stimuli and/or responses have significant influences on sequential learning. In the present study, we investigated whether transfer would occur even when the intervals and the visual configurations in a sequence were drastically changed so that participants did not notice that the required sequences of responses were identical. In the experiment, two (or three) sequential button presses comprised a "set," and nine (or six) consecutive sets comprised a "hyperset." In the first session, participants learned either a 2 × 9 or 3 × 6 hyperset by trial and error until they completed it 20 times without error. In the second block, the 2 × 9 (3 × 6) hyperset was changed into the 3 × 6 (2 × 9) hyperset, resulting in different visual configurations and intervals between stimuli and responses. Participants were assigned into two groups: the Identical and Random groups. In the Identical group, the sequence (i.e., the buttons to be pressed) in the second block was identical to that in the first block. In the Random group, a new hyperset was learned. Even in the Identical group, no participants noticed that the sequences were identical. Nevertheless, a significant transfer of performance occurred. However, in the subsequent experiment that did not require explicit trial-and-error learning in the first session, implicit transfer in the second session did not occur. These results indicate that learning with explicit elaboration strengthens the implicit representation of the sequence order as a whole; this might occur independently of the intervals between elements and enable implicit transfer.

  17. Adaptive decoding of convolutional codes

    NASA Astrophysics Data System (ADS)

    Hueske, K.; Geldmacher, J.; Götze, J.

    2007-06-01

    Convolutional codes, which are frequently used as error correction codes in digital transmission systems, are generally decoded using the Viterbi Decoder. On the one hand the Viterbi Decoder is an optimum maximum likelihood decoder, i.e. the most probable transmitted code sequence is obtained. On the other hand the mathematical complexity of the algorithm only depends on the used code, not on the number of transmission errors. To reduce the complexity of the decoding process for good transmission conditions, an alternative syndrome based decoder is presented. The reduction of complexity is realized by two different approaches, the syndrome zero sequence deactivation and the path metric equalization. The two approaches enable an easy adaptation of the decoding complexity for different transmission conditions, which results in a trade-off between decoding complexity and error correction performance.

  18. Iterative updating of model error for Bayesian inversion

    NASA Astrophysics Data System (ADS)

    Calvetti, Daniela; Dunlop, Matthew; Somersalo, Erkki; Stuart, Andrew

    2018-02-01

    In computational inverse problems, it is common that a detailed and accurate forward model is approximated by a computationally less challenging substitute. The model reduction may be necessary to meet constraints in computing time when optimization algorithms are used to find a single estimate, or to speed up Markov chain Monte Carlo (MCMC) calculations in the Bayesian framework. The use of an approximate model introduces a discrepancy, or modeling error, that may have a detrimental effect on the solution of the ill-posed inverse problem, or it may severely distort the estimate of the posterior distribution. In the Bayesian paradigm, the modeling error can be considered as a random variable, and by using an estimate of the probability distribution of the unknown, one may estimate the probability distribution of the modeling error and incorporate it into the inversion. We introduce an algorithm which iterates this idea to update the distribution of the model error, leading to a sequence of posterior distributions that are demonstrated empirically to capture the underlying truth with increasing accuracy. Since the algorithm is not based on rejections, it requires only limited full model evaluations. We show analytically that, in the linear Gaussian case, the algorithm converges geometrically fast with respect to the number of iterations when the data is finite dimensional. For more general models, we introduce particle approximations of the iteratively generated sequence of distributions; we also prove that each element of the sequence converges in the large particle limit under a simplifying assumption. We show numerically that, as in the linear case, rapid convergence occurs with respect to the number of iterations. Additionally, we show through computed examples that point estimates obtained from this iterative algorithm are superior to those obtained by neglecting the model error.
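
    A toy version of this iteration for the linear Gaussian case can be sketched as follows (illustrative only, with made-up matrices and a plain Monte Carlo estimate of the discrepancy; the paper's particle approximations and convergence analysis are not reproduced here):

        import numpy as np

        rng = np.random.default_rng(1)
        n = 20
        F = rng.normal(size=(n, n))                # "accurate" forward model
        A = F + 0.05 * rng.normal(size=(n, n))     # cheap approximate model
        u_true = rng.normal(size=n)
        noise_cov = 0.01 * np.eye(n)
        y = F @ u_true + rng.multivariate_normal(np.zeros(n), noise_cov)

        prior_cov = np.eye(n)
        mean, cov = np.zeros(n), prior_cov         # current posterior for the unknown u

        for _ in range(5):
            # Estimate model-error statistics under the current posterior for u.
            samples = rng.multivariate_normal(mean, cov, size=200)
            err = samples @ (F - A).T
            err_mean, err_cov = err.mean(axis=0), np.cov(err, rowvar=False)
            # Gaussian update using the approximate model plus the estimated discrepancy.
            total_cov = noise_cov + err_cov
            K = prior_cov @ A.T @ np.linalg.inv(A @ prior_cov @ A.T + total_cov)
            mean = K @ (y - err_mean)
            cov = prior_cov - K @ A @ prior_cov

        print("error in posterior mean:", np.linalg.norm(mean - u_true))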

  19. Design and cloning strategies for constructing shRNA expression vectors

    PubMed Central

    McIntyre, Glen J; Fanning, Gregory C

    2006-01-01

    Background Short hairpin RNA (shRNA) encoded within an expression vector has proven an effective means of harnessing the RNA interference (RNAi) pathway in mammalian cells. A survey of the literature revealed that shRNA vector construction can be hindered by high mutation rates and the ensuing sequencing is often problematic. Current options for constructing shRNA vectors include the use of annealed complementary oligonucleotides (74% of surveyed studies), a PCR approach using hairpin-containing primers (22%) and primer extension of hairpin templates (4%). Results We considered primer extension the most attractive method in terms of cost. However, in initial experiments we encountered a mutation frequency of 50% compared to a reported 20–40% for other strategies. By modifying the technique to be an isothermal reaction using the DNA polymerase Phi29, we reduced the error rate to 10%, making primer extension the most efficient and cost-effective approach tested. We also found that inclusion of a restriction site in the loop could be exploited for confirming construct integrity by automated sequencing, while maintaining intended gene suppression. Conclusion In this study we detail simple improvements for constructing and sequencing shRNA that overcome current limitations. We also compare the advantages of our solutions against proposed alternatives. Our technical modifications will be of tangible benefit to researchers looking for a more efficient and reliable shRNA construction process. PMID:16396676

  20. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers.

    PubMed

    Quail, Michael A; Smith, Miriam; Coupland, Paul; Otto, Thomas D; Harris, Simon R; Connor, Thomas R; Bertoni, Anna; Swerdlow, Harold P; Gu, Yong

    2012-07-24

    Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent's PGM, Pacific Biosciences' RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. All three fast turnaround sequencers evaluated here were able to generate usable sequence. However there are key differences between the quality of that data and the applications it will support.

  1. Linking GPS and travel diary data using sequence alignment in a study of children's independent mobility

    PubMed Central

    2011-01-01

    Background Global positioning systems (GPS) are increasingly being used in health research to determine the location of study participants. Combining GPS data with data collected via travel/activity diaries allows researchers to assess where people travel in conjunction with data about trip purpose and accompaniment. However, linking GPS and diary data is problematic and to date the only method has been to match the two datasets manually, which is time consuming and unlikely to be practical for larger data sets. This paper assesses the feasibility of a new sequence alignment method of linking GPS and travel diary data in comparison with the manual matching method. Methods GPS and travel diary data obtained from a study of children's independent mobility were linked using sequence alignment algorithms to test the proof of concept. Travel diaries were assessed for quality by counting the number of errors and inconsistencies in each participant's set of diaries. The success of the sequence alignment method was compared for higher versus lower quality travel diaries, and for accompanied versus unaccompanied trips. Time taken and percentage of trips matched were compared for the sequence alignment method and the manual method. Results The sequence alignment method matched 61.9% of all trips. Higher quality travel diaries were associated with higher match rates in both the sequence alignment and manual matching methods. The sequence alignment method performed almost as well as the manual method and was an order of magnitude faster. However, the sequence alignment method was less successful at fully matching trips and at matching unaccompanied trips. Conclusions Sequence alignment is a promising method of linking GPS and travel diary data in large population datasets, especially if limitations in the trip detection algorithm are addressed. PMID:22142322
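
    The matching idea can be illustrated with a generic alignment of the two ordered trip lists; the sketch below uses Python's difflib on hypothetical trip categories and is not the study's own algorithm or scoring scheme:

        from difflib import SequenceMatcher

        # Hypothetical example: each symbol encodes one trip's mode/purpose category,
        # in the order the trips appear in the travel diary and in the GPS trace.
        diary_trips = ["walk-school", "walk-home", "car-shop", "walk-park"]
        gps_trips = ["walk-school", "car-shop", "walk-park", "walk-home"]

        matcher = SequenceMatcher(a=diary_trips, b=gps_trips, autojunk=False)
        for block in matcher.get_matching_blocks():
            if block.size:
                matched = diary_trips[block.a:block.a + block.size]
                print(f"diary trips {block.a}..{block.a + block.size - 1} align with "
                      f"GPS trips {block.b}..{block.b + block.size - 1}: {matched}")
        print(f"overall match ratio: {matcher.ratio():.0%}")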

  2. The physics of evolution

    NASA Astrophysics Data System (ADS)

    Eigen, Manfred

    1988-12-01

    The Darwinian concept of evolution through natural selection has been revised and put on a solid physical basis, in a form which applies to self-replicable macromolecules. Two new concepts are introduced: sequence space and quasi-species. Evolutionary change in the DNA- or RNA-sequence of a gene can be mapped as a trajectory in a sequence space of dimension ν, where ν corresponds to the number of changeable positions in the genomic sequence. Emphasis, however, is shifted from the single surviving wildtype, a single point in the sequence space, to the complex structure of the mutant distribution that constitutes the quasi-species. Selection is equivalent to an establishment of the quasi-species in a localized region of sequence space, subject to threshold conditions for the error rate and sequence length. Arrival of a new mutant may violate the local threshold condition and thereby lead to a displacement of the quasi-species into a different region of sequence space. This transformation is similar to a phase transition; the dynamical equations that describe the quasi-species have been shown to be analogous to those of the two-dimensional Ising model of ferromagnetism. The occurrence of a selectively advantageous mutant is biased by the particulars of the quasi-species distribution, whose mutants are populated according to their fitness relative to that of the wild-type. Inasmuch as fitness regions are connected (like mountain ridges), the evolutionary trajectory is guided to regions of optimal fitness. Evolution experiments in test tubes confirm this modification of the simple chance and law nature of the Darwinian concept. The results of the theory can also be applied to the construction of a machine that provides optimal conditions for a rapid evolution of functionally active macromolecules. An introduction to the physics of molecular evolution by the author has appeared recently [1]. Detailed studies of the kinetics and mechanisms of replication of RNA, the most likely candidate for early evolution [2,3], and of the implications on natural selection have been given in Refs. 4 and 5. The quasi-species model has been constructed in Refs. 6 and 7 using the concept of sequence space. Subsequently various methods have been invented to elucidate this concept and to relate it to the theory of critical phenomena [8-19]. The instability of the quasi-species at the error threshold is discussed in Ref. 10. Evolution experiments with RNA strands in test tubes are described in Refs. 21 and 22.
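
    The threshold condition mentioned above is, in its simplest form, the Eigen error threshold, which bounds the sequence length that can be maintained at a given per-site copying fidelity (stated here schematically; the cited papers give the full derivation):

        \nu \; < \; \nu_{\max} \;\approx\; \frac{\ln \sigma}{1 - q}

    where \nu is the sequence length, q is the per-site copying fidelity (so 1 - q is the per-site error rate), and \sigma is the selective superiority of the master sequence over the average of its mutant cloud.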

  3. Short RNA indicator sequences are not completely degraded by autoclaving

    PubMed Central

    Unnithan, Veena V.; Unc, Adrian; Joe, Valerisa; Smith, Geoffrey B.

    2014-01-01

    Short indicator RNA sequences (<100 bp) persist after autoclaving and are recovered intact by molecular amplification. Primers targeting longer sequences are most likely to produce false positives due to amplification errors easily verified by melting curve analyses. If short indicator RNA sequences are used for virus identification and quantification, then post-autoclave RNA degradation methodology should be employed, which may include further autoclaving. PMID:24518856

  4. Acquisition of initial /s/-stop and stop-/s/ sequences in Greek.

    PubMed

    Syrika, Asimina; Nicolaidis, Katerina; Edwards, Jan; Beckman, Mary E

    2011-09-01

    Previous work on children's acquisition of complex sequences points to a tendency for affricates to be acquired before clusters, but there is no clear evidence of a difference in order of acquisition between clusters with /s/ that violate the Sonority Sequencing Principle (SSP), such as /s/ followed by stop in onset position, and other clusters that obey the SSP. One problem with studies that have compared the acquisition of SSP-obeying and SSP-violating clusters is that the component sounds in the two types of sequences were different. This paper examines the acquisition of initial /s/-stop and stop-/s/ sequences by sixty Greek children aged 2 through 5 years. Results showed greater accuracy for the /s/-stop relative to the stop-/s/ sequences, but no difference in accuracy between /ts/, which is usually analyzed as an affricate in Greek, and the other stop-/s/ sequences. Moreover, errors for the /s/-stop sequences and /ts/ primarily involved stop substitutions, whereas errors for /ps/ and /ks/ were more variable and often involved fricative substitutions, a pattern which may have a perceptual explanation. Finally, /ts/ showed a distinct temporal pattern relative to the stop-/s/ clusters /ps/ and /ks/, similar to what has been reported for productions of Greek adults.

  5. Acquisition of initial /s/-stop and stop-/s/ sequences in Greek

    PubMed Central

    Syrika, Asimina; Nicolaidis, Katerina; Edwards, Jan; Beckman, Mary E.

    2010-01-01

    Previous work on children’s acquisition of complex sequences points to a tendency for affricates to be acquired before clusters, but there is no clear evidence of a difference in order of acquisition between clusters with /s/ that violate the Sonority Sequencing Principle (SSP), such as /s/ followed by stop in onset position, and other clusters that obey the SSP. One problem with studies that have compared the acquisition of SSP-obeying and SSP-violating clusters is that the component sounds in the two types of sequences were different. This paper examines the acquisition of initial /s/-stop and stop-/s/ sequences by sixty Greek children aged 2 through 5 years. Results showed greater accuracy for the /s/-stop relative to the stop-/s/ sequences, but no difference in accuracy between /ts/, which is usually analyzed as an affricate in Greek, and the other stop-/s/ sequences. Moreover, errors for the /s/-stop sequences and /ts/ primarily involved stop substitutions, whereas errors for /ps/ and /ks/ were more variable and often involved fricative substitutions, a pattern which may have a perceptual explanation. Finally, /ts/ showed a distinct temporal pattern relative to the stop-/s/ clusters /ps/ and /ks/, similarly to what has been reported for productions of Greek adults. PMID:22070044

  6. Ultrasound biofeedback treatment for persisting childhood apraxia of speech.

    PubMed

    Preston, Jonathan L; Brick, Nickole; Landi, Nicole

    2013-11-01

    The purpose of this study was to evaluate the efficacy of a treatment program that includes ultrasound biofeedback for children with persisting speech sound errors associated with childhood apraxia of speech (CAS). Six children ages 9-15 years participated in a multiple baseline experiment for 18 treatment sessions during which treatment focused on producing sequences involving lingual sounds. Children were cued to modify their tongue movements using visual feedback from real-time ultrasound images. Probe data were collected before, during, and after treatment to assess word-level accuracy for treated and untreated sound sequences. As participants reached preestablished performance criteria, new sequences were introduced into treatment. All participants met the performance criterion (80% accuracy for 2 consecutive sessions) on at least 2 treated sound sequences. Across the 6 participants, performance criterion was met for 23 of 31 treated sequences in an average of 5 sessions. Some participants showed no improvement in untreated sequences, whereas others showed generalization to untreated sequences that were phonetically similar to the treated sequences. Most gains were maintained 2 months after the end of treatment. The percentage of phonemes correct increased significantly from pretreatment to the 2-month follow-up. A treatment program including ultrasound biofeedback is a viable option for improving speech sound accuracy in children with persisting speech sound errors associated with CAS.

  7. Implicit and explicit motor sequence learning in children born very preterm.

    PubMed

    Jongbloed-Pereboom, Marjolein; Janssen, Anjo J W M; Steiner, K; Steenbergen, Bert; Nijhuis-van der Sanden, Maria W G

    2017-01-01

    Motor skills can be learned explicitly (dependent on working memory (WM)) or implicitly (relatively independent of WM). Children born very preterm (VPT) often have working memory deficits. Explicit learning may be compromised in these children. This study investigated implicit and explicit motor learning and the role of working memory in VPT children and controls. Three groups (6-9 years) participated: 20 VPT children with motor problems, 20 VPT children without motor problems, and 20 controls. A nine button sequence was learned implicitly (pressing the lighted button as quickly as possible) and explicitly (discovering the sequence via trial-and-error). Children learned implicitly and explicitly, evidenced by decreased movement duration of the sequence over time. In the explicit condition, children also reduced the number of errors over time. Controls made more errors than VPT children without motor problems. Visual WM had positive effects on both explicit and implicit performance. VPT birth and low motor proficiency did not negatively affect implicit or explicit learning. Visual WM was positively related to both implicit and explicit performance, but did not influence learning curves. These findings question the theoretical difference between implicit and explicit learning and the proposed role of visual WM therein. Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. JPL-ANTOPT antenna structure optimization program

    NASA Technical Reports Server (NTRS)

    Strain, D. M.

    1994-01-01

    New antenna path-length error and pointing-error structure optimization codes were recently added to the MSC/NASTRAN structural analysis computer program. Path-length and pointing errors are important measures of structure-related antenna performance. The path-length and pointing errors are treated as scalar displacements for static loading cases. These scalar displacements can be subject to constraint during the optimization process. Path-length and pointing-error calculations supplement the other optimization and sensitivity capabilities of NASTRAN. The analysis and design functions were implemented as 'DMAP ALTERs' to the Design Optimization (SOL 200) Solution Sequence of MSC-NASTRAN, Version 67.5.

  9. Round-off errors in cutting plane algorithms based on the revised simplex procedure

    NASA Technical Reports Server (NTRS)

    Moore, J. E.

    1973-01-01

    This report statistically analyzes computational round-off errors associated with the cutting plane approach to solving linear integer programming problems. Cutting plane methods require that the inverse of a sequence of matrices be computed. The problem basically reduces to one of minimizing round-off errors in the sequence of inverses. Two procedures for minimizing this problem are presented, and their influence on error accumulation is statistically analyzed. One procedure employs a very small tolerance factor to round computed values to zero. The other procedure is a numerical analysis technique for reinverting or improving the approximate inverse of a matrix. The results indicated that round-off accumulation can be effectively minimized by employing a tolerance factor which reflects the number of significant digits carried for each calculation and by applying the reinversion procedure once to each computed inverse. If 18 significant digits plus an exponent are carried for each variable during computations, then a tolerance value of 0.1 × 10⁻¹² is reasonable.
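
    The two procedures can be illustrated with a short numerical sketch: zeroing entries that fall below a tolerance tied to the working precision, and refining an approximate inverse. One standard choice for such a refinement step is the classical Newton-Schulz iteration; this is an illustration under that assumption, not the report's original code:

        import numpy as np

        TOL = 1e-13  # reflects the number of significant digits carried per calculation

        def round_to_zero(M, tol=TOL):
            """Zero out entries small enough to be pure round-off noise."""
            M = M.copy()
            M[np.abs(M) < tol] = 0.0
            return M

        def refine_inverse(A, X):
            """One Newton-Schulz step: improves an approximate inverse X of A."""
            return X @ (2 * np.eye(A.shape[0]) - A @ X)

        rng = np.random.default_rng(0)
        A = rng.normal(size=(5, 5))
        X = np.linalg.inv(A) + 1e-6 * rng.normal(size=(5, 5))   # deliberately perturbed inverse
        X_better = round_to_zero(refine_inverse(A, X))
        print(np.linalg.norm(A @ X - np.eye(5)), np.linalg.norm(A @ X_better - np.eye(5)))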

  10. QSRA: a quality-value guided de novo short read assembler.

    PubMed

    Bryant, Douglas W; Wong, Weng-Keen; Mockler, Todd C

    2009-02-24

    New rapid high-throughput sequencing technologies have sparked the creation of a new class of assembler. Since all high-throughput sequencing platforms incorporate errors in their output, short-read assemblers must be designed to account for this error while utilizing all available data. We have designed and implemented an assembler, Quality-value guided Short Read Assembler, created to take advantage of quality-value scores as a further method of dealing with error. Compared to previously published algorithms, our assembler shows significant improvements not only in speed but also in output quality. QSRA generally produced the highest genomic coverage, while being faster than VCAKE. QSRA is extremely competitive in its longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA provides a step closer to the goal of de novo assembly of complex genomes, improving upon the original VCAKE algorithm by not only drastically reducing runtimes but also increasing the viability of the assembly algorithm through further error-handling capabilities.

  11. Differential detection in quadrature-quadrature phase shift keying (Q2PSK) systems

    NASA Astrophysics Data System (ADS)

    El-Ghandour, Osama M.; Saha, Debabrata

    1991-05-01

    A generalized quadrature-quadrature phase shift keying (Q2PSK) signaling format is considered for differential encoding and differential detection. Performance in the presence of additive white Gaussian noise (AWGN) is analyzed. Symbol error rate is found to be approximately twice the symbol error rate in a quaternary DPSK system operating at the same Eb/N0. However, the bandwidth efficiency of differential Q2PSK is substantially higher than that of quaternary DPSK. When the error is due to AWGN, the ratio of double error rate to single error rate can be very high, and the ratio may approach zero at high SNR. To improve error rate, differential detection through maximum-likelihood decoding based on multiple or N symbol observations is considered. If N and SNR are large this decoding gives a 3-dB advantage in error rate over conventional N = 2 differential detection, fully recovering the energy loss (as compared to coherent detection) if the observation is extended to a large number of symbol durations.

  12. On Neglecting Chemical Exchange Effects When Correcting in Vivo 31P MRS Data for Partial Saturation

    NASA Astrophysics Data System (ADS)

    Ouwerkerk, Ronald; Bottomley, Paul A.

    2001-02-01

    Signal acquisition in most MRS experiments requires a correction for partial saturation that is commonly based on a single exponential model for T1 that ignores effects of chemical exchange. We evaluated the errors in 31P MRS measurements introduced by this approximation in two-, three-, and four-site chemical exchange models under a range of flip-angles and pulse sequence repetition times (TR) that provide near-optimum signal-to-noise ratio (SNR). In two-site exchange, such as the creatine-kinase reaction involving phosphocreatine (PCr) and γ-ATP in human skeletal and cardiac muscle, errors in saturation factors were determined for the progressive saturation method and the dual-angle method of measuring T1. The analysis shows that these errors are negligible for the progressive saturation method if the observed T1 is derived from a three-parameter fit of the data. When T1 is measured with the dual-angle method, errors in saturation factors are less than 5% for all conceivable values of the chemical exchange rate and flip-angles that deliver useful SNR per unit time over the range T1/5 ≤ TR ≤ 2T1. Errors are also less than 5% for three- and four-site exchange when TR ≥ T1*/2, the so-called "intrinsic" T1's of the metabolites. The effect of changing metabolite concentrations and chemical exchange rates on observed T1's and saturation corrections was also examined with a three-site chemical exchange model involving ATP, PCr, and inorganic phosphate in skeletal muscle undergoing up to 95% PCr depletion. Although the observed T1's were dependent on metabolite concentrations, errors in saturation corrections for TR = 2 s could be kept within 5% for all exchanging metabolites using a simple interpolation of two dual-angle T1 measurements performed at the start and end of the experiment. Thus, the single-exponential model appears to be reasonably accurate for correcting 31P MRS data for partial saturation in the presence of chemical exchange. Even in systems where metabolite concentrations change, accurate saturation corrections are possible without much loss in SNR.
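
    For reference, the single-exponential correction referred to here is the usual steady-state expression for a metabolite with longitudinal relaxation time T1 excited with flip angle \alpha every TR (written without exchange terms; adding chemical exchange couples the corresponding equations across metabolite pools):

        S \;=\; S_0 \, \frac{\sin\alpha \,\left(1 - e^{-TR/T_1}\right)}{1 - \cos\alpha \, e^{-TR/T_1}}

    so the saturation factor used to rescale a measured amplitude is the fraction above, evaluated with the estimated (observed) T1.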

  13. Error Correction using Quantum Quasi-Cyclic Low-Density Parity-Check (LDPC) Codes

    NASA Astrophysics Data System (ADS)

    Jing, Lin; Brun, Todd; Quantum Research Team

    Quasi-cyclic LDPC codes can approach the Shannon capacity and have efficient decoders. Hagiwara et al. (2007) presented a method to calculate parity-check matrices with high girth. Two distinct, orthogonal matrices Hc and Hd are used. Using submatrices obtained from Hc and Hd by deleting rows, we can alter the code rate. The submatrix of Hc is used to correct Pauli X errors, and the submatrix of Hd to correct Pauli Z errors. We simulated this system for depolarizing noise on USC's High Performance Computing Cluster, and obtained the block error rate (BER) as a function of the error weight and code rate. From the rates of uncorrectable errors under different error weights we can extrapolate the BER to any small error probability. Our results show that this code family can perform reasonably well even at high code rates, thus considerably reducing the overhead compared to concatenated and surface codes. This makes these codes promising as storage blocks in fault-tolerant quantum computation.
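
    The role of the two orthogonal parity-check matrices is that of a standard CSS-type construction; schematically, in the abstract's notation and with arithmetic over GF(2), the orthogonality requirement is

        H_c H_d^{\mathsf{T}} = 0

    which ensures that the X-type and Z-type stabilizers commute, so the submatrices obtained from H_c and H_d by deleting rows can be used independently against Pauli X and Pauli Z errors, with the number of deleted rows setting the code rate.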

  14. Unimodular sequence design under frequency hopping communication compatibility requirements

    NASA Astrophysics Data System (ADS)

    Ge, Peng; Cui, Guolong; Kong, Lingjiang; Yang, Jianyu

    2016-12-01

    The integrated design of radar and anonymous communication has drawn increased attention recently, since wireless communication systems seek enhanced security and reliability. Given a frequency hopping (FH) communication system, an effective way to realize the integrated design is to meet the spectrum compatibility requirement between the two systems. The paper deals with a unimodular sequence design technique that optimizes both the spectrum compatibility and the peak sidelobe level (PSL) of the auto-correlation function (ACF). The spectrum compatibility requirement realizes anonymous communication for the FH system and provides it with a low probability of intercept (LPI), since the spectrum of the FH system is hidden in that of the radar system. The proposed algorithm, named the generalized fitting template (GFT) technique, converts the sequence design problem into an iterative fitting process in which the power spectral density (PSD) and PSL behaviors of the generated sequences progressively fit both a PSD and a PSL template. The two templates are established from the spectrum compatibility requirement and the expected PSL. To ensure communication security and reliability, the spectrum compatibility requirement is given higher priority in the GFT algorithm, which adjusts the weight between the two terms adaptively during the iteration. The simulation results are analyzed in terms of bit error rate (BER), PSD, PSL, and signal-to-interference ratio (SIR) for both the radar and FH systems. The performance of GFT is compared with the SCAN, CAN, FRE, CYC, and MAT algorithms in these respects, demonstrating its effectiveness.
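
    For reference, the correlation quantity being traded off against spectrum compatibility follows the standard definitions rather than anything specific to the GFT paper: for a length-N unimodular sequence {s_n}, the aperiodic autocorrelation and its peak sidelobe level are

        r(k) = \sum_{n=0}^{N-1-k} s_n \, s_{n+k}^{*}, \qquad \mathrm{PSL} = \max_{k \neq 0} |r(k)|, \qquad |s_n| = 1,

    while the PSD template analogously prescribes the desired spectral shape of the sequence over frequency.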

  15. Explicit instruction of rules interferes with visuomotor skill transfer.

    PubMed

    Tanaka, Kanji; Watanabe, Katsumi

    2017-06-01

    In the present study, we examined the effects of explicit knowledge, obtained through instruction or spontaneous detection, on the transfer of visuomotor sequence learning. In the learning session, participants learned a visuomotor sequence via trial and error. In the transfer session, the order of the sequence was reversed from that of the learning session. Before the commencement of the transfer session, some participants received explicit instruction regarding the reversal rule (i.e., Instruction group), while the others did not receive any information and were sorted into either an Aware or Unaware group, as assessed by an interview conducted after the transfer session. Participants in the Instruction and Aware groups performed with fewer errors than the Unaware group in the transfer session. The participants in the Instruction group showed slower speed than the Aware and Unaware groups in the transfer session, and the sluggishness likely persisted even in late learning. These results suggest that explicit knowledge reduces errors in visuomotor skill transfer, but may interfere with performance speed, particularly when explicit knowledge is provided, as opposed to being spontaneously discovered.

  16. Prevalence of refractive errors in Möbius sequence.

    PubMed

    Cronemberger, Monica Fialho; Polati, Mariza; Debert, Iara; Mendonça, Tomás Scalamandré; Souza-Dias, Carlos; Miller, Marilyn; Ventura, Liana Oliveira; Nakanami, Célia Regina; Goldchmit, Mauro

    2013-01-01

    To assess the prevalence of refractive errors in Möbius sequence. This study was carried out during the Annual Meeting of the Brazilian Möbius Society in November 2008. Forty-four patients diagnosed with Möbius sequence underwent a comprehensive assessment in the following specialties: ophthalmology, neurology, genetics, psychiatry, psychology and dentistry. Forty-three patients were cooperative and able to undertake the ophthalmological examination. Twenty-two (51.2%) were male and 21 (48.8%) were female. The average age was 8.3 years (range 2 to 17 years). Visual acuity was evaluated using a retro-illuminated logMAR chart in cooperative patients. All children were examined for ocular motility, cycloplegic refraction, and fundus findings. Of the 85 eyes, judged by spherical equivalent, the majority (57.6%) were emmetropic (>-0.50 D and <+2.00 D). The prevalence of astigmatism greater than or equal to 0.75 D was 40%. The prevalence of refractive errors, by spherical equivalent, was 42.4% in the studied group.

  17. Executive Council lists and general practitioner files

    PubMed Central

    Farmer, R. D. T.; Knox, E. G.; Cross, K. W.; Crombie, D. L.

    1974-01-01

    An investigation of the accuracy of general practitioner and Executive Council files was approached by a comparison of the two. High error rates were found, including both file errors and record errors. On analysis it emerged that file error rates could not be satisfactorily expressed except in a time-dimensioned way, and we were unable to do this within the context of our study. Record error rates and field error rates were expressible as proportions of the number of records on both lists; 79.2% of all records exhibited non-congruencies, and particular information fields had error rates ranging from 0.8% (assignation of sex) to 68.6% (assignation of civil state). Many of the errors, both field errors and record errors, were attributable to delayed updating of mutable information. It is concluded that the simple transfer of Executive Council lists to a computer filing system would not solve all the inaccuracies and would not in itself permit Executive Council registers to be used for any health care applications requiring high accuracy. For this it would be necessary to design and implement a purpose-built health care record system which would include, rather than depend upon, the general practitioner remuneration system. PMID:4816588
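
    The field error rates quoted above are proportions of matched record pairs whose value for a given field disagrees between the two lists. A small sketch with a hypothetical record structure illustrates the computation; the field names and values are invented for illustration only.

      def field_error_rates(pairs, fields):
          """Field error rate: share of matched record pairs whose value for a
          given field disagrees between the two lists."""
          return {f: sum(1 for a, b in pairs if a.get(f) != b.get(f)) / len(pairs)
                  for f in fields}

      # Hypothetical matched GP / Executive Council records for illustration only
      pairs = [({"sex": "F", "civil_state": "married"},
                {"sex": "F", "civil_state": "single"})]
      print(field_error_rates(pairs, ["sex", "civil_state"]))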

  18. Signaling on the continuous spectrum of nonlinear optical fiber.

    PubMed

    Tavakkolnia, Iman; Safari, Majid

    2017-08-07

    This paper studies different signaling techniques on the continuous spectrum (CS) of the nonlinear optical fiber channel defined by the nonlinear Fourier transform. Three signaling techniques are proposed and analyzed based on the statistics of the noise added to the CS after propagation along the nonlinear optical fiber. The proposed methods are compared in terms of error performance, distance reach, and complexity. Furthermore, the effect of chromatic dispersion on the data rate and on the noise in the nonlinear spectral domain is investigated. It is demonstrated that, for a given sequence of CS symbols, an optimal bandwidth (or symbol rate) can be determined so that the temporal duration of the propagated signal at the end of the fiber is minimized. In effect, the required guard interval between subsequently transmitted data packets is minimized and the effective data rate is significantly enhanced. Moreover, by selecting the proper signaling method and design criteria, a distance reach of 7100 km is reported by signaling only on the CS at a rate of 9.6 Gbps.
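
    The abstract does not reproduce the paper's dispersion model, but the bandwidth trade-off it describes can be illustrated with a toy calculation: the time occupied by the symbol block shrinks as bandwidth grows, while dispersion-induced broadening is assumed, for this sketch only, to grow linearly with bandwidth, so the total packet duration has a minimum at some intermediate bandwidth. All parameter values below are hypothetical and the model is not the authors'.

      import numpy as np

      n_sym  = 64         # number of CS symbols per packet (hypothetical)
      beta2  = 21.7e-27   # s^2/m, typical SMF dispersion magnitude (hypothetical choice)
      length = 2000e3     # m, fiber length (hypothetical)

      def packet_duration(bw_hz):
          block = n_sym / bw_hz                            # duration of the symbol block itself
          broadening = 2 * np.pi * beta2 * length * bw_hz  # toy linear-in-bandwidth dispersive spreading
          return block + broadening

      bw_grid = np.logspace(9, 11, 400)                    # 1 GHz .. 100 GHz
      durations = packet_duration(bw_grid)
      b_opt = bw_grid[np.argmin(durations)]
      print(f"optimal bandwidth ~ {b_opt / 1e9:.1f} GHz, "
            f"minimum packet duration ~ {durations.min() * 1e9:.1f} ns")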

  19. What are incident reports telling us? A comparative study at two Australian hospitals of medication errors identified at audit, detected by staff and reported to an incident system

    PubMed Central

    Westbrook, Johanna I.; Li, Ling; Lehnbom, Elin C.; Baysari, Melissa T.; Braithwaite, Jeffrey; Burke, Rosemary; Conn, Chris; Day, Richard O.

    2015-01-01

    Objectives To (i) compare medication errors identified at audit and observation with medication incident reports; (ii) identify differences between two hospitals in incident report frequency and medication error rates; (iii) identify prescribing error detection rates by staff. Design Audit of 3291 patient records at two hospitals to identify prescribing errors and evidence of their detection by staff. Medication administration errors were identified from a direct observational study of 180 nurses administering 7451 medications. Severity of errors was classified. Those likely to lead to patient harm were categorized as ‘clinically important’. Setting Two major academic teaching hospitals in Sydney, Australia. Main Outcome Measures Rates of medication errors identified from audit and from direct observation were compared with reported medication incident reports. Results A total of 12 567 prescribing errors were identified at audit. Of these, 1.2/1000 errors (95% CI: 0.6–1.8) had incident reports. Clinically important prescribing errors (n = 539) were detected by staff at a rate of 218.9/1000 (95% CI: 184.0–253.8), but only 13.0/1000 (95% CI: 3.4–22.5) were reported. 78.1% (n = 421) of clinically important prescribing errors were not detected. A total of 2043 drug administrations (27.4%; 95% CI: 26.4–28.4%) contained ≥1 error; none had an incident report. Hospital A had a higher frequency of incident reports than Hospital B, but a lower rate of errors at audit. Conclusions Prescribing errors with the potential to cause harm frequently go undetected. Reported incidents do not reflect the profile of medication errors which occur in hospitals or the underlying rates. This demonstrates the inaccuracy of using incident frequency to compare patient risk or quality performance within or across hospitals. New approaches, including data mining of electronic clinical information systems, are required to support more effective medication error detection and mitigation. PMID:25583702
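
    The per-1000 rates with 95% confidence intervals quoted in the Results are consistent with a standard normal-approximation binomial interval. A small sketch reproduces the reported-incident figure for clinically important prescribing errors, assuming an underlying count of 7 reports out of 539 errors (a count inferred from the 13.0/1000 figure, not stated explicitly in the abstract).

      from math import sqrt

      def rate_per_1000_with_ci(events, denom, z=1.96):
          """Per-1000 rate with a normal-approximation 95% confidence interval."""
          p = events / denom
          se = sqrt(p * (1 - p) / denom)
          return 1000 * p, 1000 * (p - z * se), 1000 * (p + z * se)

      # 7 of 539 clinically important prescribing errors having an incident report
      # is an assumption inferred from the reported 13.0/1000 rate.
      rate, lo, hi = rate_per_1000_with_ci(7, 539)
      print(f"{rate:.1f}/1000 (95% CI: {lo:.1f}-{hi:.1f})")   # ~ 13.0/1000 (3.4-22.5)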

  20. Sim3C: simulation of Hi-C and Meta3C proximity ligation sequencing technologies.

    PubMed

    DeMaere, Matthew Z; Darling, Aaron E

    2018-02-01

    Chromosome conformation capture (3C) and Hi-C DNA sequencing methods have rapidly advanced our understanding of the spatial organization of genomes and metagenomes. Many variants of these protocols have been developed, each with their own strengths. Currently there is no systematic means for simulating sequence data from this family of sequencing protocols, potentially hindering the advancement of algorithms to exploit this new datatype. We describe a computational simulator that, given simple parameters and reference genome sequences, will simulate Hi-C sequencing on those sequences. The simulator models the basic spatial structure in genomes that is commonly observed in Hi-C and 3C datasets, including the distance-decay relationship in proximity ligation, differences in the frequency of interaction within and across chromosomes, and the structure imposed by cells. A means to model the 3D structure of randomly generated topologically associating domains is provided. The simulator considers several sources of error common to 3C and Hi-C library preparation and sequencing methods, including spurious proximity ligation events and sequencing error. We have introduced the first comprehensive simulator for 3C and Hi-C sequencing protocols. We expect the simulator to have use in testing of Hi-C data analysis algorithms, as well as more general value for experimental design, where questions such as the required depth of sequencing, enzyme choice, and other decisions can be made in advance in order to ensure adequate statistical power with respect to experimental hypothesis testing.
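
    The distance-decay relationship mentioned above is the core of most proximity-ligation models: the probability of ligating two loci falls off roughly as a power law of their genomic separation, with a small background rate of spurious ligation. The sketch below samples a ligation partner under such a model; the exponent, minimum separation, and spurious-ligation probability are hypothetical and this is not the Sim3C implementation.

      import numpy as np

      rng = np.random.default_rng(1)

      def sample_ligation_partner(pos, genome_len, alpha=1.1, min_sep=1000, p_spurious=0.02):
          """Sample the genomic coordinate of a proximity-ligation partner.
          With probability p_spurious the partner is uniform over the genome
          (spurious ligation); otherwise the separation follows a truncated
          power-law distance-decay. Parameters are illustrative only."""
          if rng.random() < p_spurious:
              return int(rng.integers(0, genome_len))
          u = rng.random()  # inverse-transform sample of a power law on [min_sep, genome_len)
          sep = min_sep * (1 - u + u * (genome_len / min_sep) ** (1 - alpha)) ** (1 / (1 - alpha))
          sign = rng.choice([-1, 1])
          return int((pos + sign * sep) % genome_len)

      partners = [sample_ligation_partner(1_000_000, 5_000_000) for _ in range(5)]
      print(partners)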
