Sample records for common sequencing error

  1. Estimating genotype error rates from high-coverage next-generation sequence data.

    PubMed

    Wall, Jeffrey D; Tang, Ling Fung; Zerbe, Brandon; Kvale, Mark N; Kwok, Pui-Yan; Schaefer, Catherine; Risch, Neil

    2014-11-01

    Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found (1) no difference in the error profiles or rates between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences had; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with genotype quality (GQ) score, but in a nonlinear fashion for the Illumina data, likely due to loss of specificity of GQ scores greater than 60; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)-(5), suggest that caution should be taken in interpreting the results of next-generation sequencing-based association studies, and even more so in clinical application of this technology in the absence of validation by other more robust sequencing or genotyping methods. © 2014 Wall et al.; Published by Cold Spring Harbor Laboratory Press.
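
    The replicate-comparison logic behind these estimates is simple to state: any site where two replicate call sets disagree harbours at least one erroneous call, so the discordance rate bounds the error rate from below. Below is a minimal sketch of that calculation, not the authors' pipeline; the site-to-genotype dictionary format is an assumed simplification.

    ```python
    # Minimal sketch (not the authors' pipeline): a lower bound on genotype
    # error rates from replicate call sets, since every discordant site
    # implies at least one wrong call. Input format is an assumption.

    def discordance_rate(calls_a, calls_b, nonref_only=True):
        """calls_a, calls_b: dict mapping site id -> genotype string, e.g. '0/1'."""
        shared = calls_a.keys() & calls_b.keys()
        if nonref_only:  # mirror the paper's focus on nonreference genotype calls
            shared = {s for s in shared if calls_a[s] != "0/0" or calls_b[s] != "0/0"}
        if not shared:
            return 0.0
        return sum(calls_a[s] != calls_b[s] for s in shared) / len(shared)

    rep1 = {"chr1:1000": "0/1", "chr1:2000": "0/0", "chr1:3000": "1/1"}
    rep2 = {"chr1:1000": "0/1", "chr1:2000": "0/1", "chr1:3000": "1/1"}
    print(discordance_rate(rep1, rep2))  # 1 of 3 qualifying sites disagrees
    ```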

  2. Sequencing artifacts in the type A influenza databases and attempts to correct them.

    PubMed

    Suarez, David L; Chester, Nikki; Hatfield, Jason

    2014-07-01

    There are over 276 000 influenza gene sequences in public databases, with the quality of the sequences determined by the contributor. As part of a high school class project, influenza sequences with possible errors were identified in the public databases based on the size of the gene being longer than expected, with the hypothesis that these sequences would contain an error. Students contacted sequence submitters to alert them to the possible sequence issue(s) and requested that the suspect sequence(s) be corrected as appropriate. Type A influenza viruses were screened, and gene segments longer than the accepted size were identified for further analysis. Attention was placed on sequences with additional nucleotides upstream or downstream of the highly conserved non-coding ends of the viral segments. A total of 1081 sequences were identified that met this criterion. Three types of errors were commonly observed: non-influenza primer sequence was not removed from the sequence; the PCR product was cloned and plasmid sequence was included in the sequence; and Taq polymerase added an adenine at the end of the PCR product. Internal insertions of nucleotide sequence were also commonly observed, but in many cases it was unclear whether the sequence was correct or actually contained an error. A total of 215 sequences, or 22.8% of the suspect sequences, were corrected in the public databases in the first year of the student project. Unfortunately, 138 additional sequences with possible errors were added to the databases in the second year. Additional awareness of the need for data integrity of sequences submitted to public databases is needed to fully reap the benefits of these large data sets. © 2014 The Authors. Influenza and Other Respiratory Viruses Published by John Wiley & Sons Ltd.

  3. Identification and correction of systematic error in high-throughput sequence data

    PubMed Central

    2011-01-01

    Background A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed "next-gen" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. Results We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that they are highly replicable across experiments. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq), and can be used with single-end datasets. Conclusions Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments. PMID:22099972
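
    The overlap trick at the heart of this approach can be restated in a few lines: within the overlap of a proper read pair, both mates sequenced the same molecule position, so a mate-mate disagreement marks an error, and positions that accumulate disagreements far beyond a genome-wide error rate are candidate systematic-error sites. The sketch below illustrates that idea under a simplified input format; it is not the SysCall classifier itself.

    ```python
    # Illustrative restatement of the overlap idea (not SysCall): positions
    # with far more mate-mate disagreements than a background error rate
    # predicts are flagged as candidate systematic-error sites.
    from collections import Counter
    from math import comb

    def systematic_error_sites(overlap_calls, error_rate=0.001, alpha=1e-6):
        """overlap_calls: iterable of (genome_pos, mates_agree: bool) tuples."""
        depth, mismatch = Counter(), Counter()
        for pos, agree in overlap_calls:
            depth[pos] += 1
            if not agree:
                mismatch[pos] += 1
        flagged = []
        for pos, n in depth.items():
            k = mismatch[pos]
            # one-sided binomial tail P(X >= k) under the background error rate
            tail = sum(comb(n, i) * error_rate**i * (1 - error_rate)**(n - i)
                       for i in range(k, n + 1))
            if tail < alpha:
                flagged.append(pos)
        return flagged
    ```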

  4. snpAD: An ancient DNA genotype caller.

    PubMed

    Prüfer, Kay

    2018-06-21

    The study of ancient genomes can elucidate the evolutionary past. However, analyses are complicated by base-modifications in ancient DNA molecules that result in errors in DNA sequences. These errors are particularly common near the ends of sequences and pose a challenge for genotype calling. I describe an iterative method that estimates genotype frequencies and errors along sequences to allow for accurate genotype calling from ancient sequences. The implementation of this method, called snpAD, performs well on high-coverage ancient data, as shown by simulations and by subsampling the data of a high-coverage Neandertal genome. Although estimates for low-coverage genomes are less accurate, I am able to derive approximate estimates of heterozygosity from several low-coverage Neandertals. These estimates show that low heterozygosity, compared to modern humans, was common among Neandertals. The C++ code of snpAD is freely available at http://bioinf.eva.mpg.de/snpAD/. Supplementary data are available at Bioinformatics online.

  5. Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

    PubMed Central

    Laehnemann, David; Borkhardt, Arndt

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance on which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159

  6. Repeat-aware modeling and correction of short read errors.

    PubMed

    Yang, Xiao; Aluru, Srinivas; Dorman, Karin S

    2011-02-15

    High-throughput short read sequencing is revolutionizing genomics and systems biology research by enabling cost-effective deep coverage sequencing of genomes and transcriptomes. Error detection and correction are crucial to many short read sequencing applications including de novo genome sequencing, genome resequencing, and digital gene expression analysis. Short read error detection is typically carried out by counting the observed frequencies of kmers in reads and validating those with frequencies exceeding a threshold. In the case of genomes with high repeat content, an erroneous kmer may be frequently observed if it has few nucleotide differences from valid kmers with multiple occurrences in the genome. Error detection and correction have mostly been applied to genomes with low repeat content, and this remains a challenging problem for genomes with high repeat content. We develop a statistical model and a computational method for error detection and correction in the presence of genomic repeats. We propose a method to infer genomic frequencies of kmers from their observed frequencies by analyzing the misread relationships among observed kmers. We also propose a method to estimate the threshold useful for validating kmers whose estimated genomic frequency exceeds the threshold. We demonstrate that superior error detection is achieved using these methods. Furthermore, we break away from the common assumption of uniformly distributed errors within a read, and provide a framework to model position-dependent error occurrence frequencies common to many short read platforms. Lastly, we achieve better error correction in genomes with high repeat content. The software is implemented in C++ and is freely available under the GNU GPL3 license and the Boost Software V1.0 license at http://aluru-sun.ece.iastate.edu/doku.php?id=redeem. We introduce a statistical framework to model sequencing errors in next-generation reads, which led to promising results in detecting and correcting errors for genomes with high repeat content.
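
    For readers unfamiliar with the baseline this paper improves on, the classic k-mer frequency test looks like the sketch below; REDEEM's contribution is to replace the naive fixed cutoff with genomic frequencies inferred from misread relationships, which the sketch does not attempt. Parameter values are illustrative.

    ```python
    # Baseline k-mer frequency test (illustrative thresholds): k-mers seen
    # fewer than `threshold` times are flagged as likely sequencing errors.
    from collections import Counter

    def suspect_kmers(reads, k=15, threshold=3):
        counts = Counter()
        for read in reads:
            for i in range(len(read) - k + 1):
                counts[read[i:i + k]] += 1
        # In a repeat-rich genome this simple rule misclassifies error k-mers
        # that sit one mismatch away from high-copy repeats -- the failure
        # mode the paper addresses with its statistical model.
        return {kmer for kmer, c in counts.items() if c < threshold}
    ```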

  7. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy

    PubMed Central

    2017-01-01

    Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalized. In particular, sequencing errors in the UMI sequence are often ignored or else resolved in an ad hoc manner. We show that errors in the UMI sequence are common and introduce network-based methods to account for these errors when identifying PCR duplicates. Using these methods, we demonstrate improved quantification accuracy both under simulated conditions and on real iCLIP and single-cell RNA-seq data sets. Reproducibility between iCLIP replicates and single-cell RNA-seq clustering are both improved using our proposed network-based method, demonstrating the value of properly accounting for errors in UMIs. These methods are implemented in the open source UMI-tools software package. PMID:28100584
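
    The network idea can be captured compactly: UMIs one mismatch apart are merged when the read-count asymmetry suggests the rarer one arose as an error copy of the more abundant one (the count_a >= 2*count_b - 1 condition of UMI-tools' directional method). The sketch below is a simplified reimplementation of that idea, not the UMI-tools code itself.

    ```python
    # Simplified directional-network UMI deduplication: greedily grow a
    # network from the most abundant UMI, absorbing 1-mismatch neighbours
    # whose counts are consistent with being error copies.
    def hamming1(a, b):
        return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

    def dedup_umis(umi_counts):
        """umi_counts: dict UMI -> read count; returns the surviving 'true' UMIs."""
        remaining = set(umi_counts)
        survivors = []
        for seed in sorted(umi_counts, key=umi_counts.get, reverse=True):
            if seed not in remaining:
                continue
            survivors.append(seed)
            network, frontier = {seed}, [seed]
            while frontier:
                node = frontier.pop()
                for other in sorted(remaining - network):
                    if hamming1(node, other) and \
                       umi_counts[node] >= 2 * umi_counts[other] - 1:
                        network.add(other)   # treat as an error copy
                        frontier.append(other)
            remaining -= network
        return survivors

    print(dedup_umis({"ATTG": 456, "ATTA": 72, "TTTA": 2}))  # ['ATTG']
    ```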

  8. AfterQC: automatic filtering, trimming, error removing and quality control for fastq data.

    PubMed

    Chen, Shifu; Huang, Tanxiao; Zhou, Yanqing; Han, Yue; Xu, Mingyan; Gu, Jia

    2017-03-14

    Some applications, especially clinical applications requiring high accuracy of sequencing data, must contend with unavoidable sequencing errors. Several tools have been proposed to profile sequencing quality, but few of them can quantify or correct sequencing errors. This unmet requirement motivated us to develop AfterQC, a tool that profiles sequencing errors and corrects most of them, alongside highly automated quality control and data filtering features. Unlike most tools, AfterQC analyses the overlap of paired sequences in pair-end sequencing data. Based on this overlap analysis, AfterQC can detect and cut adapters, and it provides a novel function to correct erroneous bases in the overlapping regions. Another new feature is the detection and visualisation of sequencing bubbles, which are commonly found on flowcell lanes and can cause sequencing errors. Besides normal per-cycle quality and base content plotting, AfterQC also provides features such as polyX filtering (removal of long sub-sequences of the same base X), automatic trimming and k-mer-based strand bias profiling. For each single FastQ file or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates the sequencer's bubble effects, trims reads at the front and tail, detects sequencing errors and corrects part of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support: it accepts a single FastQ file, a single pair of FastQ files (for pair-end sequencing), or a folder whose FastQ files are all processed automatically. Based on the overlap analysis, AfterQC can estimate the sequencing error rate and profile the error transformation distribution. The results of our error profiling tests show that the error distribution is highly platform dependent. Much more than just another quality control (QC) tool, AfterQC performs quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC helps to eliminate sequencing errors in pair-end sequencing data, providing much cleaner outputs and consequently reducing false-positive variants, especially low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all options automatically, requiring no arguments in most cases.
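
    The overlap analysis AfterQC builds on works because a short insert makes the two mates sequence the same bases twice. A toy version of the correction step might look like the sketch below, under the simplifying assumptions of a naive suffix/prefix overlap search and integer quality lists; the real tool's alignment, bubble detection and reporting are more involved.

    ```python
    # Toy mate-overlap correction: find the overlap, then resolve each
    # mismatch in favour of the higher-quality base call.
    def revcomp(seq):
        return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

    def overlap_correct(r1, q1, r2, q2, min_overlap=10):
        r2rc, q2rc = revcomp(r2), q2[::-1]
        for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
            mism = sum(a != b for a, b in zip(r1[-olen:], r2rc[:olen]))
            if mism <= olen // 10:  # tolerate ~10% mismatches in the overlap
                break
        else:
            return r1, r2  # no confident overlap; leave the pair untouched
        b1, b2 = list(r1), list(r2rc)
        off = len(r1) - olen
        for i in range(olen):
            if b1[off + i] != b2[i]:
                # keep whichever base the sequencer was more confident about
                if q1[off + i] >= q2rc[i]:
                    b2[i] = b1[off + i]
                else:
                    b1[off + i] = b2[i]
        return "".join(b1), revcomp("".join(b2))
    ```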

  9. Whole exome sequencing for familial bicuspid aortic valve identifies putative variants.

    PubMed

    Martin, Lisa J; Pilipenko, Valentina; Kaufman, Kenneth M; Cripe, Linda; Kottyan, Leah C; Keddache, Mehdi; Dexheimer, Phillip; Weirauch, Matthew T; Benson, D Woodrow

    2014-10-01

    Bicuspid aortic valve (BAV) is the most common congenital cardiovascular malformation. Although highly heritable, few causal variants have been identified. The purpose of this study was to identify genetic variants underlying BAV by whole exome sequencing of a multiplex BAV kindred. Whole exome sequencing was performed on 17 individuals from a single family (BAV, 3; other cardiovascular malformation, 3). Postvariant calling error control metrics were established after examining the relationship between Mendelian inheritance error rate and coverage, quality score, and call rate. To determine the most effective approach to identifying susceptibility variants from among 54 674 variants passing error control metrics, we evaluated 3 variant selection strategies frequently used in whole exome sequencing studies plus extended family linkage. No putative rare, high-effect variants were identified in all affected but no unaffected individuals. Eight high-effect variants were identified by ≥2 of the commonly used selection strategies; however, these were either common in the general population (>10%) or present in the majority of the unaffected family members. However, using extended family linkage, 3 synonymous variants were identified; all 3 variants were identified by at least one other strategy. These results suggest that traditional whole exome sequencing approaches, which assume causal variants alter coding sense, may be insufficient for BAV and other complex traits. Identification of disease-associated variants is facilitated by the use of segregation within families. © 2014 American Heart Association, Inc.

  10. Is a Genome a Codeword of an Error-Correcting Code?

    PubMed Central

    Kleinschmidt, João H.; Silva-Filho, Márcio C.; Bim, Edson; Herai, Roberto H.; Yamagishi, Michel E. B.; Palazzo, Reginaldo

    2012-01-01

    Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction. PMID:22649495
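
    For readers unfamiliar with the coding-theory framing, "being a codeword" is a checkable property: map symbols to bits and require a zero syndrome under the code's parity-check matrix. The sketch below illustrates this with the classic binary Hamming(7,4) code and an assumed 2-bit base encoding; the paper's actual identification uses cyclic Hamming codes adapted to DNA and is more involved.

    ```python
    # Generic codeword check with the binary Hamming(7,4) code; the DNA-to-
    # bit mapping below is an illustrative assumption, not the paper's.
    import numpy as np

    H = np.array([[1, 0, 1, 0, 1, 0, 1],
                  [0, 1, 1, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1, 1, 1]])  # columns are 1..7 in binary

    BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}  # assumed mapping

    def is_codeword(seq):
        bits = "".join(BITS[b] for b in seq)
        bits = bits[:len(bits) - len(bits) % 7]  # whole 7-bit blocks only
        for i in range(0, len(bits), 7):
            block = np.array([int(c) for c in bits[i:i + 7]])
            if (H @ block % 2).any():  # nonzero syndrome => not a codeword
                return False
        return True

    print(is_codeword("AAAAAAA"))  # all-zero bits form the trivial codeword
    ```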

  11. Medication errors in chemotherapy preparation and administration: a survey conducted among oncology nurses in Turkey.

    PubMed

    Ulas, Arife; Silay, Kamile; Akinci, Sema; Dede, Didem Sener; Akinci, Muhammed Bulent; Sendur, Mehmet Ali Nahit; Cubukcu, Erdem; Coskun, Hasan Senol; Degirmenci, Mustafa; Utkan, Gungor; Ozdemir, Nuriye; Isikdogan, Abdurrahman; Buyukcelik, Abdullah; Inanc, Mevlude; Bilici, Ahmet; Odabasi, Hatice; Cihan, Sener; Avci, Nilufer; Yalcin, Bulent

    2015-01-01

    Medication errors in oncology may cause severe clinical problems due to the low therapeutic indices and high toxicity of chemotherapeutic agents. We aimed to investigate unintentional medication errors and their underlying factors during chemotherapy preparation and administration, based on a systematic survey designed to reflect oncology nurses' experience. This study was conducted in 18 adult chemotherapy units with the volunteer participation of 206 nurses. A survey was developed by the primary investigators; medication errors (MEs) were defined as preventable errors occurring during the prescription, ordering, preparation or administration of medication. The survey consisted of 4 parts: demographic features of nurses; workload of chemotherapy units; errors and their estimated monthly number during chemotherapy preparation and administration; and evaluation of the possible factors responsible for MEs. The survey was conducted by face-to-face interview, and data analyses were performed with descriptive statistics. Chi-square or Fisher exact tests were used for comparative analysis of categorical data. Some 83.4% of the 210 nurses reported one or more errors during chemotherapy preparation and administration. Prescribing or ordering of wrong doses by physicians (65.7%) and noncompliance with administration sequences during chemotherapy administration (50.5%) were the most common errors. The most common estimated average monthly error was not following the administration sequence of the chemotherapeutic agents (4.1 times/month, range 1-20). The most important underlying reasons for medication errors were heavy workload (49.7%) and insufficient staff numbers (36.5%). Our findings suggest that the probability of medication error is very high during chemotherapy preparation and administration, with prescribing and ordering errors the most common. Further studies must address strategies to minimize medication errors in patients receiving chemotherapy, determine sufficient protective measures, and establish multistep control mechanisms.

  12. Single molecule counting and assessment of random molecular tagging errors with transposable giga-scale error-correcting barcodes.

    PubMed

    Lau, Billy T; Ji, Hanlee P

    2017-09-21

    RNA-Seq measures gene expression by counting sequence reads belonging to unique cDNA fragments. Molecular barcodes, commonly in the form of random nucleotides, were recently introduced to improve gene expression measures by detecting amplification duplicates, but they are susceptible to errors generated during PCR and sequencing. This results in false positive counts, leading to inaccurate transcriptome quantification, especially at low input and single-cell RNA amounts where the total number of molecules present is minuscule. To address this issue, we demonstrated the systematic identification of molecular species using transposable error-correcting barcodes that are exponentially expanded to tens of billions of unique labels. We experimentally showed that random-mer molecular barcodes suffer from substantial and persistent errors that are difficult to resolve. To assess our method's performance, we applied it to the analysis of known reference RNA standards. By including an inline random-mer molecular barcode, we systematically characterized the presence of sequence errors in random-mer molecular barcodes. We observed that such errors are extensive and become more dominant at low input amounts. We describe the first study to use transposable molecular barcodes and to apply them to the study of random-mer molecular barcode errors. The extensive errors found in random-mer molecular barcodes may warrant the use of error-correcting barcodes for transcriptome analysis as input amounts decrease.
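
    The advantage of designed error-correcting barcodes over random-mers is that decoding is unambiguous up to the code's distance: with a whitelist of codewords at pairwise Hamming distance ≥ 3, any single sequencing error maps back to exactly one codeword. A minimal decoder follows; the toy whitelist stands in for the paper's transposon-delivered barcode set.

    ```python
    # Nearest-codeword barcode decoding with single-error correction.
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def decode_barcode(observed, whitelist, max_dist=1):
        best = min(whitelist, key=lambda w: hamming(observed, w))
        return best if hamming(observed, best) <= max_dist else None

    whitelist = ["AAAATTTT", "CCCCGGGG", "ACGTACGT"]   # toy codewords
    print(decode_barcode("AAAATTTA", whitelist))       # AAAATTTT (1 error fixed)
    print(decode_barcode("AACCTTGG", whitelist))       # None (unresolvable)
    ```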

  13. Ultraaccurate genome sequencing and haplotyping of single human cells.

    PubMed

    Chu, Wai Keung; Edge, Peter; Lee, Ho Suk; Bansal, Vikas; Bafna, Vineet; Huang, Xiaohua; Zhang, Kun

    2017-11-21

    Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10⁻⁸ and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs. Copyright © 2017 the Author(s). Published by PNAS.

  14. Sim3C: simulation of Hi-C and Meta3C proximity ligation sequencing technologies.

    PubMed

    DeMaere, Matthew Z; Darling, Aaron E

    2018-02-01

    Chromosome conformation capture (3C) and Hi-C DNA sequencing methods have rapidly advanced our understanding of the spatial organization of genomes and metagenomes. Many variants of these protocols have been developed, each with their own strengths. Currently there is no systematic means for simulating sequence data from this family of sequencing protocols, potentially hindering the advancement of algorithms to exploit this new datatype. We describe a computational simulator that, given simple parameters and reference genome sequences, will simulate Hi-C sequencing on those sequences. The simulator models the basic spatial structure in genomes that is commonly observed in Hi-C and 3C datasets, including the distance-decay relationship in proximity ligation, differences in the frequency of interaction within and across chromosomes, and the structure imposed by cells. A means to model the 3D structure of randomly generated topologically associating domains is provided. The simulator considers several sources of error common to 3C and Hi-C library preparation and sequencing methods, including spurious proximity ligation events and sequencing error. We have introduced the first comprehensive simulator for 3C and Hi-C sequencing protocols. We expect the simulator to have use in testing of Hi-C data analysis algorithms, as well as more general value for experimental design, where questions such as the required depth of sequencing, enzyme choice, and other decisions can be made in advance in order to ensure adequate statistical power with respect to experimental hypothesis testing.

  15. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

    PubMed

    Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos; Read, Timothy; Konstantinidis, Konstantinos T

    2012-01-01

    Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities but whether or not different NGS platforms recover the same diversity from a sample and their assembled sequences are of comparable quality remain unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R² > 0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.

  16. Choice of Reference Sequence and Assembler for Alignment of Listeria monocytogenes Short-Read Sequence Data Greatly Influences Rates of Error in SNP Analyses

    PubMed Central

    Pightling, Arthur W.; Petronella, Nicholas; Pagotto, Franco

    2014-01-01

    The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers should test a variety of conditions to achieve optimal results. PMID:25144537

  17. Sources of error in the retracted scientific literature.

    PubMed

    Casadevall, Arturo; Steen, R Grant; Fang, Ferric C

    2014-09-01

    Retraction of flawed articles is an important mechanism for correction of the scientific literature. We recently reported that the majority of retractions are associated with scientific misconduct. In the current study, we focused on the subset of retractions for which no misconduct was identified, in order to identify the major causes of error. Analysis of the retraction notices for 423 articles indexed in PubMed revealed that the most common causes of error-related retraction are laboratory errors, analytical errors, and irreproducible results. The most common laboratory errors are contamination and problems relating to molecular biology procedures (e.g., sequencing, cloning). Retractions due to contamination were more common in the past, whereas analytical errors are now increasing in frequency. A number of publications that have not been retracted despite being shown to contain significant errors suggest that barriers to retraction may impede correction of the literature. In particular, few cases of retraction due to cell line contamination were found despite recognition that this problem has affected numerous publications. An understanding of the errors leading to retraction can guide practices to improve laboratory research and the integrity of the scientific literature. Perhaps most important, our analysis has identified major problems in the mechanisms used to rectify the scientific literature and suggests a need for action by the scientific community to adopt protocols that ensure the integrity of the publication process. © FASEB.

  18. A filtering method to generate high quality short reads using illumina paired-end technology.

    PubMed

    Eren, A Murat; Vineis, Joseph H; Morrison, Hilary G; Sogin, Mitchell L

    2013-01-01

    Consensus between independent reads improves the accuracy of genome and transcriptome analyses; however, lack of consensus between very similar sequences in metagenomic studies can, and often does, represent natural variation of biological significance. The common use of machine-assigned quality scores on next generation platforms does not necessarily correlate with accuracy. Here, we describe using the overlap of paired-end, short sequence reads to identify error-prone reads in marker gene analyses and their contribution to spurious OTUs following clustering analysis using QIIME. Our approach can also reduce error in shotgun sequencing data generated from libraries with small, tightly constrained insert sizes. The open-source implementation of this algorithm in the Python programming language, with user instructions, can be obtained from https://github.com/meren/illumina-utils.

  19. Design of association studies with pooled or un-pooled next-generation sequencing data.

    PubMed

    Kim, Su Yeon; Li, Yingrui; Guo, Yiran; Li, Ruiqiang; Holmkvist, Johan; Hansen, Torben; Pedersen, Oluf; Wang, Jun; Nielsen, Rasmus

    2010-07-01

    Most common hereditary diseases in humans are complex and multifactorial. Large-scale genome-wide association studies based on SNP genotyping have only identified a small fraction of the heritable variation of these diseases. One explanation may be that many rare variants (minor allele frequency, MAF, <5%), which are not included in the common genotyping platforms, may contribute substantially to the genetic variation of these diseases. Next-generation sequencing, which would allow the analysis of rare variants, is now becoming so cheap that it provides a viable alternative to SNP genotyping. In this paper, we present cost-effective protocols for using next-generation sequencing in association mapping studies based on pooled and un-pooled samples, and identify optimal designs with respect to the total number of individuals, the number of individuals per pool, and the sequencing coverage. We perform a small empirical study to evaluate the pooling variance in a realistic setting where pooling is combined with exon-capturing. To test for associations, we develop a likelihood ratio statistic that accounts for the high error rate of next-generation sequencing data. We also perform extensive simulations to determine the power and accuracy of this method. Overall, our findings suggest that with a fixed cost, sequencing many individuals at a more shallow depth with larger pool size achieves higher power than sequencing a small number of individuals at higher depth with smaller pool size, even in the presence of high error rates. Our results provide guidelines for researchers who are developing association mapping studies based on next-generation sequencing. © 2010 Wiley-Liss, Inc.
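
    To make the error-aware testing idea concrete: at a single site in a pooled sample, k nonreference bases among n reads should be tested against the possibility that sequencing error alone produced them. The sketch below is a stripped-down, one-site version of such a likelihood ratio test, assuming a known per-base error rate eps and that scipy is available; the paper's statistic additionally models pooling and estimates error rates rather than fixing them.

    ```python
    # One-site error-aware likelihood ratio test (illustrative, not the
    # paper's full statistic): is k nonreference bases out of n reads more
    # than error rate `eps` alone would produce?
    from math import log
    from scipy.stats import chi2

    def lrt_site(k, n, eps=0.01):
        def loglik(p):
            p = min(max(p, 1e-12), 1 - 1e-12)
            return k * log(p) + (n - k) * log(1 - p)
        stat = 2 * (loglik(max(k / n, eps)) - loglik(eps))
        # one-sided boundary test: halve the chi-square(1 df) tail probability
        return stat, 0.5 * chi2.sf(stat, df=1)

    print(lrt_site(k=25, n=500, eps=0.01))  # ~(41.3, 7e-11): likely a variant
    ```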

  20. Sources of PCR-induced distortions in high-throughput sequencing data sets

    PubMed Central

    Kebschull, Justus M.; Zador, Anthony M.

    2015-01-01

    PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991
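
    PCR stochasticity of the kind the authors identify is easy to reproduce in a toy branching-process simulation: each molecule duplicates with some per-cycle efficiency, and because early-cycle luck is amplified exponentially, identical templates end up at very different copy numbers. Parameter values below are illustrative only.

    ```python
    # Toy branching-process view of PCR stochasticity.
    import random

    def pcr_copies(cycles=15, eff=0.8):
        """Copies of one template after `cycles` cycles, where each molecule
        is duplicated with per-cycle efficiency `eff`."""
        copies = 1
        for _ in range(cycles):
            copies += sum(random.random() < eff for _ in range(copies))
        return copies

    random.seed(0)
    print(sorted(pcr_copies() for _ in range(8)))
    # identical templates and protocol, yet final copy numbers vary widely,
    # skewing sequence representation much as the paper describes
    ```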

  21. Quality Control Test for Sequence-Phenotype Assignments

    PubMed Central

    Ortiz, Maria Teresa Lara; Rosario, Pablo Benjamín Leon; Luna-Nevarez, Pablo; Gamez, Alba Savin; Martínez-del Campo, Ana; Del Rio, Gabriel

    2015-01-01

    Relating a gene mutation to a phenotype is a common task in different disciplines such as protein biochemistry. In this endeavour, it is common to find false relationships arising from mutations introduced by cells, which may be purged using a phenotypic assay; yet such phenotypic assays may introduce additional false relationships arising from experimental errors. Here we introduce the use of high-throughput DNA sequencers and statistical analysis aimed to identify incorrect DNA sequence-phenotype assignments, and observed that 10–20% of these false assignments are expected in large screenings aimed to identify critical residues for protein function. We further show that this level of incorrect DNA sequence-phenotype assignments may significantly alter our understanding about the structure-function relationship of proteins. We have made available an implementation of our method at http://bis.ifc.unam.mx/en/software/chispas. PMID:25700273

  22. Error and Error Mitigation in Low-Coverage Genome Assemblies

    PubMed Central

    Hubisz, Melissa J.; Lin, Michael F.; Kellis, Manolis; Siepel, Adam

    2011-01-01

    The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ∼2× coverage. Here we examine the extent of sequencing error in these 2× assemblies, and its potential impact in downstream analyses. By comparing 2× assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1–4 errors per kilobase. While this error rate is fairly modest, sequencing error can still have surprising effects. For example, an apparent lineage-specific insertion in a coding region is more likely to reflect sequencing error than a true biological event, and the length distribution of coding indels is strongly distorted by error. We find that most errors are contributed by a small fraction of bases with low quality scores, in particular, by the ends of reads in regions of single-read coverage in the assembly. We explore several approaches for automatic sequencing error mitigation (SEM), making use of the localized nature of sequencing error, the fact that it is well predicted by quality scores, and information about errors that comes from comparisons across species. Our automatic methods for error mitigation cannot replace the need for additional sequencing, but they do allow substantial fractions of errors to be masked or eliminated at the cost of modest amounts of over-correction, and they can reduce the impact of error in downstream phylogenomic analyses. Our error-mitigated alignments are available for download. PMID:21340033

  23. Impact of genotyping errors on statistical power of association tests in genomic analyses: A case study

    PubMed Central

    Hou, Lin; Sun, Ning; Mane, Shrikant; Sayward, Fred; Rajeevan, Nallakkandi; Cheung, Kei-Hoi; Cho, Kelly; Pyarajan, Saiju; Aslan, Mihaela; Miller, Perry; Harvey, Philip D.; Gaziano, J. Michael; Concato, John; Zhao, Hongyu

    2017-01-01

    A key step in genomic studies is to assess high throughput measurements across millions of markers for each participant’s DNA, either using microarrays or sequencing techniques. Accurate genotype calling is essential for downstream statistical analysis of genotype-phenotype associations, and next generation sequencing (NGS) has recently become a more common approach in genomic studies. How the accuracy of variant calling in NGS-based studies affects downstream association analysis has not, however, been studied using empirical data in which both microarrays and NGS were available. In this article, we investigate the impact of variant calling errors on the statistical power to identify associations between single nucleotides and disease, and on associations between multiple rare variants and disease. Both differential and nondifferential genotyping errors are considered. Our results show that the power of burden tests for rare variants is strongly influenced by the specificity in variant calling, but is rather robust with regard to sensitivity. By using the variant calling accuracies estimated from a substudy of a Cooperative Studies Program project conducted by the Department of Veterans Affairs, we show that the power of association tests is mostly retained with commonly adopted variant calling pipelines. An R package, GWAS.PC, is provided to accommodate power analysis that takes account of genotyping errors (http://zhaocenter.org/software/). PMID:28019059

  24. Sequence data - Magnitude and implications of some ambiguities.

    NASA Technical Reports Server (NTRS)

    Holmquist, R.; Jukes, T. H.

    1972-01-01

    A stochastic model is applied to the divergence of the horse-pig lineage from a common ancestor in terms of the alpha and beta chains of hemoglobin and fibrinopeptides. The results are compared with those based on the minimum mutation distance model of Fitch (1972). Buckwheat and cauliflower cytochrome c sequences are analyzed to demonstrate their ambiguities. A comparative analysis of evolutionary rates for various proteins of horses and pigs shows that errors of considerable magnitude are introduced by Glx and Asx ambiguities into evolutionary conclusions drawn from sequences of incompletely analyzed proteins.

  25. Rapid and accurate taxonomic classification of insect (class Insecta) cytochrome c oxidase subunit 1 (COI) DNA barcode sequences using a naïve Bayesian classifier

    PubMed Central

    Porter, Teresita M; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J; Golding, G Brian; Hajibabaei, Mehrdad

    2014-01-01

    Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the blast-based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates, incorrect taxonomic assignments with a high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.
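
    The Wang et al. classifier this study relies on is compact enough to sketch: each genus is scored by the joint log-probability of the query's 8-mers, and confidence comes from re-classifying bootstrap subsamples of those 8-mers. The code below is a schematic reimplementation with simplified smoothing constants, not the RDP implementation itself.

    ```python
    # Schematic RDP-style naive Bayes k-mer classifier with bootstrap support.
    from collections import Counter, defaultdict
    import math, random

    K = 8  # word size used by the RDP classifier

    def kmers(seq):
        return {seq[i:i + K] for i in range(len(seq) - K + 1)}

    def train(refs):
        """refs: iterable of (genus, sequence); returns genus -> P(kmer | genus)
        with simple pseudocount smoothing (the original uses a word prior)."""
        by_genus = defaultdict(list)
        for genus, seq in refs:
            by_genus[genus].append(kmers(seq))
        model = {}
        for genus, kmer_sets in by_genus.items():
            counts = Counter(k for s in kmer_sets for k in s)
            n = len(kmer_sets)
            model[genus] = {k: (c + 0.5) / (n + 1.0) for k, c in counts.items()}
        return model

    def classify(model, seq, boots=100):
        """Best genus plus a bootstrap support value in [0, 1]."""
        query = list(kmers(seq))
        def best(words):
            return max(model, key=lambda g:
                       sum(math.log(model[g].get(w, 1e-4)) for w in words))
        call = best(query)
        hits = sum(best(random.choices(query, k=max(1, len(query) // 8))) == call
                   for _ in range(boots))
        return call, hits / boots
    ```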

  26. Comparing K-mer based methods for improved classification of 16S sequences.

    PubMed

    Vinje, Hilde; Liland, Kristian Hovde; Almøy, Trygve; Snipen, Lars

    2015-07-01

    The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Parallel to the explosion in the amount of sequence data accessible, there has also been a shift in focus for classification methods. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers by sliding windows are the most interesting classification approach with respect to both speed and accuracy. Here, we present a systematic comparison of five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other both in data usage and modelling strategies. We based our study on the commonly known and well-used naïve Bayes classifier from the RDP project; four other methods were implemented and tested on two different data sets, on full-length sequences as well as fragments of typical read length. The differences in classification error among the methods seemed small, but they were stable across both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets explored, and on both full-length and fragmented sequences, all five methods reached an error plateau. We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau, indicating that improved training data is needed for further gains in classification. Classification errors occur most frequently for genera with few sequences present. To improve the taxonomy and to test new classification methods, a better, more universal and more robust training data set is crucial.

  27. ChromatoGate: A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection

    PubMed Central

    Alachiotis, Nikolaos; Vogiatzi, Emmanouella; Pavlidis, Pavlos; Stamatakis, Alexandros

    2013-01-01

    Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translates the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, a manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis becomes an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), an open-source software that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To provide users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors. PMID:24688709

  28. Error Rate Comparison during Polymerase Chain Reaction by DNA Polymerase

    DOE PAGES

    McInerney, Peter; Adams, Paul; Hadi, Masood Z.

    2014-01-01

    As larger-scale cloning projects become more prevalent, there is an increasing need for comparisons among high fidelity DNA polymerases used for PCR amplification. All polymerases marketed for PCR applications are tested for fidelity properties (i.e., error rate determination) by vendors, and numerous literature reports have addressed PCR enzyme fidelity. Nonetheless, it is often difficult to make direct comparisons among different enzymes due to numerous methodological and analytical differences from study to study. We have measured the error rates for 6 DNA polymerases commonly used in PCR applications, including 3 polymerases typically used for cloning applications requiring high fidelity. Error rate measurement values reported here were obtained by direct sequencing of cloned PCR products. The strategy employed here allows interrogation of error rate across a very large DNA sequence space, since 94 unique DNA targets were used as templates for PCR cloning. Among the six enzymes included in the study (Taq polymerase, AccuPrime-Taq High Fidelity, KOD Hot Start, cloned Pfu polymerase, Phusion Hot Start, and Pwo polymerase), we find the lowest error rates with Pfu, Phusion, and Pwo polymerases. Error rates are comparable for these 3 enzymes and are >10x lower than the error rate observed with Taq polymerase. Mutation spectra are reported, with the 3 high fidelity enzymes displaying broadly similar types of mutations. For these enzymes, transition mutations predominate, with little bias observed for type of transition.
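
    A note on the arithmetic such fidelity measurements rest on: because mutations accumulate over template doublings, the quantity compared across enzymes is usually errors per base per doubling, with the number of doublings taken from the fold amplification. A worked sketch with illustrative numbers (not values from this study):

    ```python
    # error rate per base per doubling = mutations / (bases sequenced * d),
    # where d = log2(fold amplification) is the number of template doublings.
    from math import log2

    def pcr_error_rate(mutations, bases_sequenced, fold_amplification):
        d = log2(fold_amplification)
        return mutations / (bases_sequenced * d)

    # e.g. 40 mutations in 1.2 Mb of sequenced clones after 10^5-fold
    # amplification (~16.6 doublings); numbers are illustrative only:
    print(pcr_error_rate(40, 1.2e6, 1e5))  # ~2.0e-6 errors/base/doubling
    ```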

  29. Mapping DNA polymerase errors by single-molecule sequencing

    DOE PAGES

    Lee, David F.; Lu, Jenny; Chang, Seungwoo; ...

    2016-05-16

    Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replication product is tagged with a unique nucleotide sequence before amplification. This allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases.
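
    The barcode-then-consensus step that lets the assay see past sequencing error can be sketched directly: reads sharing a barcode come from one replication product, so a per-position majority vote cancels independent sequencing errors while preserving any polymerase error carried by the product itself. The input format and the min_reads cutoff below are assumptions for illustration.

    ```python
    # Group reads by barcode and take a per-position majority consensus.
    from collections import Counter, defaultdict

    def consensus(reads):
        """Per-position majority over equal-length reads of one barcode group."""
        return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

    def barcode_consensus(tagged_reads, min_reads=3):
        """tagged_reads: iterable of (barcode, read). Groups with too few
        reads are dropped: their consensus cannot outvote sequencing errors."""
        groups = defaultdict(list)
        for bc, read in tagged_reads:
            groups[bc].append(read)
        return {bc: consensus(rs) for bc, rs in groups.items()
                if len(rs) >= min_reads}

    reads = [("AAGT", "ACGTACGT"), ("AAGT", "ACGAACGT"), ("AAGT", "ACGTACGT")]
    print(barcode_consensus(reads))  # {'AAGT': 'ACGTACGT'}: the lone miscall
                                     # at position 4 is outvoted
    ```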

  30. Accurate multiplex polony sequencing of an evolved bacterial genome.

    PubMed

    Shendure, Jay; Porreca, Gregory J; Reppas, Nikos B; Lin, Xiaoxia; McCutcheon, John P; Rosenbaum, Abraham M; Wang, Michael D; Zhang, Kun; Mitra, Robi D; Church, George M

    2005-09-09

    We describe a DNA sequencing technology in which a commonly available, inexpensive epifluorescence microscope is converted to rapid nonelectrophoretic DNA sequencing automation. We apply this technology to resequence an evolved strain of Escherichia coli at less than one error per million consensus bases. A cell-free, mate-paired library provided single DNA molecules that were amplified in parallel to 1-micrometer beads by emulsion polymerase chain reaction. Millions of beads were immobilized in a polyacrylamide gel and subjected to automated cycles of sequencing by ligation and four-color imaging. Cost per base was roughly one-ninth as much as that of conventional sequencing. Our protocols were implemented with off-the-shelf instrumentation and reagents.

  31. Modeling Inborn Errors of Hepatic Metabolism Using Induced Pluripotent Stem Cells.

    PubMed

    Pournasr, Behshad; Duncan, Stephen A

    2017-11-01

    Inborn errors of hepatic metabolism are caused by deficiencies, most commonly within a single enzyme, that arise from heritable mutations in the genome. Individually such diseases are rare, but collectively they are common. Advances in genome-wide association studies and DNA sequencing have helped researchers identify the underlying genetic basis of such diseases. Unfortunately, cellular and animal models that accurately recapitulate these inborn errors of hepatic metabolism in the laboratory have been lacking. Recently, investigators have exploited molecular techniques to generate induced pluripotent stem cells from patients' somatic cells. Induced pluripotent stem cells can differentiate into a wide variety of cell types, including hepatocytes, thereby offering an innovative approach to unravel the mechanisms underlying inborn errors of hepatic metabolism. Moreover, such cell models could potentially provide a platform for the discovery of therapeutics. In this mini-review, we present a brief overview of the state-of-the-art in using pluripotent stem cells for such studies. © 2017 American Heart Association, Inc.

  14. Efficient error correction for next-generation sequencing of viral amplicons

    PubMed Central

    2012-01-01

    Background Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm PMID:22759430

  15. Efficient error correction for next-generation sequencing of viral amplicons.

    PubMed

    Skums, Pavel; Dimitrova, Zoya; Campo, David S; Vaughan, Gilberto; Rossi, Livia; Forbi, Joseph C; Yokosawa, Jonny; Zelikovsky, Alex; Khudyakov, Yury

    2012-06-25

    Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm.
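    The k-mer intuition behind KEC can be sketched in a few lines: k-mers observed only rarely in the read set are more likely to stem from sequencing errors than from genuine haplotypes. The fragment below shows just that counting core; the function names and the fixed threshold are illustrative, and the published algorithm additionally calibrates its threshold empirically and handles homopolymer indels.

    ```python
    from collections import Counter

    def kmer_counts(reads, k):
        """Count every k-mer across all reads."""
        counts = Counter()
        for read in reads:
            for i in range(len(read) - k + 1):
                counts[read[i:i + k]] += 1
        return counts

    def likely_error_kmers(counts, threshold):
        """k-mers below the frequency threshold are flagged as error-derived."""
        return {kmer for kmer, c in counts.items() if c < threshold}
    ```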

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    McInerney, Peter; Adams, Paul; Hadi, Masood Z.

    As larger-scale cloning projects become more prevalent, there is an increasing need for comparisons among high fidelity DNA polymerases used for PCR amplification. All polymerases marketed for PCR applications are tested for fidelity properties (i.e., error rate determination) by vendors, and numerous literature reports have addressed PCR enzyme fidelity. Nonetheless, it is often difficult to make direct comparisons among different enzymes due to numerous methodological and analytical differences from study to study. We have measured the error rates for 6 DNA polymerases commonly used in PCR applications, including 3 polymerases typically used for cloning applications requiring high fidelity. Error rate measurement values reported here were obtained by direct sequencing of cloned PCR products. The strategy employed here allows interrogation of error rate across a very large DNA sequence space, since 94 unique DNA targets were used as templates for PCR cloning. Of the six enzymes included in the study (Taq polymerase, AccuPrime-Taq High Fidelity, KOD Hot Start, cloned Pfu polymerase, Phusion Hot Start, and Pwo polymerase), we find the lowest error rates with Pfu, Phusion, and Pwo polymerases. Error rates are comparable for these 3 enzymes and are >10x lower than the error rate observed with Taq polymerase. Mutation spectra are reported, with the 3 high fidelity enzymes displaying broadly similar types of mutations. For these enzymes, transition mutations predominate, with little bias observed for type of transition.

  17. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen).

    PubMed

    Rambaut, Andrew; Lam, Tommy T; Max Carvalho, Luiz; Pybus, Oliver G

    2016-01-01

    Gene sequences sampled at different points in time can be used to infer molecular phylogenies on a natural timescale of months or years, provided that the sequences in question undergo measurable amounts of evolutionary change between sampling times. Data sets with this property are termed heterochronous and have become increasingly common in several fields of biology, most notably the molecular epidemiology of rapidly evolving viruses. Here we introduce the cross-platform software tool, TempEst (formerly known as Path-O-Gen), for the visualization and analysis of temporally sampled sequence data. Given a molecular phylogeny and the dates of sampling for each sequence, TempEst uses an interactive regression approach to explore the association between genetic divergence through time and sampling dates. TempEst can be used to (1) assess whether there is sufficient temporal signal in the data to proceed with phylogenetic molecular clock analysis, and (2) identify sequences whose genetic divergence and sampling date are incongruent. Examination of the latter can help identify data quality problems, including errors in data annotation, sample contamination, sequence recombination, or alignment error. We recommend that all users of the molecular clock models implemented in BEAST first check their data using TempEst prior to analysis.
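    The regression at the heart of TempEst is ordinary least squares of root-to-tip divergence on sampling date: the slope estimates the substitution rate, the x-intercept estimates the date of the root, and large residuals flag sequences whose divergence and date are incongruent. Below is a minimal stand-in for that analysis, not TempEst's own implementation.

    ```python
    import numpy as np

    def temporal_signal(dates, divergences):
        """Root-to-tip regression; returns (rate, root date, residuals)."""
        dates = np.asarray(dates, dtype=float)
        div = np.asarray(divergences, dtype=float)
        slope, intercept = np.polyfit(dates, div, 1)
        residuals = div - (slope * dates + intercept)
        return slope, -intercept / slope, residuals
    ```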

  18. Impacts of visuomotor sequence learning methods on speed and accuracy: Starting over from the beginning or from the point of error.

    PubMed

    Tanaka, Kanji; Watanabe, Katsumi

    2016-02-01

    The present study examined whether sequence learning led to more accurate and shorter performance time if people who are learning a sequence start over from the beginning when they make an error (i.e., practice the whole sequence) or only from the point of error (i.e., practice a part of the sequence). We used a visuomotor sequence learning paradigm with a trial-and-error procedure. In Experiment 1, we found fewer errors, and shorter performance time for those who restarted their performance from the beginning of the sequence as compared to those who restarted from the point at which an error occurred, indicating better learning of spatial and motor representations of the sequence. This might be because the learned elements were repeated when the next performance started over from the beginning. In subsequent experiments, we increased the occasions for the repetitions of learned elements by modulating the number of fresh start points in the sequence after errors. The results showed that fewer fresh start points were likely to lead to fewer errors and shorter performance time, indicating that the repetitions of learned elements enabled participants to develop stronger spatial and motor representations of the sequence. Thus, a single or two fresh start points in the sequence (i.e., starting over only from the beginning or from the beginning or midpoint of the sequence after errors) is likely to lead to more accurate and faster performance. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data

    PubMed Central

    Soraggi, Samuele; Wiuf, Carsten; Albrechtsen, Anders

    2017-01-01

    The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat the problems of sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixtures. The power gain is most pronounced for low and medium sequencing depth (1–10×), and performances are as good as with perfectly called genotypes at a sequencing depth of 2×. We show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates. PMID:29196497
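    For orientation, the classical genotype-based statistic that this method generalizes is a simple ratio of discordant site-pattern counts across the four populations; the paper's contribution is to replace the single sampled base per individual with all reads from multiple individuals. The sketch below shows only the classical version.

    ```python
    def d_statistic(patterns):
        """patterns: 4-tuples of alleles (P1, P2, P3, outgroup), coded
        'A' (ancestral) / 'B' (derived). D near 0 is consistent with no
        admixture; an excess of ABBA or BABA indicates gene flow."""
        abba = sum(1 for p in patterns if p == ('A', 'B', 'B', 'A'))
        baba = sum(1 for p in patterns if p == ('B', 'A', 'B', 'A'))
        return (abba - baba) / (abba + baba)
    ```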

  20. Iterative updating of model error for Bayesian inversion

    NASA Astrophysics Data System (ADS)

    Calvetti, Daniela; Dunlop, Matthew; Somersalo, Erkki; Stuart, Andrew

    2018-02-01

    In computational inverse problems, it is common that a detailed and accurate forward model is approximated by a computationally less challenging substitute. The model reduction may be necessary to meet constraints in computing time when optimization algorithms are used to find a single estimate, or to speed up Markov chain Monte Carlo (MCMC) calculations in the Bayesian framework. The use of an approximate model introduces a discrepancy, or modeling error, that may have a detrimental effect on the solution of the ill-posed inverse problem, or it may severely distort the estimate of the posterior distribution. In the Bayesian paradigm, the modeling error can be considered as a random variable, and by using an estimate of the probability distribution of the unknown, one may estimate the probability distribution of the modeling error and incorporate it into the inversion. We introduce an algorithm which iterates this idea to update the distribution of the model error, leading to a sequence of posterior distributions that are demonstrated empirically to capture the underlying truth with increasing accuracy. Since the algorithm is not based on rejections, it requires only limited full model evaluations. We show analytically that, in the linear Gaussian case, the algorithm converges geometrically fast with respect to the number of iterations when the data is finite dimensional. For more general models, we introduce particle approximations of the iteratively generated sequence of distributions; we also prove that each element of the sequence converges in the large particle limit under a simplifying assumption. We show numerically that, as in the linear case, rapid convergence occurs with respect to the number of iterations. Additionally, we show through computed examples that point estimates obtained from this iterative algorithm are superior to those obtained by neglecting the model error.

  1. A Novel Genome-Information Content-Based Statistic for Genome-Wide Association Analysis Designed for Next-Generation Sequencing Data

    PubMed Central

    Luo, Li; Zhu, Yun

    2012-01-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T2, collapsing method, multivariate and collapsing (CMC) method, individual χ2 test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets. PMID:22651812

  2. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

    PubMed

    Luo, Li; Zhu, Yun; Xiong, Momiao

    2012-06-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T(2), collapsing method, multivariate and collapsing (CMC) method, individual χ(2) test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.

  3. Using Fault Trees to Advance Understanding of Diagnostic Errors.

    PubMed

    Rogith, Deevakar; Iyengar, M Sriram; Singh, Hardeep

    2017-11-01

    Diagnostic errors annually affect at least 5% of adults in the outpatient setting in the United States. Formal analytic techniques are only infrequently used to understand them, in part because of the complexity of diagnostic processes and clinical work flows involved. In this article, diagnostic errors were modeled using fault tree analysis (FTA), a form of root cause analysis that has been successfully used in other high-complexity, high-risk contexts. How factors contributing to diagnostic errors can be systematically modeled by FTA to inform error understanding and error prevention is demonstrated. A team of three experts reviewed 10 published cases of diagnostic error and constructed fault trees. The fault trees were modeled according to currently available conceptual frameworks characterizing diagnostic error. The 10 trees were then synthesized into a single fault tree to identify common contributing factors and pathways leading to diagnostic error. FTA is a visual, structured, deductive approach that depicts the temporal sequence of events and their interactions in a formal logical hierarchy. The visual FTA enables easier understanding of causative processes and cognitive and system factors, as well as rapid identification of common pathways and interactions in a unified fashion. In addition, it enables calculation of empirical estimates for causative pathways. Thus, fault trees might provide a useful framework for both quantitative and qualitative analysis of diagnostic errors. Future directions include establishing validity and reliability by modeling a wider range of error cases, conducting quantitative evaluations, and undertaking deeper exploration of other FTA capabilities. Copyright © 2017 The Joint Commission. Published by Elsevier Inc. All rights reserved.
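    The "empirical estimates for causative pathways" come from propagating basic-event probabilities through the tree's gates: independent inputs to an AND gate multiply, while an OR gate fires unless every input stays silent. The fragment below uses an entirely hypothetical tree and made-up probabilities, not one of the ten reviewed cases.

    ```python
    def or_gate(probs):
        """P(output) of an OR gate over independent basic events."""
        p_none = 1.0
        for q in probs:
            p_none *= (1.0 - q)
        return 1.0 - p_none

    def and_gate(probs):
        """P(output) of an AND gate over independent basic events."""
        p_all = 1.0
        for q in probs:
            p_all *= q
        return p_all

    # Hypothetical fragment: a missed diagnosis requires BOTH a failed
    # history-taking (caused by EITHER of two factors) AND a failed
    # safety-net review.
    p_missed = and_gate([or_gate([0.05, 0.02]), 0.10])   # ~0.0069
    ```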

  4. Fundamental Bounds for Sequence Reconstruction from Nanopore Sequencers.

    PubMed

    Magner, Abram; Duda, Jarosław; Szpankowski, Wojciech; Grama, Ananth

    2016-06-01

    Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to accurately reconstruct the true sequence with high probability? Our results provide a number of important insights: (i) the probability of accurate reconstruction of a sequence from a single sample in the presence of indel errors tends quickly (i.e., exponentially) to zero as the length of the sequence increases; and (ii) replicated extrusion is an effective technique for accurate reconstruction. We show that for typical distributions of indel errors, the required number of replicas is a slow function (polylogarithmic) of sequence length - implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Moreover, we show that in certain cases, the required number of replicas can be related to information-theoretic parameters of the indel error distributions.
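    Insight (i) is easy to verify numerically: if every base independently suffers an indel with probability p, a single pass reads the whole strand intact with probability (1 - p)^L, which decays exponentially in the read length L. The 5% rate below is an illustrative figure, not one taken from the paper.

    ```python
    def p_error_free(indel_rate, length):
        """Probability a single nanopore pass contains no indel."""
        return (1.0 - indel_rate) ** length

    # At a 5% per-base indel rate, a 100-base read survives intact with
    # probability ~0.6%, which is what motivates replicated extrusion.
    print(p_error_free(0.05, 100))
    ```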

  5. Primer ID Validates Template Sampling Depth and Greatly Reduces the Error Rate of Next-Generation Sequencing of HIV-1 Genomic RNA Populations

    PubMed Central

    Zhou, Shuntai; Jones, Corbin; Mieczkowski, Piotr

    2015-01-01

    Validating the sampling depth and reducing sequencing errors are critical for studies of viral populations using next-generation sequencing (NGS). We previously described the use of Primer ID to tag each viral RNA template with a block of degenerate nucleotides in the cDNA primer. We now show that low-abundance Primer IDs (offspring Primer IDs) are generated due to PCR/sequencing errors. These artifactual Primer IDs can be removed using a cutoff model for the number of reads required to make a template consensus sequence. We have modeled the fraction of sequences lost due to Primer ID resampling. For a typical sequencing run, less than 10% of the raw reads are lost to offspring Primer ID filtering and resampling. The remaining raw reads are used to correct for PCR resampling and sequencing errors. We also demonstrate that Primer ID reveals bias intrinsic to PCR, especially at low template input or utilization. cDNA synthesis and PCR convert ca. 20% of RNA templates into recoverable sequences, and 30-fold sequence coverage recovers most of these template sequences. We have directly measured the residual error rate to be around 1 in 10,000 nucleotides. We use this error rate and the Poisson distribution to define the cutoff to identify preexisting drug resistance mutations at low abundance in an HIV-infected subject. Collectively, these studies show that >90% of the raw sequence reads can be used to validate template sampling depth and to dramatically reduce the error rate in assessing a genetically diverse viral population using NGS. IMPORTANCE Although next-generation sequencing (NGS) has revolutionized sequencing strategies, it suffers from serious limitations in defining sequence heterogeneity in a genetically diverse population, such as HIV-1, due to PCR resampling and PCR/sequencing errors. The Primer ID approach reveals the true sampling depth and greatly reduces errors. Knowing the sampling depth allows the construction of a model of how to maximize the recovery of sequences from input templates and to reduce resampling of the Primer ID so that appropriate multiplexing can be included in the experimental design. With the defined sampling depth and measured error rate, we are able to assign cutoffs for the accurate detection of minority variants in viral populations. This approach allows the power of NGS to be realized without having to guess about sampling depth or to ignore the problem of PCR resampling, while also being able to correct most of the errors in the data set. PMID:26041299
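    The cutoff logic described above can be reconstructed roughly as follows, though this is an illustrative reading of the abstract rather than the authors' code: with a residual error rate near 1 in 10,000 and a given number of template consensus sequences, the number of erroneous copies of any particular mutation is approximately Poisson, and the detection threshold is the smallest count that residual error alone is unlikely to reach.

    ```python
    from math import exp, factorial

    def min_variant_count(n_templates, error_rate=1e-4, alpha=0.05):
        """Smallest count k such that P(Poisson(n*e) >= k) < alpha."""
        lam = n_templates * error_rate
        k, cdf = 0, 0.0
        while True:
            cdf += exp(-lam) * lam ** k / factorial(k)   # adds P(X == k)
            k += 1
            if 1.0 - cdf < alpha:       # P(X >= k) now below alpha
                return k
    ```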

  6. Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data.

    PubMed

    Soraggi, Samuele; Wiuf, Carsten; Albrechtsen, Anders

    2018-02-02

    The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat the problems of sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixtures. The power gain is most pronounced for low and medium sequencing depth (1-10×), and performances are as good as with perfectly called genotypes at a sequencing depth of 2×. We show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates. Copyright © 2018 Soraggi et al.

  7. Improved Quality in Aerospace Testing Through the Modern Design of Experiments

    NASA Technical Reports Server (NTRS)

    DeLoach, R.

    2000-01-01

    This paper illustrates how, in the presence of systematic error, the quality of an experimental result can be influenced by the order in which the independent variables are set. It is suggested that in typical experimental circumstances in which systematic errors are significant, the common practice of organizing the set point order of independent variables to maximize data acquisition rate results in a test matrix that fails to produce the highest quality research result. With some care to match the volume of data required to satisfy inference error risk tolerances, it is possible to accept a lower rate of data acquisition and still produce results of higher technical quality (lower experimental error) with less cost and in less time than conventional test procedures, simply by optimizing the sequence in which independent variable levels are set.

  8. Error correcting coding-theory for structured light illumination systems

    NASA Astrophysics Data System (ADS)

    Porras-Aguilar, Rosario; Falaggis, Konstantinos; Ramos-Garcia, Ruben

    2017-06-01

    Intensity discrete structured light illumination systems project a series of projection patterns for the estimation of the absolute fringe order using only the temporal grey-level sequence at each pixel. This work proposes the use of error-correcting codes for pixel-wise correction of measurement errors. The use of an error correcting code is advantageous in many ways: it allows reducing the effect of random intensity noise, it corrects outliers near the fringe borders that are commonly present when using intensity discrete patterns, and it provides robustness in case of severe measurement errors (even for burst errors where whole frames are lost). The latter aspect is particularly interesting in environments with varying ambient light, as well as in safety-critical applications such as the monitoring of deformations of components in nuclear power plants, where high reliability is ensured even in case of short measurement disruptions. A special form of burst errors is the so-called salt and pepper noise, which can largely be removed with error correcting codes using only the information of a given pixel. The performance of this technique is evaluated using both simulations and experiments.
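    As one concrete instance of the idea (the abstract does not commit to this particular code), the temporal grey-level sequence at each pixel can carry a Hamming(7,4) block code: four fringe-order bits are projected as seven frames, and any single corrupted frame is repaired at decode time using only that pixel's own sequence.

    ```python
    def hamming74_encode(d1, d2, d3, d4):
        """Encode 4 fringe-order bits into a 7-frame grey-level sequence."""
        p1 = d1 ^ d2 ^ d4
        p2 = d1 ^ d3 ^ d4
        p3 = d2 ^ d3 ^ d4
        return [p1, p2, d1, p3, d2, d3, d4]

    def hamming74_decode(code):
        """Correct at most one corrupted frame, then return the data bits."""
        c = list(code)
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
        pos = s1 + 2 * s2 + 4 * s3       # 0 means no single-bit error
        if pos:
            c[pos - 1] ^= 1
        return c[2], c[4], c[5], c[6]

    word = hamming74_encode(1, 0, 1, 1)
    word[5] ^= 1                         # one frame corrupted at this pixel
    assert hamming74_decode(word) == (1, 0, 1, 1)
    ```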

  9. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

    PubMed

    Schirmer, Melanie; Ijaz, Umer Z; D'Amore, Rosalinda; Hall, Neil; Sloan, William T; Quince, Christopher

    2015-03-31

    With read lengths of currently up to 2 × 300 bp, high throughput, and low sequencing costs, Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Local alignment of two-base encoded DNA sequence

    PubMed Central

    Homer, Nils; Merriman, Barry; Nelson, Stanley F

    2009-01-01

    Background DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732
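    A short sketch of the encoding clarifies why decoding must be performed jointly with alignment: each color encodes a transition between adjacent bases, so one color measurement error corrupts every decoded base downstream of it. The mapping below is the standard di-base code (the color is the XOR of the two 2-bit base codes); the demonstration at the end is illustrative.

    ```python
    CODE = {'AA': 0, 'CC': 0, 'GG': 0, 'TT': 0,
            'AC': 1, 'CA': 1, 'GT': 1, 'TG': 1,
            'AG': 2, 'GA': 2, 'CT': 2, 'TC': 2,
            'AT': 3, 'TA': 3, 'CG': 3, 'GC': 3}
    DECODE = {(pair[0], color): pair[1] for pair, color in CODE.items()}

    def encode(seq):
        """DNA sequence -> list of colors, one per adjacent base pair."""
        return [CODE[seq[i:i + 2]] for i in range(len(seq) - 1)]

    def decode(first_base, colors):
        """Colors -> DNA, given a known leading base (the primer base)."""
        bases = [first_base]
        for c in colors:
            bases.append(DECODE[(bases[-1], c)])
        return ''.join(bases)

    colors = encode('ATGGTC')
    assert decode('A', colors) == 'ATGGTC'
    colors[1] ^= 1                   # a single measurement error...
    print(decode('A', colors))       # ...corrupts every base after it
    ```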

  11. Technical Report on Modeling for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McLoughlin, K.

    2016-01-11

    The overall aim of this project is to develop a software package, called MetaQuant, that can determine the constituents of a complex microbial sample and estimate their relative abundances by analysis of metagenomic sequencing data. The goal for Task 1 is to create a generative model describing the stochastic process underlying the creation of sequence read pairs in the data set. The stages in this generative process include the selection of a source genome sequence for each read pair, with probability dependent on its abundance in the sample. The other stages describe the evolution of the source genome from its nearest common ancestor with a reference genome, breakage of the source DNA into short fragments, and the errors in sequencing the ends of the fragments to produce read pairs.

  12. Correcting for sequencing error in maximum likelihood phylogeny inference.

    PubMed

    Kuhner, Mary K; McGill, James

    2014-11-04

    Accurate phylogenies are critical to taxonomy as well as studies of speciation processes and other evolutionary patterns. Accurate branch lengths in phylogenies are critical for dating and rate measurements. Such accuracy may be jeopardized by unacknowledged sequencing error. We use simulated data to test a correction for DNA sequencing error in maximum likelihood phylogeny inference. Over a wide range of data polymorphism and true error rate, we found that correcting for sequencing error improves recovery of the branch lengths, even if the assumed error rate is up to twice the true error rate. Low error rates have little effect on recovery of the topology. When error is high, correction improves topological inference; however, when error is extremely high, using an assumed error rate greater than the true error rate leads to poor recovery of both topology and branch lengths. The error correction approach tested here was proposed in 2004 but has not been widely used, perhaps because researchers do not want to commit to an estimate of the error rate. This study shows that correction with an approximate error rate is generally preferable to ignoring the issue. Copyright © 2014 Kuhner and McGill.
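    The correction being tested is conceptually simple: instead of treating the observed base at a tip as certain, the tip's partial likelihoods are mixed over the possible true bases under an assumed error rate. Below is a minimal rendering of that ingredient, a sketch assuming a uniform error model rather than the paper's code.

    ```python
    def tip_error_matrix(eps):
        """M[t][b] = P(observe base b | true base t): correct with
        probability 1 - eps, each of the three wrong bases with eps / 3.
        In Felsenstein pruning, the tip partial for an observed base b is
        the column M[:, b] instead of the usual 0/1 indicator vector."""
        return [[1.0 - eps if b == t else eps / 3.0 for b in range(4)]
                for t in range(4)]
    ```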

  13. On the Power and the Systematic Biases of the Detection of Chromosomal Inversions by Paired-End Genome Sequencing

    PubMed Central

    Lucas Lledó, José Ignacio; Cáceres, Mario

    2013-01-01

    One of the most used techniques to study structural variation at a genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base-qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is very appropriate to detect a wide range of inversions, even with low coverage data. However, % of inversions located between segmental duplications are expected to go undetected by the most common sequencing strategies. In general, longer DNA libraries improve the detectability of inversions far better than increments of the coverage depth or the read length. Finally, we review the performance of three algorithms to detect inversions (SVDetect, GRIAL, and VariationHunter), identify common pitfalls, and reveal important differences in their breakpoint precisions. These results stress the importance of the sequencing strategy for the detection of structural variants, especially inversions, and offer guidelines for the design of future genome sequencing projects. PMID:23637806

  14. Improve homology search sensitivity of PacBio data by correcting frameshifts.

    PubMed

    Du, Nan; Sun, Yanni

    2016-09-01

    Single-molecule, real-time sequencing (SMRT) developed by Pacific BioSciences produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data has a high sequencing error rate, and most of the errors are insertion or deletion errors. During alignment-based homology search, insertion or deletion errors in genes will cause frameshifts and may only lead to marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random alignments and the ambiguity will incur errors in structural and functional annotation. Existing frameshift correction tools are designed for data with much lower error rate and are not optimized for PacBio data. As an increasing number of groups are using SMRT, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets of low sequencing coverage. In addition, we can correct more errors when comparing with a popular error correction tool that does not rely on hybrid sequencing. The source code is freely available at https://sourceforge.net/projects/frame-pro/. Contact: yannisun@msu.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  15. Factors That Affect Large Subunit Ribosomal DNA Amplicon Sequencing Studies of Fungal Communities: Classification Method, Primer Choice, and Error

    PubMed Central

    Porter, Teresita M.; Golding, G. Brian

    2012-01-01

    Nuclear large subunit ribosomal DNA is widely used in fungal phylogenetics and to an increasing extent also amplicon-based environmental sequencing. The relatively short reads produced by next-generation sequencing, however, makes primer choice and sequence error important variables for obtaining accurate taxonomic classifications. In this simulation study we tested the performance of three classification methods: 1) a similarity-based method (BLAST + Metagenomic Analyzer, MEGAN); 2) a composition-based method (Ribosomal Database Project naïve Bayesian classifier, NBC); and, 3) a phylogeny-based method (Statistical Assignment Package, SAP). We also tested the effects of sequence length, primer choice, and sequence error on classification accuracy and perceived community composition. Using a leave-one-out cross validation approach, results for classifications to the genus rank were as follows: BLAST + MEGAN had the lowest error rate and was particularly robust to sequence error; SAP accuracy was highest when long LSU query sequences were classified; and, NBC runs significantly faster than the other tested methods. All methods performed poorly with the shortest 50–100 bp sequences. Increasing simulated sequence error reduced classification accuracy. Community shifts were detected due to sequence error and primer selection even though there was no change in the underlying community composition. Short read datasets from individual primers, as well as pooled datasets, appear to only approximate the true community composition. We hope this work informs investigators of some of the factors that affect the quality and interpretation of their environmental gene surveys. PMID:22558215

  16. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    PubMed Central

    Matochko, Wadim L.; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the i-th sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N × N matrix and a stochastic sampling operator (S_a). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of S_a and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = S_a I_N, where I_N is an N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
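    The vector-and-operator framework translates directly into a few lines of numpy. Everything below is a toy rendering with made-up numbers: a tiny diversity N (a 7-mer peptide library would have N = 20^7), a Poisson diagonal standing in for the stochastic sampling operator, and one arbitrarily censored sequence.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    N = 10                                     # toy theoretical diversity
    n = rng.integers(0, 1000, size=N).astype(float)   # frequency vector

    # Stochastic sampling operator S_a: a random diagonal matrix modeling
    # how deeply each clone happens to be sampled.
    S_a = np.diag(rng.poisson(0.8, size=N).astype(float))

    I_N = np.eye(N)
    reads = S_a @ I_N @ n                      # unbiased, error-free Seq

    # A censorship matrix CEN eliminates (or down-weights) specific
    # sequences that the sequencing process systematically drops.
    CEN = np.eye(N)
    CEN[3, 3] = 0.0                            # hypothetical censored clone
    censored_reads = S_a @ CEN @ n
    ```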

  17. Accounting for Errors in Low Coverage High-Throughput Sequencing Data When Constructing Genetic Maps Using Biparental Outcrossed Populations

    PubMed Central

    Bilton, Timothy P.; Schofield, Matthew R.; Black, Michael A.; Chagné, David; Wilcox, Phillip L.; Dodds, Ken G.

    2018-01-01

    Next-generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high-density genetic linkage maps, which facilitate the development of nonmodel species’ genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology (e.g., genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sibling family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sibling populations of diploid species, implemented in a package called GUSMap. Our model is based on the Lander–Green hidden Markov model but extended to account for errors present in sequencing data. We were able to obtain accurate estimates of the recombination fractions and overall map distance using GUSMap, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model. PMID:29487138

  18. Accounting for Errors in Low Coverage High-Throughput Sequencing Data When Constructing Genetic Maps Using Biparental Outcrossed Populations.

    PubMed

    Bilton, Timothy P; Schofield, Matthew R; Black, Michael A; Chagné, David; Wilcox, Phillip L; Dodds, Ken G

    2018-05-01

    Next-generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high-density genetic linkage maps, which facilitate the development of nonmodel species' genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology (e.g., genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sibling family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sibling populations of diploid species, implemented in a package called GUSMap. Our model is based on the Lander-Green hidden Markov model but extended to account for errors present in sequencing data. We were able to obtain accurate estimates of the recombination fractions and overall map distance using GUSMap, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model. Copyright © 2018 Bilton et al.
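    The modeling ingredient that makes this work is the emission probability linking a true genotype to the observed allele counts at low depth; at depth one, for instance, a heterozygote is indistinguishable from a homozygote, and the emission reflects that. The sketch below is a hedged simplification (GUSMap's actual model is richer; the genotype coding and function here are illustrative).

    ```python
    from math import comb

    def emission(ref_reads, alt_reads, genotype, eps):
        """P(observed allele counts | true genotype) for a diploid, with
        genotype coded as the number of alternate alleles (0, 1, 2) and a
        per-read sequencing error rate eps. For a heterozygote, each read
        samples one of the two alleles at random."""
        n = ref_reads + alt_reads
        p_alt = {0: eps, 1: 0.5, 2: 1.0 - eps}[genotype]
        return comb(n, alt_reads) * p_alt ** alt_reads * (1 - p_alt) ** (n - alt_reads)

    # At depth 1, a single reference read has probability 0.5 under a
    # heterozygote -- exactly the ambiguity the map model must absorb.
    ```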

  19. Suppressing relaxation in superconducting qubits by quasiparticle pumping.

    PubMed

    Gustavsson, Simon; Yan, Fei; Catelani, Gianluigi; Bylander, Jonas; Kamal, Archana; Birenbaum, Jeffrey; Hover, David; Rosenberg, Danna; Samach, Gabriel; Sears, Adam P; Weber, Steven J; Yoder, Jonilyn L; Clarke, John; Kerman, Andrew J; Yoshihara, Fumiki; Nakamura, Yasunobu; Orlando, Terry P; Oliver, William D

    2016-12-23

    Dynamical error suppression techniques are commonly used to improve coherence in quantum systems. They reduce dephasing errors by applying control pulses designed to reverse erroneous coherent evolution driven by environmental noise. However, such methods cannot correct for irreversible processes such as energy relaxation. We investigate a complementary, stochastic approach to reducing errors: Instead of deterministically reversing the unwanted qubit evolution, we use control pulses to shape the noise environment dynamically. In the context of superconducting qubits, we implement a pumping sequence to reduce the number of unpaired electrons (quasiparticles) in close proximity to the device. A 70% reduction in the quasiparticle density results in a threefold enhancement in qubit relaxation times and a comparable reduction in coherence variability. Copyright © 2016, American Association for the Advancement of Science.

  20. Processing Dynamic Image Sequences from a Moving Sensor.

    DTIC Science & Technology

    1984-02-01

    (Only table-of-contents fragments of this report survive in the record: figure and table listings for a roadsign image sequence, a roadsign sequence with redundant features, roadsign subimages, and selected-feature error and local-search values for roadsign and industrial images.)

  1. Method and Apparatus for Evaluating the Visual Quality of Processed Digital Video Sequences

    NASA Technical Reports Server (NTRS)

    Watson, Andrew B. (Inventor)

    2002-01-01

    A Digital Video Quality (DVQ) apparatus and method that incorporate a model of human visual sensitivity to predict the visibility of artifacts. The DVQ method and apparatus are used for the evaluation of the visual quality of processed digital video sequences and for adaptively controlling the bit rate of the processed digital video sequences without compromising the visual quality. The DVQ apparatus minimizes the required amount of memory and computation. The input to the DVQ apparatus is a pair of color image sequences: an original (R) non-compressed sequence, and a processed (T) sequence. Both sequences (R) and (T) are sampled, cropped, and subjected to color transformations. The sequences are then subjected to blocking and discrete cosine transformation, and the results are transformed to local contrast. The next step is a time filtering operation which implements the human sensitivity to different time frequencies. The results are converted to threshold units by dividing each discrete cosine transform coefficient by its respective visual threshold. At the next stage the two sequences are subtracted to produce an error sequence. The error sequence is subjected to a contrast masking operation, which also depends upon the reference sequence (R). The masked errors can be pooled in various ways to illustrate the perceptual error over various dimensions, and the pooled error can be converted to a visual quality measure.

  2. Ultra-deep mutant spectrum profiling: improving sequencing accuracy using overlapping read pairs.

    PubMed

    Chen-Harris, Haiyin; Borucki, Monica K; Torres, Clinton; Slezak, Tom R; Allen, Jonathan E

    2013-02-12

    High throughput sequencing is beginning to make a transformative impact in the area of viral evolution. Deep sequencing has the potential to reveal the mutant spectrum within a viral sample at high resolution, thus enabling the close examination of viral mutational dynamics both within- and between-hosts. The challenge however, is to accurately model the errors in the sequencing data and differentiate real viral mutations, particularly those that exist at low frequencies, from sequencing errors. We demonstrate that overlapping read pairs (ORP) -- generated by combining short fragment sequencing libraries and longer sequencing reads -- significantly reduce sequencing error rates and improve rare variant detection accuracy. Using this sequencing protocol and an error model optimized for variant detection, we are able to capture a large number of genetic mutations present within a viral population at ultra-low frequency levels (<0.05%). Our rare variant detection strategies have important implications beyond viral evolution and can be applied to any basic and clinical research area that requires the identification of rare mutations.
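    The error-reduction step at the heart of the ORP protocol is a position-by-position comparison of the two reads over their shared overlap. Below is a simplified sketch; the published pipeline works on aligned reads and models errors statistically, whereas here disagreements are resolved by a hypothetical quality margin or masked outright.

    ```python
    def merge_overlap(bases1, quals1, bases2, quals2):
        """Merge the overlapping stretch of a read pair (bases + Phred
        qualities, both already on the same strand and coordinates)."""
        merged = []
        for b1, q1, b2, q2 in zip(bases1, quals1, bases2, quals2):
            if b1 == b2:
                merged.append(b1)                 # both reads agree
            elif abs(q1 - q2) >= 20:              # hypothetical margin
                merged.append(b1 if q1 > q2 else b2)
            else:
                merged.append('N')                # irresolvable conflict
        return ''.join(merged)
    ```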

  3. Design of RNA splicing analysis null models for post hoc filtering of Drosophila head RNA-Seq data with the splicing analysis kit (Spanki)

    PubMed Central

    2013-01-01

    Background The production of multiple transcript isoforms from one gene is a major source of transcriptome complexity. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms. However, methods to analyze splicing are underdeveloped and errors resulting in incorrect splicing calls occur in every experiment. Results We used RNA-Seq data to develop sequencing and aligner error models. By applying these error models to known input from simulations, we found that errors result from false alignment to minor splice motifs and antisense strands, shifted junction positions, paralog joining, and repeat induced gaps. By using a series of quantitative and qualitative filters, we eliminated diagnosed errors in the simulation, and applied this to RNA-Seq data from Drosophila melanogaster heads. We used high-confidence junction detections to specifically interrogate local splicing differences between transcripts. This method out-performed commonly used RNA-seq methods to identify known alternative splicing events in the Drosophila sex determination pathway. We describe a flexible software package to perform these tasks called Splicing Analysis Kit (Spanki), available at http://www.cbcb.umd.edu/software/spanki. Conclusions Splice-junction centric analysis of RNA-Seq data provides advantages in specificity for detection of alternative splicing. Our software provides tools to better understand error profiles in RNA-Seq data and improve inference from this new technology. The splice-junction centric approach that this software enables will provide more accurate estimates of differentially regulated splicing than current tools. PMID:24209455

  4. Design of RNA splicing analysis null models for post hoc filtering of Drosophila head RNA-Seq data with the splicing analysis kit (Spanki).

    PubMed

    Sturgill, David; Malone, John H; Sun, Xia; Smith, Harold E; Rabinow, Leonard; Samson, Marie-Laure; Oliver, Brian

    2013-11-09

    The production of multiple transcript isoforms from one gene is a major source of transcriptome complexity. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms. However, methods to analyze splicing are underdeveloped and errors resulting in incorrect splicing calls occur in every experiment. We used RNA-Seq data to develop sequencing and aligner error models. By applying these error models to known input from simulations, we found that errors result from false alignment to minor splice motifs and antisense strands, shifted junction positions, paralog joining, and repeat induced gaps. By using a series of quantitative and qualitative filters, we eliminated diagnosed errors in the simulation, and applied this to RNA-Seq data from Drosophila melanogaster heads. We used high-confidence junction detections to specifically interrogate local splicing differences between transcripts. This method out-performed commonly used RNA-seq methods to identify known alternative splicing events in the Drosophila sex determination pathway. We describe a flexible software package to perform these tasks called Splicing Analysis Kit (Spanki), available at http://www.cbcb.umd.edu/software/spanki. Splice-junction centric analysis of RNA-Seq data provides advantages in specificity for detection of alternative splicing. Our software provides tools to better understand error profiles in RNA-Seq data and improve inference from this new technology. The splice-junction centric approach that this software enables will provide more accurate estimates of differentially regulated splicing than current tools.

  5. Stability of iterative procedures with errors for approximating common fixed points of a couple of q-contractive-like mappings in Banach spaces

    NASA Astrophysics Data System (ADS)

    Zeng, Lu-Chuan; Yao, Jen-Chih

    2006-09-01

    Recently, Agarwal, Cho, Li and Huang [R.P. Agarwal, Y.J. Cho, J. Li, N.J. Huang, Stability of iterative procedures with errors approximating common fixed points for a couple of quasi-contractive mappings in q-uniformly smooth Banach spaces, J. Math. Anal. Appl. 272 (2002) 435-447] introduced the new iterative procedures with errors for approximating the common fixed point of a couple of quasi-contractive mappings and showed the stability of these iterative procedures with errors in Banach spaces. In this paper, we introduce a new concept of a couple of q-contractive-like mappings (q>1) in a Banach space and apply these iterative procedures with errors for approximating the common fixed point of the couple of q-contractive-like mappings. The results established in this paper improve, extend and unify the corresponding ones of Agarwal, Cho, Li and Huang [R.P. Agarwal, Y.J. Cho, J. Li, N.J. Huang, Stability of iterative procedures with errors approximating common fixed points for a couple of quasi-contractive mappings in q-uniformly smooth Banach spaces, J. Math. Anal. Appl. 272 (2002) 435-447], Chidume [C.E. Chidume, Approximation of fixed points of quasi-contractive mappings in Lp spaces, Indian J. Pure Appl. Math. 22 (1991) 273-386], Chidume and Osilike [C.E. Chidume, M.O. Osilike, Fixed points iterations for quasi-contractive maps in uniformly smooth Banach spaces, Bull. Korean Math. Soc. 30 (1993) 201-212], Liu [Q.H. Liu, On Naimpally and Singh's open questions, J. Math. Anal. Appl. 124 (1987) 157-164; Q.H. Liu, A convergence theorem of the sequence of Ishikawa iterates for quasi-contractive mappings, J. Math. Anal. Appl. 146 (1990) 301-305], Osilike [M.O. Osilike, A stable iteration procedure for quasi-contractive maps, Indian J. Pure Appl. Math. 27 (1996) 25-34; M.O. Osilike, Stability of the Ishikawa iteration method for quasi-contractive maps, Indian J. Pure Appl. Math. 28 (1997) 1251-1265] and many others in the literature.
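    For readers outside this literature, the "iterative procedures with errors" referred to throughout take, in one common Ishikawa-type form for a couple of mappings (S, T) with error sequences {u_n}, {v_n} (this is the generic scheme only; the cited papers state their own precise conditions on the parameter and error sequences):

    ```latex
    \begin{aligned}
      y_n     &= (1 - \beta_n)\,x_n + \beta_n T x_n + v_n, \\
      x_{n+1} &= (1 - \alpha_n)\,x_n + \alpha_n S y_n + u_n, \qquad n \ge 0,
    \end{aligned}
    ```

    and stability asks whether orbits perturbed by small errors still converge to the common fixed point of S and T.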

  6. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems

    PubMed Central

    2011-01-01

    Background The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. Results We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range. Conclusions The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low abundance polymorphisms. PMID:22067484

  7. Prefrontal neural correlates of memory for sequences.

    PubMed

    Averbeck, Bruno B; Lee, Daeyeol

    2007-02-28

    The sequence of actions appropriate to solve a problem often needs to be discovered by trial and error and recalled in the future when faced with the same problem. Here, we show that when monkeys had to discover and then remember a sequence of decisions across trials, ensembles of prefrontal cortex neurons reflected the sequence of decisions the animal would make throughout the interval between trials. This signal could reflect either an explicit memory process or a sequence-planning process that begins far in advance of the actual sequence execution. This finding extended to error trials such that, when the neural activity during the intertrial interval specified the wrong sequence, the animal also attempted to execute an incorrect sequence. More specifically, we used a decoding analysis to predict the sequence the monkey was planning to execute at the end of the fore-period, just before sequence execution. When this analysis was applied to error trials, we were able to predict where in the sequence the error would occur, up to three movements into the future. This suggests that prefrontal neural activity can retain information about sequences between trials, and that regardless of whether information is remembered correctly or incorrectly, the prefrontal activity veridically reflects the animal's action plan.

  8. Evaluation of exome variants using the Ion Proton Platform to sequence error-prone regions.

    PubMed

    Seo, Heewon; Park, Yoomi; Min, Byung Joo; Seo, Myung Eui; Kim, Ju Han

    2017-01-01

    The Ion Proton sequencer from Thermo Fisher accurately determines sequence variants from target regions with a rapid turnaround time at a low cost. However, misleading variant-calling errors can occur. We performed a systematic evaluation and manual curation of read-level alignments for the 675 ultrarare variants reported by the Ion Proton sequencer from 27 whole-exome sequencing data sets that are present in neither the 1000 Genomes Project nor the Exome Aggregation Consortium. We classified positive variant calls into 393 highly likely false positives, 126 likely false positives, and 156 likely true positives, which comprised 58.2%, 18.7%, and 23.1% of the variants, respectively. We identified four distinct error patterns of variant calling that may be bioinformatically corrected when using different strategies: simplicity region, SNV cluster, peripheral sequence read, and base inversion. Local de novo assembly successfully corrected 201 (38.7%) of the 519 highly likely or likely false positives. We also demonstrate that the two sequencing kits from Thermo Fisher (the Ion PI Sequencing 200 kit V3 and the Ion PI Hi-Q kit) exhibit different error profiles across different error types. A refined calling algorithm, together with a better polymerase, may improve the performance of the Ion Proton sequencing platform.

  9. Sequence-structure mapping errors in the PDB: OB-fold domains

    PubMed Central

    Venclovas, Česlovas; Ginalski, Krzysztof; Kang, Chulhee

    2004-01-01

    The Protein Data Bank (PDB) is the single most important repository of structural data for proteins and other biologically relevant molecules. Therefore, it is critically important to keep the PDB data, as much as possible, error-free. In this study, we have analyzed PDB crystal structures possessing oligonucleotide/oligosaccharide binding (OB)-fold, one of the highly populated folds, for the presence of sequence-structure mapping errors. Using energy-based structure quality assessment coupled with sequence analyses, we have found that there are at least five OB-structures in the PDB that have regions where sequences have been incorrectly mapped onto the structure. We have demonstrated that the combination of these computation techniques is effective not only in detecting sequence-structure mapping errors, but also in providing guidance to correct them. Namely, we have used results of computational analysis to direct a revision of X-ray data for one of the PDB entries containing a fairly inconspicuous sequence-structure mapping error. The revised structure has been deposited with the PDB. We suggest use of computational energy assessment and sequence analysis techniques to facilitate structure determination when homologs having known structure are available to use as a reference. Such computational analysis may be useful in either guiding the sequence-structure assignment process or verifying the sequence mapping within poorly defined regions. PMID:15133161

  10. Reproducibility of Illumina platform deep sequencing errors allows accurate determination of DNA barcodes in cells.

    PubMed

    Beltman, Joost B; Urbanus, Jos; Velds, Arno; van Rooij, Nienke; Rohr, Jan C; Naik, Shalin H; Schumacher, Ton N

    2016-04-02

    Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations that can both be used to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue, for instance, applies to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells, by supplying these with unique heritable DNA tags. Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences. Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets.
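
    A minimal sketch of that filtering idea, under our illustrative assumption that a spurious barcode is a one-mismatch neighbor of an abundant parent whose relative abundance is roughly constant across samples sequenced in parallel; counts and the threshold are made up:

        # Flag candidate barcodes whose ratio to a one-mismatch "parent"
        # barcode is roughly constant across samples, consistent with a
        # reproducible sequencing error rather than a real clone.

        def hamming1(a, b):
            return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

        def is_spurious(candidate, parent, counts, max_cv=0.3):
            """counts: dict barcode -> list of per-sample read counts."""
            if not hamming1(candidate, parent):
                return False
            ratios = [c / p for c, p in zip(counts[candidate], counts[parent]) if p > 0]
            if len(ratios) < 2:
                return False
            mean = sum(ratios) / len(ratios)
            var = sum((r - mean) ** 2 for r in ratios) / (len(ratios) - 1)
            cv = (var ** 0.5) / mean if mean > 0 else float("inf")
            return cv < max_cv  # near-constant error ratio -> likely spurious

        counts = {"ACGT": [10000, 8000, 12000], "ACGA": [11, 9, 13]}
        print(is_spurious("ACGA", "ACGT", counts))  # True: ~0.11% in every sample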

  11. Portable and Error-Free DNA-Based Data Storage.

    PubMed

    Yazdi, S M Hossein Tabatabaei; Gabrys, Ryan; Milenkovic, Olgica

    2017-07-10

    DNA-based data storage is an emerging nonvolatile memory technology of potentially unprecedented density, durability, and replication efficiency. The basic system implementation steps include synthesizing DNA strings that contain user information and subsequently retrieving them via high-throughput sequencing technologies. Existing architectures enable reading and writing but do not offer random-access and error-free data recovery from low-cost, portable devices, which is crucial for making the storage technology competitive with classical recorders. Here we show for the first time that a portable, random-access platform may be implemented in practice using nanopore sequencers. The novelty of our approach is to design an integrated processing pipeline that encodes data to avoid costly synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable sequencing via new iterative alignment and deletion error-correcting codes. Our work represents the only known random access DNA-based data storage system that uses error-prone nanopore sequencers, while still producing error-free readouts with the highest reported information rate/density. As such, it represents a crucial step towards practical employment of DNA molecules as storage media.

  12. A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS

    PubMed Central

    Jiao, Xiaoli; Zheng, Xin; Ma, Liang; Kutty, Geetha; Gogineni, Emile; Sun, Qiang; Sherman, Brad T.; Hu, Xiaojun; Jones, Kristine; Raley, Castle; Tran, Bao; Munroe, David J.; Stephens, Robert; Liang, Dun; Imamichi, Tomozumi; Kovacs, Joseph A.; Lempicki, Richard A.; Huang, Da Wei

    2013-01-01

    PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-well sequencing technology that can generate very long reads (up to 20 kb), in contrast to the shorter reads produced by first- and second-generation sequencing technologies. As a new platform, it is important to assess its sequencing error rate, as well as the quality control (QC) parameters associated with PacBio sequence data. In this study, a mixture of 10 previously characterized, closely related DNA amplicons was sequenced using the PacBio RS sequencing platform. After aligning Circular Consensus Sequence (CCS) reads derived from the above sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC, and improved to 1.3% with an SVM-based multi-parameter QC method. In addition, de novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are already error-corrected, it is still necessary to perform appropriate QC on them in order to produce successful downstream bioinformatics analytical results. PMID:24179701

  13. A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS.

    PubMed

    Jiao, Xiaoli; Zheng, Xin; Ma, Liang; Kutty, Geetha; Gogineni, Emile; Sun, Qiang; Sherman, Brad T; Hu, Xiaojun; Jones, Kristine; Raley, Castle; Tran, Bao; Munroe, David J; Stephens, Robert; Liang, Dun; Imamichi, Tomozumi; Kovacs, Joseph A; Lempicki, Richard A; Huang, Da Wei

    2013-07-31

    PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-well sequencing technology that can generate very long reads (up to 20 kb), in contrast to the shorter reads produced by first- and second-generation sequencing technologies. As a new platform, it is important to assess its sequencing error rate, as well as the quality control (QC) parameters associated with PacBio sequence data. In this study, a mixture of 10 previously characterized, closely related DNA amplicons was sequenced using the PacBio RS sequencing platform. After aligning Circular Consensus Sequence (CCS) reads derived from the above sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC, and improved to 1.3% with an SVM-based multi-parameter QC method. In addition, de novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are already error-corrected, it is still necessary to perform appropriate QC on them in order to produce successful downstream bioinformatics analytical results.
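
    The SVM-based QC step can be sketched as follows; the feature set (read length, mean base quality, number of CCS passes), the training labels, and the data are hypothetical placeholders, not the study's actual QC parameters:

        # Sketch of SVM-based read QC: train on reads labeled good/bad by
        # alignment to known references, then filter new CCS reads.
        import numpy as np
        from sklearn.svm import SVC

        # columns: [read length, mean base quality, number of CCS passes]
        X_train = np.array([[450, 35, 8], [460, 34, 7], [300, 18, 2], [280, 20, 3]])
        y_train = np.array([1, 1, 0, 0])  # 1 = acceptable read, 0 = reject

        clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
        X_new = np.array([[440, 33, 6], [290, 19, 2]])
        print(clf.predict(X_new))  # keep reads predicted 1, drop the rest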

  14. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin.

    PubMed

    Bokulich, Nicholas A; Kaehler, Benjamin D; Rideout, Jai Ram; Dillon, Matthew; Bolyen, Evan; Knight, Rob; Huttley, Gavin A; Gregory Caporaso, J

    2018-05-17

    Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit ( https://github.com/caporaso-lab/tax-credit-data ). Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
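
    The underlying naive Bayes idea can be illustrated outside QIIME 2: represent each sequence by its k-mer profile and fit a multinomial naive Bayes model. The toy reference sequences and taxon labels below are fabricated; the plugin itself wraps this kind of scikit-learn pipeline behind QIIME 2 interfaces.

        # Toy k-mer-based naive Bayes taxonomy classification, in the spirit
        # of q2-feature-classifier's classify-sklearn method.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        def kmers(seq, k=4):
            return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

        refs = ["ACGTACGTGGCCAACG", "TTGACCTGAAGGTTGA", "ACGTACGAGGCCAACG"]
        taxa = ["g__Escherichia", "g__Bacillus", "g__Escherichia"]

        model = make_pipeline(CountVectorizer(analyzer="word"), MultinomialNB())
        model.fit([kmers(r) for r in refs], taxa)
        print(model.predict([kmers("ACGTACGTGGCCAACT")]))  # -> ['g__Escherichia']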

  15. DNA/RNA transverse current sequencing: intrinsic structural noise from neighboring bases

    PubMed Central

    Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.

    2015-01-01

    Nanopore DNA sequencing via transverse current has emerged as a promising candidate for third-generation sequencing technology. It produces long read lengths which could alleviate problems with assembly errors inherent in current technologies. However, the high error rates of nanopore sequencing have to be addressed. A very important source of the error is the intrinsic noise in the current arising from carrier dispersion along the chain of the molecule, i.e., from the influence of neighboring bases. In this work we perform calculations of the transverse current within an effective multi-orbital tight-binding model derived from first-principles calculations of the DNA/RNA molecules, to study the effect of this structural noise on the error rates in DNA/RNA sequencing via transverse current in nanopores. We demonstrate that a statistical technique, utilizing not only the currents through the nucleotides but also the correlations in the currents, can in principle reduce the error rate below any desired precision. PMID:26150827

  16. Rapid and accurate pyrosequencing of angiosperm plastid genomes

    PubMed Central

    Moore, Michael J; Dhingra, Amit; Soltis, Pamela S; Shaw, Regina; Farmerie, William G; Folta, Kevin M; Soltis, Douglas E

    2006-01-01

    Background Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae). Results More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, this high accuracy was achieved at a significant reduction in time and cost relative to traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically. PMID:16934154

  17. Simultaneous message framing and error detection

    NASA Technical Reports Server (NTRS)

    Frey, A. H., Jr.

    1968-01-01

    Circuitry simultaneously inserts message framing information and detects noise errors in binary code data transmissions. Separate message groups are framed without requiring both framing bits and error-checking bits, and predetermined message sequences are separated from other message sequences without being hampered by intervening noise.

  18. Push it to the limit: Characterizing the convergence of common sequences of basis sets for intermolecular interactions as described by density functional theory

    NASA Astrophysics Data System (ADS)

    Witte, Jonathon; Neaton, Jeffrey B.; Head-Gordon, Martin

    2016-05-01

    With the aim of systematically characterizing the convergence of common families of basis sets such that general recommendations for basis sets can be made, we have tested a wide variety of basis sets against complete-basis binding energies across the S22 set of intermolecular interactions—noncovalent interactions of small and medium-sized molecules consisting of first- and second-row atoms—with three distinct density functional approximations: SPW92, a form of local-density approximation; B3LYP, a global hybrid generalized gradient approximation; and B97M-V, a meta-generalized gradient approximation with nonlocal correlation. We have found that it is remarkably difficult to reach the basis set limit; for the methods and systems examined, the most complete basis is Jensen's pc-4. The Dunning correlation-consistent sequence of basis sets converges slowly relative to the Jensen sequence. The Karlsruhe basis sets are quite cost effective, particularly when a correction for basis set superposition error is applied: counterpoise-corrected def2-SVPD binding energies are better than corresponding energies computed in comparably sized Dunning and Jensen bases, and on par with uncorrected results in basis sets 3-4 times larger. These trends are exhibited regardless of the level of density functional approximation employed. A sense of the magnitude of the intrinsic incompleteness error of each basis set not only provides a foundation for guiding basis set choice in future studies but also facilitates quantitative comparison of existing studies on similar types of systems.
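
    For reference, the counterpoise correction invoked here is the standard Boys-Bernardi scheme, in which each monomer is re-evaluated in the full dimer basis so that the basis set superposition error cancels:

        E_int^CP = E_AB(AB) - E_A(AB) - E_B(AB)

    where the parenthetical denotes the basis in which each fragment energy is computed.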

  19. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach

    PubMed Central

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-01-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927
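
    One of the two summary statistics, the folded allele frequency spectrum, is simple to compute from unpolarized genotype data; a small sketch (with a made-up genotype matrix) follows:

        # Folded allele frequency spectrum from unphased, unpolarized diploid
        # genotypes (0/1/2 = copies of one arbitrary allele). Data are toy values.
        import numpy as np

        def folded_afs(genotypes):
            """genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts."""
            n_chrom = 2 * genotypes.shape[0]
            counts = genotypes.sum(axis=0)
            minor = np.minimum(counts, n_chrom - counts)  # fold: minor allele count
            return np.bincount(minor, minlength=n_chrom // 2 + 1)

        g = np.array([[0, 1, 2], [0, 1, 1], [1, 2, 2]])   # 3 individuals, 3 SNPs
        print(folded_afs(g))  # spectrum over minor allele counts 0..3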

  20. Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats.

    PubMed

    Fungtammasan, Arkarachai; Tomaszkiewicz, Marta; Campos-Sánchez, Rebeca; Eckert, Kristin A; DeGiorgio, Michael; Makova, Kateryna D

    2016-10-01

    Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA-DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  1. Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.

    PubMed

    Otto, Thomas D; Sanders, Mandy; Berriman, Matthew; Newbold, Chris

    2010-07-15

    The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications. The software is available at http://icorn.sourceforge.net
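
    A heavily simplified sketch of the iterate-and-correct idea (majority voting on a pileup; iCORN itself realigns reads with a mapper and applies more careful acceptance criteria):

        # Simplified iterative reference correction: build a pileup of read
        # bases per position and replace the reference base where an
        # alternative dominates. Alignment is faked by exact offsets.
        from collections import Counter

        def correct(reference, reads, min_frac=0.8, rounds=3):
            ref = list(reference)
            for _ in range(rounds):
                pileup = [Counter() for _ in ref]
                for offset, read in reads:
                    for i, base in enumerate(read):
                        pileup[offset + i][base] += 1
                changed = False
                for pos, counts in enumerate(pileup):
                    if counts:
                        base, n = counts.most_common(1)[0]
                        if base != ref[pos] and n / sum(counts.values()) >= min_frac:
                            ref[pos] = base
                            changed = True
                if not changed:          # converged: no corrections this round
                    break
            return "".join(ref)

        reads = [(0, "ACGT"), (1, "CGTA"), (2, "GTAC")]
        print(correct("ACTTAC", reads))  # -> "ACGTAC": position 2 corrected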

  2. Long-Term Predictive and Feedback Encoding of Motor Signals in the Simple Spike Discharge of Purkinje Cells

    PubMed Central

    Popa, Laurentiu S.; Streng, Martha L.

    2017-01-01

    Abstract Most hypotheses of cerebellar function emphasize a role in real-time control of movements. However, the cerebellum’s use of current information to adjust future movements and its involvement in sequencing, working memory, and attention argues for predicting and maintaining information over extended time windows. The present study examines the time course of Purkinje cell discharge modulation in the monkey (Macaca mulatta) during manual, pseudo-random tracking. Analysis of the simple spike firing from 183 Purkinje cells during tracking reveals modulation up to 2 s before and after kinematics and position error. Modulation significance was assessed against trial shuffled firing, which decoupled simple spike activity from behavior and abolished long-range encoding while preserving data statistics. Position, velocity, and position errors have the most frequent and strongest long-range feedforward and feedback modulations, with less common, weaker long-term correlations for speed and radial error. Position, velocity, and position errors can be decoded from the population simple spike firing with considerable accuracy for even the longest predictive (-2000 to -1500 ms) and feedback (1500 to 2000 ms) epochs. Separate analysis of the simple spike firing in the initial hold period preceding tracking shows similar long-range feedforward encoding of the upcoming movement and in the final hold period feedback encoding of the just completed movement, respectively. Complex spike analysis reveals little long-term modulation with behavior. We conclude that Purkinje cell simple spike discharge includes short- and long-range representations of both upcoming and preceding behavior that could underlie cerebellar involvement in error correction, working memory, and sequencing. PMID:28413823

  3. MultiGeMS: detection of SNVs from multiple samples using model selection on high-throughput sequencing data.

    PubMed

    Murillo, Gabriel H; You, Na; Su, Xiaoquan; Cui, Wei; Reilly, Muredach P; Li, Mingyao; Ning, Kang; Cui, Xinping

    2016-05-15

    Single nucleotide variant (SNV) detection procedures are being utilized as never before to analyze the recent abundance of high-throughput DNA sequencing data, both on single and multiple sample datasets. Building on previously published work with the single sample SNV caller genotype model selection (GeMS), a multiple sample version of GeMS (MultiGeMS) is introduced. Unlike other popular multiple sample SNV callers, the MultiGeMS statistical model accounts for enzymatic substitution sequencing errors. It also addresses the multiple testing problem endemic to multiple sample SNV calling and utilizes high performance computing (HPC) techniques. A simulation study demonstrates that MultiGeMS ranks highest in precision among a selection of popular multiple sample SNV callers, while showing exceptional recall in calling common SNVs. Further, both simulation studies and real data analyses indicate that MultiGeMS is robust to low-quality data. We also demonstrate that accounting for enzymatic substitution sequencing errors not only improves SNV call precision at low mapping quality regions, but also improves recall at reference allele-dominated sites with high mapping quality. The MultiGeMS package can be downloaded from https://github.com/cui-lab/multigems. Contact: xinping.cui@ucr.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  4. SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.

    PubMed

    Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao

    2014-08-08

    Next-generation sequencers (NGSs) have become one of the main tools of current biology. To obtain useful insights from NGS data, it is essential to control for low-quality portions of the data affected by technical errors such as air bubbles in the sequencing fluidics. We developed SUGAR (subtile-based GUI-assisted refiner), software that can handle ultra-high-throughput data through a user-friendly graphical user interface (GUI) with interactive analysis capability. SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors arising during sequencing. Sequencing data generated from the error-affected regions of a flowcell can be selectively removed by the automated analysis or GUI-assisted operations implemented in SUGAR. The automated data-cleaning function, based on sequence read quality (Phred) scores, was applied to public whole human genome sequencing data, and we show that the overall mapping quality was improved. The detailed data evaluation and cleaning enabled by SUGAR reduce technical problems in sequence read mapping, improving subsequent variant analyses that require high-quality sequence data and mapping results. The software should therefore be especially useful for controlling the quality of variant calls in low-abundance cell populations, e.g., cancers, in samples affected by technical errors of the sequencing procedure.

  5. Comparing errors in ED computer-assisted vs conventional pediatric drug dosing and administration.

    PubMed

    Yamamoto, Loren; Kanemori, Joan

    2010-06-01

    Compared to fixed-dose single-vial drug administration in adults, pediatric drug dosing and administration requires a series of calculations, all of which are potentially error prone. The purpose of this study is to compare error rates and task completion times for common pediatric medication scenarios using computer program assistance vs conventional methods. Two versions of a 4-part paper-based test were developed. Each part consisted of a set of medication administration and/or dosing tasks. Emergency department and pediatric intensive care unit nurse volunteers completed these tasks using both methods (sequence assigned to start with a conventional or a computer-assisted approach). Completion times, errors, and the reason for each error were recorded. Thirty-eight nurses completed the study. Summing completion times across all 4 parts, the mean conventional total time was 1243 seconds vs 879 seconds for the computer program (P < .001). The conventional manual method produced a mean of 1.8 errors vs 0.7 errors for the computer program (P < .001). Of the 97 total errors, 36 were due to misreading the drug concentration on the label, 34 were due to calculation errors, and 8 were due to misplaced decimals. Of the 36 label interpretation errors, 18 (50%) occurred with digoxin or insulin. Computerized assistance reduced errors and the time required for drug administration calculations. A pattern of errors emerged: reading and interpreting certain drug labels was especially error prone. Optimizing the layout of drug labels could reduce the error rate for error-prone labels. Copyright (c) 2010 Elsevier Inc. All rights reserved.
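
    The chain of calculations such computer assistance replaces can be sketched as below; the drug table, per-kilogram dose, cap, and concentration are invented placeholders, not clinical guidance:

        # Illustrative weight-based dose calculator of the kind such programs
        # automate. All numbers are fabricated - not clinical guidance.

        DRUGS = {
            # drug: (dose mg/kg, maximum single dose mg, concentration mg/mL)
            "exampledrug": (0.1, 5.0, 2.0),
        }

        def dose_volume_ml(drug, weight_kg):
            mg_per_kg, max_mg, mg_per_ml = DRUGS[drug]
            dose_mg = min(weight_kg * mg_per_kg, max_mg)  # cap at maximum dose
            return round(dose_mg / mg_per_ml, 2)          # volume to draw up, mL

        print(dose_volume_ml("exampledrug", 18.0))  # 18 kg -> 1.8 mg -> 0.9 mL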

  6. [Transposition errors during learning to reproduce a sequence by the right- and the left-hand movements: simulation of positional and movement coding].

    PubMed

    Liakhovetskiĭ, V A; Bobrova, E V; Skopin, G N

    2012-01-01

    Transposition errors during the reproduction of a hand movement sequence provide important information about the internal representation of that sequence in motor working memory. Analysis of such errors showed that learning to reproduce sequences of left-hand movements improves the system of positional coding (coding of positions), while learning sequences of right-hand movements improves the system of vector coding (coding of movements). Learning right-hand movements after left-hand performance engaged the system of positional coding "imposed" by the left hand, whereas learning left-hand movements after right-hand performance activated the system of vector coding. Transposition errors during learning to reproduce movement sequences can be explained by a neural network using either vector coding alone or both vector and positional coding.

  7. Dysgraphia in Patients with Primary Lateral Sclerosis: A Speech-Based Rehearsal Deficit?

    PubMed Central

    Zago, S.; Poletti, B.; Corbo, M.; Adobbati, L.; Silani, V.

    2008-01-01

    The present study aims to demonstrate that errors in writing are more common than expected in patients affected by primary lateral sclerosis (PLS) with severe dysarthria or complete mutism, independent of spasticity. Sixteen patients meeting the criteria of Pringle et al. [34] for PLS underwent standard neuropsychological tasks and an evaluation of writing. We assessed spelling abilities both through dictation, in which a set of words, non-words and short phrases was presented orally, and by having patients compose words from a set of preformed letters. Finally, a written copying task was performed with the same words. Relative to controls, PLS patients made a greater number of spelling errors in all writing conditions, but not in the copying task. The error types included omissions, transpositions, insertions and letter substitutions, distributed equally across the writing task and the composition of words from preformed letters. This pattern of performance is consistent with a spelling impairment. The results support the concept that written production is critically dependent on the subvocal articulatory mechanism of rehearsal, perhaps at the level of retaining the sequence of graphemes in a graphemic buffer. In PLS patients a disturbance of rehearsal may affect the correct sequencing/assembly of an orthographic representation during writing. PMID:19096141

  8. Unbiased Taxonomic Annotation of Metagenomic Samples

    PubMed Central

    Fosso, Bruno; Pesole, Graziano; Rosselló, Francesc

    2018-01-01

    Abstract The classification of reads from a metagenomic sample using a reference taxonomy is usually based on first mapping the reads to the reference sequences and then classifying each read at a node under the lowest common ancestor of the candidate sequences in the reference taxonomy with the least classification error. However, this taxonomic annotation can be biased by an imbalanced taxonomy and also by the presence of multiple nodes in the taxonomy with the least classification error for a given read. In this article, we show that the Rand index is a better indicator of classification error than the often used area under the receiver operating characteristic (ROC) curve and F-measure for both balanced and imbalanced reference taxonomies, and we also address the second source of bias by reducing the taxonomic annotation problem for a whole metagenomic sample to a set cover problem, for which a logarithmic approximation can be obtained in linear time and an exact solution can be obtained by integer linear programming. Experimental results with a proof-of-concept implementation of the set cover approach to taxonomic annotation in a next release of the TANGO software show that the set cover approach further reduces ambiguity in the taxonomic annotation obtained with TANGO without distorting the relative abundance profile of the metagenomic sample. PMID:29028181
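
    The computational core of that reduction, greedy set cover with its logarithmic approximation guarantee, fits in a few lines; the read identifiers and taxonomy nodes below are toy data:

        # Greedy set cover: repeatedly pick the taxonomy node covering the
        # most still-unassigned reads.

        def greedy_set_cover(universe, sets):
            uncovered = set(universe)
            chosen = []
            while uncovered:
                # pick the set covering the most uncovered elements
                best = max(sets, key=lambda s: len(sets[s] & uncovered))
                if not sets[best] & uncovered:
                    break  # remaining reads cannot be covered
                chosen.append(best)
                uncovered -= sets[best]
            return chosen

        reads = {"r1", "r2", "r3", "r4", "r5"}
        nodes = {
            "g__A": {"r1", "r2", "r3"},
            "g__B": {"r3", "r4"},
            "g__C": {"r4", "r5"},
        }
        print(greedy_set_cover(reads, nodes))  # -> ['g__A', 'g__C']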

  9. MeCorS: Metagenome-enabled error correction of single cell sequencing reads

    DOE PAGES

    Bremges, Andreas; Singer, Esther; Woyke, Tanja; ...

    2016-03-15

    Here we present a new tool, MeCorS, to correct chimeric reads and sequencing errors in Illumina data generated from single amplified genomes (SAGs). It uses sequence information derived from accompanying metagenome sequencing to accurately correct errors in SAG reads, even from ultra-low coverage regions. In evaluations on real data, we show that MeCorS outperforms BayesHammer, the most widely used state-of-the-art approach. MeCorS performs particularly well in correcting chimeric reads, which greatly improves both accuracy and contiguity of de novo SAG assemblies.

  10. Experimental quantum verification in the presence of temporally correlated noise

    NASA Astrophysics Data System (ADS)

    Mavadia, S.; Edmunds, C. L.; Hempel, C.; Ball, H.; Roy, F.; Stace, T. M.; Biercuk, M. J.

    2018-02-01

    Growth in the capabilities of quantum information hardware mandates access to techniques for performance verification that function under realistic laboratory conditions. Here we experimentally characterise the impact of common temporally correlated noise processes on both randomised benchmarking (RB) and gate-set tomography (GST). Our analysis highlights the role of sequence structure in enhancing or suppressing the sensitivity of quantum verification protocols to either slowly or rapidly varying noise, which we treat in the limiting cases of quasi-DC miscalibration and white noise power spectra. We perform experiments with a single trapped 171Yb+ ion-qubit and inject engineered noise (∝ σ_z) to probe protocol performance. Experiments on RB validate predictions that measured fidelities over sequences are described by a gamma distribution varying between approximately Gaussian, and a broad, highly skewed distribution for rapidly and slowly varying noise, respectively. Similarly we find a strong gate set dependence of default experimental GST procedures in the presence of correlated errors, leading to significant deviations between estimated and calculated diamond distances in the presence of correlated σ_z errors. Numerical simulations demonstrate that expansion of the gate set to include negative rotations can suppress these discrepancies and increase reported diamond distances by orders of magnitude for the same error processes. Similar effects do not occur for correlated σ_x or σ_y errors or depolarising noise processes, highlighting the impact of the critical interplay of selected gate set and the gauge optimisation process on the meaning of the reported diamond norm in correlated noise environments.

  11. ADEPT, a dynamic next generation sequencing data error-detection program with trimming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feng, Shihai; Lo, Chien-Chi; Li, Po-E

    Illumina is the most widely used next generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors constitute a major problem in applications such as de novo genome assembly, metagenomics analysis and single nucleotide polymorphism discovery. In this study, we present ADEPT, a dynamic error detection method based on the quality scores of each nucleotide and its neighboring nucleotides, together with their positions within the read, which it compares to the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in terms of the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads. We conclude that ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed. The result is a method that is less prone to position-dependent under-prediction, which is one of the most prominent issues in error prediction. The outcome is that ADEPT improves upon prior efforts in identifying true errors, primarily within the middle of reads, while reducing the false positive rate.

  12. ADEPT, a dynamic next generation sequencing data error-detection program with trimming

    DOE PAGES

    Feng, Shihai; Lo, Chien-Chi; Li, Po-E; ...

    2016-02-29

    Illumina is the most widely used next generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors constitute a major problem in applications such as de novo genome assembly, metagenomics analysis and single nucleotide polymorphism discovery. In this study, we present ADEPT, a dynamic error detection method based on the quality scores of each nucleotide and its neighboring nucleotides, together with their positions within the read, which it compares to the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in terms of the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads. We conclude that ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed. The result is a method that is less prone to position-dependent under-prediction, which is one of the most prominent issues in error prediction. The outcome is that ADEPT improves upon prior efforts in identifying true errors, primarily within the middle of reads, while reducing the false positive rate.

  13. Estimating population genetic parameters and comparing model goodness-of-fit using DNA sequences with error

    PubMed Central

    Liu, Xiaoming; Fu, Yun-Xin; Maxwell, Taylor J.; Boerwinkle, Eric

    2010-01-01

    It is known that sequencing error can bias estimation of evolutionary or population genetic parameters. This problem is more prominent in deep resequencing studies because of their large sample size n, and a higher probability of error at each nucleotide site. We propose a new method based on the composite likelihood of the observed SNP configurations to infer population mutation rate θ = 4Neμ, population exponential growth rate R, and error rate ɛ, simultaneously. Using simulation, we show the combined effects of the parameters, θ, n, ɛ, and R on the accuracy of parameter estimation. We compared our maximum composite likelihood estimator (MCLE) of θ with other θ estimators that take into account the error. The results show the MCLE performs well when the sample size is large or the error rate is high. Using parametric bootstrap, composite likelihood can also be used as a statistic for testing the model goodness-of-fit of the observed DNA sequences. The MCLE method is applied to sequence data on the ANGPTL4 gene in 1832 African American and 1045 European American individuals. PMID:19952140

  14. RECKONER: read error corrector based on KMC.

    PubMed

    Dlugosz, Maciej; Deorowicz, Sebastian

    2017-04-01

    The presence of sequencing errors in data produced by next-generation sequencers affects the quality of downstream analyses. Their accuracy can be improved by performing error correction of the sequencing reads. We introduce a new correction algorithm capable of processing high-error-rate data from eukaryotic genomes of close to 500 Mbp using less than 4 GB of RAM in about 35 min on a 16-core computer. The program is freely available at http://sun.aei.polsl.pl/REFRESH/reckoner. Contact: sebastian.deorowicz@polsl.pl. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  15. The cerebellum for jocks and nerds alike.

    PubMed

    Popa, Laurentiu S; Hewitt, Angela L; Ebner, Timothy J

    2014-01-01

    Historically the cerebellum has been implicated in the control of movement. However, the cerebellum's role in non-motor functions, including cognitive and emotional processes, has also received increasing attention. Starting from the premise that the uniform architecture of the cerebellum underlies a common mode of information processing, this review examines recent electrophysiological findings on the motor signals encoded in the cerebellar cortex and then relates these signals to observations in the non-motor domain. Simple spike firing of individual Purkinje cells encodes performance errors, both predicting upcoming errors as well as providing feedback about those errors. Further, this dual temporal encoding of prediction and feedback involves a change in the sign of the simple spike modulation. Therefore, Purkinje cell simple spike firing both predicts and responds to feedback about a specific parameter, consistent with computing sensory prediction errors in which the predictions about the consequences of a motor command are compared with the feedback resulting from the motor command execution. These new findings are in contrast with the historical view that complex spikes encode errors. Evaluation of the kinematic coding in the simple spike discharge shows the same dual temporal encoding, suggesting this is a common mode of signal processing in the cerebellar cortex. Decoding analyses show the considerable accuracy of the predictions provided by Purkinje cells across a range of times. Further, individual Purkinje cells encode linearly and independently a multitude of signals, both kinematic and performance errors. Therefore, the cerebellar cortex's capacity to make associations across different sensory, motor and non-motor signals is large. The results from studying how Purkinje cells encode movement signals suggest that the cerebellar cortex circuitry can support associative learning, sequencing, working memory, and forward internal models in non-motor domains.

  16. The cerebellum for jocks and nerds alike

    PubMed Central

    Popa, Laurentiu S.; Hewitt, Angela L.; Ebner, Timothy J.

    2014-01-01

    Historically the cerebellum has been implicated in the control of movement. However, the cerebellum's role in non-motor functions, including cognitive and emotional processes, has also received increasing attention. Starting from the premise that the uniform architecture of the cerebellum underlies a common mode of information processing, this review examines recent electrophysiological findings on the motor signals encoded in the cerebellar cortex and then relates these signals to observations in the non-motor domain. Simple spike firing of individual Purkinje cells encodes performance errors, both predicting upcoming errors as well as providing feedback about those errors. Further, this dual temporal encoding of prediction and feedback involves a change in the sign of the simple spike modulation. Therefore, Purkinje cell simple spike firing both predicts and responds to feedback about a specific parameter, consistent with computing sensory prediction errors in which the predictions about the consequences of a motor command are compared with the feedback resulting from the motor command execution. These new findings are in contrast with the historical view that complex spikes encode errors. Evaluation of the kinematic coding in the simple spike discharge shows the same dual temporal encoding, suggesting this is a common mode of signal processing in the cerebellar cortex. Decoding analyses show the considerable accuracy of the predictions provided by Purkinje cells across a range of times. Further, individual Purkinje cells encode linearly and independently a multitude of signals, both kinematic and performance errors. Therefore, the cerebellar cortex's capacity to make associations across different sensory, motor and non-motor signals is large. The results from studying how Purkinje cells encode movement signals suggest that the cerebellar cortex circuitry can support associative learning, sequencing, working memory, and forward internal models in non-motor domains. PMID:24987338

  17. Curated eutherian third party data gene data sets.

    PubMed

    Premzl, Marko

    2016-03-01

    The freely available eutherian genomic sequence data sets have advanced the field of genomics. Nevertheless, future revisions of gene data sets are expected, owing to the incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance for protecting against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applied in updates of 7 major eutherian gene data sets, including 812 complete coding sequences deposited in the European Nucleotide Archive as curated third party data gene data sets.

  18. MetAMOS: a modular and open source metagenomic assembly and analysis pipeline

    PubMed Central

    2013-01-01

    We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open-reading frames and taxonomic or functional annotations. MetAMOS can aid in reducing assembly errors, commonly encountered when assembling metagenomic samples, and improves taxonomic assignment accuracy while also reducing computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS. PMID:23320958

  19. Error mitigation for CCSDS compressed imager data

    NASA Astrophysics Data System (ADS)

    Gladkova, Irina; Grossberg, Michael; Gottipati, Srikanth; Shahriar, Fazlul; Bonev, George

    2009-08-01

    To efficiently use the limited bandwidth available on the downlink from satellite to ground station, imager data is usually compressed before transmission. Transmission introduces unavoidable errors, which are only partially removed by forward error correction and packetization. In the case of the commonly used CCSDS Rice-based compression, such errors result in a contiguous sequence of dummy values along scan lines in a band of the imager data. We have developed a method capable of using the image statistics to provide a principled estimate of the missing data. Our method outperforms interpolation yet can be performed fast enough to provide uninterrupted data flow. The estimation of the lost data provides significant value to end users who may use only part of the data, may not have statistical tools, or lack the expertise to mitigate the impact of the lost data. Since the locations of the lost data will be clearly marked as meta-data in the HDF or NetCDF header, experts who prefer to handle error mitigation themselves will be free to use or ignore our estimates as they see fit.

  20. Analysis of mutational spectra by denaturant capillary electrophoresis

    PubMed Central

    Ekstrøm, Per O.; Khrapko, Konstantin; Li-Sucholeiki, Xiao-Cheng; Hunter, Ian W.; Thilly, William G.

    2009-01-01

    Numbers and kinds of point mutants within DNA from cells, tissues and human populations may be discovered for nearly any 75–250 bp DNA sequence. High-fidelity DNA amplification incorporating a thermally stable DNA “clamp” is followed by separation by denaturing capillary electrophoresis (DCE). DCE allows for peak collection and verification sequencing. DCE in a cycling-temperature mode (e.g., ±5°C; CyDCE) permits high resolution of mutant sequences using computer-defined analytes without preliminary optimization experiments. DNA sequencers have been modified to permit higher-throughput CyDCE, and a massively parallel, ~25,000-capillary system has been designed for pangenomic scans in large human populations. DCE has been used to define quantitative point mutational spectra to study a wide variety of genetic phenomena: errors of DNA polymerases, mutations induced in human cells by chemicals and irradiation, testing of associations between human genes and common diseases, and the discovery of the origins of point mutations in human development and carcinogenesis. PMID:18600220

  1. Quick, sensitive and specific detection and evaluation of quantification of minor variants by high-throughput sequencing.

    PubMed

    Leung, Ross Ka-Kit; Dong, Zhi Qiang; Sa, Fei; Chong, Cheong Meng; Lei, Si Wan; Tsui, Stephen Kwok-Wing; Lee, Simon Ming-Yuen

    2014-02-01

    Minor variants have significant implications in quasispecies evolution, early cancer detection and non-invasive fetal genotyping, but their accurate detection by next-generation sequencing (NGS) is hampered by sequencing errors. We generated sequencing data from mixtures at predetermined ratios in order to provide insight into sequencing errors and variations that arise in practice but cannot be captured by simulation. The information also enables better parameterization of depth of coverage, read quality and heterogeneity, library preparation techniques, and technical repeatability for mathematical modeling, theory development and simulation experimental design. We devised minor variant authentication rules that achieved 100% accuracy in both testing and validation experiments. The rules are free from tedious inspection of alignment accuracy, sequencing read quality or errors introduced by homopolymers. The authentication process only requires minor variants to: (1) have a minimum depth of coverage larger than 30; and (2) be reported by (a) four or more variant callers, or (b) DiBayes or LoFreq, plus SNVer (or BWA when no results are returned by SNVer), with an interassay coefficient of variation (CV) no larger than 0.1. Quantification accuracy undermined by sequencing errors could be overcome neither by ultra-deep sequencing nor by recruiting more variant callers to reach a consensus, such that consistent underestimation and overestimation (i.e., low CV) were observed. To accommodate stochastic error and adjust the observed ratio within a specified accuracy, we presented a proof of concept for the use of a double calibration curve for quantification, which provides an important reference towards potential industrial-scale fabrication of calibrants for NGS.
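
    The published rules translate directly into a filter. The sketch below encodes rules (1) and (2) as quoted above; the variant record layout is our assumption, and we attach the CV condition to branch (2b) as the abstract's wording suggests:

        # Encoding of the authentication rules quoted above.

        def authenticate(variant):
            """variant: dict with 'depth', 'callers' (set of caller names that
            reported it), 'snver_results' (bool), and 'cv' (interassay CV)."""
            if variant["depth"] <= 30:                         # rule (1)
                return False
            rule_2a = len(variant["callers"]) >= 4
            snver_or_bwa = ("SNVer" in variant["callers"] if variant["snver_results"]
                            else "BWA" in variant["callers"])
            rule_2b = (bool({"DiBayes", "LoFreq"} & variant["callers"])
                       and snver_or_bwa and variant["cv"] <= 0.1)
            return rule_2a or rule_2b                          # rule (2)

        v = {"depth": 120, "callers": {"LoFreq", "SNVer"}, "snver_results": True,
             "cv": 0.05}
        print(authenticate(v))  # True via rule (2b)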

  2. Accounting for uncertainty in DNA sequencing data.

    PubMed

    O'Rawe, Jason A; Ferson, Scott; Lyon, Gholson J

    2015-02-01

    Science is defined in part by an honest exposition of the uncertainties that arise in measurements and propagate through calculations and inferences, so that the reliabilities of its conclusions are made apparent. The recent rapid development of high-throughput DNA sequencing technologies has dramatically increased the number of measurements made at the biochemical and molecular level. These data come from many different DNA-sequencing technologies, each with their own platform-specific errors and biases, which vary widely. Several statistical studies have tried to measure error rates for basic determinations, but there are no general schemes to project these uncertainties so as to assess the surety of the conclusions drawn about genetic, epigenetic, and more general biological questions. We review here the state of uncertainty quantification in DNA sequencing applications, describe sources of error, and propose methods that can be used for accounting and propagating these errors and their uncertainties through subsequent calculations. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics

    PubMed Central

    Nesvizhskii, Alexey I.

    2010-01-01

    This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide to spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from peptide to protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues. PMID:20816881
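
    One global error rate estimation procedure surveyed here, target-decoy false discovery rate (FDR) estimation, fits a short sketch; the match scores below are invented, and FDR at a score threshold is estimated as decoy hits over target hits:

        # Target-decoy FDR sketch: at threshold t, estimate
        # FDR ~ (# decoy hits >= t) / (# target hits >= t).

        def fdr_at_threshold(target_scores, decoy_scores, t):
            targets = sum(s >= t for s in target_scores)
            decoys = sum(s >= t for s in decoy_scores)
            return decoys / targets if targets else 0.0

        def threshold_for_fdr(target_scores, decoy_scores, max_fdr=0.01):
            # scan candidate thresholds from high to low, keep the most
            # permissive one that still satisfies the desired FDR
            best = None
            for t in sorted(set(target_scores), reverse=True):
                if fdr_at_threshold(target_scores, decoy_scores, t) <= max_fdr:
                    best = t
            return best

        targets = [9.1, 8.7, 7.9, 7.5, 5.2, 4.8, 3.3]
        decoys = [4.9, 3.6, 3.1, 2.8]
        print(threshold_for_fdr(targets, decoys, max_fdr=0.2))  # -> 4.8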

  4. A systematic comparison of error correction enzymes by next-generation sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lubock, Nathan B.; Zhang, Di; Sidore, Angus M.

    Gene synthesis, the process of assembling gene-length fragments from shorter groups of oligonucleotides (oligos), is becoming an increasingly important tool in molecular and synthetic biology. The length, quality and cost of gene synthesis are limited by errors produced during oligo synthesis and subsequent assembly. Enzymatic error correction methods are cost-effective means to ameliorate errors in gene synthesis. Previous analyses of these methods relied on cloning and Sanger sequencing to evaluate their efficiencies, limiting quantitative assessment. Here, we develop a method to quantify errors in synthetic DNA by next-generation sequencing. We analyzed errors in model gene assemblies and systematically compared six different error correction enzymes across 11 conditions. We find that ErrASE and T7 Endonuclease I are the most effective at decreasing average error rates (up to 5.8-fold relative to the input), whereas MutS is the best for increasing the number of perfect assemblies (up to 25.2-fold). We are also able to quantify differential specificities: ErrASE preferentially corrects C/G transversions, whereas T7 Endonuclease I preferentially corrects A/T transversions. More generally, this experimental and computational pipeline is a fast, scalable and extensible way to analyze errors in gene assemblies, to profile error correction methods, and to benchmark DNA synthesis methods.

  5. A systematic comparison of error correction enzymes by next-generation sequencing

    DOE PAGES

    Lubock, Nathan B.; Zhang, Di; Sidore, Angus M.; ...

    2017-08-01

    Gene synthesis, the process of assembling gene-length fragments from shorter groups of oligonucleotides (oligos), is becoming an increasingly important tool in molecular and synthetic biology. The length, quality and cost of gene synthesis are limited by errors produced during oligo synthesis and subsequent assembly. Enzymatic error correction methods are cost-effective means to ameliorate errors in gene synthesis. Previous analyses of these methods relied on cloning and Sanger sequencing to evaluate their efficiencies, limiting quantitative assessment. Here, we develop a method to quantify errors in synthetic DNA by next-generation sequencing. We analyzed errors in model gene assemblies and systematically compared six different error correction enzymes across 11 conditions. We find that ErrASE and T7 Endonuclease I are the most effective at decreasing average error rates (up to 5.8-fold relative to the input), whereas MutS is the best for increasing the number of perfect assemblies (up to 25.2-fold). We are also able to quantify differential specificities: ErrASE preferentially corrects C/G transversions, whereas T7 Endonuclease I preferentially corrects A/T transversions. More generally, this experimental and computational pipeline is a fast, scalable and extensible way to analyze errors in gene assemblies, to profile error correction methods, and to benchmark DNA synthesis methods.

  6. On the value of Mendelian laws of segregation in families: data quality control, imputation and beyond

    PubMed Central

    Blue, Elizabeth Marchani; Sun, Lei; Tintle, Nathan L.; Wijsman, Ellen M.

    2014-01-01

    When analyzing family data, we dream of perfectly informative data, even whole genome sequences (WGS) for all family members. Reality intervenes, and we find next-generation sequence (NGS) data have error, and are often too expensive or impossible to collect on everyone. Genetic Analysis Workshop 18 groups “Quality Control” and “Dropping WGS through families using GWAS framework” focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single nucleotide polymorphisms, NGS, and imputed data are generally concordant, but that errors are particularly likely at rare variants, homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelateds. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Both genotype and pedigree errors had an adverse effect on subsequent analyses. Computationally fast rules-based imputation was accurate, but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods, and suggest possible future directions. Topics include improving communication between those performing data collection and analysis, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models. PMID:25112184
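
    The Mendelian-transmission information credited above with improving error detection amounts to a simple per-trio, per-site consistency constraint. A minimal sketch (the genotype encoding is an assumption for illustration):

```python
# Sketch of a Mendelian-consistency check used for genotype error detection in
# family data: a child's genotype must combine one allele from each parent.

def mendel_ok(father, mother, child):
    """Genotypes are unordered allele pairs, e.g. ('A', 'G')."""
    return any(sorted((fa, mo)) == sorted(child)
               for fa in father for mo in mother)

print(mendel_ok(("A", "A"), ("A", "G"), ("A", "G")))  # True: consistent
print(mendel_ok(("A", "A"), ("A", "A"), ("A", "G")))  # False: likely an error
```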

  7. Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing.

    PubMed

    Hargreaves, Adam D; Mulley, John F

    2015-01-01

    Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately characterises venom toxin-encoding genes in the painted saw-scaled viper, Echis coloratus. We find the raw sequencing error rate to be around 12%, improved to 0-2% with hybrid error correction and 3% with de novo error correction. Our corrected data provides full coding sequences and 5' and 3' UTRs for 29 of 33 candidate venom toxins detected, far superior to Illumina data (13/40 complete) and Sanger-based ESTs (15/29). We suggest that, should the current pace of improvement continue, the MinION will become the default approach for cDNA sequencing in a variety of species.

  8. Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing

    PubMed Central

    Hargreaves, Adam D.

    2015-01-01

    Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately characterises venom toxin-encoding genes in the painted saw-scaled viper, Echis coloratus. We find the raw sequencing error rate to be around 12%, improved to 0–2% with hybrid error correction and 3% with de novo error correction. Our corrected data provides full coding sequences and 5′ and 3′ UTRs for 29 of 33 candidate venom toxins detected, far superior to Illumina data (13/40 complete) and Sanger-based ESTs (15/29). We suggest that, should the current pace of improvement continue, the MinION will become the default approach for cDNA sequencing in a variety of species. PMID:26623194

  9. Using expected sequence features to improve basecalling accuracy of amplicon pyrosequencing data.

    PubMed

    Rask, Thomas S; Petersen, Bent; Chen, Donald S; Day, Karen P; Pedersen, Anders Gorm

    2016-04-22

    Amplicon pyrosequencing targets a known genetic region and thus inherently produces reads highly anticipated to have certain features, such as conserved nucleotide sequence, and in the case of protein coding DNA, an open reading frame. Pyrosequencing errors, consisting mainly of nucleotide insertions and deletions, are on the other hand likely to disrupt open reading frames. Such an inverse relationship between errors and expectation based on prior knowledge can be used advantageously to guide the process known as basecalling, i.e. the inference of nucleotide sequence from raw sequencing data. The new basecalling method described here, named Multipass, implements a probabilistic framework for working with the raw flowgrams obtained by pyrosequencing. For each sequence variant, Multipass calculates the likelihoods and nucleotide sequences of the several most likely sequences given the flowgram data. This probabilistic approach enables integration of basecalling into a larger model where other parameters can be incorporated, such as the likelihood of observing a full-length open reading frame at the targeted region. We apply the method to 454 amplicon pyrosequencing data obtained from a malaria virulence gene family, where Multipass generates 20% more error-free sequences than current state-of-the-art methods, and provides sequence characteristics that allow generation of a set of high-confidence error-free sequences. This novel method can be used to increase the accuracy of existing and future amplicon sequencing data, particularly where extensive prior knowledge is available about the obtained sequences, for example in analysis of the immunoglobulin VDJ region where Multipass can be combined with a model for the known recombining germline genes. Multipass is available for Roche 454 data at http://www.cbs.dtu.dk/services/MultiPass-1.0 , and the concept can potentially be implemented for other sequencing technologies as well.
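
    To make the flowgram idea concrete: in pyrosequencing, each flow's light intensity scales roughly with homopolymer length, so candidate lengths can be ranked by likelihood under a noise model and alternatives retained for downstream priors (such as an intact reading frame). This is a toy Gaussian version, not Multipass's actual model; sigma and the interface are assumptions.

```python
# Toy probabilistic flow calling: observed signal ~ homopolymer length + noise.
import math

def loglik(signal, hp_len, sigma=0.25):
    """Log-likelihood of an observed flow signal given a homopolymer length."""
    return -0.5 * ((signal - hp_len) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def call_flow(signal, max_len=8):
    """Rank candidate homopolymer lengths for one flow, most likely first."""
    return sorted(range(max_len + 1), key=lambda n: -loglik(signal, n))

print(call_flow(2.4)[:3])  # [2, 3, 1]: 2 most likely, 3 kept as an alternative
```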

  10. Accuracy of UTE-MRI-based patient setup for brain cancer radiation therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Yingli; Cao, Minsong; Kaprealian, Tania

    2016-01-15

    Purpose: Radiation therapy simulations solely based on MRI have advantages compared to CT-based approaches. One feature readily available from computed tomography (CT) that would need to be reproduced with MR is the ability to compute digitally reconstructed radiographs (DRRs) for comparison against on-board radiographs commonly used for patient positioning. In this study, the authors generate MR-based bone images using a single ultrashort echo time (UTE) pulse sequence and quantify their 3D and 2D image registration accuracy to CT and radiographic images for treatments in the cranium. Methods: Seven brain cancer patients were scanned at 1.5 T using a radial UTE sequence. The sequence acquired two images at two different echo times. The two images were processed using in-house software to generate the UTE bone images. The resultant bone images were rigidly registered to simulation CT data and the registration error was determined using manually annotated landmarks as references. DRRs were created based on UTE-MRI and registered to simulated on-board images (OBIs) and actual clinical 2D oblique images from ExacTrac™. Results: UTE-MRI resulted in well visualized cranial, facial, and vertebral bones that quantitatively matched the bones in the CT images with geometric measurement errors of less than 1 mm. The registration error between DRRs generated from 3D UTE-MRI and the simulated 2D OBIs or the clinical oblique x-ray images was also less than 1 mm for all patients. Conclusions: UTE-MRI-based DRRs appear to be promising for daily patient setup of brain cancer radiotherapy with kV on-board imaging.

  11. Manual Dexterity in Schizophrenia—A Neglected Clinical Marker?

    PubMed Central

    Térémetz, Maxime; Carment, Loïc; Brénugat-Herne, Lindsay; Croca, Marta; Bleton, Jean-Pierre; Krebs, Marie-Odile; Maier, Marc A.; Amado, Isabelle; Lindberg, Påvel G.

    2017-01-01

    Impaired manual dexterity is commonly observed in schizophrenia. However, a quantitative description of key sensorimotor components contributing to impaired dexterity is lacking. Whether the key components of dexterity are differentially affected and how they relate to clinical characteristics also remains unclear. We quantified the degree of dexterity in 35 stabilized patients with schizophrenia and in 20 age-matched control subjects using four visuomotor tasks: (i) force tracking to quantify visuomotor precision, (ii) sequential finger tapping to measure motor sequence recall, (iii) single-finger tapping to assess temporal regularity, and (iv) multi-finger tapping to measure independence of finger movements. Diverse clinical and neuropsychological tests were also applied. A patient subgroup (N = 15) participated in a 14-week cognitive remediation protocol and was assessed before and after remediation. Compared to control subjects, patients with schizophrenia showed greater error in force tracking, poorer recall of tapping sequences, decreased tapping regularity, and reduced degree of finger individuation. A composite performance measure discriminated patients from controls with sensitivity = 0.79 and specificity = 0.9. Aside from force-tracking error, no other dexterity components correlated with antipsychotic medication. In patients, some dexterity components correlated with neurological soft signs, Positive and Negative Syndrome Scale (PANSS), or neuropsychological scores. This suggests differential cognitive contributions to these components. Cognitive remediation led to significant improvement in PANSS, tracking error, and sequence recall (without change in medication). These findings show that multiple aspects of sensorimotor control contribute to impaired manual dexterity in schizophrenia. Only visuomotor precision was related to antipsychotic medication. Good diagnostic accuracy and responsiveness to treatment suggest that manual dexterity may represent a useful clinical marker in schizophrenia. PMID:28740470

  12. Experimental and statistical post-validation of positive example EST sequences carrying peroxisome targeting signals type 1 (PTS1)

    PubMed Central

    Lingner, Thomas; Kataya, Amr R. A.; Reumann, Sigrun

    2012-01-01

    We recently developed the first algorithms specifically for plants to predict proteins carrying peroxisome targeting signals type 1 (PTS1) from genome sequences. As validated experimentally, the prediction methods are able to correctly predict unknown peroxisomal Arabidopsis proteins and to infer novel PTS1 tripeptides. The high prediction performance is primarily determined by the large number and sequence diversity of the underlying positive example sequences, which were mainly derived from EST databases. However, a few constructs remained cytosolic in experimental validation studies, indicating sequencing errors in some ESTs. To identify erroneous sequences, we validated subcellular targeting of additional positive example sequences in the present study. Moreover, we analyzed the distribution of prediction scores separately for each orthologous group of PTS1 proteins, which generally resembled normal distributions with group-specific mean values. The cytosolic sequences commonly represented outliers of low prediction scores and were located at the very tail of a fitted normal distribution. Three statistical methods for identifying outliers were compared in terms of sensitivity and specificity. Their combined application allows elimination of erroneous ESTs from positive example data sets. This new post-validation method will further improve the prediction accuracy of both PTS1 and PTS2 protein prediction models for plants, fungi, and mammals. PMID:22415050

  13. Experimental and statistical post-validation of positive example EST sequences carrying peroxisome targeting signals type 1 (PTS1).

    PubMed

    Lingner, Thomas; Kataya, Amr R A; Reumann, Sigrun

    2012-02-01

    We recently developed the first algorithms specifically for plants to predict proteins carrying peroxisome targeting signals type 1 (PTS1) from genome sequences. As validated experimentally, the prediction methods are able to correctly predict unknown peroxisomal Arabidopsis proteins and to infer novel PTS1 tripeptides. The high prediction performance is primarily determined by the large number and sequence diversity of the underlying positive example sequences, which were mainly derived from EST databases. However, a few constructs remained cytosolic in experimental validation studies, indicating sequencing errors in some ESTs. To identify erroneous sequences, we validated subcellular targeting of additional positive example sequences in the present study. Moreover, we analyzed the distribution of prediction scores separately for each orthologous group of PTS1 proteins, which generally resembled normal distributions with group-specific mean values. The cytosolic sequences commonly represented outliers of low prediction scores and were located at the very tail of a fitted normal distribution. Three statistical methods for identifying outliers were compared in terms of sensitivity and specificity. Their combined application allows elimination of erroneous ESTs from positive example data sets. This new post-validation method will further improve the prediction accuracy of both PTS1 and PTS2 protein prediction models for plants, fungi, and mammals.

  14. Using single cell sequencing data to model the evolutionary history of a tumor.

    PubMed

    Kim, Kyung In; Simon, Richard

    2014-01-24

    The introduction of next-generation sequencing (NGS) technology has made it possible to detect genomic alterations within tumor cells on a large scale. However, most applications of NGS show the genetic content of mixtures of cells. Recently developed single cell sequencing technology can identify variation within a single cell. Characterization of multiple samples from a tumor using single cell sequencing can potentially provide information on the evolutionary history of that tumor. This may facilitate understanding how key mutations accumulate and evolve in lineages to form a heterogeneous tumor. We provide a computational method to infer an evolutionary mutation tree based on single cell sequencing data. Our approach differs from traditional phylogenetic tree approaches in that our mutation tree directly describes temporal order relationships among mutation sites. Our method also accommodates sequencing errors. Furthermore, we provide a method for estimating the proportion of time from the earliest mutation event of the sample to the most recent common ancestor of the sample of cells. Finally, we discuss current limitations on modeling with single cell sequencing data and possible improvements under those limitations. Inferring the temporal ordering of mutational sites using current single cell sequencing data is a challenge. Our proposed method may help elucidate relationships among key mutations and their role in tumor progression.

  15. Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes. Part 3: The MAP and Related Decoding Algorithms

    NASA Technical Reports Server (NTRS)

    Lin, Shu; Fossorier, Marc

    1998-01-01

    In a coded communication system with equiprobable signaling, MLD minimizes the word error probability and delivers the most likely codeword associated with the corresponding received sequence. This decoding has two drawbacks. First, minimization of the word error probability is not equivalent to minimization of the bit error probability, so MLD is suboptimum with respect to the bit error probability. Second, MLD delivers a hard-decision estimate of the received sequence, so that information is lost between the input and output of the ML decoder. This information is important in coded schemes where the decoded sequence is further processed, such as concatenated coding schemes and multi-stage or iterative decoding schemes. In this chapter, we first present a decoding algorithm which both minimizes the bit error probability and provides the corresponding soft information at the output of the decoder. This algorithm is referred to as the MAP (maximum a posteriori probability) decoding algorithm.
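
    For reference, the per-bit MAP decision described here is conventionally written as a log-likelihood-ratio rule. This is the standard textbook form, not this chapter's specific derivation:

```latex
% Per-bit MAP decision from the received sequence r
\[
  L(u_i) = \ln \frac{P(u_i = 1 \mid \mathbf{r})}{P(u_i = 0 \mid \mathbf{r})},
  \qquad
  \hat{u}_i =
  \begin{cases}
    1, & L(u_i) \ge 0,\\
    0, & \text{otherwise},
  \end{cases}
\]
% with |L(u_i)| serving as the soft reliability passed to later decoding stages.
```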

  16. BETASEQ: a powerful novel method to control type-I error inflation in partially sequenced data for rare variant association testing.

    PubMed

    Yan, Song; Li, Yun

    2014-02-15

    Despite its great capability to detect rare variant associations, next-generation sequencing is still prohibitively expensive when applied to large samples. In case-control studies, it is thus appealing to sequence only a subset of cases to discover variants and genotype the identified variants in controls and the remaining cases, under the reasonable assumption that causal variants are usually enriched among cases. However, this approach leads to inflated type-I error if analyzed naively for rare variant association. Several methods have been proposed in recent literature to control type-I error at the cost of either excluding some sequenced cases or correcting the genotypes of discovered rare variants. All of these approaches thus suffer from some degree of information loss and are underpowered. We propose a novel method (BETASEQ), which corrects inflation of type-I error by supplementing pseudo-variants while keeping the original sequence and genotype data intact. Extensive simulations and real data analysis demonstrate that, in most practical situations, BETASEQ leads to higher testing power than existing approaches with guaranteed (controlled or conservative) type-I error. BETASEQ and associated R files, including documentation and examples, are available at http://www.unc.edu/~yunmli/betaseq

  17. Accurate Sample Assignment in a Multiplexed, Ultrasensitive, High-Throughput Sequencing Assay for Minimal Residual Disease.

    PubMed

    Bartram, Jack; Mountjoy, Edward; Brooks, Tony; Hancock, Jeremy; Williamson, Helen; Wright, Gary; Moppett, John; Goulden, Nick; Hubank, Mike

    2016-07-01

    High-throughput sequencing (HTS) (next-generation sequencing) of the rearranged Ig and T-cell receptor genes promises to be less expensive and more sensitive than current methods of monitoring minimal residual disease (MRD) in patients with acute lymphoblastic leukemia. However, the adoption of new approaches by clinical laboratories requires careful evaluation of all potential sources of error and the development of strategies to ensure the highest accuracy. Timely and efficient clinical use of HTS platforms will depend on combining multiple samples (multiplexing) in each sequencing run. Here we examine Ig heavy-chain gene HTS on the Illumina MiSeq platform for MRD. We identify errors associated with multiplexing that could potentially impact the accuracy of MRD analysis. We optimize a strategy that combines high-purity, sequence-optimized oligonucleotides, dual indexing, and an error-aware demultiplexing approach to minimize errors and maximize sensitivity. We present a probability-based demultiplexing pipeline, Error-Aware Demultiplexer, that is suitable for all MiSeq strategies and accurately assigns samples to the correct identifier without excessive loss of data. Finally, using controls quantified by digital PCR, we show that HTS-MRD can accurately detect as few as 1 in 10^6 copies of specific leukemic MRD. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.
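
    As a simplified picture of dual-index demultiplexing with error tolerance: each read carries two index sequences, and a read is assigned only when exactly one sample's index pair matches within a mismatch budget. The published pipeline is probability-based; the index sequences, names, and one-mismatch rule below are illustrative assumptions only.

```python
# Minimal sketch of dual-index demultiplexing with Hamming-distance tolerance.

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

SAMPLES = {("ACGTAC", "TGCATG"): "sample_1",   # (i7, i5) -> sample name
           ("GTACGT", "CATGCA"): "sample_2"}

def assign(i7, i5, max_mm=1):
    """Assign a read to a sample only if exactly one index pair is close enough."""
    hits = [name for (a, b), name in SAMPLES.items()
            if hamming(i7, a) <= max_mm and hamming(i5, b) <= max_mm]
    return hits[0] if len(hits) == 1 else None   # ambiguous/unmatched -> discard

print(assign("ACGTAC", "TGCATG"))  # sample_1
print(assign("ACGTAA", "TGCATG"))  # sample_1 (one mismatch tolerated)
```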

  18. Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling.

    PubMed

    Tao, Ran; Zeng, Donglin; Franceschini, Nora; North, Kari E; Boerwinkle, Eric; Lin, Dan-Yu

    2015-06-01

    High-throughput DNA sequencing allows for the genotyping of common and rare variants for genetic association studies. At the present time and for the foreseeable future, it is not economically feasible to sequence all individuals in a large cohort. A cost-effective strategy is to sequence those individuals with extreme values of a quantitative trait. We consider the design under which the sampling depends on multiple quantitative traits. Under such trait-dependent sampling, standard linear regression analysis can result in bias of parameter estimation, inflation of type I error, and loss of power. We construct a likelihood function that properly reflects the sampling mechanism and utilizes all available data. We implement a computationally efficient EM algorithm and establish the theoretical properties of the resulting maximum likelihood estimators. Our methods can be used to perform separate inference on each trait or simultaneous inference on multiple traits. We pay special attention to gene-level association tests for rare variants. We demonstrate the superiority of the proposed methods over standard linear regression through extensive simulation studies. We provide applications to the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study and the National Heart, Lung, and Blood Institute Exome Sequencing Project.

  19. Efficient and precise calculation of the b-matrix elements in diffusion-weighted imaging pulse sequences.

    PubMed

    Zubkov, Mikhail; Stait-Gardner, Timothy; Price, William S

    2014-06-01

    Precise NMR diffusion measurements require detailed knowledge of the cumulative dephasing effect caused by the numerous gradient pulses present in most NMR pulse sequences. This effect, which ultimately manifests itself as the diffusion-related NMR signal attenuation, is usually described by the b-value or the b-matrix in the case of multidirectional diffusion weighting, the latter being common in diffusion-weighted NMR imaging. Neglecting some of the gradient pulses introduces an error in the calculated diffusion coefficient reaching, in some cases, 100% of the expected value. Therefore, ensuring that the b-matrix calculation includes all the known gradient pulses leads to significant error reduction. Calculation of the b-matrix for simple gradient waveforms is rather straightforward, yet it grows cumbersome when complexly shaped and/or numerous gradient pulses are introduced. Making three broad assumptions about the gradient pulse arrangement in a sequence results in an efficient framework for calculating b-matrices, as well as providing some insight into optimal gradient pulse placement. The framework allows accounting for the diffusion-sensitizing effect of complexly shaped gradient waveforms with modest computational time and power. This is achieved by using the b-matrix elements of the simple unmodified pulse sequence and minimizing the integration of the complexly shaped gradient waveform in the modified sequence. Such re-evaluation of the b-matrix elements retains all the analytical relevance of the straightforward approach, yet at least halves the amount of symbolic integration required. The application of the framework is demonstrated with the evaluation of the expressions describing the diffusion-sensitizing effect caused by different bipolar gradient pulse modules. Copyright © 2014 Elsevier Inc. All rights reserved.
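
    For reference, the b-matrix has a standard definition in terms of the effective gradient waveform G(t) (a textbook form, not the paper's optimized framework; refocusing pulses are handled by sign-reversing the effective gradient):

```latex
% Standard b-matrix of a diffusion-weighting gradient waveform G(t)
\[
  \mathbf{k}(t) = \gamma \int_{0}^{t} \mathbf{G}(t')\,\mathrm{d}t',
  \qquad
  b_{ij} = \int_{0}^{T_E} k_i(t)\,k_j(t)\,\mathrm{d}t,
\]
% giving the diffusion-related signal attenuation
\[
  S = S_0 \exp\Bigl(-\sum_{i,j} b_{ij} D_{ij}\Bigr).
\]
```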

  20. Determinants of spontaneous mutation in the bacterium Escherichia coli as revealed by whole-genome sequencing

    PubMed Central

    Foster, Patricia L.; Lee, Heewook; Popodi, Ellen; Townes, Jesse P.; Tang, Haixu

    2015-01-01

    A complete understanding of evolutionary processes requires that factors determining spontaneous mutation rates and spectra be identified and characterized. Using mutation accumulation followed by whole-genome sequencing, we found that the mutation rates of three widely diverged commensal Escherichia coli strains differ only by about 50%, suggesting that a rate of 1–2 × 10^−3 mutations per generation per genome is common for this bacterium. Four major forces are postulated to contribute to spontaneous mutations: intrinsic DNA polymerase errors, endogenously induced DNA damage, DNA damage caused by exogenous agents, and the activities of error-prone polymerases. To determine the relative importance of these factors, we studied 11 strains, each defective for a major DNA repair pathway. The striking result was that only loss of the ability to prevent or repair oxidative DNA damage significantly impacted mutation rates or spectra. These results suggest that, with the exception of oxidative damage, endogenously induced DNA damage does not perturb the overall accuracy of DNA replication in normally growing cells and that repair pathways may exist primarily to defend against exogenously induced DNA damage. The thousands of mutations caused by oxidative damage recovered across the entire genome revealed strong local-sequence biases of these mutations. Specifically, we found that the identity of the 3′ base can affect the mutability of a purine by oxidative damage by as much as eightfold. PMID:26460006

  1. Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity.

    PubMed

    Edger, Patrick P; VanBuren, Robert; Colle, Marivi; Poorten, Thomas J; Wai, Ching Man; Niederhuth, Chad E; Alger, Elizabeth I; Ou, Shujun; Acharya, Charlotte B; Wang, Jie; Callow, Pete; McKain, Michael R; Shi, Jinghua; Collier, Chad; Xiong, Zhiyong; Mower, Jeffrey P; Slovin, Janet P; Hytönen, Timo; Jiang, Ning; Childs, Kevin L; Knapp, Steven J

    2018-02-01

    Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology. Here we utilized a robust, cost-effective approach to produce high-quality reference genomes. We report a near-complete genome of diploid woodland strawberry (Fragaria vesca) using single-molecule real-time sequencing from Pacific Biosciences (PacBio). This assembly has a contig N50 length of ∼7.9 million base pairs (Mb), representing a ∼300-fold improvement of the previous version. The vast majority (>99.8%) of the assembly was anchored to 7 pseudomolecules using 2 sets of optical maps from Bionano Genomics. We obtained ∼24.96 Mb of sequence not present in the previous version of the F. vesca genome and produced an improved annotation that includes 1496 new genes. Comparative syntenic analyses uncovered numerous, large-scale scaffolding errors present in each chromosome in the previously published version of the F. vesca genome. Our results highlight the need to improve existing short-read based reference genomes. Furthermore, we demonstrate how genome quality impacts commonly used analyses for addressing both fundamental and applied biological questions. © The Authors 2017. Published by Oxford University Press.

  2. Error baseline rates of five sample preparation methods used to characterize RNA virus populations.

    PubMed

    Kugelman, Jeffrey R; Wiley, Michael R; Nagle, Elyse R; Reyes, Daniel; Pfeffer, Brad P; Kuhn, Jens H; Sanchez-Lockhart, Mariano; Palacios, Gustavo F

    2017-01-01

    Individual RNA viruses typically occur as populations of genomes that differ slightly from each other due to mutations introduced by the error-prone viral polymerase. Understanding the variability of RNA virus genome populations is critical for understanding virus evolution because individual mutant genomes may gain evolutionary selective advantages and give rise to dominant subpopulations, possibly even leading to the emergence of viruses resistant to medical countermeasures. Reverse transcription of virus genome populations followed by next-generation sequencing is the only available method to characterize variation for RNA viruses. However, both steps may lead to the introduction of artificial mutations, thereby skewing the data. To better understand how such errors are introduced during sample preparation, we determined and compared error baseline rates of five different sample preparation methods by analyzing in vitro transcribed Ebola virus RNA from an artificial plasmid-based system. These methods included: shotgun sequencing from plasmid DNA or in vitro transcribed RNA as a basic "no amplification" method, amplicon sequencing from the plasmid DNA or in vitro transcribed RNA as a "targeted" amplification method, sequence-independent single-primer amplification (SISPA) as a "random" amplification method, rolling circle reverse transcription sequencing (CirSeq) as an advanced "no amplification" method, and Illumina TruSeq RNA Access as a "targeted" enrichment method. The measured error frequencies indicate that RNA Access offers the best tradeoff between sensitivity and sample preparation error (1.4 × 10^−5) of all compared methods.

  3. Error baseline rates of five sample preparation methods used to characterize RNA virus populations

    PubMed Central

    Kugelman, Jeffrey R.; Wiley, Michael R.; Nagle, Elyse R.; Reyes, Daniel; Pfeffer, Brad P.; Kuhn, Jens H.; Sanchez-Lockhart, Mariano; Palacios, Gustavo F.

    2017-01-01

    Individual RNA viruses typically occur as populations of genomes that differ slightly from each other due to mutations introduced by the error-prone viral polymerase. Understanding the variability of RNA virus genome populations is critical for understanding virus evolution because individual mutant genomes may gain evolutionary selective advantages and give rise to dominant subpopulations, possibly even leading to the emergence of viruses resistant to medical countermeasures. Reverse transcription of virus genome populations followed by next-generation sequencing is the only available method to characterize variation for RNA viruses. However, both steps may lead to the introduction of artificial mutations, thereby skewing the data. To better understand how such errors are introduced during sample preparation, we determined and compared error baseline rates of five different sample preparation methods by analyzing in vitro transcribed Ebola virus RNA from an artificial plasmid-based system. These methods included: shotgun sequencing from plasmid DNA or in vitro transcribed RNA as a basic “no amplification” method, amplicon sequencing from the plasmid DNA or in vitro transcribed RNA as a “targeted” amplification method, sequence-independent single-primer amplification (SISPA) as a “random” amplification method, rolling circle reverse transcription sequencing (CirSeq) as an advanced “no amplification” method, and Illumina TruSeq RNA Access as a “targeted” enrichment method. The measured error frequencies indicate that RNA Access offers the best tradeoff between sensitivity and sample preparation error (1.4 × 10^−5) of all compared methods. PMID:28182717

  4. Haplotype estimation using sequencing reads.

    PubMed

    Delaneau, Olivier; Howie, Bryan; Cox, Anthony J; Zagury, Jean-François; Marchini, Jonathan

    2013-10-03

    High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb read with 4%-15% error per base), phasing performance was substantially improved. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
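
    Switch errors, the metric reported above, count the places where the inferred phase flips relative to the truth between adjacent heterozygous sites; the mean distance between switch errors is then the phased length divided by that count. A toy counter (the 0/1 phase encoding is an assumption for illustration):

```python
# Sketch of counting switch errors between true and inferred haplotype phase.
# Each entry is 0 or 1: which parental haplotype carries the alt allele at a
# heterozygous site, in genomic order.

def switch_errors(truth, inferred):
    """Count adjacent het-site pairs where the relative phase flips."""
    flips = [t != i for t, i in zip(truth, inferred)]
    return sum(a != b for a, b in zip(flips, flips[1:]))

print(switch_errors([0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 0]))  # 1 switch error
```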

  5. Significantly improved precision of cell migration analysis in time-lapse video microscopy through use of a fully automated tracking system

    PubMed Central

    2010-01-01

    Background Cell motility is a critical parameter in many physiological as well as pathophysiological processes. In time-lapse video microscopy, manual cell tracking remains the most common method of analyzing migratory behavior of cell populations. In addition to being labor-intensive, this method is susceptible to user-dependent errors regarding the selection of "representative" subsets of cells and manual determination of precise cell positions. Results We have quantitatively analyzed these error sources, demonstrating that manual cell tracking of pancreatic cancer cells leads to miscalculation of migration rates by up to 410%. In order to provide objective measurements of cell migration rates, we have employed multi-target tracking technologies commonly used in radar applications to develop a fully automated cell identification and tracking system suitable for high-throughput screening of video sequences of unstained living cells. Conclusion We demonstrate that our automatic multi-target tracking system identifies cell objects, follows individual cells and computes migration rates with high precision, clearly outperforming manual procedures. PMID:20377897

  6. AMPLISAS: a web server for multilocus genotyping using next-generation amplicon sequencing data.

    PubMed

    Sebastian, Alvaro; Herdegen, Magdalena; Migalska, Magdalena; Radwan, Jacek

    2016-03-01

    Next-generation sequencing (NGS) technologies are revolutionizing the fields of biology and medicine as powerful tools for amplicon sequencing (AS). Using combinations of primers and barcodes, it is possible to sequence targeted genomic regions with deep coverage for hundreds, even thousands, of individuals in a single experiment. This is extremely valuable for the genotyping of gene families in which locus-specific primers are often difficult to design, such as the major histocompatibility complex (MHC). The utility of AS is, however, limited by the high intrinsic sequencing error rates of NGS technologies and other sources of error such as polymerase amplification or chimera formation. Correcting these errors requires extensive bioinformatic post-processing of NGS data. Amplicon Sequence Assignment (AMPLISAS) is a tool that performs analysis of AS results in a simple and efficient way, while offering customization options for advanced users. AMPLISAS is designed as a three-step pipeline consisting of (i) read demultiplexing, (ii) unique sequence clustering and (iii) erroneous sequence filtering. Allele sequences and frequencies are retrieved in Excel spreadsheet format, making them easy to interpret. AMPLISAS performance has been successfully benchmarked against previously published genotyped MHC data sets obtained with various NGS technologies. © 2015 John Wiley & Sons Ltd.

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Heng, E-mail: hengli@mdanderson.org; Zhu, X. Ronald; Zhang, Xiaodong

    Purpose: To develop and validate a novel delivery strategy for reducing the respiratory motion–induced dose uncertainty of spot-scanning proton therapy. Methods and Materials: The spot delivery sequence was optimized to reduce dose uncertainty. The effectiveness of the delivery sequence optimization was evaluated using measurements and patient simulation. One hundred ninety-one 2-dimensional measurements using different delivery sequences of a single-layer uniform pattern were obtained with a detector array on a 1-dimensional moving platform. Intensity modulated proton therapy plans were generated for 10 lung cancer patients, and dose uncertainties for different delivery sequences were evaluated by simulation. Results: Without delivery sequence optimization, the maximum absolute dose error can be up to 97.2% in a single measurement, whereas the optimized delivery sequence results in a maximum absolute dose error of ≤11.8%. In patient simulation, the optimized delivery sequence reduces the mean of fractional maximum absolute dose error compared with the regular delivery sequence by 3.3% to 10.6% (32.5-68.0% relative reduction) for different patients. Conclusions: Optimizing the delivery sequence can reduce dose uncertainty due to respiratory motion in spot-scanning proton therapy, assuming the 4-dimensional CT is a true representation of the patients' breathing patterns.

  8. Single molecule sequencing-guided scaffolding and correction of draft assemblies.

    PubMed

    Zhu, Shenglong; Chen, Danny Z; Emrich, Scott J

    2017-12-06

    Although single molecule sequencing is still improving, the lengths of the generated sequences are an inherent advantage in genome assembly. Prior work that utilizes long reads for genome assembly has mostly focused on correcting sequencing errors and improving the contiguity of de novo assemblies. We propose a disassembling-reassembling approach for both correcting structural errors in the draft assembly and scaffolding a target assembly based on error-corrected single molecule sequences. To achieve this goal, we formulate a maximum alternating path cover problem. We prove that this problem is NP-hard, and solve it by a 2-approximation algorithm. Our experimental results show that our approach can improve the structural correctness of target assemblies at the cost of some contiguity, even with smaller amounts of long reads. In addition, our reassembling process can also serve as a competitive scaffolder relative to well-established assembly benchmarks.

  9. Hybrid error correction and de novo assembly of single-molecule sequencing reads

    PubMed Central

    Koren, Sergey; Schatz, Michael C.; Walenz, Brian P.; Martin, Jeffrey; Howard, Jason; Ganapathy, Ganeshkumar; Wang, Zhong; Rasko, David A.; McCombie, W. Richard; Jarvis, Erich D.; Phillippy, Adam M.

    2012-01-01

    Emerging single-molecule sequencing instruments can generate multi-kilobase sequences with the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of single-molecule reads is challenging, and has limited their use to resequencing bacteria. To address this limitation, we introduce a novel correction algorithm and assembly strategy that utilizes shorter, high-identity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on PacBio RS reads of phage, prokaryotic, and eukaryotic whole genomes, including the novel genome of the parrot Melopsittacus undulatus, as well as for RNA-seq reads of the corn (Zea mays) transcriptome. Our approach achieves over 99.9% read correction accuracy and produces substantially better assemblies than current sequencing strategies: in the best example, quintupling the median contig size relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly. PMID:22750884

  10. Spatial serial order processing in schizophrenia.

    PubMed

    Fraser, David; Park, Sohee; Clark, Gina; Yohanna, Daniel; Houk, James C

    2004-10-01

    The aim of this study was to examine serial order processing deficits in 21 schizophrenia patients and 16 age- and education-matched healthy controls. In a spatial serial order working memory task, one to four spatial targets were presented in a randomized sequence. Subjects were required to remember the locations and the order in which the targets were presented. Patients showed a marked deficit in ability to remember the sequences compared with controls. Increasing the number of targets within a sequence resulted in poorer memory performance for both control and schizophrenia subjects, but the effect was much more pronounced in the patients. Targets presented at the end of a long sequence were more vulnerable to memory error in schizophrenia patients. Performance deficits were not attributable to motor errors, but to errors in target choice. The results support the idea that the memory errors seen in schizophrenia patients may be due to saturating the working memory network at relatively low levels of memory load.

  11. European external quality control study on the competence of laboratories to recognize rare sequence variants resulting in unusual genotyping results.

    PubMed

    Márki-Zay, János; Klein, Christoph L; Gancberg, David; Schimmel, Heinz G; Dux, László

    2009-04-01

    Depending on the method used, rare sequence variants adjacent to the single nucleotide polymorphism (SNP) of interest may cause unusual or erroneous genotyping results. Because such rare variants are known for many genes commonly tested in diagnostic laboratories, we organized a proficiency study to assess their influence on the accuracy of reported laboratory results. Four external quality control materials were processed and sent to 283 laboratories through 3 EQA organizers for analysis of the prothrombin 20210G>A mutation. Two of these quality control materials contained sequence variants introduced by site-directed mutagenesis. One hundred eighty-nine laboratories participated in the study. When samples gave a usual result with the method applied, the error rate was 5.1%. Detailed analysis showed that more than 70% of the failures were reported from only 9 laboratories. Allele-specific amplification-based PCR had a much higher error rate than other methods (18.3% vs 2.9%). The variants 20209C>T and [20175T>G; 20179_20180delAC] resulted in unusual genotyping results in 67 and 85 laboratories, respectively. Eighty-three (54.6%) of these unusual results were not recognized, 32 (21.1%) were attributed to technical issues, and only 37 (24.3%) were recognized as another sequence variant. Our findings revealed that some of the participating laboratories were not able to recognize and correctly interpret unusual genotyping results caused by rare SNPs. Our study indicates that the majority of the failures could be avoided by improved training and careful selection and validation of the methods applied.

  12. Single-variant and multi-variant trend tests for genetic association with next-generation sequencing that are robust to sequencing error.

    PubMed

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Alejandro Q; Musolf, Anthony; Matise, Tara C; Finch, Stephen J; Gordon, Derek

    2012-01-01

    As with any new technology, next-generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to those data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single-variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies: first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p value, no matter how many loci. Copyright © 2013 S. Karger AG, Basel.
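
    The classical linear trend test that LTTae,NGS extends reduces, in its error-free form, to a score statistic on genotype counts. A compact sketch of that baseline follows; the counts are illustrative, and the published statistic additionally models differential misclassification and coverage:

```python
# Sketch of the classical (Cochran-Armitage-style) linear trend test, written
# as a score statistic; returns an approximately standard-normal Z under the null.
import math

def trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """case_counts/control_counts: genotype counts for (AA, Aa, aa)."""
    R, S = sum(case_counts), sum(control_counts)
    N = R + S
    p = R / N                                     # overall case fraction
    n = [c + d for c, d in zip(case_counts, control_counts)]
    wbar = sum(w * ni for w, ni in zip(weights, n)) / N
    U = sum((w - wbar) * (r - p * ni)
            for w, r, ni in zip(weights, case_counts, n))
    var = p * (1 - p) * sum((w - wbar) ** 2 * ni for w, ni in zip(weights, n))
    return U / math.sqrt(var)

print(trend_test((50, 35, 15), (70, 25, 5)))  # positive Z: risk rises with allele count
```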

  13. Single variant and multi-variant trend tests for genetic association with next generation sequencing that are robust to sequencing error

    PubMed Central

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Andrew; Musolf, Anthony; Matise, Tara C.; Finch, Stephen J.; Gordon, Derek

    2013-01-01

    As with any new technology, next generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model, based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to those data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies: first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p-value, no matter how many loci. PMID:23594495

  14. Software for pre-processing Illumina next-generation sequencing short read sequences

    PubMed Central

    2014-01-01

    Background When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157 H7. Results Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacteria genomic short read sequences with the focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference-based assembly as measured by assembly contiguity and correctness. Conclusions Trimming of short read sequences can improve the quality of de novo and reference-based assembly and assembler performance. The parallel processing capability of ngsShoRT reduces trimming time and improves memory efficiency when dealing with large datasets. We recommend combining sequencing artifact removal with quality-score-based read filtering and base trimming as the most consistent method for improving sequence quality and downstream assemblies. ngsShoRT source code, user guide and tutorial are available at http://research.bioinformatics.udel.edu/genomics/ngsShoRT/. ngsShoRT can be incorporated as a pre-processing step in genome and transcriptome assembly projects. PMID:24955109
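
    As a generic illustration of the quality-score-based base trimming such pre-processors perform (this is not ngsShoRT's actual algorithm or interface; the sliding-window rule and threshold are assumptions):

```python
# Generic sketch of 3' quality trimming: cut the read where windowed quality drops.

def trim_3prime(seq, quals, min_q=20, window=4):
    """Cut the read at the first sliding window whose mean quality is below min_q."""
    for i in range(len(seq) - window + 1):
        if sum(quals[i:i + window]) / window < min_q:
            return seq[:i], quals[:i]
    return seq, quals

seq = "ACGTACGTACGT"
quals = [38, 37, 36, 35, 30, 28, 22, 18, 12, 9, 6, 4]   # illustrative Phred scores
print(trim_3prime(seq, quals))  # trims the low-quality tail: ('ACGTAC', ...)
```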

  15. Error reduction and parameter optimization of the TAPIR method for fast T1 mapping.

    PubMed

    Zaitsev, M; Steinhoff, S; Shah, N J

    2003-06-01

    A methodology is presented for the reduction of both systematic and random errors in T(1) determination using TAPIR, a Look-Locker-based fast T(1) mapping technique. The relations between various sequence parameters were carefully investigated in order to develop recipes for choosing optimal sequence parameters. Theoretical predictions for the optimal flip angle were verified experimentally. Inversion pulse imperfections were identified as the main source of systematic errors in T(1) determination with TAPIR. An effective remedy is demonstrated which includes extension of the measurement protocol to include a special sequence for mapping the inversion efficiency itself. Copyright 2003 Wiley-Liss, Inc.

  16. An Activation-Based Model of Routine Sequence Errors

    DTIC Science & Technology

    2015-04-01

    part of the ACT-R framework (e.g., Anderson, 1983), we adopt a newer, richer notion of priming as part of our approach (Harrison & Trafton, 2010 ... 2014). Other models of routine sequence errors, such as the interactive activation network (IAN) model (Cooper & Shallice, 2006) and the simple ... error patterns that result from an interface layout shift. The ideas behind our expanded priming approach, however, could apply to IAN, which uses

  17. Estimating and comparing microbial diversity in the presence of sequencing errors

    PubMed Central

    Chiu, Chun-Huo

    2016-01-01

    Estimating and comparing microbial diversity are statistically challenging due to limited sampling and possible sequencing errors for low-frequency counts, producing spurious singletons. The inflated singleton count seriously affects statistical analysis and inferences about microbial diversity. Previous statistical approaches to tackle the sequencing errors generally require different parametric assumptions about the sampling model or about the functional form of frequency counts. Different parametric assumptions may lead to drastically different diversity estimates. We focus on nonparametric methods which are universally valid for all parametric assumptions and can be used to compare diversity across communities. We develop here a nonparametric estimator of the true singleton count to replace the spurious singleton count in all methods/approaches. Our estimator of the true singleton count is in terms of the frequency counts of doubletons, tripletons and quadrupletons, provided these three frequency counts are reliable. To quantify microbial alpha diversity for an individual community, we adopt the measure of Hill numbers (effective number of taxa) under a nonparametric framework. Hill numbers, parameterized by an order q that determines the measures’ emphasis on rare or common species, include taxa richness (q = 0), Shannon diversity (q = 1, the exponential of Shannon entropy), and Simpson diversity (q = 2, the inverse of Simpson index). A diversity profile which depicts the Hill number as a function of order q conveys all information contained in a taxa abundance distribution. Based on the estimated singleton count and the original non-singleton frequency counts, two statistical approaches (non-asymptotic and asymptotic) are developed to compare microbial diversity for multiple communities. (1) A non-asymptotic approach refers to the comparison of estimated diversities of standardized samples with a common finite sample size or sample completeness. This approach aims to compare diversity estimates for equally-large or equally-complete samples; it is based on the seamless rarefaction and extrapolation sampling curves of Hill numbers, specifically for q = 0, 1 and 2. (2) An asymptotic approach refers to the comparison of the estimated asymptotic diversity profiles. That is, this approach compares the estimated profiles for complete samples or samples whose size is sufficiently large. It is based on statistical estimation of the true Hill number of any order q ≥ 0. In the two approaches, replacing the spurious singleton count by our estimated count, we can greatly reduce the positive biases associated with diversity estimates due to spurious singletons and also make fair comparisons across microbial communities, as illustrated in our simulation results and in applying our method to analyze sequencing data from viral metagenomes. PMID:26855872
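
    The Hill-number machinery in this record can be illustrated directly. The Python sketch below computes plug-in (empirical) Hill numbers for q = 0, 1, 2 from a vector of taxon counts; it deliberately applies no correction for unseen taxa or spurious singletons, which is the paper's actual contribution, so it is a simplified teaching aid with hypothetical counts.

        import numpy as np

        def hill_number(counts, q):
            """Empirical Hill number of order q from abundance counts.
            q=0: richness; q=1: exp(Shannon entropy); q=2: inverse
            Simpson. Plug-in estimate only; no singleton correction."""
            p = np.asarray(counts, dtype=float)
            p = p[p > 0] / p.sum()
            if q == 1:
                return float(np.exp(-np.sum(p * np.log(p))))
            return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))

        abundances = [50, 30, 10, 5, 3, 1, 1]  # hypothetical OTU counts
        for q in (0, 1, 2):
            print(q, round(hill_number(abundances, q), 2))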

  18. ABI Base Recall: Automatic Correction and Ends Trimming of DNA Sequences.

    PubMed

    Elyazghi, Zakaria; Yazouli, Loubna El; Sadki, Khalid; Radouani, Fouzia

    2017-12-01

    Automated DNA sequencers produce chromatogram files in ABI format. When viewing chromatograms, some ambiguities are shown at various sites along the DNA sequences, because the program implemented in the sequencing machine and used to call bases cannot always precisely determine the right nucleotide, especially when it is represented by either a broad peak or a set of overlaying peaks. In such cases, a letter other than A, C, G, or T is recorded, most commonly N. Thus, DNA sequencing chromatograms need manual examination: checking for mis-calls and truncating the sequence when errors become too frequent. The purpose of this paper is to develop a program allowing the automatic correction of these ambiguities. This application is a Web-based program powered by Shiny and runs on the R platform for ease of use. As part of the interface, we added an automatic end-clipping option, alignment against reference sequences, and BLAST. To develop and test our tool, we collected several bacterial DNA sequences from different laboratories within Institut Pasteur du Maroc and performed both manual and automatic correction. A comparison between the two methods was carried out. As a result, we note that our program, ABI Base Recall, performs correction with high accuracy. Indeed, it increases the rate of identity and coverage and minimizes the number of mismatches and gaps; hence it provides a solution to sequencing ambiguities and saves biologists' time and labor.
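
    The record's correction step works from chromatogram peak data, which cannot be reproduced here; as a hedged sketch of the reference-assisted part of the idea, the Python fragment below resolves N calls in an already-aligned read against a reference. The function is hypothetical and is not ABI Base Recall's code.

        def resolve_ambiguities(query, reference):
            """Replace N calls in an aligned query with the reference
            base. Inputs must be pre-aligned (equal length, '-' gaps).
            The real tool re-examines chromatogram peaks and also
            clips low-quality ends."""
            assert len(query) == len(reference)
            return "".join(
                r if q == "N" and r != "-" else q
                for q, r in zip(query, reference)
            )

        print(resolve_ambiguities("ACNTGNA", "ACGTGCA"))  # -> ACGTGCA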

  19. Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions

    PubMed Central

    2014-01-01

    Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920

  20. Evaluation of Magnetic Resonance Imaging-Compatible Needles and Interactive Sequences for Musculoskeletal Interventions Using an Open High-Field Magnetic Resonance Imaging Scanner

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wonneberger, Uta, E-mail: uta.wonneberger@charite.de; Schnackenburg, Bernhard, E-mail: bernhard.schnackenburg@philips.com; Streitparth, Florian, E-mail: florian.streitparth@charite.de

    2010-04-15

    In this article, we present an in vitro evaluation of needle artefacts and image quality for musculoskeletal laser interventions in an open high-field magnetic resonance imaging (MRI) scanner at 1.0T with vertical field orientation. Five commercially available MRI-compatible puncture needles were assessed based on artefact characteristics in a CuSO4 phantom (0.1%) and in human cadaveric lumbar spines. First, six different interventional sequences were evaluated with varying needle orientation to the main magnetic field B0 (0° to 90°) in a sequence test. Artefact width, needle-tip error, and contrast-to-noise ratio (CNR) were calculated. Second, a gradient-echo sequence used for thermometric monitoring was assessed, and artefact width, tip error, and signal-to-noise ratio (SNR) were measured at varying echo times. Artefact width and needle-tip error correlated with needle material, instrument orientation to B0, and sequence type. Fast spin-echo sequences produced the smallest needle artefacts for all needles, except for the carbon fibre needle (width <3.5 mm, tip error <2 mm) at 45° to B0. Overall, the proton density-weighted spin-echo sequences had the best CNR (muscle/needle CNR >16.8). Concerning the thermometric gradient-echo sequence, artefacts remained <5 mm, and the SNR reached its maximum at an echo time of 15 ms. If needle materials and sequences are appropriately combined, guidance and monitoring of musculoskeletal laser interventions may be feasible in a vertical magnetic field at 1.0T.

  1. Draft versus finished sequence data for DNA and protein diagnostic signature development

    PubMed Central

    Gardner, Shea N.; Lam, Marisa W.; Smith, Jason R.; Torres, Clinton L.; Slezak, Tom R.

    2005-01-01

    Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. We use SAP to assess whether draft data are sufficient or finished sequencing is required using Marburg and variola virus sequences. Simulations indicate that intermediate to high-quality draft with error rates of 10−3–10−5 (∼8× coverage) of target organisms is suitable for DNA signature prediction. Low-quality draft with error rates of ∼1% (3× to 6× coverage) of target isolates is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient, as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target and low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures. PMID:16243783

  2. Accurate typing of short tandem repeats from genome-wide sequencing data and its applications.

    PubMed

    Fungtammasan, Arkarachai; Ananda, Guruprasad; Hile, Suzanne E; Su, Marcia Shu-Wei; Sun, Chen; Harris, Robert; Medvedev, Paul; Eckert, Kristin; Makova, Kateryna D

    2015-05-01

    Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution. © 2015 Fungtammasan et al.; Published by Cold Spring Harbor Laboratory Press.
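
    As a hedged illustration of the repeat-detection half of the problem, the Python sketch below finds runs of an STR motif within a single read. STR-FM itself works differently, anchoring reads by mapping their non-repetitive flanks, which is what gives it robustness to sequencing errors; this naive scan is only for orientation.

        import re

        def find_strs(read, motif, min_units=3):
            """Return (start, repeat_count) for runs of `motif`
            repeated at least `min_units` times in `read`."""
            pattern = re.compile(r"(?:%s){%d,}" % (re.escape(motif), min_units))
            return [(m.start(), len(m.group()) // len(motif))
                    for m in pattern.finditer(read)]

        print(find_strs("TTACAGCAGCAGCAGGTC", "CAG"))  # -> [(3, 4)]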

  3. Dynamically correcting two-qubit gates against any systematic logical error

    NASA Astrophysics Data System (ADS)

    Calderon Vargas, Fernando Antonio

    The reliability of quantum information processing depends on the ability to deal with noise and error in an efficient way. A significant source of error in many settings is coherent, systematic gate error. This work introduces a set of composite pulse sequences that generate maximally entangling gates and correct all systematic errors within the logical subspace to arbitrary order. These sequences are applicable for any two-qubit interaction Hamiltonian, and make no assumptions about the underlying noise mechanism except that it is constant on the timescale of the operation. The prime use for our results will be in cases where one has limited knowledge of the underlying physical noise and control mechanisms, highly constrained control, or both. In particular, we apply these composite pulse sequences to the quantum system formed by two capacitively coupled singlet-triplet qubits, which is characterized by having constrained control and noise sources that are low frequency and of a non-Markovian nature.

  4. Research on wind field algorithm of wind lidar based on BP neural network and grey prediction

    NASA Astrophysics Data System (ADS)

    Chen, Yong; Chen, Chun-Li; Luo, Xiong; Zhang, Yan; Yang, Ze-hou; Zhou, Jie; Shi, Xiao-ding; Wang, Lei

    2018-01-01

    This paper uses a BP neural network and a grey algorithm to forecast and study the lidar-measured wind field. To reduce the residual error of the wind field prediction that uses the BP neural network and grey algorithm, the minimum of the residual error function is calculated: the residuals of the grey algorithm are used to train the BP neural network, the trained network model is used to forecast the residual sequence, and the predicted residual sequence is used to modify the forecast sequence of the grey algorithm. The test data show that the grey algorithm modified by the BP neural network can effectively reduce the residual value and improve the prediction precision.
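
    The grey-plus-network scheme can be sketched end to end. The Python fragment below fits a standard GM(1,1) grey model, trains a small network (scikit-learn's MLPRegressor as a stand-in for the paper's BP network) on the fitting residuals, and adds the predicted residuals back to the grey forecast. The data series and network settings are illustrative assumptions.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        def gm11_fit_predict(x, horizon):
            """GM(1,1) grey model: fitted values plus `horizon` forecasts."""
            x = np.asarray(x, dtype=float)
            x1 = np.cumsum(x)
            z = 0.5 * (x1[1:] + x1[:-1])               # background values
            B = np.column_stack([-z, np.ones(len(z))])
            a, b = np.linalg.lstsq(B, x[1:], rcond=None)[0]
            k = np.arange(len(x) + horizon)
            x1_hat = (x[0] - b / a) * np.exp(-a * k) + b / a
            return np.concatenate([[x[0]], np.diff(x1_hat)])

        wind = np.array([5.2, 5.8, 6.1, 6.9, 7.4, 8.3, 8.9, 9.8])  # toy series
        fit = gm11_fit_predict(wind, horizon=2)
        resid = wind - fit[:len(wind)]

        # The network learns the residual sequence from the time index;
        # its prediction then corrects the grey forecast.
        t = np.arange(len(wind)).reshape(-1, 1)
        net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
        net.fit(t, resid)
        t_all = np.arange(len(wind) + 2).reshape(-1, 1)
        corrected = fit + net.predict(t_all)
        print(corrected.round(2))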

  5. ImmuneDB: a system for the analysis and exploration of high-throughput adaptive immune receptor sequencing data.

    PubMed

    Rosenfeld, Aaron M; Meng, Wenzhao; Luning Prak, Eline T; Hershberg, Uri

    2017-01-15

    As high-throughput sequencing of B cells becomes more common, the need for tools to analyze the large quantity of data also increases. This article introduces ImmuneDB, a system for analyzing vast amounts of heavy chain variable region sequences and exploring the resulting data. It can take as input raw FASTA/FASTQ data, identify genes, determine clones, construct lineages, as well as provide information such as selection pressure and mutation analysis. It uses an industry-leading database, MySQL, to provide fast analysis and avoid the complexities of using error-prone flat files. ImmuneDB is freely available at http://immunedb.com. A demo of the ImmuneDB web interface is available at http://immunedb.com/demo. Contact: uh25@drexel.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  6. MtDNA mutations are a common cause of severe disease phenotypes in children with Leigh syndrome.

    PubMed

    Naess, Karin; Freyer, Christoph; Bruhn, Helene; Wibom, Rolf; Malm, Gunilla; Nennesmo, Inger; von Döbeln, Ulrika; Larsson, Nils-Göran

    2009-05-01

    Leigh syndrome (LS) is a common clinical manifestation in children with mitochondrial disease and other types of inborn errors of metabolism. We characterised clinical symptoms, prognosis and respiratory chain function, and performed extensive genetic analysis of 25 Swedish children suffering from Leigh syndrome, with the aim of obtaining insights into the molecular pathophysiology and providing a rationale for genetic counselling. We reviewed the clinical history of all patients and used muscle biopsies to perform molecular, biochemical and genetic investigations, including sequencing the entire mitochondrial DNA (mtDNA), the mitochondrial DNA polymerase (POLGA) gene and the surfeit locus protein 1 (SURF1) gene. Respiratory chain enzyme activity measurements identified five patients with isolated complex I deficiency and five with combined enzyme deficiencies. No patient presented with isolated complex IV deficiency. Seven patients had a decreased ATP production rate. Extensive sequence analysis identified eight patients with pathogenic mtDNA mutations and one patient with mutations in POLGA. Mutations of mtDNA are a common cause of LS, and mtDNA analysis should always be included in the diagnosis of LS patients, whereas SURF1 mutations are not a common cause of LS in Sweden. Unexpectedly, age of onset, clinical symptoms and prognosis did not reveal any clear differences between LS patients with mtDNA or nuclear DNA mutations.

  7. Wideband Arrhythmia-Insensitive-Rapid (AIR) Pulse Sequence for Cardiac T1 mapping without Image Artifacts induced by ICD

    PubMed Central

    Hong, KyungPyo; Jeong, Eun-Kee; Wall, T. Scott; Drakos, Stavros G.; Kim, Daniel

    2015-01-01

    Purpose To develop and evaluate a wideband arrhythmia-insensitive-rapid (AIR) pulse sequence for cardiac T1 mapping without image artifacts induced by an implantable cardioverter-defibrillator (ICD). Methods We developed a wideband AIR pulse sequence by incorporating a saturation pulse with wide frequency bandwidth (8.9 kHz), in order to achieve uniform T1 weighting in the heart with ICD. We tested the performance of original and “wideband” AIR cardiac T1 mapping pulse sequences in phantom and human experiments at 1.5T. Results In 5 phantoms representing native myocardium and blood and post-contrast blood/tissue T1 values, compared with the control T1 values measured with an inversion-recovery pulse sequence without ICD, T1 values measured with original AIR with ICD were considerably lower (absolute percent error >29%), whereas T1 values measured with wideband AIR with ICD were similar (absolute percent error <5%). Similarly, in 11 human subjects, compared with the control T1 values measured with original AIR without ICD, T1 measured with original AIR with ICD was significantly lower (absolute percent error >10.1%), whereas T1 measured with wideband AIR with ICD was similar (absolute percent error <2.0%). Conclusion This study demonstrates the feasibility of a wideband pulse sequence for cardiac T1 mapping without significant image artifacts induced by ICD. PMID:25975192

  8. [Memorization of Sequences of Movements of the Right and the Left Hand by Right- and Left-Handers].

    PubMed

    Bobrova, E V; Bogacheva, I N; Lyakhovetskii, V A; Fabinskaja, A A; Fomina, E V

    2015-01-01

    We analyzed the errors of right- and left-handers when performing memorized sequences with the left or the right hand in a task that activates positional coding: after 6-10 repetitions, the order of movements changed (the positions remained the same throughout the task). The task was first performed by one ("initial") hand, and then by the other ("continuing") hand; there were 2 groups of right-handers and 2 groups of left-handers. It was found that the pattern of errors during task performance by the initial hand is similar in right- and left-handers, both for the dominant and the non-dominant hand. The information about the previous positions after changing the order of elements is used in the sequences for subdominant hands and not used in the sequences for dominant ones. After changing the hand, right- and left-handers show different ("non-symmetrical") patterns of errors. Thus, the errors of right- and left-handers are "symmetrical" at the early stages of task performance, while the transfer of this motor skill in right- and left-handers occurs in different ways.

  9. The Representation of Prediction Error in Auditory Cortex

    PubMed Central

    Rubin, Jonathan; Ulanovsky, Nachum; Tishby, Naftali

    2016-01-01

    To survive, organisms must extract information from the past that is relevant for their future. How this process is expressed at the neural level remains unclear. We address this problem by developing a novel approach from first principles. We show here how to generate low-complexity representations of the past that produce optimal predictions of future events. We then illustrate this framework by studying the coding of ‘oddball’ sequences in auditory cortex. We find that for many neurons in primary auditory cortex, trial-by-trial fluctuations of neuronal responses correlate with the theoretical prediction error calculated from the short-term past of the stimulation sequence, under constraints on the complexity of the representation of this past sequence. In some neurons, the effect of prediction error accounted for more than 50% of response variability. Reliable predictions often depended on a representation of the sequence of the last ten or more stimuli, although the representation kept only a few details of that sequence. PMID:27490251

  10. XPAT: a toolkit to conduct cross-platform association studies with heterogeneous sequencing datasets.

    PubMed

    Yu, Yao; Hu, Hao; Bohlender, Ryan J; Hu, Fulan; Chen, Jiun-Sheng; Holt, Carson; Fowler, Jerry; Guthery, Stephen L; Scheet, Paul; Hildebrandt, Michelle A T; Yandell, Mark; Huff, Chad D

    2018-04-06

    High-throughput sequencing data are increasingly being made available to the research community for secondary analyses, providing new opportunities for large-scale association studies. However, heterogeneity in target capture and sequencing technologies often introduce strong technological stratification biases that overwhelm subtle signals of association in studies of complex traits. Here, we introduce the Cross-Platform Association Toolkit, XPAT, which provides a suite of tools designed to support and conduct large-scale association studies with heterogeneous sequencing datasets. XPAT includes tools to support cross-platform aware variant calling, quality control filtering, gene-based association testing and rare variant effect size estimation. To evaluate the performance of XPAT, we conducted case-control association studies for three diseases, including 783 breast cancer cases, 272 ovarian cancer cases, 205 Crohn disease cases and 3507 shared controls (including 1722 females) using sequencing data from multiple sources. XPAT greatly reduced Type I error inflation in the case-control analyses, while replicating many previously identified disease-gene associations. We also show that association tests conducted with XPAT using cross-platform data have comparable performance to tests using matched platform data. XPAT enables new association studies that combine existing sequencing datasets to identify genetic loci associated with common diseases and other complex traits.

  11. Error correcting code with chip kill capability and power saving enhancement

    DOEpatents

    Gara, Alan G [Mount Kisco, NY; Chen, Dong [Croton-on-Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Flynn, William T [Rochester, MN; Marcella, James A [Rochester, MN; Takken, Todd [Brewster, NY; Trager, Barry M [Yorktown Heights, NY; Winograd, Shmuel [Scarsdale, NY

    2011-08-30

    A method and system are disclosed for detecting memory chip failure in a computer memory system. The method comprises the steps of accessing user data from a set of user data chips, and testing the user data for errors using data from a set of system data chips. This testing is done by generating a sequence of check symbols from the user data, grouping the user data into a sequence of data symbols, and computing a specified sequence of syndromes. If all the syndromes are zero, the user data has no errors. If one of the syndromes is non-zero, then a set of discriminator expressions are computed, and used to determine whether a single or double symbol error has occurred. In the preferred embodiment, less than two full system data chips are used for testing and correcting the user data.
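
    The patented code operates on multi-bit symbols and uses discriminator expressions to separate single- from double-symbol errors, details the abstract does not give. As a hedged stand-in, the Python sketch below shows the underlying syndrome idea with the classic Hamming(7,4) code: an all-zero syndrome means the data are error-free, and a nonzero syndrome directly names the flipped bit.

        import numpy as np

        # Parity-check matrix whose j-th column is the binary encoding of
        # j+1, so a nonzero syndrome names the error position (1-based).
        H = np.array([[1, 0, 1, 0, 1, 0, 1],
                      [0, 1, 1, 0, 0, 1, 1],
                      [0, 0, 0, 1, 1, 1, 1]])

        def syndrome(word):
            s = H.dot(word) % 2
            return int(s[0] + 2 * s[1] + 4 * s[2])

        codeword = np.array([1, 0, 1, 0, 1, 0, 1])  # a valid codeword
        assert syndrome(codeword) == 0               # zero syndrome: no error

        received = codeword.copy()
        received[4] ^= 1                             # corrupt position 5
        pos = syndrome(received)
        print("error at position", pos)              # -> 5
        received[pos - 1] ^= 1                       # correct it
        assert syndrome(received) == 0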

  12. Illusory conjunctions of pitch and duration in unfamiliar tone sequences.

    PubMed

    Thompson, W F; Hall, M D; Pressing, J

    2001-02-01

    In 3 experiments, the authors examined short-term memory for pitch and duration in unfamiliar tone sequences. Participants were presented a target sequence consisting of 2 tones (Experiment 1) or 7 tones (Experiments 2 and 3) and then a probe tone. Participants indicated whether the probe tone matched 1 of the target tones in both pitch and duration. Error rates were relatively low if the probe tone matched 1 of the target tones or if it differed from target tones in pitch, duration, or both. Error rates were remarkably high, however, if the probe tone combined the pitch of 1 target tone with the duration of a different target tone. The results suggest that illusory conjunctions of these dimensions frequently occur. A mathematical model is presented that accounts for the relative contribution of pitch errors, duration errors, and illusory conjunctions of pitch and duration.

  13. SeqLib: a C++ API for rapid BAM manipulation, sequence alignment and sequence assembly

    PubMed Central

    Wala, Jeremiah; Beroukhim, Rameen

    2017-01-01

    We present SeqLib, a C++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment. Availability and Implementation: SeqLib is available on Linux and OSX for the C++98 standard and later at github.com/walaj/SeqLib. SeqLib is released under the Apache2 license. Additional capabilities for BLAT alignment are available under the BLAT license. Contact: jwala@broadinstitute.org; rameen@broadinstitute.org PMID:28011768

  14. SeqLib: a C++ API for rapid BAM manipulation, sequence alignment and sequence assembly.

    PubMed

    Wala, Jeremiah; Beroukhim, Rameen

    2017-03-01

    We present SeqLib, a C++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment. SeqLib is available on Linux and OSX for the C++98 standard and later at github.com/walaj/SeqLib. SeqLib is released under the Apache2 license. Additional capabilities for BLAT alignment are available under the BLAT license. jwala@broadinstitute.org; rameen@broadinstitute.org. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  15. Prediction of protein tertiary structure from sequences using a very large back-propagation neural network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, X.; Wilcox, G.L.

    1993-12-31

    We have implemented large-scale back-propagation neural networks on a 544-node Connection Machine, CM-5, using the C language in MIMD mode. The program running on 512 processors performs backpropagation learning at 0.53 Gflops, which provides 76 million connection updates per second. We have applied the network to the prediction of protein tertiary structure from sequence information alone. A neural network with one hidden layer and 40 million connections is trained to learn the relationship between sequence and tertiary structure. The trained network yields predicted structures of some proteins on which it has not been trained, given only their sequences. Presentation of the Fourier transform of the sequences accentuates periodicity in the sequence and yields good generalization with greatly increased training efficiency. Training simulations with a large, heterologous set of protein structures (111 proteins) converge to solutions with under 2% RMS residual error within the training set (random responses give an RMS error of about 20%). Presentation of 15 sequences of related proteins in a testing set of 24 proteins yields predicted structures with less than 8% RMS residual error, indicating good apparent generalization.

  16. DNA assembly with error correction on a droplet digital microfluidics platform.

    PubMed

    Khilko, Yuliya; Weyman, Philip D; Glass, John I; Adams, Mark D; McNeil, Melanie A; Griffin, Peter B

    2018-06-01

    Custom synthesized DNA is in high demand for synthetic biology applications. However, current technologies to produce these sequences using assembly from DNA oligonucleotides are costly and labor-intensive. The automation and reduced sample volumes afforded by microfluidic technologies could significantly decrease materials and labor costs associated with DNA synthesis. The purpose of this study was to develop a gene assembly protocol utilizing a digital microfluidic device. Toward this goal, we adapted bench-scale oligonucleotide assembly methods followed by enzymatic error correction to the Mondrian™ digital microfluidic platform. We optimized Gibson assembly, polymerase chain reaction (PCR), and enzymatic error correction reactions in a single protocol to assemble 12 oligonucleotides into a 339-bp double-stranded DNA sequence encoding part of the human influenza virus hemagglutinin (HA) gene. The reactions were scaled down to 0.6-1.2 μL. Initial microfluidic assembly methods were successful and had an error frequency of approximately 4 errors/kb, with errors originating from the original oligonucleotide synthesis. Relative to conventional benchtop procedures, PCR optimization required additional amounts of MgCl2, Phusion polymerase, and PEG 8000 to achieve amplification of the assembly and error correction products. After one round of error correction, error frequency was reduced to an average of 1.8 errors/kb. We demonstrated that DNA assembly from oligonucleotides and error correction could be completely automated on a digital microfluidic (DMF) platform. The results demonstrate that enzymatic reactions in droplets show a strong dependence on surface interactions, and successful on-chip implementation required supplementation with surfactants, molecular crowding agents, and an excess of enzyme. Enzymatic error correction of assembled fragments improved sequence fidelity by 2-fold, which was a significant improvement but somewhat lower than expected compared to bench-top assays, suggesting an additional capacity for optimization.

  17. Lip-reading enhancement for law enforcement

    NASA Astrophysics Data System (ADS)

    Theobald, Barry J.; Harvey, Richard; Cox, Stephen J.; Lewis, Colin; Owen, Gari P.

    2006-09-01

    Accurate lip-reading techniques would be of enormous benefit for agencies involved in counter-terrorism and other law-enforcement areas. Unfortunately, there are very few skilled lip-readers, and it is apparently a difficult skill to transmit, so the area is under-resourced. In this paper we investigate the possibility of making the lip-reading task more amenable to a wider range of operators by enhancing lip movements in video sequences using active appearance models. These are generative, parametric models commonly used to track faces in images and video sequences. The parametric nature of the model allows a face in an image to be encoded in terms of a few tens of parameters, while the generative nature allows faces to be re-synthesised using the parameters. The aim of this study is to determine if exaggerating lip-motions in video sequences by amplifying the parameters of the model improves lip-reading ability. We also present results of lip-reading tests undertaken by experienced (but non-expert) adult subjects who claim to use lip-reading in their speech recognition process. The results, which are comparisons of word error rates on unprocessed and processed video, are mixed. We find that there appears to be the potential to improve the word error rate but, for the method to improve intelligibility, there is a need for more sophisticated tracking and visual modelling. Our technique can also act as an expression or visual gesture amplifier and so has applications to animation and the presentation of information via avatars or synthetic humans.

  18. The role of consolidation in learning context-dependent phonotactic patterns in speech and digital sequence production.

    PubMed

    Anderson, Nathaniel D; Dell, Gary S

    2018-04-03

    Speakers implicitly learn novel phonotactic patterns by producing strings of syllables. The learning is revealed in their speech errors. First-order patterns, such as "/f/ must be a syllable onset," can be distinguished from contingent, or second-order, patterns, such as "/f/ must be an onset if the vowel is /a/, but a coda if the vowel is /o/." A meta-analysis of 19 experiments clearly demonstrated that first-order patterns affect speech errors to a very great extent in a single experimental session, but second-order vowel-contingent patterns only affect errors on the second day of testing, suggesting the need for a consolidation period. Two experiments tested an analogue to these studies involving sequences of button pushes, with fingers as "consonants" and thumbs as "vowels." The button-push errors revealed two of the key speech-error findings: first-order patterns are learned quickly, but second-order thumb-contingent patterns are only strongly revealed in the errors on the second day of testing. The influence of computational complexity on the implicit learning of phonotactic patterns in speech production may be a general feature of sequence production.

  19. FMLRC: Hybrid long read error correction using an FM-index.

    PubMed

    Wang, Jeremy R; Holt, James; McMillan, Leonard; Jones, Corbin D

    2018-02-09

    Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the contiguity and accuracy of genome assemblies. However, the cost and throughput of these technologies limits their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging "hybrid" assemblies that use long reads for scaffolding and short reads for accuracy. We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with an auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read-only de novo assembly methods. Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Improved throughput and computational efficiency relative to existing methods will help researchers make better economic use of emerging long read sequencing technologies.
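
    The FM-index operation at the heart of such correctors is backward search: counting how often a k-mer occurs in the indexed short-read set. The Python sketch below builds a BWT naively and counts a pattern with backward search; real implementations use a multi-string BWT and constant-time rank structures, so this is a teaching-scale approximation, not FMLRC's code. A corrector would query each long-read k-mer this way and, when the count is low, search for a well-supported neighbouring k-mer instead.

        from bisect import bisect_left

        def bwt(text):
            """Burrows-Wheeler transform via sorted rotations (demo-scale)."""
            text += "$"
            rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
            return "".join(rot[-1] for rot in rotations)

        def fm_count(bwt_str, pattern):
            """Count occurrences of `pattern` via FM-index backward search."""
            sorted_bwt = sorted(bwt_str)
            C = {c: bisect_left(sorted_bwt, c) for c in set(bwt_str)}
            rank = lambda c, i: bwt_str[:i].count(c)  # O(n); real indexes are O(1)
            lo, hi = 0, len(bwt_str)
            for c in reversed(pattern):
                if c not in C:
                    return 0
                lo = C[c] + rank(c, lo)
                hi = C[c] + rank(c, hi)
                if lo >= hi:
                    return 0
            return hi - lo

        index = bwt("GATTACAGATTACA")
        print(fm_count(index, "ATTA"))  # -> 2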

  20. A matter of emphasis: Linguistic stress habits modulate serial recall.

    PubMed

    Taylor, John C; Macken, Bill; Jones, Dylan M

    2015-04-01

    Models of short-term memory for sequential information rely on item-level, feature-based descriptions to account for errors in serial recall. Transposition errors within alternating similar/dissimilar letter sequences derive from interactions between overlapping features. However, in two experiments, we demonstrated that the characteristics of the sequence are what determine the fates of items, rather than the properties ascribed to the items themselves. Performance in alternating sequences is determined by the way that the sequences themselves induce particular prosodic rehearsal patterns, and not by the nature of the items per se. In a serial recall task, the shapes of the canonical "saw-tooth" serial position curves and transposition error probabilities at successive input-output distances were modulated by subvocal rehearsal strategies, despite all item-based parameters being held constant. We replicated this finding using nonalternating lists, thus demonstrating that transpositions are substantially influenced by prosodic features-such as stress-that emerge during subvocal rehearsal.

  1. Auto-tracking system for human lumbar motion analysis.

    PubMed

    Sui, Fuge; Zhang, Da; Lam, Shing Chun Benny; Zhao, Lifeng; Wang, Dongjun; Bi, Zhenggang; Hu, Yong

    2011-01-01

    Previous lumbar motion analyses suggest the usefulness of quantitatively characterizing spine motion. However, the application of such measurements is still limited by the lack of user-friendly automatic spine motion analysis systems. This paper describes an automatic analysis system for measuring lumbar spine disorders that consists of a spine motion guidance device, an X-ray imaging modality to acquire digitized video fluoroscopy (DVF) sequences, and an automated tracking module with a graphical user interface (GUI). DVF sequences of the lumbar spine are recorded during flexion-extension under a guidance device. The automatic tracking software, utilizing a particle filter, locates the vertebra-of-interest in every frame of the sequence, and the tracking result is displayed on the GUI. Kinematic parameters are also extracted from the tracking results for motion analysis. We observed that, in a bone model test, the maximum fiducial error was 3.7%, and the maximum repeatability error in translation and rotation was 1.2% and 2.6%, respectively. In our simulated DVF sequence study, automatic tracking was not successful when the noise intensity was greater than 0.50. In a noisy situation, the maximal difference was 1.3 mm in translation and 1° in the rotation angle. The errors were calculated in translation (fiducial error: 2.4%, repeatability error: 0.5%) and in the rotation angle (fiducial error: 1.0%, repeatability error: 0.7%). However, the automatic tracking software could successfully track simulated sequences contaminated by noise at a density ≤ 0.5 with very high accuracy, providing good reliability and robustness. A clinical trial enrolling 10 healthy subjects and 2 lumbar spondylolisthesis patients was conducted. Measurement with automatic tracking of DVF provided information not seen in conventional X-ray images, suggesting the potential of the proposed system for clinical applications.
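
    The predict-weight-resample loop of a particle filter is easy to show in miniature. The Python sketch below tracks a 1D position from noisy measurements with a bootstrap particle filter; it is a generic illustration under simple Gaussian assumptions, whereas the paper's tracker scores particles against image content in the DVF frames.

        import numpy as np

        rng = np.random.default_rng(42)

        def particle_filter(measurements, n=500, motion_std=1.0, meas_std=2.0):
            """Bootstrap particle filter for a 1D random-walk state."""
            particles = rng.normal(measurements[0], meas_std, n)
            estimates = []
            for z in measurements:
                particles += rng.normal(0.0, motion_std, n)   # predict
                w = np.exp(-0.5 * ((z - particles) / meas_std) ** 2)
                w /= w.sum()                                  # weight
                estimates.append(float(np.sum(w * particles)))
                particles = particles[rng.choice(n, n, p=w)]  # resample
            return np.array(estimates)

        truth = np.cumsum(rng.normal(0, 1.0, 60))  # simulated vertebra position
        z = truth + rng.normal(0, 2.0, 60)         # noisy measurements
        est = particle_filter(z)
        print(round(float(np.mean(np.abs(est - truth))), 2), "<",
              round(float(np.mean(np.abs(z - truth))), 2))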

  2. A robust interpolation procedure for producing tidal current ellipse inputs for regional and coastal ocean numerical models

    NASA Astrophysics Data System (ADS)

    Byun, Do-Seong; Hart, Deirdre E.

    2017-04-01

    Regional and/or coastal ocean models can use tidal current harmonic forcing, together with tidal harmonic forcing along open boundaries, in order to successfully simulate tides and tidal currents. These inputs can be freely generated using online open-access data, but the data produced are not always at the resolution required for regional or coastal models. Subsequent interpolation procedures can produce tidal current forcing data errors for parts of the world's coastal ocean where tidal ellipse inclinations and phases move across the invisible mathematical "boundaries" between 359° and 0° (or 179° and 0°). In nature, such "boundaries" are in fact smooth transitions, but if these mathematical "boundaries" are not treated correctly during interpolation, they can produce inaccurate input data and hamper the accurate simulation of tidal currents in regional and coastal ocean models. These avoidable errors arise due to procedural shortcomings involving vector embodiment problems (i.e., how a vector is represented mathematically, for example as velocities or as coordinates). Automated solutions for producing correct tidal ellipse parameter input data are possible if a series of steps is followed correctly, including the use of Cartesian coordinates during interpolation. This note comprises the first published description of scenarios where tidal ellipse parameter interpolation errors can arise, and of a procedure to successfully avoid these errors when generating tidal inputs for regional and/or coastal ocean numerical models. We explain how a straightforward sequence of data production, format conversion, interpolation, and format reconversion steps may be used to check for the potential occurrence of tidal ellipse interpolation and phase errors, and to avoid them. This sequence is demonstrated via a case study of the M2 tidal constituent in the seas around Korea but is designed to be universally applicable. We also recommend employing tidal ellipse parameter calculation methods that avoid the use of Foreman's (1978) "northern semi-major axis convention" since, as revealed in our analysis, this commonly used convention can result in inclination interpolation errors even when Cartesian coordinate-based "vector embodiment" solutions are employed.
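
    The wrap-around failure and its Cartesian-coordinate fix are easy to demonstrate. In the Python sketch below, naive linear interpolation of raw degrees across the 359-to-0 boundary produces a spurious mid-range angle, while interpolating the cosine/sine components and converting back stays smooth. For axial quantities defined modulo 180° (such as ellipse inclination), the standard directional-statistics trick is to double the angles first, interpolate, then halve; that refinement is omitted here.

        import numpy as np

        def interp_angle_deg(x_new, x, angles_deg):
            """Interpolate angles via their Cartesian components."""
            rad = np.deg2rad(angles_deg)
            u = np.interp(x_new, x, np.cos(rad))
            v = np.interp(x_new, x, np.sin(rad))
            return np.rad2deg(np.arctan2(v, u)) % 360.0

        x = np.array([0.0, 1.0])
        phase = np.array([358.0, 4.0])          # a smooth 6-degree change
        print(np.interp(0.5, x, phase))         # naive: 181.0 (spurious)
        print(interp_angle_deg(0.5, x, phase))  # ~1.0 (correct)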

  3. ECHO: A reference-free short-read error correction algorithm

    PubMed Central

    Kao, Wei-Chun; Chan, Andrew H.; Song, Yun S.

    2011-01-01

    Developing accurate, scalable algorithms to improve data quality is an important computational challenge associated with recent advances in high-throughput sequencing technology. In this study, a novel error-correction algorithm, called ECHO, is introduced for correcting base-call errors in short-reads, without the need of a reference genome. Unlike most previous methods, ECHO does not require the user to specify parameters of which optimal values are typically unknown a priori. ECHO automatically sets the parameters in the assumed model and estimates error characteristics specific to each sequencing run, while maintaining a running time that is within the range of practical use. ECHO is based on a probabilistic model and is able to assign a quality score to each corrected base. Furthermore, it explicitly models heterozygosity in diploid genomes and provides a reference-free method for detecting bases that originated from heterozygous sites. On both real and simulated data, ECHO is able to improve the accuracy of previous error-correction methods by severalfold to an order of magnitude, depending on the sequence coverage depth and the position in the read. The improvement is most pronounced toward the end of the read, where previous methods become noticeably less effective. Using a whole-genome yeast data set, it is demonstrated here that ECHO is capable of coping with nonuniform coverage. Also, it is shown that using ECHO to perform error correction as a preprocessing step considerably facilitates de novo assembly, particularly in the case of low-to-moderate sequence coverage depth. PMID:21482625

  4. Spatio-temporal alignment of pedobarographic image sequences.

    PubMed

    Oliveira, Francisco P M; Sousa, Andreia; Santos, Rubim; Tavares, João Manuel R S

    2011-07-01

    This article presents a methodology to align plantar pressure image sequences simultaneously in time and space. The spatial position and orientation of a foot in a sequence are changed to match the foot represented in a second sequence. Simultaneously with the spatial alignment, the temporal scale of the first sequence is transformed with the aim of synchronizing the two input footsteps. Consequently, the spatial correspondence of the foot regions along the sequences as well as the temporal synchronizing is automatically attained, making the study easier and more straightforward. In terms of spatial alignment, the methodology can use one of four possible geometric transformation models: rigid, similarity, affine, or projective. In the temporal alignment, a polynomial transformation up to the 4th degree can be adopted in order to model linear and curved time behaviors. Suitable geometric and temporal transformations are found by minimizing the mean squared error (MSE) between the input sequences. The methodology was tested on a set of real image sequences acquired from a common pedobarographic device. When used in experimental cases generated by applying geometric and temporal control transformations, the methodology revealed high accuracy. In addition, the intra-subject alignment tests from real plantar pressure image sequences showed that the curved temporal models produced better MSE results (P < 0.001) than the linear temporal model. This article represents an important step forward in the alignment of pedobarographic image data, since previous methods can only be applied on static images.
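
    A one-parameter version of the temporal side of the alignment can be sketched compactly. The Python fragment below recovers a linear time-scale factor between two curves by minimizing the mean squared error after resampling, a deliberate simplification of the article's polynomial temporal models and 2D geometric transforms; the signal shapes are illustrative.

        import numpy as np
        from scipy.optimize import minimize_scalar

        def mse_after_warp(scale, t, ref, moving):
            """Resample `moving` at scaled times and compare with `ref`."""
            warped = np.interp(scale * t, t, moving)
            return float(np.mean((ref - warped) ** 2))

        t = np.linspace(0.0, 1.0, 200)
        ref = np.exp(-((t - 0.4) / 0.1) ** 2)           # reference footstep curve
        moving = np.exp(-((0.8 * t - 0.4) / 0.1) ** 2)  # same event, slower clock

        res = minimize_scalar(mse_after_warp, bounds=(0.5, 2.0),
                              args=(t, ref, moving), method="bounded")
        print(round(res.x, 3))  # recovered temporal scale, ~1.25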

  5. CRITICA: coding region identification tool invoking comparative analysis

    NASA Technical Reports Server (NTRS)

    Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

    1999-01-01

    Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu, in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
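
    The comparative signal CRITICA exploits, amino-acid identity in excess of what the nucleotide identity predicts, can be sketched for a single aligned pair. The Python fragment below uses Biopython for translation and assumes gapless, in-frame input; it computes the two identities but none of CRITICA's statistics or dicodon component.

        from Bio.Seq import Seq

        def identity(a, b):
            """Fraction of identical positions in equal-length strings."""
            return sum(x == y for x, y in zip(a, b)) / len(a)

        def coding_signal(dna_a, dna_b):
            """In coding regions substitutions pile up at synonymous
            sites, so amino-acid identity exceeds nucleotide identity."""
            nt_id = identity(dna_a, dna_b)
            aa_id = identity(str(Seq(dna_a).translate()),
                             str(Seq(dna_b).translate()))
            return nt_id, aa_id

        # Hypothetical aligned fragments differing only at third codon
        # positions (synonymous changes).
        print(coding_signal("ATGGCTGAAACTCGT", "ATGGCAGAGACGCGC"))
        # -> (0.733..., 1.0): strong hint of a coding region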

  6. Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor.

    PubMed

    Kohany, Oleksiy; Gentles, Andrew J; Hankus, Lukasz; Jurka, Jerzy

    2006-10-25

    Repbase is a reference database of eukaryotic repetitive DNA, which includes prototypic sequences of repeats and basic information described in annotations. Updating and maintenance of the database requires specialized tools, which we have created and made available for use with Repbase, and which may be useful as a template for other curated databases. We describe the software tools RepbaseSubmitter and Censor, which are designed to facilitate updating and screening the content of Repbase. RepbaseSubmitter is a Java-based interface for formatting and annotating Repbase entries. It eliminates many common formatting errors, and automates actions such as calculation of sequence lengths and composition, thus facilitating curation of Repbase sequences. In addition, it has several features for predicting protein coding regions in sequences; searching and including Pubmed references in Repbase entries; and searching the NCBI taxonomy database for correct inclusion of species information and taxonomic position. Censor is a tool to rapidly identify repetitive elements by comparison to known repeats. It uses WU-BLAST for speed and sensitivity, and can conduct DNA-DNA, DNA-protein, or translated DNA-translated DNA searches of genomic sequence. Defragmented output includes a map of repeats present in the query sequence, with the options to report masked query sequence(s), repeat sequences found in the query, and alignments. Censor and RepbaseSubmitter are available as both web-based services and downloadable versions. They can be found at http://www.girinst.org/repbase/submission.html (RepbaseSubmitter) and http://www.girinst.org/censor/index.php (Censor).

  7. Multiple symbol partially coherent detection of MPSK

    NASA Technical Reports Server (NTRS)

    Simon, M. K.; Divsalar, D.

    1992-01-01

    It is shown that by using the known (or estimated) value of carrier tracking loop signal to noise ratio (SNR) in the decision metric, it is possible to improve the error probability performance of a partially coherent multiple phase-shift-keying (MPSK) system relative to that corresponding to the commonly used ideal coherent decision rule. Using a maximum-likelihood approach, an optimum decision metric is derived and shown to take the form of a weighted sum of the ideal coherent decision metric (i.e., correlation) and the noncoherent decision metric which is optimum for differential detection of MPSK. The performance of a receiver based on this optimum decision rule is derived and shown to provide continued improvement with increasing length of observation interval (data symbol sequence length). Unfortunately, increasing the observation length does not eliminate the error floor associated with the finite loop SNR. Nevertheless, in the limit of infinite observation length, the average error probability performance approaches the algebraic sum of the error floor and the performance of ideal coherent detection, i.e., at any error probability above the error floor, there is no degradation due to the partial coherence. It is shown that this limiting behavior is virtually achievable with practical size observation lengths. Furthermore, the performance is quite insensitive to mismatch between the estimate of loop SNR (e.g., obtained from measurement) fed to the decision metric and its true value. These results may be of use in low-cost Earth-orbiting or deep-space missions employing coded modulations.

  8. Error catastrophe and phase transition in the empirical fitness landscape of HIV

    NASA Astrophysics Data System (ADS)

    Hart, Gregory R.; Ferguson, Andrew L.

    2015-03-01

    We have translated clinical sequence databases of the p6 HIV protein into an empirical fitness landscape quantifying viral replicative capacity as a function of the amino acid sequence. We show that the viral population resides close to a phase transition in sequence space corresponding to an "error catastrophe" beyond which there is lethal accumulation of mutations. Our model predicts that the phase transition may be induced by drug therapies that elevate the mutation rate, or by forcing mutations at particular amino acids. Applying immune pressure to any combination of killer T-cell targets cannot induce the transition, providing a rationale for why the viral protein can exist close to the error catastrophe without sustaining fatal fitness penalties due to adaptive immunity.

  9. A statistical method for the detection of variants from next-generation resequencing of DNA pools.

    PubMed

    Bansal, Vikas

    2010-06-15

    Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.
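
    CRISP's second criterion, judging whether non-reference calls exceed what sequencing error alone would produce, is a binomial tail computation at heart. The Python sketch below shows that calculation in isolation, with an assumed uniform error rate and none of CRISP's strand or cross-pool contingency filters.

        from scipy.stats import binom

        def error_only_pvalue(nonref, depth, error_rate=0.01):
            """P(>= nonref non-reference calls | sequencing errors alone).
            Small values argue for a real variant."""
            return float(binom.sf(nonref - 1, depth, error_rate))

        print(error_only_pvalue(3, 500))   # ~0.88: consistent with error
        print(error_only_pvalue(25, 500))  # ~1e-10: likely a real variant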

  10. Pilot self-coding applied in optical OFDM systems

    NASA Astrophysics Data System (ADS)

    Li, Changping; Yi, Ying; Lee, Kyesan

    2015-04-01

    This paper studies a frequency offset correction technique applicable to optical OFDM systems. Through theoretical analysis and computer simulations, we observe that the proposed scheme, named pilot self-coding (PSC), is effective at correcting frequency offset, mitigating the OFDM performance degradation caused by inter-carrier interference and common phase error. The main approach is to assign a pilot subcarrier before the data subcarriers and copy this subcarrier sequence to the symmetric side. The simulation results verify that the proposed PSC is indeed effective against a high degree of frequency offset.
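
    The common-phase-error correction that pilot schemes enable can be shown generically. In the Python sketch below, known pilots estimate a common rotation applied by the channel, and the data subcarriers are derotated with it; this is the standard pilot-based CPE estimate, not the paper's specific pilot self-coding arrangement, and all signal parameters are illustrative.

        import numpy as np

        rng = np.random.default_rng(1)

        # Known pilots and QPSK data occupying one OFDM symbol.
        pilots_tx = np.array([1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j])
        data_tx = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, 16)))

        # Channel applies a common phase error (residual frequency offset).
        cpe = np.exp(1j * 0.4)
        noise = 0.02 * (rng.normal(size=16) + 1j * rng.normal(size=16))
        pilots_rx = pilots_tx * cpe
        data_rx = data_tx * cpe + noise

        # Estimate the common rotation from the pilots; derotate the data.
        phase_hat = np.angle(np.sum(pilots_rx * np.conj(pilots_tx)))
        data_eq = data_rx * np.exp(-1j * phase_hat)
        print(round(float(phase_hat), 3))                      # ~0.4
        print(float(np.max(np.abs(data_eq - data_tx))) < 0.1)  # True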

  11. Quality of death notification forms in North West Bank/Palestine: a descriptive study.

    PubMed

    Qaddumi, Jamal A S; Nazzal, Zaher; Yacoup, Allam R S; Mansour, Mahmoud

    2017-04-11

    The death notification forms (DNFs) are important documents. Physicians' failure to complete them properly affects the national mortality report and, consequently, evidence-based decision making. Errors in completing DNFs are common all over the world and differ in type and cause. We aimed to evaluate the quality of DNFs in terms of completeness and the types of errors in the cause-of-death section. A descriptive study was conducted to review 2707 DNFs in North West Bank/Palestine during the year 2012 using data abstraction sheets. SPSS 17.0 was used to show the frequency of major and minor errors committed in filling the DNFs. Surprisingly, only 1% of the examined DNFs had their cause-of-death section filled in completely correctly. The immediate cause of death was correctly identified in 5.9% of all DNFs and the underlying cause of death was correctly reported in 55.4% of them. The sequence was incorrect in 41.5% of the DNFs. The most frequently documented minor error was the "not writing time intervals" error (97.0%). Almost all DNFs contained at least one minor or major error. This high percentage of errors may affect mortality and morbidity statistics, public health research and the process of providing evidence for health policy. Training workshops on DNF completion for newly recruited employees and at the beginning of the residency program are recommended on a regular basis. We also recommend reviewing the national DNFs to simplify them and make them consistent with updated evidence-based guidelines and recommendations.

  12. Spatiotemporal Filtering Using Principal Component Analysis and Karhunen-Loeve Expansion Approaches for Regional GPS Network Analysis

    NASA Technical Reports Server (NTRS)

    Dong, D.; Fang, P.; Bock, F.; Webb, F.; Prawirondirdjo, L.; Kedar, S.; Jamason, P.

    2006-01-01

    Spatial filtering is an effective way to improve the precision of coordinate time series for regional GPS networks by reducing so-called common mode errors, thereby providing better resolution for detecting weak or transient deformation signals. The commonly used approach to regional filtering assumes that the common mode error is spatially uniform, which is a good approximation for networks of hundreds of kilometers extent, but breaks down as the spatial extent increases. A more rigorous approach should remove the assumption of spatially uniform distribution and let the data themselves reveal the spatial distribution of the common mode error. The principal component analysis (PCA) and the Karhunen-Loeve expansion (KLE) both decompose network time series into a set of temporally varying modes and their spatial responses. Therefore they provide a mathematical framework to perform spatiotemporal filtering. We apply the combination of PCA and KLE to daily station coordinate time series of the Southern California Integrated GPS Network (SCIGN) for the period 2000 to 2004. We demonstrate that spatially and temporally correlated common mode errors are the dominant error source in daily GPS solutions. The spatial characteristics of the common mode errors are close to uniform for all east, north, and vertical components, which implies a very long wavelength source for the common mode errors, compared to the spatial extent of the GPS network in southern California. Furthermore, the common mode errors exhibit temporally nonrandom patterns.
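
    The PCA step of such spatiotemporal filtering is compact enough to sketch. Below, a hypothetical days-by-stations residual matrix is decomposed by SVD, and the leading mode, taken as the common mode error, is subtracted; this covers only the near-uniform-response special case, not the full PCA/KLE machinery of the paper.

    ```python
    import numpy as np

    # Hypothetical detrended daily position residuals: rows = days, cols = stations (mm).
    rng = np.random.default_rng(1)
    days, stations = 365, 20
    common = np.cumsum(rng.normal(0, 0.3, days))             # one network-wide signal
    X = common[:, None] * rng.uniform(0.8, 1.2, stations)    # near-uniform spatial response
    X += rng.normal(0, 1.0, (days, stations))                # station-local noise

    # PCA via SVD of the centered data matrix.
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # The first mode (temporal pattern U[:,0]*s[0], spatial response Vt[0]) is taken
    # as the common mode error; subtracting it performs the spatiotemporal filtering.
    cme = s[0] * np.outer(U[:, 0], Vt[0])
    filtered = Xc - cme
    print("RMS before/after filtering:", Xc.std(), filtered.std())
    ```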

  13. Pathway analysis with next-generation sequencing data.

    PubMed

    Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric; Xiong, Momiao

    2015-04-01

    Although pathway analysis methods have been developed and successfully applied to association studies of common variants, the statistical methods for pathway-based association analysis of rare variants have not been well developed. Many investigators observed highly inflated false-positive rates and low power in pathway-based tests of association of rare variants. The inflated false-positive rates and low true-positive rates of the current methods are mainly due to their lack of ability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic based on smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The developed statistic has the ability to capture position-level variant information and account for gametic phase disequilibrium. By intensive simulations, we demonstrate that the SFPCA-based statistic for testing pathway association with either rare or common or both rare and common variants has the correct type I error rates. We also evaluate the power of the SFPCA-based statistic and of 22 existing statistics. We found that the SFPCA-based statistic has much higher power than the other existing statistics in all the scenarios considered. To further evaluate its performance, the SFPCA-based statistic is applied to pathway analysis of exome sequencing data in the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after the Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic yields much smaller P-values for identifying pathway associations than the other existing methods.

  14. Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases.

    PubMed

    Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M

    2018-06-01

    Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men who have sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size, or odds of clustering, and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than source attribution methods for identifying transmission risk factors, but neither approach provides robust estimates of transmission risk ratios. Source attribution can alleviate the drawbacks of phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  15. Pattern of eyelid motion predictive of decision errors during drowsiness: oculomotor indices of altered states.

    PubMed

    Lobb, M L; Stern, J A

    1986-08-01

    Sequential patterns of eye and eyelid motion were identified in seven subjects performing a modified serial probe recognition task under drowsy conditions. Using simultaneous EOG and video recordings, eyelid motion was divided into components above, within, and below the pupil, and the durations in sequence were recorded. A serial probe recognition task was modified to allow decision errors to be distinguished from attention errors. Decision errors were found to be more frequent following a downward shift in gaze angle, in which the eyelid closing sequence was reduced from a five-element to a three-element sequence. The velocity of the eyelid moving over the pupil during decision errors was slow in the closing and fast in the reopening phase, whereas on correct-decision trials it was fast in closing and slower in reopening. Owing to the high variability of eyelid motion under drowsy conditions, these findings were only marginally significant. When a five-element blink occurred, the velocity of the lid-over-pupil motion component of these endogenous eye blinks was significantly faster on correct-decision than on decision-error trials. Furthermore, the highly variable, long-duration closings associated with the decision response produced slow eye movements in the horizontal plane (SEMs), which were more frequent and significantly longer in duration on decision-error than on correct-decision responses.

  16. Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads.

    PubMed

    Song, Li; Florea, Liliana

    2015-01-01

    Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing. We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read. Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.
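
    A minimal sketch of the local-threshold idea (not the Rcorrector code, which uses a De Bruijn graph and a more careful threshold model) might look as follows: k-mers from a read are judged against that read's own abundance rather than a single global cutoff, so highly and lowly expressed transcripts are treated consistently. The k-mer size and threshold fraction are arbitrary choices.

    ```python
    from collections import Counter

    def kmer_counts(reads, k=5):
        counts = Counter()
        for r in reads:
            for i in range(len(r) - k + 1):
                counts[r[i:i + k]] += 1
        return counts

    def flag_errors(read, counts, k=5, frac=0.05):
        """Flag k-mer positions whose counts fall far below the read's local abundance.

        Unlike a single global cutoff, the threshold here is a fraction of the
        largest k-mer count seen in the read, loosely mimicking a local threshold."""
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        local_max = max(counts[km] for km in kmers)
        threshold = max(1, int(frac * local_max))
        return [i for i, km in enumerate(kmers) if counts[km] <= threshold]

    reads = ["ACGTACGTAC"] * 50 + ["ACGTTCGTAC"]       # one read with a likely error
    counts = kmer_counts(reads)
    print(flag_errors("ACGTTCGTAC", counts))           # positions of untrusted k-mers
    ```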

  17. Using comparative genome analysis to identify problems in annotated microbial genomes.

    PubMed

    Poptsova, Maria S; Gogarten, J Peter

    2010-07-01

    Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

  18. Scoring-and-unfolding trimmed tree assembler: concepts, constructs and comparisons.

    PubMed

    Narzisi, Giuseppe; Mishra, Bud

    2011-01-15

    Mired by its connection to a well-known NP-complete combinatorial optimization problem—namely, the Shortest Common Superstring Problem (SCSP)—historically, the whole-genome sequence assembly (WGSA) problem has been assumed to be amenable only to greedy and heuristic methods. By placing efficiency as their first priority, these methods opted to rely only on local searches, and are thus inherently approximate, ambiguous or error prone, especially for genomes with complex structures. Furthermore, since the choice of the best heuristics depended critically on the properties of (e.g. errors in) the input data and the available long range information, these approaches hindered designing an error free WGSA pipeline. We dispense with the idea of limiting the solutions to just the approximated ones, and instead favor an approach that could potentially lead to an exhaustive (exponential-time) search of all possible layouts. Its computational complexity thus must be tamed through a constrained search (Branch-and-Bound) and quick identification and pruning of implausible overlays. For this purpose, such a method necessarily relies on a set of score functions (oracles) that can combine different structural properties (e.g. transitivity, coverage, physical maps, etc.). We give a detailed description of this novel assembly framework, referred to as Scoring-and-Unfolding Trimmed Tree Assembler (SUTTA), and present experimental results on several bacterial genomes using next-generation sequencing technology data. We also report experimental evidence that the assembly quality strongly depends on the choice of the minimum overlap parameter k. SUTTA's binaries are freely available to non-profit institutions for research and educational purposes at http://www.bioinformatics.nyu.edu.
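
    The branch-and-bound idea is easy to caricature on the underlying SCSP: the toy sketch below searches read orderings depth-first and prunes any branch whose optimistic length bound cannot beat the best superstring found so far. SUTTA's actual oracles combine many more structural properties; the bound here uses only best-case overlaps.

    ```python
    def overlap(a, b):
        """Length of the longest suffix of a that is a prefix of b."""
        for k in range(min(len(a), len(b)), 0, -1):
            if a.endswith(b[:k]):
                return k
        return 0

    def branch_and_bound_scs(reads):
        """Tiny branch-and-bound over read layouts: expand orderings depth-first,
        pruning branches whose optimistic bound (current length plus remaining
        reads minus their best possible overlaps) cannot improve on the best."""
        best = ["".join(reads)]  # trivial concatenation as the initial bound
        max_ov = {r: max((overlap(q, r) for q in reads if q != r), default=0)
                  for r in reads}

        def expand(s, last, remaining):
            bound = len(s) + sum(len(r) - max_ov[r] for r in remaining)
            if bound >= len(best[0]):
                return                               # prune this subtree
            if not remaining:
                best[0] = s
                return
            for r in sorted(remaining, key=lambda r: -overlap(last, r)):
                expand(s + r[overlap(last, r):], r, remaining - {r})

        for r in reads:
            expand(r, r, frozenset(reads) - {r})
        return best[0]

    print(branch_and_bound_scs(["ATTAGAC", "GACCT", "CCTAAG", "AAGATTA"]))
    ```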

  19. [Refractive errors in patients with cerebral palsy].

    PubMed

    Mrugacz, Małgorzata; Bandzul, Krzysztof; Kułak, Wojciech; Poppe, Ewa; Jurowski, Piotr

    2013-04-01

    Ocular changes are common in patients with cerebral palsy (CP), occurring in about 50% of cases. The most common are refractive errors and strabismus. The aim of the paper was to assess the relationship between refractive errors and neurological pathologies in patients with selected types of CP. Material and methods: We analyzed refractive errors in patients within two groups of CP, spastic diplegia and tetraparesis, taking nervous system pathologies into account. Results: The study demonstrated correlations between refractive errors and both the type of CP and the severity of CP as classified on the GMFCS scale. Refractive errors were more common in patients with tetraparesis than in those with spastic diplegia. Myopia and astigmatism were more common in the spastic diplegia group, whereas hyperopia was more common in tetraparesis.

  20. Ensemble codes involving hippocampal neurons are at risk during delayed performance tests.

    PubMed

    Hampson, R E; Deadwyler, S A

    1996-11-26

    Multielectrode recording techniques were used to record ensemble activity from 10 to 16 simultaneously active CA1 and CA3 neurons in the rat hippocampus during performance of a spatial delayed-nonmatch-to-sample task. Extracted sources of variance were used to assess the nature of two different types of errors that accounted for 30% of total trials. The two types of errors included ensemble "miscodes" of sample phase information and errors associated with delay-dependent corruption or disappearance of sample information at the time of the nonmatch response. Statistical assessment of trial sequences and associated "strength" of hippocampal ensemble codes revealed that miscoded error trials always followed delay-dependent error trials in which encoding was "weak," indicating that the two types of errors were "linked." It was determined that the occurrence of weakly encoded, delay-dependent error trials initiated an ensemble encoding "strategy" that increased the chances of being correct on the next trial and avoided the occurrence of further delay-dependent errors. Unexpectedly, the strategy involved "strongly" encoding response position information from the prior (delay-dependent) error trial and carrying it forward to the sample phase of the next trial. This produced a miscode type error on trials in which the "carried over" information obliterated encoding of the sample phase response on the next trial. Application of this strategy, irrespective of outcome, was sufficient to reorient the animal to the proper between trial sequence of response contingencies (nonmatch-to-sample) and boost performance to 73% correct on subsequent trials. The capacity for ensemble analyses of strength of information encoding combined with statistical assessment of trial sequences therefore provided unique insight into the "dynamic" nature of the role hippocampus plays in delay type memory tasks.

  1. Ancient DNA sequence revealed by error-correcting codes.

    PubMed

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-07-10

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.

  3. Rapid Measurement and Correction of Phase Errors from B0 Eddy Currents: Impact on Image Quality for Non-Cartesian Imaging

    PubMed Central

    Brodsky, Ethan K.; Klaers, Jessica L.; Samsonov, Alexey A.; Kijowski, Richard; Block, Walter F.

    2014-01-01

    Non-Cartesian imaging sequences and navigational methods can be more sensitive to scanner imperfections that have little impact on conventional clinical sequences, an issue which has repeatedly complicated the commercialization of these techniques by frustrating transitions to multi-center evaluations. One such imperfection is phase errors caused by resonant frequency shifts from eddy currents induced in the cryostat by time-varying gradients, a phenomenon known as B0 eddy currents. These phase errors can have a substantial impact on sequences that use ramp sampling, bipolar gradients, and readouts at varying azimuthal angles. We present a method for measuring and correcting phase errors from B0 eddy currents and examine the results on two different scanner models. This technique yields significant improvements in image quality for high-resolution joint imaging on certain scanners. The results suggest that correction of short time B0 eddy currents in manufacturer provided service routines would simplify adoption of non-Cartesian sampling methods. PMID:22488532

  4. Genotyping-by-sequencing for estimating relatedness in nonmodel organisms: Avoiding the trap of precise bias.

    PubMed

    Attard, Catherine R M; Beheregaray, Luciano B; Möller, Luciana M

    2018-05-01

    There has been remarkably little attention to using the high resolution provided by genotyping-by-sequencing (i.e., RADseq and similar methods) for assessing relatedness in wildlife populations. A major hurdle is the genotyping error, especially allelic dropout, often found in this type of data that could lead to downward-biased, yet precise, estimates of relatedness. Here, we assess the applicability of genotyping-by-sequencing for relatedness inferences given its relatively high genotyping error rate. Individuals of known relatedness were simulated under genotyping error, allelic dropout and missing data scenarios based on an empirical ddRAD data set, and their true relatedness was compared to that estimated by seven relatedness estimators. We found that an estimator chosen through such analyses can circumvent the influence of genotyping error, with the estimator of Ritland (Genetics Research, 67, 175) shown to be unaffected by allelic dropout and to be the most accurate when there is genotyping error. We also found that the choice of estimator should not rely solely on the strength of correlation between estimated and true relatedness as a strong correlation does not necessarily mean estimates are close to true relatedness. We also demonstrated how even a large SNP data set with genotyping error (allelic dropout or otherwise) or missing data still performs better than a perfectly genotyped microsatellite data set of tens of markers. The simulation-based approach used here can be easily implemented by others on their own genotyping-by-sequencing data sets to confirm the most appropriate and powerful estimator for their data. © 2017 John Wiley & Sons Ltd.
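
    The core of such a simulation-based check is straightforward to sketch. The toy below applies allelic dropout to a pair of identical genotypes and shows the downward bias of a crude correlation-based relatedness proxy; the dropout model and estimator are simplifications for illustration, not the seven estimators assessed in the paper.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    n_snps = 5000
    p = rng.uniform(0.1, 0.9, n_snps)                  # allele frequencies

    def genotype(freqs):
        return rng.binomial(1, freqs) + rng.binomial(1, freqs)   # 0/1/2 copies

    g1 = genotype(p)
    g2 = g1.copy()                                     # identical pair, true r = 1

    def dropout(g, rate=0.1):
        """Heterozygotes (1) lose one allele and are read as homozygotes (0 or 2)."""
        g = g.copy()
        het = (g == 1) & (rng.random(g.shape) < rate)
        g[het] = rng.choice([0, 2], het.sum())
        return g

    def corr_relatedness(a, b):
        return np.corrcoef(a, b)[0, 1]                 # crude correlation estimator

    print("no dropout:", corr_relatedness(g1, g2))
    print("10% dropout:", corr_relatedness(dropout(g1), dropout(g2)))
    ```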

  5. Evaluation of normalization methods in mammalian microRNA-Seq data

    PubMed Central

    Garmire, Lana Xia; Subramaniam, Shankar

    2012-01-01

    Simple total tag count normalization is inadequate for microRNA sequencing data generated from the next generation sequencing technology. However, so far a systematic evaluation of normalization methods on microRNA sequencing data has been lacking. We comprehensively evaluate seven commonly used normalization methods, including global normalization, Lowess normalization, Trimmed Mean Method (TMM), quantile normalization, scaling normalization, variance stabilization, and invariant method. We assess these methods on two individual experimental data sets with the empirical statistical metrics of mean square error (MSE) and Kolmogorov-Smirnov (K-S) statistic. Additionally, we evaluate the methods with results from quantitative PCR validation. Our results consistently show that Lowess normalization and quantile normalization perform the best, whereas TMM, a method applied to the RNA-Sequencing normalization, performs the worst. The poor performance of TMM normalization is further evidenced by abnormal results from the test of differential expression (DE) of microRNA-Seq data. Compared with the choice of model used for DE, the choice of normalization method is the primary factor that affects the results of DE. In summary, Lowess normalization and quantile normalization are recommended for normalizing microRNA-Seq data, whereas the TMM method should be used with caution. PMID:22532701
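
    Of the recommended methods, quantile normalization is simple enough to sketch in full: every sample's counts are replaced by the cross-sample mean at the same rank, forcing all samples onto one distribution. The count matrix below is invented for illustration.

    ```python
    import numpy as np

    def quantile_normalize(X):
        """Quantile-normalize columns of X (rows = miRNAs, cols = samples).

        Each sample's values are replaced by the mean of the sorted values at the
        same rank across all samples, forcing identical distributions."""
        ranks = np.argsort(np.argsort(X, axis=0), axis=0)
        mean_sorted = np.sort(X, axis=0).mean(axis=1)
        return mean_sorted[ranks]

    counts = np.array([[100., 90., 400.],
                       [ 10., 15.,  40.],
                       [  1.,  2.,   4.]])
    print(quantile_normalize(counts))
    ```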

  6. Error propagation in eigenimage filtering.

    PubMed

    Soltanian-Zadeh, H; Windham, J P; Jenkins, J M

    1990-01-01

    Mathematical derivation of error (noise) propagation in eigenimage filtering is presented. Based on the mathematical expressions, a method for decreasing the propagated noise given a sequence of images is suggested. The signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) of the final composite image are compared to the SNRs and CNRs of the images in the sequence. The consistency of the assumptions and accuracy of the mathematical expressions are investigated using sequences of simulated and real magnetic resonance (MR) images of an agarose phantom and a human brain.

  7. The detection error of thermal test low-frequency cable based on M sequence correlation algorithm

    NASA Astrophysics Data System (ADS)

    Wu, Dongliang; Ge, Zheyang; Tong, Xin; Du, Chunlin

    2018-04-01

    The low accuracy and efficiency of off-line detection of faults in thermal-test low-frequency cables can be addressed by a cable fault detection system that uses an FPGA-generated M-sequence code (linear feedback shift register sequence) as the pulse signal source. The design principle of the SSTDR (spread spectrum time-domain reflectometry) reflection method and the hardware setup for on-line monitoring are discussed in this paper. Test data show that the detection error increases with the distance to the fault along the thermal-test low-frequency cable.
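
    The signal chain is easy to reproduce in miniature: generate an M-sequence from a linear feedback shift register, add an attenuated, delayed copy standing in for the fault reflection, and locate the fault by cross-correlation. The taps, delay, and noise level below are arbitrary choices for illustration, not the paper's hardware parameters.

    ```python
    import numpy as np

    def lfsr_msequence(taps=(7, 6), nbits=7):
        """Generate a +/-1 maximal-length sequence from a Fibonacci LFSR."""
        state = [1] * nbits
        seq = []
        for _ in range(2 ** nbits - 1):
            seq.append(state[-1])
            fb = 0
            for t in taps:
                fb ^= state[t - 1]
            state = [fb] + state[:-1]
        return np.array(seq) * 2 - 1

    m = lfsr_msequence()
    delay = 37                                          # samples to the fault
    echo = 0.4 * np.roll(m, delay)                      # attenuated reflection
    rx = m + echo + np.random.default_rng(3).normal(0, 0.2, m.size)

    # Circular cross-correlation with the reference M-sequence peaks at the delay
    # (the largest peak, at lag 0, is the direct path; the next one is the fault).
    corr = np.array([np.dot(rx, np.roll(m, k)) for k in range(m.size)])
    print("estimated fault delay:", np.argsort(corr)[-2])
    ```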

  8. WE-G-BRA-04: Common Errors and Deficiencies in Radiation Oncology Practice

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kry, S; Dromgoole, L; Alvarez, P

    Purpose: Dosimetric errors in radiotherapy dose delivery lead to suboptimal treatments and outcomes. This work reviews the frequency and severity of dosimetric and programmatic errors identified by on-site audits performed by the IROC Houston QA center. Methods: IROC Houston on-site audits evaluate absolute beam calibration, relative dosimetry data compared to the treatment planning system data, and processes such as machine QA. Audits conducted from 2000-present were abstracted for recommendations, including type of recommendation and magnitude of error when applicable. Dosimetric recommendations corresponded to absolute dose errors >3% and relative dosimetry errors >2%. On-site audits of 1020 accelerators at 409 institutions were reviewed. Results: A total of 1280 recommendations were made (average 3.1/institution). The most common recommendation was for inadequate QA procedures per TG-40 and/or TG-142 (82% of institutions) with the most commonly noted deficiency being x-ray and electron off-axis constancy versus gantry angle. Dosimetrically, the most common errors in relative dosimetry were in small-field output factors (59% of institutions), wedge factors (33% of institutions), off-axis factors (21% of institutions), and photon PDD (18% of institutions). Errors in calibration were also problematic: 20% of institutions had an error in electron beam calibration, 8% had an error in photon beam calibration, and 7% had an error in brachytherapy source calibration. Almost all types of data reviewed included errors up to 7% although 20 institutions had errors in excess of 10%, and 5 had errors in excess of 20%. The frequency of electron calibration errors decreased significantly with time, but all other errors show non-significant changes. Conclusion: There are many common and often serious errors made during the establishment and maintenance of a radiotherapy program that can be identified through independent peer review. Physicists should be cautious, particularly in areas highlighted herein that show a tendency for errors.

  9. Volcanic Eruption Forecasts From Accelerating Rates of Drumbeat Long-Period Earthquakes

    NASA Astrophysics Data System (ADS)

    Bell, Andrew F.; Naylor, Mark; Hernandez, Stephen; Main, Ian G.; Gaunt, H. Elizabeth; Mothes, Patricia; Ruiz, Mario

    2018-02-01

    Accelerating rates of quasiperiodic "drumbeat" long-period earthquakes (LPs) are commonly reported before eruptions at andesite and dacite volcanoes, and promise insights into the nature of fundamental preeruptive processes and improved eruption forecasts. Here we apply a new Bayesian Markov chain Monte Carlo gamma point process methodology to investigate an exceptionally well-developed sequence of drumbeat LPs preceding a recent large vulcanian explosion at Tungurahua volcano, Ecuador. For more than 24 hr, LP rates increased according to the inverse power law trend predicted by material failure theory, and with a retrospectively forecast failure time that agrees with the eruption onset within error. LPs resulted from repeated activation of a single characteristic source driven by accelerating loading, rather than a distributed failure process, showing that similar precursory trends can emerge from quite different underlying physics. Nevertheless, such sequences have clear potential for improving forecasts of eruptions at Tungurahua and analogous volcanoes.
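
    The inverse power law trend of material failure theory can be fitted retrospectively with a few lines of least squares. The sketch below generates synthetic Poisson LP counts from rate(t) = k/(tf − t)^p and recovers the failure time tf; all parameters are invented, and this is a stand-in for, not a reproduction of, the Bayesian Markov chain Monte Carlo point-process method used in the paper.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(4)
    tf_true, p_true, k_true = 24.0, 1.0, 50.0     # rate(t) = k / (tf - t)**p
    t = np.linspace(0.0, 22.0, 45)                # observation times (hours)
    rate_obs = rng.poisson(k_true / (tf_true - t) ** p_true)   # noisy LP counts/hour

    def ipl(t, k, tf, p):
        return k / (tf - t) ** p

    # Bounds keep tf beyond the last observation so the model stays finite.
    popt, _ = curve_fit(ipl, t, rate_obs, p0=(10.0, 30.0, 1.0),
                        bounds=([1e-3, 22.1, 0.1], [1e4, 200.0, 4.0]))
    print(f"retrospective failure-time forecast: tf = {popt[1]:.1f} h (true 24.0)")
    ```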

  10. Learning by observation: insights from Williams syndrome.

    PubMed

    Foti, Francesca; Menghini, Deny; Mandolesi, Laura; Federico, Francesca; Vicari, Stefano; Petrosini, Laura

    2013-01-01

    Observing another person performing a complex action accelerates the observer's acquisition of the same action and limits the time-consuming process of learning by trial and error. Observational learning makes an interesting and potentially important topic in the developmental domain, especially when disorders are considered. The implications of studies aimed at clarifying whether and how this form of learning is spared by pathology are manifold. We focused on a specific population with learning and intellectual disabilities, individuals with Williams syndrome. The performance of twenty-eight individuals with Williams syndrome was compared with that of thirty-two mental age- and gender-matched typically developing children on tasks of learning a visuo-motor sequence by observation or by trial and error. Regardless of the learning modality, acquiring the correct sequence involved three main phases: a detection phase, in which participants discovered the correct sequence and learned how to perform the task; an exercise phase, in which they reproduced the sequence until performance was error-free; and an automatization phase, in which by repeating the error-free sequence they became accurate and speedy. Participants with Williams syndrome benefited from observational training (in which they observed an actor detecting the visuo-motor sequence) in the detection phase, whereas they performed worse than typically developing children in the exercise and automatization phases. Thus, by exploiting competencies learned by observation, individuals with Williams syndrome detected the visuo-motor sequence, putting into action the appropriate procedural strategies. Conversely, their impaired performance in the exercise phase appeared linked to impaired spatial working memory, while their deficits in the automatization phase reflect deficits in the processes that increase efficiency and speed of the response. Overall, observational experience was advantageous for acquiring competencies, since it primed subjects' interest in the actions to be performed and functioned as a catalyst for executed action.

  11. Error correction and statistical analyses for intra-host comparisons of feline immunodeficiency virus diversity from high-throughput sequencing data.

    PubMed

    Liu, Yang; Chiaromonte, Francesca; Ross, Howard; Malhotra, Raunaq; Elleder, Daniel; Poss, Mary

    2015-06-30

    Infection with feline immunodeficiency virus (FIV) causes an immunosuppressive disease whose consequences are less severe if cats are co-infected with an attenuated FIV strain (PLV). We use virus diversity measurements, which reflect replication ability and the virus response to various conditions, to test whether diversity of virulent FIV in lymphoid tissues is altered in the presence of PLV. Our data consisted of the 3' half of the FIV genome from three tissues of animals infected with FIV alone, or with FIV and PLV, sequenced by 454 technology. Since rare variants dominate virus populations, we had to carefully distinguish sequence variation from errors due to experimental protocols and sequencing. We considered an exponential-normal convolution model used for background correction of microarray data, and modified it to formulate an error correction approach for minor allele frequencies derived from high-throughput sequencing. Similar to accounting for over-dispersion in counts, this accounts for error-inflated variability in frequencies - and quite effectively reproduces empirically observed distributions. After obtaining error-corrected minor allele frequencies, we applied ANalysis Of VAriance (ANOVA) based on a linear mixed model and found that conserved sites and transition frequencies in FIV genes differ among tissues of dual and single infected cats. Furthermore, analysis of minor allele frequencies at individual FIV genome sites revealed 242 sites significantly affected by infection status (dual vs. single) or infection status by tissue interaction. All together, our results demonstrated a decrease in FIV diversity in bone marrow in the presence of PLV. Importantly, these effects were weakened or undetectable when error correction was performed with other approaches (thresholding of minor allele frequencies; probabilistic clustering of reads). We also queried the data for cytidine deaminase activity on the viral genome, which causes an asymmetric increase in G to A substitutions, but found no evidence for this host defense strategy. Our error correction approach for minor allele frequencies (more sensitive and computationally efficient than other algorithms) and our statistical treatment of variation (ANOVA) were critical for effective use of high-throughput sequencing data in understanding viral diversity. We found that co-infection with PLV shifts FIV diversity from bone marrow to lymph node and spleen.
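
    The convolution model referred to here treats an observed value as signal plus background, X = S + B, with S exponential and B normal, and replaces X by the posterior mean of S (the mean of a normal truncated at zero). A sketch of that correction step, with parameters assumed already estimated, is below; it mirrors the standard microarray-style background-correction formula rather than the authors' exact code.

    ```python
    import numpy as np
    from scipy.stats import norm

    def normexp_correct(x, mu, sigma, alpha):
        """Posterior mean of the signal S given observed X = S + B,
        with S ~ Exponential(mean alpha) and B ~ Normal(mu, sigma^2).

        The posterior of S is a normal truncated at zero, so the corrected
        value is its mean; computed in log space for numerical stability."""
        mu_sx = x - mu - sigma ** 2 / alpha
        log_ratio = norm.logpdf(mu_sx / sigma) - norm.logcdf(mu_sx / sigma)
        return mu_sx + sigma * np.exp(log_ratio)

    x = np.array([5., 20., 100.])          # observed error-inflated values
    print(normexp_correct(x, mu=10.0, sigma=3.0, alpha=25.0))
    ```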

  12. Characterization of Blue Mold Penicillium Species Isolated from Stored Fruits Using Multiple Highly Conserved Loci

    PubMed Central

    Yin, Guohua; Zhang, Yuliang; Pennerman, Kayla K.; Wu, Guangxi; Hua, Sui Sheng T.; Yu, Jiujiang; Jurick, Wayne M.; Guo, Anping; Bennett, Joan W.

    2017-01-01

    Penicillium is a large genus of common molds with over 400 described species; however, identification of individual species is difficult, including for those species that cause postharvest rots. In this study, blue rot fungi from stored apples and pears were isolated from a variety of hosts, locations, and years. Based on morphological and cultural characteristics and partial amplification of the β-tubulin locus, the isolates were provisionally identified as several different species of Penicillium. These isolates were investigated further using a suite of molecular DNA markers and compared to sequences of the ex-type for cognate species in GenBank, and were identified as P. expansum (3 isolates), P. solitum (3 isolates), P. carneum (1 isolate), and P. paneum (1 isolate). Three of the markers we used (ITS, internal transcribed spacer rDNA sequence; benA, β-tubulin; CaM, calmodulin) were suitable for distinguishing most of our isolates from one another at the species level. In contrast, we were unable to amplify RPB2 sequences from four of the isolates. Comparison of our sequences with cognate sequences in GenBank from isolates with the same species names did not always give coherent data, reinforcing earlier studies that have shown large intraspecific variability in many Penicillium species, as well as possible errors in some sequence data deposited in GenBank. PMID:29371531

  13. Mathematical Writing Errors in Expository Writings of College Mathematics Students

    ERIC Educational Resources Information Center

    Guce, Ivee K.

    2017-01-01

    Despite the efforts to confirm the effectiveness of writing in learning mathematics, analysis on common errors in mathematical writings has not received sufficient attention. This study aimed to provide an account of the students' procedural explanations in terms of their commonly committed errors in mathematical writing. Nine errors in…

  14. Increased taxon sampling reveals thousands of hidden orthologs in flatworms

    PubMed Central

    2017-01-01

    Gains and losses shape the gene complement of animal lineages and are a fundamental aspect of genomic evolution. Acquiring a comprehensive view of the evolution of gene repertoires is limited by the intrinsic limitations of common sequence similarity searches and available databases. Thus, a subset of the gene complement of an organism consists of hidden orthologs, i.e., those with no apparent homology to sequenced animal lineages—mistakenly considered new genes—but actually representing rapidly evolving orthologs or undetected paralogs. Here, we describe Leapfrog, a simple automated BLAST pipeline that leverages increased taxon sampling to overcome long evolutionary distances and identify putative hidden orthologs in large transcriptomic databases by transitive homology. As a case study, we used 35 transcriptomes of 29 flatworm lineages to recover 3427 putative hidden orthologs, some unidentified by OrthoFinder and HaMStR, two common orthogroup inference algorithms. Unexpectedly, we do not observe a correlation between the number of putative hidden orthologs in a lineage and its “average” evolutionary rate. Hidden orthologs do not show unusual sequence composition biases that might account for systematic errors in sequence similarity searches. Instead, gene duplication with divergence of one paralog and weak positive selection appear to underlie hidden orthology in Platyhelminthes. By using Leapfrog, we identify key centrosome-related genes and homeodomain classes previously reported as absent in free-living flatworms, e.g., planarians. Altogether, our findings demonstrate that hidden orthologs comprise a significant proportion of the gene repertoire in flatworms, qualifying the impact of gene losses and gains in gene complement evolution. PMID:28400424
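
    The transitive step is simple enough to sketch. Given best-hit tables in BLAST -outfmt 6 layout (the rows below are fabricated), a query with no direct hit in the reference proteome is linked to a reference protein through its best hit in an intermediate, closely related transcriptome:

    ```python
    def best_hits(tab, max_evalue=1e-10):
        """First (best) hit per query from BLAST -outfmt 6 rows."""
        hits = {}
        for line in tab.strip().splitlines():
            cols = line.split("\t")
            q, s, evalue = cols[0], cols[1], float(cols[10])
            if evalue <= max_evalue and q not in hits:
                hits[q] = s                    # rows assumed sorted best-first
        return hits

    # Tiny made-up outfmt-6 tables (12 columns; only the ids and e-value matter).
    row = "{}\t{}\t90\t100\t5\t0\t1\t100\t1\t100\t{}\t200"
    q_vs_bridge = best_hits(row.format("q1", "bridgeA", "1e-30"))
    bridge_vs_ref = best_hits(row.format("bridgeA", "refX", "1e-40"))
    direct = best_hits(row.format("q1", "refX", "0.5"))   # too weak to count

    hidden = {q: bridge_vs_ref[b] for q, b in q_vs_bridge.items()
              if q not in direct and b in bridge_vs_ref}
    print(hidden)   # {'q1': 'refX'} -- recovered transitively
    ```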

  15. Are quantitative trait-dependent sampling designs cost-effective for analysis of rare and common variants?

    PubMed

    Yilmaz, Yildiz E; Bull, Shelley B

    2011-11-29

    Use of trait-dependent sampling designs in whole-genome association studies of sequence data can reduce total sequencing costs with modest losses of statistical efficiency. In a quantitative trait (QT) analysis of data from the Genetic Analysis Workshop 17 mini-exome for unrelated individuals in the Asian subpopulation, we investigate alternative designs that sequence only 50% of the entire cohort. In addition to a simple random sampling design, we consider extreme-phenotype designs that are of increasing interest in genetic association analysis of QTs, especially in studies concerned with the detection of rare genetic variants. We also evaluate a novel sampling design in which all individuals have a nonzero probability of being selected into the sample but in which individuals with extreme phenotypes have a proportionately larger probability. We take differential sampling of individuals with informative trait values into account by inverse probability weighting using standard survey methods which thus generalizes to the source population. In replicate 1 data, we applied the designs in association analysis of Q1 with both rare and common variants in the FLT1 gene, based on knowledge of the generating model. Using all 200 replicate data sets, we similarly analyzed Q1 and Q4 (which is known to be free of association with FLT1) to evaluate relative efficiency, type I error, and power. Simulation study results suggest that the QT-dependent selection designs generally yield greater than 50% relative efficiency compared to using the entire cohort, implying cost-effectiveness of 50% sample selection and worthwhile reduction of sequencing costs.
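
    The inverse probability weighting used to generalize from such a design back to the cohort amounts to weighted least squares with weight 1/πi for each selected individual. The sketch below simulates an extreme-phenotype-enriched selection in which everyone keeps a nonzero selection probability, as in the novel design described, and recovers an approximately unbiased genotype effect; all numbers are invented.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    n = 4000
    g = rng.binomial(2, 0.3, n)                     # genotype at a common variant
    y = 0.2 * g + rng.normal(0, 1, n)               # quantitative trait

    # Extreme-phenotype-enriched selection: everyone selectable, tails favored.
    pi = np.where(np.abs(y - y.mean()) > y.std(), 0.9, 0.3)
    sel = rng.random(n) < pi

    # Inverse-probability-weighted least squares generalizes back to the cohort.
    w = 1.0 / pi[sel]
    X = np.column_stack([np.ones(sel.sum()), g[sel]])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y[sel]))
    print(f"IPW genotype effect estimate: {beta[1]:.3f} (true 0.2)")
    ```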

  16. Retention-error patterns in complex alphanumeric serial-recall tasks.

    PubMed

    Mathy, Fabien; Varré, Jean-Stéphane

    2013-01-01

    We propose a new method based on an algorithm usually dedicated to DNA sequence alignment in order to both reliably score short-term memory performance on immediate serial-recall tasks and analyse retention-error patterns. There can be considerable confusion on how performance on immediate serial list recall tasks is scored, especially when the to-be-remembered items are sampled with replacement. We discuss the utility of sequence-alignment algorithms to compare the stimuli to the participants' responses. The idea is that deletion, substitution, translocation, and insertion errors, which are typical in DNA, are also typical putative errors in short-term memory (respectively omission, confusion, permutation, and intrusion errors). We analyse four data sets in which alphanumeric lists included a few (or many) repetitions. After examining the method on two simple data sets, we show that sequence alignment offers 1) a compelling method for measuring capacity in terms of chunks when many regularities are introduced in the material (third data set) and 2) a reliable estimator of individual differences in short-term memory capacity. This study illustrates the difficulty of arriving at a good measure of short-term memory performance, and also attempts to characterise the primary factors underpinning remembering and forgetting.
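
    A bare-bones version of such scoring is a global (Needleman-Wunsch) alignment of the stimulus list against the response list, with gaps in the response read as omissions, gaps in the stimulus as intrusions, and mismatches as confusions. The sketch below implements exactly that; permutation (translocation) errors would require an extended edit model and are not handled here.

    ```python
    def align(stimulus, response, gap=-1, match=1, mismatch=-1):
        """Global (Needleman-Wunsch) alignment; deletions ~ omissions,
        insertions ~ intrusions, substitutions ~ confusions."""
        n, m = len(stimulus), len(response)
        score = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = match if stimulus[i-1] == response[j-1] else mismatch
                score[i][j] = max(score[i-1][j-1] + d,
                                  score[i-1][j] + gap, score[i][j-1] + gap)
        # Traceback to recover the error pattern.
        ops, i, j = [], n, m
        while i or j:
            d = match if i and j and stimulus[i-1] == response[j-1] else mismatch
            if i and j and score[i][j] == score[i-1][j-1] + d:
                ops.append("ok" if stimulus[i-1] == response[j-1] else "confusion")
                i, j = i - 1, j - 1
            elif i and score[i][j] == score[i-1][j] + gap:
                ops.append("omission"); i -= 1
            else:
                ops.append("intrusion"); j -= 1
        return score[n][m], ops[::-1]

    print(align("A7B2C9", "A7C2C9"))   # one confusion expected
    ```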

  17. TU-H-CAMPUS-JeP3-05: Adaptive Determination of Needle Sequence HDR Prostate Brachytherapy with Divergent Needle-By-Needle Delivery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Borot de Battisti, M; Maenhout, M; Lagendijk, J J W

    Purpose: To develop a new method which adaptively determines the optimal needle insertion sequence for HDR prostate brachytherapy involving divergent needle-by-needle dose delivery by e.g. a robotic device. A needle insertion sequence is calculated at the beginning of the intervention and updated after each needle insertion with feedback on needle positioning errors. Methods: Needle positioning errors and anatomy changes may occur during HDR brachytherapy which can lead to errors in the delivered dose. A novel strategy was developed to calculate and update the needle sequence and the dose plan after each needle insertion with feedback on needle positioning errors. The dose plan optimization was performed by numerical simulations. The proposed needle sequence determination optimizes the final dose distribution based on the dose coverage impact of each needle. This impact is predicted stochastically by needle insertion simulations. HDR procedures were simulated with varying number of needle insertions (4 to 12) using 11 patient MR data-sets with PTV, prostate, urethra, bladder and rectum delineated. Needle positioning errors were modeled by random normally distributed angulation errors (standard deviation of 3 mm at the needle's tip). The final dose parameters were compared in the situations where the needle with the largest vs. the smallest dose coverage impact was selected at each insertion. Results: Over all scenarios, the percentage of clinically acceptable final dose distributions improved when the needle selected had the largest dose coverage impact (91%) compared to the smallest (88%). The differences were larger for few (4 to 6) needle insertions (maximum difference scenario: 79% vs. 60%). The computation time of the needle sequence optimization was below 60 s. Conclusion: A new adaptive needle sequence determination for HDR prostate brachytherapy was developed. Coupled to adaptive planning, the selection of the needle with the largest dose coverage impact increases the chances of reaching the clinical constraints. M. Borot de Battisti is funded by Philips Medical Systems Nederland B.V.; M. Moerland is principal investigator on a contract funded by Philips Medical Systems Nederland B.V.; G. Hautvast and D. Binnekamp are fulltime employees of Philips Medical Systems Nederland B.V.

  18. Characteristics of pediatric chemotherapy medication errors in a national error reporting database.

    PubMed

    Rinke, Michael L; Shore, Andrew D; Morlock, Laura; Hicks, Rodney W; Miller, Marlene R

    2007-07-01

    Little is known regarding chemotherapy medication errors in pediatrics despite studies suggesting high rates of overall pediatric medication errors. In this study, the authors examined patterns in pediatric chemotherapy errors. The authors queried the United States Pharmacopeia MEDMARX database, a national, voluntary, Internet-accessible error reporting system, for all error reports from 1999 through 2004 that involved chemotherapy medications and patients aged <18 years. Of the 310 pediatric chemotherapy error reports, 85% reached the patient, and 15.6% required additional patient monitoring or therapeutic intervention. Forty-eight percent of errors originated in the administering phase of medication delivery, and 30% originated in the drug-dispensing phase. Of the 387 medications cited, 39.5% were antimetabolites, 14.0% were alkylating agents, 9.3% were anthracyclines, and 9.3% were topoisomerase inhibitors. The most commonly involved chemotherapeutic agents were methotrexate (15.3%), cytarabine (12.1%), and etoposide (8.3%). The most common error types were improper dose/quantity (22.9% of 327 cited error types), wrong time (22.6%), omission error (14.1%), and wrong administration technique/wrong route (12.2%). The most common error causes were performance deficit (41.3% of 547 cited error causes), equipment and medication delivery devices (12.4%), communication (8.8%), knowledge deficit (6.8%), and written order errors (5.5%). Four of the 5 most serious errors occurred at community hospitals. Pediatric chemotherapy errors often reached the patient, potentially were harmful, and differed in quality between outpatient and inpatient areas. This study indicated which chemotherapeutic agents most often were involved in errors and that administering errors were common. Investigation is needed regarding targeted medication administration safeguards for these high-risk medications. Copyright (c) 2007 American Cancer Society.

  19. Alignment methods: strategies, challenges, benchmarking, and comparative overview.

    PubMed

    Löytynoja, Ari

    2012-01-01

    Comparative evolutionary analyses of molecular sequences are solely based on the identities and differences detected between homologous characters. Errors in this homology statement, that is errors in the alignment of the sequences, are likely to lead to errors in the downstream analyses. Sequence alignment and phylogenetic inference are tightly connected and many popular alignment programs use the phylogeny to divide the alignment problem into smaller tasks. They then neglect the phylogenetic tree, however, and produce alignments that are not evolutionarily meaningful. The use of phylogeny-aware methods reduces the error but the resulting alignments, with evolutionarily correct representation of homology, can challenge the existing practices and methods for viewing and visualising the sequences. The inter-dependency of alignment and phylogeny can be resolved by joint estimation of the two; methods based on statistical models allow for inferring the alignment parameters from the data and correctly take into account the uncertainty of the solution but remain computationally challenging. Widely used alignment methods are based on heuristic algorithms and unlikely to find globally optimal solutions. The whole concept of one correct alignment for the sequences is questionable, however, as there typically exist vast numbers of alternative, roughly equally good alignments that should also be considered. This uncertainty is hidden by many popular alignment programs and is rarely correctly taken into account in the downstream analyses. The quest for finding and improving the alignment solution is complicated by the lack of suitable measures of alignment goodness. The difficulty of comparing alternative solutions also affects benchmarks of alignment methods and the results strongly depend on the measure used. As the effects of alignment error cannot be predicted, comparing the alignments' performance in downstream analyses is recommended.

  20. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index.

    PubMed

    Yang, Jian; Bakshi, Andrew; Zhu, Zhihong; Hemani, Gibran; Vinkhuyzen, Anna A E; Lee, Sang Hong; Robinson, Matthew R; Perry, John R B; Nolte, Ilja M; van Vliet-Ostaptchouk, Jana V; Snieder, Harold; Esko, Tonu; Milani, Lili; Mägi, Reedik; Metspalu, Andres; Hamsten, Anders; Magnusson, Patrik K E; Pedersen, Nancy L; Ingelsson, Erik; Soranzo, Nicole; Keller, Matthew C; Wray, Naomi R; Goddard, Michael E; Visscher, Peter M

    2015-10-01

    We propose a method (GREML-LDMS) to estimate heritability for human complex traits in unrelated individuals using whole-genome sequencing data. We demonstrate using simulations based on whole-genome sequencing data that ∼97% and ∼68% of variation at common and rare variants, respectively, can be captured by imputation. Using the GREML-LDMS method, we estimate from 44,126 unrelated individuals that all ∼17 million imputed variants explain 56% (standard error (s.e.) = 2.3%) of variance for height and 27% (s.e. = 2.5%) of variance for body mass index (BMI), and we find evidence that height- and BMI-associated variants have been under natural selection. Considering the imperfect tagging of imputation and potential overestimation of heritability from previous family-based studies, heritability is likely to be 60-70% for height and 30-40% for BMI. Therefore, the missing heritability is small for both traits. For further discovery of genes associated with complex traits, a study design with SNP arrays followed by imputation is more cost-effective than whole-genome sequencing at current prices.

  1. Task planning with uncertainty for robotic systems. Thesis

    NASA Technical Reports Server (NTRS)

    Cao, Tiehua

    1993-01-01

    In a practical robotic system, it is important to represent and plan sequences of operations and to be able to choose an efficient sequence from them for a specific task. During the generation and execution of task plans, different kinds of uncertainty may occur and erroneous states need to be handled to ensure the efficiency and reliability of the system. An approach to task representation, planning, and error recovery for robotic systems is demonstrated. Our approach to task planning is based on an AND/OR net representation, which is then mapped to a Petri net representation of all feasible geometric states and associated feasibility criteria for net transitions. Task decomposition of robotic assembly plans based on this representation is performed on the Petri net for robotic assembly tasks, and the inheritance of properties of liveness, safeness, and reversibility at all levels of decomposition is explored. This approach provides a framework for robust execution of tasks through the properties of traceability and viability. Uncertainty in robotic systems is modeled by local fuzzy variables, fuzzy marking variables, and global fuzzy variables, which are incorporated in fuzzy Petri nets. Analysis of properties and reasoning about uncertainty are investigated using fuzzy reasoning structures built into the net. Two applications of fuzzy Petri nets, robot task sequence planning and sensor-based error recovery, are explored. In the first application, the search space for feasible and complete task sequences with correct precedence relationships is reduced via the use of global fuzzy variables in reasoning about subgoals. In the second application, sensory verification operations are modeled by mutually exclusive transitions to reason about local and global fuzzy variables on-line and automatically select a retry or an alternative error recovery sequence when errors occur. Task sequencing and task execution with error recovery capability for one and multiple soft components in robotic systems are investigated.

  2. Phylogenetically Structured Differences in rRNA Gene Sequence Variation among Species of Arbuscular Mycorrhizal Fungi and Their Implications for Sequence Clustering

    PubMed Central

    Ekanayake, Saliya; Ruan, Yang; Schütte, Ursel M. E.; Kaonongbua, Wittaya; Fox, Geoffrey; Ye, Yuzhen; Bever, James D.

    2016-01-01

    ABSTRACT Arbuscular mycorrhizal (AM) fungi form mutualisms with plant roots that increase plant growth and shape plant communities. Each AM fungal cell contains a large amount of genetic diversity, but it is unclear if this diversity varies across evolutionary lineages. We found that sequence variation in the nuclear large-subunit (LSU) rRNA gene from 29 isolates representing 21 AM fungal species generally assorted into genus- and species-level clades, with the exception of species of the genera Claroideoglomus and Entrophospora. However, there were significant differences in the levels of sequence variation across the phylogeny and between genera, indicating that it is an evolutionarily constrained trait in AM fungi. These consistent patterns of sequence variation across both phylogenetic and taxonomic groups pose challenges to interpreting operational taxonomic units (OTUs) as approximations of species-level groups of AM fungi. We demonstrate that the OTUs produced by five sequence clustering methods using 97% or equivalent sequence similarity thresholds failed to match the expected species of AM fungi, although OTUs from AbundantOTU, CD-HIT-OTU, and CROP corresponded better to species than did OTUs from mothur or UPARSE. This lack of OTU-to-species correspondence resulted both from sequences of one species being split into multiple OTUs and from sequences of multiple species being lumped into the same OTU. The OTU richness therefore will not reliably correspond to the AM fungal species richness in environmental samples. Conservatively, this error can overestimate species richness by 4-fold or underestimate richness by one-half, and the direction of this error will depend on the genera represented in the sample. IMPORTANCE Arbuscular mycorrhizal (AM) fungi form important mutualisms with the roots of most plant species. Individual AM fungi are genetically diverse, but it is unclear whether the level of this diversity differs among evolutionary lineages. We found that the amount of sequence variation in an rRNA gene that is commonly used to identify AM fungal species varied significantly between evolutionary groups that correspond to different genera, with the exception of two genera that are genetically indistinguishable from each other. When we clustered groups of similar sequences into operational taxonomic units (OTUs) using five different clustering methods, these patterns of sequence variation caused the number of OTUs to either over- or underestimate the actual number of AM fungal species, depending on the genus. Our results indicate that OTU-based inferences about AM fungal species composition from environmental sequences can be improved if they take these taxonomically structured patterns of sequence variation into account. PMID:27260357

  3. Designing robust watermark barcodes for multiplex long-read sequencing.

    PubMed

    Ezpeleta, Joaquín; Krsticevic, Flavia J; Bulacio, Pilar; Tapia, Elizabeth

    2017-03-15

    To attain acceptable sample misassignment rates, current approaches to multiplex single-molecule real-time sequencing require upstream quality improvement, which is obtained from multiple passes over the sequenced insert and significantly reduces the effective read length. In order to fully exploit the raw read length on multiplex applications, robust barcodes capable of dealing with the full single-pass error rates are needed. We present a method for designing sequencing barcodes that can withstand a large number of insertion, deletion and substitution errors and are suitable for use in multiplex single-molecule real-time sequencing. The manuscript focuses on the design of barcodes for full-length single-pass reads, impaired by challenging error rates in the order of 11%. The proposed barcodes can multiplex hundreds or thousands of samples while achieving sample misassignment probabilities as low as 10^-7 under the above conditions, and are designed to be compatible with chemical constraints imposed by the sequencing process. Software tools for constructing watermark barcode sets and demultiplexing barcoded reads, together with example sets of barcodes and synthetic barcoded reads, are freely available at www.cifasis-conicet.gov.ar/ezpeleta/NS-watermark . ezpeleta@cifasis-conicet.gov.ar. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
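
    The watermark codes of the paper are built with dedicated coding theory, but the design goal, pairwise edit distance large enough to absorb insertions, deletions and substitutions, can be illustrated with a plain greedy search over random sequences; the length, distance, and set size below are arbitrary stand-ins.

    ```python
    import random

    def levenshtein(a, b):
        """Edit distance counting insertions, deletions and substitutions."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]

    def greedy_barcodes(length=8, min_dist=4, n_wanted=20, seed=6):
        """Greedily keep random barcodes with pairwise edit distance >= min_dist,
        so each barcode tolerates floor((min_dist - 1) / 2) indel/substitution errors."""
        rng = random.Random(seed)
        chosen = []
        while len(chosen) < n_wanted:
            cand = "".join(rng.choice("ACGT") for _ in range(length))
            if all(levenshtein(cand, c) >= min_dist for c in chosen):
                chosen.append(cand)
        return chosen

    print(greedy_barcodes()[:5])
    ```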

  4. Characterization of Hepatitis C Virus (HCV) Envelope Diversification from Acute to Chronic Infection within a Sexually Transmitted HCV Cluster by Using Single-Molecule, Real-Time Sequencing

    PubMed Central

    Ho, Cynthia K. Y.; Raghwani, Jayna; Koekkoek, Sylvie; Liang, Richard H.; Van der Meer, Jan T. M.; Van Der Valk, Marc; De Jong, Menno; Pybus, Oliver G.

    2016-01-01

    ABSTRACT In contrast to other available next-generation sequencing platforms, PacBio single-molecule, real-time (SMRT) sequencing has the advantage of generating long reads albeit with a relatively higher error rate in unprocessed data. Using this platform, we longitudinally sampled and sequenced the hepatitis C virus (HCV) envelope genome region (1,680 nucleotides [nt]) from individuals belonging to a cluster of sexually transmitted cases. All five subjects were coinfected with HIV-1 and a closely related strain of HCV genotype 4d. In total, 50 samples were analyzed by using SMRT sequencing. By using 7 passes of circular consensus sequencing, the error rate was reduced to 0.37%, and the median number of sequences was 612 per sample. A further reduction of insertions was achieved by alignment against a sample-specific reference sequence. However, in vitro recombination during PCR amplification could not be excluded. Phylogenetic analysis supported close relationships among HCV sequences from the four male subjects and subsequent transmission from one subject to his female partner. Transmission was characterized by a strong genetic bottleneck. Viral genetic diversity was low during acute infection and increased upon progression to chronicity but subsequently fluctuated during chronic infection, caused by the alternate detection of distinct coexisting lineages. SMRT sequencing combines long reads with sufficient depth for many phylogenetic analyses and can therefore provide insights into within-host HCV evolutionary dynamics without the need for haplotype reconstruction using statistical algorithms. IMPORTANCE Next-generation sequencing has revolutionized the study of genetically variable RNA virus populations, but for phylogenetic and evolutionary analyses, longer sequences than those generated by most available platforms, while minimizing the intrinsic error rate, are desired. Here, we demonstrate for the first time that PacBio SMRT sequencing technology can be used to generate full-length HCV envelope sequences at the single-molecule level, providing a data set with large sequencing depth for the characterization of intrahost viral dynamics. The selection of consensus reads derived from at least 7 full circular consensus sequencing rounds significantly reduced the intrinsic high error rate of this method. We used this method to genetically characterize a unique transmission cluster of sexually transmitted HCV infections, providing insight into the distinct evolutionary pathways in each patient over time and identifying the transmission-associated genetic bottleneck as well as fluctuations in viral genetic diversity over time, accompanied by dynamic shifts in viral subpopulations. PMID:28077634

  5. Error Detection by 5- to 8-Year-Olds Listening to a Wrong French Sequence of Number Words: Music before Lyrics?

    ERIC Educational Resources Information Center

    Gauderat-Bagault, Laurence; Lehalle, Henri

    Children aged 5 to 8 years (n=71) were required to listen to a partly incorrect sequence of tape-recorded French number words from 1 to 100 and to detect the errors. Children (from several schools near Montpellier, France) were from preschool, grade 1, and grade 2. Results show that violations of syntactic rules were better detected than omissions, whereas…

  6. Basecalling with LifeTrace

    PubMed Central

    Walther, Dirk; Bartha, Gábor; Morris, Macdonald

    2001-01-01

    A pivotal step in electrophoresis sequencing is the conversion of the raw, continuous chromatogram data into the actual sequence of discrete nucleotides, a process referred to as basecalling. We describe a novel algorithm for basecalling implemented in the program LifeTrace. Like Phred, currently the most widely used basecalling software program, LifeTrace takes processed trace data as input. It was designed to be tolerant to variable peak spacing by means of an improved peak-detection algorithm that emphasizes local chromatogram information over global properties. LifeTrace is shown to generate high-quality basecalls and reliable quality scores. It proved particularly effective when applied to MegaBACE capillary sequencing machines. In a benchmark test of 8372 dye-primer MegaBACE chromatograms, LifeTrace generated 17% fewer substitution errors, 16% fewer insertion/deletion errors, and 2.4% more aligned bases to the finished sequence than did Phred. For two sets totaling 6624 dye-terminator chromatograms, the performance improvement was 15% fewer substitution errors, 10% fewer insertion/deletion errors, and 2.1% more aligned bases. The processing time required by LifeTrace is comparable to that of Phred. The predicted quality scores were in line with observed quality scores, permitting direct use for quality clipping and in silico single nucleotide polymorphism (SNP) detection. Furthermore, we introduce a new type of quality score associated with every basecall: the gap-quality. It estimates the probability of a deletion error between the current and the following basecall. This additional quality score improves detection of single basepair deletions when used for locating potential basecalling errors during the alignment. We also describe a new protocol for benchmarking that we believe better discerns basecaller performance differences than methods previously published. PMID:11337481
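
    Both the per-base quality scores and the new gap-quality described here live on the Phred scale, where a score Q encodes an error probability via Q = -10·log10(P). A two-line helper showing the standard definition (not LifeTrace code):

        import math

        def phred(p_error: float) -> float:
            """Phred quality from an error probability: Q = -10 log10(p)."""
            return -10.0 * math.log10(p_error)

        def error_prob(q: float) -> float:
            """Error probability from a Phred quality: p = 10 ** (-Q / 10)."""
            return 10.0 ** (-q / 10.0)

        print(phred(0.001))      # 30.0: a Q30 base is wrong 1 time in 1000
        print(error_prob(20.0))  # 0.01: Q20 corresponds to a 1% error rate

    On this scale, a gap-quality of 30 between two calls would assert a 0.1% chance of a deletion error at that junction.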

  7. Evaluation of 16S rRNA amplicon sequencing using two next-generation sequencing technologies for phylogenetic analysis of the rumen bacterial community in steers

    USDA-ARS?s Scientific Manuscript database

    Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...

  8. Evaluation of 16S rRNA amplicon sequencing using two next-generation sequencing technologies for phylogenetic analysis of the rumen bacterial community in steers

    USDA-ARS?s Scientific Manuscript database

    Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...

  9. Synthetic Spike-in Standards Improve Run-Specific Systematic Error Analysis for DNA and RNA Sequencing

    PubMed Central

    Zook, Justin M.; Samarov, Daniel; McDaniel, Jennifer; Sen, Shurjo K.; Salit, Marc

    2012-01-01

    While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set used to calculate association of SSEs with various features in the reads and sequence context. This data set is typically either from a part of the data set being “recalibrated” (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has fewer dinucleotide SSEs but more SSEs at certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration. PMID:22859977
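
    The essence of recalibrating against spike-ins can be shown in a few lines: bin spike-in-aligned bases by reported quality, compute each bin's empirical error rate from mismatches to the known spike-in sequence, and map reported to empirical quality. A bare-bones sketch of the idea, not GATK's covariate model (which also conditions on machine cycle and dinucleotide context):

        import math
        from collections import defaultdict

        def recalibration_table(reported_q, is_error):
            """reported_q: reported quality for each spike-in-aligned base.
            is_error: 1 if that base mismatches the known spike-in sequence.
            Returns a map from reported quality to empirical Phred quality."""
            mismatches, totals = defaultdict(int), defaultdict(int)
            for q, err in zip(reported_q, is_error):
                totals[q] += 1
                mismatches[q] += err
            # add-one style smoothing keeps perfectly clean bins finite
            return {q: -10.0 * math.log10((mismatches[q] + 1) / (totals[q] + 2))
                    for q in totals}

    Because the spike-in truth set is independent of the genome under study, the same table can be rebuilt for every run, which is the portability argument made above for species lacking a SNP database.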

  10. Common but unappreciated sources of error in one, two, and multiple-color pyrometry

    NASA Technical Reports Server (NTRS)

    Spjut, R. Erik

    1988-01-01

    The most common sources of error in optical pyrometry are examined. They can be classified as either noise and uncertainty errors, stray radiation errors, or speed-of-response errors. Through judicious choice of detectors and optical wavelengths the effect of noise errors can be minimized, but one should strive to determine as many of the system properties as possible. Careful consideration of the optical-collection system can minimize stray radiation errors. Careful consideration must also be given to the slowest elements in a pyrometer when measuring rapid phenomena.
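
    For concreteness, the two-color (ratio) method referred to above infers temperature from the ratio of spectral radiances at two wavelengths; under the Wien approximation and a greybody assumption (equal emissivities at both wavelengths, itself one of the unappreciated error sources) the inversion is closed-form. A sketch with illustrative wavelengths:

        import math

        C2 = 1.4388e-2  # second radiation constant, m*K

        def two_color_temperature(ratio, lam1, lam2):
            """Temperature from the radiance ratio L(lam1)/L(lam2), using the
            Wien approximation and assuming equal emissivities (greybody)."""
            num = C2 * (1.0 / lam2 - 1.0 / lam1)
            den = math.log(ratio) - 5.0 * math.log(lam2 / lam1)
            return num / den

        # A ~2000 K greybody viewed at 650 nm and 900 nm gives a ratio near 0.234:
        print(two_color_temperature(0.234, 650e-9, 900e-9))  # ~2000 K

    Propagating noise through this expression shows the sensitivity the abstract warns about: at these wavelengths a 1% error in the measured ratio shifts the recovered temperature by several kelvin, and any emissivity mismatch between the two wavelengths biases it further.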

  11. Multiframe video coding for improved performance over wireless channels.

    PubMed

    Budagavi, M; Gibson, J D

    2001-01-01

    We propose and evaluate a multi-frame extension to block motion compensation (BMC) coding of videoconferencing-type video signals for wireless channels. The multi-frame BMC (MF-BMC) coder makes use of the redundancy that exists across multiple frames in typical videoconferencing sequences to achieve additional compression over that obtained by using the single frame BMC (SF-BMC) approach, such as in the base-level H.263 codec. The MF-BMC approach also has an inherent ability to overcome some transmission errors and is thus more robust when compared to the SF-BMC approach. We model the error propagation process in MF-BMC coding as a multiple Markov chain and use Markov chain analysis to infer that the use of multiple frames in motion compensation increases robustness. The Markov chain analysis is also used to devise a simple scheme which randomizes the selection of the frame (amongst the multiple previous frames) used in BMC to achieve additional robustness. The MF-BMC coders proposed are a multi-frame extension of the base level H.263 coder and are found to be more robust than the base level H.263 coder when subjected to simulated errors commonly encountered on wireless channels.
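
    The robustness argument can be made concrete with a toy propagation model: corrupt one frame, let every later frame inherit corruption exactly when its randomly chosen reference was corrupted, and count affected frames. With previous-frame-only prediction the corruption never dies out on its own; drawing the reference uniformly from the last few frames lets clean frames re-enter the prediction pool. This is a sketch of that intuition only, not the paper's Markov chain analysis:

        import random

        def frames_affected(n_refs=3, horizon=100_000, seed=1):
            """Frames corrupted after one initial loss, when each new frame
            predicts from a uniformly random one of the last n_refs frames
            and no further channel losses occur."""
            random.seed(seed)
            window = [False] * (n_refs - 1) + [True]  # True = corrupted
            affected = 0
            for _ in range(horizon):
                corrupted = random.choice(window)
                affected += corrupted
                window = window[1:] + [corrupted]
                if not any(window):
                    break  # corruption has left the reference window
            return affected

        print(frames_affected(n_refs=1))  # never recovers: propagates forever
        print(frames_affected(n_refs=3))  # typically dies out within a few frames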

  12. Antiretroviral medication prescribing errors are common with hospitalization of HIV-infected patients.

    PubMed

    Commers, Tessa; Swindells, Susan; Sayles, Harlan; Gross, Alan E; Devetten, Marcel; Sandkovsky, Uriel

    2014-01-01

    Errors in prescribing antiretroviral therapy (ART) often occur with the hospitalization of HIV-infected patients. The rapid identification and prevention of errors may reduce patient harm and healthcare-associated costs. A retrospective review of hospitalized HIV-infected patients was carried out between 1 January 2009 and 31 December 2011. Errors were documented as omission, underdose, overdose, duplicate therapy, incorrect scheduling and/or incorrect therapy. The time to error correction was recorded. Relative risks (RRs) were computed to evaluate patient characteristics and error rates. A total of 289 medication errors were identified in 146/416 admissions (35%). The most common was drug omission (69%). At an error rate of 31%, nucleoside reverse transcriptase inhibitors were associated with an increased risk of error when compared with protease inhibitors (RR 1.32; 95% CI 1.04-1.69) and co-formulated drugs (RR 1.59; 95% CI 1.19-2.09). Of the errors, 31% were corrected within the first 24 h, but over half (55%) were never remedied. Admissions with an omission error were 7.4 times more likely to have all errors corrected within 24 h than were admissions without an omission. Drug interactions with ART were detected on 51 occasions. For the study population (n = 177), an increased risk of admission error was observed for black (43%) compared with white (28%) individuals (RR 1.53; 95% CI 1.16-2.03) but no significant differences were observed between white patients and other minorities or between men and women. Errors in inpatient ART were common, and the majority were never detected. The most common errors involved omission of medication, and nucleoside reverse transcriptase inhibitors had the highest rate of prescribing error. Interventions to prevent and correct errors are urgently needed.
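
    The relative risks and confidence intervals quoted above follow the standard 2x2-table computation; a sketch using the textbook formulas (the counts in the example are illustrative, not the study's raw data):

        import math

        def relative_risk(a, n1, c, n2, z=1.96):
            """RR for a/n1 events in the exposed group vs c/n2 in the
            comparison group, with a CI from the log-normal approximation."""
            rr = (a / n1) / (c / n2)
            se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
            lo, hi = (rr * math.exp(s * z * se) for s in (-1, 1))
            return rr, lo, hi

        # illustrative counts only, chosen to echo the 43% vs 28% comparison:
        print(relative_risk(a=43, n1=100, c=28, n2=100))  # RR ~1.54 with 95% CI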

  13. Dynamically corrected gates for singlet-triplet spin qubits with control-dependent errors

    NASA Astrophysics Data System (ADS)

    Jacobson, N. Tobias; Witzel, Wayne M.; Nielsen, Erik; Carroll, Malcolm S.

    2013-03-01

    Magnetic field inhomogeneity due to random polarization of quasi-static local magnetic impurities is a major source of environmentally induced error for singlet-triplet double quantum dot (DQD) spin qubits. Moreover, for singlet-triplet qubits this error may depend on the applied controls. This effect is significant when a static magnetic field gradient is applied to enable full qubit control. Through a configuration interaction analysis, we observe that the dependence of the field inhomogeneity-induced error on the DQD bias voltage can vary systematically as a function of the controls for certain experimentally relevant operating regimes. To account for this effect, we have developed a straightforward prescription for adapting dynamically corrected gate sequences that assume control-independent errors into sequences that compensate for systematic control-dependent errors. We show that accounting for such errors may lead to a substantial increase in gate fidelities. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. DOE's National Nuclear Security Administration under contract DE-AC04-94AL85000.

  14. [Epidemiology of refractive errors].

    PubMed

    Wolfram, C

    2017-07-01

    Refractive errors are very common and can lead to severe pathological changes in the eye. This article analyzes the epidemiology of refractive errors in the general population in Germany and worldwide and describes common definitions for refractive errors and clinical characteristics of pathological changes. Refractive errors differ between age groups, due both to refractive changes over the lifetime and to generation-specific factors. Current research on the etiology of refractive errors has strengthened the evidence for the influence of environmental factors, which has led to new strategies for the prevention of refractive pathologies.

  15. Error identification and recovery by student nurses using human patient simulation: opportunity to improve patient safety.

    PubMed

    Henneman, Elizabeth A; Roche, Joan P; Fisher, Donald L; Cunningham, Helene; Reilly, Cheryl A; Nathanson, Brian H; Henneman, Philip L

    2010-02-01

    This study examined types of errors that occurred or were recovered in a simulated environment by student nurses. Errors occurred in all four rule-based error categories, and all students committed at least one error. The most frequent errors occurred in the verification category. Another common error was related to physician interactions. The least common errors were related to coordinating information with the patient and family. Our finding that 100% of student subjects committed rule-based errors is cause for concern. To decrease errors and improve safe clinical practice, nurse educators must identify effective strategies that students can use to improve patient surveillance. Copyright 2010 Elsevier Inc. All rights reserved.

  16. Machine Translation as a Model for Overcoming Some Common Errors in English-into-Arabic Translation among EFL University Freshmen

    ERIC Educational Resources Information Center

    El-Banna, Adel I.; Naeem, Marwa A.

    2016-01-01

    This research work aimed at making use of Machine Translation to help students avoid some syntactic, semantic and pragmatic common errors in translation from English into Arabic. Participants were a hundred and five freshmen who studied the "Translation Common Errors Remedial Program" prepared by the researchers. A testing kit that…

  17. Sequencing artifacts in the type A influenza database and attempts to correct them

    USDA-ARS?s Scientific Manuscript database

    Currently over 300,000 Type A influenza gene sequences representing over 50,000 strains are available in publicly available databases. However, the quality of the sequences submitted is determined by the contributor, and many sequence errors are present in the databases, which can affect the result...

  18. Frequency and Type of Situational Awareness Errors Contributing to Death and Brain Damage: A Closed Claims Analysis.

    PubMed

    Schulz, Christian M; Burden, Amanda; Posner, Karen L; Mincer, Shawn L; Steadman, Randolph; Wagner, Klaus J; Domino, Karen B

    2017-08-01

    Situational awareness errors may play an important role in the genesis of patient harm. The authors examined closed anesthesia malpractice claims for death or brain damage to determine the frequency and type of situational awareness errors. Surgical and procedural anesthesia death and brain damage claims in the Anesthesia Closed Claims Project database were analyzed. Situational awareness error was defined as failure to perceive relevant clinical information, failure to comprehend the meaning of available information, or failure to project, anticipate, or plan. Patient and case characteristics, primary damaging events, and anesthesia payments in claims with situational awareness errors were compared to other death and brain damage claims from 2002 to 2013. Anesthesiologist situational awareness errors contributed to death or brain damage in 198 of 266 claims (74%). Respiratory system damaging events were more common in claims with situational awareness errors (56%) than other claims (21%, P < 0.001). The most common specific respiratory events in error claims were inadequate oxygenation or ventilation (24%), difficult intubation (11%), and aspiration (10%). Payments were made in 85% of situational awareness error claims compared to 46% in other claims (P = 0.001), with no significant difference in payment size. Among 198 claims with anesthesia situational awareness error, perception errors were most common (42%), whereas comprehension errors (29%) and projection errors (29%) were relatively less common. Situational awareness error definitions were operationalized for reliable application to real-world anesthesia cases. Situational awareness errors may have contributed to catastrophic outcomes in three quarters of recent anesthesia malpractice claims. Situational awareness errors resulting in death or brain damage remain prevalent causes of malpractice claims in the 21st century.

  19. Medication errors in anesthesia: unacceptable or unavoidable?

    PubMed

    Dhawan, Ira; Tewari, Anurag; Sehgal, Sankalp; Sinha, Ashish Chandra

    Medication errors are common causes of patient morbidity and mortality, and they add a financial burden to the institution as well. Though the impact varies from no harm to serious adverse effects including death, the issue needs attention on a priority basis, since medication errors are preventable. In today's world, where people are aware and medical claims are on the rise, it is of utmost priority that we curb this issue. Individual effort to decrease medication errors alone might not be successful until a change in the existing protocols and systems is incorporated. Often, drug errors that occur cannot be reversed. The best way to 'treat' drug errors is to prevent them. Wrong medication (due to syringe swap), overdose (due to misunderstanding or preconception of the dose, pump misuse and dilution error), incorrect administration route, underdosing and omission are common causes of medication error that occur perioperatively. Drug omission and calculation mistakes occur commonly in the ICU. Medication errors can occur perioperatively during preparation, administration or record keeping. Numerous human and system errors can be blamed for the occurrence of medication errors. The need of the hour is to stop the blame game, accept mistakes and develop a safe and 'just' culture in order to prevent medication errors. Newly devised systems like VEINROM, a fluid delivery system, are a novel approach to preventing drug errors due to the most commonly used medications in anesthesia. Such developments, along with vigilant doctors, a safe workplace culture and organizational support, can together help prevent these errors. Copyright © 2016. Published by Elsevier Editora Ltda.

  20. Impact of gradient timing error on the tissue sodium concentration bioscale measured using flexible twisted projection imaging

    NASA Astrophysics Data System (ADS)

    Lu, Aiming; Atkinson, Ian C.; Vaughn, J. Thomas; Thulborn, Keith R.

    2011-12-01

    The rapid biexponential transverse relaxation of the sodium MR signal from brain tissue requires efficient k-space sampling for quantitative imaging in a time that is acceptable for human subjects. The flexible twisted projection imaging (flexTPI) sequence has been shown to be suitable for quantitative sodium imaging with an ultra-short echo time to minimize signal loss. The fidelity of the k-space center location is affected by the readout gradient timing errors on the three physical axes, which is known to cause image distortion for projection-based acquisitions. This study investigated the impact of these timing errors on the voxel-wise accuracy of the tissue sodium concentration (TSC) bioscale measured with the flexTPI sequence. Our simulations show greater than 20% spatially varying quantification errors when the gradient timing errors are larger than 10 μs on all three axes. The quantification is more tolerant of gradient timing errors on the Z-axis. An existing method was used to measure the gradient timing errors with <1 μs error. The gradient timing error measurement is shown to be RF coil dependent, and timing error differences of up to ~16 μs have been observed between different RF coils used on the same scanner. The measured timing errors can be corrected prospectively or retrospectively to obtain accurate TSC values.

  1. Bacterial community comparisons by taxonomy-supervised analysis independent of sequence alignment and clustering

    PubMed Central

    Sul, Woo Jun; Cole, James R.; Jesus, Ederson da C.; Wang, Qiong; Farris, Ryan J.; Fish, Jordan A.; Tiedje, James M.

    2011-01-01

    High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but now even higher-throughput methods at the Illumina scale allow the creation of much larger datasets with more samples and orders-of-magnitude more sequences that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions in that β-diversity measures calculated by using these two types of matrices were significantly correlated to each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples. PMID:21873204
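
    Taxonomy-supervised assignment of this kind is commonly implemented as a naive Bayes classifier over short subsequences, in the style of the RDP Classifier's 8-mer model; the sketch below is a simplified version of that general approach (the word length, smoothing constants and function names are assumptions, not the authors' exact method):

        from collections import defaultdict
        from math import log

        K = 8  # word length, as in RDP-style classifiers

        def words(seq):
            return {seq[i:i + K] for i in range(len(seq) - K + 1)}

        def train(ref_seqs):
            """ref_seqs: iterable of (taxon, sequence) training pairs."""
            by_taxon = defaultdict(list)
            for taxon, seq in ref_seqs:
                by_taxon[taxon].append(words(seq))
            model = {}
            for taxon, word_sets in by_taxon.items():
                n = len(word_sets)
                counts = defaultdict(int)
                for ws in word_sets:
                    for w in ws:
                        counts[w] += 1
                logp = {w: log((c + 0.5) / (n + 1)) for w, c in counts.items()}
                logp['_absent'] = log(0.5 / (n + 1))  # smoothed unseen-word prob
                model[taxon] = logp
            return model

        def classify(query, model):
            """Assign the query to the taxon with the highest naive Bayes score."""
            q = words(query)
            return max(model, key=lambda t: sum(model[t].get(w, model[t]['_absent'])
                                                for w in q))

    Because classification works on word content rather than on a multiple alignment, a single base-call error perturbs only a handful of words, which is one reason the approach tolerates sequencing errors well.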

  2. Checking distributional assumptions for pharmacokinetic summary statistics based on simulations with compartmental models.

    PubMed

    Shen, Meiyu; Russek-Cohen, Estelle; Slud, Eric V

    2016-08-12

    Bioequivalence (BE) studies are an essential part of the evaluation of generic drugs. The most common in vivo BE study design is the two-period two-treatment crossover design. AUC (area under the concentration-time curve) and Cmax (maximum concentration) are obtained from the observed concentration-time profiles for each subject from each treatment under each sequence. In the BE evaluation of pharmacokinetic crossover studies, the normality of the univariate response variable, e.g. log(AUC) or log(Cmax), is often assumed in the literature without much evidence. Therefore, we investigate the distributional assumption of the normality of response variables, log(AUC) and log(Cmax), by simulating concentration-time profiles from two-stage pharmacokinetic models (commonly used in pharmacokinetic research) for a wide range of pharmacokinetic parameters and measurement error structures. Our simulations show that, under reasonable distributional assumptions on the pharmacokinetic parameters, log(AUC) has heavy tails and log(Cmax) is skewed. Sensitivity analyses are conducted to investigate how the distribution of the standardized log(AUC) (or the standardized log(Cmax)) for a large number of simulated subjects deviates from normality if distributions of errors in the pharmacokinetic model for plasma concentrations deviate from normality and if the plasma concentration can be described by different compartmental models.
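
    A miniature version of the kind of simulation described, assuming a one-compartment model with first-order absorption and lognormal between-subject variability (all parameter values are illustrative; the paper's two-stage models and error structures are richer):

        import numpy as np

        def conc(t, dose, ka, ke, v):
            """One-compartment oral model:
            C(t) = D*ka / (V*(ka-ke)) * (exp(-ke*t) - exp(-ka*t))."""
            return dose * ka / (v * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

        rng = np.random.default_rng(0)
        t = np.linspace(0.0, 24.0, 49)  # sampling times in hours
        log_auc, log_cmax = [], []
        for _ in range(10_000):  # simulated subjects
            ka = rng.lognormal(np.log(1.0), 0.3)   # absorption rate, 1/h
            ke = rng.lognormal(np.log(0.1), 0.3)   # elimination rate, 1/h
            v = rng.lognormal(np.log(30.0), 0.2)   # volume of distribution, L
            c = conc(t, 100.0, ka, ke, v) * rng.lognormal(0.0, 0.1, t.size)
            auc = float(np.sum((c[1:] + c[:-1]) * np.diff(t)) / 2)  # trapezoid
            log_auc.append(np.log(auc))
            log_cmax.append(np.log(c.max()))
        # Inspect the skewness and tail behavior of log_auc and log_cmax to see
        # how far they deviate from normality, as in the analyses above.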

  3. Frequency and types of the medication errors in an academic emergency department in Iran: The emergent need for clinical pharmacy services in emergency departments.

    PubMed

    Zeraatchi, Alireza; Talebian, Mohammad-Taghi; Nejati, Amir; Dashti-Khavidaki, Simin

    2013-07-01

    Emergency departments (EDs) are characterized by simultaneous care of multiple patients with various medical conditions. Due to the large number of patients with complex diseases, the speed and complexity of medication use, and work in an understaffed and crowded environment, medication errors are commonly perpetrated by emergency care providers. This study was designed to evaluate the incidence of medication errors among patients attending an ED in a teaching hospital in Iran. In this cross-sectional study, a total of 500 patients attending the ED were randomly assessed for the incidence and types of medication errors. Some factors related to medication errors, such as working shift, weekday and the schedule of the trainees' educational program, were also evaluated. Nearly 22% of patients experienced at least one medication error. The rate of medication errors was 0.41 errors per patient and 0.16 errors per ordered medication. The frequency of medication errors was higher in men, middle-aged patients, the first days of the week, night-time work schedules and the first semester of the educational year of new junior emergency medicine residents. More than 60% of errors were prescription errors by physicians and the remainder were transcription or administration errors by nurses. More than 35% of the prescribing errors happened during the selection of drug dose and frequency. The most common medication errors by nurses during administration were omission errors (16.2%) followed by unauthorized drugs (6.4%). Most of the medication errors happened with anticoagulants and thrombolytics (41.2%), followed by antimicrobial agents (37.7%) and insulin (7.4%). In this study, at least one-fifth of the patients attending the ED experienced medication errors resulting from multiple factors. The more common prescription errors happened during the ordering of drug dose and frequency. The more common administration errors included drug omission or unauthorized drug administration.

  4. Turtle: identifying frequent k-mers with cache-efficient algorithms.

    PubMed

    Roy, Rajat Shuvro; Bhattacharya, Debashish; Schliep, Alexander

    2014-07-15

    Counting the frequencies of k-mers in read libraries is often a first step in the analysis of high-throughput sequencing data. Infrequent k-mers are assumed to be a result of sequencing errors. The frequent k-mers constitute a reduced but error-free representation of the experiment, which can inform read error correction or serve as the input to de novo assembly methods. Ideally, the memory requirement for counting should be linear in the number of frequent k-mers and not in the, typically much larger, total number of k-mers in the read library. We present a novel method that balances time, space and accuracy requirements to efficiently extract frequent k-mers even for high-coverage libraries and large genomes such as human. Our method is designed to minimize cache misses by using a pattern-blocked Bloom filter to remove infrequent k-mers from consideration, in combination with a novel sort-and-compact scheme, instead of a hash, for the actual counting. Although this increases theoretical complexity, the savings in cache misses reduce the empirical running times. A variant of the method can resort to a counting Bloom filter for even larger savings in memory, at the expense of false-negative rates in addition to the false-positive rates common to all Bloom filter-based approaches. A comparison with the state-of-the-art shows reduced memory requirements and running times. The tools are freely available for download at http://bioinformatics.rutgers.edu/Software/Turtle and http://figshare.com/articles/Turtle/791582. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
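
    The central trick, shared by Turtle and similar counters, is to spend memory only on k-mers seen more than once: a Bloom filter absorbs each first sighting, and a k-mer enters the real count table only when the filter reports it has been seen before. A compact sketch with a single flat filter (Turtle's pattern-blocked layout additionally packs the probe bits into one cache line, which this sketch does not attempt):

        from collections import defaultdict

        class Bloom:
            def __init__(self, bits: int, n_hashes: int = 3):
                self.bits, self.n_hashes = bits, n_hashes
                self.arr = bytearray(bits // 8 + 1)

            def add_and_test(self, item: str) -> bool:
                """Set the item's bits; return True only if all were already set."""
                seen = True
                for i in range(self.n_hashes):
                    pos = hash((item, i)) % self.bits
                    byte, bit = divmod(pos, 8)
                    if not (self.arr[byte] >> bit) & 1:
                        seen = False
                        self.arr[byte] |= 1 << bit
                return seen

        def count_frequent_kmers(reads, k=31, bits=1 << 24):
            bloom, counts = Bloom(bits), defaultdict(int)
            for read in reads:
                for i in range(len(read) - k + 1):
                    kmer = read[i:i + k]
                    if bloom.add_and_test(kmer):  # second or later sighting
                        counts[kmer] += 1         # singletons are never stored
            return counts  # stored counts are (occurrences - 1); Bloom false
                           # positives can admit a few error k-mers

    Since most error k-mers are singletons, they never leave the filter, which is why memory scales with the frequent k-mers rather than with the whole library.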

  5. Kinematic Analysis of Speech Sound Sequencing Errors Induced by Delayed Auditory Feedback.

    PubMed

    Cler, Gabriel J; Lee, Jackson C; Mittelman, Talia; Stepp, Cara E; Bohland, Jason W

    2017-06-22

    Delayed auditory feedback (DAF) causes speakers to become disfluent and make phonological errors. Methods for assessing the kinematics of speech errors are lacking, with most DAF studies relying on auditory perceptual analyses, which may be problematic, as errors judged to be categorical may actually represent blends of sounds or articulatory errors. Eight typical speakers produced nonsense syllable sequences under normal and DAF (200 ms). Lip and tongue kinematics were captured with electromagnetic articulography. Time-locked acoustic recordings were transcribed, and the kinematics of utterances with and without perceived errors were analyzed with existing and novel quantitative methods. New multivariate measures showed that for 5 participants, kinematic variability for productions perceived to be error free was significantly increased under delay; these results were validated by using the spatiotemporal index measure. Analysis of error trials revealed both typical productions of a nontarget syllable and productions with articulatory kinematics that incorporated aspects of both the target and the perceived utterance. This study is among the first to characterize articulatory changes under DAF and provides evidence for different classes of speech errors, which may not be perceptually salient. New methods were developed that may aid visualization and analysis of large kinematic data sets. https://doi.org/10.23641/asha.5103067.
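
    The spatiotemporal index used for validation here is conventionally computed by linearly normalizing each movement record in time and amplitude and summing the across-trial standard deviations over the normalized time base; the sketch below follows that textbook recipe, not necessarily the authors' exact pipeline:

        import numpy as np

        def spatiotemporal_index(trials, n_points=50):
            """trials: list of 1-D kinematic records (e.g., lip aperture traces).
            Each record is linearly time-normalized to n_points samples and
            z-scored in amplitude; the STI is the sum of across-trial standard
            deviations at each normalized time point."""
            normalized = []
            for y in trials:
                y = np.asarray(y, dtype=float)
                xi = np.linspace(0.0, 1.0, n_points)
                x = np.linspace(0.0, 1.0, y.size)
                yi = np.interp(xi, x, y)                        # time normalization
                normalized.append((yi - yi.mean()) / yi.std())  # amplitude normalization
            return float(np.vstack(normalized).std(axis=0).sum())

    A larger STI means the articulatory trajectories are less stable across repetitions, which is how the increased kinematic variability under delay was quantified.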

  6. Data driven CAN node reliability assessment for manufacturing system

    NASA Astrophysics Data System (ADS)

    Zhang, Leiming; Yuan, Yong; Lei, Yong

    2017-01-01

    The reliability of the Controller Area Network (CAN) is critical to the performance and safety of the system. However, direct bus-off time assessment tools are lacking in practice due to inaccessibility of the node information and the complexity of the node interactions upon errors. In order to measure the mean time to bus-off (MTTB) of all the nodes, a novel data-driven node bus-off time assessment method for CAN networks is proposed by directly using network error information. First, the corresponding network error event sequence for each node is constructed using multiple-layer network error information. Then, the generalized zero-inflated Poisson process (GZIP) model is established for each node based on the error event sequence. Finally, the stochastic model is constructed to predict the MTTB of the node. The accelerated case studies with different error injection rates are conducted on a laboratory network to demonstrate the proposed method, where the network errors are generated by a computer-controlled error injection system. Experiment results show that the MTTB of nodes predicted by the proposed method agree well with observations in the case studies. The proposed data-driven node time-to-bus-off assessment method for CAN networks can successfully predict the MTTB of nodes by directly using network error event data.
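
    At its core, the zero-inflated Poisson family used here mixes a degenerate zero state with an ordinary Poisson count, so a node that is quiet in most intervals but bursts errors in others is fit better than by a plain Poisson. A sketch of the plain ZIP building block (the paper's generalized version extends this, so treat it as illustrative):

        import math

        def zip_pmf(k: int, lam: float, pi: float) -> float:
            """Zero-inflated Poisson: P(0) = pi + (1-pi)*exp(-lam);
            P(k) = (1-pi) * exp(-lam) * lam**k / k! for k >= 1."""
            poisson = math.exp(-lam) * lam**k / math.factorial(k)
            return pi + (1.0 - pi) * poisson if k == 0 else (1.0 - pi) * poisson

        def zip_loglik(counts, lam, pi):
            """Log-likelihood of per-interval error counts under a ZIP model;
            maximize over (lam, pi) to fit a node's error event sequence."""
            return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)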

  7. Kinematic Analysis of Speech Sound Sequencing Errors Induced by Delayed Auditory Feedback

    PubMed Central

    Lee, Jackson C.; Mittelman, Talia; Stepp, Cara E.; Bohland, Jason W.

    2017-01-01

    Purpose Delayed auditory feedback (DAF) causes speakers to become disfluent and make phonological errors. Methods for assessing the kinematics of speech errors are lacking, with most DAF studies relying on auditory perceptual analyses, which may be problematic, as errors judged to be categorical may actually represent blends of sounds or articulatory errors. Method Eight typical speakers produced nonsense syllable sequences under normal and DAF (200 ms). Lip and tongue kinematics were captured with electromagnetic articulography. Time-locked acoustic recordings were transcribed, and the kinematics of utterances with and without perceived errors were analyzed with existing and novel quantitative methods. Results New multivariate measures showed that for 5 participants, kinematic variability for productions perceived to be error free was significantly increased under delay; these results were validated by using the spatiotemporal index measure. Analysis of error trials revealed both typical productions of a nontarget syllable and productions with articulatory kinematics that incorporated aspects of both the target and the perceived utterance. Conclusions This study is among the first to characterize articulatory changes under DAF and provides evidence for different classes of speech errors, which may not be perceptually salient. New methods were developed that may aid visualization and analysis of large kinematic data sets. Supplemental Material https://doi.org/10.23641/asha.5103067 PMID:28655038

  8. Associations among measures of sequential processing in motor and linguistics tasks in adults with and without a family history of childhood apraxia of speech: a replication study.

    PubMed

    Button, Le; Peter, Beate; Stoel-Gammon, Carol; Raskind, Wendy H

    2013-03-01

    The purpose of this study was to address the hypothesis that childhood apraxia of speech (CAS) is influenced by an underlying deficit in sequential processing that is also expressed in other modalities. In a sample of 21 adults from five multigenerational families, 11 with histories of various familial speech sound disorders, 3 biologically related adults from a family with familial CAS showed motor sequencing deficits in an alternating motor speech task. Compared with the other adults, these three participants showed deficits in tasks requiring high loads of sequential processing, including nonword imitation, nonword reading and spelling. Qualitative error analyses in real word and nonword imitations revealed group differences in phoneme sequencing errors. Motor sequencing ability was correlated with phoneme sequencing errors during real word and nonword imitation, reading and spelling. Correlations were characterized by extremely high scores in one family and extremely low scores in another. Results are consistent with a central deficit in sequential processing in CAS of familial origin.

  9. Associations among measures of sequential processing in motor and linguistics tasks in adults with and without a family history of childhood apraxia of speech: A replication study

    PubMed Central

    BUTTON, LE; PETER, BEATE; STOEL-GAMMON, CAROL; RASKIND, WENDY H.

    2013-01-01

    The purpose of this study was to address the hypothesis that childhood apraxia of speech (CAS) is influenced by an underlying deficit in sequential processing that is also expressed in other modalities. In a sample of 21 adults from five multigenerational families, 11 with histories of various familial speech sound disorders, 3 biologically related adults from a family with familial CAS showed motor sequencing deficits in an alternating motor speech task. Compared with the other adults, these three participants showed deficits in tasks requiring high loads of sequential processing, including nonword imitation, nonword reading and spelling. Qualitative error analyses in real word and nonword imitations revealed group differences in phoneme sequencing errors. Motor sequencing ability was correlated with phoneme sequencing errors during real word and nonword imitation, reading and spelling. Correlations were characterized by extremely high scores in one family and extremely low scores in another. Results are consistent with a central deficit in sequential processing in CAS of familial origin. PMID:23339292

  10. Genome alignment with graph data structures: a comparison

    PubMed Central

    2014-01-01

    Background Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference. Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment. Results We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures. Conclusion We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools. PMID:24712884

  11. Metagenomic and near full-length 16S rRNA sequence data in support of the phylogenetic analysis of the rumen bacterial community in steers

    USDA-ARS?s Scientific Manuscript database

    Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...

  12. Computationally mapping sequence space to understand evolutionary protein engineering.

    PubMed

    Armstrong, Kathryn A; Tidor, Bruce

    2008-01-01

    Evolutionary protein engineering has been dramatically successful, producing a wide variety of new proteins with altered stability, binding affinity, and enzymatic activity. However, the success of such procedures is often unreliable, and the impact of the choice of protein, engineering goal, and evolutionary procedure is not well understood. We have created a framework for understanding aspects of the protein engineering process by computationally mapping regions of feasible sequence space for three small proteins using structure-based design protocols. We then tested the ability of different evolutionary search strategies to explore these sequence spaces. The results point to a non-intuitive relationship between the error-prone PCR mutation rate and the number of rounds of replication. The evolutionary relationships among feasible sequences reveal hub-like sequences that serve as particularly fruitful starting sequences for evolutionary search. Moreover, genetic recombination procedures were examined, and tradeoffs relating sequence diversity and search efficiency were identified. This framework allows us to consider the impact of protein structure on the allowed sequence space and therefore on the challenges that each protein presents to error-prone PCR and genetic recombination procedures.

  13. Errors in otology.

    PubMed

    Kartush, J M

    1996-11-01

    Practicing medicine successfully requires that errors in diagnosis and treatment be minimized. Malpractice laws encourage litigators to ascribe all medical errors to incompetence and negligence. There are, however, many other causes of unintended outcomes. This article describes common causes of errors and suggests ways to minimize mistakes in otologic practice. Widespread dissemination of knowledge about common errors and their precursors can reduce the incidence of their occurrence. Consequently, laws should be passed to allow for a system of non-punitive, confidential reporting of errors and "near misses" that can be shared by physicians nationwide.

  14. Reducing false-positive incidental findings with ensemble genotyping and logistic regression based variant filtering methods.

    PubMed

    Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choe, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B; Gupta, Neha; Kohane, Isaac S; Green, Robert C; Kong, Sek Won

    2014-08-01

    As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but this is costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false-positive DNM candidates. © 2014 WILEY PERIODICALS, INC.
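
    Schematically, an LR-based filter of this kind is trained on variants labeled true or false against a validated call set (such as the NA12878 set mentioned above) and keeps only calls whose predicted probability of being real clears a threshold. The features and numbers below are placeholders for illustration, not the authors' covariates:

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        # Illustrative per-variant features: read depth, genotype quality,
        # strand bias, mapping quality. y_train is 1 for validated true calls.
        X_train = np.array([[35, 99, 0.02, 60],
                            [ 8, 21, 0.40, 31],
                            [50, 99, 0.01, 60],
                            [12, 30, 0.35, 25]], dtype=float)
        y_train = np.array([1, 0, 1, 0])

        clf = LogisticRegression().fit(X_train, y_train)

        def keep_variant(features, threshold=0.9):
            """Retain the call only if P(true variant) exceeds the threshold."""
            x = np.asarray(features, dtype=float).reshape(1, -1)
            return clf.predict_proba(x)[0, 1] >= threshold

        print(keep_variant([40, 95, 0.03, 58]))

    Raising the threshold trades sensitivity for a lower false discovery rate, which is the same trade-off the fold-change figures above quantify.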

  15. Reducing false positive incidental findings with ensemble genotyping and logistic regression-based variant filtering methods

    PubMed Central

    Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choi, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B.; Gupta, Neha; Kohane, Isaac S.; Green, Robert C.; Kong, Sek Won

    2014-01-01

    As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but this is costly. Here we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous SNVs; 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false positive DNM candidates. PMID:24829188

  16. Association analysis using next-generation sequence data from publicly available control groups: the robust variance score statistic

    PubMed Central

    Derkach, Andriy; Chiang, Theodore; Gong, Jiafen; Addis, Laura; Dobbins, Sara; Tomlinson, Ian; Houlston, Richard; Pal, Deb K.; Strug, Lisa J.

    2014-01-01

    Motivation: Sufficiently powered case–control studies with next-generation sequence (NGS) data remain prohibitively expensive for many investigators. If feasible, a more efficient strategy would be to include publicly available sequenced controls. However, these studies can be confounded by differences in sequencing platform; alignment, single nucleotide polymorphism and variant calling algorithms; read depth; and selection thresholds. Assuming one can match cases and controls on the basis of ethnicity and other potential confounding factors, and one has access to the aligned reads in both groups, we investigate the effect of systematic differences in read depth and selection threshold when comparing allele frequencies between cases and controls. We propose a novel likelihood-based method, the robust variance score (RVS), that substitutes genotype calls by their expected values given observed sequence data. Results: We show theoretically that the RVS eliminates read depth bias in the estimation of minor allele frequency. We also demonstrate that, using simulated and real NGS data, the RVS method controls Type I error and has comparable power to the ‘gold standard’ analysis with the true underlying genotypes for both common and rare variants. Availability and implementation: An RVS R script and instructions can be found at strug.research.sickkids.ca, and at https://github.com/strug-lab/RVS. Contact: lisa.strug@utoronto.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24733292
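
    The substitution at the heart of the RVS, replacing a hard genotype call with its conditional expectation given the reads, can be illustrated with the standard dosage computation under Hardy-Weinberg priors (a generic sketch, not the authors' implementation):

        def expected_dosage(likelihoods, alt_freq):
            """likelihoods: P(reads | G = g) for g = 0, 1, 2 alternate alleles,
            e.g. derived from base calls and their quality scores.
            Returns E[G | reads] under Hardy-Weinberg priors."""
            p = alt_freq
            priors = [(1 - p) ** 2, 2 * p * (1 - p), p * p]
            joint = [l * w for l, w in zip(likelihoods, priors)]
            total = sum(joint)
            return sum(g * j / total for g, j in enumerate(joint))

        # Shallow coverage leaves the likelihoods flat, so the expected dosage
        # shrinks toward the prior mean instead of committing to a hard call:
        print(expected_dosage([0.6, 0.3, 0.1], alt_freq=0.2))

    Because the expectation degrades gracefully with read depth, comparing mean dosages between groups is far less sensitive to the depth imbalance that biases hard genotype calls.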

  17. Nurses' behaviors and visual scanning patterns may reduce patient identification errors.

    PubMed

    Marquard, Jenna L; Henneman, Philip L; He, Ze; Jo, Junghee; Fisher, Donald L; Henneman, Elizabeth A

    2011-09-01

    Patient identification (ID) errors occurring during the medication administration process can be fatal. The aim of this study is to determine whether differences in nurses' behaviors and visual scanning patterns during the medication administration process influence their capacities to identify patient ID errors. Nurse participants (n = 20) administered medications to 3 patients in a simulated clinical setting, with 1 patient having an embedded ID error. Error-identifying nurses tended to complete more process steps in a similar amount of time compared with non-error-identifying nurses and tended to scan information across artifacts (e.g., ID band, patient chart, medication label) rather than fixating on several pieces of information on a single artifact before fixating on another artifact. Non-error-identifying nurses tended to increase their durations of off-topic conversations (a type of process interruption) over the course of the trials; the difference between groups was significant in the trial with the embedded ID error. Error-identifying nurses tended to have their most fixations in a row on the patient's chart, whereas non-error-identifying nurses did not tend to have a single artifact on which they consistently fixated. Finally, error-identifying nurses tended to have predictable eye fixation sequences across artifacts, whereas non-error-identifying nurses tended to have seemingly random eye fixation sequences. This finding has implications for nurse training and the design of tools and technologies that support nurses as they complete the medication administration process. (c) 2011 APA, all rights reserved.

  18. Representation of item position in immediate serial recall: Evidence from intrusion errors.

    PubMed

    Fischer-Baum, Simon; McCloskey, Michael

    2015-09-01

    In immediate serial recall, participants are asked to recall novel sequences of items in the correct order. Theories of the representations and processes required for this task differ in how order information is maintained; some have argued that order is represented through item-to-item associations, while others have argued that each item is coded for its position in a sequence, with position being defined either by distance from the start of the sequence, or by distance from both the start and the end of the sequence. Previous researchers have used error analyses to adjudicate between these different proposals. However, these previous attempts have not allowed researchers to examine the full set of alternative proposals. In the current study, we analyzed errors produced in 2 immediate serial recall experiments that differ in the modality of input (visual vs. aural presentation of words) and the modality of output (typed vs. spoken responses), using new analysis methods that allow for a greater number of alternative hypotheses to be considered. We find evidence that sequence positions are represented relative to both the start and the end of the sequence, and show a contribution of the end-based representation beyond the final item in the sequence. We also find limited evidence for item-to-item associations, suggesting that both a start-end positional scheme and item-to-item associations play a role in representing item order in immediate serial recall. (c) 2015 APA, all rights reserved.

  19. Accuracy and Reproducibility of Adipose Tissue Measurements in Young Infants by Whole Body Magnetic Resonance Imaging

    PubMed Central

    Bauer, Jan Stefan; Noël, Peter Benjamin; Vollhardt, Christiane; Much, Daniela; Degirmenci, Saliha; Brunner, Stefanie; Rummeny, Ernst Josef; Hauner, Hans

    2015-01-01

    Purpose MR might be well suited to obtain reproducible and accurate measures of fat tissues in infants. This study evaluates MR-measurements of adipose tissue in young infants in vitro and in vivo. Material and Methods MR images of ten phantoms simulating subcutaneous fat of an infant’s torso were obtained using a 1.5T MR scanner with and without simulated breathing. Scans consisted of a cartesian water-suppression turbo spin echo (wsTSE) sequence, and a PROPELLER wsTSE sequence. Fat volume was quantified directly and by MR imaging using k-means clustering and threshold-based segmentation procedures to calculate accuracy in vitro. Whole body MR was obtained in sleeping young infants (average age 67±30 days). This study was approved by the local review board. All parents gave written informed consent. To obtain reproducibility in vivo, cartesian and PROPELLER wsTSE sequences were repeated in seven and four young infants, respectively. Overall, 21 repetitions were performed for the cartesian sequence and 13 repetitions for the PROPELLER sequence. Results In vitro accuracy errors depended on the chosen segmentation procedure, ranging from 5.4% to 76%, while the sequence showed no significant influence. Artificial breathing increased the minimal accuracy error to 9.1%. In vivo reproducibility errors for total fat volume of the sleeping infants ranged from 2.6% to 3.4%. Neither segmentation nor sequence significantly influenced reproducibility. Conclusion With both cartesian and PROPELLER sequences an accurate and reproducible measure of body fat was achieved. Adequate segmentation was mandatory for high accuracy. PMID:25706876
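
    The k-means step used for segmentation can be sketched as clustering voxel intensities and labeling the brightest cluster as fat, which is reasonable on water-suppressed images where fat dominates the signal. The array names, cluster count and volume bookkeeping below are illustrative assumptions:

        import numpy as np
        from sklearn.cluster import KMeans

        def fat_volume_ml(image, voxel_volume_ml, n_clusters=2):
            """image: array of wsTSE voxel intensities. Clusters the intensities
            with k-means and returns the volume of the brightest cluster."""
            intensities = image.reshape(-1, 1).astype(float)
            km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
            labels = km.fit_predict(intensities)
            fat_label = int(np.argmax(km.cluster_centers_))
            return float(np.sum(labels == fat_label)) * voxel_volume_ml

    The wide in vitro accuracy spread reported above (5.4% to 76%) reflects exactly the sensitivity of this step: the same images segmented with a poorly chosen procedure yield very different fat volumes.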

  20. Identifying and reducing error in cluster-expansion approximations of protein energies.

    PubMed

    Hahn, Seungsoo; Ashenberg, Orr; Grigoryan, Gevorg; Keating, Amy E

    2010-12-01

    Protein design involves searching a vast space for sequences that are compatible with a defined structure. This can pose significant computational challenges. Cluster expansion is a technique that can accelerate the evaluation of protein energies by generating a simple functional relationship between sequence and energy. The method consists of several steps. First, for a given protein structure, a training set of sequences with known energies is generated. Next, this training set is used to expand energy as a function of clusters consisting of single residues, residue pairs, and higher order terms, if required. The accuracy of the sequence-based expansion is monitored and improved using cross-validation testing and iterative inclusion of additional clusters. As a trade-off for evaluation speed, the cluster-expansion approximation causes prediction errors, which can be reduced by including more training sequences, including higher order terms in the expansion, and/or reducing the sequence space described by the cluster expansion. This article analyzes the sources of error and introduces a method whereby accuracy can be improved by judiciously reducing the described sequence space. The method is applied to describe the sequence-stability relationship for several protein structures: coiled-coil dimers and trimers, a PDZ domain, and T4 lysozyme as examples with computationally derived energies, and SH3 domains in amphiphysin-1 and endophilin-1 as examples where the expanded pseudo-energies are obtained from experiments. Our open-source software package Cluster Expansion Version 1.0 allows users to expand their own energy function of interest and thereby apply cluster expansion to custom problems in protein design. © 2010 Wiley Periodicals, Inc.
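
    In its simplest form, a cluster expansion is a linear model of energy over indicator features: point clusters are one-hot indicators of the amino acid at each position, and the expansion coefficients come from least squares on the training set. A toy sketch with point terms only (real expansions add pair and higher-order clusters and monitor accuracy by cross-validation, as described above):

        import numpy as np

        AAS = 'ACDEFGHIKLMNPQRSTVWY'

        def point_features(seq):
            """One-hot indicators of the residue at each position (point clusters)."""
            x = np.zeros(len(seq) * len(AAS))
            for i, aa in enumerate(seq):
                x[i * len(AAS) + AAS.index(aa)] = 1.0
            return x

        def fit_cluster_expansion(train_seqs, train_energies):
            """Least-squares fit of per-position coefficients to training energies."""
            X = np.vstack([point_features(s) for s in train_seqs])
            coef, *_ = np.linalg.lstsq(X, np.asarray(train_energies, float),
                                       rcond=None)
            return coef

        def predict_energy(seq, coef):
            """Evaluate the expansion: a dot product, hence extremely fast."""
            return float(point_features(seq) @ coef)

    The speed gain comes from replacing a structure-based energy evaluation with this dot product; the prediction error it introduces is what the article's sequence-space reduction method aims to control.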

  1. Accuracy and reproducibility of adipose tissue measurements in young infants by whole body magnetic resonance imaging.

    PubMed

    Bauer, Jan Stefan; Noël, Peter Benjamin; Vollhardt, Christiane; Much, Daniela; Degirmenci, Saliha; Brunner, Stefanie; Rummeny, Ernst Josef; Hauner, Hans

    2015-01-01

    MR might be well suited to obtain reproducible and accurate measures of fat tissues in infants. This study evaluates MR-measurements of adipose tissue in young infants in vitro and in vivo. MR images of ten phantoms simulating subcutaneous fat of an infant's torso were obtained using a 1.5T MR scanner with and without simulated breathing. Scans consisted of a cartesian water-suppression turbo spin echo (wsTSE) sequence, and a PROPELLER wsTSE sequence. Fat volume was quantified directly and by MR imaging using k-means clustering and threshold-based segmentation procedures to calculate accuracy in vitro. Whole body MR was obtained in sleeping young infants (average age 67±30 days). This study was approved by the local review board. All parents gave written informed consent. To obtain reproducibility in vivo, cartesian and PROPELLER wsTSE sequences were repeated in seven and four young infants, respectively. Overall, 21 repetitions were performed for the cartesian sequence and 13 repetitions for the PROPELLER sequence. In vitro accuracy errors depended on the chosen segmentation procedure, ranging from 5.4% to 76%, while the sequence showed no significant influence. Artificial breathing increased the minimal accuracy error to 9.1%. In vivo reproducibility errors for total fat volume of the sleeping infants ranged from 2.6% to 3.4%. Neither segmentation nor sequence significantly influenced reproducibility. With both cartesian and PROPELLER sequences an accurate and reproducible measure of body fat was achieved. Adequate segmentation was mandatory for high accuracy.

  2. Syntactic and semantic errors in radiology reports associated with speech recognition software.

    PubMed

    Ringler, Michael D; Goss, Brian C; Bartholmai, Brian J

    2017-03-01

    Speech recognition software can increase the frequency of errors in radiology reports, which may affect patient care. We retrieved 213,977 speech recognition software-generated reports from 147 different radiologists and proofread them for errors. Errors were classified as "material" if they were believed to alter interpretation of the report. "Immaterial" errors were subclassified as intrusion/omission or spelling errors. The proportion of errors and error type were compared among individual radiologists, imaging subspecialties, and time periods. In all, 20,759 reports (9.7%) contained errors, of which 3992 (1.9%) were material errors. Among immaterial errors, spelling errors were more common than intrusion/omission errors (p < .001). The proportion of errors and the fraction of material errors varied significantly among radiologists and between imaging subspecialties (p < .001). Errors were more common in cross-sectional reports, reports reinterpreting results of outside examinations, and procedural studies (all p < .001). The error rate decreased over time (p < .001), which suggests that a quality control program with regular feedback may reduce errors.

  3. ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data

    PubMed Central

    2012-01-01

    Background Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results. Results Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and do a better job of discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy. Conclusion ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration. PMID:22946927

  4. Augmenting Chinese hamster genome assembly by identifying regions of high confidence.

    PubMed

    Vishwanathan, Nandita; Bandyopadhyay, Arpan A; Fu, Hsu-Yuan; Sharma, Mohit; Johnson, Kathryn C; Mudge, Joann; Ramaraj, Thiruvarangan; Onsongo, Getiria; Silverstein, Kevin A T; Jacob, Nitya M; Le, Huong; Karypis, George; Hu, Wei-Shou

    2016-09-01

    Chinese hamster ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of genome sequences for the Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, mammalian genomes assembled from shotgun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. We assembled two independent drafts of the Chinese hamster genome, by de novo assembly from shotgun sequencing reads and by re-scaffolding and gap-filling the draft genome from NCBI for improved scaffold lengths and gap fractions. We then used the two independent assemblies to identify high confidence regions using two different approaches. First, the two independent assemblies were compared at the sequence level to identify their consensus regions as "high confidence regions", which account for at least 78% of the assembled genome. Further, a genome-wide comparison of the Chinese hamster scaffolds with mouse chromosomes revealed scaffolds with large blocks of collinearity, which were also compiled as high-quality scaffolds. Genome-scale collinearity was complemented with EST-based synteny, which also revealed conserved gene order relative to mouse. As cell line sequencing becomes more commonly practiced, the approaches reported here are useful for assessing assembly quality and can potentially facilitate the engineering of cell lines. Copyright © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
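
    The interval arithmetic behind "consensus regions" is straightforward once the two assemblies have been aligned into a shared coordinate system; the paper's comparison is sequence-level, so the plain-Python sketch below (with hypothetical coordinates) shows only the final intersection step.

        def consensus_regions(a, b):
            # Intersect two sorted lists of (start, end) intervals; the overlaps
            # are the candidate high confidence regions.
            out, i, j = [], 0, 0
            while i < len(a) and j < len(b):
                lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
                if lo < hi:
                    out.append((lo, hi))
                if a[i][1] < b[j][1]:
                    i += 1
                else:
                    j += 1
            return out

        print(consensus_regions([(0, 100), (150, 300)], [(50, 200)]))
        # [(50, 100), (150, 200)]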

  5. Epidemic history of hepatitis C virus infection in two remote communities in Nigeria, West Africa.

    PubMed

    Forbi, Joseph C; Purdy, Michael A; Campo, David S; Vaughan, Gilberto; Dimitrova, Zoya E; Ganova-Raeva, Lilia M; Xia, Guo-Liang; Khudyakov, Yury E

    2012-07-01

    We investigated the molecular epidemiology and population dynamics of HCV infection among indigenes of two semi-isolated communities in North-Central Nigeria. Despite remoteness and isolation, ~15% of the population had serological or molecular markers of hepatitis C virus (HCV) infection. Phylogenetic analysis of the NS5b sequences obtained from 60 HCV-infected residents showed that HCV variants belonged to genotype 1 (n=51; 85%) and genotype 2 (n=9; 15%). All sequences were unique and intermixed in the phylogenetic tree with HCV sequences from people infected in other West African countries. High-throughput 454 pyrosequencing of the HCV hypervariable region 1 and an empirical threshold error correction algorithm were used to evaluate intra-host heterogeneity of HCV strains of genotype 1 (n=43) and genotype 2 (n=6) from residents of the communities. Analysis revealed only rare detectable intermixing of HCV intra-host variants among residents. Identification of genetically close HCV variants among all known groups of relatives suggests common intra-familial HCV transmission in the communities. Applying Bayesian coalescent analysis to the NS5b sequences, the most recent common ancestors for the genotype 1 and 2 variants were estimated to have existed 675 and 286 years ago, respectively. Bayesian skyline plots suggest that HCV lineages of both genotypes identified in the Nigerian communities experienced epidemic growth for 200-300 years until the mid-20th century. The data suggest a massive introduction of numerous HCV variants into the communities during the 20th century, against the background of a dynamic evolutionary history of the hepatitis C epidemic in Nigeria over the past three centuries.

  6. An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

    PubMed

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of a command line interface, massive computational resources, and expertise, which is daunting for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools with a graphical user interface, called Integrated SNP Mining and Utilization (ISMU), for SNP discovery and utilization in developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction methods (SAMtools/SOAPsnp/CNS2snp and CbCC), and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of the genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignments and of errors, if any. The pipeline also provides a confidence score or polymorphism information content value, with flanking sequences, for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets, such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data, at high speed. The pipeline is very useful for the plant genetics and breeding community with no computational expertise, enabling them to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge next generation sequencing datasets. It has been developed in Java and is available at http://hpc.icrisat.cgiar.org/ISMU as standalone free software.
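
    ISMU wraps such tools behind a Java GUI; on the command line, one analogous alignment-to-SNP path can be scripted directly. The sketch below chains BWA, SAMtools and BCFtools via subprocess; file names are placeholders, and BCFtools stands in here for the callers ISMU actually integrates.

        import subprocess

        def run(cmd):
            # shell=True so the pipes below work; check=True aborts on failure.
            subprocess.run(cmd, shell=True, check=True)

        run("bwa index ref.fa")
        run("bwa mem ref.fa reads.fq | samtools sort -o aln.bam -")
        run("samtools index aln.bam")
        run("bcftools mpileup -f ref.fa aln.bam | bcftools call -mv -Ov -o snps.vcf")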

  7. Errors Analysis of Solving Linear Inequalities among the Preparatory Year Students at King Saud University

    ERIC Educational Resources Information Center

    El-khateeb, Mahmoud M. A.

    2016-01-01

    The purpose of this study is to investigate the classes of errors made by the Preparatory Year students at King Saud University, through analysis of student responses to the items of the study test, and to identify the varieties of common errors and the ratios of common errors that occurred in solving inequalities. In the collection of the data,…

  8. Medication Errors in Pediatric Anesthesia: A Report From the Wake Up Safe Quality Improvement Initiative.

    PubMed

    Lobaugh, Lauren M Y; Martin, Lizabeth D; Schleelein, Laura E; Tyler, Donald C; Litman, Ronald S

    2017-09-01

    Wake Up Safe is a quality improvement initiative of the Society for Pediatric Anesthesia that contains a deidentified registry of serious adverse events occurring in pediatric anesthesia. The aim of this study was to describe and characterize reported medication errors to find common patterns amenable to preventative strategies. In September 2016, we analyzed approximately 6 years' worth of medication error events reported to Wake Up Safe. Medication errors were classified by: (1) medication category; (2) error type by phase of administration: prescribing, preparation, or administration; (3) bolus or infusion error; (4) provider type and level of training; (5) harm as defined by the National Coordinating Council for Medication Error Reporting and Prevention; and (6) perceived preventability. From 2010 to the time of our data analysis in September 2016, 32 institutions had joined and submitted data on 2087 adverse events during 2,316,635 anesthetics. These reports contained details of 276 medication errors, which comprised the third highest category of events behind cardiac and respiratory related events. Medication errors most commonly involved opioids and sedative/hypnotics. When categorized by phase of handling, 30 events occurred during preparation, 67 during prescribing, and 179 during administration. The most common error type was accidental administration of the wrong dose (N = 84), followed by syringe swap (accidental administration of the wrong syringe, N = 49). Fifty-seven (21%) reported medication errors involved medications prepared as infusions, as opposed to one-time bolus administrations. Medication errors were committed by all types of anesthesia providers, most commonly by attendings. Over 80% of reported medication errors reached the patient, and more than half of these events caused patient harm. Fifteen events (5%) required a life-sustaining intervention. Nearly all cases (97%) were judged to be either likely or certainly preventable. Our findings characterize the most common types of medication errors in pediatric anesthesia practice and provide guidance on future preventative strategies. Many of these errors will be almost entirely preventable with the use of prefilled medication syringes to avoid accidental ampule swap, bar-coding at the point of medication administration to prevent syringe swap and to confirm the proper dose, and 2-person checking of medication infusions for accuracy.

  9. Golay sequences coded coherent optical OFDM for long-haul transmission

    NASA Astrophysics Data System (ADS)

    Qin, Cui; Ma, Xiangrong; Hua, Tao; Zhao, Jing; Yu, Huilong; Zhang, Jian

    2017-09-01

    We propose to use binary Golay sequences in coherent optical orthogonal frequency division multiplexing (CO-OFDM) to improve long-haul transmission performance. The Golay sequences are generated from binary Reed-Muller codes, which have low peak-to-average power ratio and a certain error correction capability. A low-complexity decoding algorithm for the Golay sequences is then proposed to recover the signal. At the same spectral efficiency, QPSK-modulated OFDM with binary Golay sequence coding, with and without discrete Fourier transform (DFT) spreading (DFTS-QPSK-GOFDM and QPSK-GOFDM), is compared with normal BPSK-modulated OFDM with and without DFT spreading (DFTS-BPSK-OFDM and BPSK-OFDM) after long-haul transmission. At a 7% forward error correction code threshold (Q2 factor of 8.5 dB), DFTS-QPSK-GOFDM is shown to outperform DFTS-BPSK-OFDM by extending the transmission distance by 29% and 18% in non-dispersion-managed and dispersion-managed links, respectively.
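
    For orientation, binary Golay complementary pairs of length 2^m can be built with the classic Rudin-Shapiro-style recursion, and the OFDM peak-to-average power ratio (PAPR) of such a sequence stays near the well-known 3 dB bound. A minimal NumPy sketch, not the authors' Reed-Muller construction or decoder:

        import numpy as np

        def golay_pair(m):
            # Recursive construction: (a, b) -> (a|b, a|-b) yields a length-2**m
            # binary Golay complementary pair.
            a, b = np.array([1.0]), np.array([1.0])
            for _ in range(m):
                a, b = np.concatenate([a, b]), np.concatenate([a, -b])
            return a, b

        def papr_db(symbols, oversample=4):
            # PAPR of the OFDM time-domain signal via an oversampled IDFT.
            x = np.fft.ifft(symbols, n=len(symbols) * oversample)
            p = np.abs(x) ** 2
            return 10 * np.log10(p.max() / p.mean())

        a, b = golay_pair(6)       # length-64 pair
        print(papr_db(a))          # bounded near 3 dB for a Golay sequence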

  10. Performance and precision of double digestion RAD (ddRAD) genotyping in large multiplexed datasets of marine fish species.

    PubMed

    Maroso, F; Hillen, J E J; Pardo, B G; Gkagkavouzis, K; Coscia, I; Hermida, M; Franch, R; Hellemans, B; Van Houdt, J; Simionati, B; Taggart, J B; Nielsen, E E; Maes, G; Ciavaglia, S A; Webster, L M I; Volckaert, F A M; Martinez, P; Bargelloni, L; Ogden, R

    2018-06-01

    The development of Genotyping-By-Sequencing (GBS) technologies enables cost-effective analysis of large numbers of Single Nucleotide Polymorphisms (SNPs), especially in "non-model" species. Nevertheless, as such technologies enter a mature phase, biases and errors inherent to GBS are becoming evident. Here, we evaluated the performance of double digest Restriction enzyme Associated DNA (ddRAD) sequencing in SNP genotyping studies involving large numbers of samples. Sequence datasets were generated from three marine teleost species (>5500 samples, >2.5 × 10^12 bases in total), using a standardized protocol. A common bioinformatics pipeline based on STACKS was established, with and without the use of a reference genome. We performed analyses throughout the production and analysis of the ddRAD data in order to explore (i) the loss of information due to heterogeneous raw read numbers across samples; (ii) the discrepancy between expected and observed tag length and coverage; (iii) the performance of reference-based vs. de novo approaches; and (iv) the sources of potential genotyping errors in the library preparation/bioinformatics protocol, by comparing technical replicates. Our results showed that use of a reference genome and a posteriori genotype correction improved genotyping precision. Individual read coverage was a key variable for reproducibility; variance in sequencing depth between loci in the same individual was also identified as an important factor and found to correlate with tag length. A comparison of downstream analyses carried out with ddRAD vs. single-SNP allele-specific assay genotypes provided information about the levels of genotyping imprecision that can have a significant impact on allele frequency estimation and population assignment. The results and insights presented here will help to select and improve approaches to the analysis of large datasets based on RAD-like methodologies. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
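
    Comparing technical replicates, as in point (iv), reduces to counting discordant genotype calls over the loci both replicates managed to call; a minimal sketch with hypothetical calls:

        def genotype_discordance(rep1, rep2):
            # Fraction of shared loci with disagreeing calls: a simple per-sample
            # proxy for the genotyping error rate of the whole protocol.
            shared = [locus for locus in rep1 if locus in rep2]
            if not shared:
                return float("nan")
            return sum(rep1[k] != rep2[k] for k in shared) / len(shared)

        rep1 = {"tag7:41": "AG", "tag9:12": "CC", "tag13:5": "GT"}
        rep2 = {"tag7:41": "AG", "tag9:12": "CT", "tag13:5": "GT"}
        print(genotype_discordance(rep1, rep2))  # 0.33...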

  11. Hospital-based transfusion error tracking from 2005 to 2010: identifying the key errors threatening patient transfusion safety.

    PubMed

    Maskens, Carolyn; Downie, Helen; Wendt, Alison; Lima, Ana; Merkley, Lisa; Lin, Yulia; Callum, Jeannie

    2014-01-01

    This report provides a comprehensive analysis of transfusion errors occurring at a large teaching hospital and aims to determine key errors that are threatening transfusion safety, despite implementation of safety measures. Errors were prospectively identified from 2005 to 2010. Error data were coded on a secure online database called the Transfusion Error Surveillance System. Errors were defined as any deviation from established standard operating procedures. Errors were identified by clinical and laboratory staff. Denominator data for volume of activity were used to calculate rates. A total of 15,134 errors were reported with a median number of 215 errors per month (range, 85-334). Overall, 9083 (60%) errors occurred on the transfusion service and 6051 (40%) on the clinical services. In total, 23 errors resulted in patient harm: 21 of these errors occurred on the clinical services and two in the transfusion service. Of the 23 harm events, 21 involved inappropriate use of blood. Errors with no harm were 657 times more common than events that caused harm. The most common high-severity clinical errors were sample labeling (37.5%) and inappropriate ordering of blood (28.8%). The most common high-severity error in the transfusion service was sample accepted despite not meeting acceptance criteria (18.3%). The cost of product and component loss due to errors was $593,337. Errors occurred at every point in the transfusion process, with the greatest potential risk of patient harm resulting from inappropriate ordering of blood products and errors in sample labeling. © 2013 American Association of Blood Banks (CME).

  12. MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data.

    PubMed

    Fan, Yu; Xi, Liu; Hughes, Daniel S T; Zhang, Jianjun; Zhang, Jianhua; Futreal, P Andrew; Wheeler, David A; Wang, Wenyi

    2016-08-24

    Subclonal mutations reveal important features of the genetic architecture of tumors. However, accurate detection of mutations in genetically heterogeneous tumor cell populations using next-generation sequencing remains challenging. We develop MuSE (http://bioinformatics.mdanderson.org/main/MuSE), Mutation calling using a Markov Substitution model for Evolution, a novel approach for modeling the evolution of the allelic composition of the tumor and normal tissue at each reference base. MuSE adopts a sample-specific error model that reflects the underlying tumor heterogeneity to greatly improve the overall accuracy. We demonstrate the accuracy of MuSE in calling subclonal mutations in the context of large-scale tumor sequencing projects using whole exome and whole genome sequencing.

  13. Model Performance Evaluation and Scenario Analysis ...

    EPA Pesticide Factsheets

    This tool consists of two parts: model performance evaluation and scenario analysis (MPESA). The model performance evaluation consists of two components: model performance evaluation metrics and model diagnostics. These metrics provide modelers with statistical goodness-of-fit measures that capture magnitude-only, sequence-only, and combined magnitude-and-sequence errors. The performance measures include error analysis, the coefficient of determination, Nash-Sutcliffe efficiency, and a new weighted rank method. These performance metrics provide useful information only about overall model performance. Note that MPESA is based on the separation of observed and simulated time series into magnitude and sequence components. The separation of time series into magnitude and sequence components, and the reconstruction back to time series, provides diagnostic insights to modelers. For example, traditional approaches lack the capability to identify whether the source of uncertainty in the simulated data is the quality of the input data or the way the analyst adjusted the model parameters. This report presents a suite of model diagnostics that identify whether mismatches between observed and simulated data result from magnitude- or sequence-related errors. MPESA offers graphical and statistical options that allow HSPF users to compare observed and simulated time series and identify the parameter values to adjust or the input data to modify. The scenario analysis part of the tool…
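
    Of the listed measures, the Nash-Sutcliffe efficiency (NSE) is the easiest to pin down exactly: 1 indicates a perfect fit, and values at or below 0 mean the model predicts no better than the observed mean. A small NumPy sketch with made-up series:

        import numpy as np

        def nash_sutcliffe(obs, sim):
            # NSE = 1 - SSE / total variance of the observations.
            obs, sim = np.asarray(obs, float), np.asarray(sim, float)
            return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

        obs = [1.0, 2.0, 3.0, 4.0, 5.0]
        sim = [1.1, 1.9, 3.2, 3.8, 5.3]
        print(nash_sutcliffe(obs, sim))  # ~0.98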

  14. Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data

    PubMed Central

    Kosugi, Shunichi; Natsume, Satoshi; Yoshida, Kentaro; MacLean, Daniel; Cano, Liliana; Kamoun, Sophien; Terauchi, Ryohei

    2013-01-01

    Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads by filtering mismatched reads that remain in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data from rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. PMID:24116042

  15. Partial bisulfite conversion for unique template sequencing

    PubMed Central

    Kumar, Vijay; Rosenbaum, Julie; Wang, Zihua; Forcier, Talitha; Ronemus, Michael; Wigler, Michael

    2018-01-01

    We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate counting and long-range assembly of initial template molecules from short-read sequence data. We explore counting and low-error sequencing by profiling 135,000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone. PMID:29161423
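
    The clustering step is conceptually simple: each read's pattern of C-to-T conversions relative to the reference is its template signature, and reads sharing a signature are copies of one molecule. A toy sketch with hypothetical sequences, ignoring G-to-A patterns, sequencing errors and alignment:

        from collections import Counter

        def conversion_signature(read, ref):
            # Positions where an unmethylated reference C was read as T.
            return tuple(i for i, (r, q) in enumerate(zip(ref, read))
                         if r == "C" and q == "T")

        ref = "ACGTCCGATC"
        reads = ["ATGTCCGATT", "ATGTCCGATT", "ACGTTCGATC"]
        templates = Counter(conversion_signature(r, ref) for r in reads)
        print(len(templates))  # 2 distinct template molecules inferred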

  16. Generalized functional linear models for gene-based case-control association studies.

    PubMed

    Fan, Ruzong; Wang, Yifan; Mills, James L; Carter, Tonia C; Lobach, Iryna; Wilson, Alexander F; Bailey-Wilson, Joan E; Weeks, Daniel E; Xiong, Momiao

    2014-11-01

    By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative since they generate lower type I errors than nominal levels, and global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that the Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT-O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), the Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT-O. In practice, it is not known whether rare variants or common variants in a gene region are disease related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT-O on real neural tube defects and Hirschsprung's disease datasets. The Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT-O in the real data analysis. Our methods can be used in either gene-disease genome-wide/exome-wide association studies or candidate gene analyses. © 2014 WILEY PERIODICALS, INC.

  17. Generalized Functional Linear Models for Gene-based Case-Control Association Studies

    PubMed Central

    Mills, James L.; Carter, Tonia C.; Lobach, Iryna; Wilson, Alexander F.; Bailey-Wilson, Joan E.; Weeks, Daniel E.; Xiong, Momiao

    2014-01-01

    By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative since they generate lower type I errors than nominal levels, and global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that the Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT-O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), the Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT-O. In practice, it is not known whether rare variants or common variants in a gene are disease-related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT-O on real neural tube defects and Hirschsprung's disease data sets. The Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT-O in the real data analysis. Our methods can be used in either gene-disease genome-wide/exome-wide association studies or candidate gene analyses. PMID:25203683

  18. Evaluating aggregate effects of rare and common variants in the 1000 Genomes Project exon sequencing data using latent variable structural equation modeling.

    PubMed

    Nock, Nl; Zhang, Lx

    2011-11-29

    Methods that can evaluate aggregate effects of rare and common variants are limited. Therefore, we applied a two-stage approach to evaluate aggregate gene effects in the 1000 Genomes Project data, which contain 24,487 single-nucleotide polymorphisms (SNPs) in 697 unrelated individuals from 7 populations. In stage 1, we identified potentially interesting genes (PIGs) as those having at least one SNP meeting Bonferroni correction using univariate, multiple regression models. In stage 2, we evaluate aggregate PIG effects on trait, Q1, by modeling each gene as a latent construct, which is defined by multiple common and rare variants, using the multivariate statistical framework of structural equation modeling (SEM). In stage 1, we found that PIGs varied markedly between a randomly selected replicate (replicate 137) and 100 other replicates, with the exception of FLT1. In stage 1, collapsing rare variants decreased false positives but increased false negatives. In stage 2, we developed a good-fitting SEM model that included all nine genes simulated to affect Q1 (FLT1, KDR, ARNT, ELAV4, FLT4, HIF1A, HIF3A, VEGFA, VEGFC) and found that FLT1 had the largest effect on Q1 (βstd = 0.33 ± 0.05). Using replicate 137 estimates as population values, we found that the mean relative bias in the parameters (loadings, paths, residuals) and their standard errors across 100 replicates was on average, less than 5%. Our latent variable SEM approach provides a viable framework for modeling aggregate effects of rare and common variants in multiple genes, but more elegant methods are needed in stage 1 to minimize type I and type II error.

  19. Carbapenem Susceptibility Testing Errors Using Three Automated Systems, Disk Diffusion, Etest, and Broth Microdilution and Carbapenem Resistance Genes in Isolates of Acinetobacter baumannii-calcoaceticus Complex

    DTIC Science & Technology

    2011-10-01

    Phoenix, and Vitek 2 systems). Discordant results were categorized as very major errors (VME), major errors (ME), and minor errors (mE). DNA sequences… The Vitek 2 method was the only automated susceptibility method in our study that satisfied the FDA standards required for device approval (11).

  20. Effects of learning duration on implicit transfer.

    PubMed

    Tanaka, Kanji; Watanabe, Katsumi

    2015-10-01

    Implicit learning and transfer in sequence acquisition play important roles in daily life. Several previous studies have found that even when participants are not aware that a transfer sequence has been transformed from the learning sequence, they are able to perform the transfer sequence faster and more accurately; this suggests implicit transfer of visuomotor sequences. Here, we investigated whether implicit transfer could be modulated by the number of trials completed in a learning session. Participants learned a sequence through trial and error, known as the m × n task (Hikosaka et al. in J Neurophysiol 74:1652-1661, 1995). In the learning session, participants were required to successfully perform the same sequence 4, 12, 16, or 20 times. In the transfer session, participants then learned one of two other sequences: one whose button configuration vertically mirrored the learning sequence, or a randomly generated sequence. Our results show that even when participants did not notice the alternation rule (i.e., vertical mirroring), their total working time was shorter and their total number of errors was lower in the transfer session than for those who performed a random sequence, irrespective of the number of trials completed in the learning session. This result suggests that implicit transfer likely occurs even over a shorter learning duration.

  1. Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions.

    PubMed

    Senol Cali, Damla; Kim, Jeremie S; Ghose, Saugata; Alkan, Can; Mutlu, Onur

    2018-04-02

    Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, the technology's high error rates pose a challenge for generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well in order to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) the choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology; (2) the read-to-read overlap finding tools GraphMap and Minimap perform similarly in terms of accuracy, but Minimap has a lower memory usage and is faster than GraphMap; (3) there is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step: the fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly; and (4) the state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability. We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of the bottlenecks we have identified, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of nanopore sequencing technology.
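
    As a concrete example of the overlap/mapping step, minimap2 (Minimap's successor) ships a Python binding, mappy; this hedged sketch maps nanopore reads against a reference, with placeholder file names.

        import mappy as mp

        aligner = mp.Aligner("ref.fa", preset="map-ont")  # nanopore preset
        if not aligner:
            raise RuntimeError("failed to load/build the index")
        for name, seq, qual in mp.fastx_read("reads.fastq"):
            for hit in aligner.map(seq):  # yields alignments for this read
                print(name, hit.ctg, hit.r_st, hit.r_en, hit.mapq)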

  2. Comparative Omics-Driven Genome Annotation Refinement: Application across Yersiniae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rutledge, Alexandra C.; Jones, Marcus B.; Chauhan, Sadhana

    2012-03-27

    Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. To date, the perceived value of manual curation for genome annotations is not offset by the real cost and time associated with the process. In order to balance the large number of sequences generated, the annotation process is now performed almost exclusively in an automated fashion for most genome sequencing projects. One possible way to reduce errors inherent to automated computational annotations is to apply data from 'omics' measurements (i.e. transcriptional and proteomic) to the un-annotated genome with a proteogenomic-based approach. This approach does require additional experimental and bioinformatics methods to include omics technologies; however, the approach is readily automatable and can benefit from rapid developments occurring in those research domains as well. The annotation process can be improved by experimental validation of transcription and translation, which also aids in the discovery of annotation errors. Here the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species, as is becoming common in sequencing efforts. Transcriptomic and proteomic data derived from three highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis pestoides F, and Y. pseudotuberculosis PB1/+) were used to demonstrate a comprehensive comparative omic-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and revealed the identification of 28 novel and 68 previously incorrect protein-coding sequences (e.g., observed frameshifts, extended start sites, and translated pseudogenes) within the three current Yersinia genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent pathogen, thus the discovery of many translated pseudogenes underscores a need for functional analyses to investigate hypotheses related to divergence. Refinements included the discovery of a seemingly essential ribosomal protein, several virulence-associated factors, and a transcriptional regulator, among other proteins, most of which are annotated as hypothetical, that were missed during annotation.

  3. Elucidation of cross-species proteomic effects in human and hominin bone proteome identification through a bioinformatics experiment.

    PubMed

    Welker, F

    2018-02-20

    The study of ancient protein sequences is increasingly focused on the analysis of older samples, including those of ancient hominins. The analysis of such ancient proteomes thereby potentially suffers from "cross-species proteomic effects": the loss of peptide and protein identifications at increased evolutionary distances due to a larger number of protein sequence differences between the database sequence and the analyzed organism. Error-tolerant proteomic search algorithms should theoretically overcome this problem at both the peptide and protein level; however, this has not been demonstrated. If error-tolerant searches do not overcome the cross-species proteomic issue then there might be inherent biases in the identified proteomes. Here, a bioinformatics experiment is performed to test this using a set of modern human bone proteomes and three independent searches against sequence databases at increasing evolutionary distances: the human (0 Ma), chimpanzee (6-8 Ma) and orangutan (16-17 Ma) reference proteomes, respectively. Incorrectly suggested amino acid substitutions are absent when employing adequate filtering criteria for mutable Peptide Spectrum Matches (PSMs), but roughly half of the mutable PSMs were not recovered. As a result, peptide and protein identification rates are higher in error-tolerant mode compared to non-error-tolerant searches but did not recover protein identifications completely. Data indicates that peptide length and the number of mutations between the target and database sequences are the main factors influencing mutable PSM identification. The error-tolerant results suggest that the cross-species proteomics problem is not overcome at increasing evolutionary distances, even at the protein level. Peptide and protein loss has the potential to significantly impact divergence dating and proteome comparisons when using ancient samples as there is a bias towards the identification of conserved sequences and proteins. Effects are minimized between moderately divergent proteomes, as indicated by almost complete recovery of informative positions in the search against the chimpanzee proteome (≈90%, 6-8 Ma). This provides a bioinformatic background to future phylogenetic and proteomic analysis of ancient hominin proteomes, including the future description of novel hominin amino acid sequences, but also has negative implications for the study of fast-evolving proteins in hominins, non-hominin animals, and ancient bacterial proteins in evolutionary contexts.

  4. HangOut: generating clean PSI-BLAST profiles for domains with long insertions.

    PubMed

    Kim, Bong-Hyun; Cong, Qian; Grishin, Nick V

    2010-06-15

    Profile-based similarity search is an essential step in structure-function studies of proteins. However, inclusion of non-homologous sequence segments into a profile causes its corruption and results in false positives. Profile corruption is common in multidomain proteins, and single domains with long insertions are a significant source of errors. We developed a procedure (HangOut) that, for a single domain with specified insertion position, cleans erroneously extended PSI-BLAST alignments to generate better profiles. HangOut is implemented in Python 2.3 and runs on all Unix-compatible platforms. The source code is available under the GNU GPL license at http://prodata.swmed.edu/HangOut/. Supplementary data are available at Bioinformatics online.

  5. Local neutral networks help maintain inaccurately replicating ribozymes.

    PubMed

    Szilágyi, András; Kun, Ádám; Szathmáry, Eörs

    2014-01-01

    The error threshold of replication limits the selectively maintainable genome size against recurrent deleterious mutations for most fitness landscapes. In the context of RNA replication a distinction between the genotypic and the phenotypic error threshold has been made; where the latter concerns the maintenance of secondary structure rather than sequence. RNA secondary structure is treated as a proxy for function. The phenotypic error threshold allows higher per digit mutation rates than its genotypic counterpart, and is known to increase with the frequency of neutral mutations in sequence space. Here we show that the degree of neutrality, i.e. the frequency of nearest-neighbour (one-step) neutral mutants is a remarkably accurate proxy for the overall frequency of such mutants in an experimentally verifiable formula for the phenotypic error threshold; this we achieve by the full numerical solution for the concentration of all sequences in mutation-selection balance up to length 16. We reinforce our previous result that currently known ribozymes could be selectively maintained by the accuracy known from the best available polymerase ribozymes. Furthermore, we show that in silico stabilizing selection can increase the mutational robustness of ribozymes due to the fact that they were produced by artificial directional selection in the first place. Our finding offers a better understanding of the error threshold and provides further insight into the plausibility of an ancient RNA world.
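
    For orientation, the classic genotypic form of the Eigen error threshold (not the phenotypic expression derived in the paper) reads, with per-digit copying fidelity q, sequence length L, and superiority sigma of the master sequence:

        Q \;=\; q^{L} \;>\; \frac{1}{\sigma}
        \qquad\Longrightarrow\qquad
        L_{\max} \;\approx\; \frac{\ln \sigma}{1 - q}

    The phenotypic threshold relaxes this bound because a fraction of mutants is neutral and preserves the phenotype; the paper's contribution is an experimentally verifiable expression in which the one-step degree of neutrality stands in for the overall frequency of neutral mutants.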

  6. Fractal-like Distributions over the Rational Numbers in High-throughput Biological and Clinical Data

    NASA Astrophysics Data System (ADS)

    Trifonov, Vladimir; Pasqualucci, Laura; Dalla-Favera, Riccardo; Rabadan, Raul

    2011-12-01

    Recent developments in extracting and processing biological and clinical data are allowing quantitative approaches to studying living systems. High-throughput sequencing (HTS), expression profiles, proteomics, and electronic health records (EHR) are some examples of such technologies. Extracting meaningful information from those technologies requires careful analysis of the large volumes of data they produce. In this note, we present a set of fractal-like distributions that commonly appear in the analysis of such data. The first set of examples is drawn from an HTS experiment, where the distributions appear as part of the evaluation of the sequencing error rate and the identification of tumorigenic genomic alterations. The other examples are obtained from risk factor evaluation and from analysis of relative disease prevalence and co-morbidity as these appear in EHR. The distributions are also relevant to the identification of subclonal populations in tumors and to the study of quasi-species and intrahost diversity of viral populations.

  7. Intracellular diversity of the V4 and V9 regions of the 18S rRNA in marine protists (radiolarians) assessed by high-throughput sequencing.

    PubMed

    Decelle, Johan; Romac, Sarah; Sasaki, Eriko; Not, Fabrice; Mahé, Frédéric

    2014-01-01

    Metabarcoding is a powerful tool for exploring microbial diversity in the environment, but its accurate interpretation is impeded by diverse technical (e.g. PCR and sequencing errors) and biological biases (e.g. intra-individual polymorphism) that remain poorly understood. To help interpret environmental metabarcoding datasets, we investigated the intracellular diversity of the V4 and V9 regions of the 18S rRNA gene from Acantharia and Nassellaria (radiolarians) using 454 pyrosequencing. Individual cells of radiolarians were isolated, and PCRs were performed with generalist primers to amplify the V4 and V9 regions. Different denoising procedures were employed to filter the pyrosequenced raw amplicons (Acacia, AmpliconNoise, Linkage method). For each of the six isolated cells, an average of 541 V4 and 562 V9 amplicons assigned to radiolarians were obtained, from which one numerically dominant sequence and several minor variants were found. At the 97% identity level, a clustering threshold commonly used in environmental surveys, up to 5 distinct OTUs were detected in a single cell. However, most amplicons grouped within a single OTU whereas other OTUs contained very few amplicons. Different analytical methods provided evidence that most minor variants forming different OTUs correspond to PCR and sequencing artifacts. Duplicate PCR and sequencing from the same DNA extract of a single cell had only 9 to 16% of unique amplicons in common, and alignment visualization of V4 and V9 amplicons showed that most minor variants contained substitutions in highly-conserved regions. We conclude that the intracellular variability of the 18S rRNA in radiolarians is very limited despite its multi-copy nature and the existence of multiple nuclei in these protists. Our study recommends some technical guidelines to conservatively discard artificial amplicons from metabarcoding datasets, and thus properly assess the diversity and richness of protists in the environment.

  8. Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs.

    PubMed

    Powell, Bradford C; Hutchison, Clyde A

    2006-01-19

    Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes.
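
    The paper's starting point, enumerating all ORFs of at least 30 codons, is easy to make concrete; a minimal forward-strand sketch (a full scan would also cover the reverse complement and alternative start codons):

        def orfs_min_codons(seq, min_codons=30):
            # Yield (start, end, frame) for ATG-to-stop ORFs of >= min_codons codons.
            stops = {"TAA", "TAG", "TGA"}
            for frame in range(3):
                start = None
                for i in range(frame, len(seq) - 2, 3):
                    codon = seq[i:i + 3]
                    if start is None and codon == "ATG":
                        start = i
                    elif start is not None and codon in stops:
                        if (i + 3 - start) // 3 >= min_codons:
                            yield (start, i + 3, frame)
                        start = None

        dna = "ATG" + "GCT" * 40 + "TAA"
        print(list(orfs_min_codons(dna)))  # [(0, 126, 0)]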

  9. Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs

    PubMed Central

    Powell, Bradford C; Hutchison, Clyde A

    2006-01-01

    Background Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. Results "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Conclusion Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes. PMID:16423288

  10. Context and meter enhance long-range planning in music performance

    PubMed Central

    Mathias, Brian; Pfordresher, Peter Q.; Palmer, Caroline

    2015-01-01

    Neural responses demonstrate evidence of resonance, or oscillation, during the production of periodic auditory events. Music contains periodic auditory events that give rise to a sense of beat, which in turn generates a sense of meter on the basis of multiple periodicities. Metrical hierarchies may aid memory for music by facilitating similarity-based associations among sequence events at different periodic distances that unfold in longer contexts. A fundamental question is how metrical associations arising from a musical context influence memory during music performance. Longer contexts may facilitate metrical associations at higher hierarchical levels more than shorter contexts, a prediction of the range model, a formal model of planning processes in music performance (Palmer and Pfordresher, 2003; Pfordresher et al., 2007). Serial ordering errors, in which intended sequence events are produced in incorrect sequence positions, were measured as skilled pianists performed musical pieces that contained excerpts embedded in long or short musical contexts. Pitch errors arose from metrically similar positions and further sequential distances more often when the excerpt was embedded in long contexts compared to short contexts. Musicians’ keystroke intensities and error rates also revealed influences of metrical hierarchies, which differed for performances in long and short contexts. The range model accounted for contextual effects and provided better fits to empirical findings when metrical associations between sequence events were included. Longer sequence contexts may facilitate planning during sequence production by increasing conceptual similarity between hierarchically associated events. These findings are consistent with the notion that neural oscillations at multiple periodicities may strengthen metrical associations across sequence events during planning. PMID:25628550

  11. Multi-species Identification of Polymorphic Peptide Variants via Propagation in Spectral Networks*

    PubMed Central

    Bandeira, Nuno

    2016-01-01

    Peptide and protein identification remains challenging in organisms with poorly annotated or rapidly evolving genomes, as are commonly encountered in environmental or biofuels research. Such limitations render tandem mass spectrometry (MS/MS) database search algorithms ineffective as they lack corresponding sequences required for peptide-spectrum matching. We address this challenge with the spectral networks approach to (1) match spectra of orthologous peptides across multiple related species and then (2) propagate peptide annotations from identified to unidentified spectra. We here present algorithms to assess the statistical significance of spectral alignments (Align-GF), reduce the impurity in spectral networks, and accurately estimate the error rate in propagated identifications. Analyzing three related Cyanothece species, a model organism for biohydrogen production, spectral networks identified peptides from highly divergent sequences from networks with dozens of variant peptides, including thousands of peptides in species lacking a sequenced genome. Our analysis further detected the presence of many novel putative peptides even in genomically characterized species, thus suggesting the possibility of gaps in our understanding of their proteomic and genomic expression. A web-based pipeline for spectral networks analysis is available at http://proteomics.ucsd.edu/software. PMID:27609420

  12. Gaining knowledge from previously unexplained spectra-application of the PTM-Explorer software to detect PTM in HUPO BPP MS/MS data.

    PubMed

    Chamrad, Daniel C; Körting, Gerhard; Schäfer, Heike; Stephan, Christian; Thiele, Herbert; Apweiler, Rolf; Meyer, Helmut E; Marcus, Katrin; Blüggel, Martin

    2006-09-01

    A novel software tool named PTM-Explorer has been applied to LC-MS/MS datasets acquired within the Human Proteome Organisation (HUPO) Brain Proteome Project (BPP). PTM-Explorer enables automatic identification of peptide MS/MS spectra that were not explained in typical sequence database searches. The main focus was the detection of PTMs, but PTM-Explorer also detects unspecific peptide cleavage, mass measurement errors, experimental modifications, amino acid substitutions, transpeptidation products and unknown mass shifts. To avoid a combinatorial problem, the search is restricted to a set of selected protein sequences, which stem from previous protein identifications using a common sequence database search. Prior to application to the HUPO BPP data, PTM-Explorer was evaluated on thoroughly manually characterized LC-MS/MS datasets from Alpha-A-Crystallin gel spots obtained from mouse eye lens. Besides various PTMs including phosphorylation, a wealth of experimental modifications and unspecific cleavage products were successfully detected, completing the primary structure information of the measured proteins. Our results indicate that a large number of MS/MS spectra that currently remain unidentified in standard database searches contain valuable information that can only be elucidated using suitable software tools.

  13. TypeLoader: A fast and efficient automated workflow for the annotation and submission of novel full-length HLA alleles.

    PubMed

    Surendranath, V; Albrecht, V; Hayhurst, J D; Schöne, B; Robinson, J; Marsh, S G E; Schmidt, A H; Lange, V

    2017-07-01

    Recent years have seen a rapid increase in the discovery of novel allelic variants of the human leukocyte antigen (HLA) genes. Commonly, only the exons encoding the peptide binding domains of novel HLA alleles are submitted. As a result, the IPD-IMGT/HLA Database lacks sequence information outside those regions for the majority of known alleles. This has implications for the application of the new sequencing technologies, which deliver sequence data often covering the complete gene. As these technologies simplify the characterization of the complete gene regions, it is desirable for novel alleles to be submitted as full-length sequences to the database. However, the manual annotation of full-length alleles and the generation of the specific formats required by the sequence repositories is error-prone and time-consuming. We have developed TypeLoader to address both these facets. With only the full-length sequence as a starting point, TypeLoader performs automatic sequence annotation and subsequently handles all steps involved in preparing the specific formats for submission, with very little manual intervention. TypeLoader is routinely used at the DKMS Life Science Lab and has aided in the successful submission of more than 900 novel HLA alleles as full-length sequences to the European Nucleotide Archive repository and the IPD-IMGT/HLA Database, with a 95% reduction in the time spent on annotation and submission when compared with handling these processes manually. TypeLoader is implemented as a web application and can be easily installed and used on a standalone Linux desktop system or within a Linux client/server architecture. TypeLoader is downloadable from http://www.github.com/DKMS-LSL/typeloader. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  14. Association of medication errors with drug classifications, clinical units, and consequence of errors: Are they related?

    PubMed

    Muroi, Maki; Shen, Jay J; Angosta, Alona

    2017-02-01

    Registered nurses (RNs) play an important role in safe medication administration and patient safety. This study examined a total of 1276 medication error (ME) incident reports made by RNs in hospital inpatient settings in the southwestern region of the United States. The most common drug class associated with MEs was cardiovascular drugs (24.7%). Among this class, anticoagulants had the most errors (11.3%). The antimicrobials was the second most common drug class associated with errors (19.1%) and vancomycin was the most common antimicrobial that caused errors in this category (6.1%). MEs occurred more frequently in the medical-surgical and intensive care units than any other hospital units. Ten percent of MEs reached the patients with harm and 11% reached the patients with increased monitoring. Understanding the contributing factors related to MEs, addressing and eliminating risk of errors across hospital units, and providing education and resources for nurses may help reduce MEs. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. Design and Evaluation of Illumina MiSeq-Compatible, 18S rRNA Gene-Specific Primers for Improved Characterization of Mixed Phototrophic Communities.

    PubMed

    Bradley, Ian M; Pinto, Ameet J; Guest, Jeremy S

    2016-10-01

    The use of high-throughput sequencing technologies with the 16S rRNA gene for characterization of bacterial and archaeal communities has become routine. However, the adoption of sequencing methods for eukaryotes has been slow, despite their significance to natural and engineered systems. There are large variations among the target genes used for amplicon sequencing, and for the 18S rRNA gene, there is no consensus on which hypervariable region provides the most suitable representation of diversity. Additionally, it is unclear how much PCR/sequencing bias affects the depiction of community structure using current primers. The present study amplified the V4 and V8-V9 regions from seven microalgal mock communities as well as eukaryotic communities from freshwater, coastal, and wastewater samples to examine the effect of PCR/sequencing bias on community structure and membership. We found that degeneracies on the 3' end of the current V4-specific primers impact read length and mean relative abundance. Furthermore, the PCR/sequencing error is markedly higher for GC-rich members than for communities with balanced GC content. Importantly, the V4 region failed to reliably capture 2 of the 12 mock community members, and the V8-V9 hypervariable region more accurately represents mean relative abundance and alpha and beta diversity. Overall, the V4 and V8-V9 regions show similar community representations over freshwater, coastal, and wastewater environments, but specific samples show markedly different communities. These results indicate that multiple primer sets may be advantageous for gaining a more complete understanding of community structure and highlight the importance of including mock communities composed of species of interest. The quantification of error associated with community representation by amplicon sequencing is a critical challenge that is often ignored. When target genes are amplified using currently available primers, differential amplification efficiencies result in inaccurate estimates of community structure. The extent to which amplification bias affects community representation and the accuracy with which different gene targets represent community structure are not known. As a result, there is no consensus on which region provides the most suitable representation of diversity for eukaryotes. This study determined the accuracy with which commonly used 18S rRNA gene primer sets represent community structure and identified particular biases related to PCR amplification and Illumina MiSeq sequencing in order to more accurately study eukaryotic microbial communities. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  16. Analysis of phase error effects in multishot diffusion-prepared turbo spin echo imaging

    PubMed Central

    Cervantes, Barbara; Kooijman, Hendrik; Karampinos, Dimitrios C.

    2017-01-01

    Background To characterize the effect of phase errors on the magnitude and the phase of the diffusion-weighted (DW) signal acquired with diffusion-prepared turbo spin echo (dprep-TSE) sequences. Methods Motion and eddy currents were identified as the main sources of phase errors. An analytical expression for the effect of phase errors on the acquired signal was derived and verified using Bloch simulations, phantom, and in vivo experiments. Results Simulations and experiments showed that phase errors during the diffusion preparation cause both magnitude and phase modulation on the acquired data. When motion-induced phase error (MiPe) is accounted for (e.g., with motion-compensated diffusion encoding), the signal magnitude modulation due to the leftover eddy-current-induced phase error cannot be eliminated by the conventional phase cycling and sum-of-squares (SOS) method. By employing magnitude stabilizers, the phase-error-induced magnitude modulation, regardless of its cause, was removed but the phase modulation remained. The in vivo comparison between pulsed gradient and flow-compensated diffusion preparations showed that MiPe needed to be addressed in multi-shot dprep-TSE acquisitions employing magnitude stabilizers. Conclusions A comprehensive analysis of phase errors in dprep-TSE sequences showed that magnitude stabilizers are mandatory in removing the phase-error-induced magnitude modulation. Additionally, when multi-shot dprep-TSE is employed, the inconsistent signal phase modulation across shots has to be resolved before shot combination is performed. PMID:28516049

  17. Model Performance Evaluation and Scenario Analysis (MPESA) Tutorial

    EPA Pesticide Factsheets

    The model performance evaluation consists of metrics and model diagnostics. These metrics provide modelers with statistical goodness-of-fit measures that capture magnitude-only, sequence-only, and combined magnitude-and-sequence errors.
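
    The split named here can be made concrete. Below is a small Python sketch of one plausible construction (illustrative only, and not necessarily MPESA's exact formulas): the error between the sorted observed and simulated series reflects magnitude alone, and the additional error present in the original time order is attributed to sequence.

    ```python
    # Illustrative split of combined error into magnitude and sequence parts.
    # Comparing sorted series isolates magnitude error; the extra error in the
    # original time order is attributed to the sequencing of the values.
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

    def error_split(observed, modeled):
        combined = mse(observed, modeled)
        magnitude = mse(sorted(observed), sorted(modeled))
        sequence = combined - magnitude  # ordering mismatch on top of magnitude
        return {'combined': combined, 'magnitude': magnitude, 'sequence': sequence}

    obs = [1.0, 3.0, 2.0, 5.0]
    mod = [1.1, 2.0, 3.1, 4.8]  # roughly right values, partly wrong order
    print(error_split(obs, mod))
    ```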

  18. Sequence polymorphism in an insect RNA virus field population: A snapshot from a single point in space and time reveals stochastic differences among and within individual hosts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stenger, Drake C., E-mail: drake.stenger@ars.usda.

    Population structure of Homalodisca coagulata Virus-1 (HoCV-1) among and within field-collected insects sampled from a single point in space and time was examined. Polymorphism in complete consensus sequences among single-insect isolates was dominated by synonymous substitutions. The mutant spectrum of the C2 helicase region within each single-insect isolate was unique and dominated by nonsynonymous singletons. Bootstrapping was used to correct the within-isolate nonsynonymous:synonymous arithmetic ratio (N:S) for RT-PCR error, yielding an N:S value ~one log-unit greater than that of consensus sequences. Probability of all possible single-base substitutions for the C2 region predicted N:S values within 95% confidence limits of the corrected within-isolate N:S when the only constraint imposed was viral polymerase error bias for transitions over transversions. These results indicate that bottlenecks coupled with strong negative/purifying selection drive consensus sequences toward neutral sequence space, and that most polymorphism within single-insect isolates is composed of newly-minted mutations sampled prior to selection. -- Highlights: •Sampling protocol minimized differential selection/history among isolates. •Polymorphism among consensus sequences dominated by negative/purifying selection. •Within-isolate N:S ratio corrected for RT-PCR error by bootstrapping. •Within-isolate mutant spectrum dominated by new mutations yet to undergo selection.

  19. TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees.

    PubMed

    Mai, Uyen; Mirarab, Siavash

    2018-05-08

    Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically, errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. We propose an automatic method to detect such errors. We build a phylogeny including all the data and then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled. TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink.
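
    The k-shrink objective described in this abstract is easy to state concretely. The sketch below is a naive brute-force illustration over a hypothetical leaf-to-leaf path-length matrix, not the authors' polynomial-time algorithm (see their GitHub repository for that):

    ```python
    # Brute-force illustration of the k-shrink objective: find the k leaves
    # whose removal maximally reduces the tree diameter. Runs on a precomputed
    # leaf-to-leaf path-length matrix with invented toy values.
    from itertools import combinations

    def diameter(dist, leaves):
        """Tree diameter restricted to a subset of leaves."""
        leaves = list(leaves)
        return max((dist[a][b] for a, b in combinations(leaves, 2)), default=0.0)

    def k_shrink_brute_force(dist, k):
        """Return the k leaves whose removal minimizes the remaining diameter."""
        leaves = set(dist)
        best = min(combinations(leaves, k),
                   key=lambda drop: diameter(dist, leaves - set(drop)))
        return best, diameter(dist, leaves - set(best))

    # Toy example: leaf 'e' sits on an implausibly long branch.
    dist = {
        'a': {'b': 0.2, 'c': 0.3, 'd': 0.3, 'e': 5.1},
        'b': {'a': 0.2, 'c': 0.3, 'd': 0.3, 'e': 5.1},
        'c': {'a': 0.3, 'b': 0.3, 'd': 0.2, 'e': 5.0},
        'd': {'a': 0.3, 'b': 0.3, 'c': 0.2, 'e': 5.0},
        'e': {'a': 5.1, 'b': 5.1, 'c': 5.0, 'd': 5.0},
    }
    print(k_shrink_brute_force(dist, k=1))  # -> (('e',), 0.3)
    ```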

  20. What Information is Stored in DNA: Does it Contain Digital Error Correcting Codes?

    NASA Astrophysics Data System (ADS)

    Liebovitch, Larry

    1998-03-01

    The longest term correlations in living systems are the information stored in DNA, which reflects the evolutionary history of an organism. The 4 bases (A,T,G,C) encode sequences of amino acids as well as locations of binding sites for proteins that regulate DNA. The fidelity of this important information is maintained by ANALOG error check mechanisms. When a single strand of DNA is replicated, the complementary base is inserted in the new strand. Sometimes the wrong base is inserted and sticks out, disrupting the phosphate backbone. The new base is not yet methylated, so repair enzymes, which slide along the DNA, can tear out the wrong base and replace it with the right one. The bases in DNA form a sequence of 4 different symbols, and so the information is encoded in a DIGITAL form. All the digital codes in our society (ISBN book numbers, UPC product codes, bank account numbers, airline ticket numbers) use error checking codes, where some digits are functions of other digits, to maintain the fidelity of transmitted information. Does DNA also utilize a DIGITAL error checking code to maintain the fidelity of its information and increase the accuracy of replication? That is, are some bases in DNA functions of other bases upstream or downstream? This raises an interesting mathematical problem: how does one determine whether some symbols in a sequence of symbols are a function of other symbols? It also bears on the issue of determining algorithmic complexity: what is the function that generates the shortest algorithm for reproducing the symbol sequence? The error checking codes most used in our technology are linear block codes. We developed an efficient method to test for the presence of such codes in DNA. We coded the 4 bases as (0,1,2,3) and used Gaussian elimination, modified for modulus 4, to test if some bases are linear combinations of other bases. We used this method to analyze the base sequence in the genes from the lac operon and cytochrome C. We did not find evidence for such error correcting codes in these genes. However, we analyzed only a small amount of DNA, and if digital error correcting schemes are present in DNA, they may be more subtle than such simple linear block codes. The basic issue we raise here is how information is stored in DNA, and an appreciation that digital symbol sequences, such as DNA, admit interesting schemes to store and protect the fidelity of their information content. Liebovitch, Tao, Todorov, Levine. 1996. Biophys. J. 71:1539-1544. Supported by NIH grant EY6234.
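
    The question posed here, whether some bases are (mod 4) linear functions of others, can be illustrated in a few lines. The sketch below brute-forces coefficients for short codewords over invented data rather than performing the modified Gaussian elimination the authors describe:

    ```python
    # Toy version of the question posed in the abstract: with bases coded as
    # A=0, T=1, G=2, C=3, is some position in each codeword a linear function
    # (mod 4) of the other positions, as in a linear block code? Brute force
    # over coefficients is used here instead of mod-4 Gaussian elimination.
    from itertools import product

    BASE = {'A': 0, 'T': 1, 'G': 2, 'C': 3}

    def find_parity_relation(words, target_pos):
        """Search for coefficients c (mod 4) such that, for every codeword,
        word[target_pos] == sum(c[i] * word[i]) + c0; None if no relation."""
        n = len(words[0])
        data = [[BASE[b] for b in w] for w in words]
        positions = [i for i in range(n) if i != target_pos]
        for coeffs in product(range(4), repeat=len(positions) + 1):
            *cs, c0 = coeffs
            if all((sum(c * row[i] for c, i in zip(cs, positions)) + c0) % 4
                   == row[target_pos] for row in data):
                return dict(zip(positions, cs)), c0
        return None

    # Words constructed so the third base is (first + second) mod 4:
    words = ['ATT', 'TGC', 'GAG', 'CTA']
    print(find_parity_relation(words, target_pos=2))  # -> ({0: 1, 1: 1}, 0)
    ```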

  1. Toward Joint Hypothesis-Tests Seismic Event Screening Analysis: Ms|mb and Event Depth

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Dale; Selby, Neil

    2012-08-14

    Well-established theory can be used to combine single-phenomenology hypothesis tests into a multi-phenomenology event-screening hypothesis test (Fisher's and Tippett's tests). The standard error commonly used in the Ms:mb event-screening hypothesis test is not fully consistent with its physical basis. An improved standard error gives better agreement with the physical basis, correctly partitions error to include model error as a component of variance, and correctly reduces station noise variance through network averaging. For the 2009 DPRK test, the commonly used standard error 'rejects' H0 even with a better scaling slope (β = 1, Selby et al.), whereas the improved standard error 'fails to reject' H0.

  2. Mutations in Elongation Factor Ef-1α Affect the Frequency of Frameshifting and Amino Acid Misincorporation in Saccharomyces Cerevisiae

    PubMed Central

    Sandbaken, M. G.; Culbertson, M. R.

    1988-01-01

    A mutational analysis of the eukaryotic elongation factor EF-1α indicates that this protein functions to limit the frequency of errors during genetic code translation. We found that both amino acid misincorporation and reading frame errors are controlled by EF-1α. In order to examine the function of this protein, the TEF2 gene, which encodes EF-1α in Saccharomyces cerevisiae, was mutagenized in vitro with hydroxylamine. Sixteen independent TEF2 alleles were isolated by their ability to suppress frameshift mutations. DNA sequence analysis identified eight different sites in the EF-1α protein that elevate the frequency of mistranslation when mutated. These sites are located in two different regions of the protein. Amino acid substitutions located in or near the GTP-binding and hydrolysis domain of the protein cause suppression of frameshift and nonsense mutations. These mutations may effect mistranslation by altering the binding or hydrolysis of GTP. Amino acid substitutions located adjacent to a putative aminoacyl-tRNA binding region also suppress frameshift and nonsense mutations. These mutations may alter the binding of aminoacyl-tRNA by EF-1α. The identification of frameshift and nonsense suppressor mutations in EF-1α indicates a role for this protein in limiting amino acid misincorporation and reading frame errors. We suggest that these types of errors are controlled by a common mechanism or closely related mechanisms. PMID:3066688

  3. Qualitative and quantitative assessment of Illumina's forensic STR and SNP kits on MiSeq FGx™.

    PubMed

    Sharma, Vishakha; Chow, Hoi Yan; Siegel, Donald; Wurmbach, Elisa

    2017-01-01

    Massively parallel sequencing (MPS) is a powerful tool transforming DNA analysis in multiple fields ranging from medicine, to environmental science, to evolutionary biology. In forensic applications, MPS offers the ability to significantly increase the discriminatory power of human identification as well as aid in mixture deconvolution. However, before the benefits of any new technology can be employed, its quality, consistency, sensitivity, and specificity must be rigorously evaluated in order to gain a detailed understanding of the technique, including sources of error, error rates, and other restrictions/limitations. This extensive study assessed the performance of Illumina's MiSeq FGx MPS system and ForenSeq™ kit in nine experimental runs including 314 reaction samples. In-depth data analysis evaluated the consequences of different assay conditions on test results. Variables included sample numbers per run, targets per run, DNA input per sample, and replications. Results are presented as heat maps revealing patterns for each locus. Data analysis focused on read numbers (allele coverage), drop-outs, drop-ins, and sequence analysis. The study revealed that loci with high read numbers performed better and resulted in fewer drop-outs and well-balanced heterozygous alleles. Several loci were prone to drop-outs, which led to falsely typed homozygotes and therefore to genotype errors. Sequence analysis of allele drop-in typically revealed a single nucleotide change (deletion, insertion, or substitution). Analyses of sequences, no-template controls, and spurious alleles suggest no contamination during library preparation, pooling, and sequencing, but indicate that sequencing or PCR errors may have occurred due to DNA polymerase infidelities. Finally, we found that utilizing Illumina's FGx System at recommended conditions does not guarantee 100% outcomes for all samples tested, including the positive control, and required manual editing due to low read numbers and/or allele drop-in. These findings are important for progressing towards implementation of MPS in forensic DNA testing.

  4. Temporal regularization of ultrasound-based liver motion estimation for image-guided radiation therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O’Shea, Tuathan P., E-mail: tuathan.oshea@icr.ac.uk; Bamber, Jeffrey C.; Harris, Emma J.

    Purpose: Ultrasound-based motion estimation is an expanding subfield of image-guided radiation therapy. Although ultrasound can detect tissue motion that is a fraction of a millimeter, its accuracy is variable. For controlling linear accelerator tracking and gating, ultrasound motion estimates must remain highly accurate throughout the imaging sequence. This study presents a temporal regularization method for correlation-based template matching which aims to improve the accuracy of motion estimates. Methods: Liver ultrasound sequences (15–23 Hz imaging rate, 2.5–5.5 min length) from ten healthy volunteers under free breathing were used. Anatomical features (blood vessels) in each sequence were manually annotated for comparison with normalized cross-correlation based template matching. Five sequences from a Siemens Acuson™ scanner were used for algorithm development (training set). Results from incremental tracking (IT) were compared with a temporal regularization method, which included a highly specific similarity metric and state observer, known as the α–β filter/similarity threshold (ABST). A further five sequences from an Elekta Clarity™ system were used for validation, without alteration of the tracking algorithm (validation set). Results: Overall, the ABST method produced marked improvements in vessel tracking accuracy. For the training set, the mean and 95th percentile (95%) errors (defined as the difference from manual annotations) were 1.6 and 1.4 mm, respectively (compared to 6.2 and 9.1 mm, respectively, for IT). For each sequence, the use of the state observer leads to improvement in the 95% error. For the validation set, the mean and 95% errors for the ABST method were 0.8 and 1.5 mm, respectively. Conclusions: Ultrasound-based motion estimation has potential to monitor liver translation over long time periods with high accuracy. Nonrigid motion (strain) and the quality of the ultrasound data are likely to have an impact on tracking performance. A future study will investigate spatial uniformity of motion and its effect on the motion estimation errors.

  5. Methods of automatic nucleotide-sequence analysis. Multicomponent spectrophotometric analysis of mixtures of nucleic acid components by a least-squares procedure

    PubMed Central

    Lee, Sheila; McMullen, D.; Brown, G. L.; Stokes, A. R.

    1965-01-01

    1. A theoretical analysis of the errors in multicomponent spectrophotometric analysis of nucleoside mixtures, by a least-squares procedure, has been made to obtain an expression for the error coefficient, relating the error in calculated concentration to the error in extinction measurements. 2. The error coefficients, which depend only on the 'library' of spectra used to fit the experimental curves, have been computed for a number of 'libraries' containing the following nucleosides found in s-RNA: adenosine, guanosine, cytidine, uridine, 5-ribosyluracil, 7-methylguanosine, 6-dimethylaminopurine riboside, 6-methylaminopurine riboside and thymine riboside. 3. The error coefficients have been used to determine the best conditions for maximum accuracy in the determination of the compositions of nucleoside mixtures. 4. Experimental determinations of the compositions of nucleoside mixtures have been made and the errors found to be consistent with those predicted by the theoretical analysis. 5. It has been demonstrated that, with certain precautions, the multicomponent spectrophotometric method described is suitable as a basis for automatic nucleotide-composition analysis of oligonucleotides containing nine nucleotides. Used in conjunction with continuous chromatography and flow chemical techniques, this method can be applied to the study of the sequence of s-RNA. PMID:14346087
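
    A minimal numpy sketch of the least-squares step and the role of error coefficients, using invented library spectra (the actual nucleoside extinction curves are not reproduced here):

    ```python
    # Fit a measured extinction curve as a linear combination of 'library'
    # nucleoside spectra, and read error coefficients off the pseudoinverse.
    # All spectra and concentrations below are invented toy numbers.
    import numpy as np

    # Rows = wavelengths, columns = components (e.g. four nucleosides).
    library = np.array([[0.9, 0.2, 0.1, 0.3],
                        [0.4, 0.8, 0.2, 0.1],
                        [0.1, 0.3, 0.9, 0.2],
                        [0.2, 0.1, 0.3, 0.7],
                        [0.5, 0.4, 0.2, 0.6]])

    true_conc = np.array([1.0, 0.5, 0.25, 0.75])
    measured = library @ true_conc + np.random.normal(0, 0.01, size=5)

    # Least-squares concentrations: c = pinv(L) @ e.
    pinv = np.linalg.pinv(library)
    conc = pinv @ measured

    # The pseudoinverse entries act as error coefficients: an extinction error
    # de at wavelength j perturbs component i by pinv[i, j] * de, so the
    # library alone determines how measurement noise propagates, as stated above.
    print(np.round(conc, 3), np.max(np.abs(pinv)))
    ```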

  6. DNA Barcoding through Quaternary LDPC Codes

    PubMed Central

    Tapia, Elizabeth; Spetale, Flavio; Krsticevic, Flavia; Angelone, Laura; Bulacio, Pilar

    2015-01-01

    For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10−2 per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10−9 at the expense of a rate of read losses just in the order of 10−6. PMID:26492348
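
    Full LDPC decoding is beyond a short example, but the trade-off the authors quantify, read losses versus sample misidentification, can be illustrated with a minimum-Hamming-distance demultiplexer over made-up barcodes:

    ```python
    # Minimum-Hamming-distance demultiplexing with a rejection threshold,
    # a simple stand-in for real LDPC decoding. A read whose tag is too far
    # from every barcode (or equidistant from two) is dropped (a 'read loss');
    # a wrong confident assignment would be a misidentification error.
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def demultiplex(read_tag, barcodes, max_dist=1):
        """Assign a read to the closest barcode; returns an index or None."""
        dists = sorted((hamming(read_tag, bc), i) for i, bc in enumerate(barcodes))
        (d1, i1), (d2, _) = dists[0], dists[1]
        if d1 > max_dist or d1 == d2:  # too corrupted, or a tie: drop the read
            return None
        return i1

    barcodes = ['ACGTACGT', 'TTGGCCAA', 'GGAATTCC']  # invented barcode set
    print(demultiplex('ACGTACGA', barcodes))  # one mismatch -> sample 0
    print(demultiplex('ACGGCCAA', barcodes))  # too far from all -> None
    ```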

  7. DNA Barcoding through Quaternary LDPC Codes.

    PubMed

    Tapia, Elizabeth; Spetale, Flavio; Krsticevic, Flavia; Angelone, Laura; Bulacio, Pilar

    2015-01-01

    For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10(-2) per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10(-9) at the expense of a rate of read losses just in the order of 10(-6).

  8. Overcoming Species Boundaries in Peptide Identification with Bayesian Information Criterion-driven Error-tolerant Peptide Search (BICEPS)*

    PubMed Central

    Renard, Bernhard Y.; Xu, Buote; Kirchner, Marc; Zickmann, Franziska; Winter, Dominic; Korten, Simone; Brattig, Norbert W.; Tzur, Amit; Hamprecht, Fred A.; Steen, Hanno

    2012-01-01

    Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian information criterion-driven error-tolerant peptide search (BICEPS) and offer an open source implementation based on this statistical criterion to automatically balance the information of each single spectrum and the database, while limiting the run time. We show that BICEPS performs as well as current database search algorithms when such algorithms are applied to sequenced organisms, whereas BICEPS only uses a remotely related organism database. For instance, we use a chicken instead of a human database corresponding to an evolutionary distance of more than 300 million years (International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716). We demonstrate the successful application to cross-species proteomics with a 33% increase in the number of identified proteins for a filarial nematode sample of Litomosoides sigmodontis. PMID:22493179

  9. Gene calling and bacterial genome annotation with BG7.

    PubMed

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa, but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences, which are the elements directly related to biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is tolerant of sequencing errors, maintaining its capability to annotate highly fragmented genomes or mixed sequences coming from several genomes (such as those obtained through metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  10. Rare Variant Association Test with Multiple Phenotypes

    PubMed Central

    Lee, Selyeong; Won, Sungho; Kim, Young Jin; Kim, Yongkang; Kim, Bong-Jo; Park, Taesung

    2016-01-01

    Although genome-wide association studies (GWAS) have now discovered thousands of genetic variants associated with common traits, such variants cannot explain the large degree of “missing heritability,” likely due to rare variants. The advent of next generation sequencing technology has allowed rare variant detection and association with common traits, often by investigating specific genomic regions for rare variant effects on a trait. Although multiply correlated phenotypes are often concurrently observed in GWAS, most studies analyze only single phenotypes, which may lessen statistical power. To increase power, multivariate analyses, which consider correlations between multiple phenotypes, can be used. However, few existing multi-variant analyses can identify rare variants for assessing multiple phenotypes. Here, we propose Multivariate Association Analysis using Score Statistics (MAAUSS), to identify rare variants associated with multiple phenotypes, based on the widely used Sequence Kernel Association Test (SKAT) for a single phenotype. We applied MAAUSS to Whole Exome Sequencing (WES) data from a Korean population of 1,058 subjects, to discover genes associated with multiple traits of liver function. We then assessed validation of those genes by a replication study, using an independent dataset of 3,445 individuals. Notably, we detected the gene ZNF620 among five significant genes. We then performed a simulation study to compare MAAUSS's performance with existing methods. Overall, MAAUSS successfully conserved type 1 error rates and in many cases, had a higher power than the existing methods. This study illustrates a feasible and straightforward approach for identifying rare variants correlated with multiple phenotypes, with likely relevance to missing heritability. PMID:28039885

  11. Association analysis using next-generation sequence data from publicly available control groups: the robust variance score statistic.

    PubMed

    Derkach, Andriy; Chiang, Theodore; Gong, Jiafen; Addis, Laura; Dobbins, Sara; Tomlinson, Ian; Houlston, Richard; Pal, Deb K; Strug, Lisa J

    2014-08-01

    Sufficiently powered case-control studies with next-generation sequence (NGS) data remain prohibitively expensive for many investigators. If feasible, a more efficient strategy would be to include publicly available sequenced controls. However, these studies can be confounded by differences in sequencing platform; alignment, single nucleotide polymorphism and variant calling algorithms; read depth; and selection thresholds. Assuming one can match cases and controls on the basis of ethnicity and other potential confounding factors, and one has access to the aligned reads in both groups, we investigate the effect of systematic differences in read depth and selection threshold when comparing allele frequencies between cases and controls. We propose a novel likelihood-based method, the robust variance score (RVS), that substitutes genotype calls by their expected values given observed sequence data. We show theoretically that the RVS eliminates read depth bias in the estimation of minor allele frequency. We also demonstrate that, using simulated and real NGS data, the RVS method controls Type I error and has comparable power to the 'gold standard' analysis with the true underlying genotypes for both common and rare variants. An RVS R script and instructions can be found at strug.research.sickkids.ca, and at https://github.com/strug-lab/RVS. lisa.strug@utoronto.ca Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
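
    The central substitution in the RVS, replacing a hard genotype call with its expected value given the observed reads, can be sketched for a biallelic site. The likelihoods and allele frequency below are toy numbers, and the Hardy-Weinberg prior is an illustrative modeling choice:

    ```python
    # Replace a hard genotype call with its expectation given the read data.
    # Biallelic genotypes G in {0, 1, 2}: combine per-genotype read
    # likelihoods with Hardy-Weinberg priors from an allele frequency estimate.
    def expected_genotype(read_likelihoods, alt_freq):
        """read_likelihoods: P(reads | G=g) for g = 0, 1, 2 (toy values)."""
        p = alt_freq
        priors = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]  # Hardy-Weinberg
        joint = [l * pr for l, pr in zip(read_likelihoods, priors)]
        total = sum(joint)
        return sum(g * j for g, j in enumerate(joint)) / total

    # Shallow coverage leaves the call uncertain; the dosage stays fractional
    # instead of being forced to the most likely integer genotype.
    print(expected_genotype([0.05, 0.6, 0.35], alt_freq=0.2))  # ~0.92
    ```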

  12. Developing market class specific InDel markers from next generation sequence data in Phaseolus vulgaris L.

    PubMed

    Moghaddam, Samira Mafi; Song, Qijian; Mamidi, Sujan; Schmutz, Jeremy; Lee, Rian; Cregan, Perry; Osorno, Juan M; McClean, Phillip E

    2014-01-01

    Next generation sequence data provides valuable information and tools for genetic and genomic research and offers new insights useful for marker development. This data is useful for the design of accurate and user-friendly molecular tools. Common bean (Phaseolus vulgaris L.) is a diverse crop in which separate domestication events happened in each gene pool followed by race and market class diversification that has resulted in different morphological characteristics in each commercial market class. This has led to essentially independent breeding programs within each market class which in turn has resulted in limited within market class sequence variation. Sequence data from selected genotypes of five bean market classes (pinto, black, navy, and light and dark red kidney) were used to develop InDel-based markers specific to each market class. Design of the InDel markers was conducted through a combination of assembly, alignment and primer design software using 1.6× to 5.1× coverage of Illumina GAII sequence data for each of the selected genotypes. The procedure we developed for primer design is fast, accurate, less error prone, and higher throughput than when they are designed manually. All InDel markers are easy to run and score with no need for PCR optimization. A total of 2687 InDel markers distributed across the genome were developed. To highlight their usefulness, they were employed to construct a phylogenetic tree and a genetic map, showing that InDel markers are reliable, simple, and accurate.

  13. Developing market class specific InDel markers from next generation sequence data in Phaseolus vulgaris L.

    PubMed Central

    Moghaddam, Samira Mafi; Song, Qijian; Mamidi, Sujan; Schmutz, Jeremy; Lee, Rian; Cregan, Perry; Osorno, Juan M.; McClean, Phillip E.

    2013-01-01

    Next generation sequence data provides valuable information and tools for genetic and genomic research and offers new insights useful for marker development. This data is useful for the design of accurate and user-friendly molecular tools. Common bean (Phaseolus vulgaris L.) is a diverse crop in which separate domestication events happened in each gene pool followed by race and market class diversification that has resulted in different morphological characteristics in each commercial market class. This has led to essentially independent breeding programs within each market class which in turn has resulted in limited within market class sequence variation. Sequence data from selected genotypes of five bean market classes (pinto, black, navy, and light and dark red kidney) were used to develop InDel-based markers specific to each market class. Design of the InDel markers was conducted through a combination of assembly, alignment and primer design software using 1.6× to 5.1× coverage of Illumina GAII sequence data for each of the selected genotypes. The procedure we developed for primer design is fast, accurate, less error prone, and higher throughput than when they are designed manually. All InDel markers are easy to run and score with no need for PCR optimization. A total of 2687 InDel markers distributed across the genome were developed. To highlight their usefulness, they were employed to construct a phylogenetic tree and a genetic map, showing that InDel markers are reliable, simple, and accurate. PMID:24860578

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Witte, Jonathon; Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720; Neaton, Jeffrey B.

    With the aim of systematically characterizing the convergence of common families of basis sets such that general recommendations for basis sets can be made, we have tested a wide variety of basis sets against complete-basis binding energies across the S22 set of intermolecular interactions—noncovalent interactions of small and medium-sized molecules consisting of first- and second-row atoms—with three distinct density functional approximations: SPW92, a form of local-density approximation; B3LYP, a global hybrid generalized gradient approximation; and B97M-V, a meta-generalized gradient approximation with nonlocal correlation. We have found that it is remarkably difficult to reach the basis set limit; for the methods and systems examined, the most complete basis is Jensen’s pc-4. The Dunning correlation-consistent sequence of basis sets converges slowly relative to the Jensen sequence. The Karlsruhe basis sets are quite cost effective, particularly when a correction for basis set superposition error is applied: counterpoise-corrected def2-SVPD binding energies are better than corresponding energies computed in comparably sized Dunning and Jensen bases, and on par with uncorrected results in basis sets 3-4 times larger. These trends are exhibited regardless of the level of density functional approximation employed. A sense of the magnitude of the intrinsic incompleteness error of each basis set not only provides a foundation for guiding basis set choice in future studies but also facilitates quantitative comparison of existing studies on similar types of systems.

  15. DNA-DNA hybridization values and their relationship to whole-genome sequence similarities.

    PubMed

    Goris, Johan; Konstantinidis, Konstantinos T; Klappenbach, Joel A; Coenye, Tom; Vandamme, Peter; Tiedje, James M

    2007-01-01

    DNA-DNA hybridization (DDH) values have been used by bacterial taxonomists since the 1960s to determine relatedness between strains and are still the most important criterion in the delineation of bacterial species. Since the extent of hybridization between a pair of strains is ultimately governed by their respective genomic sequences, we examined the quantitative relationship between DDH values and genome sequence-derived parameters, such as the average nucleotide identity (ANI) of common genes and the percentage of conserved DNA. A total of 124 DDH values were determined for 28 strains for which genome sequences were available. The strains belong to six important and diverse groups of bacteria for which the intra-group 16S rRNA gene sequence identity was greater than 94 %. The results revealed a close relationship between DDH values and ANI and between DNA-DNA hybridization and the percentage of conserved DNA for each pair of strains. The recommended cut-off point of 70 % DDH for species delineation corresponded to 95 % ANI and 69 % conserved DNA. When the analysis was restricted to the protein-coding portion of the genome, 70 % DDH corresponded to 85 % conserved genes for a pair of strains. These results reveal extensive gene diversity within the current concept of "species". Examination of reciprocal values indicated that the level of experimental error associated with the DDH method is too high to reveal the subtle differences in genome size among the strains sampled. It is concluded that ANI can accurately replace DDH values for strains for which genome sequences are available.
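
    The genome-derived parameters discussed above are straightforward to compute once orthologous genes are aligned. A toy sketch, with invented per-gene identities, applying the 95% ANI species cut-off this study reports:

    ```python
    # ANI as the mean nucleotide identity over shared (conserved) genes,
    # compared against the 95% ANI cut-off that this study found corresponds
    # to the classical 70% DDH species boundary. Identities are invented.
    def average_nucleotide_identity(shared_gene_identities):
        return sum(shared_gene_identities) / len(shared_gene_identities)

    def same_species_by_ani(ani, threshold=0.95):
        """Apply the species cut-off reported in this record (95% ANI)."""
        return ani >= threshold

    identities = [0.98, 0.97, 0.96, 0.99, 0.94, 0.97]  # identity per shared gene
    ani = average_nucleotide_identity(identities)
    print(round(ani, 3), same_species_by_ani(ani))     # 0.968 True
    ```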

  16. CBrowse: a SAM/BAM-based contig browser for transcriptome assembly visualization and analysis.

    PubMed

    Li, Pei; Ji, Guoli; Dong, Min; Schmidt, Emily; Lenox, Douglas; Chen, Liangliang; Liu, Qi; Liu, Lin; Zhang, Jie; Liang, Chun

    2012-09-15

    To address the impending need for exploring rapidly increasing transcriptomics data generated for non-model organisms, we developed CBrowse, an AJAX-based web browser for visualizing and analyzing transcriptome assemblies and contigs. Designed in a standard three-tier architecture with a data pre-processing pipeline, CBrowse is essentially a Rich Internet Application that offers many seamlessly integrated web interfaces and allows users to navigate, sort, filter, search and visualize data smoothly. The pre-processing pipeline takes the contig sequence file in FASTA format and its relevant SAM/BAM file as the input; detects putative polymorphisms, simple sequence repeats and sequencing errors in contigs; and generates image, JSON and database-compatible CSV text files that are directly utilized by different web interfaces. CBrowse is a generic visualization and analysis tool that facilitates close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors in transcriptome sequencing projects. CBrowse is distributed under the GNU General Public License, available at http://bioinfolab.muohio.edu/CBrowse/ liangc@muohio.edu or liangc.mu@gmail.com; glji@xmu.edu.cn Supplementary data are available at Bioinformatics online.

  17. Partial bisulfite conversion for unique template sequencing.

    PubMed

    Kumar, Vijay; Rosenbaum, Julie; Wang, Zihua; Forcier, Talitha; Ronemus, Michael; Wigler, Michael; Levy, Dan

    2018-01-25

    We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate count and long-range assembly of initial template molecules from short-read sequence data. We explore count and low-error sequencing by profiling 135 000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
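
    The clustering step described above, grouping reads by their conversion pattern, can be sketched compactly. The reads below are toy strings assumed to be pre-aligned to the reference; real data would also need tolerance for sequencer error within a pattern:

    ```python
    # Group reads by their C-to-T conversion pattern: reads carrying the same
    # pattern are treated as copies of one original template molecule.
    from collections import defaultdict

    def conversion_pattern(read, reference):
        """Positions where a reference C reads as T (bisulfite conversion)."""
        return tuple(i for i, (r, q) in enumerate(zip(reference, read))
                     if r == 'C' and q == 'T')

    def count_templates(reads, reference):
        clusters = defaultdict(list)
        for read in reads:
            clusters[conversion_pattern(read, reference)].append(read)
        return clusters

    ref = 'ACCGTCCA'
    reads = ['ATCGTCCA', 'ATCGTCCA', 'ACCGTTCA', 'ATCGTTCA']
    clusters = count_templates(reads, ref)
    print(len(clusters))   # 3 distinct templates despite 4 reads
    ```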

  18. Implicit transfer of reversed temporal structure in visuomotor sequence learning.

    PubMed

    Tanaka, Kanji; Watanabe, Katsumi

    2014-04-01

    Some spatio-temporal structures are easier to transfer implicitly in sequential learning. In this study, we investigated whether the consistent reversal of triads of learned components would support the implicit transfer of their temporal structure in visuomotor sequence learning. A triad comprised three sequential button presses ([1][2][3]) and seven consecutive triads comprised a sequence. Participants learned sequences by trial and error, until they could complete it 20 times without error. Then, they learned another sequence, in which each triad was reversed ([3][2][1]), partially reversed ([2][1][3]), or switched so as not to overlap with the other conditions ([2][3][1] or [3][1][2]). Even when the participants did not notice the alternation rule, the consistent reversal of the temporal structure of each triad led to better implicit transfer; this was confirmed in a subsequent experiment. These results suggest that the implicit transfer of the temporal structure of a learned sequence can be influenced by both the structure and consistency of the change. Copyright © 2013 Cognitive Science Society, Inc.

  19. TABLE D - WMO AND LOCAL (NCEP) DESCRIPTORS AS WELL AS THOSE AWAITING

    Science.gov Websites

    Snippet of a flattened WMO Table D listing of descriptor sequence categories; recoverable entries include: 3 05 Meteorological or hydrological sequences; vertical sounding sequences (conventional data); 3 10 Vertical sounding sequences (satellite data); 3 13 Sequences common to image data; 3 14 Reserved; 3 15 Oceanographic report sequences.

  20. Model Error Budgets

    NASA Technical Reports Server (NTRS)

    Briggs, Hugh C.

    2008-01-01

    An error budget is a commonly used tool in the design of complex aerospace systems. It represents system performance requirements in terms of allowable errors and flows these down through a hierarchical structure to lower assemblies and components. The requirements may simply be 'allocated' based upon heuristics or experience, or they may be designed through use of physics-based models. This paper presents a basis for developing an error budget for models of the system, as opposed to the system itself. The need for model error budgets arises when system models are a principal design agent, as is increasingly common for poorly testable high-performance space systems.
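
    A tiny sketch of the flow-down idea: allocate a top-level error requirement to subsystem models and verify the roll-up. Root-sum-square combination of independent terms is one common convention, and the subsystem names and numbers below are invented:

    ```python
    # Roll up subsystem error allocations against a top-level requirement.
    # Root-sum-square combination assumes the error terms are independent.
    import math

    def rss(allocations):
        """Combine independent error allocations by root-sum-square."""
        return math.sqrt(sum(a ** 2 for a in allocations.values()))

    budget = {                      # allocated model errors, e.g. in microns
        'structural model': 4.0,
        'thermal model': 3.0,
        'optical model': 2.0,
        'control model': 1.5,
    }
    requirement = 6.0
    combined = rss(budget)
    print(f"combined {combined:.2f} vs requirement {requirement}: "
          f"{'OK' if combined <= requirement else 'over budget'}")
    ```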

  1. Robust video super-resolution with registration efficiency adaptation

    NASA Astrophysics Data System (ADS)

    Zhang, Xinfeng; Xiong, Ruiqin; Ma, Siwei; Zhang, Li; Gao, Wen

    2010-07-01

    Super-Resolution (SR) is a technique to construct a high-resolution (HR) frame by fusing a group of low-resolution (LR) frames describing the same scene. The effectiveness of conventional super-resolution techniques, when applied to video sequences, strongly relies on the efficiency of motion alignment achieved by image registration. Unfortunately, such efficiency is limited by the motion complexity in the video and the capability of the adopted motion model. In image regions with severe registration errors, annoying artifacts usually appear in the produced super-resolution video. This paper proposes a robust video super-resolution technique that adapts itself to the spatially varying registration efficiency. The reliability of each reference pixel is measured by the corresponding registration error and incorporated into the optimization objective function of SR reconstruction. This makes the SR reconstruction highly immune to registration errors, as outliers with higher registration errors are assigned lower weights in the objective function. In particular, we carefully design a mechanism to assign weights according to registration errors. The proposed super-resolution scheme has been tested with various video sequences, and experimental results clearly demonstrate the effectiveness of the proposed method.
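
    The weighting idea can be sketched as follows. The exp(-err²/σ²) mapping below is an illustrative choice rather than the paper's exact mechanism for assigning weights from registration errors:

    ```python
    # Down-weight reference pixels with large registration errors so that
    # misregistered regions cannot inject artifacts into the fused frame.
    import numpy as np

    def registration_weights(reg_error, sigma=1.0):
        """Map per-pixel registration error (e.g. warped-frame residual
        magnitude) to fusion weights in (0, 1]."""
        return np.exp(-(reg_error / sigma) ** 2)

    def fuse(warped_frames, reg_errors, sigma=1.0):
        """Weighted fusion of registered LR frames (stacked as arrays)."""
        w = registration_weights(np.asarray(reg_errors), sigma)
        w /= w.sum(axis=0) + 1e-12           # normalize across frames
        return (w * np.asarray(warped_frames)).sum(axis=0)

    frames = np.array([[[1.0, 2.0]], [[1.1, 9.0]]])   # second frame: bad pixel
    errors = np.array([[[0.1, 0.1]], [[0.2, 3.0]]])   # large error where it fails
    print(fuse(frames, errors))   # the outlier pixel is strongly down-weighted
    ```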

  2. Comparison of visual and emotional continuous performance test related to sequence of presentation, gender and age.

    PubMed

    Markovska-Simoska, S; Pop-Jordanova, N

    2009-07-01

    (Full text is available at http://www.manu.edu.mk/prilozi). Continuous Performance Tests (CPTs) form a group of paradigms for the evaluation of attention and, to a lesser degree, the response inhibition (or disinhibition) component of executive control. The object of this study was to compare performance on a CPT using both visual and emotional tasks in 46 normal adult subjects. In particular, it examined the effects of the type of task (VCPT or ECPT), sequence of presentation, and gender/age on performance as measured by errors of omission, errors of commission, reaction time, and variation of reaction time. From the results we can conclude that performance parameters are significantly worse for ECPT than for VCPT tasks, probably explained by the influence of emotional stimuli on attention and information processing, and that order of presentation and gender have no significant effect on performance. Significant differences, with more omission errors in the older groups, were obtained, showing better attention in younger subjects. Key words: VCPT, ECPT, omission errors, commission errors, reaction time, variation of reaction time, normal adults.

  3. SU-F-E-02: A Feasibility Study for Application of Metal Artifact Reduction Techniques in MR-Guided Brachytherapy Gynecological Cancer with Titanium Applicators

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kadbi, M

    Purpose: Utilization of Titanium Tandem and Ring (T&R) applicators in MR-guided brachytherapy has become widespread for gynecological cancer treatment. However, Titanium causes magnetic field disturbance and susceptibility artifact, which complicate image interpretation. In this study, metal artifact reduction techniques were employed to improve the image quality and reduce the metal-related artifacts. Methods: Several techniques were employed to reduce the metal artifact caused by the titanium T&R applicator. These techniques include the Metal Artifact Reduction Sequence (MARS), View Angle Tilting (VAT) to correct in-plane distortion, and Slice Encoding for Metal Artifact Correction (SEMAC) for through-plane artifact correction. Moreover, MARS can be combined with VAT to further reduce the in-plane artifact by reapplying the selection gradients during the readout (MARS+VAT). SEMAC uses a slice-selective excitation but acquires additional z-encodings in order to resolve off-resonant signal and to reduce through-plane distortions. Results: Comparison between the clinical sequences revealed that increasing the bandwidth reduces the error in the measured diameter of the T&R. However, the error is larger than 4mm for the best case with the highest bandwidth and spatial resolution. MARS+VAT with isotropic resolution of 1mm reduced the error to 1.9mm, the least among the examined 2D sequences. The measured diameter of the tandem from SEMAC+VAT has the closest value to the actual diameter of the tandem (3.2mm), and the error was reduced to less than 1mm. In addition, SEMAC+VAT significantly reduces the blooming artifact in the ring compared to clinical sequences. Conclusion: A higher bandwidth and spatial resolution sequence reduces the artifact and apparent diameter of the applicator with a slight compromise in SNR. Metal artifact reduction sequences decrease the distortion associated with the titanium applicator. The SEMAC sequence in combination with VAT revealed promising results for titanium imaging and can be utilized for MR-guided brachytherapy in gynecological cancer. The author is an employee of Philips Healthcare.

  4. Robust dynamical decoupling for quantum computing and quantum memory.

    PubMed

    Souza, Alexandre M; Alvarez, Gonzalo A; Suter, Dieter

    2011-06-17

    Dynamical decoupling (DD) is a popular technique for protecting qubits from the environment. However, unless special care is taken, experimental errors in the control pulses used in this technique can destroy the quantum information instead of preserving it. Here, we investigate techniques for making DD sequences robust against different types of experimental errors while retaining good decoupling efficiency in a fluctuating environment. We present experimental data from solid-state nuclear spin qubits and introduce a new DD sequence that is suitable for quantum computing and quantum memory.
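
    For orientation, the pulse-phase patterns of CPMG and the XY-4/XY-8 cycles, standard sequences from the DD literature, illustrate what robustness to pulse errors means here; the new sequence introduced in this paper is not reproduced:

    ```python
    # Pulse-phase patterns for common DD cycles (equally spaced pi pulses).
    # CPMG applies all pulses about one axis, so flip-angle errors accumulate;
    # XY-4 alternates axes, and XY-8 adds a time-reversed copy so that many
    # pulse imperfections cancel over a full cycle.
    def cpmg(n_cycles):
        return ['Y'] * (2 * n_cycles)          # all pulses about the same axis

    def xy4(n_cycles):
        return ['X', 'Y', 'X', 'Y'] * n_cycles

    def xy8(n_cycles):
        base = ['X', 'Y', 'X', 'Y']
        return (base + base[::-1]) * n_cycles  # palindromic cycle

    print(xy8(1))   # ['X', 'Y', 'X', 'Y', 'Y', 'X', 'Y', 'X']
    ```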

  5. Overcoming bias and systematic errors in next generation sequencing data.

    PubMed

    Taub, Margaret A; Corrada Bravo, Hector; Irizarry, Rafael A

    2010-12-10

    Considerable time and effort has been spent in developing analysis and quality assessment methods to allow the use of microarrays in a clinical setting. As is the case for microarrays and other high-throughput technologies, data from new high-throughput sequencing technologies are subject to technological and biological biases and systematic errors that can impact downstream analyses. Only when these issues can be readily identified and reliably adjusted for will clinical applications of these new technologies be feasible. Although much work remains to be done in this area, we describe consistently observed biases that should be taken into account when analyzing high-throughput sequencing data. In this article, we review current knowledge about these biases, discuss their impact on analysis results, and propose solutions.

  6. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    DOE PAGES

    McNair, Katelyn; Edwards, Robert A.

    2015-06-16

    As increases in prokaryotic sequencing take place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  7. Patterns of technical error among surgical malpractice claims: an analysis of strategies to prevent injury to surgical patients.

    PubMed

    Regenbogen, Scott E; Greenberg, Caprice C; Studdert, David M; Lipsitz, Stuart R; Zinner, Michael J; Gawande, Atul A

    2007-11-01

    To identify the most prevalent patterns of technical errors in surgery, and evaluate commonly recommended interventions in light of these patterns. The majority of surgical adverse events involve technical errors, but little is known about the nature and causes of these events. We examined characteristics of technical errors and common contributing factors among closed surgical malpractice claims. Surgeon reviewers analyzed 444 randomly sampled surgical malpractice claims from four liability insurers. Among 258 claims in which injuries due to error were detected, 52% (n = 133) involved technical errors. These technical errors were further analyzed with a structured review instrument designed by qualitative content analysis. Forty-nine percent of the technical errors caused permanent disability; an additional 16% resulted in death. Two-thirds (65%) of the technical errors were linked to manual error, 9% to errors in judgment, and 26% to both manual and judgment error. A minority of technical errors involved advanced procedures requiring special training ("index operations"; 16%), surgeons inexperienced with the task (14%), or poorly supervised residents (9%). The majority involved experienced surgeons (73%), and occurred in routine, rather than index, operations (84%). Patient-related complexities, including emergencies, difficult or unexpected anatomy, and previous surgery, contributed to 61% of technical errors, and technology or systems failures contributed to 21%. Most technical errors occur in routine operations with experienced surgeons, under conditions of increased patient complexity or systems failure. Commonly recommended interventions, including restricting high-complexity operations to experienced surgeons, additional training for inexperienced surgeons, and stricter supervision of trainees, are likely to address only a minority of technical errors. Surgical safety research should instead focus on improving decision-making and performance in routine operations for complex patients and circumstances.

  8. SBL-Online: Implementing Studio-Based Learning Techniques in an Online Introductory Programming Course to Address Common Programming Errors and Misconceptions

    ERIC Educational Resources Information Center

    Polo, Blanca J.

    2013-01-01

    Much research has been done in regards to student programming errors, online education and studio-based learning (SBL) in computer science education. This study furthers this area by bringing together this knowledge and applying it to proactively help students overcome impasses caused by common student programming errors. This project proposes a…

  9. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting.

    PubMed

    Khan, Tarik A; Friedensohn, Simon; Gorter de Vries, Arthur R; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T

    2016-03-01

    High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion-the intraclonal diversity index-which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology.
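
    A stripped-down sketch of the UID logic underlying MAF: group reads by molecular identifier, count distinct UIDs instead of raw reads to remove amplification bias, and take a per-position majority base to cancel PCR/sequencing errors. The full MAF protocol also tags during multiplex PCR, which is not modeled here:

    ```python
    # Group reads by unique molecular identifier (UID), then build a
    # per-position majority consensus for each group. Counting UIDs rather
    # than reads removes amplification bias; the consensus removes most
    # PCR and sequencing errors. Reads are toy, equal-length strings.
    from collections import Counter, defaultdict

    def uid_consensus(tagged_reads):
        """tagged_reads: iterable of (uid, read). Returns {uid: consensus}."""
        groups = defaultdict(list)
        for uid, read in tagged_reads:
            groups[uid].append(read)
        consensus = {}
        for uid, reads in groups.items():
            consensus[uid] = ''.join(
                Counter(col).most_common(1)[0][0] for col in zip(*reads))
        return consensus

    reads = [('U1', 'ACGT'), ('U1', 'ACGT'), ('U1', 'ACTT'),  # one PCR error
             ('U2', 'ACGA')]                                   # rarer molecule
    molecules = uid_consensus(reads)
    print(len(molecules), molecules['U1'])   # 2 molecules; 'ACGT' restored
    ```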

  10. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting

    PubMed Central

    Khan, Tarik A.; Friedensohn, Simon; de Vries, Arthur R. Gorter; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T.

    2016-01-01

    High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion—the intraclonal diversity index—which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology. PMID:26998518

  11. QualComp: a new lossy compressor for quality scores based on rate distortion theory

    PubMed Central

    2013-01-01

    Background Next Generation Sequencing technologies have revolutionized many fields in biology by reducing the time and cost required for sequencing. As a result, large amounts of sequencing data are being generated. A typical sequencing data file may occupy tens or even hundreds of gigabytes of disk space, prohibitively large for many users. This data consists of both the nucleotide sequences and per-base quality scores that indicate the level of confidence in the readout of these sequences. Quality scores account for about half of the required disk space in the commonly used FASTQ format (before compression), and therefore the compression of the quality scores can significantly reduce storage requirements and speed up analysis and transmission of sequencing data. Results In this paper, we present a new scheme for the lossy compression of the quality scores, to address the problem of storage. Our framework allows the user to specify the rate (bits per quality score) prior to compression, independent of the data to be compressed. Our algorithm can work at any rate, unlike other lossy compression algorithms. We envisage our algorithm as being part of a more general compression scheme that works with the entire FASTQ file. Numerical experiments show that we can achieve a better mean squared error (MSE) for small rates (bits per quality score) than other lossy compression schemes. For the organism PhiX, whose assembled genome is known and assumed to be correct, we show that it is possible to achieve a significant reduction in size with little compromise in performance on downstream applications (e.g., alignment). Conclusions QualComp is an open source software package, written in C and freely available for download at https://sourceforge.net/projects/qualcomp. PMID:23758828
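
    The rate/distortion trade-off QualComp targets can be made concrete with a plain uniform quantizer at a user-chosen rate of B bits per quality score. QualComp itself allocates bits via a rate-distortion model of the scores; this sketch only illustrates how MSE falls as the rate grows:

    ```python
    # Quantize Phred quality scores at B bits per score and measure the MSE.
    # A plain uniform quantizer is used only to make the trade-off concrete.
    def quantize(scores, bits, lo=0, hi=41):       # typical Phred range 0..41
        levels = 2 ** bits
        step = (hi - lo) / levels
        reps = [lo + (k + 0.5) * step for k in range(levels)]  # bin centers
        out = []
        for s in scores:
            k = min(int((s - lo) / step), levels - 1)
            out.append(reps[k])
        return out

    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

    scores = [38, 40, 35, 12, 2, 27, 33, 39, 40, 41]
    for bits in (1, 2, 4):
        print(bits, round(mse(scores, quantize(scores, bits)), 2))
        # MSE shrinks as the rate (bits per quality score) grows
    ```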

  12. The Essential Component in DNA-Based Information Storage System: Robust Error-Tolerating Module

    PubMed Central

    Yim, Aldrin Kay-Yuen; Yu, Allen Chi-Shing; Li, Jing-Woei; Wong, Ada In-Chun; Loo, Jacky F. C.; Chan, King Ming; Kong, S. K.; Yip, Kevin Y.; Chan, Ting-Fung

    2014-01-01

    The size of digital data is ever increasing and is expected to grow to 40,000 EB by 2020, yet the estimated global information storage capacity in 2011 was <300 EB, indicating that most data are transient. DNA, as a very stable nano-molecule, is an ideal massive storage device for long-term data archiving. The two most notable illustrations are from Church et al. and Goldman et al., whose approaches are well optimized for most sequencing platforms – short synthesized DNA fragments without homopolymers. Here, we suggest improvements to the error-handling methodology that could enable the integration of DNA-based computational processes, e.g., algorithms based on self-assembly of DNA. As a proof of concept, a picture of 438 bytes was encoded in DNA with a low-density parity-check error-correction code. We salvaged a significant portion of sequencing reads containing mutations generated during DNA synthesis and sequencing and successfully reconstructed the entire picture. A modular programming framework – DNAcodec – with an eXtensible Markup Language-based data format is also introduced. Our experiments demonstrated the practicability of long DNA message recovery with high error tolerance, which opens the field to biocomputing and synthetic biology. PMID:25414846
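
    As a toy illustration of the encoding layer only, the sketch below maps bytes to DNA at 2 bits per base and back; the published schemes additionally avoid homopolymers and, in this paper, layer low-density parity-check coding on top. Function names are illustrative.

    ```python
    BASES = "ACGT"  # 2 bits per base: A=00, C=01, G=10, T=11

    def bytes_to_dna(data: bytes) -> str:
        """Encode each byte as 4 bases, most significant bit pair first."""
        return "".join(BASES[(b >> s) & 0b11] for b in data for s in (6, 4, 2, 0))

    def dna_to_bytes(seq: str) -> bytes:
        """Invert bytes_to_dna (sequence length must be a multiple of 4)."""
        out = bytearray()
        for i in range(0, len(seq), 4):
            b = 0
            for base in seq[i:i + 4]:
                b = (b << 2) | BASES.index(base)
            out.append(b)
        return bytes(out)

    msg = b"DNA!"
    dna = bytes_to_dna(msg)
    print(dna)                       # CACACATGCAACAGAC
    assert dna_to_bytes(dna) == msg  # round trip is lossless
    ```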

  13. Seshat: A Web service for accurate annotation, validation, and analysis of TP53 variants generated by conventional and next-generation sequencing.

    PubMed

    Tikkanen, Tuomas; Leroy, Bernard; Fournier, Jean Louis; Risques, Rosa Ana; Malcikova, Jitka; Soussi, Thierry

    2018-07-01

    Accurate annotation of genomic variants in human diseases is essential for personalized medicine. Assessment of somatic and germline TP53 alterations has now reached the clinic and is required in several circumstances, such as the identification of the most effective cancer therapy for patients with chronic lymphocytic leukemia (CLL). Here, we present Seshat, a Web service for annotating TP53 information derived from sequencing data. A flexible framework allows the use of standard file formats such as Mutation Annotation Format (MAF) or Variant Call Format (VCF), as well as common TXT files. Seshat performs accurate variant annotation using the Human Genome Variation Society (HGVS) nomenclature and the stable TP53 genomic reference provided by the Locus Reference Genomic (LRG). In addition, using the 2017 release of the UMD_TP53 database, Seshat provides multiple statistics for each TP53 variant, including database frequency, functional activity, and pathogenicity. The information is delivered in standardized output tables that minimize errors and facilitate comparison of mutational data across studies. Seshat is a beneficial tool for interpreting the ever-growing TP53 sequencing data generated by multiple sequencing platforms, and it is freely available via the TP53 Website, http://p53.fr, or directly at http://vps338341.ovh.net/. © 2018 Wiley Periodicals, Inc.
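
    Tools of this kind consume standard variant formats; as a generic illustration (not Seshat's code), the sketch below extracts the variants overlapping a gene region from a VCF file. The region boundaries shown are placeholders, not authoritative TP53 coordinates.

    ```python
    # Region boundaries are placeholders, not authoritative TP53 coordinates.
    REGION = ("17", 7_565_000, 7_591_000)

    def variants_in_region(vcf_path, region=REGION):
        """Collect VCF records overlapping a (chrom, start, end) region.
        Columns 1-5 of a VCF body line are CHROM, POS, ID, REF, ALT."""
        chrom, start, end = region
        hits = []
        with open(vcf_path) as fh:
            for line in fh:
                if line.startswith("#"):  # skip meta and header lines
                    continue
                fields = line.rstrip("\n").split("\t")
                rec_chrom = fields[0][3:] if fields[0].startswith("chr") else fields[0]
                if rec_chrom == chrom and start <= int(fields[1]) <= end:
                    hits.append({"pos": int(fields[1]),
                                 "ref": fields[3], "alt": fields[4]})
        return hits
    ```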

  14. Arabidopsis intragenomic conserved noncoding sequence

    PubMed Central

    Thomas, Brian C.; Rapaka, Lakshmi; Lyons, Eric; Pedersen, Brent; Freeling, Michael

    2007-01-01

    After the most recent tetraploidy in the Arabidopsis lineage, most gene pairs lost one, but not both, of their duplicates. We manually inspected the 3,179 retained gene pairs and their surrounding gene space still present in the genome using a custom-made viewer application. The display of these pairs allowed us to define intragenic conserved noncoding sequences (CNSs), identify exon annotation errors, and discover potentially new genes. Using a strict algorithm to sort high-scoring pair sequences from the bl2seq data, we created a database of 14,944 intragenomic Arabidopsis CNSs. The mean CNS length is 31 bp, ranging from 15 to 285 bp. There are ≈1.7 CNSs associated with a typical gene, and Arabidopsis CNSs are found in all areas around exons, most frequently in the 5′ upstream region. Gene ontology classifications related to transcription, regulation, or “response to …” external or endogenous stimuli, especially hormones, tend to be significantly overrepresented among genes containing a large number of CNSs, whereas protein localization, transport, and metabolism are common among genes with no CNSs. There is a 1.5% overlap between these CNSs and the 218,982 putative RNAs in the Arabidopsis Small RNA Project database, allowing for two mismatches. These CNSs provide a unique set of noncoding sequences enriched for function. CNS function is implied by evolutionary conservation and independently supported because CNS-richness predicts regulatory gene ontology categories. PMID:17301222

  15. Remediating Common Math Errors.

    ERIC Educational Resources Information Center

    Wagner, Rudolph F.

    1981-01-01

    Explanations and remediation suggestions for five types of mathematics errors due either to perceptual or cognitive difficulties are given. Error types include directionality problems, mirror writing, visually misperceived signs, diagnosed directionality problems, and mixed process errors. (CL)

  16. Endodontic Procedural Errors: Frequency, Type of Error, and the Most Frequently Treated Tooth.

    PubMed

    Yousuf, Waqas; Khan, Moiz; Mehdi, Hasan

    2015-01-01

    Introduction. The aim of this study is to determine the most common endodontically treated tooth and the most common error produced during treatment, and to note the association of particular errors with particular teeth. Material and Methods. Periapical radiographs were taken of all the included teeth and were stored and assessed using DIGORA Optime. Teeth in each group were evaluated for the presence or absence of procedural errors (i.e., overfill, underfill, ledge formation, perforations, apical transportation, and/or instrument separation), and the most frequently treated tooth was also noted. Results. A total of 1748 root canal treated teeth were assessed, of which 574 (32.8%) contained a procedural error. Of the total, 397 (22.7%) were overfilled, 155 (8.9%) were underfilled, 16 (0.9%) had instrument separation, and 7 (0.4%) had apical transportation. The most frequently treated tooth was the right permanent mandibular first molar (11.3%). The least commonly treated teeth were the permanent mandibular third molars (0.1%). Conclusion. Practitioners should take greater care to maintain the accuracy of the working length throughout the procedure, as errors in length accounted for the vast majority of errors, and special care should be taken when working on molars.

  17. Exploring Common Misconceptions and Errors about Fractions among College Students in Saudi Arabia

    ERIC Educational Resources Information Center

    Alghazo, Yazan M.; Alghazo, Runna

    2017-01-01

    The purpose of this study was to investigate what common errors and misconceptions about fractions exist among Saudi Arabian college students. Moreover, the study aimed at investigating the possible explanations for the existence of such misconceptions among students. A researcher developed mathematical test aimed at identifying common errors…

  18. Bellman's GAP--a language and compiler for dynamic programming in sequence analysis.

    PubMed

    Sauthoff, Georg; Möhl, Mathias; Janssen, Stefan; Giegerich, Robert

    2013-03-01

    Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error-prone and tedious. Bellman's GAP is a new programming system designed to ease the development of bioinformatics tools based on the dynamic programming technique. In Bellman's GAP, dynamic programming algorithms are described in a declarative style by tree grammars, evaluation algebras and products formed thereof. This bypasses the design of explicit dynamic programming recurrences and yields programs that are free of subscript errors, modular and easy to modify. The declarative modules are compiled into C++ code that is competitive with carefully hand-crafted implementations. This article introduces the Bellman's GAP system and its language, GAP-L. It then demonstrates the ease of development and the degree of re-use by creating variants of two common bioinformatics algorithms. Finally, it evaluates Bellman's GAP as an implementation platform for 'real-world' bioinformatics tools. Bellman's GAP is available under the GPL license from http://bibiserv.cebitec.uni-bielefeld.de/bellmansgap. This Web site includes a repository of re-usable modules for RNA folding based on thermodynamics.
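
    The separation of search space ("tree grammar") from scoring ("evaluation algebra") can be mimicked in plain Python: the same alignment recurrence below is evaluated once with a scoring algebra and once with a counting algebra. This is only a sketch of the idea, not GAP-L, and all names are illustrative.

    ```python
    from functools import lru_cache

    def align(a, b, alg):
        """One recurrence (the 'grammar'); scoring is a pluggable 'algebra'."""
        @lru_cache(maxsize=None)
        def go(i, j):
            alts = []
            if i == len(a) and j == len(b):
                alts.append(alg["empty"])
            if i < len(a) and j < len(b):
                alts.append(alg["pair"](a[i] == b[j], go(i + 1, j + 1)))
            if i < len(a):
                alts.append(alg["gap"](go(i + 1, j)))
            if j < len(b):
                alts.append(alg["gap"](go(i, j + 1)))
            return alg["choice"](alts)
        return go(0, 0)

    # Two algebras over the same search space: optimize, or count candidates.
    score = dict(empty=0, pair=lambda eq, s: s + (1 if eq else -1),
                 gap=lambda s: s - 1, choice=max)
    count = dict(empty=1, pair=lambda eq, s: s, gap=lambda s: s, choice=sum)

    print(align("ACGT", "AGT", score))  # best alignment score (2)
    print(align("ACGT", "AGT", count))  # number of candidate alignments (129)
    ```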

  19. In silico comparison of genomic regions containing genes coding for enzymes and transcription factors for the phenylpropanoid pathway in Phaseolus vulgaris L. and Glycine max L. Merr

    PubMed Central

    Reinprecht, Yarmilla; Yadegari, Zeinab; Perry, Gregory E.; Siddiqua, Mahbuba; Wright, Lori C.; McClean, Phillip E.; Pauls, K. Peter

    2013-01-01

    Legumes contain a variety of phytochemicals derived from the phenylpropanoid pathway that have important effects on human health as well as seed coat color, plant disease resistance and nodulation. However, the information about the genes involved in this important pathway is fragmentary in common bean (Phaseolus vulgaris L.). The objectives of this research were to isolate genes that function in and control the phenylpropanoid pathway in common bean, determine their genomic locations in silico in common bean and soybean, and analyze sequences of the 4CL gene family in two common bean genotypes. Sequences of phenylpropanoid pathway genes available for common bean or other plant species were aligned, and the conserved regions were used to design sequence-specific primers. The PCR products were cloned and sequenced and the gene sequences along with common bean gene-based (g) markers were BLASTed against the Glycine max v.1.0 genome and the P. vulgaris v.1.0 (Andean) early release genome. In addition, gene sequences were BLASTed against the OAC Rex (Mesoamerican) genome sequence assembly. In total, fragments of 46 structural and regulatory phenylpropanoid pathway genes were characterized in this way and placed in silico on common bean and soybean sequence maps. The maps contain over 250 common bean g and SSR (simple sequence repeat) markers and identify the positions of more than 60 additional phenylpropanoid pathway gene sequences, plus the putative locations of seed coat color genes. The majority of cloned phenylpropanoid pathway gene sequences were mapped to one location in the common bean genome but had two positions in soybean. The comparison of the genomic maps confirmed previous studies, which show that common bean and soybean share genomic regions, including those containing phenylpropanoid pathway gene sequences, with conserved synteny. Indels identified in the comparison of Andean and Mesoamerican common bean 4CL gene sequences might be used to develop inter-pool phenylpropanoid pathway gene-based markers. We anticipate that the information obtained by this study will simplify and accelerate selections of common bean with specific phenylpropanoid pathway alleles to increase the contents of beneficial phenylpropanoids in common bean and other legumes. PMID:24046770

  20. ViVaMBC: estimating viral sequence variation in complex populations from illumina deep-sequencing data using model-based clustering.

    PubMed

    Verbist, Bie; Clement, Lieven; Reumers, Joke; Thys, Kim; Vapirev, Alexander; Talloen, Willem; Wetzels, Yves; Meys, Joris; Aerssens, Jeroen; Bijnens, Luc; Thas, Olivier

    2015-02-22

    Deep sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology-associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores, which for Illumina sequencing are derived from a quadruplet of intensities, one channel for each nucleotide type. The highest intensity of the four channels determines the base that is called. Mismatched bases can often be corrected by the second-best base, i.e., the base with the second-highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second-best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets), which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. Using mixtures of HCV plasmids, we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has superb sensitivity and specificity for variants with frequencies above 0.4%. Compared with these tools, however, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4%, which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second-best base calls appeared very promising in our data exploration phase, their utility was limited. They provided a slight increase in sensitivity, which, however, does not warrant the additional computational cost of running the offline base caller. Apparently much of the information is already contained in the quality scores, enabling the model-based clustering procedure to adjust for the majority of the sequencing errors. Overall, the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low-frequency variant detection.
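
    The model's starting point is the standard relationship between Phred quality scores and error probability. A minimal sketch (not the published clustering model) of converting base qualities into the probability that a whole codon was read correctly, assuming independent per-base errors:

    ```python
    def phred_to_error_prob(q: float) -> float:
        """Phred convention: Q = -10 * log10(P_error)."""
        return 10 ** (-q / 10)

    def codon_correct_prob(quals) -> float:
        """Probability that all three bases of a codon were read correctly,
        assuming independent per-base errors (a simplifying assumption)."""
        p = 1.0
        for q in quals:
            p *= 1.0 - phred_to_error_prob(q)
        return p

    print(phred_to_error_prob(20))                     # 0.01
    print(round(codon_correct_prob([30, 30, 30]), 5))  # ~0.997
    ```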

  1. Effects of learning with explicit elaboration on implicit transfer of visuomotor sequence learning.

    PubMed

    Tanaka, Kanji; Watanabe, Katsumi

    2013-08-01

    Intervals between stimuli and/or responses have significant influences on sequential learning. In the present study, we investigated whether transfer would occur even when the intervals and the visual configurations in a sequence were drastically changed, so that participants did not notice that the required sequences of responses were identical. In the experiment, two (or three) sequential button presses comprised a "set," and nine (or six) consecutive sets comprised a "hyperset." In the first session, participants learned either a 2 × 9 or 3 × 6 hyperset by trial and error until they completed it 20 times without error. In the second session, the 2 × 9 (3 × 6) hyperset was changed into the 3 × 6 (2 × 9) hyperset, resulting in different visual configurations and intervals between stimuli and responses. Participants were assigned to two groups: the Identical and Random groups. In the Identical group, the sequence (i.e., the buttons to be pressed) in the second session was identical to that in the first session. In the Random group, a new hyperset was learned. Even in the Identical group, no participants noticed that the sequences were identical. Nevertheless, a significant transfer of performance occurred. However, in a subsequent experiment that did not require explicit trial-and-error learning in the first session, implicit transfer in the second session did not occur. These results indicate that learning with explicit elaboration strengthens the implicit representation of the sequence order as a whole; this might occur independently of the intervals between elements and enable implicit transfer.

  2. Proteomic Identification of Monoclonal Antibodies from Serum

    PubMed Central

    2015-01-01

    Characterizing the in vivo dynamics of the polyclonal antibody repertoire in serum, such as that which might arise in response to stimulation with an antigen, is difficult due to the presence of many highly similar immunoglobulin proteins, each specified by distinct B lymphocytes. These challenges have precluded the use of conventional mass spectrometry for antibody identification based on peptide mass spectral matches to a genomic reference database. Recently, progress has been made using bottom-up analysis of serum antibodies by nanoflow liquid chromatography/high-resolution tandem mass spectrometry combined with a sample-specific antibody sequence database generated by high-throughput sequencing of individual B cell immunoglobulin variable domains (V genes). Here, we describe how intrinsic features of antibody primary structure, most notably the interspersed segments of variable and conserved amino acid sequences, generate recurring patterns in the corresponding peptide mass spectra of V gene peptides, greatly complicating the assignment of correct sequences to mass spectral data. We show that the standard method of decoy-based error modeling fails to account for the error introduced by these highly similar sequences, leading to a significant underestimation of the false discovery rate. Because of these effects, antibody-derived peptide mass spectra require increased stringency in their interpretation. The use of filters based on the mean precursor ion mass accuracy of peptide-spectrum matches is shown to be particularly effective in distinguishing between “true” and “false” identifications. These findings highlight important caveats associated with the use of standard database search and error-modeling methods with nonstandard data sets and custom sequence databases. PMID:24684310

  3. Adaptive decoding of convolutional codes

    NASA Astrophysics Data System (ADS)

    Hueske, K.; Geldmacher, J.; Götze, J.

    2007-06-01

    Convolutional codes, which are frequently used as error-correction codes in digital transmission systems, are generally decoded using the Viterbi decoder. On the one hand, the Viterbi decoder is an optimal maximum-likelihood decoder, i.e., the most probable transmitted code sequence is obtained. On the other hand, the mathematical complexity of the algorithm depends only on the code used, not on the number of transmission errors. To reduce the complexity of the decoding process under good transmission conditions, an alternative syndrome-based decoder is presented. The reduction in complexity is realized by two different approaches: syndrome zero-sequence deactivation and path-metric equalization. The two approaches enable an easy adaptation of the decoding complexity to different transmission conditions, resulting in a trade-off between decoding complexity and error-correction performance.
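
    For reference, a hard-decision Viterbi decoder for the textbook rate-1/2, constraint-length-3 code (generators 7 and 5 in octal) can be sketched as below; this is the standard baseline algorithm the paper starts from, not the proposed syndrome-based decoder.

    ```python
    def encode(bits):
        """Rate-1/2, constraint-length-3 encoder, generators 7 and 5 (octal)."""
        s1 = s2 = 0
        out = []
        for u in bits:
            out += [u ^ s1 ^ s2, u ^ s2]   # generator 111, generator 101
            s1, s2 = u, s1
        return out

    def viterbi_decode(received):
        """Hard-decision Viterbi decoding: minimum-Hamming-distance path."""
        INF = float("inf")
        metric = [0, INF, INF, INF]        # trellis starts in the all-zero state
        paths = [[], [], [], []]
        for i in range(0, len(received), 2):
            r0, r1 = received[i], received[i + 1]
            new_metric, new_paths = [INF] * 4, [None] * 4
            for s in range(4):
                if metric[s] == INF:
                    continue
                s1, s2 = s >> 1, s & 1
                for u in (0, 1):
                    branch = (r0 != u ^ s1 ^ s2) + (r1 != u ^ s2)
                    nxt = (u << 1) | s1
                    if metric[s] + branch < new_metric[nxt]:
                        new_metric[nxt] = metric[s] + branch
                        new_paths[nxt] = paths[s] + [u]
            metric, paths = new_metric, new_paths
        return paths[min(range(4), key=lambda s: metric[s])]

    msg = [1, 0, 1, 1, 0, 0]               # last two bits flush the encoder
    code = encode(msg)
    code[3] ^= 1                           # inject a single channel error
    assert viterbi_decode(code) == msg     # the error is corrected
    ```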

  4. Short RNA indicator sequences are not completely degraded by autoclaving

    PubMed Central

    Unnithan, Veena V.; Unc, Adrian; Joe, Valerisa; Smith, Geoffrey B.

    2014-01-01

    Short indicator RNA sequences (<100 bp) persist after autoclaving and are recovered intact by molecular amplification. Primers targeting longer sequences are more likely to produce false positives due to amplification errors, which are easily verified by melting curve analyses. If short indicator RNA sequences are used for virus identification and quantification, then a post-autoclave RNA degradation methodology should be employed, which may include further autoclaving. PMID:24518856

  5. Acquisition of initial /s/-stop and stop-/s/ sequences in Greek.

    PubMed

    Syrika, Asimina; Nicolaidis, Katerina; Edwards, Jan; Beckman, Mary E

    2011-09-01

    Previous work on children's acquisition of complex sequences points to a tendency for affricates to be acquired before clusters, but there is no clear evidence of a difference in order of acquisition between clusters with /s/ that violate the Sonority Sequencing Principle (SSP), such as /s/ followed by stop in onset position, and other clusters that obey the SSP. One problem with studies that have compared the acquisition of SSP-obeying and SSP-violating clusters is that the component sounds in the two types of sequences were different. This paper examines the acquisition of initial /s/-stop and stop-/s/ sequences by sixty Greek children aged 2 through 5 years. Results showed greater accuracy for the /s/-stop relative to the stop-/s/ sequences, but no difference in accuracy between /ts/, which is usually analyzed as an affricate in Greek, and the other stop-/s/ sequences. Moreover, errors for the /s/-stop sequences and /ts/ primarily involved stop substitutions, whereas errors for /ps/ and /ks/ were more variable and often involved fricative substitutions, a pattern which may have a perceptual explanation. Finally, /ts/ showed a distinct temporal pattern relative to the stop-/s/ clusters /ps/ and /ks/, similar to what has been reported for productions of Greek adults.

  6. Acquisition of initial /s/-stop and stop-/s/ sequences in Greek

    PubMed Central

    Syrika, Asimina; Nicolaidis, Katerina; Edwards, Jan; Beckman, Mary E.

    2010-01-01

    Previous work on children’s acquisition of complex sequences points to a tendency for affricates to be acquired before clusters, but there is no clear evidence of a difference in order of acquisition between clusters with /s/ that violate the Sonority Sequencing Principle (SSP), such as /s/ followed by stop in onset position, and other clusters that obey the SSP. One problem with studies that have compared the acquisition of SSP-obeying and SSP-violating clusters is that the component sounds in the two types of sequences were different. This paper examines the acquisition of initial /s/-stop and stop-/s/ sequences by sixty Greek children aged 2 through 5 years. Results showed greater accuracy for the /s/-stop relative to the stop-/s/ sequences, but no difference in accuracy between /ts/, which is usually analyzed as an affricate in Greek, and the other stop-/s/ sequences. Moreover, errors for the /s/-stop sequences and /ts/ primarily involved stop substitutions, whereas errors for /ps/ and /ks/ were more variable and often involved fricative substitutions, a pattern which may have a perceptual explanation. Finally, /ts/ showed a distinct temporal pattern relative to the stop-/s/ clusters /ps/ and /ks/, similarly to what has been reported for productions of Greek adults. PMID:22070044

  7. Ultrasound biofeedback treatment for persisting childhood apraxia of speech.

    PubMed

    Preston, Jonathan L; Brick, Nickole; Landi, Nicole

    2013-11-01

    The purpose of this study was to evaluate the efficacy of a treatment program that includes ultrasound biofeedback for children with persisting speech sound errors associated with childhood apraxia of speech (CAS). Six children ages 9-15 years participated in a multiple baseline experiment for 18 treatment sessions during which treatment focused on producing sequences involving lingual sounds. Children were cued to modify their tongue movements using visual feedback from real-time ultrasound images. Probe data were collected before, during, and after treatment to assess word-level accuracy for treated and untreated sound sequences. As participants reached preestablished performance criteria, new sequences were introduced into treatment. All participants met the performance criterion (80% accuracy for 2 consecutive sessions) on at least 2 treated sound sequences. Across the 6 participants, performance criterion was met for 23 of 31 treated sequences in an average of 5 sessions. Some participants showed no improvement in untreated sequences, whereas others showed generalization to untreated sequences that were phonetically similar to the treated sequences. Most gains were maintained 2 months after the end of treatment. The percentage of phonemes correct increased significantly from pretreatment to the 2-month follow-up. A treatment program including ultrasound biofeedback is a viable option for improving speech sound accuracy in children with persisting speech sound errors associated with CAS.

  8. Implicit and explicit motor sequence learning in children born very preterm.

    PubMed

    Jongbloed-Pereboom, Marjolein; Janssen, Anjo J W M; Steiner, K; Steenbergen, Bert; Nijhuis-van der Sanden, Maria W G

    2017-01-01

    Motor skills can be learned explicitly (dependent on working memory (WM)) or implicitly (relatively independent of WM). Children born very preterm (VPT) often have working memory deficits, so explicit learning may be compromised in these children. This study investigated implicit and explicit motor learning and the role of working memory in VPT children and controls. Three groups (6-9 years) participated: 20 VPT children with motor problems, 20 VPT children without motor problems, and 20 controls. A nine-button sequence was learned implicitly (pressing the lighted button as quickly as possible) and explicitly (discovering the sequence via trial and error). Children learned both implicitly and explicitly, as evidenced by decreased movement duration of the sequence over time. In the explicit condition, children also reduced the number of errors over time. Controls made more errors than VPT children without motor problems. Visual WM had positive effects on both explicit and implicit performance. VPT birth and low motor proficiency did not negatively affect implicit or explicit learning. Visual WM was positively related to both implicit and explicit performance but did not influence learning curves. These findings question the theoretical difference between implicit and explicit learning and the proposed role of visual WM therein. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Physical layer one-time-pad data encryption through synchronized semiconductor laser networks

    NASA Astrophysics Data System (ADS)

    Argyris, Apostolos; Pikasis, Evangelos; Syvridis, Dimitris

    2016-02-01

    Semiconductor lasers (SLs) have proven to be a key device in the generation of ultrafast true random bit streams. Their potential to emit chaotic signals with desirable statistics establishes them as a low-cost solution covering various needs, from large-volume key generation to real-time encrypted communications. Usually, only undemanding post-processing is needed to convert the acquired analog time series into digital sequences that pass all established tests of randomness. A novel architecture that can generate and exploit these true random sequences is a fiber network in which the nodes are semiconductor lasers coupled and synchronized to a central hub laser. In this work we show experimentally that laser nodes in such a star network topology can synchronize with each other through complex broadband signals that seed true random bit sequences (TRBSs) generated at several Gb/s. The ability of each node to access, through the fiber-optic network, random bit streams generated in real time and synchronized with the rest of the nodes allows the implementation of a one-time-pad encryption protocol that mixes the synchronized true random bit sequence with real data at Gb/s rates. Forward-error-correction methods are used to reduce the errors in the TRBS and the final error rate at the data-decoding level. An appropriate selection of the sampling methodology and of the physical properties of the chaotic seed signal through which the network locks in synchronization allows error-free performance.
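
    The encryption layer itself is plain one-time-pad XOR once both nodes hold the same synchronized keystream; the paper's contribution lies in generating and synchronizing that keystream optically and in correcting residual keystream mismatches. A minimal sketch, with a locally generated random pad standing in for the shared keystream:

    ```python
    import secrets

    def xor_bytes(data: bytes, pad: bytes) -> bytes:
        """One-time pad: ciphertext = data XOR pad; decryption is the same XOR."""
        assert len(pad) >= len(data), "pad must be at least as long as the message"
        return bytes(d ^ k for d, k in zip(data, pad))

    # Both nodes would derive `pad` from the synchronized chaotic keystream;
    # a locally generated random pad stands in for it here.
    msg = b"encrypt me"
    pad = secrets.token_bytes(len(msg))
    ciphertext = xor_bytes(msg, pad)
    assert xor_bytes(ciphertext, pad) == msg
    ```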

  10. The Language of Scholarship: How to Rapidly Locate and Avoid Common APA Errors.

    PubMed

    Freysteinson, Wyona M; Krepper, Rebecca; Mellott, Susan

    2015-10-01

    This article is relevant for nurses and nursing students who are writing scholarly documents for work, school, or publication and who have a basic understanding of American Psychological Association (APA) style. Common APA errors on the reference list and in citations within the text are reviewed. Methods to quickly find and reduce those errors are shared. Copyright 2015, SLACK Incorporated.

  11. What Do Spelling Errors Tell Us? Classification and Analysis of Errors Made by Greek Schoolchildren with and without Dyslexia

    ERIC Educational Resources Information Center

    Protopapas, Athanassios; Fakou, Aikaterini; Drakopoulou, Styliani; Skaloumbakas, Christos; Mouzaki, Angeliki

    2013-01-01

    In this study we propose a classification system for spelling errors and determine the most common spelling difficulties of Greek children with and without dyslexia. Spelling skills of 542 children from the general population and 44 children with dyslexia, Grades 3-4 and 7, were assessed with a dictated common word list and age-appropriate…

  12. Generalized Structured Component Analysis with Uniqueness Terms for Accommodating Measurement Error

    PubMed Central

    Hwang, Heungsun; Takane, Yoshio; Jung, Kwanghee

    2017-01-01

    Generalized structured component analysis (GSCA) is a component-based approach to structural equation modeling (SEM), where latent variables are approximated by weighted composites of indicators. It has no formal mechanism to incorporate errors in indicators, which in turn renders components prone to the errors as well. We propose to extend GSCA to account for errors in indicators explicitly. This extension, called GSCAM, considers both common and unique parts of indicators, as postulated in common factor analysis, and estimates a weighted composite of indicators with their unique parts removed. Adding such unique parts or uniqueness terms serves to account for measurement errors in indicators in a manner similar to common factor analysis. Simulation studies are conducted to compare parameter recovery of GSCAM and existing methods. These methods are also applied to fit a substantively well-established model to real data. PMID:29270146

  13. JPL-ANTOPT antenna structure optimization program

    NASA Technical Reports Server (NTRS)

    Strain, D. M.

    1994-01-01

    New antenna path-length-error and pointing-error structure optimization codes were recently added to the MSC/NASTRAN structural analysis computer program. Path-length and pointing errors are important measures of structure-related antenna performance. The path-length and pointing errors are treated as scalar displacements for static loading cases. These scalar displacements can be subject to constraint during the optimization process. Path-length and pointing-error calculations supplement the other optimization and sensitivity capabilities of NASTRAN. The analysis and design functions were implemented as 'DMAP ALTERs' to the Design Optimization (SOL 200) Solution Sequence of MSC/NASTRAN, Version 67.5.

  14. Round-off errors in cutting plane algorithms based on the revised simplex procedure

    NASA Technical Reports Server (NTRS)

    Moore, J. E.

    1973-01-01

    This report statistically analyzes computational round-off errors associated with the cutting-plane approach to solving linear integer programming problems. Cutting-plane methods require that the inverse of a sequence of matrices be computed. The problem basically reduces to one of minimizing round-off errors in the sequence of inverses. Two procedures for minimizing this problem are presented, and their influence on error accumulation is statistically analyzed. One procedure employs a very small tolerance factor to round computed values to zero. The other procedure is a numerical analysis technique for reinverting or improving the approximate inverse of a matrix. The results indicated that round-off accumulation can be effectively minimized by employing a tolerance factor which reflects the number of significant digits carried for each calculation and by applying the reinversion procedure once to each computed inverse. If 18 significant digits plus an exponent are carried for each variable during computations, then a tolerance value of 0.1 × 10⁻¹² is reasonable.
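
    Both procedures can be sketched with standard numerical tools: a tolerance that rounds near-zero entries to exact zeros, and one step of the classical Newton-Schulz iteration X <- X(2I - AX) for improving an approximate inverse. The report does not specify its exact reinversion variant, so the choice of Newton-Schulz here is an assumption, and all names are illustrative.

    ```python
    import numpy as np

    TOL = 1e-13  # a tolerance reflecting ~18 significant digits carried

    def round_to_zero(M, tol=TOL):
        """Zero out entries small enough to be round-off noise."""
        M = M.copy()
        M[np.abs(M) < tol] = 0.0
        return M

    def refine_inverse(A, X):
        """One Newton-Schulz step, X <- X(2I - AX), improves an approximate inverse."""
        I = np.eye(A.shape[0])
        return X @ (2 * I - A @ X)

    rng = np.random.default_rng(0)
    A = np.array([[4.0, 1.0], [2.0, 3.0]])
    X = np.linalg.inv(A) + 1e-6 * rng.standard_normal((2, 2))  # corrupted inverse
    X1 = refine_inverse(A, X)
    print(np.linalg.norm(A @ X - np.eye(2)))   # residual before, ~1e-6
    print(np.linalg.norm(A @ X1 - np.eye(2)))  # residual after, roughly squared
    ```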

  15. QSRA: a quality-value guided de novo short read assembler.

    PubMed

    Bryant, Douglas W; Wong, Weng-Keen; Mockler, Todd C

    2009-02-24

    New rapid high-throughput sequencing technologies have sparked the creation of a new class of assembler. Since all high-throughput sequencing platforms incorporate errors in their output, short-read assemblers must be designed to account for this error while utilizing all available data. We have designed and implemented an assembler, Quality-value guided Short Read Assembler, created to take advantage of quality-value scores as a further method of dealing with error. Compared to previous published algorithms, our assembler shows significant improvements not only in speed but also in output quality. QSRA generally produced the highest genomic coverage, while being faster than VCAKE. QSRA is extremely competitive in its longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA provides a step closer to the goal of de novo assembly of complex genomes, improving upon the original VCAKE algorithm by not only drastically reducing runtimes but also increasing the viability of the assembly algorithm through further error handling capabilities.

  16. Medication errors as malpractice-a qualitative content analysis of 585 medication errors by nurses in Sweden.

    PubMed

    Björkstén, Karin Sparring; Bergqvist, Monica; Andersén-Karlsson, Eva; Benson, Lina; Ulfvarson, Johanna

    2016-08-24

    Many studies address the prevalence of medication errors, but few address medication errors serious enough to be regarded as malpractice. Other studies have analyzed the individual and system contributory factors leading to a medication error. Nurses have a key role in medication administration, and there are contradictory reports on nurses' work experience in relation to the risk and type of medication errors. All medication errors where a nurse was held responsible for malpractice (n = 585) during 11 years in Sweden were included. A qualitative content analysis and classification according to the type and the individual and system contributory factors was made. In order to test for possible differences between nurses' work experience and associations within and between the errors and contributory factors, Fisher's exact test was used, and Cohen's kappa (k) was performed to estimate the magnitude and direction of the associations. There were a total of 613 medication errors in the 585 cases, the most common being "Wrong dose" (41%), "Wrong patient" (13%) and "Omission of drug" (12%). In 95% of the cases, an average of 1.4 individual contributory factors was found; the most common being "Negligence, forgetfulness or lack of attentiveness" (68%), "Proper protocol not followed" (25%), "Lack of knowledge" (13%) and "Practice beyond scope" (12%). In 78% of the cases, an average of 1.7 system contributory factors was found; the most common being "Role overload" (36%), "Unclear communication or orders" (30%) and "Lack of adequate access to guidelines or unclear organisational routines" (30%). The errors "Wrong patient due to mix-up of patients" and "Wrong route" and the contributory factors "Lack of knowledge" and "Negligence, forgetfulness or lack of attentiveness" were more common among less experienced nurses. The experienced nurses were more prone to "Practice beyond scope" and to making errors in spite of "Lack of adequate access to guidelines or unclear organisational routines". Medication errors regarded as malpractice in Sweden were of the same character as medication errors worldwide. A complex interplay between individual and system factors often contributed to the errors.

  17. Explicit instruction of rules interferes with visuomotor skill transfer.

    PubMed

    Tanaka, Kanji; Watanabe, Katsumi

    2017-06-01

    In the present study, we examined the effects of explicit knowledge, obtained through instruction or spontaneous detection, on the transfer of visuomotor sequence learning. In the learning session, participants learned a visuomotor sequence, via trial and error. In the transfer session, the order of the sequence was reversed from that of the learning session. Before the commencement of the transfer session, some participants received explicit instruction regarding the reversal rule (i.e., Instruction group), while the others did not receive any information and were sorted into either an Aware or Unaware group, as assessed by interview conducted after the transfer session. Participants in the Instruction and Aware groups performed with fewer errors than the Unaware group in the transfer session. The participants in the Instruction group showed slower speed than the Aware and Unaware groups in the transfer session, and the sluggishness likely persisted even in late learning. These results suggest that explicit knowledge reduces errors in visuomotor skill transfer, but may interfere with performance speed, particularly when explicit knowledge is provided, as opposed to being spontaneously discovered.

  18. Mapping DNA methylation by transverse current sequencing: Reduction of noise from neighboring nucleotides

    NASA Astrophysics Data System (ADS)

    Alvarez, Jose; Massey, Steven; Kalitsov, Alan; Velev, Julian

    Nanopore sequencing via transverse current has emerged as a competitive candidate for mapping DNA methylation without the need for bisulfite treatment, fluorescent tags, or PCR amplification. By eliminating the error-producing amplification step, long read lengths become feasible, which greatly simplifies the assembly process and reduces the time and cost inherent in current technologies. However, due to the large error rates of nanopore sequencing, single-base resolution has not been reached. A very important source of noise is the intrinsic structural noise in the electric signature of a nucleotide arising from the influence of neighboring nucleotides. In this work we perform calculations of the tunneling current through DNA molecules in nanopores using the non-equilibrium electron transport method within an effective multi-orbital tight-binding model derived from first-principles calculations. We develop a base-calling algorithm that accounts for the correlations of the current through neighboring bases, which in principle can reduce the error rate below any desired precision. Using this method we show that we can clearly distinguish DNA methylation and other base modifications based on the reading of the tunneling current.

  19. Prevalence of refractive errors in Möbius sequence.

    PubMed

    Cronemberger, Monica Fialho; Polati, Mariza; Debert, Iara; Mendonça, Tomás Scalamandré; Souza-Dias, Carlos; Miller, Marilyn; Ventura, Liana Oliveira; Nakanami, Célia Regina; Goldchmit, Mauro

    2013-01-01

    To assess the prevalence of refractive errors in Möbius sequence. This study was carried out during the Annual Meeting of the Brazilian Möbius Society in November 2008. Forty-four patients diagnosed with Möbius sequence underwent a comprehensive assessment in the following specialties: ophthalmology, neurology, genetics, psychiatry, psychology and dentistry. Forty-three patients were cooperative and able to complete the ophthalmological examination. Twenty-two (51.2%) were male and 21 (48.8%) were female. The average age was 8.3 years (range, 2 to 17 years). Visual acuity was evaluated using a retro-illuminated logMAR chart in cooperative patients. All children underwent examination of ocular motility, cycloplegic refraction, and the fundus. Of the total of 85 eyes, by spherical equivalent, the majority (57.6%) were emmetropic (>-0.50 D and <+2.00 D). The prevalence of astigmatism greater than or equal to 0.75 D was 40%. The prevalence of refractive errors, by spherical equivalent, was 42.4% in this study group.

  20. A short note on dynamic programming in a band.

    PubMed

    Gibrat, Jean-François

    2018-06-15

    Third-generation sequencing technologies generate long reads that exhibit high error rates, in particular for insertions and deletions, which are usually the most difficult errors to cope with. The only exact algorithm capable of aligning sequences with insertions and deletions is a dynamic programming algorithm. In this note, for the sake of efficiency, we consider dynamic programming in a band. We show how to choose the band width as a function of the long reads' error rates, thus obtaining an [Formula: see text] algorithm in space and time. We also propose a procedure to decide whether this algorithm, when applied to semi-global alignments, provides the optimal score. We suggest that dynamic programming in a band is well suited to the problem of aligning long reads against each other and can be used as a core component of methods for obtaining a consensus sequence from the long reads alone. The function implementing the banded dynamic programming algorithm is available as a standalone program at: https://forgemia.inra.fr/jean-francois.gibrat/BAND_DYN_PROG.git.
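
    The idea can be sketched as an edit-distance computation confined to a diagonal band whose half-width is scaled from an assumed error rate; the note's actual band-width rule and its optimality test differ in detail, so treat the choice below as illustrative.

    ```python
    def banded_edit_distance(a: str, b: str, error_rate: float = 0.15) -> int:
        """Edit distance confined to a diagonal band of half-width w.
        w is scaled from the expected indel count (an illustrative choice,
        not the rule derived in the note). Cells outside the band stay at
        infinity, so the recurrence never leaves the band."""
        n, m = len(a), len(b)
        w = max(abs(n - m), int(2 * error_rate * max(n, m))) + 1
        INF = float("inf")
        D = [[INF] * (m + 1) for _ in range(n + 1)]
        for j in range(min(m, w) + 1):
            D[0][j] = j
        for i in range(1, n + 1):
            if i <= w:
                D[i][0] = i
            for j in range(max(1, i - w), min(m, i + w) + 1):
                D[i][j] = min(D[i - 1][j] + 1,                           # deletion
                              D[i][j - 1] + 1,                           # insertion
                              D[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # sub/match
        return int(D[n][m])

    print(banded_edit_distance("ACGTTAGC", "ACGTAGC"))  # 1
    ```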

  1. First order error corrections in common introductory physics experiments

    NASA Astrophysics Data System (ADS)

    Beckey, Jacob; Baker, Andrew; Aravind, Vasudeva; Clarion Team

    As part of introductory physics courses, students perform various standard lab experiments. Almost all of these experiments are prone to errors owing to factors like friction, misalignment of equipment, air drag, etc. Usually these types of errors are ignored by students, and little thought is given to their sources. However, paying attention to the factors that give rise to errors helps students build better physics models and understand the physical phenomena behind experiments in more detail. In this work, we explore common causes of errors in introductory physics experiments and suggest changes that will mitigate the errors, or suggest models that take the sources of these errors into consideration. This work helps students build better, more refined physical models and understand physics concepts in greater detail. We thank the Clarion University undergraduate student grant for financial support of this project.

  2. Vocal Generalization Depends on Gesture Identity and Sequence

    PubMed Central

    Sober, Samuel J.

    2014-01-01

    Generalization, the brain's ability to transfer motor learning from one context to another, occurs in a wide range of complex behaviors. However, the rules of generalization in vocal behavior are poorly understood, and it is unknown how vocal learning generalizes across an animal's entire repertoire of natural vocalizations and sequences. Here, we asked whether generalization occurs in a nonhuman vocal learner and quantified its properties. We hypothesized that adaptive error correction of a vocal gesture produced in one sequence would generalize to the same gesture produced in other sequences. To test our hypothesis, we manipulated the fundamental frequency (pitch) of auditory feedback in Bengalese finches (Lonchura striata var. domestica) to create sensory errors during vocal gestures (song syllables) produced in particular sequences. As hypothesized, error-corrective learning on pitch-shifted vocal gestures generalized to the same gestures produced in other sequential contexts. Surprisingly, generalization magnitude depended strongly on sequential distance from the pitch-shifted syllables, with greater adaptation for gestures produced near to the pitch-shifted syllable. A further unexpected result was that nonshifted syllables changed their pitch in the direction opposite from the shifted syllables. This apparently antiadaptive pattern of generalization could not be explained by correlations between generalization and the acoustic similarity to the pitch-shifted syllable. These findings therefore suggest that generalization depends on the type of vocal gesture and its sequential context relative to other gestures and may reflect an advantageous strategy for vocal learning and maintenance. PMID:24741046

  3. Aging and the intrusion superiority effect in visuo-spatial working memory.

    PubMed

    Cornoldi, Cesare; Bassani, Chiara; Berto, Rita; Mammarella, Nicola

    2007-01-01

    This study investigated the active component of visuo-spatial working memory (VSWM) in younger and older adults, testing the hypotheses that elderly individuals perform more poorly than younger ones and that errors in active VSWM tasks depend, at least partially, on difficulties in avoiding intrusions (i.e., avoiding already-activated information). In two experiments, participants were presented with sequences of matrices on which three positions were pointed out sequentially: their task was to process all the positions but indicate only the final position of each sequence. Results showed a poorer performance in the elderly compared to the younger group and a higher number of intrusion errors (errors due to activated but irrelevant positions) than invention errors (errors consisting of pointing out a position never indicated by the experimenter). The number of errors increased when a concurrent task was introduced (Experiment 1) and was affected by different patterns of matrices (Experiment 2). In general, the results show that elderly people have impaired VSWM and produce a large number of errors due to inhibition failures. However, both the younger and the older adults' visuo-spatial working memory was affected by the presence of activated irrelevant information, the reduction of available resources, and task constraints.

  4. An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data

    PubMed Central

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M.; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A. V. S. K.; Varshney, Rajeev K.

    2014-01-01

    Open-source single nucleotide polymorphism (SNP) discovery pipelines for next-generation sequencing data commonly require working knowledge of command-line interfaces, massive computational resources and expertise, which is daunting for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open-source next-generation sequencing (NGS) tools along with a graphical user interface, called Integrated SNP Mining and Utilization (ISMU), for SNP discovery and utilization in developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open-source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high-quality SNPs between all pairwise combinations of the genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and of errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets, such as whole-genome re-sequencing, restriction-site-associated DNA sequencing and transcriptome sequencing data, at high speed. The pipeline is very useful for the plant genetics and breeding community with no computational expertise, enabling SNP discovery and utilization in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge next-generation sequencing datasets. It has been developed in Java and is available at http://hpc.icrisat.cgiar.org/ISMU as standalone free software. PMID:25003610

  5. Phonological learning in semantic dementia.

    PubMed

    Jefferies, Elizabeth; Bott, Samantha; Ehsan, Sheeba; Lambon Ralph, Matthew A

    2011-04-01

    Patients with semantic dementia (SD) have anterior temporal lobe (ATL) atrophy that gives rise to a highly selective deterioration of semantic knowledge. Despite pronounced anomia and poor comprehension of words and pictures, SD patients have well-formed, fluent speech and normal digit span. Given the intimate connection between phonological STM and word learning revealed by both neuropsychological and developmental studies, SD patients might be expected to show good acquisition of new phonological forms, even though their ability to map these onto meanings is impaired. In contradiction of these predictions, a limited amount of previous research has found poor learning of new phonological forms in SD. In a series of experiments, we examined whether SD patient, GE, could learn novel phonological sequences and, if so, under which circumstances. GE showed normal benefits of phonological knowledge in STM (i.e., normal phonotactic frequency and phonological similarity effects) but reduced support from semantic memory (i.e., poor immediate serial recall for semantically degraded words, characterised by frequent item errors). Next, we demonstrated normal learning of serial order information for repeated lists of single-digit number words using the Hebb paradigm: these items were well-understood allowing them to be repeated without frequent item errors. In contrast, patient GE showed little learning of nonsense syllable sequences using the same Hebb paradigm. Detailed analysis revealed that both GE and the controls showed a tendency to learn their own errors as opposed to the target items. Finally, we showed normal learning of phonological sequences for GE when he was prevented from repeating his errors. These findings confirm that the ATL atrophy in SD disrupts phonological processing for semantically degraded words but leaves the phonological architecture intact. Consequently, when item errors are minimised, phonological STM can support the acquisition of new phoneme sequences in patients with SD. Copyright © 2011 Elsevier Ltd. All rights reserved.

  6. Data compression of discrete sequence: A tree based approach using dynamic programming

    NASA Technical Reports Server (NTRS)

    Shivaram, Gurusrasad; Seetharaman, Guna; Rao, T. R. N.

    1994-01-01

    A dynamic programming based approach for data compression of a 1D sequence is presented. The compression of an input sequence of size N to a smaller size k is achieved by dividing the input sequence into k subsequences and replacing each subsequence by its average value. The partitioning of the input sequence is carried out with the intention of minimizing the mean squared error in the reconstructed sequence. The complexity involved in finding the partition that yields such an optimal compressed sequence is reduced by using the dynamic programming approach, which is presented.
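
    A minimal sketch of the described scheme: dynamic programming over all ways to cut the sequence into k contiguous runs, each replaced by its average (minimizing total squared error is equivalent to minimizing MSE). The O(kN²) formulation below favors clarity over speed, and all names are illustrative.

    ```python
    def compress_to_k(x, k):
        """Partition x into k contiguous segments, each replaced by its mean,
        minimizing the total squared reconstruction error.
        Returns (min_error, list of (start, end) segment boundaries)."""
        n = len(x)
        # Prefix sums of x and x^2 give any segment's squared error in O(1).
        S, Q = [0.0] * (n + 1), [0.0] * (n + 1)
        for i, v in enumerate(x):
            S[i + 1], Q[i + 1] = S[i] + v, Q[i] + v * v

        def sse(a, b):  # squared error of x[a:b] replaced by its mean
            s, q, m = S[b] - S[a], Q[b] - Q[a], b - a
            return q - s * s / m

        INF = float("inf")
        E = [[INF] * (n + 1) for _ in range(k + 1)]
        cut = [[0] * (n + 1) for _ in range(k + 1)]
        E[0][0] = 0.0
        for j in range(1, k + 1):
            for i in range(j, n + 1):
                for t in range(j - 1, i):
                    c = E[j - 1][t] + sse(t, i)
                    if c < E[j][i]:
                        E[j][i], cut[j][i] = c, t
        # Recover the k segments by walking the cut table backwards.
        bounds, i = [], n
        for j in range(k, 0, -1):
            bounds.append((cut[j][i], i))
            i = cut[j][i]
        return E[k][n], bounds[::-1]

    err, segs = compress_to_k([1, 1, 2, 9, 10, 10, 3, 3], 3)
    print(round(err, 2), segs)  # 1.33 [(0, 3), (3, 6), (6, 8)]
    ```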

  7. Impact of Internally Developed Electronic Prescription on Prescribing Errors at Discharge from the Emergency Department

    PubMed Central

    Hitti, Eveline; Tamim, Hani; Bakhti, Rinad; Zebian, Dina; Mufarrij, Afif

    2017-01-01

    Introduction. Medication errors are common, with studies reporting at least one error per patient encounter. At hospital discharge, medication errors vary from 15%–38%. However, studies assessing the effect of an internally developed electronic (E)-prescription system at discharge from an emergency department (ED) are comparatively minimal. Additionally, commercially available electronic solutions are cost-prohibitive in many resource-limited settings. We assessed the impact of introducing an internally developed, low-cost E-prescription system, with a list of commonly prescribed medications, on prescription error rates at discharge from the ED, compared to handwritten prescriptions. Methods. We conducted a pre- and post-intervention study comparing error rates in a randomly selected sample of discharge prescriptions (handwritten versus electronic) five months pre and four months post the introduction of the E-prescription. The internally developed, E-prescription system included a list of 166 commonly prescribed medications with the generic name, strength, dose, frequency and duration. We included a total of 2,883 prescriptions in this study: 1,475 in the pre-intervention phase were handwritten (HW) and 1,408 in the post-intervention phase were electronic. We calculated rates of 14 different errors and compared them between the pre- and post-intervention period. Results. Overall, E-prescriptions included fewer prescription errors as compared to HW-prescriptions. Specifically, E-prescriptions reduced missing dose (11.3% to 4.3%, p <0.0001), missing frequency (3.5% to 2.2%, p=0.04), missing strength errors (32.4% to 10.2%, p <0.0001) and legibility (0.7% to 0.2%, p=0.005). E-prescriptions, however, were associated with a significant increase in duplication errors, specifically with home medication (1.7% to 3%, p=0.02). Conclusion. A basic, internally developed E-prescription system, featuring commonly used medications, effectively reduced medication errors in a low-resource setting where the costs of sophisticated commercial electronic solutions are prohibitive. PMID:28874948

  8. Impact of Internally Developed Electronic Prescription on Prescribing Errors at Discharge from the Emergency Department.

    PubMed

    Hitti, Eveline; Tamim, Hani; Bakhti, Rinad; Zebian, Dina; Mufarrij, Afif

    2017-08-01

    Medication errors are common, with studies reporting at least one error per patient encounter. At hospital discharge, medication errors vary from 15%-38%. However, studies assessing the effect of an internally developed electronic (E)-prescription system at discharge from an emergency department (ED) are comparatively minimal. Additionally, commercially available electronic solutions are cost-prohibitive in many resource-limited settings. We assessed the impact of introducing an internally developed, low-cost E-prescription system, with a list of commonly prescribed medications, on prescription error rates at discharge from the ED, compared to handwritten prescriptions. We conducted a pre- and post-intervention study comparing error rates in a randomly selected sample of discharge prescriptions (handwritten versus electronic) five months pre and four months post the introduction of the E-prescription. The internally developed, E-prescription system included a list of 166 commonly prescribed medications with the generic name, strength, dose, frequency and duration. We included a total of 2,883 prescriptions in this study: 1,475 in the pre-intervention phase were handwritten (HW) and 1,408 in the post-intervention phase were electronic. We calculated rates of 14 different errors and compared them between the pre- and post-intervention period. Overall, E-prescriptions included fewer prescription errors as compared to HW-prescriptions. Specifically, E-prescriptions reduced missing dose (11.3% to 4.3%, p <0.0001), missing frequency (3.5% to 2.2%, p=0.04), missing strength errors (32.4% to 10.2%, p <0.0001) and legibility (0.7% to 0.2%, p=0.005). E-prescriptions, however, were associated with a significant increase in duplication errors, specifically with home medication (1.7% to 3%, p=0.02). A basic, internally developed E-prescription system, featuring commonly used medications, effectively reduced medication errors in a low-resource setting where the costs of sophisticated commercial electronic solutions are prohibitive.

  9. Medical error and related factors during internship and residency.

    PubMed

    Ahmadipour, Habibeh; Nahid, Mortazavi

    2015-01-01

    It is difficult to determine the real incidence of medical errors due to the lack of a precise definition of errors, as well as the failure to report them under certain circumstances. We carried out a cross-sectional study in Kerman University of Medical Sciences, Iran in 2013. The participants were selected through the census method. The data were collected using a self-administered questionnaire, which consisted of questions on the participants' demographic data and questions on the medical errors committed. The data were analysed by SPSS 19. It was found that 270 participants had committed medical errors. There was no significant difference in the frequency of errors committed by interns and residents. In the case of residents, the most common error was misdiagnosis; in that of interns, errors related to history-taking and physical examination. Considering that medical errors are common in the clinical setting, the education system should train interns and residents to prevent the occurrence of errors. In addition, the system should develop a positive attitude among them so that they can deal better with medical errors.

  10. Introduction to the Application of Web-Based Surveys.

    ERIC Educational Resources Information Center

    Timmerman, Annemarie

    This paper discusses some basic assumptions and issues concerning web-based surveys. Discussion includes: assumptions regarding cost and ease of use; disadvantages of web-based surveys, concerning the inability to compensate for four common errors of survey research: coverage error, sampling error, measurement error and nonresponse error; and…

  11. Making sense of deep sequencing

    PubMed Central

    Goldman, D.; Domschke, K.

    2016-01-01

    This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of ‘big data’, to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306

  12. Optimizing symmetry-based recoupling sequences in solid-state NMR by pulse-transient compensation and asynchronous implementation

    NASA Astrophysics Data System (ADS)

    Hellwagner, Johannes; Sharma, Kshama; Tan, Kong Ooi; Wittmann, Johannes J.; Meier, Beat H.; Madhu, P. K.; Ernst, Matthias

    2017-06-01

    Pulse imperfections like pulse transients and radio-frequency field maladjustment or inhomogeneity are the main sources of performance degradation and limited reproducibility in solid-state nuclear magnetic resonance experiments. We quantitatively analyze the influence of such imperfections on the performance of symmetry-based pulse sequences and describe how they can be compensated. Based on a triple-mode Floquet analysis, we develop a theoretical description of symmetry-based dipolar recoupling sequences, in particular R26_4^11, calculating first- and second-order effective Hamiltonians using real pulse shapes. We discuss the various origins of effective fields, namely, pulse transients, deviation from the ideal flip angle, and fictitious fields, and develop strategies to counteract them for the restoration of full transfer efficiency. We compare experimental applications of transient-compensated pulses and an asynchronous implementation of the sequence to a supercycle, SR26, which is known to be efficient in compensating higher-order error terms. We are able to show the superiority of R26_4^11 compared to the supercycle SR26, given the ability to reduce experimental error on the pulse sequence by pulse-transient compensation and a complete theoretical understanding of the sequence.

  13. Random noise effects in pulse-mode digital multilayer neural networks.

    PubMed

    Kim, Y C; Shanblatt, M A

    1995-01-01

    A pulse-mode digital multilayer neural network (DMNN) based on stochastic computing techniques is implemented with simple logic gates as basic computing elements. The pulse-mode signal representation and the use of simple logic gates for neural operations lead to a massively parallel yet compact and flexible network architecture, well suited for VLSI implementation. Algebraic neural operations are replaced by stochastic processes using pseudorandom pulse sequences. The distributions of the results from the stochastic processes are approximated using the hypergeometric distribution. Synaptic weights and neuron states are represented as probabilities and estimated as average pulse occurrence rates in corresponding pulse sequences. A statistical model of the noise (error) is developed to estimate the relative accuracy associated with stochastic computing in terms of mean and variance. Computational differences are then explained by comparison to deterministic neural computations. DMNN feedforward architectures are modeled in VHDL using character recognition problems as testbeds. Computational accuracy is analyzed, and the results of the statistical model are compared with the actual simulation results. Experiments show that the calculations performed in the DMNN are more accurate than those anticipated when Bernoulli sequences are assumed, as is common in the literature. Furthermore, the statistical model successfully predicts the accuracy of the operations performed in the DMNN.
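
    The pulse-mode principle the abstract describes is easy to sketch: values become pulse occurrence probabilities, and a single AND gate multiplies two independent pulse trains. The following Python sketch (all names and parameters are illustrative, not from the paper) shows the idea, together with the Bernoulli-based accuracy estimate that the authors found pessimistic compared with their hypergeometric model:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def pulse_train(p, n_pulses, rng):
        """Encode a value p in [0, 1] as a random binary pulse sequence."""
        return rng.random(n_pulses) < p

    # Two synaptic values encoded as pulse occurrence probabilities.
    a, b, n = 0.6, 0.7, 10_000
    sa, sb = pulse_train(a, n, rng), pulse_train(b, n, rng)

    # A single AND gate multiplies the two values: P(sa & sb) = a * b
    # when the trains are statistically independent.
    product = np.mean(sa & sb)

    # For independent Bernoulli trains the estimator variance is p(1-p)/n;
    # the paper argues a hypergeometric model fits the hardware better.
    bernoulli_sd = np.sqrt(a * b * (1 - a * b) / n)
    print(f"estimate {product:.4f}  true {a*b:.4f}  ~sd {bernoulli_sd:.4f}")
    ```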

  14. Computing prokaryotic gene ubiquity: rescuing the core from extinction.

    PubMed

    Charlebois, Robert L; Doolittle, W Ford

    2004-12-01

    The genomic core concept has found several uses in comparative and evolutionary genomics. Defined as the set of all genes common to (ubiquitous among) all genomes in a phylogenetically coherent group, core size decreases as the number and phylogenetic diversity of the relevant group increases. Here, we focus on methods for defining the size and composition of the core of all genes shared by sequenced genomes of prokaryotes (Bacteria and Archaea). There are few (almost certainly fewer than 50) genes shared by all of the 147 genomes compared, surely insufficient to conduct all essential functions. Sequencing and annotation errors are responsible for the apparent absence of some genes, while very limited but genuine disappearances (from just one or a few genomes) can account for several others. Core size will continue to decrease as more genome sequences appear, unless the requirement for ubiquity is relaxed. Such relaxation seems consistent with any reasonable biological purpose for seeking a core, but it renders the problem of definition more problematic. We propose an alternative approach (the phylogenetically balanced core), which preserves some of the biological utility of the core concept. Cores, however delimited, preferentially contain informational rather than operational genes; we present a new hypothesis for why this might be so.

  15. Blood flow quantification using 1D CFD parameter identification

    NASA Astrophysics Data System (ADS)

    Brosig, Richard; Kowarschik, Markus; Maday, Peter; Katouzian, Amin; Demirci, Stefanie; Navab, Nassir

    2014-03-01

    Patient-specific measurements of cerebral blood flow provide valuable quantitative diagnostic information concerning cerebrovascular diseases, in contrast to visually driven qualitative evaluation. In this paper, we present a quantitative method to estimate blood flow parameters with high temporal resolution from digital subtraction angiography (DSA) image sequences. Using a 3D DSA dataset and a 2D+t DSA sequence, the proposed algorithm employs a 1D Computational Fluid Dynamics (CFD) model for estimation of time-dependent flow values along a cerebral vessel, combined with an additional Advection Diffusion Equation (ADE) for contrast agent propagation. The CFD system, followed by the ADE, is solved with a finite volume approximation, which ensures the conservation of mass. Instead of defining a new imaging protocol to obtain relevant data, our cost function optimizes the bolus arrival time (BAT) of the contrast agent in 2D+t DSA sequences. The visual determination of BAT is common clinical practice and can easily be derived from, and compared to, values generated by a 1D CFD simulation. Using this strategy, we ensure that our proposed method fits well into clinical practice and does not require any changes to the medical workflow. Synthetic experiments show that the recovered flow estimates match the ground truth values with less than 12% error in the mean flow rates.
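
    As a rough illustration of the kind of solver involved, here is a minimal 1D finite-volume upwind advection of a contrast bolus with bolus-arrival-time extraction; the geometry, bolus shape and threshold are invented for the sketch, and the paper's actual model additionally includes diffusion and a full 1D CFD velocity solution:

    ```python
    import numpy as np

    # Minimal 1D finite-volume upwind advection of a contrast bolus along a
    # vessel; hypothetical parameters, not the paper's solver.
    nx, L, v, T = 200, 0.1, 0.2, 0.4     # cells, length [m], speed [m/s], duration [s]
    dx = L / nx
    dt = 0.5 * dx / v                    # CFL-stable time step
    c = np.zeros(nx)                     # contrast concentration per cell
    bat = np.full(nx, np.nan)            # bolus arrival time per cell

    t, threshold = 0.0, 0.05
    while t < T:
        inflow = np.exp(-((t - 0.05) / 0.02) ** 2)   # injected bolus at the inlet
        flux = v * np.concatenate(([inflow], c))     # upwind face fluxes (conservative)
        c += dt / dx * (flux[:-1] - flux[1:])
        t += dt
        newly = np.isnan(bat) & (c > threshold)      # record first threshold crossing
        bat[newly] = t

    # For pure advection, BAT grows linearly with distance at slope 1/v;
    # cells the bolus has not yet reached remain NaN.
    print(bat[::50])
    ```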

  16. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data

    PubMed Central

    2016-01-01

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted ‘glmnet’). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the sample Pearson correlation between observed and imputed genotype dosages at the site and individual level; computation time served as a second metric for comparison. We then set out to examine factors affecting imputation accuracy, such as levels of missing data, read depth, minor allele frequency (MAF), and reference panel composition. PMID:27537694

  17. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.

    PubMed

    Chan, Ariel W; Hamblin, Martha T; Jannink, Jean-Luc

    2016-01-01

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the sample Pearson correlation between observed and imputed genotype dosages at the site and individual level; computation time served as a second metric for comparison. We then set out to examine factors affecting imputation accuracy, such as levels of missing data, read depth, minor allele frequency (MAF), and reference panel composition.
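
    The accuracy metric described above is straightforward to reproduce in outline: hide a fraction of observed genotypes, impute, and correlate. A hedged Python sketch follows, with a naive mean-dosage imputer standing in for Beagle or glmnet and random toy data in place of the cassava GBS matrix:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def imputation_accuracy(dosages, impute_fn, mask_frac=0.1, rng=rng):
        """Mask a fraction of observed genotype dosages (0/1/2), impute them,
        and return the Pearson correlation between observed and imputed values
        at the masked entries -- the accuracy metric used in the study.
        `impute_fn` is any imputer taking a dosage matrix with np.nan holes."""
        masked = dosages.astype(float)
        obs = ~np.isnan(masked)
        hide = obs & (rng.random(dosages.shape) < mask_frac)
        masked[hide] = np.nan
        imputed = impute_fn(masked)
        return np.corrcoef(dosages[hide], imputed[hide])[0, 1]

    def mean_imputer(m):
        """Naive baseline: fill each site with its mean dosage (not glmnet/Beagle)."""
        out = m.copy()
        col_means = np.nanmean(m, axis=0)
        idx = np.where(np.isnan(out))
        out[idx] = col_means[idx[1]]
        return out

    geno = rng.integers(0, 3, size=(100, 500))   # individuals x sites, toy data
    # Near zero on random data; real imputers exploit LD structure among sites.
    print(imputation_accuracy(geno, mean_imputer))
    ```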

  18. Monitoring Error Rates In Illumina Sequencing.

    PubMed

    Manley, Leigh J; Ma, Duanduan; Levine, Stuart S

    2016-12-01

    Guaranteeing high-quality next-generation sequencing data in a rapidly changing environment is an ongoing challenge. The introduction of the Illumina NextSeq 500 and the deprecation of specific metrics from Illumina's Sequencing Analysis Viewer (SAV; Illumina, San Diego, CA, USA) have made it more difficult to determine directly the baseline error rate of sequencing runs. To improve our ability to measure base quality, we have created an open-source tool to construct the Percent Perfect Reads (PPR) plot, previously provided by the Illumina sequencers. The PPR program is compatible with HiSeq 2000/2500, MiSeq, and NextSeq 500 instruments and provides an alternative to Illumina's quality value (Q) scores for determining run quality. Whereas Q scores are representative of run quality, they are often overestimated and are sourced from different look-up tables for each platform. The PPR's unique capabilities as a cross-instrument comparison device, as a troubleshooting tool, and as a tool for monitoring instrument performance can provide an increase in clarity over SAV metrics that is often crucial for maintaining instrument health. These capabilities are highlighted.
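
    The quantity behind a PPR plot is the fraction of reads that remain mismatch-free through each sequencing cycle, computed from reads aligned to a known reference (for example, spiked-in PhiX). A minimal sketch of that computation, not the published tool:

    ```python
    import numpy as np

    def percent_perfect_reads(reads, refs):
        """Percent of reads that are mismatch-free through cycle n, for each
        cycle n -- the quantity behind a PPR plot. `reads` and `refs` are
        equal-length lists of aligned read/reference strings."""
        mism = np.array([[a != b for a, b in zip(r, f)]
                         for r, f in zip(reads, refs)], dtype=bool)
        # A read is "perfect through cycle n" if no mismatch occurs at cycles 1..n.
        perfect_through = ~(np.cumsum(mism, axis=1) > 0)
        return 100.0 * perfect_through.mean(axis=0)

    reads = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
    refs  = ["ACGTACGT", "ACGTACGT", "ACGTACGT"]
    print(percent_perfect_reads(reads, refs))  # degrades as the cycle count grows
    ```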

  19. A Likelihood-Based Framework for Association Analysis of Allele-Specific Copy Numbers.

    PubMed

    Hu, Y J; Lin, D Y; Sun, W; Zeng, D

    2014-10-01

    Copy number variants (CNVs) and single nucleotide polymorphisms (SNPs) co-exist throughout the human genome and jointly contribute to phenotypic variations. Thus, it is desirable to consider both types of variants, as characterized by allele-specific copy numbers (ASCNs), in association studies of complex human diseases. Current SNP genotyping technologies capture the CNV and SNP information simultaneously via fluorescent intensity measurements. The common practice of calling ASCNs from the intensity measurements and then using the ASCN calls in downstream association analysis has important limitations. First, the association tests are prone to false-positive findings when differential measurement errors between cases and controls arise from differences in DNA quality or handling. Second, the uncertainties in the ASCN calls are ignored. We present a general framework for the integrated analysis of CNVs and SNPs, including the analysis of total copy numbers as a special case. Our approach combines the ASCN calling and the association analysis into a single step while allowing for differential measurement errors. We construct likelihood functions that properly account for case-control sampling and measurement errors. We establish the asymptotic properties of the maximum likelihood estimators and develop EM algorithms to implement the corresponding inference procedures. The advantages of the proposed methods over the existing ones are demonstrated through realistic simulation studies and an application to a genome-wide association study of schizophrenia. Extensions to next-generation sequencing data are discussed.
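
    The inferential machinery named here (latent genotype classes, intensity likelihoods, EM) can be illustrated on a drastically simplified problem: a two-component Gaussian mixture fit by EM, standing in for the paper's ASCN likelihoods with case-control sampling and differential error variances. All data and starting values below are invented:

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Toy intensities from two genotype clusters with different measurement error.
    x = np.concatenate([rng.normal(0.2, 0.05, 300), rng.normal(0.8, 0.10, 200)])

    # EM for a two-component Gaussian mixture -- a much-reduced stand-in for the
    # paper's likelihood machinery.
    w, mu, sd = np.array([0.5, 0.5]), np.array([0.3, 0.7]), np.array([0.1, 0.1])
    for _ in range(200):
        # E-step: posterior probability of each component for each observation.
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        post = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and per-cluster error variances.
        n_k = post.sum(axis=0)
        w = n_k / len(x)
        mu = (post * x[:, None]).sum(axis=0) / n_k
        sd = np.sqrt((post * (x[:, None] - mu) ** 2).sum(axis=0) / n_k)

    print(w.round(3), mu.round(3), sd.round(3))   # ~[0.6 0.4] [0.2 0.8] [0.05 0.1]
    ```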

  20. Estimating Water and Heat Fluxes with a Four-dimensional Weak-constraint Variational Data Assimilation Approach

    NASA Astrophysics Data System (ADS)

    Bateni, S. M.; Xu, T.

    2015-12-01

    Accurate estimation of water and heat fluxes is required for irrigation scheduling, weather prediction, and water resources planning and management. A weak-constraint variational data assimilation (WC-VDA) scheme is developed to estimate water and heat fluxes by assimilating sequences of land surface temperature (LST) observations. The commonly used strong-constraint VDA systems adversely affect the accuracy of water and heat flux estimates as they assume the model is perfect. The WC-VDA approach accounts for structural and model errors and generates more accurate results by adding a model error term into the surface energy balance equation. The two key unknown parameters of the WC-VDA system (CHN, the bulk heat transfer coefficient, and EF, the evaporative fraction) and the model error term are optimized by minimizing the cost function. The WC-VDA model was tested at two sites with contrasting hydrological and vegetative conditions: the Daman site (a wet site located in an oasis area and covered by seeded corn) and the Huazhaizi site (a dry site located in a desert area and covered by sparse grass) in the middle reaches of the Heihe river basin, northwest China. Compared to the strong-constraint VDA system, the WC-VDA method generates more accurate estimates of water and energy fluxes over the desert and oasis sites with dry and wet conditions.
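
    A weak-constraint cost function differs from a strong-constraint one only in carrying an explicit, penalized model-error term. The toy sketch below (scalar surrogate model, invented parameters, a standard scipy optimizer; none of it from the paper) shows that structure:

    ```python
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)

    # Hypothetical toy problem in the spirit of weak-constraint VDA: a scalar
    # "model" maps two unknowns (stand-ins for CHN and EF) to LST, and an
    # additive model-error term w_t is estimated alongside them.
    t = np.arange(10.0)
    lst_obs = 290 + 5 * np.sin(t / 3) + rng.normal(0, 0.3, t.size)

    def model_lst(chn, ef, t):
        """Placeholder surface-energy-balance surrogate, not the paper's model.
        (chn and ef are only jointly identifiable in this toy.)"""
        return 285 + chn * 20 - ef * 10 + 5 * np.sin(t / 3)

    def cost(z, sigma_obs=0.3, sigma_mod=0.5):
        chn, ef, w = z[0], z[1], z[2:]
        misfit = (lst_obs - (model_lst(chn, ef, t) + w)) / sigma_obs
        # Weak constraint: the model error w is penalized, not forced to zero.
        return np.sum(misfit**2) + np.sum((w / sigma_mod) ** 2)

    z0 = np.concatenate([[0.1, 0.5], np.zeros(t.size)])
    res = minimize(cost, z0)
    print(res.x[:2], res.fun)
    ```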

  1. Phase Error Correction in Time-Averaged 3D Phase Contrast Magnetic Resonance Imaging of the Cerebral Vasculature

    PubMed Central

    MacDonald, M. Ethan; Forkert, Nils D.; Pike, G. Bruce; Frayne, Richard

    2016-01-01

    Purpose Volume flow rate (VFR) measurements based on phase contrast (PC)-magnetic resonance (MR) imaging datasets have spatially varying bias due to eddy current induced phase errors. The purpose of this study was to assess the impact of phase errors in time averaged PC-MR imaging of the cerebral vasculature and explore the effects of three common correction schemes (local bias correction (LBC), local polynomial correction (LPC), and whole brain polynomial correction (WBPC)). Methods Measurements of the eddy current induced phase error from a static phantom were first obtained. In thirty healthy human subjects, the methods were then assessed in background tissue to determine if local phase offsets could be removed. Finally, the techniques were used to correct VFR measurements in cerebral vessels and compared statistically. Results In the phantom, phase error was measured to be <2.1 ml/s per pixel and the bias was reduced with the correction schemes. In background tissue, the bias was significantly reduced, by 65.6% (LBC), 58.4% (LPC) and 47.7% (WBPC) (p < 0.001 across all schemes). Correction did not lead to significantly different VFR measurements in the vessels (p = 0.997). In the vessel measurements, the three correction schemes led to flow measurement differences of -0.04 ± 0.05 ml/s, 0.09 ± 0.16 ml/s, and -0.02 ± 0.06 ml/s. Although there was an improvement in background measurements with correction, there was no statistical difference between the three correction schemes (p = 0.242 in background and p = 0.738 in vessels). Conclusions While eddy current induced phase errors can vary between hardware and sequence configurations, our results showed that the impact is small in a typical brain PC-MR protocol and does not have a significant effect on VFR measurements in cerebral vessels. PMID:26910600
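
    Of the three schemes, the polynomial corrections are easiest to sketch: fit a low-order 2D polynomial to the phase in static tissue and subtract the fitted surface everywhere. A toy version with an invented bias field (not the authors' data or exact basis):

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    # Toy phase image: a smooth eddy-current-like background plus noise.
    ny, nx = 64, 64
    y, x = np.mgrid[0:ny, 0:nx] / 32.0 - 1.0
    background = 0.05 + 0.03 * x - 0.02 * y + 0.01 * x * y   # slowly varying bias
    phase = background + rng.normal(0, 0.01, (ny, nx))
    static_mask = np.ones((ny, nx), bool)                    # here: all static tissue

    # Polynomial correction in the spirit of LPC/WBPC: least-squares fit of a
    # first-order polynomial (with cross term) to static-tissue phase.
    A = np.column_stack([np.ones(static_mask.sum()),
                         x[static_mask], y[static_mask],
                         (x * y)[static_mask]])
    coef, *_ = np.linalg.lstsq(A, phase[static_mask], rcond=None)
    fitted = coef[0] + coef[1] * x + coef[2] * y + coef[3] * x * y
    corrected = phase - fitted

    print(np.abs(phase[static_mask]).mean(), np.abs(corrected[static_mask]).mean())
    ```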

  2. The Effects of Wiggler Errors on Free Electron Laser Performance

    DTIC Science & Technology

    1990-04-02

    phase deviation at the end of the wiggler by 113. The detrimental effects of wiggler errors may be reduced by arranging the magnet poles in an optimal...|∫dz δB|. To meet these specifications, the vendor may arrange the magnet poles in an optimum sequence such that |∫dz δB| is minimized. The present research...zc a- A,,/2. By considering a wiggler in which the error for a given magnet pole is correlated to the errors of the surrounding poles, one may

  3. Image navigation and registration performance assessment tool set for the GOES-R Advanced Baseline Imager and Geostationary Lightning Mapper

    NASA Astrophysics Data System (ADS)

    De Luccia, Frank J.; Houchin, Scott; Porter, Brian C.; Graybill, Justin; Haas, Evan; Johnson, Patrick D.; Isaacson, Peter J.; Reth, Alan D.

    2016-05-01

    The GOES-R Flight Project has developed an Image Navigation and Registration (INR) Performance Assessment Tool Set (IPATS) for measuring Advanced Baseline Imager (ABI) and Geostationary Lightning Mapper (GLM) INR performance metrics in the post-launch period for performance evaluation and long term monitoring. For ABI, these metrics are the 3-sigma errors in navigation (NAV), channel-to-channel registration (CCR), frame-to-frame registration (FFR), swath-to-swath registration (SSR), and within frame registration (WIFR) for the Level 1B image products. For GLM, the single metric of interest is the 3-sigma error in the navigation of background images (GLM NAV) used by the system to navigate lightning strikes. 3-sigma errors are estimates of the 99.73rd percentile of the errors accumulated over a 24 hour data collection period. IPATS utilizes a modular algorithmic design to allow user selection of data processing sequences optimized for generation of each INR metric. This novel modular approach minimizes duplication of common processing elements, thereby maximizing code efficiency and speed. Fast processing is essential given the large number of sub-image registrations required to generate INR metrics for the many images produced over a 24 hour evaluation period. Another aspect of the IPATS design that vastly reduces execution time is the off-line propagation of Landsat based truth images to the fixed grid coordinates system for each of the three GOES-R satellite locations, operational East and West and initial checkout locations. This paper describes the algorithmic design and implementation of IPATS and provides preliminary test results.

  4. Image Navigation and Registration (INR) Performance Assessment Tool Set (IPATS) for the GOES-R Advanced Baseline Imager and Geostationary Lightning Mapper

    NASA Technical Reports Server (NTRS)

    DeLuccia, Frank J.; Houchin, Scott; Porter, Brian C.; Graybill, Justin; Haas, Evan; Johnson, Patrick D.; Isaacson, Peter J.; Reth, Alan D.

    2016-01-01

    The GOES-R Flight Project has developed an Image Navigation and Registration (INR) Performance Assessment Tool Set (IPATS) for measuring Advanced Baseline Imager (ABI) and Geostationary Lightning Mapper (GLM) INR performance metrics in the post-launch period for performance evaluation and long term monitoring. For ABI, these metrics are the 3-sigma errors in navigation (NAV), channel-to-channel registration (CCR), frame-to-frame registration (FFR), swath-to-swath registration (SSR), and within frame registration (WIFR) for the Level 1B image products. For GLM, the single metric of interest is the 3-sigma error in the navigation of background images (GLM NAV) used by the system to navigate lightning strikes. 3-sigma errors are estimates of the 99.73rd percentile of the errors accumulated over a 24 hour data collection period. IPATS utilizes a modular algorithmic design to allow user selection of data processing sequences optimized for generation of each INR metric. This novel modular approach minimizes duplication of common processing elements, thereby maximizing code efficiency and speed. Fast processing is essential given the large number of sub-image registrations required to generate INR metrics for the many images produced over a 24 hour evaluation period. Another aspect of the IPATS design that vastly reduces execution time is the off-line propagation of Landsat based truth images to the fixed grid coordinates system for each of the three GOES-R satellite locations, operational East and West and initial checkout locations. This paper describes the algorithmic design and implementation of IPATS and provides preliminary test results.

  5. Image Navigation and Registration Performance Assessment Tool Set for the GOES-R Advanced Baseline Imager and Geostationary Lightning Mapper

    NASA Technical Reports Server (NTRS)

    De Luccia, Frank J.; Houchin, Scott; Porter, Brian C.; Graybill, Justin; Haas, Evan; Johnson, Patrick D.; Isaacson, Peter J.; Reth, Alan D.

    2016-01-01

    The GOES-R Flight Project has developed an Image Navigation and Registration (INR) Performance Assessment Tool Set (IPATS) for measuring Advanced Baseline Imager (ABI) and Geostationary Lightning Mapper (GLM) INR performance metrics in the post-launch period for performance evaluation and long term monitoring. For ABI, these metrics are the 3-sigma errors in navigation (NAV), channel-to-channel registration (CCR), frame-to-frame registration (FFR), swath-to-swath registration (SSR), and within frame registration (WIFR) for the Level 1B image products. For GLM, the single metric of interest is the 3-sigma error in the navigation of background images (GLM NAV) used by the system to navigate lightning strikes. 3-sigma errors are estimates of the 99.73rd percentile of the errors accumulated over a 24-hour data collection period. IPATS utilizes a modular algorithmic design to allow user selection of data processing sequences optimized for generation of each INR metric. This novel modular approach minimizes duplication of common processing elements, thereby maximizing code efficiency and speed. Fast processing is essential given the large number of sub-image registrations required to generate INR metrics for the many images produced over a 24-hour evaluation period. Another aspect of the IPATS design that vastly reduces execution time is the off-line propagation of Landsat based truth images to the fixed grid coordinates system for each of the three GOES-R satellite locations, operational East and West and initial checkout locations. This paper describes the algorithmic design and implementation of IPATS and provides preliminary test results.
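
    The headline metric in these three records is simple to compute once the per-registration errors have been accumulated: it is the 99.73rd percentile of their magnitudes over the evaluation period. A minimal sketch with synthetic errors (units and distribution invented for illustration):

    ```python
    import numpy as np

    rng = np.random.default_rng(5)

    # IPATS-style "3-sigma" metric: the 99.73rd percentile of the absolute
    # registration errors accumulated over a 24-hour evaluation period.
    # Synthetic errors stand in for per-registration NAV/CCR/FFR measurements.
    errors_urad = rng.normal(0, 5, 50_000)        # hypothetical units (microradians)
    three_sigma = np.percentile(np.abs(errors_urad), 99.73)
    print(f"3-sigma metric: {three_sigma:.2f}")   # ~ 3 * 5 for Gaussian errors
    ```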

  6. Overlay improvement by exposure map based mask registration optimization

    NASA Astrophysics Data System (ADS)

    Shi, Irene; Guo, Eric; Chen, Ming; Lu, Max; Li, Gordon; Li, Rivan; Tian, Eric

    2015-03-01

    Along with the increased miniaturization of semiconductor electronic devices, the design rules of advanced semiconductor devices shrink dramatically. [1] One of the main challenges of the lithography step is layer-to-layer overlay control. Furthermore, double patterning technology (DPT) has been adopted for advanced technology nodes such as 28nm and 14nm, and the corresponding overlay budget becomes even tighter. [2][3] After in-die mask registration (pattern placement) measurement was introduced, model analysis with a KLA SOV (sources of variation) tool showed that the registration difference between masks is a significant error source of wafer layer-to-layer overlay at the 28nm process. [4][5] Optimizing mask registration would therefore substantially improve wafer overlay performance. It has been reported that a laser-based registration control (RegC) process can be applied after pattern generation or after pellicle mounting, allowing fine tuning of the mask registration. [6] In this paper we propose a novel method of mask registration correction that can be applied before mask writing, based on the mask exposure map and considering the mask chip layout, writing sequence, and pattern density distribution. Our experimental data show that if the pattern density on the mask is kept low, the in-die mask registration residual error (3 sigma) stays under 5nm regardless of the blank type and the writer POSCOR (position correction) file applied; this indicates that random error induced by material or equipment occupies a relatively fixed share of the mask registration error budget. In production, comparing mask registration differences across critical layers reveals that the registration residual error of line/space layers with higher pattern density is consistently much larger than that of contact-hole layers with lower pattern density. Moreover, the registration difference between layers of similar pattern density can also be kept under 5nm. We assume that mask registration error, excluding the random component, is mostly induced by charge accumulation during mask writing, which may be calculated from the surrounding exposed pattern density. A multi-loading test of mask registration shows that with an x-direction writing sequence, registration behavior in x is mainly related to the sequence direction, whereas registration in y is strongly affected by the pattern density distribution map. This confirms that part of the mask registration error is due to charging from the nearby environment. If the exposure sequence is chip by chip, as in a typical multi-chip layout, mask registration in both x and y is affected analogously, which has also been confirmed by real data. We therefore set up a simple model to predict the mask registration error from the mask exposure map and correct it with the given POSCOR (position correction) file for advanced mask writing when needed.
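
    The proposed correction lends itself to a simple numerical sketch: treat the predicted registration shift as a convolution of the exposure (pattern-density) map with a short-range charging kernel. Everything quantitative below (kernel shape, calibration factor) is an assumption for illustration, not a fitted value from the paper:

    ```python
    import numpy as np

    # Hypothetical charging model sketched from the paper's idea: the predicted
    # registration shift at each mask location is a weighted sum (convolution)
    # of the exposed pattern density in its neighborhood.
    density = np.random.default_rng(8).random((64, 64)) * 0.1   # exposure map
    yy, xx = np.mgrid[-8:9, -8:9]
    kernel = np.exp(-(xx**2 + yy**2) / 18.0)                    # charge influence
    kernel /= kernel.sum()

    pad = 8
    padded = np.pad(density, pad)
    shift = np.empty_like(density)
    for i in range(density.shape[0]):
        for j in range(density.shape[1]):
            shift[i, j] = np.sum(padded[i:i + 17, j:j + 17] * kernel)
    shift *= 5.0   # nm per unit density: an assumed calibration factor
    print(shift.mean(), shift.max())
    ```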

  7. Bit error rate tester using fast parallel generation of linear recurring sequences

    DOEpatents

    Pierson, Lyndon G.; Witzke, Edward L.; Maestas, Joseph H.

    2003-05-06

    A fast method for generating linear recurring sequences by parallel linear recurring sequence generators (LRSGs) with a feedback circuit optimized to balance minimum propagation delay against maximal sequence period. Parallel generation of linear recurring sequences requires decimating the sequence (creating small contiguous sections of the sequence in each LRSG). A companion matrix form is selected depending on whether the LFSR is right-shifting or left-shifting. The companion matrix is completed by selecting a primitive irreducible polynomial with 1's most closely grouped in a corner of the companion matrix. A decimation matrix is created by raising the companion matrix to the (n*k)th power, where k is the number of parallel LRSGs and n is the number of bits to be generated at a time by each LRSG. Companion matrices with 1's closely grouped in a corner will yield sparse decimation matrices. A feedback circuit comprised of XOR logic gates implements the decimation matrix in hardware. Sparse decimation matrices can be implemented with a minimum number of XOR gates, and therefore a minimum propagation delay through the feedback circuit. The LRSG of the invention is particularly well suited to use as a bit error rate tester on high speed communication lines because it permits the receiver to synchronize to the transmitted pattern within 2n bits.
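
    The decimation idea can be demonstrated end-to-end in software. The sketch below builds the companion matrix of a small primitive polynomial over GF(2), raises it to the k-th power to obtain the decimation matrix (n = 1 bit per generator per step), and checks that k interleaved generators reproduce the serial m-sequence; the polynomial and sizes are toy choices, not the patent's parameters:

    ```python
    import numpy as np

    def companion_gf2(coeffs):
        """Companion matrix over GF(2) of x^m + c_{m-1}x^{m-1} + ... + c_0,
        given coeffs = [c_0, ..., c_{m-1}]; one multiply advances the LFSR
        state by one step."""
        m = len(coeffs)
        C = np.zeros((m, m), dtype=np.uint8)
        C[1:, :-1] = np.eye(m - 1, dtype=np.uint8)
        C[:, -1] = coeffs
        return C

    def mat_pow_gf2(M, e):
        """Square-and-multiply matrix power modulo 2."""
        R = np.eye(M.shape[0], dtype=np.uint8)
        while e:
            if e & 1:
                R = R @ M % 2
            M = M @ M % 2
            e >>= 1
        return R

    # Primitive polynomial x^4 + x + 1 and k = 2 parallel generators.
    C = companion_gf2([1, 1, 0, 0])
    k = 2
    D = mat_pow_gf2(C.copy(), k)                 # decimation matrix C^k

    seed = np.array([1, 0, 0, 0], dtype=np.uint8)
    serial, s = [], seed.copy()
    for _ in range(30):                          # reference: serial generation
        serial.append(int(s[0]))
        s = C @ s % 2

    # LRSG j starts at state C^j * seed and advances by D = C^k per step;
    # interleaving the output bits reproduces the serial sequence.
    states = [mat_pow_gf2(C.copy(), j) @ seed % 2 for j in range(k)]
    parallel = []
    for _ in range(15):
        for j in range(k):
            parallel.append(int(states[j][0]))
            states[j] = D @ states[j] % 2

    print(parallel == serial)                    # True
    ```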

  8. Machine Learned Replacement of N-Labels for Basecalled Sequences in DNA Barcoding.

    PubMed

    Ma, Eddie Y T; Ratnasingham, Sujeevan; Kremer, Stefan C

    2018-01-01

    This study presents a machine learning method that increases the number of identified bases in Sanger sequencing. The system post-processes a KB-basecalled chromatogram, selecting a recoverable subset of N-labels in the chromatogram to replace with basecalls (A,C,G,T). An N-label correction is defined given an additional read of the same sequence and a human-finished sequence. Corrections are added to the dataset when an alignment determines that the additional read and the human agree on the identity of the N-label; KB must also rate the replacement with a sufficiently high quality value in the additional read. Corrections are only available during system training. To develop the system, nearly 850,000 N-labels were obtained from Barcode of Life Datasystems, the premier database of the genetic markers called DNA Barcodes. Increasing the number of correct bases improves reference sequence reliability, increases sequence identification accuracy, and assures analysis correctness. In keeping with barcoding standards, our system maintains a low error rate, and it only applies corrections when it estimates a low rate of error. Tested on this data, our automation selects and recovers: 79 percent of N-labels from COI (animal barcode); 80 percent from matK and rbcL (plant barcodes); and 58 percent from non-protein-coding sequences (across eukaryotes).

  9. The Relationship Between Technical Errors and Decision Making Skills in the Junior Resident

    PubMed Central

    Nathwani, J. N.; Fiers, R.M.; Ray, R.D.; Witt, A.K.; Law, K. E.; DiMarco, S.M.; Pugh, C.M.

    2017-01-01

    Objective The purpose of this study is to co-evaluate resident technical errors and decision-making capabilities during placement of a subclavian central venous catheter (CVC). We hypothesize that there will be significant correlations between scenario-based decision-making skills and technical proficiency in central line insertion. We also predict residents will have problems in anticipating common difficulties and generating solutions associated with line placement. Design Participants were asked to insert a subclavian central line on a simulator. After completion, residents were presented with a real-life patient photograph depicting CVC placement and asked to anticipate difficulties and generate solutions. Error rates were analyzed using chi-square tests and a 5% expected error rate. Correlations were sought by comparing technical errors and scenario-based decision making. Setting This study was carried out at seven tertiary care centers. Participants Study participants (N=46) consisted largely of first-year research residents who could be followed longitudinally. Second-year research and clinical residents were not excluded. Results Six checklist errors were committed more often than anticipated. Residents performed an average of 1.9 errors, significantly more than the expected maximum of one error per person (t(44)=3.82, p<.001). The most common error was performance of the procedure steps in the wrong order (28.5%, p<.001). Some of the residents (24%) had no errors, 30% committed one error, and 46% committed more than one error. The number of technical errors committed correlated negatively with the total number of commonly identified difficulties and generated solutions (r(33)=-.429, p=.021 and r(33)=-.383, p=.044, respectively). Conclusions Almost half of the surgical residents committed multiple errors while performing subclavian CVC placement. The correlation between technical errors and decision-making skills suggests a critical need to train residents in both technique and error management. ACGME Competencies Medical Knowledge, Practice Based Learning and Improvement, Systems Based Practice PMID:27671618

  10. Deep sequencing of hepatitis C virus hypervariable region 1 reveals no correlation between genetic heterogeneity and antiviral treatment outcome

    PubMed Central

    2014-01-01

    Background Hypervariable region 1 (HVR1), contained within the envelope protein 2 (E2) gene, is the most variable part of the HCV genome, and its translation product is a major target for the host immune response. Variability within HVR1 may facilitate evasion of the immune response and could affect treatment outcome. The aim of the study was to analyze the impact of HVR1 heterogeneity, assessed by sensitive ultra-deep sequencing, on the outcome of PEG-IFN-α (pegylated interferon α) and ribavirin treatment. Methods HVR1 sequences were amplified from pretreatment serum samples of 25 patients infected with genotype 1b HCV (12 responders and 13 non-responders) and were subjected to pyrosequencing (GS Junior, 454/Roche). Reads were corrected for sequencing error using ShoRAH software, while population reconstruction was done using three different minimal variant frequency cut-offs of 1%, 2% and 5%. Statistical analysis was done using Mann–Whitney and Fisher’s exact tests. Results Complexity, Shannon entropy, nucleotide diversity per site, genetic distance and the number of genetic substitutions were not significantly different between responders and non-responders when analyzing viral populations at any of the three frequencies (≥1%, ≥2% and ≥5%). When a clonal sample was used to determine pyrosequencing error, 4% of reads were found to be incorrect and the most abundant erroneous variant was present at a frequency of 1.48%. Use of ShoRAH reduced the sequencing error to 1%, with the most abundant erroneous variant present at a frequency of 0.5%. Conclusions While deep sequencing revealed complex genetic heterogeneity of HVR1 in chronic hepatitis C patients, there was no correlation between treatment outcome and any of the analyzed quasispecies parameters. PMID:25016390
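
    Among the heterogeneity measures compared, Shannon entropy is the easiest to reproduce. A small sketch over hypothetical reconstructed variant frequencies (the cutoff and frequencies are invented, not the study's data):

    ```python
    import numpy as np

    def quasispecies_entropy(freqs):
        """Shannon entropy (raw and normalized) of a reconstructed viral
        population; freqs are variant frequencies summing to 1. One of the
        HVR1 heterogeneity parameters compared between responders and
        non-responders."""
        f = np.asarray(freqs, dtype=float)
        f = f[f > 0]
        shannon = -np.sum(f * np.log(f))
        normalized = shannon / np.log(len(f)) if len(f) > 1 else 0.0
        return shannon, normalized

    # Hypothetical population reconstructed at a 1% frequency cutoff:
    print(quasispecies_entropy([0.55, 0.25, 0.10, 0.06, 0.04]))
    ```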

  11. A label-free, fluorescence based assay for microarray

    NASA Astrophysics Data System (ADS)

    Niu, Sanjun

    DNA chip technology has drawn tremendous attention since it emerged in the mid-1990s as a method that speeds up gene sequencing by over 100-fold. The DNA chip, also called a DNA microarray, is a combinatorial technology in which different single-stranded DNA (ssDNA) molecules of known sequences are immobilized at specific spots. The immobilized ssDNA strands are called probes. In application, the chip is exposed to a solution containing ssDNA of unknown sequence, called targets, which are labeled with fluorescent dyes. Due to specific molecular recognition among the base pairs in the DNA, binding or hybridization occurs only when the probe and target sequences are complementary. The nucleotide sequence of the target is determined by imaging the fluorescence from the spots. The uncertainty of background in signal detection and statistical error in data analysis, primarily due to error in the DNA amplification process and the statistical distribution of the tags in the target DNA, have become fundamental barriers to bringing the technology into application for clinical diagnostics. Furthermore, the dye and tagging process are expensive, making the cost of DNA chips prohibitive for clinical testing. These limitations and challenges make it difficult to implement DNA chip methods as a diagnostic tool in a pathology laboratory. The objective of this dissertation research is to provide an alternative approach that addresses the above challenges. In this research, a label-free assay is designed and studied. Polystyrene (PS), a commonly used polymeric material, serves as the fluorescence agent. Probe ssDNA is covalently immobilized on a polystyrene thin film that is supported by a reflecting substrate. When this chip is exposed to excitation light, the fluorescence intensity from the PS is detected as the signal. Since the optical constants and conformations of ssDNA and dsDNA (double-stranded DNA) are different, the measured fluorescence from the PS changes for the same intensity of excitation light. The fluorescence contrast is used to quantify the amount of probe-target hybridization. A mathematical model that considers multiple reflections and scattering is developed to explain the mechanism of the fluorescence contrast, which depends on the thickness of the PS film. Scattering is the dominant factor that contributes to the contrast. The potential of this assay to detect single nucleotide polymorphisms is also tested.

  12. The Complex Exogenous RNA Spectra in Human Plasma: An Interface with Human Gut Biota?

    PubMed Central

    Wang, Kai; Li, Hong; Yuan, Yue; Etheridge, Alton; Zhou, Yong; Huang, David; Wilmes, Paul; Galas, David

    2012-01-01

    Human plasma has long been a rich source for biomarker discovery. It has recently become clear that plasma RNA molecules, such as microRNAs, in addition to proteins are common and can serve as biomarkers. Surveying human plasma for microRNA biomarkers using next generation sequencing technology, we observed that a significant fraction of the circulating RNA appears to originate from exogenous species. With careful analysis of sequence error statistics and other controls, we demonstrated that there is a wide range of RNA from many different organisms, including bacteria and fungi as well as other species. These RNAs may be associated with protein, lipid or other molecules, protecting them from RNase activity in plasma. Some of these RNAs are detected in intracellular complexes and may be able to influence cellular activities under in vitro conditions. These findings raise the possibility that plasma RNAs of exogenous origin may serve as signaling molecules, mediating, for example, the human-microbiome interaction, and may affect and/or indicate the state of human health. PMID:23251414

  13. Genetic Misdiagnoses and the Potential for Health Disparities.

    PubMed

    Manrai, Arjun K; Funke, Birgit H; Rehm, Heidi L; Olesen, Morten S; Maron, Bradley A; Szolovits, Peter; Margulies, David M; Loscalzo, Joseph; Kohane, Isaac S

    2016-08-18

    For more than a decade, risk stratification for hypertrophic cardiomyopathy has been enhanced by targeted genetic testing. Using sequencing results, clinicians routinely assess the risk of hypertrophic cardiomyopathy in a patient's relatives and diagnose the condition in patients who have ambiguous clinical presentations. However, the benefits of genetic testing come with the risk that variants may be misclassified. Using publicly accessible exome data, we identified variants that have previously been considered causal in hypertrophic cardiomyopathy and that are overrepresented in the general population. We studied these variants in diverse populations and reevaluated their initial ascertainments in the medical literature. We reviewed patient records at a leading genetic-testing laboratory for occurrences of these variants during the near-decade-long history of the laboratory. Multiple patients, all of whom were of African or unspecified ancestry, received positive reports, with variants misclassified as pathogenic on the basis of the understanding at the time of testing. Subsequently, all reported variants were recategorized as benign. The mutations that were most common in the general population were significantly more common among black Americans than among white Americans (P<0.001). Simulations showed that the inclusion of even small numbers of black Americans in control cohorts probably would have prevented these misclassifications. We identified methodologic shortcomings that contributed to these errors in the medical literature. The misclassification of benign variants as pathogenic that we found in our study shows the need for sequencing the genomes of diverse populations, both in asymptomatic controls and the tested patient population. These results expand on current guidelines, which recommend the use of ancestry-matched controls to interpret variants. As additional populations of different ancestry backgrounds are sequenced, we expect variant reclassifications to increase, particularly for ancestry groups that have historically been less well studied. (Funded by the National Institutes of Health.).

  14. Comprehensive mutation screening in 55 probands with type 1 primary hyperoxaluria shows feasibility of a gene-based diagnosis.

    PubMed

    Monico, Carla G; Rossetti, Sandro; Schwanz, Heidi A; Olson, Julie B; Lundquist, Patrick A; Dawson, D Brian; Harris, Peter C; Milliner, Dawn S

    2007-06-01

    Mutations in AGXT, a locus mapped to 2q37.3, cause deficiency of liver-specific alanine:glyoxylate aminotransferase (AGT), the metabolic error in type 1 primary hyperoxaluria (PH1). Genetic analysis of 55 unrelated probands with PH1 from the Mayo Clinic Hyperoxaluria Center, to date the largest with availability of complete sequencing across the entire AGXT coding region and documented hepatic AGT deficiency, suggests that a molecular diagnosis (identification of two disease alleles) is feasible in 96% of patients. Unique to this PH1 population was the higher frequency of G170R, the most common AGXT mutation, accounting for 37% of alleles, and detection of a new 3' end deletion (Ex 11_3'UTR del). A described frameshift mutation (c.33_34insC) occurred with the next highest frequency (11%), followed by F152I and G156R (frequencies of 6.3 and 4.5%, respectively), both surpassing the frequency (2.7%) of I244T, the previously reported third most common pathogenic change. These sequencing data indicate that AGXT is even more variable than formerly believed, with 28 new variants (21 mutations and seven polymorphisms) detected, with highest frequencies on exons 1, 4, and 7. When limited to these three exons, molecular analysis sensitivity was 77%, compared with 98% for whole-gene sequencing. These are the first data in support of comprehensive AGXT analysis for the diagnosis of PH1, obviating a liver biopsy in most well-characterized patients. Also reported here is previously unavailable evidence for the pathogenic basis of all AGXT missense variants, including evolutionary conservation data in a multisequence alignment and use of a normal control population.

  15. Mutagenic Spectra Arising from Replication Bypass of the 2,6-diamino-4-hydroxy-N5-methyl Formamidopyrimidine Adduct in Primate Cells

    PubMed Central

    Earley, Lauriel F.; Minko, Irina G.; Christov, Plamen P.; Rizzo, Carmelo J.; Lloyd, R. Stephen

    2013-01-01

    DNA exposures to electrophilic methylating agents that are commonly used during chemotherapeutic treatments cause diverse chemical modifications of nucleobases, with reaction at N7-dG being the most abundant. Although this base modification frequently results in destabilization of the glycosyl bond and spontaneous depurination, the adduct can react with hydroxide ion to yield a stable, ring-opened MeFapy-dG and this lesion has been reported to persist in animal tissues. Results from prior in vitro replication bypass investigations of the MeFapy-dG adduct had revealed complex spectra of replication errors that differed depending on the identity of DNA polymerase and the local sequence context. In this study, a series of nine site-specifically modified MeFapy-dG-containing oligodeoxynucleotides were engineered into a shuttle vector and subjected to replication in primate cells. In all nine sequence contexts examined, MeFapy-dG was shown to be associated with a strong mutator phenotype, predominantly causing base substitutions, with G to T transversions being most common. Single and dinucleotide deletions were also found in a subset of the sequence contexts. Interestingly, single-nucleotide deletions occurred not only at the adducted site, but also one nucleotide downstream of the adduct. Standard models for primer-template misalignment could account for some, but not all mutations observed. These data demonstrate that in addition to mutagenesis predicted from replication of DNAs containing O6-Me-dG and O4-Me-dT, the MeFapy-dG adduct likely contributes to mutagenic events following chemotherapeutic treatments. PMID:23763662

  16. Prevalence of amblyopia and patterns of refractive error in the amblyopic children of a tertiary eye care center of Nepal.

    PubMed

    Sapkota, K; Pirouzian, A; Matta, N S

    2013-01-01

    Refractive error is a common cause of amblyopia. To determine the prevalence of amblyopia and the pattern and types of refractive error in children with amblyopia in a tertiary eye hospital of Nepal, a retrospective chart review of children diagnosed with amblyopia in the Nepal Eye Hospital (NEH) from July 2006 to June 2011 was conducted. Children aged 13 or older, or with any ocular pathology, were excluded. Cycloplegic refraction and an ophthalmological examination were performed for all children. The pattern of refractive error and the association between types of refractive error and types of amblyopia were determined. Amblyopia was found in 0.7% (440) of the 62,633 children examined in NEH during this period. All the amblyopic eyes of the subjects had refractive error. Fifty-six percent (248) of the patients were male and the mean age was 7.74 ± 2.97 years. Anisometropia was the most common cause of amblyopia (p < 0.001). One third (29%) of the subjects had bilateral amblyopia due to high ametropia. Forty percent of eyes had severe amblyopia with visual acuity of 20/120 or worse. About two-thirds (59.2%) of the eyes had astigmatism. The prevalence of amblyopia in the Nepal Eye Hospital is 0.7%. Anisometropia is the most common cause of amblyopia. Astigmatism is the most common type of refractive error in amblyopic eyes. © NEPjOPH.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Passarge, M; Fix, M K; Manser, P

    Purpose: To create and test an accurate EPID-frame-based VMAT QA metric to detect gross dose errors in real-time and to provide information about the source of error. Methods: A Swiss cheese model was created for an EPID-based real-time QA process. The system compares a treatment-plan-based reference set of EPID images with images acquired over each 2° gantry angle interval. The metric utilizes a sequence of independent, consecutively executed error detection methods: a masking technique that verifies in-field radiation delivery and ensures no out-of-field radiation; output normalization checks at two different stages; global image alignment to quantify rotation, scaling and translation; standard gamma evaluation (3%, 3 mm); and pixel intensity deviation checks including and excluding high-dose-gradient regions. Tolerances for each test were determined. For algorithm testing, twelve different types of errors were selected to modify the original plan. Corresponding predictions for each test case were generated, which included measurement-based noise. Each test case was run multiple times (with different noise per run) to assess the ability to detect introduced errors. Results: Averaged over five test runs, 99.1% of all plan variations that resulted in patient dose errors were detected within 2° and 100% within 4° (∼1% of patient dose delivery). Including cases that led to slightly modified but clinically equivalent plans, 91.5% were detected by the system within 2°. Based on the type of method that detected the error, determination of error sources was achieved. Conclusion: An EPID-based during-treatment error detection system for VMAT deliveries was successfully designed and tested. The system utilizes a sequence of methods to identify and prevent gross treatment delivery errors. The system was inspected for robustness with realistic noise variations, demonstrating that it has the potential to detect a large majority of errors in real-time and indicate the error source. J. V. Siebers receives funding support from Varian Medical Systems.
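
    Of the checks listed, the gamma evaluation is the most standard. A brute-force 1D version (global dose normalization, toy profiles; the paper's comparisons are on 2D EPID frames) can be sketched as follows:

    ```python
    import numpy as np

    def gamma_index_1d(ref, meas, dx_mm, dose_tol=0.03, dist_mm=3.0):
        """Brute-force 1D gamma evaluation (3%, 3 mm by default): for each
        reference point, minimize the combined dose-difference /
        distance-to-agreement metric over all measured points."""
        x = np.arange(len(ref)) * dx_mm
        dmax = ref.max()
        gam = np.empty(len(ref))
        for i, (xi, di) in enumerate(zip(x, ref)):
            dd = (meas - di) / (dose_tol * dmax)   # dose difference term
            dr = (x - xi) / dist_mm                # distance-to-agreement term
            gam[i] = np.sqrt(dd**2 + dr**2).min()
        return gam

    ref = np.exp(-((np.arange(100) - 50) / 12.0) ** 2)   # toy reference profile
    meas = np.roll(ref, 1) * 1.01                        # 1 mm shift, +1% dose
    g = gamma_index_1d(ref, meas, dx_mm=1.0)
    print(f"pass rate: {(g <= 1).mean():.2%}")
    ```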

  18. Large-scale contamination of microbial isolate genomes by Illumina PhiX control.

    PubMed

    Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia; Kyrpides, Nikos C; Pati, Amrita

    2015-01-01

    With the rapid growth and development of sequencing technologies, genomes have become the new go-to for exploring solutions to some of the world's biggest challenges, such as searching for alternative energy sources and exploration of genomic dark matter. However, progress in sequencing has been accompanied by its share of errors that can occur during template or library preparation, sequencing, imaging or data analysis. In this study we screened over 18,000 publicly available microbial isolate genome sequences in the Integrated Microbial Genomes database and identified more than 1000 genomes that are contaminated with PhiX, a control frequently used during Illumina sequencing runs. Approximately 10% of these genomes have been published in literature and 129 contaminated genomes were sequenced under the Human Microbiome Project. Raw sequence reads are prone to contamination from various sources and are usually eliminated during downstream quality control steps. Detection of PhiX-contaminated genomes indicates a lapse in either the application or effectiveness of proper quality control measures. The presence of PhiX contamination in several publicly available isolate genomes can result in additional errors when such data are used in comparative genomics analyses. Such contamination of public databases has far-reaching consequences in the form of erroneous data interpretation and analyses, and necessitates better measures to proofread raw sequences before releasing them to the broader scientific community.

  19. Comparative modeling without implicit sequence alignments.

    PubMed

    Kolinski, Andrzej; Gront, Dominik

    2007-10-01

    The number of known protein sequences is about a thousand times larger than the number of experimentally solved 3D structures. For more than half of the protein sequences a close or distant structural analog can be identified. The key starting point in classical comparative modeling is to generate the best possible sequence alignment with a template or templates. With decreasing sequence similarity, the number of errors in the alignments increases, and these errors are the main cause of the decreasing accuracy of the molecular models generated. Here we propose a new approach to comparative modeling which does not require an implicit sequence alignment - the model building phase explores geometric, evolutionary and physical properties of a template (or templates). The proposed method requires prior identification of a template, although the initial sequence alignment is ignored. The model is built using a very efficient reduced-representation search engine, CABS, to find the best possible superposition of the query protein onto the template represented as a 3D multi-featured scaffold. The criteria used include: sequence similarity, predicted secondary structure consistency, local geometric features and hydrophobicity profile. For more difficult cases, the new method qualitatively outperforms existing schemes of comparative modeling. The algorithm unifies de novo modeling, 3D threading and sequence-based methods. The main idea is general and could be easily combined with other efficient modeling tools such as Rosetta, UNRES and others.

  20. Detection of microRNAs in color space.

    PubMed

    Marco, Antonio; Griffiths-Jones, Sam

    2012-02-01

    Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3' end of the read, such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/. Contact: antonio.marco@manchester.ac.uk. Supplementary data are available at Bioinformatics online.
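
    The two-base color encoding can be made concrete. In the standard SOLiD scheme each color is the XOR of the 2-bit codes of adjacent bases, which is why a single miscalled color corrupts all downstream decoded bases while a true SNP changes exactly two adjacent colors. The sketch below illustrates both properties (toy sequences, not the paper's data):

    ```python
    BASE2BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
    BITS2BASE = "ACGT"

    def encode_colorspace(seq):
        """SOLiD-style encoding: each color is the XOR of the 2-bit codes of
        two adjacent bases, so each base is covered by two consecutive colors."""
        return [BASE2BITS[a] ^ BASE2BITS[b] for a, b in zip(seq, seq[1:])]

    def decode_colorspace(first_base, colors):
        """Decoding needs a known leading base; a single color error corrupts
        every downstream base, while a real SNP changes two adjacent colors --
        the property that helps distinguish errors from polymorphisms."""
        bases, b = [first_base], BASE2BITS[first_base]
        for c in colors:
            b ^= c
            bases.append(BITS2BASE[b])
        return "".join(bases)

    seq = "ATGGCA"
    colors = encode_colorspace(seq)
    print(colors, decode_colorspace("A", colors))   # round trip
    bad = colors.copy(); bad[1] ^= 1                # one miscalled color
    print(decode_colorspace("A", bad))              # wrong from the third base on
    ```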

  1. Correcting for Sample Contamination in Genotype Calling of DNA Sequence Data

    PubMed Central

    Flickinger, Matthew; Jun, Goo; Abecasis, Gonçalo R.; Boehnke, Michael; Kang, Hyun Min

    2015-01-01

    DNA sample contamination is a frequent problem in DNA sequencing studies and can result in genotyping errors and reduced power for association testing. We recently described methods to identify within-species DNA sample contamination based on sequencing read data, showed that our methods can reliably detect and estimate contamination levels as low as 1%, and suggested strategies to identify and remove contaminated samples from sequencing studies. Here we propose methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses. We compare our contamination-adjusted calls to calls that ignore contamination and to calls based on uncontaminated data. We demonstrate that, for moderate contamination levels (5%–20%), contamination-adjusted calls eliminate 48%–77% of the genotyping errors. For lower levels of contamination, our contamination correction methods produce genotypes nearly as accurate as those based on uncontaminated data. Our contamination correction methods are useful generally, but are particularly helpful for sample contamination levels from 2% to 20%. PMID:26235984
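
    The exact parameterization used by the authors is not given in this record; the following sketch shows the general form of a contamination-aware genotype likelihood at one biallelic site, where each read is drawn from the sample's own genotype with probability 1−α and from the contaminant's genotype with probability α. Genotype codes ("RR"/"RA"/"AA"), the error rate and the example data are illustrative assumptions.

      from itertools import product

      def allele_prob(base, genotype, err=0.01):
          """P(observed base | genotype), with a symmetric sequencing error rate.
          'R' = reference allele, 'A' = alternate allele."""
          p_alt = {"RR": 0.0, "RA": 0.5, "AA": 1.0}[genotype]
          p = p_alt if base == "A" else 1.0 - p_alt
          return p * (1 - err) + (1 - p) * err

      def contaminated_likelihood(bases, alpha=0.05):
          """Likelihood of each (own, contaminant) genotype pair given read bases."""
          out = {}
          for g_own, g_con in product(("RR", "RA", "AA"), repeat=2):
              lik = 1.0
              for b in bases:
                  # mixture: read comes from own DNA (1-alpha) or contaminant (alpha)
                  lik *= (1 - alpha) * allele_prob(b, g_own) + alpha * allele_prob(b, g_con)
              out[(g_own, g_con)] = lik
          return out

      # Example: 20 reads, 3 carrying the alternate allele, 5% contamination.
      likes = contaminated_likelihood(["A"] * 3 + ["R"] * 17, alpha=0.05)
      print(max(likes, key=likes.get))   # -> ('RR', 'AA'): hom-ref sample, hom-alt contaminant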

  2. Optimization of High-Throughput Sequencing Kinetics for determining enzymatic rate constants of thousands of RNA substrates

    PubMed Central

    Niland, Courtney N.; Jankowsky, Eckhard; Harris, Michael E.

    2016-01-01

    Quantification of the specificity of RNA binding proteins and RNA processing enzymes is essential to understanding their fundamental roles in biological processes. High Throughput Sequencing Kinetics (HTS-Kin) uses high-throughput sequencing and internal competition kinetics to simultaneously monitor the processing rate constants of thousands of substrates of RNA processing enzymes. This technique has provided unprecedented insight into the substrate specificity of the tRNA processing endonuclease ribonuclease P. Here, we investigate the accuracy and robustness of measurements associated with each step of the HTS-Kin procedure. We examine the effect of substrate concentration on the observed rate constant, determine the optimal kinetic parameters, and provide guidelines for reducing error in amplification of the substrate population. Importantly, we find that high-throughput sequencing and experimental reproducibility contribute their own sources of error, and that these are the main sources of imprecision in the quantified results when the otherwise-optimized guidelines are followed. PMID:27296633
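
    The core of internal competition kinetics is that substrates depleted in parallel by first-order reactions in the same tube satisfy ln(f_i)/ln(f_ref) = k_i/k_ref, where f is the fraction of a substrate remaining unreacted. The sketch below derives relative rate constants from residual-substrate read counts; the normalization details are illustrative assumptions, not the published protocol.

      import math

      def relative_rate_constants(counts_t0, counts_t, ref, reacted_fraction):
          """counts_t0/counts_t: substrate -> residual-substrate read counts
          before/after the reaction; reacted_fraction: overall extent of
          reaction, measured independently (e.g., on a gel)."""
          tot0, tot1 = sum(counts_t0.values()), sum(counts_t.values())

          def residual(s):
              # read fractions give only relative abundances; the independently
              # measured overall extent of reaction anchors absolute fractions
              return (counts_t[s] / tot1) / (counts_t0[s] / tot0) * (1 - reacted_fraction)

          ref_log = math.log(residual(ref))
          return {s: math.log(residual(s)) / ref_log for s in counts_t0}

      # Two substrates, 62.5% of all substrate consumed overall:
      print(relative_rate_constants({"S1": 1000, "REF": 1000},
                                    {"S1": 250, "REF": 500},
                                    ref="REF", reacted_fraction=0.625))
      # -> {'S1': 2.0, 'REF': 1.0}: S1 is processed twice as fast as REF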

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gwilliam, M N; Collins, D J; Leach, M O

    Purpose: To assess the feasibility of accurately quantifying the concentration of MRI contrast agent (CA) in pulsatile flowing blood by measuring its T1, as is commonly done to obtain a patient-specific arterial input function (AIF). Dynamic contrast enhanced (DCE)-MRI and pharmacokinetic (PK) modelling are widely used to produce measures of vascular function, but inaccurate measurement of the AIF undermines their accuracy. A proposed solution is to measure the T1 of blood in a large vessel using the Fram double flip angle method during the passage of a bolus of CA. This work expands on previous work by assessing pulsatile flow and the changes in T1 seen with a CA bolus. Methods: A phantom was developed in which a physiological pump passed fluid of known T1 (812 ms) through the centre of a head coil of a clinical 1.5 T MRI scanner. Measurements were made using high-temporal-resolution sequences suitable for DCE-MRI and were used to validate a virtual phantom that simulated the expected errors due to pulsatile flow and the bolus-driven changes in CA concentration typically found in patients. Results: Measured and virtual results showed similar trends, although there were differences that may be attributed to the virtual phantom not accurately simulating the spin history of the fluid before it enters the imaging volume. The relationship between T1 measurement and flow speed was non-linear. T1 measurement is compromised by new spins flowing into the imaging volume that have not been subject to enough excitations to reach steady state. The virtual phantom demonstrated a range of recorded T1 values for various simulated T1 values and flow rates. Conclusion: T1 measurement of flowing blood using standard DCE-MRI sequences is very challenging. Measurement error is non-linear with respect to instantaneous flow speed. Optimising sequence parameters and lowering the baseline T1 of blood should be considered.
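
    For reference, the double flip angle approach referenced above is usually derived from the standard spoiled gradient-echo signal equation (a textbook form, not quoted from this record):

      S(\alpha) \;=\; M_0 \sin\alpha\,\frac{1-E_1}{1-E_1\cos\alpha},
      \qquad E_1 = e^{-TR/T_1},

    so that two measurements S_1 = S(\alpha_1) and S_2 = S(\alpha_2) give

      E_1 \;=\; \frac{S_1\sin\alpha_2 - S_2\sin\alpha_1}
                     {S_1\sin\alpha_2\cos\alpha_1 - S_2\sin\alpha_1\cos\alpha_2},
      \qquad T_1 = -\,\frac{TR}{\ln E_1}.

    The signal equation assumes the spins have reached a steady state under repeated excitation, which is exactly the assumption that fresh inflowing spins violate; this is the error mechanism the study quantifies.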

  4. Motor imagery training: Kinesthetic imagery strategy and inferior parietal fMRI activation.

    PubMed

    Lebon, Florent; Horn, Ulrike; Domin, Martin; Lotze, Martin

    2018-04-01

    Motor imagery (MI) is the mental simulation of action frequently used by professionals in different fields. However, with respect to performance, well-controlled functional imaging studies on MI training are sparse. We investigated changes in fMRI representation going along with performance changes of a finger sequence (error and velocity) after MI training in 48 healthy young volunteers. Before training, we tested the vividness of kinesthetic and visual imagery. During tests, participants were instructed to move or to imagine moving the fingers of the right hand in a specific order. During MI training, participants repeatedly imagined the sequence for 15 min. Imaging analysis was performed using a full-factorial design to assess brain changes due to imagery training. We also used regression analyses to identify those who profited from training (performance outcome and gain) with initial imagery scores (vividness) and fMRI activation magnitude during MI at pre-test (MIpre). After training, error rate decreased and velocity increased. We combined both parameters into a common performance index. FMRI activation in the left inferior parietal lobe (IPL) was associated with MI and increased over time. In addition, fMRI activation in the right IPL during MIpre was associated with high initial kinesthetic vividness. High kinesthetic imagery vividness predicted a high performance after training. In contrast, occipital activation, associated with visual imagery strategies, showed a negative predictive value for performance. Our data echo the importance of high kinesthetic vividness for MI training outcome and point to the IPL as a key area during MI and through MI training. © 2018 Wiley Periodicals, Inc.

  5. A bottom-up model of spatial attention predicts human error patterns in rapid scene recognition.

    PubMed

    Einhäuser, Wolfgang; Mundhenk, T Nathan; Baldi, Pierre; Koch, Christof; Itti, Laurent

    2007-07-20

    Humans demonstrate a peculiar ability to detect complex targets in rapidly presented natural scenes. Recent studies suggest that (nearly) no focal attention is required for overall performance in such tasks. Little is known, however, of how detection performance varies from trial to trial and which stages in the processing hierarchy limit performance: bottom-up visual processing (attentional selection and/or recognition) or top-down factors (e.g., decision-making, memory, or alertness fluctuations)? To investigate the relative contribution of these factors, eight human observers performed an animal detection task in natural scenes presented at 20 Hz. Trial-by-trial performance was highly consistent across observers, far exceeding the prediction of independent errors. This consistency demonstrates that performance is not primarily limited by idiosyncratic factors but by visual processing. Two statistical stimulus properties, contrast variation in the target image and the information-theoretical measure of "surprise" in adjacent images, predict performance on a trial-by-trial basis. These measures are tightly related to spatial attention, demonstrating that spatial attention and rapid target detection share common mechanisms. To isolate the causal contribution of the surprise measure, eight additional observers performed the animal detection task in sequences that were reordered versions of those all subjects had correctly recognized in the first experiment. Reordering increased surprise before and/or after the target while keeping the target and distractors themselves unchanged. Surprise enhancement impaired target detection in all observers. Consequently, and contrary to several previously published findings, our results demonstrate that attentional limitations, rather than target recognition alone, affect the detection of targets in rapidly presented visual sequences.

  6. Comparison of the Predictive Accuracy of DNA Array-Based Multigene Classifiers across cDNA Arrays and Affymetrix GeneChips

    PubMed Central

    Stec, James; Wang, Jing; Coombes, Kevin; Ayers, Mark; Hoersch, Sebastian; Gold, David L.; Ross, Jeffrey S; Hess, Kenneth R.; Tirrell, Stephen; Linette, Gerald; Hortobagyi, Gabriel N.; Symmans, W. Fraser; Pusztai, Lajos

    2005-01-01

    We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by basic local alignment tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45 and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene. PMID:16049308

  7. Refractive Errors

    MedlinePlus

    ... and lens of your eye helps you focus. Refractive errors are vision problems that happen when the shape ... cornea, or aging of the lens. Four common refractive errors are Myopia, or nearsightedness - clear vision close up ...

  8. Correcting Inconsistencies and Errors in Bacterial Genome Metadata Using an Automated Curation Tool in Excel (AutoCurE).

    PubMed

    Schmedes, Sarah E; King, Jonathan L; Budowle, Bruce

    2015-01-01

    Whole-genome data are invaluable for large-scale comparative genomic studies. Current sequencing technologies have made it feasible to sequence entire bacterial genomes with relative ease and speed, and with a substantially reduced cost per nucleotide, hence cost per genome. More than 3,000 bacterial genomes have been sequenced and are available at the finished status. Publicly available genomes can be readily downloaded; however, there are challenges in verifying the specific supporting data contained within the download and in identifying errors and inconsistencies that may be present within the organizational data content and metadata. AutoCurE, an automated tool for bacterial genome database curation in Excel, was developed to facilitate local database curation of the supporting data that accompany genomes downloaded from the National Center for Biotechnology Information. AutoCurE provides an automated approach to curating local genomic databases by flagging inconsistencies or errors, comparing the downloaded supporting data to the genome reports to verify genome names, RefSeq accession numbers, the presence of archaea, BioProject/UIDs, and sequence file descriptions. Flags are generated for nine metadata fields if there are inconsistencies between the downloaded genomes and the genome reports, or if erroneous or missing data are evident. AutoCurE is an easy-to-use tool for local curation of large-scale genome data prior to downstream analyses.

  9. A comparison of serial order short-term memory effects across verbal and musical domains.

    PubMed

    Gorin, Simon; Mengal, Pierre; Majerus, Steve

    2018-04-01

    Recent studies suggest that the mechanisms involved in the short-term retention of serial order information may be shared across short-term memory (STM) domains such as verbal and visuospatial STM. Given the intrinsic sequential organization of musical material, the study of STM for musical information may be particularly informative about serial order retention processes and their domain-generality. The present experiment examined serial order STM for verbal and musical sequences in participants with no advanced musical expertise and experienced musicians. Serial order STM for verbal information was assessed via a serial order reconstruction task for digit sequences. In the musical domain, serial order STM was assessed using a novel melodic sequence reconstruction task maximizing the retention of tone order information. We observed that performance for the verbal and musical tasks was characterized by sequence length as well as primacy and recency effects. Serial order errors in both tasks were characterized by similar transposition gradients and ratios of fill-in:infill errors. These effects were observed for both participant groups, although the transposition gradients and ratios of fill-in:infill errors showed additional specificities for musician participants in the musical task. The data support domain-general serial order STM effects but also suggest the existence of additional domain-specific effects. Implications for models of serial order STM in verbal and musical domains are discussed.

  10. Event-related potentials in response to violations of content and temporal event knowledge.

    PubMed

    Drummer, Janna; van der Meer, Elke; Schaadt, Gesa

    2016-01-08

    Scripts that store knowledge of everyday events are fundamentally important for managing daily routines. Content event knowledge (i.e., knowledge about which events belong to a script) and temporal event knowledge (i.e., knowledge about the chronological order of events in a script) constitute qualitatively different forms of knowledge. However, there is limited information about each distinct process and the time course involved in accessing content and temporal event knowledge. Therefore, we analyzed event-related potentials (ERPs) in response to either correctly presented event sequences or event sequences that contained a content or temporal error. We found an N400, which was followed by a posteriorly distributed P600 in response to content errors in event sequences. By contrast, we did not find an N400 but an anteriorly distributed P600 in response to temporal errors in event sequences. Thus, the N400 seems to be elicited as a response to a general mismatch between an event and the established event model. We assume that the expectancy violation of content event knowledge, as indicated by the N400, induces the collapse of the established event model, a process indicated by the posterior P600. The expectancy violation of temporal event knowledge is assumed to induce an attempt to reorganize the event model in working memory, a process indicated by the frontal P600. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. Droplet Digital™ PCR Next-Generation Sequencing Library QC Assay.

    PubMed

    Heredia, Nicholas J

    2018-01-01

    Digital PCR is a valuable tool for quantifying next-generation sequencing (NGS) libraries precisely and accurately. Accurate quantification of NGS libraries enables accurate loading of the libraries onto the sequencer, and thus improves sequencing performance by reducing under- and overloading errors. Accurate quantification also benefits users by enabling uniform loading of indexed/barcoded libraries, which in turn greatly improves sequencing uniformity across the indexed/barcoded samples. The advantages gained by employing the Droplet Digital PCR (ddPCR™) library QC assay include precise and accurate quantification as well as size quality assessment, enabling users to QC their sequencing libraries with confidence.

  12. Medication prescribing errors in the medical intensive care unit of Tikur Anbessa Specialized Hospital, Addis Ababa, Ethiopia.

    PubMed

    Sada, Oumer; Melkie, Addisu; Shibeshi, Workineh

    2015-09-16

    Medication errors (MEs) are an important problem in all hospitalized populations, especially in the intensive care unit (ICU). Little is known about the prevalence of medication prescribing errors in the ICUs of hospitals in Ethiopia. The aim of this study was to assess medication prescribing errors in the ICU of Tikur Anbessa Specialized Hospital using a retrospective cross-sectional analysis of patient cards and medication charts. About 220 patient charts were reviewed, covering a total of 1311 patient-days and 882 prescription episodes. A total of 359 MEs were detected, a prevalence of 40 per 100 orders. The most common prescribing errors were omission errors (154; 42.89%), followed by wrong combination (101; 28.13%), wrong abbreviation (48; 13.37%), wrong dose (30; 8.36%), wrong frequency (18; 5.01%) and wrong indication (8; 2.23%). The present study shows that medication errors are common in the medical ICU of Tikur Anbessa Specialized Hospital. These results suggest future targets for prevention strategies to reduce the rate of medication error.

  13. Differences among Job Positions Related to Communication Errors at Construction Sites

    NASA Astrophysics Data System (ADS)

    Takahashi, Akiko; Ishida, Toshiro

    In a previous study, we classified the communication errors at construction sites as faulty intention and message pattern, inadequate channel pattern, and faulty comprehension pattern. This study seeks to evaluate the degree of risk of communication errors and to investigate differences among people in various job positions in perception of communication error risk. Questionnaires based on the previous study were administered to construction workers (n=811; 149 administrators, 208 foremen and 454 workers). Administrators evaluated all patterns of communication error risk equally. However, foremen and workers evaluated communication error risk differently in each pattern. The common contributing factors to all patterns were inadequate arrangements before work and inadequate confirmation. Some factors were common among patterns but other factors were particular to a specific pattern. To help prevent future accidents at construction sites, administrators should understand how people in various job positions perceive communication errors and propose human factors measures to prevent such errors.

  14. Speech serial control in healthy speakers and speakers with hypokinetic or ataxic dysarthria: effects of sequence length and practice

    PubMed Central

    Reilly, Kevin J.; Spencer, Kristie A.

    2013-01-01

    The current study investigated the processes responsible for the selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson's disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson's disease are consistent with an impairment of action selection during speech sequence production. The absence of length effects in the speakers with ataxic dysarthria is consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involve processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121

  15. Thrombus segmentation by texture dynamics from microscopic image sequences

    NASA Astrophysics Data System (ADS)

    Brieu, Nicolas; Serbanovic-Canic, Jovana; Cvejic, Ana; Stemple, Derek; Ouwehand, Willem; Navab, Nassir; Groher, Martin

    2010-03-01

    The genetic factors of thrombosis are commonly explored by microscopically imaging the coagulation of blood cells induced by injuring a vessel of mice or of zebrafish mutants. The latter species is particularly interesting since skin transparency permits non-invasive acquisition of microscopic images of the scene with a CCD camera and estimation of the parameters characterizing thrombus development. These parameters are currently determined by manual outlining, which is both error-prone and extremely time-consuming. Even though a technique for automatic thrombus extraction would be highly valuable for gene analysts, little prior work exists, mainly because of very low image contrast and spurious structures. In this work, we propose to semi-automatically segment the thrombus over time from microscopic image sequences of wild-type zebrafish larvae. To compensate for the lack of valuable spatial information, our main idea consists of exploiting the temporal information by modeling the variations of the pixel intensities over successive temporal windows with a linear Markov-based dynamic texture formalization. We then derive an image from the estimated model parameters, which represents the probability of a pixel belonging to the thrombus. We employ this probability image to accurately estimate the thrombus position via an active contour segmentation that also incorporates prior and spatial information from the underlying intensity images. The performance of our approach is tested on three microscopic image sequences. We show that the thrombus is accurately tracked over time in each sequence if the respective parameters controlling prior influence and contour stiffness are correctly chosen.

  16. Genome-wide gene–gene interaction analysis for next-generation sequencing

    PubMed Central

    Zhao, Jinying; Zhu, Yun; Xiong, Momiao

    2016-01-01

    The critical barrier in interaction analysis for next-generation sequencing (NGS) data is that the traditional pairwise interaction analysis that is suitable for common variants is difficult to apply to rare variants because of their prohibitive computational time, large number of tests and low power. The great challenges for successful detection of interactions with NGS data are (1) the demands in the paradigm of changes in interaction analysis; (2) severe multiple testing; and (3) heavy computations. To meet these challenges, we shift the paradigm of interaction analysis between two SNPs to interaction analysis between two genomic regions. In other words, we take a gene as a unit of analysis and use functional data analysis techniques as dimensional reduction tools to develop a novel statistic to collectively test interaction between all possible pairs of SNPs within two genome regions. By intensive simulations, we demonstrate that the functional logistic regression for interaction analysis has the correct type 1 error rates and higher power to detect interaction than the currently used methods. The proposed method was applied to a coronary artery disease dataset from the Wellcome Trust Case Control Consortium (WTCCC) study and the Framingham Heart Study (FHS) dataset, and the early-onset myocardial infarction (EOMI) exome sequence datasets with European origin from the NHLBI's Exome Sequencing Project. We discovered that 6 of 27 pairs of significantly interacted genes in the FHS were replicated in the independent WTCCC study and 24 pairs of significantly interacted genes after applying Bonferroni correction in the EOMI study. PMID:26173972

  17. High-fidelity target sequencing of individual molecules identified using barcode sequences: de novo detection and absolute quantitation of mutations in plasma cell-free DNA from cancer patients.

    PubMed

    Kukita, Yoji; Matoba, Ryo; Uchida, Junji; Hamakawa, Takuya; Doki, Yuichiro; Imamura, Fumio; Kato, Kikuya

    2015-08-01

    Circulating tumour DNA (ctDNA) is an emerging field of cancer research. However, current ctDNA analysis is usually restricted to one or a few mutation sites due to technical limitations. In the case of massively parallel DNA sequencers, the number of false positives caused by a high read error rate is a major problem. In addition, the final sequence reads do not represent the original DNA population, owing to the global amplification step during template preparation. We established a high-fidelity system for target sequencing of individual molecules identified in plasma cell-free DNA using barcode sequences; this system consists of the following two steps. (i) A novel target sequencing method that adds barcode sequences by adaptor ligation. This method uses linear amplification to eliminate the errors introduced during the early cycles of the polymerase chain reaction. (ii) The monitoring and removal of erroneous barcode tags. This process identifies the individual molecules that have been sequenced and enables absolute quantitation of the number of mutations. Using plasma cell-free DNA from patients with gastric or lung cancer, we demonstrated that the system achieved near-complete elimination of false positives and enabled de novo detection and absolute quantitation of mutations in plasma cell-free DNA. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
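
    The general principle behind barcode-based error suppression is that reads sharing a molecular barcode derive from one original molecule, so random sequencer errors can be outvoted within each barcode family. The sketch below shows this consensus step only; the published system's tag-error monitoring is not reproduced, and the thresholds are illustrative assumptions.

      from collections import Counter, defaultdict

      def consensus(reads, min_family=3, min_agree=0.7):
          """Collapse reads sharing a barcode into one consensus sequence.

          reads: iterable of (barcode, sequence) with equal-length sequences.
          Families smaller than min_family are dropped; positions with less
          than min_agree agreement are masked as 'N'.
          """
          families = defaultdict(list)
          for barcode, seq in reads:
              families[barcode].append(seq)
          result = {}
          for barcode, seqs in families.items():
              if len(seqs) < min_family:
                  continue   # too few reads to outvote random sequencer errors
              cons = []
              for column in zip(*seqs):
                  base, votes = Counter(column).most_common(1)[0]
                  cons.append(base if votes / len(column) >= min_agree else "N")
              result[barcode] = "".join(cons)
          return result

      reads = [("AAC", "ACGT"), ("AAC", "ACGT"), ("AAC", "ACGT"),
               ("AAC", "ACGA"),   # one read error, outvoted 3:1
               ("GGT", "ACTT")]   # family of one: discarded
      print(consensus(reads))     # {'AAC': 'ACGT'}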

  18. Teaching Common Errors in Applying a Procedure.

    ERIC Educational Resources Information Center

    Marcone, Stephen; Reigeluth, Charles M.

    1988-01-01

    Discusses study that investigated whether or not the teaching of matched examples and nonexamples in the form of common errors could improve student performance in undergraduate music theory courses. Highlights include hypotheses tested, pretests and posttests, and suggestions for further research with different age groups. (19 references)…

  19. At the cross-roads: an on-road examination of driving errors at intersections.

    PubMed

    Young, Kristie L; Salmon, Paul M; Lenné, Michael G

    2013-09-01

    A significant proportion of road trauma occurs at intersections. Understanding the nature of driving errors at intersections therefore has the potential to lead to significant injury reductions. To further understand how the complexity of modern intersections shapes driver behaviour, the errors made at intersections are compared to errors made mid-block, and the role of wider systems failures in intersection error causation is investigated in an on-road study. Twenty-five participants drove a pre-determined urban route incorporating 25 intersections. Two in-vehicle observers recorded the errors made while a range of other data was collected, including driver verbal protocols, video, driver eye glance behaviour and vehicle data (e.g., speed, braking and lane position). Participants also completed a post-trial cognitive task analysis interview. Participants were found to make 39 specific error types, with speeding violations the most common. Participants made significantly more errors at intersections than mid-block, with misjudgement, action and perceptual/observation errors more commonly observed at intersections. Traffic signal configuration was found to play a key role in intersection error causation, with drivers making more errors at partially signalised than at fully signalised intersections. Copyright © 2012 Elsevier Ltd. All rights reserved.

  20. A Simple Exact Error Rate Analysis for DS-CDMA with Arbitrary Pulse Shape in Flat Nakagami Fading

    NASA Astrophysics Data System (ADS)

    Rahman, Mohammad Azizur; Sasaki, Shigenobu; Kikuchi, Hisakazu; Harada, Hiroshi; Kato, Shuzo

    A simple exact error rate analysis is presented for random binary direct sequence code division multiple access (DS-CDMA) considering a general pulse shape and a flat Nakagami fading channel. First, a simple model is developed for the multiple access interference (MAI). Based on this, a simple exact expression for the characteristic function (CF) of the MAI is developed in a straightforward manner. Finally, an exact expression for the error rate is obtained following the CF method of error rate analysis. The exact error rate so obtained can be evaluated much more easily than the only reliable approximate error rate expression currently available, which is based on the Improved Gaussian Approximation (IGA).
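
    The CF method mentioned here is commonly stated via the Gil-Pelaez inversion formula (a standard result, not quoted from the paper): for a decision statistic Z conditioned on the transmitted bit, with characteristic function \varphi_Z(\omega),

      P_e \;=\; \Pr(Z < 0) \;=\; \frac{1}{2} \;-\; \frac{1}{\pi}\int_{0}^{\infty}
                \frac{\operatorname{Im}\,\varphi_Z(\omega)}{\omega}\, d\omega,

    and since the desired signal, the MAI and the noise are independent, \varphi_Z factors into the product of their individual CFs, which is what makes an exact MAI model directly usable in the error rate expression.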

  1. MPST Software: grl_pef_check

    NASA Technical Reports Server (NTRS)

    Call, Jared A.; Kwok, John H.; Fisher, Forest W.

    2013-01-01

    This innovation is a tool used to verify and validate spacecraft sequences at the predicted events file (PEF) level for the GRAIL (Gravity Recovery and Interior Laboratory, see http://www.nasa.gov/mission_pages/grail/main/index.html) mission as part of the Multi-Mission Planning and Sequencing Team (MPST) operations process, to reduce the possibility for errors. This tool is used to catch any sequence-related errors or issues immediately after the seqgen modeling, to streamline downstream processes. This script verifies and validates the seqgen modeling for the GRAIL MPST process. A PEF is provided as input, and dozens of checks are performed on it to verify and validate the command products, including command content, command ordering, flight-rule violations, modeling boundary consistency, resource limits, and ground commanding consistency. By performing as many checks as early in the process as possible, grl_pef_check streamlines the MPST task of generating GRAIL command and modeled products on an aggressive schedule. By enumerating each check being performed, and clearly stating the criteria and assumptions made at each step, grl_pef_check can be used as a manual checklist as well as an automated tool. This helper script was written with a focus on giving users the information they need to evaluate a sequence quickly and efficiently, while still keeping them informed and active in the overall sequencing process. grl_pef_check verifies and validates the modeling and sequence content prior to investing any more effort into the build. There are dozens of items in a modeling run that need to be checked, which is a time-consuming and error-prone task. Currently, no other software exists that provides this functionality. Compared to a manual process, this script reduces human error and saves considerable man-hours by automating and streamlining the mission planning and sequencing task for the GRAIL mission.

  2. [Diagnostic Errors in Medicine].

    PubMed

    Buser, Claudia; Bankova, Andriyana

    2015-12-09

    The recognition of diagnostic errors in everyday practice can help improve patient safety. The most common diagnostic errors are the cognitive errors, followed by system-related errors and no fault errors. The cognitive errors often result from mental shortcuts, known as heuristics. The rate of cognitive errors can be reduced by a better understanding of heuristics and the use of checklists. The autopsy as a retrospective quality assessment of clinical diagnosis has a crucial role in learning from diagnostic errors. Diagnostic errors occur more often in primary care in comparison to hospital settings. On the other hand, the inpatient errors are more severe than the outpatient errors.

  3. Crosstalk error correction through dynamical decoupling of single-qubit gates in capacitively coupled singlet-triplet semiconductor spin qubits

    NASA Astrophysics Data System (ADS)

    Buterakos, Donovan; Throckmorton, Robert E.; Das Sarma, S.

    2018-01-01

    In addition to magnetic field and electric charge noise adversely affecting spin-qubit operations, performing single-qubit gates on one of multiple coupled singlet-triplet qubits presents a new challenge: crosstalk, which is inevitable (and must be minimized) in any multiqubit quantum computing architecture. We develop a set of dynamically corrected pulse sequences that are designed to cancel the effects of both types of noise (i.e., field and charge) as well as crosstalk to leading order, and provide parameters for these corrected sequences for all 24 of the single-qubit Clifford gates. We then provide an estimate of the error as a function of the noise and capacitive coupling to compare the fidelity of our corrected gates to their uncorrected versions. Dynamical error correction protocols presented in this work are important for the next generation of singlet-triplet qubit devices where coupling among many qubits will become relevant.

  4. Parallel processing spacecraft communication system

    NASA Technical Reports Server (NTRS)

    Bolotin, Gary S. (Inventor); Donaldson, James A. (Inventor); Luong, Huy H. (Inventor); Wood, Steven H. (Inventor)

    1998-01-01

    An uplink controlling assembly speeds data processing using a special parallel codeblock technique. A correct start sequence initiates processing of a frame. Two possible start sequences can be used, and the one which is used determines whether data polarity is inverted or non-inverted. Processing continues until uncorrectable errors are found. The frame ends by intentionally sending a block with an uncorrectable error. Each of the codeblocks in the frame has a channel ID, and each channel ID can be processed separately in parallel. This obviates the problem of waiting for error correction processing. If the channel number is zero, however, it indicates that the frame of data represents a critical command only; that data is handled in a special way, independent of the software. Otherwise, the processed data are further handled using special double-buffering techniques to avoid problems from overrun. When overrun does occur, the system takes action to lose only the oldest data.

  5. Reliability of a Longitudinal Sequence of Scale Ratings

    ERIC Educational Resources Information Center

    Laenen, Annouschka; Alonso, Ariel; Molenberghs, Geert; Vangeneugden, Tony

    2009-01-01

    Reliability captures the influence of error on a measurement and, in the classical setting, is defined as one minus the ratio of the error variance to the total variance. Laenen, Alonso, and Molenberghs ("Psychometrika" 73:443-448, 2007) proposed an axiomatic definition of reliability and introduced the R[subscript T] coefficient, a measure of…
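
    In the classical setting referred to above, an observed score decomposes as X = \tau + \varepsilon with uncorrelated true score \tau and measurement error \varepsilon, so that

      R \;=\; 1 - \frac{\sigma^{2}_{\varepsilon}}{\sigma^{2}_{X}}
        \;=\; \frac{\sigma^{2}_{\tau}}{\sigma^{2}_{\tau} + \sigma^{2}_{\varepsilon}},

    i.e., R approaches 1 as the error variance becomes negligible relative to the total variance. (This is the textbook definition the abstract paraphrases; the R_T coefficient mentioned above is built on an axiomatic definition rather than this classical form.)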

  6. Fault detection and bypass in a sequence information signal processor

    NASA Technical Reports Server (NTRS)

    Peterson, John C. (Inventor); Chow, Edward T. (Inventor)

    1992-01-01

    The invention comprises a plurality of scan registers, each such register respectively associated with a processor element; an on-chip comparator, encoder and fault bypass register. Each scan register generates a unitary signal the logic state of which depends on the correctness of the input from the previous processor in the systolic array. These unitary signals are input to a common comparator which generates an output indicating whether or not an error has occurred. These unitary signals are also input to an encoder which identifies the location of any fault detected so that an appropriate multiplexer can be switched to bypass the faulty processor element. Input scan data can be readily programmed to fully exercise all of the processor elements so that no fault can remain undetected.

  7. Optimization of the moving-bed biofilm sequencing batch reactor (MBSBR) to control aeration time by kinetic computational modeling: Simulated sugar-industry wastewater treatment.

    PubMed

    Faridnasr, Maryam; Ghanbari, Bastam; Sassani, Ardavan

    2016-05-01

    A novel approach was applied to the optimization of a moving-bed biofilm sequencing batch reactor (MBSBR) treating sugar-industry wastewater (BOD5=500-2500 and COD=750-3750 mg/L) at 2-4 h of cycle time (CT). Although the experimental data showed that the MBSBR reached high BOD5 and COD removal performance, it failed to achieve the standard limits at the mentioned CTs. Thus, optimization of the reactor was carried out by kinetic computational modeling, using the normalized root mean square error (NRMSE) as a statistical error indicator. The NRMSE results revealed that the Stover-Kincannon (error=6.40%) and Grau (error=6.15%) models provide better fits to the experimental data and may be used for CT optimization in the reactor. The models predicted required CTs of 4.5, 6.5, 7 and 7.5 h for effluent standardization at 500, 1000, 1500 and 2500 mg/L influent BOD5 concentrations, respectively. A similar pattern in the experimental data also confirmed these findings. Copyright © 2016 Elsevier Ltd. All rights reserved.
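
    In one common convention (the record does not specify the paper's exact normalization), the NRMSE used to rank the kinetic models is

      \mathrm{NRMSE} \;=\; \frac{1}{\bar{y}}
        \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^{2}} \times 100\%,

    where y_i are the measured effluent concentrations, \hat{y}_i the model predictions, and \bar{y} the mean of the measurements; other conventions normalize by the range y_{\max} - y_{\min} instead.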

  8. A parallel row-based algorithm with error control for standard-cell replacement on a hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Sargent, Jeff Scott

    1988-01-01

    A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel approaches to controlling error in parallel cell-placement algorithms: Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits, and the performance of an implementation on the Intel iPSC/2 Hypercube is summarized. The runtime of this algorithm is 5 to 16 times faster than a previous program developed for the Hypercube, while producing placements of equivalent quality. An integrated place and route program for the Intel iPSC/2 Hypercube is currently being developed.

  9. Pulse sequences for suppressing leakage in single-qubit gate operations

    NASA Astrophysics Data System (ADS)

    Ghosh, Joydip; Coppersmith, S. N.; Friesen, Mark

    2017-06-01

    Many realizations of solid-state qubits involve couplings to leakage states lying outside the computational subspace, posing a threat to high-fidelity quantum gate operations. Mitigating leakage errors is especially challenging when the coupling strength is unknown, e.g., when it is caused by noise. Here we show that simple pulse sequences can be used to strongly suppress leakage errors for a qubit embedded in a three-level system. As an example, we apply our scheme to the recently proposed charge quadrupole (CQ) qubit for quantum dots. These results provide a solution to a key challenge for fault-tolerant quantum computing with solid-state elements.

  10. Using string alignment in a query-by-humming system for real world applications

    NASA Astrophysics Data System (ADS)

    Sailer, Christian

    2005-09-01

    Though query by humming (i.e., retrieving music or information about music by singing a characteristic melody) has been a popular research topic during the past decade, few approaches have reached a level of usefulness beyond mere scientific interest. One of the main problems is the inherent contradiction between error tolerance and discriminative power in conventional melody matching algorithms that rely on a melody contour approach to handle intonation or transcription errors. Adapting the string matching/alignment techniques from bioinformatics to melody sequences allows the similarity between two melodies to be assessed directly. This method takes an MPEG-7 compliant melody sequence (i.e., a list of note intervals and length ratios) as a query and evaluates the steps necessary to transform it into the reference sequence. By introducing a musically founded cost-of-replace function and adequate post-processing, this method yields a measure of melodic similarity. Thus it is possible to construct a query-by-humming system that can properly discriminate between thousands of melodies and still be sufficiently error-tolerant to be used by untrained singers. The robustness has been verified in extensive tests and real-world applications.
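
    As a sketch of the alignment idea (not the paper's actual cost function: the weights below are illustrative assumptions, and note-length ratios are ignored for brevity), a Needleman-Wunsch-style edit distance over pitch interval sequences with a musically motivated replace cost might look like this:

      def replace_cost(a, b):
          """Small cost for near or octave-equivalent intervals, larger otherwise."""
          d = abs(a - b)
          return min(d, abs(d - 12)) / 12.0   # octave errors are cheap

      def melody_distance(q, r, gap=1.0):
          """Global edit distance between interval sequences q and r (semitones)."""
          n, m = len(q), len(r)
          D = [[0.0] * (m + 1) for _ in range(n + 1)]
          for i in range(1, n + 1): D[i][0] = i * gap
          for j in range(1, m + 1): D[0][j] = j * gap
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  D[i][j] = min(D[i - 1][j] + gap,       # deletion
                                D[i][j - 1] + gap,       # insertion
                                D[i - 1][j - 1] + replace_cost(q[i - 1], r[j - 1]))
          return D[n][m]

      # A sung query with one wrong interval still scores close to its target.
      print(melody_distance([2, 2, 1, -3], [2, 2, 2, -3]))   # small distance
      print(melody_distance([2, 2, 1, -3], [-5, 7, 0, 4]))   # larger distance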

  11. Prevalence of teen driver errors leading to serious motor vehicle crashes.

    PubMed

    Curry, Allison E; Hafetz, Jessica; Kallan, Michael J; Winston, Flaura K; Durbin, Dennis R

    2011-07-01

    Motor vehicle crashes are the leading cause of adolescent deaths. Programs and policies should target the most common and modifiable reasons for crashes. We estimated the frequency of critical reasons for crashes involving teen drivers, and examined in more depth specific teen driver errors. The National Highway Traffic Safety Administration's (NHTSA) National Motor Vehicle Crash Causation Survey collected data at the scene of a nationally representative sample of 5470 serious crashes between 7/05 and 12/07. NHTSA researchers assigned a single driver, vehicle, or environmental factor as the critical reason for the event immediately leading to each crash. We analyzed crashes involving 15-18 year old drivers. 822 teen drivers were involved in 795 serious crashes, representing 335,667 teens in 325,291 crashes. Driver error was by far the most common reason for crashes (95.6%), as opposed to vehicle or environmental factors. Among crashes with a driver error, a teen made the error 79.3% of the time (75.8% of all teen-involved crashes). Recognition errors (e.g., inadequate surveillance, distraction) accounted for 46.3% of all teen errors, followed by decision errors (e.g., following too closely, too fast for conditions) (40.1%) and performance errors (e.g., loss of control) (8.0%). Inadequate surveillance, driving too fast for conditions, and distracted driving together accounted for almost half of all crashes. Aggressive driving behavior, drowsy driving, and physical impairments were less commonly cited as critical reasons. Males and females had similar proportions of broadly classified errors, although females were specifically more likely to make inadequate surveillance errors. Our findings support prioritization of interventions targeting driver distraction and surveillance and hazard awareness training. Copyright © 2010 Elsevier Ltd. All rights reserved.

  12. Outage probability of a relay strategy allowing intra-link errors utilizing Slepian-Wolf theorem

    NASA Astrophysics Data System (ADS)

    Cheng, Meng; Anwar, Khoirul; Matsumoto, Tad

    2013-12-01

    In conventional decode-and-forward (DF) one-way relay systems, a data block received at the relay node is discarded if the information part is found to have errors after decoding. Such errors are referred to as intra-link errors in this article. However, in a setup where the relay forwards data blocks despite possible intra-link errors, the two data blocks, one from the source node and the other from the relay node, are highly correlated because they were transmitted from the same source. In this article, we focus on the outage probability analysis of such a relay transmission system, where the source-destination and relay-destination links, Link 1 and Link 2, respectively, are assumed to suffer from correlated fading variation due to block Rayleigh fading. The intra-link is assumed to be represented by a simple bit-flipping model, where some of the information bits recovered at the relay node are flipped versions of their corresponding original information bits at the source. The correlated bit streams are encoded separately by the source and relay nodes, and transmitted block-by-block to a common destination using different time slots, where the information sequence transmitted over Link 2 may be a noise-corrupted interleaved version of the original sequence. Joint decoding takes place at the destination by exploiting the correlation knowledge of the intra-link (source-relay link). It is shown that the outage probability of the proposed transmission technique can be expressed by a set of double integrals over the admissible rate range, given by the Slepian-Wolf theorem, with respect to the probability density function (pdf) of the instantaneous signal-to-noise power ratios (SNR) of Link 1 and Link 2. It is found that, with the Slepian-Wolf relay technique, as long as the correlation ρ of the complex fading variation satisfies |ρ| < 1, 2nd-order diversity can be achieved only if the two bit streams are fully correlated. This indicates that the diversity order exhibited in the outage curve converges to 1 when the bit streams are not fully correlated. Moreover, the Slepian-Wolf outage probability is proved to be smaller than that of 2nd-order maximum ratio combining (MRC) diversity if the average SNRs of the two independent links are the same. Exact as well as asymptotic expressions of the outage probability are theoretically derived in the article. In addition, the theoretical outage results are compared with the frame-error-rate (FER) curves obtained by a series of simulations for the Slepian-Wolf relay system based on bit-interleaved coded modulation with iterative detection (BICM-ID). It is shown that the FER curves exhibit the same tendency as the theoretical results.
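
    The admissible rate range invoked here is the standard Slepian-Wolf region for two correlated sources, X (the source bits) and Y (the relay's possibly bit-flipped copy). Stating it explicitly, with the outage probability written in the double-integral form the abstract describes:

      R_1 \ge H(X \mid Y), \qquad R_2 \ge H(Y \mid X), \qquad R_1 + R_2 \ge H(X, Y),

      P_{\mathrm{out}} \;=\; \iint_{(R_1(\gamma_1),\, R_2(\gamma_2)) \,\notin\, \mathcal{R}_{\mathrm{SW}}}
          p(\gamma_1, \gamma_2)\, d\gamma_1\, d\gamma_2,

    where \gamma_1, \gamma_2 are the instantaneous SNRs of Link 1 and Link 2, R_i(\gamma_i) the rates those links support, and p(\gamma_1, \gamma_2) the joint pdf that carries the fading correlation ρ.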

  13. Out-of-This-World Calculations

    ERIC Educational Resources Information Center

    Kalb, Kristina S.; Gravett, Julie M.

    2012-01-01

    By following learned rules rather than reasoning, students often fall into common error patterns, something every experienced teacher has observed in the classroom. In their effort to circumvent the developing common error patterns of their students, the authors decided to supplement their math text with two weeklong investigations. The first was…

  14. Ten common errors beginning substance abuse workers make in group treatment.

    PubMed

    Greif, G L

    1996-01-01

    Beginning therapists sometimes make mistakes when working with substance abusers in groups. This article discusses ten common errors that the author has observed. Five center on the therapist's approach and five center on the nuts and bolts of group leadership. Suggestions are offered for how to avoid them.

  15. Neuropsychological analysis of a typewriting disturbance following cerebral damage.

    PubMed

    Boyle, M; Canter, G J

    1987-01-01

    Following a left CVA, a skilled professional typist sustained a disturbance of typing disproportionate to her handwriting disturbance. Typing errors were predominantly of the sequencing type, with spatial errors much less frequent, suggesting that the impairment was based on a relatively early (premotor) stage of processing. Depriving the subject of visual feedback during handwriting greatly increased her error rate. Similarly, interfering with auditory feedback during speech substantially reduced her self-correction of speech errors. These findings suggested that impaired ability to utilize somesthetic information--probably caused by the subject's parietal lobe lesion--may have been the basis of the typing disorder.

  16. Systematic Errors in an Air Track Experiment.

    ERIC Educational Resources Information Center

    Ramirez, Santos A.; Ham, Joe S.

    1990-01-01

    Errors found in a common physics experiment to measure acceleration resulting from gravity using a linear air track are investigated. Glider position at release and initial velocity are shown to be sources of systematic error. (CW)

  17. Discrimination of germline V genes at different sequencing lengths and mutational burdens: A new tool for identifying and evaluating the reliability of V gene assignment.

    PubMed

    Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri

    2015-12-01

    Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), (Diversity (D) in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences. Copyright © 2015 Elsevier B.V. All rights reserved.
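
    As a simplified stand-in for the discrimination analysis described above, the sketch below flags pairs of germline V genes that cannot be told apart at a given read length and mutation load. The toy sequences, the 3'-anchored read model and the 2×max_mut ambiguity threshold are illustrative assumptions.

      from itertools import combinations

      def hamming(a, b):
          return sum(x != y for x, y in zip(a, b))

      def indistinguishable_pairs(germline, read_len, max_mut):
          """Pairs whose 3' read_len bases differ by <= 2 * max_mut.

          germline: dict name -> sequence.  Reads are modeled as covering
          the 3' end of the V gene (nearest the junction).  If two genes
          differ by at most twice the allowed mutation count, a
          hypermutated read from one can sit as close to the other, so
          assignment is ambiguous.
          """
          pairs = []
          for (n1, s1), (n2, s2) in combinations(sorted(germline.items()), 2):
              w1, w2 = s1[-read_len:], s2[-read_len:]
              if len(w1) == len(w2) and hamming(w1, w2) <= 2 * max_mut:
                  pairs.append((n1, n2))
          return pairs

      vgenes = {"V1": "ACGTACGTACGT",   # toy alleles, not real IGHV sequences
                "V2": "ACGTACGAACGT",
                "V3": "TTTTACGTCCAA"}
      print(indistinguishable_pairs(vgenes, read_len=8, max_mut=1))
      # -> [('V1', 'V2')]: one mismatch in the covered window, within 2*max_mut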

  18. SIBIS: a Bayesian model for inconsistent protein sequence estimation.

    PubMed

    Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D

    2014-09-01

    The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a Linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/∼julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  19. An effective solution to the nonlinear, nonstationary Navier-Stokes equations for two dimensions

    NASA Technical Reports Server (NTRS)

    Gabrielsen, R. E.

    1975-01-01

    A sequence of approximate solutions for the nonlinear, nonstationary Navier-Stokes equations for a two-dimensional domain, from which explicit error estimates and rates of convergence are obtained, is described. This sequence of approximate solutions is based primarily on the Newton-Kantorovich method.

  20. Fast imputation using medium- or low-coverage sequence data

    USDA-ARS?s Scientific Manuscript database

    Direct imputation from raw sequence reads can be more accurate than calling genotypes first and then imputing, especially if read depth is low or error rates high, but different imputation strategies are required than those used for data from genotyping chips. A fast algorithm to impute from lower t...

  1. La parole, vue et prise par les etudiants (Speech as Seen and Understood by Student).

    ERIC Educational Resources Information Center

    Gajo, Laurent, Ed.; Jeanneret, Fabrice, Ed.

    1998-01-01

    Articles on speech and second language learning include: "Les sequences de correction en classe de langue seconde: evitement du 'non' explicite" ("Error Correction Sequences in Second Language Class: Avoidance of the Explicit 'No'") (Anne-Lise de Bosset); "Analyse hierarchique et fonctionnelle du discours: conversations…

  2. In Search of Grid Converged Solutions

    NASA Technical Reports Server (NTRS)

    Lockard, David P.

    2010-01-01

    Assessing solution error continues to be a formidable task when numerically solving practical flow problems. Currently, grid refinement is the primary method used for error assessment. The minimum grid spacing requirements to achieve design order accuracy for a structured-grid scheme are determined for several simple examples using truncation error evaluations on a sequence of meshes. For certain methods and classes of problems, obtaining design order may not be sufficient to guarantee low error. Furthermore, some schemes can require much finer meshes to obtain design order than would be needed to reduce the error to acceptable levels. Results are then presented from realistic problems that further demonstrate the challenges associated with using grid refinement studies to assess solution accuracy.
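
    A typical way to quantify design order in such grid refinement studies (standard practice, not a formula quoted from this report) is the observed order of accuracy computed from a solution quantity f on three systematically refined meshes with constant refinement ratio r:

      p \;=\; \frac{\ln\!\left(\dfrac{f_{3} - f_{2}}{f_{2} - f_{1}}\right)}{\ln r},

    where f_1, f_2 and f_3 are the fine-, medium- and coarse-mesh values. Agreement of p with the scheme's design order is necessary but, as the abstract notes, not sufficient for the error itself to be small.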

  3. Meta sequence analysis of human blood peptides and their parent proteins.

    PubMed

    Bowden, Peter; Pendrak, Voitek; Zhu, Peihong; Marshall, John G

    2010-04-18

    Sequence analysis of the blood peptides and their qualities will be key to understanding the mechanisms that contribute to error in LC-ESI-MS/MS. Analysis of peptides and their proteins at the level of sequences is much more direct and informative than the comparison of disparate accession numbers. A portable database of all blood peptide and protein sequences, with descriptor fields and gene ontology terms, might be useful for designing immunological or MRM assays from human blood. The results of twelve studies of human blood peptides and/or proteins identified by LC-MS/MS and correlated against a disparate array of genetic libraries were parsed and matched to proteins from the human ENSEMBL, SwissProt and RefSeq databases by SQL. The reported peptide and protein sequences were organized into an SQL database with full protein sequences and up to five unique peptides in order of prevalence, along with the peptide count for each protein. Structured Query Language (SQL) queries or BLAST were used to acquire descriptive information from current databases. Sampling error at the level of peptides is the largest source of disparity between groups. Chi-square analysis of peptide-to-protein distributions confirmed the significant agreement between groups on identified proteins. Copyright 2010. Published by Elsevier B.V.

  4. Complete chloroplast genome sequence of MD-2 pineapple and its comparative analysis among nine other plants from the subclass Commelinidae.

    PubMed

    Redwan, R M; Saidin, A; Kumar, S V

    2015-08-12

    Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple, sequenced using PacBio sequencing technology. In this study, the high error rate of PacBio long reads of A. comosus's total genomic DNA was mitigated by leveraging high-accuracy but short Illumina reads for error correction via the latest error-correction module from Novocraft. Error-corrected long PacBio reads were assembled using a single tool to produce a contig representing the pineapple chloroplast genome. The genome, 159,636 bp in length, features the conserved quadripartite structure of chloroplasts, containing a large single copy region (LSC) of 87,482 bp, a small single copy region (SSC) of 18,622 bp and two inverted repeat regions (IRA and IRB) of 26,766 bp each. Overall, the genome contains 117 unique coding regions, 30 of which are repeated in the IR regions, with gene content, structure and arrangement similar to those of its sister taxon, Typha latifolia. A total of 35 repeat structures were detected in both the coding and non-coding regions, with the majority being tandem repeats. In addition, 205 SSRs were detected in the genome, with six protein-coding genes containing more than two SSRs. Comparison of chloroplast genomes from the subclass Commelinidae revealed conserved protein-coding genes, albeit located in highly divergent regions. Analysis of selection pressure on protein-coding genes using the Ka/Ks ratio showed significant positive selection exerted on the rps7 gene of the pineapple chloroplast (P less than 0.05). Phylogenetic analysis confirmed the recent taxonomic placement of the commelinids, supporting the monophyletic relationship between Arecales and Dasypogonaceae and the relationship of Zingiberales to the Poales, which include A. comosus. The complete chloroplast sequence of pineapple provides insight into the divergence of genic chloroplast sequences among members of the subclass Commelinidae. The complete pineapple chloroplast will serve as a reference for in-depth taxonomic studies in the Bromeliaceae family as more species in the family are sequenced. The genetic sequence information will also enable other molecular applications of the pineapple chloroplast for plant genetic improvement.

  5. Categorizing accident sequences in the external radiotherapy for risk analysis

    PubMed Central

    2013-01-01

    Purpose This study identifies accident sequences from past accidents in order to support the application of risk analysis to external radiotherapy. Materials and Methods This study reviews 59 accidental cases from two retrospective safety analyses that extensively collected incidents in external radiotherapy. Two accident analysis reports that accumulated past incidents are investigated to identify accident sequences, including initiating events, failures of safety measures, and consequences. This study classifies the accidents by treatment stage and source of error for initiating events, by type of failure in the safety measures, and by type of undesirable consequence and the number of affected patients. The accident sequences are then grouped into several categories on the basis of similarity of progression. As a result, these cases can be categorized into 14 groups of accident sequences. Results The results indicate that risk analysis needs to pay attention not only to the planning stage, but also to the calibration stage performed prior to the main treatment process. They also show that human error is the largest contributor to initiating events as well as to failures of safety measures. This study also illustrates an event tree analysis for an accident sequence initiated during calibration. Conclusion This study is expected to provide insights into accident sequences for prospective risk analysis through the review of experiences. PMID:23865005

  6. The Orthogonally Partitioned EM Algorithm: Extending the EM Algorithm for Algorithmic Stability and Bias Correction Due to Imperfect Data.

    PubMed

    Regier, Michael D; Moodie, Erica E M

    2016-05-01

    We propose an extension of the EM algorithm that exploits the common assumption of unique parameterization, corrects for biases due to missing data and measurement error, converges for the specified model when the standard implementation of the EM algorithm has a low probability of convergence, and reduces a potentially complex algorithm into a sequence of smaller, simpler, self-contained EM algorithms. We use the theory surrounding the EM algorithm to derive the theoretical results of our proposal, showing that an optimal solution over the parameter space is obtained. A simulation study is used to explore the finite sample properties of the proposed extension when there is missing data and measurement error. We observe that partitioning the EM algorithm into simpler steps may provide better bias reduction in the estimation of model parameters. The ability to break down a complicated problem into a series of simpler, more accessible problems will permit a broader implementation of the EM algorithm, permit the use of software packages that implement and/or automate the EM algorithm, and make the EM algorithm accessible to a wider and more general audience.
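
    As a loose illustration of splitting one EM into smaller self-contained updates, the sketch below fits a two-component Gaussian mixture with the M-step partitioned into two sequential blocks (weights and means, then variances). This is an ECM-flavored toy, not the authors' orthogonally partitioned algorithm.

    ```python
    # Hedged illustration: EM for a two-component Gaussian mixture with the
    # M-step partitioned into two smaller, self-contained updates. Not the
    # paper's method; only the partitioning idea is shown.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1.5, 700)])

    w, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
    for _ in range(200):
        # E-step: responsibilities of each component for each point
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        resp = w * dens
        resp /= resp.sum(axis=1, keepdims=True)
        nk = resp.sum(axis=0)
        # Partitioned M-step, block 1: weights and means
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        # Partitioned M-step, block 2: variances, given the updated means
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

    print(w.round(2), mu.round(2), sd.round(2))   # ~[0.3 0.7] [-2. 3.] [1. 1.5]
    ```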

  7. OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis.

    PubMed

    Verzotto, Davide; M Teo, Audrey S; Hillmer, Axel M; Nagarajan, Niranjan

    2016-01-01

    Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilobase pairs (kbp) to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient and sensitive map-alignment algorithms for robustly aligning error-prone maps to sequences. We introduce a novel seed-and-extend glocal (short for global-local) alignment method, OPTIMA (and a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic with respect to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests. We show that OPTIMA and OPTIMA-Overlap outperform other state-of-the-art approaches (1.6-2 times more sensitive) and are more efficient (170-200%) and precise in their alignments (nearly 99% precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust, provide improved sensitivity and guarantee high precision.

  8. Spatial Distortion in MRI-Guided Stereotactic Procedures: Evaluation in 1.5-, 3- and 7-Tesla MRI Scanners.

    PubMed

    Neumann, Jan-Oliver; Giese, Henrik; Biller, Armin; Nagel, Armin M; Kiening, Karl

    2015-01-01

    Magnetic resonance imaging (MRI) is replacing computed tomography (CT) as the main imaging modality for stereotactic transformations. MRI is prone to spatial distortion artifacts, which can lead to inaccuracy in stereotactic procedures. Modern MRI systems provide distortion correction algorithms that may ameliorate this problem. This study investigates the different options for distortion correction using standard 1.5-, 3- and 7-tesla MRI scanners. A phantom was mounted on a stereotactic frame. One CT scan and three MRI scans were performed. At all three field strengths, two 3-dimensional sequences, volumetric interpolated breath-hold examination (VIBE) and magnetization-prepared rapid acquisition with gradient echo, were acquired, and automatic distortion correction was performed. Global stereotactic transformation of all 13 datasets was performed and two stereotactic planning workflows (MRI only vs. CT/MR image fusion) were subsequently analysed. Distortion correction on the 1.5- and 3-tesla scanners caused a considerable reduction in positional error. The effect was more pronounced when using the VIBE sequences. By using co-registration (CT/MR image fusion), an even lower positional error could be obtained. In ultra-high-field (7 T) MR imaging, distortion correction introduced even higher errors. However, the accuracy of non-corrected 7-tesla sequences was comparable to CT/MR image fusion 3-tesla imaging. MRI distortion correction algorithms can reduce positional errors by up to 60%. For stereotactic applications of utmost precision, we recommend co-registration to an additional CT dataset. © 2015 S. Karger AG, Basel.

  9. Error control techniques for satellite and space communications

    NASA Technical Reports Server (NTRS)

    Costello, Daniel J., Jr.

    1994-01-01

    The unequal error protection capabilities of convolutional and trellis codes are studied. In certain environments, a discrepancy in the amount of error protection placed on different information bits is desirable. Examples of environments which have data of varying importance are a number of speech coding algorithms, packet switched networks, multi-user systems, embedded coding systems, and high definition television. Encoders which provide more than one level of error protection to information bits are called unequal error protection (UEP) codes. In this work, the effective free distance vector, d, is defined as an alternative to the free distance as a primary performance parameter for UEP convolutional and trellis encoders. For a given (n, k) convolutional encoder G, the effective free distance vector is defined as the k-dimensional vector d = (d_0, d_1, ..., d_(k-1)), where d_j, the jth effective free distance, is the lowest Hamming weight among all code sequences that are generated by input sequences with at least one '1' in the jth position. It is shown that, although the free distance of a code is unique to the code and independent of the encoder realization, the effective free distance vector depends on the encoder realization.
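
    To make the free-distance notion concrete, the sketch below computes the ordinary free distance of a textbook rate-1/2, k = 1 convolutional encoder (generators 7 and 5 octal) by a shortest-path search over the state diagram; the effective free distance vector generalizes this per input position when k > 1. The encoder choice is illustrative, not taken from the report.

    ```python
    # Sketch: free distance of a small convolutional encoder via Dijkstra over
    # the state diagram. Example encoder: rate 1/2, constraint length 3,
    # generators (7, 5) octal; its known free distance is 5.
    import heapq

    K, TAPS = 3, (0b111, 0b101)          # constraint length, generator taps

    def step(state, bit):
        reg = ((state << 1) | bit) & ((1 << K) - 1)   # shift in newest bit
        weight = sum(bin(reg & t).count("1") % 2 for t in TAPS)
        return reg & ((1 << (K - 1)) - 1), weight     # next state, output weight

    def free_distance():
        # Minimum-weight path that diverges from state 0 and remerges into it.
        s0, w0 = step(0, 1)                           # force divergence ('1' input)
        pq, settled = [(w0, s0)], {}
        while pq:
            w, s = heapq.heappop(pq)
            if s == 0:
                return w                              # remerged: done
            if settled.get(s, 1 << 30) <= w:
                continue
            settled[s] = w
            for bit in (0, 1):
                ns, dw = step(s, bit)
                heapq.heappush(pq, (w + dw, ns))

    print(free_distance())   # -> 5 for the (7, 5) code
    ```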

  10. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    PubMed Central

    2011-01-01

    Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembly and scaffolding. Results To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% repetitive elements, with a GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected in 10,355 BES. Coding regions of 7,127 genes were recognized in 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs in the zebrafish genome, demonstrating the high similarity between the zebrafish and common carp genomes and indicating the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are valuable resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis mapped around 40,000 BES to the zebrafish genome and established over 3,100 microsyntenies, covering over 50% of the zebrafish genome. BES of common carp are powerful tools for comparative mapping between the two closely related species, zebrafish and common carp, which should facilitate both structural and functional genome analysis in common carp. PMID:21492448
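
    Microsatellite (SSR) detection of the kind mentioned above can be approximated with a simple regular-expression scan for short tandemly repeated motifs. The motif lengths and minimum run length below are illustrative assumptions; the study's actual detection criteria are not stated in the abstract.

    ```python
    # Toy SSR finder: report the leftmost non-overlapping tandem runs of
    # 2-6 bp motifs spanning at least 12 bp. Thresholds are illustrative.
    import re

    def find_ssrs(seq, min_len=12):
        ssrs, covered_end = [], -1
        # Lookahead so every start position is tested; group 2 is the motif,
        # group 1 the whole tandem run (motif repeated at least twice).
        for m in re.finditer(r"(?=((\w{2,6}?)\2+))", seq.upper()):
            run, motif = m.group(1), m.group(2)
            if len(run) >= min_len and m.start() > covered_end:
                ssrs.append((m.start(), motif, len(run) // len(motif)))
                covered_end = m.start() + len(run) - 1   # skip overlapping reports
        return ssrs

    print(find_ssrs("GGCACACACACACACATTTGAGGATGATGATGATGCC"))
    # -> [(2, 'CA', 7), (22, 'GAT', 4)]  (0-based start, motif, repeat count)
    ```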

  11. Effects of skilled nursing facility structure and process factors on medication errors during nursing home admission.

    PubMed

    Lane, Sandi J; Troyer, Jennifer L; Dienemann, Jacqueline A; Laditka, Sarah B; Blanchette, Christopher M

    2014-01-01

    Older adults are at greatest risk of medication errors during the transition period of the first 7 days after admission and readmission to a skilled nursing facility (SNF). The aim of this study was to evaluate structure- and process-related factors that contribute to medication errors and harm during transition periods at an SNF. Data for medication errors and potential medication errors during the 7-day transition period for residents entering North Carolina SNFs were from the Medication Error Quality Initiative-Individual Error database from October 2006 to September 2007. The impact of SNF structure and process measures on the number of reported medication errors and harm from errors was examined using bivariate and multivariate model methods. A total of 138 SNFs reported 581 transition period medication errors; 73 (12.6%) caused harm. Chain affiliation was associated with a reduction in the volume of errors during the transition period. One third of all reported transition errors occurred during the medication administration phase of the medication use process, where dose omissions were the most common type of error; however, dose omissions caused harm less often than wrong-dose errors did. Prescribing errors were much less common than administration errors but were much more likely to cause harm. Both structure and process measures of quality were related to the volume of medication errors. However, process quality measures may play a more important role in predicting harm from errors during the transition of a resident into an SNF. Medication errors during transition could be reduced by improving both prescribing processes and the transcription and documentation of orders.

  12. Structure-Function Analysis of Chloroplast Proteins via Random Mutagenesis Using Error-Prone PCR.

    PubMed

    Dumas, Louis; Zito, Francesca; Auroy, Pascaline; Johnson, Xenie; Peltier, Gilles; Alric, Jean

    2018-06-01

    Site-directed mutagenesis of chloroplast genes was developed three decades ago and has greatly advanced the field of photosynthesis research. Here, we describe a new approach for generating random chloroplast gene mutants that combines error-prone polymerase chain reaction of a gene of interest with chloroplast complementation of the knockout Chlamydomonas reinhardtii mutant. As a proof of concept, we targeted a 300-bp sequence of the petD gene that encodes subunit IV of the thylakoid membrane-bound cytochrome b6f complex. By sequencing chloroplast transformants, we revealed 149 mutations in the 300-bp target petD sequence that resulted in 92 amino acid substitutions in the 100-residue target subunit IV sequence. Our results show that this method is suited to the study of highly hydrophobic, multisubunit, and chloroplast-encoded proteins containing cofactors such as hemes, iron-sulfur clusters, and chlorophyll pigments. Moreover, we show that mutant screening and sequencing can be used to study photosynthetic mechanisms or to probe the mutational robustness of chloroplast-encoded proteins, and we propose that this method is a valuable tool for the directed evolution of enzymes in the chloroplast. © 2018 American Society of Plant Biologists. All rights reserved.
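
    A toy simulation conveys what error-prone PCR does to a 300-bp target: random substitutions at a chosen per-base rate, some fraction of which change codons. The template and mutation rate below are placeholders, not the petD fragment or the study's conditions.

    ```python
    # Toy error-prone PCR: random substitutions over a 300-bp template at a
    # chosen per-base rate; count substitutions and the codons they touch.
    import random

    def error_prone_pcr(template, rate=0.01, seed=1):
        rng = random.Random(seed)
        out = []
        for b in template:
            if rng.random() < rate:
                out.append(rng.choice([x for x in "ACGT" if x != b]))  # substitute
            else:
                out.append(b)
        return "".join(out)

    rng = random.Random(0)
    template = "".join(rng.choice("ACGT") for _ in range(300))  # placeholder target
    mutant = error_prone_pcr(template, rate=0.01)
    subs = [i for i, (a, b) in enumerate(zip(template, mutant)) if a != b]
    codons_hit = {i // 3 for i in subs}
    print(f"{len(subs)} substitutions affecting {len(codons_hit)} of 100 codons")
    ```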

  13. Development of an Ontology to Model Medical Errors, Information Needs, and the Clinical Communication Space

    PubMed Central

    Stetson, Peter D.; McKnight, Lawrence K.; Bakken, Suzanne; Curran, Christine; Kubose, Tate T.; Cimino, James J.

    2002-01-01

    Medical errors are common, costly and often preventable. Work in understanding the proximal causes of medical errors demonstrates that systems failures predispose to adverse clinical events. Most of these systems failures are due to lack of appropriate information at the appropriate time during the course of clinical care. Problems with clinical communication are common proximal causes of medical errors. We have begun a project designed to measure the impact of wireless computing on medical errors. We report here on our efforts to develop an ontology representing the intersection of medical errors, information needs and the communication space. We will use this ontology to support the collection, storage and interpretation of project data. The ontology’s formal representation of the concepts in this novel domain will help guide the rational deployment of our informatics interventions. A real-life scenario is evaluated using the ontology in order to demonstrate its utility.

  14. Building a genome database using an object-oriented approach.

    PubMed

    Barbasiewicz, Anna; Liu, Lin; Lang, B Franz; Burger, Gertraud

    2002-01-01

    GOBASE is a relational database that integrates data associated with mitochondria and chloroplasts. The most important data in GOBASE, i. e., molecular sequences and taxonomic information, are obtained from the public sequence data repository at the National Center for Biotechnology Information (NCBI), and are validated by our experts. Maintaining a curated genomic database comes with a towering labor cost, due to the sheer volume of available genomic sequences and the plethora of annotation errors and omissions in records retrieved from public repositories. Here we describe our approach to increase automation of the database population process, thereby reducing manual intervention. As a first step, we used Unified Modeling Language (UML) to construct a list of potential errors. Each case was evaluated independently, an expert solution was devised, and the solution was represented as a diagram. Subsequently, the UML diagrams were used as templates for writing object-oriented automation programs in the Java programming language.

  15. Nucleotide-Specific Contrast for DNA Sequencing by Electron Spectroscopy.

    PubMed

    Mankos, Marian; Persson, Henrik H J; N'Diaye, Alpha T; Shadman, Khashayar; Schmid, Andreas K; Davis, Ronald W

    2016-01-01

    DNA sequencing by imaging in an electron microscope is an approach that holds promise to deliver long reads with low error rates and without the need for amplification. Earlier work using transmission electron microscopes, which use high electron energies on the order of 100 keV, has shown that low contrast and radiation damage necessitate heavy-atom labeling of individual nucleotides, which increases read error rates. Other prior work has shown that scattered electrons with much lower energy suppress beam damage to DNA. Here we explore possibilities to increase contrast by employing two methods, X-ray photoelectron and Auger electron spectroscopy. Using bulk DNA samples with monomers of each base, both methods are shown to provide contrast mechanisms that can distinguish individual nucleotides without labels. Both spectroscopic techniques can be readily implemented in a low-energy electron microscope, which may enable label-free DNA sequencing by direct imaging.

  16. Performance Analysis of Direct-Sequence Code-Division Multiple-Access Communications with Asymmetric Quadrature Phase-Shift-Keying Modulation

    NASA Technical Reports Server (NTRS)

    Wang, C.-W.; Stark, W.

    2005-01-01

    This article considers a quaternary direct-sequence code-division multiple-access (DS-CDMA) communication system with asymmetric quadrature phase-shift-keying (AQPSK) modulation for unequal error protection (UEP) capability. Both time synchronous and asynchronous cases are investigated. An expression for the probability distribution of the multiple-access interference is derived. The exact bit-error performance and the approximate performance using a Gaussian approximation and random signature sequences are evaluated by extending the techniques used for uniform quadrature phase-shift-keying (QPSK) and binary phase-shift-keying (BPSK) DS-CDMA systems. Finally, a general system model with unequal user power and the near-far problem is considered and analyzed. The results show that, for a system with UEP capability, the less protected data bits are more sensitive to the near-far effect that occurs in a multiple-access environment than are the more protected bits.
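
    The UEP mechanism of AQPSK is easy to demonstrate by simulation: the in-phase bit is transmitted with a larger amplitude than the quadrature bit, so it sees a higher effective SNR. The asymmetry angle and noise level below are illustrative, not values from the article.

    ```python
    # Monte Carlo sketch of unequal error protection with asymmetric QPSK:
    # the I branch gets amplitude cos(theta), the Q branch sin(theta), so the
    # smaller-amplitude bits see more errors. Parameters are illustrative.
    import numpy as np

    rng = np.random.default_rng(7)
    n = 200_000
    cos_t, sin_t = np.cos(np.pi / 8), np.sin(np.pi / 8)   # asymmetry angle
    bi = rng.integers(0, 2, n) * 2 - 1                    # protected bits (+/-1)
    bq = rng.integers(0, 2, n) * 2 - 1                    # less protected bits
    sigma = 0.4                                           # AWGN std per branch
    ri = cos_t * bi + rng.normal(0, sigma, n)
    rq = sin_t * bq + rng.normal(0, sigma, n)
    print("BER (I, protected):  ", np.mean(np.sign(ri) != bi))   # ~1e-2
    print("BER (Q, unprotected):", np.mean(np.sign(rq) != bq))   # ~0.17
    ```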

  17. Spacecraft command verification: The AI solution

    NASA Technical Reports Server (NTRS)

    Fesq, Lorraine M.; Stephan, Amy; Smith, Brian K.

    1990-01-01

    Recently, a knowledge-based approach was used to develop a system called the Command Constraint Checker (CCC) for TRW. CCC was created to automate the process of verifying spacecraft command sequences. To check command files by hand for timing and sequencing errors is a time-consuming and error-prone task. Conventional software solutions were rejected when it was estimated that it would require 36 man-months to build an automated tool to check constraints by conventional methods. Using rule-based representation to model the various timing and sequencing constraints of the spacecraft, CCC was developed and tested in only three months. By applying artificial intelligence techniques, CCC designers were able to demonstrate the viability of AI as a tool to transform difficult problems into easily managed tasks. The design considerations used in developing CCC are discussed and the potential impact of this system on future satellite programs is examined.
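
    A rule-based checker of this kind can be sketched as a set of rule functions applied to a timed command list; the command names and the timing rule below are invented for illustration and are not TRW's actual constraints.

    ```python
    # Toy rule-based command-sequence checker in the spirit of CCC: each rule
    # scans a timed command list and reports violations. Names are invented.
    def min_separation(cmd_a, cmd_b, seconds):
        """Rule: every cmd_b must occur >= `seconds` after the last cmd_a."""
        def check(sequence):
            last_a, violations = None, []
            for t, cmd in sequence:
                if cmd == cmd_a:
                    last_a = t
                elif cmd == cmd_b and last_a is not None and t - last_a < seconds:
                    violations.append(f"{cmd_b} at t={t}: only {t - last_a}s after {cmd_a}")
            return violations
        return check

    rules = [min_separation("HEATER_ON", "THRUSTER_FIRE", 30)]
    sequence = [(0, "HEATER_ON"), (12, "THRUSTER_FIRE"), (60, "THRUSTER_FIRE")]
    for rule in rules:
        for v in rule(sequence):
            print("CONSTRAINT VIOLATION:", v)
    # -> CONSTRAINT VIOLATION: THRUSTER_FIRE at t=12: only 12s after HEATER_ON
    ```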

  18. Random access in large-scale DNA data storage.

    PubMed

    Organick, Lee; Ang, Siena Dumas; Chen, Yuan-Jyue; Lopez, Randolph; Yekhanin, Sergey; Makarychev, Konstantin; Racz, Miklos Z; Kamath, Govinda; Gopalan, Parikshit; Nguyen, Bichlien; Takahashi, Christopher N; Newman, Sharon; Parker, Hsing-Yeh; Rashtchian, Cyrus; Stewart, Kendall; Gupta, Gagan; Carlson, Robert; Mulligan, John; Carmean, Douglas; Seelig, Georg; Ceze, Luis; Strauss, Karin

    2018-03-01

    Synthetic DNA is durable and can encode digital data with high density, making it an attractive medium for data storage. However, recovering stored data on a large scale currently requires all the DNA in a pool to be sequenced, even if only a subset of the information needs to be extracted. Here, we encode and store 35 distinct files (over 200 MB of data) in more than 13 million DNA oligonucleotides, and show that we can recover each file individually and with no errors, using a random access approach. We design and validate a large library of primers that enable individual recovery of all files stored within the DNA. We also develop an algorithm that greatly reduces the sequencing read coverage required for error-free decoding by maximizing information from all sequence reads. These advances demonstrate a viable, large-scale system for DNA data storage and retrieval.
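
    One ingredient of coverage-efficient decoding, using information from all reads of an oligo rather than trusting any single read, can be illustrated with per-position majority voting. The error model and sequences below are toys, not the paper's decoder.

    ```python
    # Toy consensus decoding: combine several noisy reads of one stored oligo
    # by per-position majority vote. Substitution-only errors; toy data.
    from collections import Counter
    import random

    def consensus(reads):
        return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

    rng = random.Random(3)
    truth = "".join(rng.choice("ACGT") for _ in range(60))

    def noisy(seq, p=0.05):
        return "".join(rng.choice("ACGT") if rng.random() < p else b for b in seq)

    reads = [noisy(truth) for _ in range(7)]
    print(consensus(reads) == truth)   # usually True with 7 reads at ~5% error
    ```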

  19. Putting Meaning Back Into the Mean: A Comment on the Misuse of Elementary Statistics in a Sample of Manuscripts Submitted to Clinical Therapeutics.

    PubMed

    Forrester, Janet E

    2015-12-01

    Errors in the statistical presentation and analyses of data in the medical literature remain common despite efforts to improve the review process, including the creation of guidelines for authors and the use of statistical reviewers. This article discusses common elementary statistical errors seen in manuscripts recently submitted to Clinical Therapeutics and describes some ways in which authors and reviewers can identify errors and thus correct them before publication. A nonsystematic sample of manuscripts submitted to Clinical Therapeutics over the past year was examined for elementary statistical errors. Clinical Therapeutics has many of the same errors that reportedly exist in other journals. Authors require additional guidance to avoid elementary statistical errors and incentives to use the guidance. Implementation of reporting guidelines for authors and reviewers by journals such as Clinical Therapeutics may be a good approach to reduce the rate of statistical errors. Copyright © 2015 Elsevier HS Journals, Inc. All rights reserved.

  20. Assessing the performance of the Oxford Nanopore Technologies MinION

    PubMed Central

    Laver, T.; Harrison, J.; O’Neill, P.A.; Moore, K.; Farbos, A.; Paszkiewicz, K.; Studholme, D.J.

    2015-01-01

    The Oxford Nanopore Technologies (ONT) MinION is a new sequencing technology that potentially offers read lengths of tens of kilobases (kb), limited only by the length of DNA molecules presented to it. The device has a low capital cost, is by far the most portable DNA sequencer available, and can produce data in real time. It has numerous prospective applications including improving genome sequence assemblies and resolution of repeat-rich regions. Before such a technology is widely adopted, it is important to assess its performance and limitations with respect to throughput and accuracy. In this study we assessed the performance of the MinION by re-sequencing three bacterial genomes with very different nucleotide compositions, with G + C content ranging from 28.6% to 70.7%; the high G + C strain was underrepresented in the sequencing reads. We estimate the error rate of the MinION (after base calling) to be 38.2%. Mean and median read lengths were 2 kb and 1 kb respectively, while the longest single read was 98 kb. The whole length of a 5 kb rRNA operon was covered by a single read. As the first nanopore-based single-molecule sequencer available to researchers, the MinION is an exciting prospect; however, the current error rate limits its ability to compete with existing sequencing technologies, though we do show that MinION sequence reads can enhance the contiguity of de novo assembly when used in conjunction with Illumina MiSeq data. PMID:26753127

  1. Perceptions of Randomness: Why Three Heads Are Better than Four

    ERIC Educational Resources Information Center

    Hahn, Ulrike; Warren, Paul A.

    2009-01-01

    A long tradition of psychological research has lamented the systematic errors and biases in people's perception of the characteristics of sequences generated by a random mechanism such as a coin toss. It is proposed that once the likely nature of people's actual experience of such processes is taken into account, these "errors" and "biases"…

  2. NASTRAN maintenance and enhancement experiences

    NASA Technical Reports Server (NTRS)

    Schmitz, R. P.

    1975-01-01

    The current capability is described, which includes isoparametric elements, optimization of grid point sequencing, and an eigenvalue routine. Overlay and coding errors were corrected for the cyclic symmetry, transient response, and differential stiffness rigid formats. Error corrections and program enhancements are discussed, along with developments scheduled for the current year and a brief description of analyses being performed using the program.

  3. Reliability Generalization: The Importance of Considering Sample Specificity, Confidence Intervals, and Subgroup Differences.

    ERIC Educational Resources Information Center

    Onwuegbuzie, Anthony J.; Daniel, Larry G.

    The purposes of this paper are to identify common errors made by researchers when dealing with reliability coefficients and to outline best practices for reporting and interpreting reliability coefficients. Common errors that researchers make are: (1) stating that the instruments are reliable; (2) incorrectly interpreting correlation coefficients;…

  4. The Effectiveness of Chinese NNESTs in Teaching English Syntax

    ERIC Educational Resources Information Center

    Chou, Chun-Hui; Bartz, Kevin

    2007-01-01

    This paper evaluates the effect of Chinese non-native English-speaking teachers (NNESTs) on Chinese ESL students' struggles with English syntax. The paper first classifies Chinese learners' syntactic errors into 10 common types. It demonstrates how each type of error results from an internal attempt to translate a common Chinese construction into…

  5. The global burden of diagnostic errors in primary care

    PubMed Central

    Singh, Hardeep; Schiff, Gordon D; Graber, Mark L; Onakpoya, Igho; Thompson, Matthew J

    2017-01-01

    Diagnosis is one of the most important tasks performed by primary care physicians. The World Health Organization (WHO) recently prioritized patient safety areas in primary care, and included diagnostic errors as a high-priority problem. In addition, a recent report from the Institute of Medicine in the USA, ‘Improving Diagnosis in Health Care’, concluded that most people will likely experience a diagnostic error in their lifetime. In this narrative review, we discuss the global significance, burden and contributory factors related to diagnostic errors in primary care. We synthesize available literature to discuss the types of presenting symptoms and conditions most commonly affected. We then summarize interventions based on available data and suggest next steps to reduce the global burden of diagnostic errors. Research suggests that we are unlikely to find a ‘magic bullet’ and confirms the need for a multifaceted approach to understand and address the many systems and cognitive issues involved in diagnostic error. Because errors involve many common conditions and are prevalent across all countries, the WHO’s leadership at a global level will be instrumental to address the problem. Based on our review, we recommend that the WHO consider bringing together primary care leaders, practicing frontline clinicians, safety experts, policymakers, the health IT community, medical education and accreditation organizations, researchers from multiple disciplines, patient advocates, and funding bodies among others, to address the many common challenges and opportunities to reduce diagnostic error. This could lead to prioritization of practice changes needed to improve primary care as well as setting research priorities for intervention development to reduce diagnostic error. PMID:27530239

  6. A Comparison of Medication Histories Obtained by a Pharmacy Technician Versus Nurses in the Emergency Department.

    PubMed

    Markovic, Marija; Mathis, A Scott; Ghin, Hoytin Lee; Gardiner, Michelle; Fahim, Germin

    2017-01-01

    To compare the medication history error rate of the emergency department (ED) pharmacy technician with that of nursing staff and to describe the workflow environment. Fifty medication histories performed by an ED nurse followed by the pharmacy technician were evaluated for discrepancies (RN-PT group). A separate 50 medication histories performed by the pharmacy technician and observed with necessary intervention by the ED pharmacist were evaluated for discrepancies (PT-RPh group). Discrepancies were totaled and categorized by type of error and therapeutic category of the medication. The workflow description was obtained by observation and staff interview. A total of 474 medications in the RN-PT group and 521 in the PT-RPh group were evaluated. Nurses made at least one error in all 50 medication histories (100%), compared to 18 medication histories for the pharmacy technician (36%). In the RN-PT group, 408 medications had at least one error, corresponding to an accuracy rate of 14% for nurses. In the PT-RPh group, 30 medications had an error, corresponding to an accuracy rate of 94.4% for the pharmacy technician (P < 0.0001). The most common error made by nurses was a missing medication (n = 109), while the most common error for the pharmacy technician was a wrong medication frequency (n = 19). The most common drug class with documented errors for ED nurses was cardiovascular medications (n = 100), while the pharmacy technician made the most errors in gastrointestinal medications (n = 11). Medication histories obtained by the pharmacy technician were significantly more accurate than those obtained by nurses in the emergency department.

  7. Geolocation error tracking of ZY-3 three line cameras

    NASA Astrophysics Data System (ADS)

    Pan, Hongbo

    2017-01-01

    The high-accuracy geolocation of high-resolution satellite images (HRSIs) is a key issue for mapping and integrating multi-temporal, multi-sensor images. In this manuscript, we propose a new geometric frame for analysing the geometric error of a stereo HRSI, in which the geolocation error can be divided into three parts: the epipolar direction, cross-base direction, and height direction. With this frame, we proved that the height error of three line cameras (TLCs) is independent of nadir images, and that the terrain effect has a limited impact on the geolocation errors. For ZY-3 error sources, the drift error in both the pitch and roll angle and its influence on the geolocation accuracy are analysed. Epipolar and common tie-point constraints are proposed to study the bundle adjustment of HRSIs. Epipolar constraints explain that the relative orientation can reduce the number of compensation parameters in the cross-base direction and have a limited impact on the height accuracy. The common tie points adjust the pitch-angle errors to be consistent with each other for TLCs. Therefore, free-net bundle adjustment of a single strip cannot significantly improve the geolocation accuracy. Furthermore, the epipolar and common tie-point constraints cause the error to propagate into the adjacent strip when multiple strips are involved in the bundle adjustment, which results in the same attitude uncertainty throughout the whole block. Two adjacent strips, Orbit 305 and Orbit 381, covering 7 and 12 standard scenes respectively, and 308 ground control points (GCPs) were used for the experiments. The experiments validate the aforementioned theory. The planimetric and height root mean square errors were 2.09 and 1.28 m, respectively, when two GCPs were placed at the beginning and end of the block.

  8. Error Analysis in Mathematics. Technical Report #1012

    ERIC Educational Resources Information Center

    Lai, Cheng-Fei

    2012-01-01

    Error analysis is a method commonly used to identify the cause of student errors when they make consistent mistakes. It is a process of reviewing a student's work and then looking for patterns of misunderstanding. Errors in mathematics can be factual, procedural, or conceptual, and may occur for a number of reasons. Reasons why students make…

  9. Information-Gathering Patterns Associated with Higher Rates of Diagnostic Error

    ERIC Educational Resources Information Center

    Delzell, John E., Jr.; Chumley, Heidi; Webb, Russell; Chakrabarti, Swapan; Relan, Anju

    2009-01-01

    Diagnostic errors are an important source of medical errors. Problematic information-gathering is a common cause of diagnostic errors among physicians and medical students. The objectives of this study were to (1) determine if medical students' information-gathering patterns formed clusters of similar strategies, and if so (2) to calculate the…

  10. Lexical Errors and Accuracy in Foreign Language Writing. Second Language Acquisition

    ERIC Educational Resources Information Center

    del Pilar Agustin Llach, Maria

    2011-01-01

    Lexical errors are a determinant in gaining insight into vocabulary acquisition, vocabulary use and writing quality assessment. Lexical errors are very frequent in the written production of young EFL learners, but they decrease as learners gain proficiency. Misspellings are the most common category, but formal errors give way to semantic-based…

  11. More on Systematic Error in a Boyle's Law Experiment

    ERIC Educational Resources Information Center

    McCall, Richard P.

    2012-01-01

    A recent article in "The Physics Teacher" describes a method for analyzing a systematic error in a Boyle's law laboratory activity. Systematic errors are important to consider in physics labs because they tend to bias the results of measurements. There are numerous laboratory examples and resources that discuss this common source of error.

  12. Random measurement error: Why worry? An example of cardiovascular risk factors.

    PubMed

    Brakenhoff, Timo B; van Smeden, Maarten; Visseren, Frank L J; Groenwold, Rolf H H

    2018-01-01

    With the increased use of data not originally recorded for research, such as routine care data (or 'big data'), measurement error is bound to become an increasingly relevant problem in medical research. A common view among medical researchers on the influence of random measurement error (i.e. classical measurement error) is that its presence leads to some degree of systematic underestimation of studied exposure-outcome relations (i.e. attenuation of the effect estimate). For the common situation where the analysis involves at least one exposure and one confounder, we demonstrate that the direction of effect of random measurement error on the estimated exposure-outcome relations can be difficult to anticipate. Using three example studies on cardiovascular risk factors, we illustrate that random measurement error in the exposure and/or confounder can lead to underestimation as well as overestimation of exposure-outcome relations. We therefore advise medical researchers to refrain from making claims about the direction of effect of measurement error in their manuscripts, unless the appropriate inferential tools are used to study or alleviate the impact of measurement error from the analysis.
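
    The paper's warning can be reproduced in a few lines of simulation: classical measurement error in a confounder can bias the exposure estimate upward, while error in the exposure attenuates it. All coefficients and noise scales below are arbitrary illustrative choices.

    ```python
    # Simulation: random measurement error can bias an exposure effect in
    # either direction. True exposure effect is 0.3; all numbers are arbitrary.
    import numpy as np

    rng = np.random.default_rng(42)
    n = 100_000
    conf = rng.normal(size=n)                                # true confounder
    expo = 0.8 * conf + rng.normal(size=n)                   # confounded exposure
    outcome = 0.3 * expo + 0.7 * conf + rng.normal(size=n)   # true effect = 0.3

    def exposure_coef(x, c):
        X = np.column_stack([np.ones_like(x), x, c])
        return np.linalg.lstsq(X, outcome, rcond=None)[0][1]

    print("no measurement error:", round(exposure_coef(expo, conf), 3))   # ~0.30
    conf_noisy = conf + rng.normal(scale=1.0, size=n)        # error in confounder
    print("noisy confounder:    ", round(exposure_coef(expo, conf_noisy), 3))  # ~0.51
    expo_noisy = expo + rng.normal(scale=1.0, size=n)        # error in exposure
    print("noisy exposure:      ", round(exposure_coef(expo_noisy, conf), 3))  # ~0.15
    ```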

  13. A new accuracy measure based on bounded relative error for time series forecasting

    PubMed Central

    Chen, Chao; Twycross, Jamie; Garibaldi, Jonathan M.

    2017-01-01

    Many accuracy measures have been proposed in the past for time series forecasting comparisons. However, many of these measures suffer from one or more issues such as poor resistance to outliers and scale dependence. In this paper, while summarising commonly used accuracy measures, a special review is made on the symmetric mean absolute percentage error. Moreover, a new accuracy measure called the Unscaled Mean Bounded Relative Absolute Error (UMBRAE), which combines the best features of various alternative measures, is proposed to address the common issues of existing measures. A comparative evaluation on the proposed and related measures has been made with both synthetic and real-world data. The results indicate that the proposed measure, with user selectable benchmark, performs as well as or better than other measures on selected criteria. Though it has been commonly accepted that there is no single best accuracy measure, we suggest that UMBRAE could be a good choice to evaluate forecasting methods, especially for cases where measures based on geometric mean of relative errors, such as the geometric mean relative absolute error, are preferred. PMID:28339480

  14. A new accuracy measure based on bounded relative error for time series forecasting.

    PubMed

    Chen, Chao; Twycross, Jamie; Garibaldi, Jonathan M

    2017-01-01

    Many accuracy measures have been proposed in the past for time series forecasting comparisons. However, many of these measures suffer from one or more issues such as poor resistance to outliers and scale dependence. In this paper, while summarising commonly used accuracy measures, a special review is made on the symmetric mean absolute percentage error. Moreover, a new accuracy measure called the Unscaled Mean Bounded Relative Absolute Error (UMBRAE), which combines the best features of various alternative measures, is proposed to address the common issues of existing measures. A comparative evaluation on the proposed and related measures has been made with both synthetic and real-world data. The results indicate that the proposed measure, with user selectable benchmark, performs as well as or better than other measures on selected criteria. Though it has been commonly accepted that there is no single best accuracy measure, we suggest that UMBRAE could be a good choice to evaluate forecasting methods, especially for cases where measures based on geometric mean of relative errors, such as the geometric mean relative absolute error, are preferred.
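
    Following the construction described in the abstract (bounding each error relative to a benchmark forecast's error, averaging, then unscaling), a sketch of UMBRAE might look as follows; the benchmark and data are made up, and the exact formula should be checked against the paper.

    ```python
    # Hedged sketch of UMBRAE: bounded relative absolute errors against a
    # benchmark forecast, averaged, then unscaled. Data are made up.
    import numpy as np

    def umbrae(actual, forecast, benchmark):
        e = np.abs(actual - forecast)
        e_star = np.abs(actual - benchmark)
        brae = e / (e + e_star)           # bounded relative absolute error in [0, 1]
        mbrae = brae.mean()
        return mbrae / (1.0 - mbrae)      # unscale: < 1 beats benchmark, > 1 loses

    actual   = np.array([10.0, 12.0, 13.0, 12.5, 14.0])
    forecast = np.array([10.5, 11.5, 13.2, 12.4, 13.6])
    naive    = np.array([ 9.8, 10.0, 12.0, 13.0, 12.5])   # benchmark forecast
    print(round(umbrae(actual, forecast, naive), 3))      # ~0.41: better than naive
    ```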

  15. Application of viromics: a new approach to the understanding of viral infections in humans.

    PubMed

    Ramamurthy, Mageshbabu; Sankar, Sathish; Kannangai, Rajesh; Nandagopal, Balaji; Sridharan, Gopalan

    2017-12-01

    This review explores the strengths of modern technology-driven data compiled in the areas of virus gene sequencing and virus protein structures, and their implications for viral diagnosis and therapy. The information for virome analysis (viromics) is generated by the study of viral genomes (entire nucleotide sequences) and viral genes (coding for proteins). The study of viral infectious diseases, in terms of etiopathogenesis and the development of newer therapeutics, is undergoing rapid change. Currently, viromics relies on deep sequencing, next-generation sequencing (NGS) data, and public-domain databases such as GenBank and unique virus-specific databases. Two commonly used NGS platforms, Illumina and Ion Torrent, recommend maximum fragment lengths of about 300 and 400 nucleotides for analysis, respectively. Direct detection of viruses in clinical samples is now evolving using these methods. There are currently a considerable number of good treatment options for HBV/HIV/HCV; these viruses, however, develop drug resistance. The drug-susceptibility regions of the genomes are sequenced, and prediction of drug resistance is now possible from three publicly available web resources. This has been made possible by advances in high-throughput sequencing, meta-analysis through sophisticated and easy-to-use software, and the use of high-speed computers for bioinformatics. More recently, NGS technology has been improved with single-molecule real-time sequencing, in which complete long reads can be obtained with fewer errors, overcoming a limitation of NGS, which is prone to software anomalies in the hands of personnel without adequate training. Developments in understanding viruses in terms of their genomes, pathobiology, transcriptomics and molecular epidemiology constitute viromics. These developments will bring about radical changes and advancement, especially in the fields of antiviral therapy and diagnostic virology.

  16. Prevention of a wrong-location misadministration through the use of an intradepartmental incident learning system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ford, Eric C.; Smith, Koren; Harris, Kendra

    2012-11-15

    Purpose: A series of examples are presented in which potential errors in the delivery of radiation therapy were prevented through use of incident learning. These examples underscore the value of reporting near miss incidents. Methods: Using a departmental incident learning system, eight incidents were noted over a two-year period in which fields were treated 'out-of-sequence,' that is, fields from a boost phase were treated while the patient was still in the initial phase of treatment. As a result, an error-prevention policy was instituted in which radiation treatment fields are 'hidden' within the oncology information system (OIS) when they are not in current use. In this way, fields are only available to be treated in the intended sequence and, importantly, old fields cannot be activated at the linear accelerator control console. Results: No out-of-sequence treatments have been reported in more than two years since the policy change. Furthermore, at least three near-miss incidents were detected and corrected as a result of the policy change. In the first two, the policy operated as intended to directly prevent an error in field scheduling. In the third near-miss, the policy operated 'off target' to prevent a type of error scenario that it was not directly intended to prevent. In this incident, an incorrect digitally reconstructed radiograph (DRR) was scheduled in the OIS for a patient receiving lung cancer treatment. The incorrect DRR had an isocenter which was misplaced by approximately two centimeters. The error was a result of a field from an old plan being scheduled instead of the intended new plan. As a result of the policy described above, the DRR field could not be activated for treatment, however, and the error was discovered and corrected. Other quality control barriers in place would have been unlikely to have detected this error. Conclusions: In these examples, a policy was adopted based on incident learning, which prevented several errors, at least one of which was potentially severe. These examples underscore the need for a rigorous, systematic incident learning process within each clinic. The experiences reported in this technical note demonstrate the value of near-miss incident reporting to improve patient safety.

  17. Visually lossless compression of digital hologram sequences

    NASA Astrophysics Data System (ADS)

    Darakis, Emmanouil; Kowiel, Marcin; Näsänen, Risto; Naughton, Thomas J.

    2010-01-01

    Digital hologram sequences have great potential for the recording of 3D scenes of moving macroscopic objects as their numerical reconstruction can yield a range of perspective views of the scene. Digital holograms inherently have large information content and lossless coding of holographic data is rather inefficient due to the speckled nature of the interference fringes they contain. Lossy coding of still holograms and hologram sequences has shown promising results. By definition, lossy compression introduces errors in the reconstruction. In all of the previous studies, numerical metrics were used to measure the compression error and through it, the coding quality. Digital hologram reconstructions are highly speckled and the speckle pattern is very sensitive to data changes. Hence, numerical quality metrics can be misleading. For example, for low compression ratios, a numerically significant coding error can have visually negligible effects. Yet, in several cases, it is of high interest to know how much lossy compression can be achieved, while maintaining the reconstruction quality at visually lossless levels. Using an experimental threshold estimation method, the staircase algorithm, we determined the highest compression ratio that was not perceptible to human observers for objects compressed with Dirac and MPEG-4 compression methods. This level of compression can be regarded as the point below which compression is perceptually lossless although physically the compression is lossy. It was found that compression by factors of 4 to 7.5 can be obtained with the above methods without any perceptible change in the appearance of video sequences.

  18. Sequential congruency effects: disentangling priming and conflict adaptation.

    PubMed

    Puccioni, Olga; Vallesi, Antonino

    2012-09-01

    Responding to the color of a word is slower and less accurate if the word refers to a different color (incongruent condition) than if it refers to the same color (congruent condition). This phenomenon, known as the Stroop effect, is modulated by sequential effects: it is bigger when the current trial is preceded by a congruent condition than by an incongruent one in the previous trial. Whether this phenomenon is due to priming mechanisms or to cognitive control is still debated. To disentangle the contribution of priming with respect to conflict adaptation mechanisms in determining sequential effects, two experiments were designed here with a four-alternative forced choice (4-AFC) Stroop task: in the first one only trials with complete alternations of features were used, while in the second experiment all possible types of repetitions were presented. Both response times (RTs) and errors were evaluated. Conflict adaptation effects on RTs were limited to congruent trials and were exclusively due to priming: they disappeared in the priming-free experiment and, in the second experiment, they occurred in sequences with feature repetitions but not in complete alternation sequences. Error results, instead, support the presence of conflict adaptation effects in incongruent trials. In priming-free sequences (experiment 1 and complete alternation sequences of experiment 2) with incongruent previous trials there was no error Stroop effect, while this effect was significant with congruent previous trials. These results indicate that cognitive control may modulate performance above and beyond priming effects.

  19. Registration of retinal sequences from new video-ophthalmoscopic camera.

    PubMed

    Kolar, Radim; Tornow, Ralf P; Odstrcilik, Jan; Liberdova, Ivana

    2016-05-20

    Analysis of fast temporal changes on retinas has become an important part of diagnostic video-ophthalmology. It enables investigation of the hemodynamic processes in retinal tissue, e.g. blood-vessel diameter changes as a result of blood-pressure variation, spontaneous venous pulsation influenced by intracranial-intraocular pressure difference, blood-volume changes as a result of changes in light reflection from retinal tissue, and blood flow using laser speckle contrast imaging. For such applications, image registration of the recorded sequence must be performed. Here we use a new non-mydriatic video-ophthalmoscope for simple and fast acquisition of low SNR retinal sequences. We introduce a novel, two-step approach for fast image registration. The phase correlation in the first stage removes large eye movements. Lucas-Kanade tracking in the second stage removes small eye movements. We propose robust adaptive selection of the tracking points, which is the most important part of tracking-based approaches. We also describe a method for quantitative evaluation of the registration results, based on vascular tree intensity profiles. The achieved registration error evaluated on 23 sequences (5840 frames) is 0.78 ± 0.67 pixels inside the optic disc and 1.39 ± 0.63 pixels outside the optic disc. We compared the results with the commonly used approaches based on Lucas-Kanade tracking and scale-invariant feature transform, which achieved worse results. The proposed method can efficiently correct particular frames of retinal sequences for shift and rotation. The registration results for each frame (shift in X and Y direction and eye rotation) can also be used for eye-movement evaluation during single-spot fixation tasks.
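
    The two-step idea, a coarse global shift from phase correlation followed by fine correction from Lucas-Kanade point tracking, can be sketched with standard OpenCV calls. Frame I/O, rotation handling, and the paper's robust adaptive point selection are omitted, and sign conventions may need checking against real data.

    ```python
    # Two-stage frame registration sketch with OpenCV: phase correlation for
    # the large shift, then Lucas-Kanade tracking of corner points for the
    # fine residual. Assumes 8-bit grayscale frames of identical size.
    import cv2
    import numpy as np

    def register_pair(ref, frame):
        # Step 1: coarse global shift via phase correlation (float input needed)
        (dx, dy), _ = cv2.phaseCorrelate(np.float32(ref), np.float32(frame))
        M = np.float32([[1, 0, -dx], [0, 1, -dy]])
        coarse = cv2.warpAffine(frame, M, frame.shape[::-1])
        # Step 2: fine correction via Lucas-Kanade tracking of corner points
        pts = cv2.goodFeaturesToTrack(ref, maxCorners=100, qualityLevel=0.01,
                                      minDistance=10)
        moved, status, _ = cv2.calcOpticalFlowPyrLK(ref, coarse, pts, None)
        good = status.ravel() == 1
        shift = np.median(moved[good] - pts[good], axis=0).ravel()
        return dx + shift[0], dy + shift[1]   # total estimated (x, y) shift
    ```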

  20. Species detection and identification in sexual organisms using population genetic theory and DNA sequences.

    PubMed

    Birky, C William

    2013-01-01

    Phylogenetic trees of DNA sequences of a group of specimens may include clades of two kinds: those produced by stochastic processes (random genetic drift) within a species, and clades that represent different species. The ratio of the mean pairwise sequence difference between a pair of clades (K) to the mean pairwise sequence difference within a clade (θ) can be used to determine whether the clades are samples from different species (K/θ ≥ 4) or the same species (K/θ < 4) with probability ≥ 0.95. Previously I applied this criterion to delimit species of asexual organisms. Here I use data from the literature to show how it can also be applied to delimit sexual species using four groups of sexual organisms as examples: ravens, spotted leopards, sea butterflies, and liverworts. Mitochondrial or chloroplast genes are used because these segregate earlier during speciation than most nuclear genes and hence detect earlier stages of speciation. In several cases the K/θ ratio was greater than 4, confirming the original authors' intuition that the clades were sufficiently different to be assigned to different species. But the K/θ ratio split each of two liverwort species into two evolutionary species, and showed that support for the distinction between the common and Chihuahuan raven species is weak. I also discuss some possible sources of error in using the K/θ ratio; the most significant one would be cases where males migrate between different populations but females do not, making the use of maternally inherited organelle genes problematic. The K/θ ratio must be used with some caution, like all other methods for species delimitation. Nevertheless, it is a simple theory-based quantitative method for using DNA sequences to make rigorous decisions about species delimitation in sexual as well as asexual eukaryotes.
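
    The K/θ criterion itself is straightforward to compute from aligned sequences; a minimal sketch with toy stand-ins for aligned organelle genes, using the larger within-clade θ as a conservative choice:

    ```python
    # Minimal K/theta sketch: mean pairwise difference between clades (K)
    # over mean pairwise difference within a clade (theta), threshold 4.
    from itertools import combinations, product

    def diff(a, b):
        return sum(x != y for x, y in zip(a, b)) / len(a)

    def mean(vals):
        vals = list(vals)
        return sum(vals) / len(vals)

    def k_over_theta(clade1, clade2):
        K = mean(diff(a, b) for a, b in product(clade1, clade2))
        theta = max(mean(diff(a, b) for a, b in combinations(c, 2))
                    for c in (clade1, clade2))   # conservative: larger theta
        return K / theta

    clade1 = ["ACGTACGTAA", "ACGTACGTAT", "ACGAACGTAA"]   # toy alignment
    clade2 = ["TCCTTGGAAC", "TCCTTGGAAT", "TCCATGGAAC"]
    ratio = k_over_theta(clade1, clade2)
    print(f"K/theta = {ratio:.1f} ->",
          "distinct species" if ratio >= 4 else "same species")
    # -> K/theta = 4.8 -> distinct species
    ```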

  1. Muroid rodent phylogenetics: 900-species tree reveals increasing diversification rates

    PubMed Central

    Schenk, John J.

    2017-01-01

    We combined new sequence data for more than 300 muroid rodent species with our previously published sequences for up to five nuclear and one mitochondrial gene to generate the most widely and densely sampled hypothesis of evolutionary relationships across Muroidea. An exhaustive screening procedure for publicly available sequences was implemented to avoid the propagation of taxonomic errors that are common in supermatrix studies. The combined data set of carefully screened sequences derived from all available sequences on GenBank with our new data resulted in a robust maximum likelihood phylogeny for 900 of the approximately 1,620 muroids. Several regions that were equivocally resolved in previous studies are now more decisively resolved, and we estimated a chronogram using 28 fossil calibrations for the most integrated age and topological estimates to date. The results were used to update muroid classification and highlight questions needing additional data. We also compared the results of multigene supermatrix studies like this one with the principal published supertrees and concluded that the latter are unreliable for any comparative study in muroids. In addition, we explored diversification patterns as an explanation for why muroid rodents represent one of the most species-rich groups of mammals by detecting evidence for increasing net diversification rates through time across the muroid tree. We suggest the observation of increasing rates may be due to a combination of parallel increases in rate across clades and high average extinction rates. Five increased diversification-rate shifts were inferred, suggesting that multiple, but perhaps not independent, events have led to the remarkable species diversity in the superfamily. Our results provide a phylogenetic framework for comparative studies that is not highly dependent upon the signal from any one gene. PMID:28813483

  2. Numerical Solution of Optimal Control Problem under SPDE Constraints

    DTIC Science & Technology

    2011-10-14

    Faure and Sobol sequences are used to evaluate high dimensional integrals, and the errors in the numerical results for over 30 dimensions become quite ... [fragmentary excerpt; the remaining text consists of figure captions comparing projections of Faure, Sobol, and optimal Kronecker sequences, and reference-list fragments]
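
    Sobol sequences of the kind benchmarked in the report are available in SciPy (scipy.stats.qmc, SciPy 1.7 or later). A sketch comparing a quasi-Monte Carlo estimate against plain Monte Carlo on a separable test integrand with known value 1:

    ```python
    # Quasi-Monte Carlo integration with a scrambled Sobol sequence versus
    # plain Monte Carlo. Test integrand is separable with true integral 1.
    import numpy as np
    from scipy.stats import qmc

    d = 10                                              # dimension of the integral
    f = lambda x: np.prod(1 + (x - 0.5) / d, axis=1)    # true integral = 1

    sobol = qmc.Sobol(d=d, scramble=True, seed=0)
    x = sobol.random_base2(m=12)                        # 2^12 Sobol points in [0,1)^d
    mc = np.random.default_rng(0).random((len(x), d))   # plain Monte Carlo points

    print("Sobol estimate:", f(x).mean())    # typically closer to 1 ...
    print("MC estimate:   ", f(mc).mean())   # ... than plain MC at equal n
    ```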

  3. An observational study of drug administration errors in a Malaysian hospital (study of drug administration errors).

    PubMed

    Chua, S S; Tea, M H; Rahman, M H A

    2009-04-01

    Drug administration errors were the second most frequent type of medication error, after prescribing errors; however, the latter were often intercepted and hence administration errors were more likely to reach the patients. Therefore, this study was conducted to determine the frequency and types of drug administration errors in a Malaysian hospital ward. This is a prospective study that involved direct, undisguised observations of drug administrations in a hospital ward. A researcher was stationed in the ward under study for 15 days to observe all drug administrations, which were recorded in a data collection form and then compared with the drugs prescribed for the patient. A total of 1118 opportunities for error were observed and 127 administrations had errors. This gave an error rate of 11.4% [95% confidence interval (CI) 9.5-13.3]. If incorrect time errors were excluded, the error rate was reduced to 8.7% (95% CI 7.1-10.4). The most common types of drug administration errors were incorrect time (25.2%), followed by incorrect technique of administration (16.3%) and unauthorized drug errors (14.1%). In terms of clinical significance, 10.4% of the administration errors were considered potentially life-threatening. Intravenous routes were more likely to be associated with an administration error than oral routes (21.3% vs. 7.9%, P < 0.001). The study indicates that the frequency of drug administration errors in developing countries such as Malaysia is similar to that in developed countries. Incorrect time errors were also the most common type of drug administration error. A non-punitive system of reporting medication errors should be established to encourage more information to be documented so that a risk management protocol can be developed and implemented.
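
    The reported rate and interval follow from a standard normal-approximation confidence interval for a proportion; a minimal sketch using statsmodels (the function choice is ours, not the study's):

        from statsmodels.stats.proportion import proportion_confint

        errors, opportunities = 127, 1118
        rate = errors / opportunities
        low, high = proportion_confint(errors, opportunities,
                                       alpha=0.05, method="normal")
        print(f"error rate {rate:.1%}, 95% CI {low:.1%} to {high:.1%}")
        # prints roughly 11.4%, 9.5% to 13.2%, matching the reported
        # 9.5-13.3 up to rounding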

  4. Single-molecule sequencing and conformational capture enable de novo mammalian reference genomes

    USDA-ARS?s Scientific Manuscript database

    Genome assemblies have been produced for numerous species as a result of advances in sequencing technologies. However, many of the assemblies are fragmented, with many gaps, ambiguities, and errors. We use the genome of the domestic goat (Capra hircus) to demonstrate current state of the art for ef...

  5. Single-Event Upset Characterization of Common First- and Second-Order All-Digital Phase-Locked Loops

    NASA Astrophysics Data System (ADS)

    Chen, Y. P.; Massengill, L. W.; Kauppila, J. S.; Bhuva, B. L.; Holman, W. T.; Loveless, T. D.

    2017-08-01

    The single-event upset (SEU) vulnerability of common first- and second-order all-digital phase-locked loops (ADPLLs) is investigated through field-programmable gate array-based fault injection experiments. SEUs in the highest-order pole of the loop filter and in fraction-based phase detectors (PDs) may result in the worst-case error response, i.e., limit cycle errors, often requiring a system restart. SEUs in integer-based linear PDs may result in loss-of-lock errors, while SEUs in bang-bang PDs only result in temporary frequency errors. ADPLLs with the same frequency tuning range but fewer bits in the control word exhibit better overall SEU performance.

  6. Automatic Command Sequence Generation

    NASA Technical Reports Server (NTRS)

    Fisher, Forest; Gladden, Roy; Khanampompan, Teerapat

    2007-01-01

    Automatic Sequence Generator (Autogen) Version 3.0 software automatically generates command sequences for the Mars Reconnaissance Orbiter (MRO) and several other JPL spacecraft operated by the multi-mission support team. Autogen uses standard JPL sequencing tools like APGEN, ASP, SEQGEN, and the DOM database to automate the generation of uplink command products, Spacecraft Command Message Format (SCMF) files, and the corresponding ground command products, DSN Keywords Files (DKF). Autogen supports all the major multi-mission mission phases including the cruise, aerobraking, mapping/science, and relay mission phases. Autogen is a Perl script that functions within the mission operations UNIX environment. It consists of two parts: a set of model files and the autogen Perl script. Autogen encodes the behaviors of the system into a model and encodes algorithms for context-sensitive customizations of the modeled behaviors. The model includes knowledge of the different mission phases and how the resultant command products must differ for these phases. The executable software portion of Autogen automates the setup and use of APGEN for constructing a spacecraft activity sequence file (SASF). The setup includes file retrieval through the DOM (Distributed Object Manager), an object database used to store project files. This step retrieves all the needed input files for generating the command products. Depending on the mission phase, Autogen also uses the ASP (Automated Sequence Processor) and SEQGEN to generate the command product sent to the spacecraft. Autogen also provides the means for customizing sequences through the use of configuration files. By automating the majority of the sequence generation process, Autogen eliminates many sequence generation errors commonly introduced by manually constructing spacecraft command sequences. Through the layering of commands into the sequence by a series of scheduling algorithms, users are able to rapidly and reliably construct the desired uplink command products. With the aid of Autogen, sequences may be produced in a matter of hours instead of weeks, with a significant reduction in the number of people on the sequence team. As a result, the uplink product generation process is significantly streamlined and mission risk is significantly reduced. Autogen is used for operations of MRO, Mars Global Surveyor (MGS), Mars Exploration Rover (MER), and Mars Odyssey, and will be used for operations of Phoenix. Autogen Version 3.0 is the operational version of Autogen including the MRO adaptation for the cruise mission phase; it was also used for development of the aerobraking and mapping mission phases for MRO.

  7. Second-order shaped pulses for solid-state quantum computation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sengupta, Pinaki

    2008-01-01

    We present the construction and detailed analysis of highly optimized self-refocusing pulse shapes for several rotation angles. We characterize the constructed pulses by the coefficients appearing in the Magnus expansion up to second order. This allows a semianalytical analysis of the performance of the constructed shapes in sequences and composite pulses by computing the corresponding leading-order error operators. Higher orders can be analyzed with the numerical technique suggested by us previously. We illustrate the technique by analyzing several composite pulses designed to protect against pulse amplitude errors, and decoupling sequences for potentially long chains of qubits with on-site and nearest-neighbor couplings.

  8. SOPRA: Scaffolding algorithm for paired reads via statistical optimization.

    PubMed

    Dayarian, Adel; Michael, Todd P; Sengupta, Anirvan M

    2010-06-24

    High throughput sequencing (HTS) platforms produce gigabases of short read (<100 bp) data per run. While these short reads are adequate for resequencing applications, de novo assembly of moderate-size genomes from such reads remains a significant challenge. These limitations could be partially overcome by utilizing mate pair technology, which provides pairs of short reads separated by a known distance along the genome. We have developed SOPRA, a tool designed to exploit the mate pair/paired-end information for assembly of short reads. The main focus of the algorithm is selecting a sufficiently large subset of simultaneously satisfiable mate pair constraints to achieve a balance between the size and the quality of the output scaffolds. Scaffold assembly is presented as an optimization problem for variables associated with vertices and with edges of the contig connectivity graph. Vertices of this graph are individual contigs, with edges drawn between contigs connected by mate pairs. Similar graph problems have been invoked in the context of shotgun sequencing and scaffold building for previous generations of sequencing projects. However, given the error-prone nature of HTS data and the fundamental limitations from the shortness of the reads, the ad hoc greedy algorithms used in the earlier studies are likely to lead to poor-quality results in the current context. SOPRA circumvents this problem by treating all the constraints on an equal footing when solving the optimization problem, the solution itself indicating the problematic constraints (chimeric/repetitive contigs, etc.) to be removed. The process of solving and removing constraints is iterated until one reaches a core set of consistent constraints, as sketched below. For SOLiD sequencer data, SOPRA uses a dynamic programming approach to robustly translate the color-space assembly to base-space. For assessing the quality of an assembly, we report the no-match/mismatch error rate as well as the rates of various rearrangement errors. Applying SOPRA to real data from bacterial genomes, we were able to assemble contigs into scaffolds of significant length (N50 up to 200 kb) with very few errors introduced in the process. In general, the methodology presented here will allow better scaffold assemblies of any type of mate pair sequencing data.
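
    The following toy sketch illustrates the flavor of the formulation described above: contig orientations as +/-1 variables on graph vertices, mate-pair relative-orientation constraints on edges, a local search for a consistent assignment, and iterative removal of unsatisfiable (e.g., chimeric) constraints. The instance and the simple solver are illustrative assumptions, not SOPRA's actual algorithm.

        import random
        random.seed(1)

        # Toy instance: contigs 0..5 and constraints (i, j, rel), where
        # rel = +1 demands the same orientation and rel = -1 the opposite.
        # The pair (4, 5, +1) / (4, 5, -1) is deliberately contradictory,
        # mimicking a chimeric mate-pair link.
        constraints = [(0, 1, +1), (1, 2, -1), (2, 3, +1), (3, 4, -1),
                       (4, 5, +1), (0, 2, -1), (4, 5, -1)]
        n = 6

        def violated(s, cons):
            return [(i, j, r) for i, j, r in cons if s[i] * s[j] != r]

        def solve(cons, sweeps=100):
            """Local search over orientation variables s[i] in {+1, -1}."""
            s = [random.choice((-1, 1)) for _ in range(n)]
            for _ in range(sweeps):
                for i in range(n):
                    before = len(violated(s, cons))
                    s[i] = -s[i]              # tentative flip
                    if len(violated(s, cons)) > before:
                        s[i] = -s[i]          # revert if it got worse
            return s

        cons = list(constraints)
        while True:
            s = solve(cons)
            bad = violated(s, cons)
            if not bad:
                break
            cons.remove(bad[0])  # drop a problematic constraint and re-solve

        print("orientations:", s, "constraints kept:", len(cons))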

  9. AveBoost2: Boosting for Noisy Data

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.

    2004-01-01

    AdaBoost is a well-known ensemble learning algorithm that constructs its constituent or base models in sequence. A key step in AdaBoost is constructing a distribution over the training examples to create each base model. This distribution, represented as a vector, is constructed to be orthogonal to the vector of mistakes made by the previous base model in the sequence. The idea is to make the next base model's errors uncorrelated with those of the previous model. In previous work, we developed an algorithm, AveBoost, that constructed distributions orthogonal to the mistake vectors of all the previous models, and then averaged them to create the next base model's distribution. Our experiments demonstrated the superior accuracy of our approach. In this paper, we slightly revise our algorithm to allow us to obtain non-trivial theoretical results: bounds on the training error and generalization error (difference between training and test error). Our averaging process has a regularizing effect which, as expected, leads to a worse training error bound for our algorithm than for AdaBoost but a superior generalization error bound. For this paper, we experimented with the data that we used in both previous studies, both as originally supplied and with added label noise: a small fraction of the data has its original label changed. Noisy data are notoriously difficult for AdaBoost to learn. Our algorithm's performance improvement over AdaBoost is even greater on the noisy data than on the original data.
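
    The contrast between AdaBoost's reweighting and an averaged distribution can be sketched as follows; the stump learner, the toy data, and the exact averaging rule here are illustrative assumptions, not the authors' implementation.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        def boost(X, y, rounds=20, average=False):
            """AdaBoost-style training; with average=True, each new
            distribution is blended with the running average of earlier
            ones (the regularizing idea behind AveBoost)."""
            n = len(y)
            d = np.full(n, 1.0 / n)   # current training distribution
            running = d.copy()        # running average of distributions
            models = []
            for t in range(1, rounds + 1):
                stump = DecisionTreeClassifier(max_depth=1)
                stump.fit(X, y, sample_weight=d)
                pred = stump.predict(X)
                err = max(d[pred != y].sum(), 1e-10)
                if err >= 0.5:
                    break
                alpha = 0.5 * np.log((1 - err) / err)
                models.append((alpha, stump))
                # Standard AdaBoost reweighting.
                d = d * np.exp(np.where(pred != y, alpha, -alpha))
                d /= d.sum()
                if average:
                    running = (t * running + d) / (t + 1)
                    d = running / running.sum()
            return models

        # Toy data with 10% label noise.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 5))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)
        flip = rng.random(200) < 0.1
        y[flip] = 1 - y[flip]

        for avg in (False, True):
            models = boost(X, y, average=avg)
            agg = np.zeros(len(y))
            for a, m in models:
                agg += a * (2 * m.predict(X) - 1)
            acc = ((agg > 0).astype(int) == y).mean()
            print("averaged" if avg else "adaboost", f"accuracy {acc:.2f}")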

  10. Error reduction study employing a pseudo-random binary sequence for use in acoustic pyrometry of gases

    NASA Astrophysics Data System (ADS)

    Ewan, B. C. R.; Ireland, S. N.

    2000-12-01

    Acoustic pyrometry uses the temperature dependence of sound speed in materials to measure temperature. This is normally achieved by measuring the transit time for a sound signal over a known path length and applying the material's relation between temperature and velocity to extract an "average" temperature. Sources of error associated with the measurement of mean transit time are discussed for implementing the technique in gases, one of the principal causes being background noise in typical industrial environments. A number of transmitted-signal and processing strategies which can be used in this area are examined and the expected error in mean transit time associated with each technique is quantified. Transmitted signals included pulses, pure frequencies, chirps, and pseudorandom binary sequences (PRBS), while processing involved edge detection and correlation. Errors arise through the misinterpretation of the positions of edge arrivals or correlation peaks due to instantaneous deviations associated with background noise, and these become more severe as signal-to-noise amplitude ratios decrease. Population errors in the mean transit time are estimated for the different measurement strategies and it is concluded that PRBS combined with correlation can provide the lowest errors when operating in high-noise environments. The operation of an instrument based on PRBS transmitted signals is described and test results under controlled noise conditions are presented. These confirm the value of the strategy and demonstrate that measurements can be made with signal-to-noise amplitude ratios down to 0.5.
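
    A minimal simulation of the PRBS-plus-correlation strategy: a maximal-length sequence (scipy.signal.max_len_seq) is buried in noise at a signal-to-noise amplitude ratio of 0.5, and the transit delay is recovered from the cross-correlation peak. The delay, record length, and noise level are illustrative assumptions.

        import numpy as np
        from scipy.signal import max_len_seq

        rng = np.random.default_rng(0)

        # Maximal-length PRBS, mapped from {0, 1} to {-1, +1}: 1023 chips.
        prbs = 2.0 * max_len_seq(10)[0] - 1.0

        # Received signal: PRBS delayed by 137 samples inside a longer,
        # noisy record; noise amplitude 2 vs signal amplitude 1 (SNR 0.5).
        delay, n = 137, 4096
        received = rng.normal(scale=2.0, size=n)
        received[delay:delay + prbs.size] += prbs

        # The lag of the correlation peak estimates the transit time.
        corr = np.correlate(received, prbs, mode="valid")
        print("estimated delay:", int(np.argmax(corr)), "(true:", delay, ")")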

  11. Learning by subtraction: Hippocampal activity and effects of ethanol during the acquisition and performance of response sequences.

    PubMed

    Ketchum, Myles J; Weyand, Theodore G; Weed, Peter F; Winsauer, Peter J

    2016-05-01

    Learning is believed to be reflected in the activity of the hippocampus. However, neural correlates of learning have been difficult to characterize because hippocampal activity is integrated with ongoing behavior. To address this issue, male rats (n = 5) implanted with electrodes (n = 14) in the CA1 subfield responded during two tasks within a single test session. In one task, subjects acquired a new 3-response sequence (acquisition), whereas in the other task, subjects completed a well-rehearsed 3-response sequence (performance). Both tasks, though, could be completed using an identical response topography, and both used the same sensory stimuli and schedule of reinforcement. More importantly, comparing neural patterns during sequence acquisition to those during sequence performance allows for a subtractive approach whereby activity associated with learning could potentially be dissociated from the activity associated with ongoing behavior. At sites where CA1 activity was closely associated with behavior, the patterns of activity were differentially modulated by key position and the serial position of a response within the schedule of reinforcement. Temporal shifts between peak activity and responding on particular keys also occurred during sequence acquisition, but not during sequence performance. Ethanol disrupted CA1 activity while producing rate-decreasing effects in both tasks and error-increasing effects that were more selective for sequence acquisition than sequence performance. Ethanol also produced alterations in the magnitude of modulations and the temporal pattern of CA1 activity, although these effects were not selective for sequence acquisition. Similar to ethanol, hippocampal micro-stimulation decreased response rate in both tasks and selectively increased the percentage of errors during sequence acquisition, providing a more direct demonstration of hippocampal involvement during sequence acquisition. Together, these results strongly support the notion that ethanol disrupts sequence acquisition by disrupting hippocampal activity and that the hippocampus is necessary for the conditioned associations required for sequence acquisition. © 2015 Wiley Periodicals, Inc.

  12. Learning by Subtraction: Hippocampal Activity and Effects of Ethanol during the Acquisition and Performance of Response Sequences

    PubMed Central

    Ketchum, Myles J.; Weyand, Theodore G.; Weed, Peter F.; Winsauer, Peter J.

    2015-01-01

    Learning is believed to be reflected in the activity of the hippocampus. However, neural correlates of learning have been difficult to characterize because hippocampal activity is integrated with ongoing behavior. To address this issue, male rats (n = 5) implanted with electrodes (n = 14) in the CA1 subfield responded during two tasks within a single test session. In one task, subjects acquired a new 3-response sequence (acquisition), whereas in the other task, subjects completed a well-rehearsed 3-response sequence (performance). Both tasks, though, could be completed using an identical response topography, and both used the same sensory stimuli and schedule of reinforcement. More importantly, comparing neural patterns during sequence acquisition to those during sequence performance allows for a subtractive approach whereby activity associated with learning could potentially be dissociated from the activity associated with ongoing behavior. At sites where CA1 activity was closely associated with behavior, the patterns of activity were differentially modulated by key position and the serial position of a response within the schedule of reinforcement. Temporal shifts between peak activity and responding on particular keys also occurred during sequence acquisition, but not during sequence performance. Ethanol disrupted CA1 activity while producing rate-decreasing effects in both tasks and error-increasing effects that were more selective for sequence acquisition than sequence performance. Ethanol also produced alterations in the magnitude of modulations and the temporal pattern of CA1 activity, although these effects were not selective for sequence acquisition. Similar to ethanol, hippocampal micro-stimulation decreased response rate in both tasks and selectively increased the percentage of errors during sequence acquisition, providing a more direct demonstration of hippocampal involvement during sequence acquisition. Together, these results strongly support the notion that ethanol disrupts sequence acquisition by disrupting hippocampal activity and that the hippocampus is necessary for the conditioned associations required for sequence acquisition. PMID:26482846

  13. Statistical physics of interacting neural networks

    NASA Astrophysics Data System (ADS)

    Kinzel, Wolfgang; Metzler, Richard; Kanter, Ido

    2001-12-01

    Recent results on the statistical physics of time-series generation and prediction are presented. A neural network is trained on quasi-periodic and chaotic sequences, and its overlap with the sequence generator, as well as its prediction errors, are calculated numerically. For each network there exists a sequence for which it completely fails to make predictions. Two interacting networks show a transition to perfect synchronization. A pool of interacting networks shows good coordination in the minority game, a model of competition in a closed market. Finally, as a demonstration, a perceptron predicts bit sequences produced by human beings.

  14. Refractive errors in Mercyland Specialist Hospital, Osogbo, Western Nigeria.

    PubMed

    Adeoti, C O; Egbewale, B E

    2008-06-01

    The study was conducted to determine the magnitude and pattern of refractive errors in order to provide facilities for their management. A prospective study of 3601 eyes of 1824 consecutive patients was conducted. Information obtained included age, sex, occupation, visual acuity, and the type and degree of refractive error. The data were analysed using the Statistical Package for Social Sciences (version 11.0) computer software. Refractive error was found in 1824 (53.71%) patients. There were 832 (45.61%) males and 992 (54.39%) females with a mean age of 35.55 years. Myopia was the commonest error (1412 eyes; 39.21%). Others included hypermetropia (840 eyes; 23.33%) and astigmatism (785 eyes; 21.80%), and 820 patients (1640 eyes) had presbyopia. Anisometropia was present in 791 (44.51%) of the 1777 patients who had bilateral refractive errors. A total of 2252 eyes had spherical errors. Of these, 1308 eyes (58.08%) had errors of -0.50 to +0.50 dioptres; 567 eyes (25.18%) had errors less than -0.50 dioptres, of which 63 eyes (2.80%) had errors less than -5.00 dioptres; and 377 eyes (16.74%) had errors greater than +0.50 dioptres, of which 81 eyes (3.60%) had errors greater than +2.00 dioptres. The highest error was 20.00 dioptres for myopia and 18.00 dioptres for hypermetropia. Refractive error is common in this environment. Adequate provision should be made for its correction, bearing in mind the common types and degrees.

  15. Limited copy number - high resolution melting (LCN-HRM) enables the detection and identification by sequencing of low level mutations in cancer biopsies

    PubMed Central

    Do, Hongdo; Dobrovic, Alexander

    2009-01-01

    Background Mutation detection in clinical tumour samples is challenging when the proportion of tumour cells, and thus mutant alleles, is low. The limited sensitivity of conventional sequencing necessitates the adoption of more sensitive approaches. High resolution melting (HRM) is more sensitive than sequencing but identification of the mutation is desirable, particularly when it is important to discriminate false positives due to PCR errors or template degradation from true mutations. We thus developed limited copy number - high resolution melting (LCN-HRM) which applies limiting dilution to HRM. Multiple replicate reactions with a limited number of target sequences per reaction allow low level mutations to be detected. The dilutions used (based on Ct values) are chosen such that mutations, if present, can be detected by the direct sequencing of amplicons with aberrant melting patterns. Results Using cell lines heterozygous for mutations, we found that the mutations were not readily detected when they comprised 10% of total alleles (20% tumour cells) by sequencing, whereas they were readily detectable at 5% total alleles by standard HRM. LCN-HRM allowed these mutations to be identified by direct sequencing of those positive reactions. LCN-HRM was then used to review formalin-fixed paraffin-embedded (FFPE) clinical samples showing discordant findings between sequencing and HRM for KRAS exon 2 and EGFR exons 19 and 21. Both true mutations present at low levels and sequence changes due to artefacts were detected by LCN-HRM. The use of high fidelity polymerases showed that the majority of the artefacts were derived from the damaged template rather than replication errors during amplification. Conclusion LCN-HRM bridges the sensitivity gap between HRM and sequencing and is effective in distinguishing between artefacts and true mutations. PMID:19811662

  16. Limited copy number-high resolution melting (LCN-HRM) enables the detection and identification by sequencing of low level mutations in cancer biopsies.

    PubMed

    Do, Hongdo; Dobrovic, Alexander

    2009-10-08

    Mutation detection in clinical tumour samples is challenging when the proportion of tumour cells, and thus mutant alleles, is low. The limited sensitivity of conventional sequencing necessitates the adoption of more sensitive approaches. High resolution melting (HRM) is more sensitive than sequencing but identification of the mutation is desirable, particularly when it is important to discriminate false positives due to PCR errors or template degradation from true mutations. We thus developed limited copy number - high resolution melting (LCN-HRM) which applies limiting dilution to HRM. Multiple replicate reactions with a limited number of target sequences per reaction allow low level mutations to be detected. The dilutions used (based on Ct values) are chosen such that mutations, if present, can be detected by the direct sequencing of amplicons with aberrant melting patterns. Using cell lines heterozygous for mutations, we found that the mutations were not readily detected when they comprised 10% of total alleles (20% tumour cells) by sequencing, whereas they were readily detectable at 5% total alleles by standard HRM. LCN-HRM allowed these mutations to be identified by direct sequencing of those positive reactions. LCN-HRM was then used to review formalin-fixed paraffin-embedded (FFPE) clinical samples showing discordant findings between sequencing and HRM for KRAS exon 2 and EGFR exons 19 and 21. Both true mutations present at low levels and sequence changes due to artefacts were detected by LCN-HRM. The use of high fidelity polymerases showed that the majority of the artefacts were derived from the damaged template rather than replication errors during amplification. LCN-HRM bridges the sensitivity gap between HRM and sequencing and is effective in distinguishing between artefacts and true mutations.
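
    The limiting-dilution logic can be made concrete with a simple Poisson model: fewer template copies per replicate yield fewer mutation-positive replicates, but a higher mutant share within each positive replicate, which is what makes direct sequencing of the aberrant reactions feasible. The copy numbers and the 5% mutant fraction below are illustrative assumptions.

        from math import exp

        def replicate_stats(total_copies, mutant_fraction):
            """Poisson model of limiting dilution: probability that a
            replicate holds >= 1 mutant template, and the expected mutant
            share of templates within such a positive replicate."""
            lam = total_copies * mutant_fraction   # mean mutant copies
            p_pos = 1 - exp(-lam)                  # P(>= 1 mutant copy)
            share = (lam / p_pos) / total_copies   # E[mutants | >= 1]/total
            return p_pos, share

        for copies in (10, 50, 200):
            p, share = replicate_stats(copies, mutant_fraction=0.05)
            print(f"{copies:4d} copies/reaction: {p:.0%} positive, "
                  f"~{share:.0%} mutant share in a positive reaction")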

  17. Secondary School Teachers' Pedagogical Content Knowledge of Some Common Student Errors and Misconceptions in Sets

    ERIC Educational Resources Information Center

    Kolitsoe Moru, Eunice; Qhobela, Makomosela

    2013-01-01

    The study investigated teachers' pedagogical content knowledge of common students' errors and misconceptions in sets. Five mathematics teachers from one Lesotho secondary school were the sample of the study. Questionnaires and interviews were used for data collection. The results show that teachers were able to identify the following students'…

  18. Addressing Common Student Errors with Classroom Voting in Multivariable Calculus

    ERIC Educational Resources Information Center

    Cline, Kelly; Parker, Mark; Zullo, Holly; Stewart, Ann

    2012-01-01

    One technique for identifying and addressing common student errors is the method of classroom voting, in which the instructor presents a multiple-choice question to the class, and after a few minutes for consideration and small group discussion, each student votes on the correct answer, often using a hand-held electronic clicker. If a large number…

  19. The Nature of Error in Adolescent Student Writing

    ERIC Educational Resources Information Center

    Wilcox, Kristen Campbell; Yagelski, Robert; Yu, Fang

    2014-01-01

    This study examined the nature and frequency of error in high school native English speaker (L1) and English learner (L2) writing. Four main research questions were addressed: Are there significant differences in students' error rates in English language arts (ELA) and social studies? Do the most common errors made by students differ in ELA…

  20. False Positives in Multiple Regression: Unanticipated Consequences of Measurement Error in the Predictor Variables

    ERIC Educational Resources Information Center

    Shear, Benjamin R.; Zumbo, Bruno D.

    2013-01-01

    Type I error rates in multiple regression, and hence the chance for false positive research findings, can be drastically inflated when multiple regression models are used to analyze data that contain random measurement error. This article shows the potential for inflated Type I error rates in commonly encountered scenarios and provides new…
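
    The mechanism can be demonstrated in a few lines: the outcome depends only on a true predictor that is observed with error, while a second, correlated predictor is tested for an effect it does not have; because the noisy proxy cannot fully adjust for the true predictor, the second predictor is spuriously significant far more often than the nominal 5%. The simulation design is an illustrative assumption, not the article's own.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        n, reps, hits = 200, 2000, 0

        for _ in range(reps):
            x1 = rng.normal(size=n)                        # true predictor
            x2 = 0.7 * x1 + rng.normal(scale=0.5, size=n)  # correlated, inert
            y = x1 + rng.normal(size=n)                    # depends on x1 only
            w1 = x1 + rng.normal(size=n)                   # x1 measured with error
            X = sm.add_constant(np.column_stack([w1, x2]))
            p_x2 = sm.OLS(y, X).fit().pvalues[2]           # test H0: no x2 effect
            hits += p_x2 < 0.05

        print(f"empirical Type I error for x2: {hits / reps:.2f} (nominal 0.05)")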

  1. Comparing Measurement Error between Two Different Methods of Measurement of Various Magnitudes

    ERIC Educational Resources Information Center

    Zavorsky, Gerald S.

    2010-01-01

    Measurement error is a common problem in several fields of research such as medicine, physiology, and exercise science. The standard deviation of repeated measurements on the same person is the measurement error. One way of presenting measurement error is called the repeatability, which is 2.77 multiplied by the within subject standard deviation.…
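
    A minimal sketch of that computation: the within-subject standard deviation from paired repeated measurements, and the repeatability as 2.77 (that is, 1.96 times the square root of 2) times that value; the measurement values are hypothetical.

        import numpy as np

        # Two repeated measurements on each of 10 subjects (hypothetical).
        m1 = np.array([12.1, 14.3, 11.8, 13.5, 15.0,
                       12.9, 14.1, 13.2, 12.5, 14.8])
        m2 = np.array([12.4, 14.0, 12.1, 13.1, 15.4,
                       13.0, 13.8, 13.5, 12.2, 15.1])

        # With two repeats per subject, sw^2 = mean(d^2) / 2.
        d = m1 - m2
        sw = np.sqrt(np.mean(d**2) / 2)

        # 95% of paired differences are expected to fall below 2.77 * sw.
        print(f"within-subject SD {sw:.2f}, repeatability {2.77 * sw:.2f}")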

  2. [Errors in prescriptions and their preparation at the outpatient pharmacy of a regional hospital].

    PubMed

    Alvarado A, Carolina; Ossa G, Ximena; Bustos M, Luis

    2017-01-01

    Adverse effects of medications are an important cause of morbidity and hospital admissions, and errors in the prescription or preparation of medications by pharmacy personnel are a factor that may influence the occurrence of these adverse effects. Aim: To assess the frequency and type of errors in prescriptions and in their preparation at the pharmacy unit of a regional public hospital. Prescriptions received by ambulatory patients and by those being discharged from the hospital were reviewed using a 12-item checklist. The preparation of such prescriptions at the pharmacy unit was also reviewed, using a seven-item checklist. Seventy-two percent of prescriptions had at least one error. The most common mistake was the impossibility of determining the concentration of the prescribed drug. Prescriptions for patients being discharged from the hospital had the highest number of errors. When a prescription had more than two drugs, the risk of error increased 2.4 times. Twenty-four percent of prescription preparations had at least one error. The most common mistake was the labeling of drugs with incomplete medical indications. When a preparation included more than three drugs, the risk of preparation error increased 1.8 times. Prescription and preparation of medications delivered to patients had frequent errors. The most important risk factor for errors was the number of drugs prescribed.

  3. RAS screening in colorectal cancer: a comprehensive analysis of the results from the UK NEQAS colorectal cancer external quality assurance schemes (2009-2016).

    PubMed

    Richman, Susan D; Fairley, Jennifer; Butler, Rachel; Deans, Zandra C

    2017-12-01

    Evidence strongly indicates that extended RAS testing should be undertaken in mCRC patients prior to prescribing anti-EGFR therapies. With more laboratories implementing testing, the requirement for external quality assurance schemes increases, thus ensuring high standards of molecular analysis. Data were analysed from 15 United Kingdom National External Quality Assessment Service (UK NEQAS) for Molecular Genetics colorectal cancer external quality assurance (EQA) schemes, delivered between 2009 and 2016. Laboratories were provided annually with nine colorectal tumour samples for genotyping. Information on methodology and the extent of testing coverage was requested, and scores were given for genotyping, interpretation and clerical accuracy. There has been a sixfold increase in laboratory participation (18 in 2009 to 108 in 2016). For RAS genotyping, fewer laboratories now use Roche cobas®, pyrosequencing and Sanger sequencing, with more moving to next-generation sequencing (NGS). NGS is the most commonly employed technology for BRAF and PIK3CA mutation screening. KRAS genotyping errors were seen in ≤10% of laboratories until the 2014-2015 scheme, when there was an increase to 16.7%, corresponding to a large increase in scheme participants. NRAS genotyping errors peaked at 25.6% in the first 2015-2016 scheme but subsequently dropped to below 5%. Interpretation and clerical accuracy scores have been consistently good throughout. Within this EQA scheme, we have observed that the quality of molecular analysis for colorectal cancer has continued to improve, despite changes in the required targets, the volume of testing and the technologies employed. It is reassuring to know that laboratories clearly recognise the importance of participating in EQA schemes.

  4. Aortic blood pressure measured via EIT: investigation of different measurement settings.

    PubMed

    Braun, Fabian; Proença, Martin; Rapin, Michael; Lemay, Mathieu; Adler, Andy; Grychtol, Bartłomiej; Solà, Josep; Thiran, Jean-Philippe

    2015-06-01

    Electrical impedance tomography (EIT) allows the measurement of intra-thoracic impedance changes related to cardiovascular activity. As a safe and low-cost imaging modality, EIT is an appealing candidate for non-invasive and continuous haemodynamic monitoring. EIT has recently been shown to allow the assessment of aortic blood pressure via the estimation of the aortic pulse arrival time (PAT). However, finding the aortic signal within EIT image sequences is a challenging task: the signal has a small amplitude and is difficult to locate due to the small size of the aorta and the inherent low spatial resolution of EIT. In order to most reliably detect the aortic signal, our objective was to understand the effect of EIT measurement settings (electrode belt placement, reconstruction algorithm). This paper investigates the influence of three transversal belt placements and two commonly used difference reconstruction algorithms (Gauss-Newton and GREIT) on the measurement of aortic signals in view of aortic blood pressure estimation via EIT. A magnetic resonance imaging-based three-dimensional finite element model of the haemodynamic bio-impedance properties of the human thorax was created. Two simulation experiments were performed with the aim of (1) evaluating the timing error in aortic PAT estimation and (2) quantifying the strength of the aortic signal in each pixel of the EIT image sequences. Both experiments reveal better performance for images reconstructed with Gauss-Newton (with a noise figure of 0.5 or above) and a belt placement at the height of the heart or higher. According to the noise-free scenarios simulated, the uncertainty in the analysis of the aortic EIT signal is expected to induce blood pressure errors of at least ±1.4 mmHg.

  5. [Error analysis of functional articulation disorders in children].

    PubMed

    Zhou, Qiao-juan; Yin, Heng; Shi, Bing

    2008-08-01

    To explore the clinical characteristics of functional articulation disorders in children and provide more evidence for differential diagnosis and speech therapy, 172 children with functional articulation disorders were grouped by age: children aged 4-5 years were assigned to one group, and those aged 6-10 years to another. Their phonological samples were collected and analyzed. In both groups, substitution and omission (deletion) were the main articulation errors, dental consonants were the most frequently misarticulated sounds, and bilabial and labio-dental consonants were rarely wrong. In the 4-5 age group, the error frequencies ranked from highest to lowest were dental, velar, lingual, apical, bilabial, and labio-dental; in the 6-10 age group, the ranking was dental, lingual, apical, velar, bilabial, and labio-dental. Lateral misarticulation and palatalized misarticulation occurred more often in the 6-10 age group than in the 4-5 age group, and were found only in lingual and dental consonants in both groups. Misarticulation in functional articulation disorders thus occurs mainly in dental consonants and rarely in bilabial and labio-dental consonants; substitution and omission are the most frequent errors; and lateral and palatalized misarticulation occur mainly in lingual and dental consonants.

  6. A Rasch Perspective

    ERIC Educational Resources Information Center

    Schumacker, Randall E.; Smith, Everett V., Jr.

    2007-01-01

    Measurement error is a common theme in classical measurement models used in testing and assessment. In classical measurement models, the definition of measurement error and the subsequent reliability coefficients differ on the basis of the test administration design. Internal consistency reliability specifies error due primarily to poor item…

  7. Assumption-free estimation of the genetic contribution to refractive error across childhood.

    PubMed

    Guggenheim, Jeremy A; St Pourcain, Beate; McMahon, George; Timpson, Nicholas J; Evans, David M; Williams, Cathy

    2015-01-01

    Studies in relatives have generally yielded high heritability estimates for refractive error: twins 75-90%, families 15-70%. However, because related individuals often share a common environment, these estimates are inflated (via misallocation of unique/common environment variance). We calculated a lower-bound heritability estimate for refractive error free from such bias. Between the ages 7 and 15 years, participants in the Avon Longitudinal Study of Parents and Children (ALSPAC) underwent non-cycloplegic autorefraction at regular research clinics. At each age, an estimate of the variance in refractive error explained by single nucleotide polymorphism (SNP) genetic variants was calculated using genome-wide complex trait analysis (GCTA) using high-density genome-wide SNP genotype information (minimum N at each age=3,404). The variance in refractive error explained by the SNPs ("SNP heritability") was stable over childhood: Across age 7-15 years, SNP heritability averaged 0.28 (SE=0.08, p<0.001). The genetic correlation for refractive error between visits varied from 0.77 to 1.00 (all p<0.001) demonstrating that a common set of SNPs was responsible for the genetic contribution to refractive error across this period of childhood. Simulations suggested lack of cycloplegia during autorefraction led to a small underestimation of SNP heritability (adjusted SNP heritability=0.35; SE=0.09). To put these results in context, the variance in refractive error explained (or predicted) by the time participants spent outdoors was <0.005 and by the time spent reading was <0.01, based on a parental questionnaire completed when the child was aged 8-9 years old. Genetic variation captured by common SNPs explained approximately 35% of the variation in refractive error between unrelated subjects. This value sets an upper limit for predicting refractive error using existing SNP genotyping arrays, although higher-density genotyping in larger samples and inclusion of interaction effects is expected to raise this figure toward twin- and family-based heritability estimates. The same SNPs influenced refractive error across much of childhood. Notwithstanding the strong evidence of association between time outdoors and myopia, and time reading and myopia, less than 1% of the variance in myopia at age 15 was explained by crude measures of these two risk factors, indicating that their effects may be limited, at least when averaged over the whole population.

  8. The global burden of diagnostic errors in primary care.

    PubMed

    Singh, Hardeep; Schiff, Gordon D; Graber, Mark L; Onakpoya, Igho; Thompson, Matthew J

    2017-06-01

    Diagnosis is one of the most important tasks performed by primary care physicians. The World Health Organization (WHO) recently prioritized patient safety areas in primary care and included diagnostic errors as a high-priority problem. In addition, a recent report from the Institute of Medicine in the USA, 'Improving Diagnosis in Health Care', concluded that most people will likely experience a diagnostic error in their lifetime. In this narrative review, we discuss the global significance, burden and contributory factors related to diagnostic errors in primary care. We synthesize the available literature to discuss the types of presenting symptoms and conditions most commonly affected. We then summarize interventions based on available data and suggest next steps to reduce the global burden of diagnostic errors. Research suggests that we are unlikely to find a 'magic bullet' and confirms the need for a multifaceted approach to understand and address the many systems and cognitive issues involved in diagnostic error. Because errors involve many common conditions and are prevalent across all countries, the WHO's leadership at a global level will be instrumental in addressing the problem. Based on our review, we recommend that the WHO consider bringing together primary care leaders, practicing frontline clinicians, safety experts, policymakers, the health IT community, medical education and accreditation organizations, researchers from multiple disciplines, patient advocates, and funding bodies, among others, to address the many common challenges and opportunities to reduce diagnostic error. This could lead to prioritization of the practice changes needed to improve primary care, as well as setting research priorities for intervention development to reduce diagnostic error. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.

  9. A Comparison of Medication Histories Obtained by a Pharmacy Technician Versus Nurses in the Emergency Department

    PubMed Central

    Markovic, Marija; Mathis, A. Scott; Ghin, Hoytin Lee; Gardiner, Michelle; Fahim, Germin

    2017-01-01

    Purpose: To compare the medication history error rate of the emergency department (ED) pharmacy technician with that of nursing staff and to describe the workflow environment. Methods: Fifty medication histories performed by an ED nurse followed by the pharmacy technician were evaluated for discrepancies (RN-PT group). A separate 50 medication histories performed by the pharmacy technician and observed with necessary intervention by the ED pharmacist were evaluated for discrepancies (PT-RPh group). Discrepancies were totaled and categorized by type of error and therapeutic category of the medication. The workflow description was obtained by observation and staff interview. Results: A total of 474 medications in the RN-PT group and 521 in the PT-RPh group were evaluated. Nurses made at least one error in all 50 medication histories (100%), compared to 18 medication histories for the pharmacy technician (36%). In the RN-PT group, 408 medications had at least one error, corresponding to an accuracy rate of 14% for nurses. In the PT-RPh group, 30 medications had an error, corresponding to an accuracy rate of 94.4% for the pharmacy technician (P < 0.0001). The most common error made by nurses was a missing medication (n = 109), while the most common error for the pharmacy technician was a wrong medication frequency (n = 19). The most common drug class with documented errors for ED nurses was cardiovascular medications (n = 100), while the pharmacy technician made the most errors in gastrointestinal medications (n = 11). Conclusion: Medication histories obtained by the pharmacy technician were significantly more accurate than those obtained by nurses in the emergency department. PMID:28090164

  10. Interspecific and intraspecific gene variability in a 1-Mb region containing the highest density of NBS-LRR genes found in the melon genome.

    PubMed

    González, Víctor M; Aventín, Núria; Centeno, Emilio; Puigdomènech, Pere

    2014-12-17

    Plant NBS-LRR resistance genes tend to be found in clusters, which have been shown to be hot spots of genome variability. In melon, half of the 81 predicted NBS-LRR genes group into nine clusters, and a 1-Mb region on linkage group V contains the highest density of R-genes and presence/absence gene polymorphisms found in the melon genome. This region is known to contain the locus of Vat, an agronomically important gene that confers resistance to aphids. However, the presence of duplications makes the sequencing and annotation of R-gene clusters difficult, usually resulting in multi-gapped sequences with higher-than-average error rates. A 1-Mb sequence that contains the largest NBS-LRR gene cluster found in melon was improved using a strategy that combines Illumina paired-end mapping and PCR-based gap closing. Unknown sequence was decreased by 70%, while about 3,000 SNPs and small indels were corrected. As a result, the annotations of 18 of a total of 23 NBS-LRR genes found in this region were modified, including additional coding sequences, amino acid changes, correction of splicing boundaries, or fusion of ORFs into common transcription units. A phylogenetic analysis of the R-genes and their comparison with syntenic sequences in other cucurbits point to a pattern of local gene amplifications since the diversification of cucurbits from other families, and through speciation within the family. A candidate Vat gene is proposed based on the sequence similarity between a reported Vat gene from a Korean melon cultivar and a sequence fragment previously absent from the unrefined sequence. In summary, a sequence refinement strategy allowed substantial improvement of a 1-Mb fragment of the melon genome and the re-annotation of the largest cluster of NBS-LRR gene homologues found in melon. Analysis of the cluster revealed that resistance genes have been produced by sequence duplication in adjacent genome locations since the divergence of cucurbits from other close families and through the process of speciation within the family. The identification of a candidate Vat gene using previously unavailable sequence demonstrates the advantages of genome assembly refinements when analyzing complex regions such as those containing clusters of highly similar genes.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lapidus, Alla L.

    From the time its role in heredity was discovered, DNA has generated interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole-genome sequencing) has become robust and inexpensive. Meanwhile, the assembly of whole-genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different lengths (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in the analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1 error per 2,000 bp. A finished genome represents a genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error per 10,000 bp), validated through a number of computer and laboratory experiments.

  12. Statistical method evaluation for differentially methylated CpGs in base resolution next-generation DNA sequencing data.

    PubMed

    Zhang, Yun; Baheti, Saurabh; Sun, Zhifu

    2018-05-01

    High-throughput bisulfite methylation sequencing, such as reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing, is commonly used for base-resolution methylome research. These data are represented either by the ratio of methylated cytosines versus total coverage at a CpG site or by the numbers of methylated and unmethylated cytosines. Multiple statistical methods can be used to detect differentially methylated CpGs (DMCs) between conditions, and these methods are often the basis for the next step of differentially methylated region identification. The ratio data have the flexibility of fitting many linear models, whereas the raw count data take the coverage information into consideration. There is an array of options for DMC detection in each datatype; however, it is not clear which is the optimal statistical method. In this study, we systematically evaluated four statistical methods on methylation ratio data and four methods on count-based data, and compared their performances with regard to type I error control, sensitivity and specificity of DMC detection, and computational resource demands, using real RRBS data along with simulation. Our results show that the ratio-based tests are generally more conservative (less sensitive) than the count-based tests. However, some count-based methods have high false-positive rates and should be avoided. The beta-binomial model gives a good balance between sensitivity and specificity and is the preferred method.
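
    A hedged sketch of a count-based test in the spirit of the preferred beta-binomial model: a likelihood-ratio test at a single CpG using scipy.stats.betabinom. The parameterization, optimizer, and toy counts are our assumptions, not the implementations the paper evaluated.

        import numpy as np
        from scipy.optimize import minimize
        from scipy.stats import betabinom, chi2

        def neg_loglik(params, k, n):
            """Negative beta-binomial log-likelihood; params = log(a), log(b)."""
            a, b = np.exp(params)
            return -betabinom.logpmf(k, n, a, b).sum()

        def max_loglik(k, n):
            res = minimize(neg_loglik, x0=[0.0, 0.0], args=(k, n),
                           method="Nelder-Mead")
            return -res.fun

        def dmc_test(k1, n1, k2, n2):
            """LRT for a methylation difference at one CpG site:
            k* = methylated counts, n* = coverage per sample."""
            ll_alt = max_loglik(k1, n1) + max_loglik(k2, n2)
            ll_null = max_loglik(np.concatenate([k1, k2]),
                                 np.concatenate([n1, n2]))
            stat = 2 * (ll_alt - ll_null)
            return chi2.sf(stat, df=2)

        # Hypothetical counts: group 1 ~20% methylated, group 2 ~60%.
        n1 = np.array([30, 25, 40, 35]); k1 = np.array([6, 5, 9, 7])
        n2 = np.array([28, 33, 31, 38]); k2 = np.array([17, 20, 18, 24])
        print(f"p-value: {dmc_test(k1, n1, k2, n2):.4f}")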

  13. NMR structure determination of a synthetic analogue of bacillomycin Lc reveals the strategic role of L-Asn1 in the natural iturinic antibiotics

    NASA Astrophysics Data System (ADS)

    Volpon, Laurent; Tsan, Pascale; Majer, Zsuzsa; Vass, Elemer; Hollósi, Miklós; Noguéra, Valérie; Lancelin, Jean-Marc; Besson, Françoise

    2007-08-01

    Iturins are a group of antifungal lipopeptides produced by Bacillus subtilis. All are cyclic lipopeptides with seven α-amino acids of configuration LDDLLDL and one β-amino fatty acid. Bacillomycin L is a member of this family and its NMR structure was previously resolved using the sequence Asp-Tyr-Asn-Ser-Gln-Ser-Thr. In this work, we carefully examined the NMR spectra of this compound and detected an error in the sequence: in fact, Asp1 and Gln5 need to be changed to Asn1 and Glu5, which therefore makes it identical to bacillomycin Lc. As a consequence, it now appears that all iturinic peptides with antibiotic activity share the common β-amino fatty acid8-L-Asn1-D-Tyr2-D-Asn3 sequence. To better understand the conformational influence of the acidic residue L-Asp1, present for example in the inactive iturin C, the NMR structure of the synthetic analogue SCP [cyclo(L-Asp1-D-Tyr2-D-Asn3-L-Ser4-L-Gln5-D-Ser6-L-Thr7-β-Ala8)] was determined and compared with bacillomycin Lc recalculated with the corrected sequence. In both cases, the conformers obtained were separated into two families of similar energy which differ essentially in the number and type of turns. A detailed analysis of both cyclopeptide structures is presented here. In addition, CD and FTIR spectra were recorded and confirmed the conformational differences observed by NMR between the two cyclopeptides.

  14. Origin-Dependent Inverted-Repeat Amplification: Tests of a Model for Inverted DNA Amplification.

    PubMed

    Brewer, Bonita J; Payen, Celia; Di Rienzi, Sara C; Higgins, Megan M; Ong, Giang; Dunham, Maitreya J; Raghuraman, M K

    2015-12-01

    DNA replication errors are a major driver of evolution--from single nucleotide polymorphisms to large-scale copy number variations (CNVs). Here we test a specific replication-based model to explain the generation of interstitial, inverted triplications. While no genetic information is lost, the novel inversion junctions and increased copy number of the included sequences create the potential for adaptive phenotypes. The model--Origin-Dependent Inverted-Repeat Amplification (ODIRA)--proposes that a replication error at pre-existing short, interrupted, inverted repeats in genomic sequences generates an extrachromosomal, inverted dimeric, autonomously replicating intermediate; subsequent genomic integration of the dimer yields this class of CNV without loss of distal chromosomal sequences. We used a combination of in vitro and in vivo approaches to test the feasibility of the proposed replication error and its downstream consequences on chromosome structure in the yeast Saccharomyces cerevisiae. We show that the proposed replication error--the ligation of leading and lagging nascent strands to create "closed" forks--can occur in vitro at short, interrupted inverted repeats. The removal of molecules with two closed forks results in a hairpin-capped linear duplex that we show replicates in vivo to create an inverted, dimeric plasmid that subsequently integrates into the genome by homologous recombination, creating an inverted triplication. While other models have been proposed to explain inverted triplications and their derivatives, our model can also explain the generation of human, de novo, inverted amplicons that have a 2:1 mixture of sequences from both homologues of a single parent--a feature readily explained by a plasmid intermediate that arises from one homologue and integrates into the other homologue prior to meiosis. Our tests of key features of ODIRA lend support to this mechanism and suggest further avenues of enquiry to unravel the origins of interstitial, inverted CNVs pivotal in human health and evolution.

  15. Assessing the Relationship of Ancient and Modern Populations

    PubMed Central

    Schraiber, Joshua G.

    2018-01-01

    Genetic material sequenced from ancient samples is revolutionizing our understanding of the recent evolutionary past. However, ancient DNA is often degraded, resulting in low-coverage, error-prone sequencing. Several solutions to this problem exist, ranging from simple approaches, such as selecting a read at random for each site, to more complicated approaches involving genotype likelihoods. In this work, we present a novel method for assessing the relationship of an ancient sample with a modern population, while accounting for sequencing error and postmortem damage, by analyzing raw reads from multiple ancient individuals simultaneously. We show that, when analyzing SNP data, it is better to sequence more ancient samples to low coverage: two samples sequenced to 0.5× coverage provide better resolution than a single sample sequenced to 2× coverage. We also examined the power to detect whether an ancient sample is directly ancestral to a modern population, finding that, with even a few high-coverage individuals, ancient samples that are only slightly diverged from the modern population can be detected with ease. When we applied our approach to European samples, we found that no ancient samples represent direct ancestors of modern Europeans. We also found that, as shown previously, the most ancient Europeans appear to have had the smallest effective population sizes, indicating a role for agriculture in modern population growth. PMID:29167200
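
    The genotype-likelihood computation that the more complicated approaches build on can be sketched in a few lines: with a combined per-read error rate e (sequencing error plus postmortem damage), a read shows the alternate base with probability (g/2)(1 - e) + (1 - g/2)e under diploid genotype g. The read counts and error rate below are illustrative assumptions.

        import numpy as np

        def genotype_loglik(reads_alt, error):
            """Log-likelihoods of genotypes g = 0, 1, 2 (alt-allele count)
            given reads (1 = alt observed, 0 = ref) and a per-read error."""
            lls = []
            for g in (0, 1, 2):
                p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
                lls.append(np.log(np.where(reads_alt == 1,
                                           p_alt, 1 - p_alt)).sum())
            return np.array(lls)

        # Four low-coverage reads at one site, three showing the alt base.
        reads = np.array([1, 1, 0, 1])
        ll = genotype_loglik(reads, error=0.02)
        print("log-likelihoods (g = 0, 1, 2):", np.round(ll, 2))
        print("maximum-likelihood genotype:", int(np.argmax(ll)))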

  16. Systematic and stochastic influences on the performance of the MinION nanopore sequencer across a range of nucleotide bias

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krishnakumar, Raga; Sinha, Anupama; Bird, Sara W.

    Emerging sequencing technologies are allowing us to characterize environmental, clinical and laboratory samples with increasing speed and detail, including real-time analysis and interpretation of data. One example of this is being able to rapidly and accurately detect a wide range of pathogenic organisms, both in the clinic and the field. Genomes can have radically different GC content, however, such that accurate sequence analysis can be challenging depending upon the technology used. Here, we have characterized the performance of the Oxford MinION nanopore sequencer for detection and evaluation of organisms with a range of genomic nucleotide bias. We have diagnosed the quality of base-calling across individual reads and discovered that the position within the read affects base-calling and quality scores. Finally, we have evaluated the performance of the current state-of-the-art neural network-based MinION basecaller, characterizing its behavior with respect to systemic errors as well as context- and sequence-specific errors. Overall, we present a detailed characterization of the capabilities of the MinION in terms of generating high-accuracy sequence data from genomes with a wide range of nucleotide content. This study provides a framework for designing the appropriate experiments that are likely to lead to accurate and rapid field-forward diagnostics.

  17. Systematic and stochastic influences on the performance of the MinION nanopore sequencer across a range of nucleotide bias

    DOE PAGES

    Krishnakumar, Raga; Sinha, Anupama; Bird, Sara W.; ...

    2018-02-16

    Emerging sequencing technologies are allowing us to characterize environmental, clinical and laboratory samples with increasing speed and detail, including real-time analysis and interpretation of data. One example of this is being able to rapidly and accurately detect a wide range of pathogenic organisms, both in the clinic and the field. Genomes can have radically different GC content, however, such that accurate sequence analysis can be challenging depending upon the technology used. Here, we have characterized the performance of the Oxford MinION nanopore sequencer for detection and evaluation of organisms with a range of genomic nucleotide bias. We have diagnosed the quality of base-calling across individual reads and discovered that the position within the read affects base-calling and quality scores. Finally, we have evaluated the performance of the current state-of-the-art neural network-based MinION basecaller, characterizing its behavior with respect to systemic errors as well as context- and sequence-specific errors. Overall, we present a detailed characterization of the capabilities of the MinION in terms of generating high-accuracy sequence data from genomes with a wide range of nucleotide content. This study provides a framework for designing the appropriate experiments that are likely to lead to accurate and rapid field-forward diagnostics.
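
    Position-dependent base-call quality of the kind diagnosed here can be profiled from any FASTQ output. A minimal sketch follows (Phred+33 encoding assumed; the file path is hypothetical, and this is not the study's analysis code).

        import gzip
        import numpy as np

        def per_position_quality(fastq_path, max_len=2000):
            """Mean Phred quality at each read position across a FASTQ file,
            to check whether base-call quality drifts along the read."""
            totals = np.zeros(max_len)
            counts = np.zeros(max_len)
            opener = gzip.open if fastq_path.endswith(".gz") else open
            with opener(fastq_path, "rt") as fh:
                for i, line in enumerate(fh):
                    if i % 4 == 3:  # the quality line of each record
                        q = np.frombuffer(line.strip().encode(),
                                          dtype=np.uint8).astype(int) - 33
                        q = q[:max_len]
                        totals[:q.size] += q
                        counts[:q.size] += 1
            with np.errstate(invalid="ignore"):
                return totals / counts  # NaN where no read reaches

        # Usage (hypothetical path):
        # profile = per_position_quality("minion_run.fastq.gz")
        # print(np.nanmean(profile[:100]), np.nanmean(profile[-100:]))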

  18. mtDNAmanager: a Web-based tool for the management and quality analysis of mitochondrial DNA control-region sequences

    PubMed Central

    Lee, Hwan Young; Song, Injee; Ha, Eunho; Cho, Sung-Bae; Yang, Woo Ick; Shin, Kyoung-Jin

    2008-01-01

    Background: For the past few years, scientific controversy has surrounded the large number of errors in forensic and literature mitochondrial DNA (mtDNA) data. However, recent research has shown that using mtDNA phylogeny and referring to known mtDNA haplotypes can be useful for checking the quality of sequence data. Results: We developed a Web-based bioinformatics resource, "mtDNAmanager", that offers a convenient interface supporting the management and quality analysis of mtDNA sequence data. mtDNAmanager performs computations on mtDNA control-region sequences to estimate the most-probable mtDNA haplogroups and retrieves similar sequences from a selected database. By the phased designation of the most-probable haplogroups (both expected and estimated), mtDNAmanager enables users to systematically detect errors while confirming the presence of clear key diagnostic mutations and accompanying mutations. The query tools of mtDNAmanager also facilitate database screening with the two options "match" and "include the queried nucleotide polymorphism". In addition, mtDNAmanager provides Web interfaces for users to manage and analyse their own data in batch mode. Conclusion: mtDNAmanager provides systematic routines for mtDNA sequence data management and analysis via easily accessible Web interfaces, and thus should be very useful for population, medical and forensic studies that employ mtDNA analysis. mtDNAmanager can be accessed at . PMID:19014619
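
    To make the haplogroup-estimation and screening ideas concrete, here is a toy sketch (not mtDNAmanager's actual code): haplogroups are represented as sets of diagnostic control-region mutations, queries are ranked by the fraction of diagnostic mutations they carry, and the "match" / "include" screening options are mimicked with set operations. The variant table below is hypothetical.

```python
# Hypothetical, tiny haplogroup table: haplogroup -> set of
# diagnostic control-region variants (position + derived base).
DIAGNOSTIC = {
    "H": {"263G", "315.1C"},
    "M": {"263G", "489C", "16223T"},
}

def rank_haplogroups(query_variants, table=DIAGNOSTIC):
    """Rank haplogroups by the fraction of their key diagnostic
    mutations present in the query profile -- a toy version of the
    'most-probable haplogroup' estimation described above."""
    scores = {hg: len(diag & query_variants) / len(diag)
              for hg, diag in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def screen(database, query, mode="include"):
    """Database screening with the two options described above:
    'match' keeps profiles identical to the query variant set,
    'include' keeps profiles containing all queried polymorphisms."""
    if mode == "match":
        return [p for p in database if p == query]
    return [p for p in database if query <= p]

print(rank_haplogroups({"263G", "489C", "16223T", "16311C"}))
```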

  19. Contingent negative variation (CNV) associated with sensorimotor timing error correction.

    PubMed

    Jang, Joonyong; Jones, Myles; Milne, Elizabeth; Wilson, Daniel; Lee, Kwang-Hyuk

    2016-02-15

    Detection and subsequent correction of sensorimotor timing errors are fundamental to adaptive behavior. Using scalp-recorded event-related potentials (ERPs), we sought to find ERP components that are predictive of error correction performance during rhythmic movements. Healthy right-handed participants were asked to synchronize their finger taps to a regular tone sequence (one tone every 600 ms) while EEG data were continuously recorded. Data from 15 participants were analyzed. Occasional irregularities were built into stimulus presentation timing: 90 ms before (advances: negative shift) or after (delays: positive shift) the expected time point. A tapping condition alternated with a listening condition in which an identical stimulus sequence was presented but participants did not tap. Behavioral error correction was observed immediately following a shift, with a degree of over-correction for positive shifts. Our stimulus-locked ERP analysis revealed (1) increased auditory N1 amplitude for the positive shift condition and decreased auditory N1 modulation for the negative shift condition; and (2) a second enhanced negativity (N2) in the tapping positive condition compared with the tapping negative condition. In response-locked epochs, we observed a contingent negative variation (CNV)-like negativity with earlier latency in the tapping negative condition than in the tapping positive condition. This CNV-like negativity peaked at around the onset of the subsequent tap: the earlier the peak, the better the error correction performance for negative shifts, whereas the later the peak, the better the performance for positive shifts. This study showed that the CNV-like negativity was associated with error correction performance during sensorimotor synchronization. Auditory N1 and N2 were differentially modulated by negative vs. positive shifts; however, we did not find evidence for their involvement in behavioral error correction. Overall, our study provides the basis from which further research on the role of the CNV in perceptual and motor timing can be developed. Copyright © 2015 Elsevier Inc. All rights reserved.
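
    The behavioral error correction described above is commonly modeled with a linear phase-correction rule, in which a fixed fraction alpha of each tap-to-tone asynchrony is corrected on the next tap. The sketch below simulates such a model with a single delayed tone, mimicking a positive shift; this is a standard illustrative model, not the authors' analysis, and all parameter values are assumptions.

```python
import random

def simulate_tapping(n_taps=40, period=600.0, alpha=0.6,
                     shift_at=20, shift=90.0, motor_sd=10.0):
    """Linear phase-correction model: a fraction `alpha` of each
    tap-to-tone asynchrony is corrected on the next tap. One tone
    onset is delayed by `shift` ms (a 'positive shift'); pass a
    negative `shift` for an advance. Parameters are illustrative."""
    tone = 0.0                          # current tone onset time (ms)
    tap = random.gauss(0.0, motor_sd)   # current tap time (ms)
    asynchronies = []
    for n in range(n_taps):
        e = tap - tone                  # asynchrony at event n
        asynchronies.append(e)
        # schedule the next tone, shifting exactly one onset
        tone = tone + period + (shift if n + 1 == shift_at else 0.0)
        # next tap: baseline period minus a correction for the error
        tap = tap + period - alpha * e + random.gauss(0.0, motor_sd)
    return asynchronies

# Asynchronies around the shifted tone: an abrupt jump of about
# -shift, then geometric recovery by a factor (1 - alpha) per tap.
print([round(e, 1) for e in simulate_tapping()[18:26]])
```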

  20. Dynamical decoupling of local transverse random telegraph noise in a two-qubit gate

    NASA Astrophysics Data System (ADS)

    D'Arrigo, A.; Falci, G.; Paladino, E.

    2015-10-01

    Achieving high-fidelity universal two-qubit gates is a central requisite of any implementation of quantum information processing. The presence of spurious fluctuators of various physical origin represents a limiting factor for superconducting nanodevices. Operating qubits at optimal points, where the qubit-fluctuator interaction is transverse with respect to the single-qubit Hamiltonian, has considerably improved single-qubit gates. Further enhancement has been achieved by dynamical decoupling (DD). In this article we investigate DD of transverse random telegraph noise acting locally on each of the qubits forming an entangling gate. Our analysis is based on the exact numerical solution of the stochastic Schrödinger equation. We evaluate the gate error under local periodic, Carr-Purcell and Uhrig DD sequences. We find that there is a threshold number of pulses, n, above which the gate error decreases with a sequence-specific power-law dependence on n. Below threshold, DD may even increase the error with respect to the unconditioned evolution, a behaviour reminiscent of the anti-Zeno effect.
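
    For reference, the three DD families compared above differ only in where the n pi-pulses are placed within the total evolution time T. The sketch below generates pulse times under one common convention for each family (the Uhrig times follow the standard t_j = T sin²(jπ/(2n+2)) formula); the paper's exact conventions may differ.

```python
from math import sin, pi

def pulse_times(n, T, kind="cp"):
    """Pi-pulse times for an n-pulse DD sequence over time T, using
    one common convention per family:
      periodic : equally spaced pulses, t_j = j*T/(n+1)
      cp       : Carr-Purcell,          t_j = (j - 1/2)*T/n
      uhrig    : Uhrig,                 t_j = T*sin^2(j*pi/(2n+2))
    """
    if kind == "periodic":
        return [j * T / (n + 1) for j in range(1, n + 1)]
    if kind == "cp":
        return [(j - 0.5) * T / n for j in range(1, n + 1)]
    if kind == "uhrig":
        return [T * sin(j * pi / (2 * n + 2)) ** 2 for j in range(1, n + 1)]
    raise ValueError(f"unknown sequence: {kind}")

# Compare the three pulse placements for n = 4 over a unit gate time.
for kind in ("periodic", "cp", "uhrig"):
    print(kind, [round(t, 3) for t in pulse_times(4, 1.0, kind)])
```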
