A Fast Approximate Algorithm for Mapping Long Reads to Large Reference Databases.
Jain, Chirag; Dilthey, Alexander; Koren, Sergey; Aluru, Srinivas; Phillippy, Adam M
2018-04-30
Emerging single-molecule sequencing technologies from Pacific Biosciences and Oxford Nanopore have revived interest in long-read mapping algorithms. Alignment-based seed-and-extend methods demonstrate good accuracy, but face limited scalability, while faster alignment-free methods typically trade decreased precision for efficiency. In this article, we combine a fast approximate read mapping algorithm based on minimizers with a novel MinHash identity estimation technique to achieve both scalability and precision. In contrast to prior methods, we develop a mathematical framework that defines the types of mapping targets we uncover, establish probabilistic estimates of p-value and sensitivity, and demonstrate tolerance for alignment error rates up to 20%. With this framework, our algorithm automatically adapts to different minimum length and identity requirements and provides both positional and identity estimates for each mapping reported. For mapping human PacBio reads to the hg38 reference, our method is 290 × faster than Burrows-Wheeler Aligner-MEM with a lower memory footprint and recall rate of 96%. We further demonstrate the scalability of our method by mapping noisy PacBio reads (each ≥5 kbp in length) to the complete NCBI RefSeq database containing 838 Gbp of sequence and >60,000 genomes.
2009-01-01
Background ESTs or variable sequence reads can be available in prokaryotic studies well before a complete genome is known. Use cases include (i) transcriptome studies or (ii) single cell sequencing of bacteria. Without suitable software their further analysis and mapping would have to await finalization of the corresponding genome. Results The tool JANE rapidly maps ESTs or variable sequence reads in prokaryotic sequencing and transcriptome efforts to related template genomes. It provides an easy-to-use graphics interface for information retrieval and a toolkit for EST or nucleotide sequence function prediction. Furthermore, we developed for rapid mapping an enhanced sequence alignment algorithm which reassembles and evaluates high scoring pairs provided from the BLAST algorithm. Rapid assembly on and replacement of the template genome by sequence reads or mapped ESTs is achieved. This is illustrated (i) by data from Staphylococci as well as from a Blattabacteria sequencing effort, (ii) mapping single cell sequencing reads is shown for poribacteria to sister phylum representative Rhodopirellula Baltica SH1. The algorithm has been implemented in a web-server accessible at http://jane.bioapps.biozentrum.uni-wuerzburg.de. Conclusion Rapid prokaryotic EST mapping or mapping of sequence reads is achieved applying JANE even without knowing the cognate genome sequence. PMID:19943962
Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing.
Misra, Sanchit; Agrawal, Ankit; Liao, Wei-keng; Choudhary, Alok
2011-01-15
Recently, a number of programs have been proposed for mapping short reads to a reference genome. Many of them are heavily optimized for short-read mapping and hence are very efficient for shorter queries, but that makes them inefficient or not applicable for reads longer than 200 bp. However, many sequencers are already generating longer reads and more are expected to follow. For long read sequence mapping, there are limited options; BLAT, SSAHA2, FANGS and BWA-SW are among the popular ones. However, resequencing and personalized medicine need much faster software to map these long sequencing reads to a reference genome to identify SNPs or rare transcripts. We present AGILE (AliGnIng Long rEads), a hash table based high-throughput sequence mapping algorithm for longer 454 reads that uses diagonal multiple seed-match criteria, customized q-gram filtering and a dynamic incremental search approach among other heuristics to optimize every step of the mapping process. In our experiments, we observe that AGILE is more accurate than BLAT, and comparable to BWA-SW and SSAHA2. For practical error rates (< 5%) and read lengths (200-1000 bp), AGILE is significantly faster than BLAT, SSAHA2 and BWA-SW. Even for the other cases, AGILE is comparable to BWA-SW and several times faster than BLAT and SSAHA2. http://www.ece.northwestern.edu/~smi539/agile.html.
A Secure Alignment Algorithm for Mapping Short Reads to Human Genome.
Zhao, Yongan; Wang, Xiaofeng; Tang, Haixu
2018-05-09
The elastic and inexpensive computing resources such as clouds have been recognized as a useful solution to analyzing massive human genomic data (e.g., acquired by using next-generation sequencers) in biomedical researches. However, outsourcing human genome computation to public or commercial clouds was hindered due to privacy concerns: even a small number of human genome sequences contain sufficient information for identifying the donor of the genomic data. This issue cannot be directly addressed by existing security and cryptographic techniques (such as homomorphic encryption), because they are too heavyweight to carry out practical genome computation tasks on massive data. In this article, we present a secure algorithm to accomplish the read mapping, one of the most basic tasks in human genomic data analysis based on a hybrid cloud computing model. Comparing with the existing approaches, our algorithm delegates most computation to the public cloud, while only performing encryption and decryption on the private cloud, and thus makes the maximum use of the computing resource of the public cloud. Furthermore, our algorithm reports similar results as the nonsecure read mapping algorithms, including the alignment between reads and the reference genome, which can be directly used in the downstream analysis such as the inference of genomic variations. We implemented the algorithm in C++ and Python on a hybrid cloud system, in which the public cloud uses an Apache Spark system.
Kim, Jeremie S; Senol Cali, Damla; Xin, Hongyi; Lee, Donghyuk; Ghose, Saugata; Alser, Mohammed; Hassan, Hasan; Ergin, Oguz; Alkan, Can; Mutlu, Onur
2018-05-09
Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i.e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment) to determine the origin of the read. A seed location filter comes into play before alignment, discarding seed locations that alignment would deem a poor match. The ideal seed location filter would discard all poor match locations prior to alignment such that there is no wasted computation on unnecessary alignments. We propose a novel seed location filtering algorithm, GRIM-Filter, optimized to exploit 3D-stacked memory systems that integrate computation within a logic layer stacked under memory layers, to perform processing-in-memory (PIM). GRIM-Filter quickly filters seed locations by 1) introducing a new representation of coarse-grained segments of the reference genome, and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for a sequence alignment error tolerance of 0.05, GRIM-Filter 1) reduces the false negative rate of filtering by 5.59x-6.41x, and 2) provides an end-to-end read mapper speedup of 1.81x-3.65x, compared to a state-of-the-art read mapper employing the best previous seed location filtering algorithm. GRIM-Filter exploits 3D-stacked memory, which enables the efficient use of processing-in-memory, to overcome the memory bandwidth bottleneck in seed location filtering. We show that GRIM-Filter significantly improves the performance of a state-of-the-art read mapper. GRIM-Filter is a universal seed location filter that can be applied to any read mapper. We hope that our results provide inspiration for new works to design other bioinformatics algorithms that take advantage of emerging technologies and new processing paradigms, such as processing-in-memory using 3D-stacked memory devices.
Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data
2014-01-01
Background The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms. Results In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established. Conclusions A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform. PMID:24708189
HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads
Li, Pinghao; Jiang, Xiaoqian; Wang, Shuang; Kim, Jihoon; Xiong, Hongkai; Ohno-Machado, Lucila
2014-01-01
Background and objective Short-read sequencing is becoming the standard of practice for the study of structural variants associated with disease. However, with the growth of sequence data largely surpassing reasonable storage capability, the biomedical community is challenged with the management, transfer, archiving, and storage of sequence data. Methods We developed Hierarchical mUlti-reference Genome cOmpression (HUGO), a novel compression algorithm for aligned reads in the sorted Sequence Alignment/Map (SAM) format. We first aligned short reads against a reference genome and stored exactly mapped reads for compression. For the inexact mapped or unmapped reads, we realigned them against different reference genomes using an adaptive scheme by gradually shortening the read length. Regarding the base quality value, we offer lossy and lossless compression mechanisms. The lossy compression mechanism for the base quality values uses k-means clustering, where a user can adjust the balance between decompression quality and compression rate. The lossless compression can be produced by setting k (the number of clusters) to the number of different quality values. Results The proposed method produced a compression ratio in the range 0.5–0.65, which corresponds to 35–50% storage savings based on experimental datasets. The proposed approach achieved 15% more storage savings over CRAM and comparable compression ratio with Samcomp (CRAM and Samcomp are two of the state-of-the-art genome compression algorithms). The software is freely available at https://sourceforge.net/projects/hierachicaldnac/with a General Public License (GPL) license. Limitation Our method requires having different reference genomes and prolongs the execution time for additional alignments. Conclusions The proposed multi-reference-based compression algorithm for aligned reads outperforms existing single-reference based algorithms. PMID:24368726
Survey of gene splicing algorithms based on reads.
Si, Xiuhua; Wang, Qian; Zhang, Lei; Wu, Ruo; Ma, Jiquan
2017-11-02
Gene splicing is the process of assembling a large number of unordered short sequence fragments to the original genome sequence as accurately as possible. Several popular splicing algorithms based on reads are reviewed in this article, including reference genome algorithms and de novo splicing algorithms (Greedy-extension, Overlap-Layout-Consensus graph, De Bruijn graph). We also discuss a new splicing method based on the MapReduce strategy and Hadoop. By comparing these algorithms, some conclusions are drawn and some suggestions on gene splicing research are made.
Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi
2016-06-15
Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. yasu@bio.keio.ac.jp Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Concurrent and Accurate Short Read Mapping on Multicore Processors.
Martínez, Héctor; Tárraga, Joaquín; Medina, Ignacio; Barrachina, Sergio; Castillo, Maribel; Dopazo, Joaquín; Quintana-Ortí, Enrique S
2015-01-01
We introduce a parallel aligner with a work-flow organization for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, HPG Aligner SA (HPG Aligner SA is an open-source application. The software is available at http://www.opencb.org, exploits a suffix array to rapidly map a large fraction of the RNA fragments (reads), as well as leverages the accuracy of the Smith-Waterman algorithm to deal with conflictive reads. The aligner is enhanced with a careful strategy to detect splice junctions based on an adaptive division of RNA reads into small segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing crucial information for the successful alignment of the complete reads. The experimental results on a platform with Intel multicore technology report the parallel performance of HPG Aligner SA, on RNA reads of 100-400 nucleotides, which excels in execution time/sensitivity to state-of-the-art aligners such as TopHat 2+Bowtie 2, MapSplice, and STAR.
The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote
Liao, Yang; Smyth, Gordon K.; Shi, Wei
2013-01-01
Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multi-seed strategy, called seed-and-vote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from the seeds. It uses a relatively large number of short seeds (called subreads) extracted from each read and allows all the seeds to vote on the optimal location. When the read length is <160 bp, overlapping subreads are used. More conventional alignment algorithms are then used to fill in detailed mismatch and indel information between the subreads that make up the winning voting block. The strategy is fast because the overall genomic location has already been chosen before the detailed alignment is done. It is sensitive because no individual subread is required to map exactly, nor are individual subreads constrained to map close by other subreads. It is accurate because the final location must be supported by several different subreads. The strategy extends easily to find exon junctions, by locating reads that contain sets of subreads mapping to different exons of the same gene. It scales up efficiently for longer reads. PMID:23558742
Zhang, Zijun; Xing, Yi
2017-09-19
Crosslinking or RNA immunoprecipitation followed by sequencing (CLIP-seq or RIP-seq) allows transcriptome-wide discovery of RNA regulatory sites. As CLIP-seq/RIP-seq reads are short, existing computational tools focus on uniquely mapped reads, while reads mapped to multiple loci are discarded. We present CLAM (CLIP-seq Analysis of Multi-mapped reads). CLAM uses an expectation-maximization algorithm to assign multi-mapped reads and calls peaks combining uniquely and multi-mapped reads. To demonstrate the utility of CLAM, we applied it to a wide range of public CLIP-seq/RIP-seq datasets involving numerous splicing factors, microRNAs and m6A RNA methylation. CLAM recovered a large number of novel RNA regulatory sites inaccessible by uniquely mapped reads. The functional significance of these sites was demonstrated by consensus motif patterns and association with alternative splicing (splicing factors), transcript abundance (AGO2) and mRNA half-life (m6A). CLAM provides a useful tool to discover novel protein-RNA interactions and RNA modification sites from CLIP-seq and RIP-seq data, and reveals the significant contribution of repetitive elements to the RNA regulatory landscape of the human transcriptome. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping.
Lee, Wan-Ping; Stromberg, Michael P; Ward, Alistair; Stewart, Chip; Garrison, Erik P; Marth, Gabor T
2014-01-01
MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK assigned and actual mapping qualities greater than 0.98. In order to ensure that studies of any genome are supported, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping
Lee, Wan-Ping; Stromberg, Michael P.; Ward, Alistair; Stewart, Chip; Garrison, Erik P.; Marth, Gabor T.
2014-01-01
MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK assigned and actual mapping qualities greater than 0.98. In order to ensure that studies of any genome are supported, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me). PMID:24599324
biobambam: tools for read pair collation based algorithms on BAM files
2014-01-01
Background Sequence alignment data is often ordered by coordinate (id of the reference sequence plus position on the sequence where the fragment was mapped) when stored in BAM files, as this simplifies the extraction of variants between the mapped data and the reference or of variants within the mapped data. In this order paired reads are usually separated in the file, which complicates some other applications like duplicate marking or conversion to the FastQ format which require to access the full information of the pairs. Results In this paper we introduce biobambam, a set of tools based on the efficient collation of alignments in BAM files by read name. The employed collation algorithm avoids time and space consuming sorting of alignments by read name where this is possible without using more than a specified amount of main memory. Using this algorithm tasks like duplicate marking in BAM files and conversion of BAM files to the FastQ format can be performed very efficiently with limited resources. We also make the collation algorithm available in the form of an API for other projects. This API is part of the libmaus package. Conclusions In comparison with previous approaches to problems involving the collation of alignments by read name like the BAM to FastQ or duplication marking utilities our approach can often perform an equivalent task more efficiently in terms of the required main memory and run-time. Our BAM to FastQ conversion is faster than all widely known alternatives including Picard and bamUtil. Our duplicate marking is about as fast as the closest competitor bamUtil for small data sets and faster than all known alternatives on large and complex data sets.
Long Read Alignment with Parallel MapReduce Cloud Platform
Al-Absi, Ahmed Abdulhakim; Kang, Dae-Ki
2015-01-01
Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing genes sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner's Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts a widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 considering BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman results in reducing the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms. PMID:26839887
Long Read Alignment with Parallel MapReduce Cloud Platform.
Al-Absi, Ahmed Abdulhakim; Kang, Dae-Ki
2015-01-01
Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing genes sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner's Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts a widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 considering BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman results in reducing the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms.
Xiao, Chuan-Le; Mai, Zhi-Biao; Lian, Xin-Lei; Zhong, Jia-Yong; Jin, Jing-Jie; He, Qing-Yu; Zhang, Gong
2014-01-01
Correct and bias-free interpretation of the deep sequencing data is inevitably dependent on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT) based algorithms are fast but less robust. To have both advantages, we developed an algorithm FANSe2 with iterative mapping strategy based on the statistics of real-world sequencing error distribution to substantially accelerate the mapping without compromising the accuracy. Its sensitivity and accuracy are higher than the BWT-based algorithms in the tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 is experimentally validated, while the previous algorithms have false positives and false negatives. FANSe2 showed remarkably better consistency to the microarray than most other algorithms in terms of gene expression quantifications. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illunima HiSeq 2000 flowcell (8 lanes, 608 M reads) to masked human genome within 4.1 hours with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/.
Zheng, Qi; Grice, Elizabeth A
2016-10-01
Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost's algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost.
ReadXplorer—visualization and analysis of mapped sequences
Hilker, Rolf; Stadermann, Kai Bernd; Doppmeier, Daniel; Kalinowski, Jörn; Stoye, Jens; Straube, Jasmin; Winnebald, Jörn; Goesmann, Alexander
2014-01-01
Motivation: Fast algorithms and well-arranged visualizations are required for the comprehensive analysis of the ever-growing size of genomic and transcriptomic next-generation sequencing data. Results: ReadXplorer is a software offering straightforward visualization and extensive analysis functions for genomic and transcriptomic DNA sequences mapped on a reference. A unique specialty of ReadXplorer is the quality classification of the read mappings. It is incorporated in all analysis functions and displayed in ReadXplorer's various synchronized data viewers for (i) the reference sequence, its base coverage as (ii) normalizable plot and (iii) histogram, (iv) read alignments and (v) read pairs. ReadXplorer's analysis capability covers RNA secondary structure prediction, single nucleotide polymorphism and deletion–insertion polymorphism detection, genomic feature and general coverage analysis. Especially for RNA-Seq data, it offers differential gene expression analysis, transcription start site and operon detection as well as RPKM value and read count calculations. Furthermore, ReadXplorer can combine or superimpose coverage of different datasets. Availability and implementation: ReadXplorer is available as open-source software at http://www.readxplorer.org along with a detailed manual. Contact: rhilker@mikrobio.med.uni-giessen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24790157
TopHat: discovering splice junctions with RNA-Seq
Trapnell, Cole; Pachter, Lior; Salzberg, Steven L.
2009-01-01
Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites. Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development. Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu Contact: cole@cs.umd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19289445
Zheng, Qi; Grice, Elizabeth A.
2016-01-01
Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost’s algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost. PMID:27706155
Indexing a sequence for mapping reads with a single mismatch.
Crochemore, Maxime; Langiu, Alessio; Rahman, M Sohel
2014-05-28
Mapping reads against a genome sequence is an interesting and useful problem in computational molecular biology and bioinformatics. In this paper, we focus on the problem of indexing a sequence for mapping reads with a single mismatch. We first focus on a simpler problem where the length of the pattern is given beforehand during the data structure construction. This version of the problem is interesting in its own right in the context of the next generation sequencing. In the sequel, we show how to solve the more general problem. In both cases, our algorithm can construct an efficient data structure in O(n log(1+ε) n) time and space and can answer subsequent queries in O(m log log n + K) time. Here, n is the length of the sequence, m is the length of the read, 0<ε<1 and is the optimal output size.
Mapping From an Instrumented Glove to a Robot Hand
NASA Technical Reports Server (NTRS)
Goza, Michael
2005-01-01
An algorithm has been developed to solve the problem of mapping from (1) a glove instrumented with joint-angle sensors to (2) an anthropomorphic robot hand. Such a mapping is needed to generate control signals to make the robot hand mimic the configuration of the hand of a human attempting to control the robot. The mapping problem is complicated by uncertainties in sensor locations caused by variations in sizes and shapes of hands and variations in the fit of the glove. The present mapping algorithm is robust in the face of these uncertainties, largely because it includes a calibration sub-algorithm that inherently adapts the mapping to the specific hand and glove, without need for measuring the hand and without regard for goodness of fit. The algorithm utilizes a forward-kinematics model of the glove derived from documentation provided by the manufacturer of the glove. In this case, forward-kinematics model signifies a mathematical model of the glove fingertip positions as functions of the sensor readings. More specifically, given the sensor readings, the forward-kinematics model calculates the glove fingertip positions in a Cartesian reference frame nominally attached to the palm. The algorithm also utilizes an inverse-kinematics model of the robot hand. In this case, inverse-kinematics model signifies a mathematical model of the robot finger-joint angles as functions of the robot fingertip positions. Again, more specifically, the inverse-kinematics model calculates the finger-joint commands needed to place the fingertips at specified positions in a Cartesian reference frame that is attached to the palm of the robot hand and that nominally corresponds to the Cartesian reference frame attached to the palm of the glove. Initially, because of the aforementioned uncertainties, the glove fingertip positions calculated by the forwardkinematics model in the glove Cartesian reference frame cannot be expected to match the robot fingertip positions in the robot-hand Cartesian reference frame. A calibration must be performed to make the glove and robot-hand fingertip positions correspond more precisely. The calibration procedure involves a few simple hand poses designed to provide well-defined fingertip positions. One of the poses is a fist. In each of the other poses, a finger touches the thumb. The calibration subalgorithm uses the sensor readings from these poses to modify the kinematical models to make the two sets of fingertip positions agree more closely.
BatMis: a fast algorithm for k-mismatch mapping.
Tennakoon, Chandana; Purbojati, Rikky W; Sung, Wing-Kin
2012-08-15
Second-generation sequencing (SGS) generates millions of reads that need to be aligned to a reference genome allowing errors. Although current aligners can efficiently map reads allowing a small number of mismatches, they are not well suited for handling a large number of mismatches. The efficiency of aligners can be improved using various heuristics, but the sensitivity and accuracy of the alignments are sacrificed. In this article, we introduce Basic Alignment tool for Mismatches (BatMis)--an efficient method to align short reads to a reference allowing k mismatches. BatMis is a Burrows-Wheeler transformation based aligner that uses a seed and extend approach, and it is an exact method. Benchmark tests show that BatMis performs better than competing aligners in solving the k-mismatch problem. Furthermore, it can compete favorably even when compared with the heuristic modes of the other aligners. BatMis is a useful alternative for applications where fast k-mismatch mappings, unique mappings or multiple mappings of SGS data are required. BatMis is written in C/C++ and is freely available from http://code.google.com/p/batmis/
A hybrid short read mapping accelerator
2013-01-01
Background The rapid growth of short read datasets poses a new challenge to the short read mapping problem in terms of sensitivity and execution speed. Existing methods often use a restrictive error model for computing the alignments to improve speed, whereas more flexible error models are generally too slow for large-scale applications. A number of short read mapping software tools have been proposed. However, designs based on hardware are relatively rare. Field programmable gate arrays (FPGAs) have been successfully used in a number of specific application areas, such as the DSP and communications domains due to their outstanding parallel data processing capabilities, making them a competitive platform to solve problems that are “inherently parallel”. Results We present a hybrid system for short read mapping utilizing both FPGA-based hardware and CPU-based software. The computation intensive alignment and the seed generation operations are mapped onto an FPGA. We present a computationally efficient, parallel block-wise alignment structure (Align Core) to approximate the conventional dynamic programming algorithm. The performance is compared to the multi-threaded CPU-based GASSST and BWA software implementations. For single-end alignment, our hybrid system achieves faster processing speed than GASSST (with a similar sensitivity) and BWA (with a higher sensitivity); for pair-end alignment, our design achieves a slightly worse sensitivity than that of BWA but has a higher processing speed. Conclusions This paper shows that our hybrid system can effectively accelerate the mapping of short reads to a reference genome based on the seed-and-extend approach. The performance comparison to the GASSST and BWA software implementations under different conditions shows that our hybrid design achieves a high degree of sensitivity and requires less overall execution time with only modest FPGA resource utilization. Our hybrid system design also shows that the performance bottleneck for the short read mapping problem can be changed from the alignment stage to the seed generation stage, which provides an additional requirement for the future development of short read aligners. PMID:23441908
USDA-ARS?s Scientific Manuscript database
Ongoing developments and cost decreases in next-generation sequencing (NGS) technologies have led to an increase in their application, which has greatly enhanced the fields of genetics and genomics. Mapping sequence reads onto a reference genome is a fundamental step in the analysis of NGS data. Eff...
Flood inundation extent mapping based on block compressed tracing
NASA Astrophysics Data System (ADS)
Shen, Dingtao; Rui, Yikang; Wang, Jiechen; Zhang, Yu; Cheng, Liang
2015-07-01
Flood inundation extent, depth, and duration are important factors affecting flood hazard evaluation. At present, flood inundation analysis is based mainly on a seeded region-growing algorithm, which is an inefficient process because it requires excessive recursive computations and it is incapable of processing massive datasets. To address this problem, we propose a block compressed tracing algorithm for mapping the flood inundation extent, which reads the DEM data in blocks before transferring them to raster compression storage. This allows a smaller computer memory to process a larger amount of data, which solves the problem of the regular seeded region-growing algorithm. In addition, the use of a raster boundary tracing technique allows the algorithm to avoid the time-consuming computations required by the seeded region-growing. Finally, we conduct a comparative evaluation in the Chin-sha River basin, results show that the proposed method solves the problem of flood inundation extent mapping based on massive DEM datasets with higher computational efficiency than the original method, which makes it suitable for practical applications.
Yuan, Shuai; Johnston, H. Richard; Zhang, Guosheng; Li, Yun; Hu, Yi-Juan; Qin, Zhaohui S.
2015-01-01
With rapid decline of the sequencing cost, researchers today rush to embrace whole genome sequencing (WGS), or whole exome sequencing (WES) approach as the next powerful tool for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping short sequencing reads back to the reference genome. This is an important issue because incorrectly mapped reads affect the downstream variant discovery, genotype calling and association analysis. Although many read mapping algorithms have been developed, the majority of them uses the universal reference genome and do not take sequence variants into consideration. Given that genetic variants are ubiquitous, it is highly desirable if they can be factored into the read mapping procedure. In this work, we developed a novel strategy that utilizes genotypes obtained a priori to customize the universal haploid reference genome into a personalized diploid reference genome. The new strategy is implemented in a program named RefEditor. When applying RefEditor to real data, we achieved encouraging improvements in read mapping, variant discovery and genotype calling. Compared to standard approaches, RefEditor can significantly increase genotype calling consistency (from 43% to 61% at 4X coverage; from 82% to 92% at 20X coverage) and reduce Mendelian inconsistency across various sequencing depths. Because many WGS and WES studies are conducted on cohorts that have been genotyped using array-based genotyping platforms previously or concurrently, we believe the proposed strategy will be of high value in practice, which can also be applied to the scenario where multiple NGS experiments are conducted on the same cohort. The RefEditor sources are available at https://github.com/superyuan/refeditor. PMID:26267278
Efficient algorithms for polyploid haplotype phasing.
He, Dan; Saha, Subrata; Finkers, Richard; Parida, Laxmi
2018-05-09
Inference of haplotypes, or the sequence of alleles along the same chromosomes, is a fundamental problem in genetics and is a key component for many analyses including admixture mapping, identifying regions of identity by descent and imputation. Haplotype phasing based on sequencing reads has attracted lots of attentions. Diploid haplotype phasing where the two haplotypes are complimentary have been studied extensively. In this work, we focused on Polyploid haplotype phasing where we aim to phase more than two haplotypes at the same time from sequencing data. The problem is much more complicated as the search space becomes much larger and the haplotypes do not need to be complimentary any more. We proposed two algorithms, (1) Poly-Harsh, a Gibbs Sampling based algorithm which alternatively samples haplotypes and the read assignments to minimize the mismatches between the reads and the phased haplotypes, (2) An efficient algorithm to concatenate haplotype blocks into contiguous haplotypes. Our experiments showed that our method is able to improve the quality of the phased haplotypes over the state-of-the-art methods. To our knowledge, our algorithm for haplotype blocks concatenation is the first algorithm that leverages the shared information across multiple individuals to construct contiguous haplotypes. Our experiments showed that it is both efficient and effective.
Algorithms for Large-Scale Astronomical Problems
2013-08-01
implemented as a succession of Hadoop MapReduce jobs and sequential programs written in Java . The sampling and splitting stages are implemented as...one MapReduce job, the partitioning and clustering phases make up another job. The merging stage is implemented as a stand-alone Java program. The...Merging. The merging stage is implemented as a sequential Java program that reads the files with the shell information, which were generated by
ORMAN: optimal resolution of ambiguous RNA-Seq multimappings in the presence of novel isoforms.
Dao, Phuong; Numanagić, Ibrahim; Lin, Yen-Yi; Hach, Faraz; Karakoc, Emre; Donmez, Nilgun; Collins, Colin; Eichler, Evan E; Sahinalp, S Cenk
2014-03-01
RNA-Seq technology is promising to uncover many novel alternative splicing events, gene fusions and other variations in RNA transcripts. For an accurate detection and quantification of transcripts, it is important to resolve the mapping ambiguity for those RNA-Seq reads that can be mapped to multiple loci: >17% of the reads from mouse RNA-Seq data and 50% of the reads from some plant RNA-Seq data have multiple mapping loci. In this study, we show how to resolve the mapping ambiguity in the presence of novel transcriptomic events such as exon skipping and novel indels towards accurate downstream analysis. We introduce ORMAN ( O ptimal R esolution of M ultimapping A mbiguity of R N A-Seq Reads), which aims to compute the minimum number of potential transcript products for each gene and to assign each multimapping read to one of these transcripts based on the estimated distribution of the region covering the read. ORMAN achieves this objective through a combinatorial optimization formulation, which is solved through well-known approximation algorithms, integer linear programs and heuristics. On a simulated RNA-Seq dataset including a random subset of transcripts from the UCSC database, the performance of several state-of-the-art methods for identifying and quantifying novel transcripts, such as Cufflinks, IsoLasso and CLIIQ, is significantly improved through the use of ORMAN. Furthermore, in an experiment using real RNA-Seq reads, we show that ORMAN is able to resolve multimapping to produce coverage values that are similar to the original distribution, even in genes with highly non-uniform coverage. ORMAN is available at http://orman.sf.net
Spiral waves characterization: Implications for an automated cardiodynamic tissue characterization.
Alagoz, Celal; Cohen, Andrew R; Frisch, Daniel R; Tunç, Birkan; Phatharodom, Saran; Guez, Allon
2018-07-01
Spiral waves are phenomena observed in cardiac tissue especially during fibrillatory activities. Spiral waves are revealed through in-vivo and in-vitro studies using high density mapping that requires special experimental setup. Also, in-silico spiral wave analysis and classification is performed using membrane potentials from entire tissue. In this study, we report a characterization approach that identifies spiral wave behaviors using intracardiac electrogram (EGM) readings obtained with commonly used multipolar diagnostic catheters that perform localized but high-resolution readings. Specifically, the algorithm is designed to distinguish between stationary, meandering, and break-up rotors. The clustering and classification algorithms are tested on simulated data produced using a phenomenological 2D model of cardiac propagation. For EGM measurements, unipolar-bipolar EGM readings from various locations on tissue using two catheter types are modeled. The distance measure between spiral behaviors are assessed using normalized compression distance (NCD), an information theoretical distance. NCD is a universal metric in the sense it is solely based on compressibility of dataset and not requiring feature extraction. We also introduce normalized FFT distance (NFFTD) where compressibility is replaced with a FFT parameter. Overall, outstanding clustering performance was achieved across varying EGM reading configurations. We found that effectiveness in distinguishing was superior in case of NCD than NFFTD. We demonstrated that distinct spiral activity identification on a behaviorally heterogeneous tissue is also possible. This report demonstrates a theoretical validation of clustering and classification approaches that provide an automated mapping from EGM signals to assessment of spiral wave behaviors and hence offers a potential mapping and analysis framework for cardiac tissue wavefront propagation patterns. Copyright © 2018 Elsevier B.V. All rights reserved.
Zhang, Yanju; Lameijer, Eric-Wubbo; 't Hoen, Peter A C; Ning, Zemin; Slagboom, P Eline; Ye, Kai
2012-02-15
RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establishing the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon-exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. Of the two public RNA-Seq datasets, PASSion predicted ≈ 137,000 and 173,000 splicing events, of which on average 82 are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion.
ILP-based maximum likelihood genome scaffolding
2014-01-01
Background Interest in de novo genome assembly has been renewed in the past decade due to rapid advances in high-throughput sequencing (HTS) technologies which generate relatively short reads resulting in highly fragmented assemblies consisting of contigs. Additional long-range linkage information is typically used to orient, order, and link contigs into larger structures referred to as scaffolds. Due to library preparation artifacts and erroneous mapping of reads originating from repeats, scaffolding remains a challenging problem. In this paper, we provide a scalable scaffolding algorithm (SILP2) employing a maximum likelihood model capturing read mapping uncertainty and/or non-uniformity of contig coverage which is solved using integer linear programming. A Non-Serial Dynamic Programming (NSDP) paradigm is applied to render our algorithm useful in the processing of larger mammalian genomes. To compare scaffolding tools, we employ novel quantitative metrics in addition to the extant metrics in the field. We have also expanded the set of experiments to include scaffolding of low-complexity metagenomic samples. Results SILP2 achieves better scalability throughg a more efficient NSDP algorithm than previous release of SILP. The results show that SILP2 compares favorably to previous methods OPERA and MIP in both scalability and accuracy for scaffolding single genomes of up to human size, and significantly outperforms them on scaffolding low-complexity metagenomic samples. Conclusions Equipped with NSDP, SILP2 is able to scaffold large mammalian genomes, resulting in the longest and most accurate scaffolds. The ILP formulation for the maximum likelihood model is shown to be flexible enough to handle metagenomic samples. PMID:25253180
Munger, Steven C.; Raghupathy, Narayanan; Choi, Kwangbom; Simons, Allen K.; Gatti, Daniel M.; Hinerfeld, Douglas A.; Svenson, Karen L.; Keller, Mark P.; Attie, Alan D.; Hibbs, Matthew A.; Graber, Joel H.; Chesler, Elissa J.; Churchill, Gary A.
2014-01-01
Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations. PMID:25236449
NURD: an implementation of a new method to estimate isoform expression from non-uniform RNA-seq data
2013-01-01
Background RNA-Seq technology has been used widely in transcriptome study, and one of the most important applications is to estimate the expression level of genes and their alternative splicing isoforms. There have been several algorithms published to estimate the expression based on different models. Recently Wu et al. published a method that can accurately estimate isoform level expression by considering position-related sequencing biases using nonparametric models. The method has advantages in handling different read distributions, but there hasn’t been an efficient program to implement this algorithm. Results We developed an efficient implementation of the algorithm in the program NURD. It uses a binary interval search algorithm. The program can correct both the global tendency of sequencing bias in the data and local sequencing bias specific to each gene. The correction makes the isoform expression estimation more reliable under various read distributions. And the implementation is computationally efficient in both the memory cost and running time and can be readily scaled up for huge datasets. Conclusion NURD is an efficient and reliable tool for estimating the isoform expression level. Given the reads mapping result and gene annotation file, NURD will output the expression estimation result. The package is freely available for academic use at http://bioinfo.au.tsinghua.edu.cn/software/NURD/. PMID:23837734
NASA Astrophysics Data System (ADS)
Jin, Minglei; Jin, Weiqi; Li, Yiyang; Li, Shuo
2015-08-01
In this paper, we propose a novel scene-based non-uniformity correction algorithm for infrared image processing-temporal high-pass non-uniformity correction algorithm based on grayscale mapping (THP and GM). The main sources of non-uniformity are: (1) detector fabrication inaccuracies; (2) non-linearity and variations in the read-out electronics and (3) optical path effects. The non-uniformity will be reduced by non-uniformity correction (NUC) algorithms. The NUC algorithms are often divided into calibration-based non-uniformity correction (CBNUC) algorithms and scene-based non-uniformity correction (SBNUC) algorithms. As non-uniformity drifts temporally, CBNUC algorithms must be repeated by inserting a uniform radiation source which SBNUC algorithms do not need into the view, so the SBNUC algorithm becomes an essential part of infrared imaging system. The SBNUC algorithms' poor robustness often leads two defects: artifacts and over-correction, meanwhile due to complicated calculation process and large storage consumption, hardware implementation of the SBNUC algorithms is difficult, especially in Field Programmable Gate Array (FPGA) platform. The THP and GM algorithm proposed in this paper can eliminate the non-uniformity without causing defects. The hardware implementation of the algorithm only based on FPGA has two advantages: (1) low resources consumption, and (2) small hardware delay: less than 20 lines, it can be transplanted to a variety of infrared detectors equipped with FPGA image processing module, it can reduce the stripe non-uniformity and the ripple non-uniformity.
Hahn, Lars; Leimeister, Chris-André; Ounit, Rachid; Lonardi, Stefano; Morgenstern, Burkhard
2016-10-01
Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/.
GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping.
Alser, Mohammed; Hassan, Hasan; Xin, Hongyi; Ergin, Oguz; Mutlu, Onur; Alkan, Can
2017-11-01
High throughput DNA sequencing (HTS) technologies generate an excessive number of small DNA segments -called short reads- that cause significant computational burden. To analyze the entire genome, each of the billions of short reads must be mapped to a reference genome based on the similarity between a read and 'candidate' locations in that reference genome. The similarity measurement, called alignment, formulated as an approximate string matching problem, is the computational bottleneck because: (i) it is implemented using quadratic-time dynamic programming algorithms and (ii) the majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Calculating the alignment of such incorrect candidate locations consumes an overwhelming majority of a modern read mapper's execution time. Therefore, it is crucial to develop a fast and effective filter that can detect incorrect candidate locations and eliminate them before invoking computationally costly alignment algorithms. We propose GateKeeper, a new hardware accelerator that functions as a pre-alignment step that quickly filters out most incorrect candidate locations. GateKeeper is the first design to accelerate pre-alignment using Field-Programmable Gate Arrays (FPGAs), which can perform pre-alignment much faster than software. When implemented on a single FPGA chip, GateKeeper maintains high accuracy (on average >96%) while providing, on average, 90-fold and 130-fold speedup over the state-of-the-art software pre-alignment techniques, Adjacency Filter and Shifted Hamming Distance (SHD), respectively. The addition of GateKeeper as a pre-alignment step can reduce the verification time of the mrFAST mapper by a factor of 10. https://github.com/BilkentCompGen/GateKeeper. mohammedalser@bilkent.edu.tr or onur.mutlu@inf.ethz.ch or calkan@cs.bilkent.edu.tr. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Normal and compound poisson approximations for pattern occurrences in NGS reads.
Zhai, Zhiyuan; Reinert, Gesine; Song, Kai; Waterman, Michael S; Luan, Yihui; Sun, Fengzhu
2012-06-01
Next generation sequencing (NGS) technologies are now widely used in many biological studies. In NGS, sequence reads are randomly sampled from the genome sequence of interest. Most computational approaches for NGS data first map the reads to the genome and then analyze the data based on the mapped reads. Since many organisms have unknown genome sequences and many reads cannot be uniquely mapped to the genomes even if the genome sequences are known, alternative analytical methods are needed for the study of NGS data. Here we suggest using word patterns to analyze NGS data. Word pattern counting (the study of the probabilistic distribution of the number of occurrences of word patterns in one or multiple long sequences) has played an important role in molecular sequence analysis. However, no studies are available on the distribution of the number of occurrences of word patterns in NGS reads. In this article, we build probabilistic models for the background sequence and the sampling process of the sequence reads from the genome. Based on the models, we provide normal and compound Poisson approximations for the number of occurrences of word patterns from the sequence reads, with bounds on the approximation error. The main challenge is to consider the randomness in generating the long background sequence, as well as in the sampling of the reads using NGS. We show the accuracy of these approximations under a variety of conditions for different patterns with various characteristics. Under realistic assumptions, the compound Poisson approximation seems to outperform the normal approximation in most situations. These approximate distributions can be used to evaluate the statistical significance of the occurrence of patterns from NGS data. The theory and the computational algorithm for calculating the approximate distributions are then used to analyze ChIP-Seq data using transcription factor GABP. Software is available online (www-rcf.usc.edu/∼fsun/Programs/NGS_motif_power/NGS_motif_power.html). In addition, Supplementary Material can be found online (www.liebertonline.com/cmb).
Heuristic approach to image registration
NASA Astrophysics Data System (ADS)
Gertner, Izidor; Maslov, Igor V.
2000-08-01
Image registration, i.e. correct mapping of images obtained from different sensor readings onto common reference frame, is a critical part of multi-sensor ATR/AOR systems based on readings from different types of sensors. In order to fuse two different sensor readings of the same object, the readings have to be put into a common coordinate system. This task can be formulated as optimization problem in a space of all possible affine transformations of an image. In this paper, a combination of heuristic methods is explored to register gray- scale images. The modification of Genetic Algorithm is used as the first step in global search for optimal transformation. It covers the entire search space with (randomly or heuristically) scattered probe points and helps significantly reduce the search space to a subspace of potentially most successful transformations. Due to its discrete character, however, Genetic Algorithm in general can not converge while coming close to the optimum. Its termination point can be specified either as some predefined number of generations or as achievement of a certain acceptable convergence level. To refine the search, potential optimal subspaces are searched using more delicate and efficient for local search Taboo and Simulated Annealing methods.
Flexible taxonomic assignment of ambiguous sequencing reads
2011-01-01
Background To characterize the diversity of bacterial populations in metagenomic studies, sequencing reads need to be accurately assigned to taxonomic units in a given reference taxonomy. Reads that cannot be reliably assigned to a unique leaf in the taxonomy (ambiguous reads) are typically assigned to the lowest common ancestor of the set of species that match it. This introduces a potentially severe error in the estimation of bacteria present in the sample due to false positives, since all species in the subtree rooted at the ancestor are implicitly assigned to the read even though many of them may not match it. Results We present a method that maps each read to a node in the taxonomy that minimizes a penalty score while balancing the relevance of precision and recall in the assignment through a parameter q. This mapping can be obtained in time linear in the number of matching sequences, because LCA queries to the reference taxonomy take constant time. When applied to six different metagenomic datasets, our algorithm produces different taxonomic distributions depending on whether coverage or precision is maximized. Including information on the quality of the reads reduces the number of unassigned reads but increases the number of ambiguous reads, stressing the relevance of our method. Finally, two measures of performance are described and results with a set of artificially generated datasets are discussed. Conclusions The assignment strategy of sequencing reads introduced in this paper is a versatile and a quick method to study bacterial communities. The bacterial composition of the analyzed samples can vary significantly depending on how ambiguous reads are assigned depending on the value of the q parameter. Validation of our results in an artificial dataset confirm that a combination of values of q produces the most accurate results. PMID:21211059
DistMap: a toolkit for distributed short read mapping on a Hadoop cluster.
Pandey, Ram Vinay; Schlötterer, Christian
2013-01-01
With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/
DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster
Pandey, Ram Vinay; Schlötterer, Christian
2013-01-01
With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/ PMID:24009693
Uniform, optimal signal processing of mapped deep-sequencing data.
Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam
2013-07-01
Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.
Bennetts, Victor Hernandez; Schaffernicht, Erik; Pomareda, Victor; Lilienthal, Achim J; Marco, Santiago; Trincavelli, Marco
2014-09-17
In this paper, we address the task of gas distribution modeling in scenarios where multiple heterogeneous compounds are present. Gas distribution modeling is particularly useful in emission monitoring applications where spatial representations of the gaseous patches can be used to identify emission hot spots. In realistic environments, the presence of multiple chemicals is expected and therefore, gas discrimination has to be incorporated in the modeling process. The approach presented in this work addresses the task of gas distribution modeling by combining different non selective gas sensors. Gas discrimination is addressed with an open sampling system, composed by an array of metal oxide sensors and a probabilistic algorithm tailored to uncontrolled environments. For each of the identified compounds, the mapping algorithm generates a calibrated gas distribution model using the classification uncertainty and the concentration readings acquired with a photo ionization detector. The meta parameters of the proposed modeling algorithm are automatically learned from the data. The approach was validated with a gas sensitive robot patrolling outdoor and indoor scenarios, where two different chemicals were released simultaneously. The experimental results show that the generated multi compound maps can be used to accurately predict the location of emitting gas sources.
Zhang, Yanju; Lameijer, Eric-Wubbo; 't Hoen, Peter A. C.; Ning, Zemin; Slagboom, P. Eline; Ye, Kai
2012-01-01
Motivation: RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establishing the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon–exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. Results: We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. Of the two public RNA-Seq datasets, PASSion predicted ∼ 137 000 and 173 000 splicing events, of which on average 82 are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. Availability: The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion Contact: y.zhang@lumc.nl; k.ye@lumc.nl Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22219203
Light-weight reference-based compression of FASTQ data.
Zhang, Yongpeng; Li, Linsen; Yang, Yanli; Yang, Xiao; He, Shan; Zhu, Zexuan
2015-06-09
The exponential growth of next generation sequencing (NGS) data has posed big challenges to data storage, management and archive. Data compression is one of the effective solutions, where reference-based compression strategies can typically achieve superior compression ratios compared to the ones not relying on any reference. This paper presents a lossless light-weight reference-based compression algorithm namely LW-FQZip to compress FASTQ data. The three components of any given input, i.e., metadata, short reads and quality score strings, are first parsed into three data streams in which the redundancy information are identified and eliminated independently. Particularly, well-designed incremental and run-length-limited encoding schemes are utilized to compress the metadata and quality score streams, respectively. To handle the short reads, LW-FQZip uses a novel light-weight mapping model to fast map them against external reference sequence(s) and produce concise alignment results for storage. The three processed data streams are then packed together with some general purpose compression algorithms like LZMA. LW-FQZip was evaluated on eight real-world NGS data sets and achieved compression ratios in the range of 0.111-0.201. This is comparable or superior to other state-of-the-art lossless NGS data compression algorithms. LW-FQZip is a program that enables efficient lossless FASTQ data compression. It contributes to the state of art applications for NGS data storage and transmission. LW-FQZip is freely available online at: http://csse.szu.edu.cn/staff/zhuzx/LWFQZip.
Learning to read aloud: A neural network approach using sparse distributed memory
NASA Technical Reports Server (NTRS)
Joglekar, Umesh Dwarkanath
1989-01-01
An attempt to solve a problem of text-to-phoneme mapping is described which does not appear amenable to solution by use of standard algorithmic procedures. Experiments based on a model of distributed processing are also described. This model (sparse distributed memory (SDM)) can be used in an iterative supervised learning mode to solve the problem. Additional improvements aimed at obtaining better performance are suggested.
Minimap2: pairwise alignment for nucleotide sequences.
Li, Heng
2018-05-10
Recent advances in sequencing technologies promise ultra-long reads of ∼100 kilo bases (kb) in average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 mega bases (Mb) in length. Existing alignment programs are unable or inefficient to process such data at scale, which presses for the development of new alignment algorithms. Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥ 100bp in length, ≥1kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions (INDELs) and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. https://github.com/lh3/minimap2. hengli@broadinstitute.org.
Dilthey, Alexander T; Gourraud, Pierre-Antoine; Mentzer, Alexander J; Cereb, Nezih; Iqbal, Zamin; McVean, Gil
2016-10-01
Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a significant challenge to practical application.
High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs
Dilthey, Alexander T.; Gourraud, Pierre-Antoine; McVean, Gil
2016-01-01
Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30–250 CPU hours per sample) remain a significant challenge to practical application. PMID:27792722
YAHA: fast and flexible long-read alignment with optimal breakpoint detection.
Faust, Gregory G; Hall, Ira M
2012-10-01
With improved short-read assembly algorithms and the recent development of long-read sequencers, split mapping will soon be the preferred method for structural variant (SV) detection. Yet, current alignment tools are not well suited for this. We present YAHA, a fast and flexible hash-based aligner. YAHA is as fast and accurate as BWA-SW at finding the single best alignment per query and is dramatically faster and more sensitive than both SSAHA2 and MegaBLAST at finding all possible alignments. Unlike other aligners that report all, or one, alignment per query, or that use simple heuristics to select alignments, YAHA uses a directed acyclic graph to find the optimal set of alignments that cover a query using a biologically relevant breakpoint penalty. YAHA can also report multiple mappings per defined segment of the query. We show that YAHA detects more breakpoints in less time than BWA-SW across all SV classes, and especially excels at complex SVs comprising multiple breakpoints. YAHA is currently supported on 64-bit Linux systems. Binaries and sample data are freely available for download from http://faculty.virginia.edu/irahall/YAHA. imh4y@virginia.edu.
A parallel and sensitive software tool for methylation analysis on multicore platforms.
Tárraga, Joaquín; Pérez, Mariano; Orduña, Juan M; Duato, José; Medina, Ignacio; Dopazo, Joaquín
2015-10-01
DNA methylation analysis suffers from very long processing time, as the advent of Next-Generation Sequencers has shifted the bottleneck of genomic studies from the sequencers that obtain the DNA samples to the software that performs the analysis of these samples. The existing software for methylation analysis does not seem to scale efficiently neither with the size of the dataset nor with the length of the reads to be analyzed. As it is expected that the sequencers will provide longer and longer reads in the near future, efficient and scalable methylation software should be developed. We present a new software tool, called HPG-Methyl, which efficiently maps bisulphite sequencing reads on DNA, analyzing DNA methylation. The strategy used by this software consists of leveraging the speed of the Burrows-Wheeler Transform to map a large number of DNA fragments (reads) rapidly, as well as the accuracy of the Smith-Waterman algorithm, which is exclusively employed to deal with the most ambiguous and shortest reads. Experimental results on platforms with Intel multicore processors show that HPG-Methyl significantly outperforms in both execution time and sensitivity state-of-the-art software such as Bismark, BS-Seeker or BSMAP, particularly for long bisulphite reads. Software in the form of C libraries and functions, together with instructions to compile and execute this software. Available by sftp to anonymous@clariano.uv.es (password 'anonymous'). juan.orduna@uv.es or jdopazo@cipf.es. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Capturing User Reading Behaviors for Personalized Document Summarization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Songhua; Jiang, Hao; Lau, Francis
2011-01-01
We propose a new personalized document summarization method that observes a user's personal reading preferences. These preferences are inferred from the user's reading behaviors, including facial expressions, gaze positions, and reading durations that were captured during the user's past reading activities. We compare the performance of our algorithm with that of a few peer algorithms and software packages. The results of our comparative study show that our algorithm can produce more superior personalized document summaries than all the other methods in that the summaries generated by our algorithm can better satisfy a user's personal preferences.
A Massively Parallel Computational Method of Reading Index Files for SOAPsnv.
Zhu, Xiaoqian; Peng, Shaoliang; Liu, Shaojie; Cui, Yingbo; Gu, Xiang; Gao, Ming; Fang, Lin; Fang, Xiaodong
2015-12-01
SOAPsnv is the software used for identifying the single nucleotide variation in cancer genes. However, its performance is yet to match the massive amount of data to be processed. Experiments reveal that the main performance bottleneck of SOAPsnv software is the pileup algorithm. The original pileup algorithm's I/O process is time-consuming and inefficient to read input files. Moreover, the scalability of the pileup algorithm is also poor. Therefore, we designed a new algorithm, named BamPileup, aiming to improve the performance of sequential read, and the new pileup algorithm implemented a parallel read mode based on index. Using this method, each thread can directly read the data start from a specific position. The results of experiments on the Tianhe-2 supercomputer show that, when reading data in a multi-threaded parallel I/O way, the processing time of algorithm is reduced to 3.9 s and the application program can achieve a speedup up to 100×. Moreover, the scalability of the new algorithm is also satisfying.
Read clouds uncover variation in complex regions of the human genome
Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E.; West, Robert; Sidow, Arend; Batzoglou, Serafim
2015-01-01
Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. PMID:26286554
A non-parametric peak calling algorithm for DamID-Seq.
Li, Renhua; Hempel, Leonie U; Jiang, Tingbo
2015-01-01
Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of double sex (DSX)-an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peaking calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality check and mapping reads to a reference genome, the peaking calling procedure compromises the following steps: 1) reads resampling; 2) reads scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) Calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peaks width.
A difference tracking algorithm based on discrete sine transform
NASA Astrophysics Data System (ADS)
Liu, HaoPeng; Yao, Yong; Lei, HeBing; Wu, HaoKun
2018-04-01
Target tracking is an important field of computer vision. The template matching tracking algorithm based on squared difference matching (SSD) and standard correlation coefficient (NCC) matching is very sensitive to the gray change of image. When the brightness or gray change, the tracking algorithm will be affected by high-frequency information. Tracking accuracy is reduced, resulting in loss of tracking target. In this paper, a differential tracking algorithm based on discrete sine transform is proposed to reduce the influence of image gray or brightness change. The algorithm that combines the discrete sine transform and the difference algorithm maps the target image into a image digital sequence. The Kalman filter predicts the target position. Using the Hamming distance determines the degree of similarity between the target and the template. The window closest to the template is determined the target to be tracked. The target to be tracked updates the template. Based on the above achieve target tracking. The algorithm is tested in this paper. Compared with SSD and NCC template matching algorithms, the algorithm tracks target stably when image gray or brightness change. And the tracking speed can meet the read-time requirement.
Beyond mind-reading: multi-voxel pattern analysis of fMRI data.
Norman, Kenneth A; Polyn, Sean M; Detre, Greg J; Haxby, James V
2006-09-01
A key challenge for cognitive neuroscience is determining how mental representations map onto patterns of neural activity. Recently, researchers have started to address this question by applying sophisticated pattern-classification algorithms to distributed (multi-voxel) patterns of functional MRI data, with the goal of decoding the information that is represented in the subject's brain at a particular point in time. This multi-voxel pattern analysis (MVPA) approach has led to several impressive feats of mind reading. More importantly, MVPA methods constitute a useful new tool for advancing our understanding of neural information processing. We review how researchers are using MVPA methods to characterize neural coding and information processing in domains ranging from visual perception to memory search.
Accurate typing of short tandem repeats from genome-wide sequencing data and its applications.
Fungtammasan, Arkarachai; Ananda, Guruprasad; Hile, Suzanne E; Su, Marcia Shu-Wei; Sun, Chen; Harris, Robert; Medvedev, Paul; Eckert, Kristin; Makova, Kateryna D
2015-05-01
Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution. © 2015 Fungtammasan et al.; Published by Cold Spring Harbor Laboratory Press.
Service-Aware Clustering: An Energy-Efficient Model for the Internet-of-Things
Bagula, Antoine; Abidoye, Ademola Philip; Zodi, Guy-Alain Lusilao
2015-01-01
Current generation wireless sensor routing algorithms and protocols have been designed based on a myopic routing approach, where the motes are assumed to have the same sensing and communication capabilities. Myopic routing is not a natural fit for the IoT, as it may lead to energy imbalance and subsequent short-lived sensor networks, routing the sensor readings over the most service-intensive sensor nodes, while leaving the least active nodes idle. This paper revisits the issue of energy efficiency in sensor networks to propose a clustering model where sensor devices’ service delivery is mapped into an energy awareness model, used to design a clustering algorithm that finds service-aware clustering (SAC) configurations in IoT settings. The performance evaluation reveals the relative energy efficiency of the proposed SAC algorithm compared to related routing algorithms in terms of energy consumption, the sensor nodes’ life span and its traffic engineering efficiency in terms of throughput and delay. These include the well-known low energy adaptive clustering hierarchy (LEACH) and LEACH-centralized (LEACH-C) algorithms, as well as the most recent algorithms, such as DECSA and MOCRN. PMID:26703619
Service-Aware Clustering: An Energy-Efficient Model for the Internet-of-Things.
Bagula, Antoine; Abidoye, Ademola Philip; Zodi, Guy-Alain Lusilao
2015-12-23
Current generation wireless sensor routing algorithms and protocols have been designed based on a myopic routing approach, where the motes are assumed to have the same sensing and communication capabilities. Myopic routing is not a natural fit for the IoT, as it may lead to energy imbalance and subsequent short-lived sensor networks, routing the sensor readings over the most service-intensive sensor nodes, while leaving the least active nodes idle. This paper revisits the issue of energy efficiency in sensor networks to propose a clustering model where sensor devices' service delivery is mapped into an energy awareness model, used to design a clustering algorithm that finds service-aware clustering (SAC) configurations in IoT settings. The performance evaluation reveals the relative energy efficiency of the proposed SAC algorithm compared to related routing algorithms in terms of energy consumption, the sensor nodes' life span and its traffic engineering efficiency in terms of throughput and delay. These include the well-known low energy adaptive clustering hierarchy (LEACH) and LEACH-centralized (LEACH-C) algorithms, as well as the most recent algorithms, such as DECSA and MOCRN.
Nawrocki, Eric P.; Burge, Sarah W.
2013-01-01
The development of RNA bioinformatic tools began more than 30 y ago with the description of the Nussinov and Zuker dynamic programming algorithms for single sequence RNA secondary structure prediction. Since then, many tools have been developed for various RNA sequence analysis problems such as homology search, multiple sequence alignment, de novo RNA discovery, read-mapping, and many more. In this issue, we have collected a sampling of reviews and original research that demonstrate some of the many ways bioinformatics is integrated with current RNA biology research. PMID:23948768
Read clouds uncover variation in complex regions of the human genome.
Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E; West, Robert; Sidow, Arend; Batzoglou, Serafim
2015-10-01
Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. © 2015 Bishara et al.; Published by Cold Spring Harbor Laboratory Press.
Scholz, Matthew; Lo, Chien -Chi; Chain, Patrick S. G.
2014-10-01
Assembly of metagenomic samples is a very complex process, with algorithms designed to address sequencing platform-specific issues, (read length, data volume, and/or community complexity), while also faced with genomes that differ greatly in nucleotide compositional biases and in abundance. To address these issues, we have developed a post-assembly process: MetaGenomic Assembly by Merging (MeGAMerge). We compare this process to the performance of several assemblers, using both real, and in-silico generated samples of different community composition and complexity. MeGAMerge consistently outperforms individual assembly methods, producing larger contigs with an increased number of predicted genes, without replication of data. MeGAMerge contigs aremore » supported by read mapping and contig alignment data, when using synthetically-derived and real metagenomic data, as well as by gene prediction analyses and similarity searches. Ultimately, MeGAMerge is a flexible method that generates improved metagenome assemblies, with the ability to accommodate upcoming sequencing platforms, as well as present and future assembly algorithms.« less
Automated Robot Movement in the Mapped Area Using Fuzzy Logic for Wheel Chair Application
NASA Astrophysics Data System (ADS)
Siregar, B.; Efendi, S.; Ramadhana, H.; Andayani, U.; Fahmi, F.
2018-03-01
The difficulties of the disabled to move make them unable to live independently. People with disabilities need supporting device to move from place to place. For that, we proposed a solution that can help people with disabilities to move from one room to another automatically. This study aims to create a wheelchair prototype in the form of a wheeled robot as a means to learn the automatic mobilization. The fuzzy logic algorithm was used to determine motion direction based on initial position, ultrasonic sensors reading in avoiding obstacles, infrared sensors reading as a black line reader for the wheeled robot to move smooth and smartphone as a mobile controller. As a result, smartphones with the Android operating system can control the robot using Bluetooth. Here Bluetooth technology can be used to control the robot from a maximum distance of 15 meters. The proposed algorithm was able to work stable for automatic motion determination based on initial position, and also able to modernize the wheelchair movement from one room to another automatically.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McLoughlin, Kevin
2016-01-11
This report describes the design and implementation of an algorithm for estimating relative microbial abundances, together with confidence limits, using data from metagenomic DNA sequencing. For the background behind this project and a detailed discussion of our modeling approach for metagenomic data, we refer the reader to our earlier technical report, dated March 4, 2014. Briefly, we described a fully Bayesian generative model for paired-end sequence read data, incorporating the effects of the relative abundances, the distribution of sequence fragment lengths, fragment position bias, sequencing errors and variations between the sampled genomes and the nearest reference genomes. A distinctive featuremore » of our modeling approach is the use of a Chinese restaurant process (CRP) to describe the selection of genomes to be sampled, and thus the relative abundances. The CRP component is desirable for fitting abundances to reads that may map ambiguously to multiple targets, because it naturally leads to sparse solutions that select the best representative from each set of nearly equivalent genomes.« less
libgapmis: extending short-read alignments
2013-01-01
Background A wide variety of short-read alignment programmes have been published recently to tackle the problem of mapping millions of short reads to a reference genome, focusing on different aspects of the procedure such as time and memory efficiency, sensitivity, and accuracy. These tools allow for a small number of mismatches in the alignment; however, their ability to allow for gaps varies greatly, with many performing poorly or not allowing them at all. The seed-and-extend strategy is applied in most short-read alignment programmes. After aligning a substring of the reference sequence against the high-quality prefix of a short read--the seed--an important problem is to find the best possible alignment between a substring of the reference sequence succeeding and the remaining suffix of low quality of the read--extend. The fact that the reads are rather short and that the gap occurrence frequency observed in various studies is rather low suggest that aligning (parts of) those reads with a single gap is in fact desirable. Results In this article, we present libgapmis, a library for extending pairwise short-read alignments. Apart from the standard CPU version, it includes ultrafast SSE- and GPU-based implementations. libgapmis is based on an algorithm computing a modified version of the traditional dynamic-programming matrix for sequence alignment. Extensive experimental results demonstrate that the functions of the CPU version provided in this library accelerate the computations by a factor of 20 compared to other programmes. The analogous SSE- and GPU-based implementations accelerate the computations by a factor of 6 and 11, respectively, compared to the CPU version. The library also provides the user the flexibility to split the read into fragments, based on the observed gap occurrence frequency and the length of the read, thereby allowing for a variable, but bounded, number of gaps in the alignment. Conclusions We present libgapmis, a library for extending pairwise short-read alignments. We show that libgapmis is better-suited and more efficient than existing algorithms for this task. The importance of our contribution is underlined by the fact that the provided functions may be seamlessly integrated into any short-read alignment pipeline. The open-source code of libgapmis is available at http://www.exelixis-lab.org/gapmis. PMID:24564250
NASA Astrophysics Data System (ADS)
Alvarez, Jose; Massey, Steven; Kalitsov, Alan; Velev, Julian
Nanopore sequencing via transverse current has emerged as a competitive candidate for mapping DNA methylation without needed bisulfite-treatment, fluorescent tag, or PCR amplification. By eliminating the error producing amplification step, long read lengths become feasible, which greatly simplifies the assembly process and reduces the time and the cost inherent in current technologies. However, due to the large error rates of nanopore sequencing, single base resolution has not been reached. A very important source of noise is the intrinsic structural noise in the electric signature of the nucleotide arising from the influence of neighboring nucleotides. In this work we perform calculations of the tunneling current through DNA molecules in nanopores using the non-equilibrium electron transport method within an effective multi-orbital tight-binding model derived from first-principles calculations. We develop a base-calling algorithm accounting for the correlations of the current through neighboring bases, which in principle can reduce the error rate below any desired precision. Using this method we show that we can clearly distinguish DNA methylation and other base modifications based on the reading of the tunneling current.
Accurate estimation of short read mapping quality for next-generation genome sequencing
Ruffalo, Matthew; Koyutürk, Mehmet; Ray, Soumya; LaFramboise, Thomas
2012-01-01
Motivation: Several software tools specialize in the alignment of short next-generation sequencing reads to a reference sequence. Some of these tools report a mapping quality score for each alignment—in principle, this quality score tells researchers the likelihood that the alignment is correct. However, the reported mapping quality often correlates weakly with actual accuracy and the qualities of many mappings are underestimated, encouraging the researchers to discard correct mappings. Further, these low-quality mappings tend to correlate with variations in the genome (both single nucleotide and structural), and such mappings are important in accurately identifying genomic variants. Approach: We develop a machine learning tool, LoQuM (LOgistic regression tool for calibrating the Quality of short read mappings, to assign reliable mapping quality scores to mappings of Illumina reads returned by any alignment tool. LoQuM uses statistics on the read (base quality scores reported by the sequencer) and the alignment (number of matches, mismatches and deletions, mapping quality score returned by the alignment tool, if available, and number of mappings) as features for classification and uses simulated reads to learn a logistic regression model that relates these features to actual mapping quality. Results: We test the predictions of LoQuM on an independent dataset generated by the ART short read simulation software and observe that LoQuM can ‘resurrect’ many mappings that are assigned zero quality scores by the alignment tools and are therefore likely to be discarded by researchers. We also observe that the recalibration of mapping quality scores greatly enhances the precision of called single nucleotide polymorphisms. Availability: LoQuM is available as open source at http://compbio.case.edu/loqum/. Contact: matthew.ruffalo@case.edu. PMID:22962451
Palmer, Lance E; Dejori, Mathaeus; Bolanos, Randall; Fasulo, Daniel
2010-01-15
With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that due to repeated sequences, many aligned pairs of reads nevertheless do not overlap. But no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps. We present an approach that extends the Minimus assembler by a data driven step to classify overlaps as true or false prior to contig construction. We trained several different classification models within the Weka framework using various statistics derived from overlaps of reads available from prior sequencing projects. These statistics included percent mismatch and k-mer frequencies within the overlaps as well as a comparative genomics score derived from mapping reads to multiple reference genomes. We show that in real whole-genome sequencing data from the E. coli and S. aureus genomes, by providing a curated set of overlaps to the contigging phase of the assembler, we nearly doubled the median contig length (N50) without sacrificing coverage of the genome or increasing the number of mis-assemblies. Machine learning methods that use comparative and non-comparative features to classify overlaps as true or false can be used to improve the quality of a sequence assembly.
Bickhart, Derek M; Rosen, Benjamin D; Koren, Sergey; Sayre, Brian L; Hastie, Alex R; Chan, Saki; Lee, Joyce; Lam, Ernest T; Liachko, Ivan; Sullivan, Shawn T; Burton, Joshua N; Huson, Heather J; Nystrom, John C; Kelley, Christy M; Hutchison, Jana L; Zhou, Yang; Sun, Jiajie; Crisà, Alessandra; Ponce de León, F Abel; Schwartz, John C; Hammond, John A; Waldbieser, Geoffrey C; Schroeder, Steven G; Liu, George E; Dunham, Maitreya J; Shendure, Jay; Sonstegard, Tad S; Phillippy, Adam M; Van Tassell, Curtis P; Smith, Timothy P L
2017-04-01
The decrease in sequencing cost and increased sophistication of assembly algorithms for short-read platforms has resulted in a sharp increase in the number of species with genome assemblies. However, these assemblies are highly fragmented, with many gaps, ambiguities, and errors, impeding downstream applications. We demonstrate current state of the art for de novo assembly using the domestic goat (Capra hircus) based on long reads for contig formation, short reads for consensus validation, and scaffolding by optical and chromatin interaction mapping. These combined technologies produced what is, to our knowledge, the most continuous de novo mammalian assembly to date, with chromosome-length scaffolds and only 649 gaps. Our assembly represents a ∼400-fold improvement in continuity due to properly assembled gaps, compared to the previously published C. hircus assembly, and better resolves repetitive structures longer than 1 kb, representing the largest repeat family and immune gene complex yet produced for an individual of a ruminant species.
Bickhart, Derek M.; Rosen, Benjamin D.; Koren, Sergey; Sayre, Brian L.; Hastie, Alex R.; Chan, Saki; Lee, Joyce; Lam, Ernest T.; Liachko, Ivan; Sullivan, Shawn T.; Burton, Joshua N.; Huson, Heather J.; Nystrom, John C.; Kelley, Christy M.; Hutchison, Jana L.; Zhou, Yang; Sun, Jiajie; Crisà, Alessandra; de León, F. Abel Ponce; Schwartz, John C.; Hammond, John A.; Waldbieser, Geoffrey C.; Schroeder, Steven G.; Liu, George E.; Dunham, Maitreya J.; Shendure, Jay; Sonstegard, Tad S.; Phillippy, Adam M.; Van Tassell, Curtis P.; Smith, Timothy P.L.
2018-01-01
The decrease in sequencing cost and increased sophistication of assembly algorithms for short-read platforms has resulted in a sharp increase in the number of species with genome assemblies. However, these assemblies are highly fragmented, with many gaps, ambiguities, and errors, impeding downstream applications. We demonstrate current state of the art for de novo assembly using the domestic goat (Capra hircus), based on long reads for contig formation, short reads for consensus validation, and scaffolding by optical and chromatin interaction mapping. These combined technologies produced the most continuous de novo mammalian assembly to date, with chromosome-length scaffolds and only 649 gaps. Our assembly represents a ~400-fold improvement in continuity due to properly assembled gaps compared to the previously published C. hircus assembly, and better resolves repetitive structures longer than 1 kb, representing the largest repeat family and immune gene complex ever produced for an individual of a ruminant species. PMID:28263316
ZOOM Lite: next-generation sequencing data mapping and visualization software
Zhang, Zefeng; Lin, Hao; Ma, Bin
2010-01-01
High-throughput next-generation sequencing technologies pose increasing demands on the efficiency, accuracy and usability of data analysis software. In this article, we present ZOOM Lite, a software for efficient reads mapping and result visualization. With a kernel capable of mapping tens of millions of Illumina or AB SOLiD sequencing reads efficiently and accurately, and an intuitive graphical user interface, ZOOM Lite integrates reads mapping and result visualization into a easy to use pipeline on desktop PC. The software handles both single-end and paired-end reads, and can output both the unique mapping result or the top N mapping results for each read. Additionally, the software takes a variety of input file formats and outputs to several commonly used result formats. The software is freely available at http://bioinfor.com/zoom/lite/. PMID:20530531
2016-01-01
Abstract Cortical mapping techniques using fMRI have been instrumental in identifying the boundaries of topological (neighbor‐preserving) maps in early sensory areas. The presence of topological maps beyond early sensory areas raises the possibility that they might play a significant role in other cognitive systems, and that topological mapping might help to delineate areas involved in higher cognitive processes. In this study, we combine surface‐based visual, auditory, and somatomotor mapping methods with a naturalistic reading comprehension task in the same group of subjects to provide a qualitative and quantitative assessment of the cortical overlap between sensory‐motor maps in all major sensory modalities, and reading processing regions. Our results suggest that cortical activation during naturalistic reading comprehension overlaps more extensively with topological sensory‐motor maps than has been heretofore appreciated. Reading activation in regions adjacent to occipital lobe and inferior parietal lobe almost completely overlaps visual maps, whereas a significant portion of frontal activation for reading in dorsolateral and ventral prefrontal cortex overlaps both visual and auditory maps. Even classical language regions in superior temporal cortex are partially overlapped by topological visual and auditory maps. By contrast, the main overlap with somatomotor maps is restricted to a small region on the anterior bank of the central sulcus near the border between the face and hand representations of M‐I. Hum Brain Mapp 37:2784–2810, 2016. © 2016 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc. PMID:27061771
An improved filtering algorithm for big read datasets and its application to single-cell assembly.
Wedemeyer, Axel; Kliemann, Lasse; Srivastav, Anand; Schielke, Christian; Reusch, Thorsten B; Rosenstiel, Philip
2017-07-03
For single-cell or metagenomic sequencing projects, it is necessary to sequence with a very high mean coverage in order to make sure that all parts of the sample DNA get covered by the reads produced. This leads to huge datasets with lots of redundant data. A filtering of this data prior to assembly is advisable. Brown et al. (2012) presented the algorithm Diginorm for this purpose, which filters reads based on the abundance of their k-mers. We present Bignorm, a faster and quality-conscious read filtering algorithm. An important new algorithmic feature is the use of phred quality scores together with a detailed analysis of the k-mer counts to decide which reads to keep. We qualify and recommend parameters for our new read filtering algorithm. Guided by these parameters, we remove in terms of median 97.15% of the reads while keeping the mean phred score of the filtered dataset high. Using the SDAdes assembler, we produce assemblies of high quality from these filtered datasets in a fraction of the time needed for an assembly from the datasets filtered with Diginorm. We conclude that read filtering is a practical and efficient method for reducing read data and for speeding up the assembly process. This applies not only for single cell assembly, as shown in this paper, but also to other projects with high mean coverage datasets like metagenomic sequencing projects. Our Bignorm algorithm allows assemblies of competitive quality in comparison to Diginorm, while being much faster. Bignorm is available for download at https://git.informatik.uni-kiel.de/axw/Bignorm .
Story Map Instruction: A Road Map for Reading Comprehension.
ERIC Educational Resources Information Center
Davis, Zephaniah, T.; McPherson, Michael D.
1989-01-01
Introduces teachers to the development and use of story maps as a tool for promoting reading comprehension. Presents a definition and review of story map research. Explains how to construct story maps, and offers suggestions for starting story map instruction. Provides variations on the use of story maps. (MG)
Li, You; Heavican, Tayla B.; Vellichirammal, Neetha N.; Iqbal, Javeed
2017-01-01
Abstract The RNA-Seq technology has revolutionized transcriptome characterization not only by accurately quantifying gene expression, but also by the identification of novel transcripts like chimeric fusion transcripts. The ‘fusion’ or ‘chimeric’ transcripts have improved the diagnosis and prognosis of several tumors, and have led to the development of novel therapeutic regimen. The fusion transcript detection is currently accomplished by several software packages, primarily relying on sequence alignment algorithms. The alignment of sequencing reads from fusion transcript loci in cancer genomes can be highly challenging due to the incorrect mapping induced by genomic alterations, thereby limiting the performance of alignment-based fusion transcript detection methods. Here, we developed a novel alignment-free method, ChimeRScope that accurately predicts fusion transcripts based on the gene fingerprint (as k-mers) profiles of the RNA-Seq paired-end reads. Results on published datasets and in-house cancer cell line datasets followed by experimental validations demonstrate that ChimeRScope consistently outperforms other popular methods irrespective of the read lengths and sequencing depth. More importantly, results on our in-house datasets show that ChimeRScope is a better tool that is capable of identifying novel fusion transcripts with potential oncogenic functions. ChimeRScope is accessible as a standalone software at (https://github.com/ChimeRScope/ChimeRScope/wiki) or via the Galaxy web-interface at (https://galaxy.unmc.edu/). PMID:28472320
Faramarzi, Salar; Moradi, Mohammadreza; Abedi, Ahmad
2018-06-01
The present study aimed to develop the thinking maps training package and compare its training effect with the thinking maps method on the reading performance of second and fifth grade of elementary school male dyslexic students. For this mixed method exploratory study, from among the above mentioned grades' students in Isfahan, 90 students who met the inclusion criteria were selected by multistage sampling and randomly assigned into six experimental and control groups. The data were collected by reading and dyslexia test and Wechsler Intelligence Scale for Children-fourth edition. The results of covariance analysis indicated a significant difference between the reading performance of the experimental (thinking maps training package and thinking maps method groups) and control groups ([Formula: see text]). Moreover, there were significant differences between the thinking maps training package group and thinking maps method group in some of the subtests ([Formula: see text]). It can be concluded that thinking maps training package and the thinking maps method exert a positive influence on the reading performance of dyslexic students; therefore, thinking maps can be used as an effective training and treatment method.
Sood, Mariam R; Sereno, Martin I
2016-08-01
Cortical mapping techniques using fMRI have been instrumental in identifying the boundaries of topological (neighbor-preserving) maps in early sensory areas. The presence of topological maps beyond early sensory areas raises the possibility that they might play a significant role in other cognitive systems, and that topological mapping might help to delineate areas involved in higher cognitive processes. In this study, we combine surface-based visual, auditory, and somatomotor mapping methods with a naturalistic reading comprehension task in the same group of subjects to provide a qualitative and quantitative assessment of the cortical overlap between sensory-motor maps in all major sensory modalities, and reading processing regions. Our results suggest that cortical activation during naturalistic reading comprehension overlaps more extensively with topological sensory-motor maps than has been heretofore appreciated. Reading activation in regions adjacent to occipital lobe and inferior parietal lobe almost completely overlaps visual maps, whereas a significant portion of frontal activation for reading in dorsolateral and ventral prefrontal cortex overlaps both visual and auditory maps. Even classical language regions in superior temporal cortex are partially overlapped by topological visual and auditory maps. By contrast, the main overlap with somatomotor maps is restricted to a small region on the anterior bank of the central sulcus near the border between the face and hand representations of M-I. Hum Brain Mapp 37:2784-2810, 2016. © 2016 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc. © 2016 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc.
2013-01-01
Background Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly. Results Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies. Conclusions Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies. PMID:24564333
Hyperspectral feature mapping classification based on mathematical morphology
NASA Astrophysics Data System (ADS)
Liu, Chang; Li, Junwei; Wang, Guangping; Wu, Jingli
2016-03-01
This paper proposed a hyperspectral feature mapping classification algorithm based on mathematical morphology. Without the priori information such as spectral library etc., the spectral and spatial information can be used to realize the hyperspectral feature mapping classification. The mathematical morphological erosion and dilation operations are performed respectively to extract endmembers. The spectral feature mapping algorithm is used to carry on hyperspectral image classification. The hyperspectral image collected by AVIRIS is applied to evaluate the proposed algorithm. The proposed algorithm is compared with minimum Euclidean distance mapping algorithm, minimum Mahalanobis distance mapping algorithm, SAM algorithm and binary encoding mapping algorithm. From the results of the experiments, it is illuminated that the proposed algorithm's performance is better than that of the other algorithms under the same condition and has higher classification accuracy.
Quantifying uncertainty in read-across assessment – an algorithmic approach - (SOT)
Read-across is a popular data gap filling technique within category and analogue approaches for regulatory purposes. Acceptance of read-across remains an ongoing challenge with several efforts underway for identifying and addressing uncertainties. Here we demonstrate an algorithm...
Automated identification of reference genes based on RNA-seq data.
Carmona, Rosario; Arroyo, Macarena; Jiménez-Quesada, María José; Seoane, Pedro; Zafra, Adoración; Larrosa, Rafael; Alché, Juan de Dios; Claros, M Gonzalo
2017-08-18
Gene expression analyses demand appropriate reference genes (RGs) for normalization, in order to obtain reliable assessments. Ideally, RG expression levels should remain constant in all cells, tissues or experimental conditions under study. Housekeeping genes traditionally fulfilled this requirement, but they have been reported to be less invariant than expected; therefore, RGs should be tested and validated for every particular situation. Microarray data have been used to propose new RGs, but only a limited set of model species and conditions are available; on the contrary, RNA-seq experiments are more and more frequent and constitute a new source of candidate RGs. An automated workflow based on mapped NGS reads has been constructed to obtain highly and invariantly expressed RGs based on a normalized expression in reads per mapped million and the coefficient of variation. This workflow has been tested with Roche/454 reads from reproductive tissues of olive tree (Olea europaea L.), as well as with Illumina paired-end reads from two different accessions of Arabidopsis thaliana and three different human cancers (prostate, small-cell cancer lung and lung adenocarcinoma). Candidate RGs have been proposed for each species and many of them have been previously reported as RGs in literature. Experimental validation of significant RGs in olive tree is provided to support the algorithm. Regardless sequencing technology, number of replicates, and library sizes, when RNA-seq experiments are designed and performed, the same datasets can be analyzed with our workflow to extract suitable RGs for subsequent PCR validation. Moreover, different subset of experimental conditions can provide different suitable RGs.
Acceleration of short and long DNA read mapping without loss of accuracy using suffix array.
Tárraga, Joaquín; Arnau, Vicente; Martínez, Héctor; Moreno, Raul; Cazorla, Diego; Salavert-Torres, José; Blanquer-Espert, Ignacio; Dopazo, Joaquín; Medina, Ignacio
2014-12-01
HPG Aligner applies suffix arrays for DNA read mapping. This implementation produces a highly sensitive and extremely fast mapping of DNA reads that scales up almost linearly with read length. The approach presented here is faster (over 20× for long reads) and more sensitive (over 98% in a wide range of read lengths) than the current state-of-the-art mappers. HPG Aligner is not only an optimal alternative for current sequencers but also the only solution available to cope with longer reads and growing throughputs produced by forthcoming sequencing technologies. https://github.com/opencb/hpg-aligner. © The Author 2014. Published by Oxford University Press.
ERIC Educational Resources Information Center
Dwyer, Christopher P.; Hogan, Michael J.; Stewart, Ian
2010-01-01
The current study compared the effects on comprehension and memory of learning via text versus learning via argument map. Argument mapping is a method of diagrammatic representation of arguments designed to simplify the reading of an argument structure and allow for easy assimilation of core propositions and relations. In the current study, 400…
Analogical processes in children's understanding of spatial representations.
Yuan, Lei; Uttal, David; Gentner, Dedre
2017-06-01
We propose that map reading can be construed as a form of analogical mapping. We tested 2 predictions that follow from this claim: First, young children's patterns of performance in map reading tasks should parallel those found in analogical mapping tasks; and, second, children will benefit from guided alignment instructions that help them see the relational correspondences between the map and the space. In 4 experiments, 3-year-olds completed a map reading task in which they were asked to find hidden objects in a miniature room, using a corresponding map. We manipulated the availability of guided alignment (showing children the analogical mapping between maps and spaces; Experiments 1, 2, and 3a), the format of guided alignment (gesture or relational language; Experiment 2), and the iconicity of maps (Experiments 3a and 3b). We found that (a) young children's difficulties in map reading follow from known patterns of analogical development-for example, focusing on object similarity over relational similarity; and (b) guided alignment based on analogical reasoning led to substantially better performance. Results also indicated that children's map reading performance was affected by the format of guided alignment, the iconicity of the maps, and the order of tasks. The results bear on the developmental mechanisms underlying young children's learning of spatial representations and also suggest ways to support this learning. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
ERIC Educational Resources Information Center
Sabbah, Sabah Salman
2015-01-01
This study explored the potential effect of college students' self-generated computerized mind maps on their reading comprehension. It also investigated the subjects' attitudes toward generating computerized mind maps for reading comprehension. The study was conducted in response to the inability of the foundation-level students, who were learning…
Three Reading Comprehension Strategies: TELLS, Story Mapping, and QARs.
ERIC Educational Resources Information Center
Sorrell, Adrian L.
1990-01-01
Three reading comprehension strategies are presented to assist learning-disabled students: an advance organizer technique called "TELLS Fact or Fiction" used before reading a passage, a schema-based technique called "Story Mapping" used while reading, and a postreading method of categorizing questions called…
Rcount: simple and flexible RNA-Seq read counting.
Schmid, Marc W; Grossniklaus, Ueli
2015-02-01
Analysis of differential gene expression by RNA sequencing (RNA-Seq) is frequently done using feature counts, i.e. the number of reads mapping to a gene. However, commonly used count algorithms (e.g. HTSeq) do not address the problem of reads aligning with multiple locations in the genome (multireads) or reads aligning with positions where two or more genes overlap (ambiguous reads). Rcount specifically addresses these issues. Furthermore, Rcount allows the user to assign priorities to certain feature types (e.g. higher priority for protein-coding genes compared to rRNA-coding genes) or to add flanking regions. Rcount provides a fast and easy-to-use graphical user interface requiring no command line or programming skills. It is implemented in C++ using the SeqAn (www.seqan.de) and the Qt libraries (qt-project.org). Source code and 64 bit binaries for (Ubuntu) Linux, Windows (7) and MacOSX are released under the GPLv3 license and are freely available on github.com/MWSchmid/Rcount. marcschmid@gmx.ch Test data, genome annotation files, useful Python and R scripts and a step-by-step user guide (including run-time and memory usage tests) are available on github.com/MWSchmid/Rcount. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Functional sequencing read annotation for high precision microbiome analysis
Zhu, Chengsheng; Miller, Maximilian; Marpaka, Srinayani; Vaysberg, Pavel; Rühlemann, Malte C; Wu, Guojun; Heinsen, Femke-Anouska; Tempel, Marie; Zhao, Liping; Lieb, Wolfgang; Franke, Andre; Bromberg, Yana
2018-01-01
Abstract The vast majority of microorganisms on Earth reside in often-inseparable environment-specific communities—microbiomes. Meta-genomic/-transcriptomic sequencing could reveal the otherwise inaccessible functionality of microbiomes. However, existing analytical approaches focus on attributing sequencing reads to known genes/genomes, often failing to make maximal use of available data. We created faser (functional annotation of sequencing reads), an algorithm that is optimized to map reads to molecular functions encoded by the read-correspondent genes. The mi-faser microbiome analysis pipeline, combining faser with our manually curated reference database of protein functions, accurately annotates microbiome molecular functionality. mi-faser’s minutes-per-microbiome processing speed is significantly faster than that of other methods, allowing for large scale comparisons. Microbiome function vectors can be compared between different conditions to highlight environment-specific and/or time-dependent changes in functionality. Here, we identified previously unseen oil degradation-specific functions in BP oil-spill data, as well as functional signatures of individual-specific gut microbiome responses to a dietary intervention in children with Prader–Willi syndrome. Our method also revealed variability in Crohn's Disease patient microbiomes and clearly distinguished them from those of related healthy individuals. Our analysis highlighted the microbiome role in CD pathogenicity, demonstrating enrichment of patient microbiomes in functions that promote inflammation and that help bacteria survive it. PMID:29194524
Ma, Lin; Xu, Yubin
2015-01-01
Green WLAN is a promising technique for accessing future indoor Internet services. It is designed not only for high-speed data communication purposes but also for energy efficiency. The basic strategy of green WLAN is that all the access points are not always powered on, but rather work on-demand. Though powering off idle access points does not affect data communication, a serious asymmetric matching problem will arise in a WLAN indoor positioning system due to the fact the received signal strength (RSS) readings from the available access points are different in their offline and online phases. This asymmetry problem will no doubt invalidate the fingerprint algorithm used to estimate the mobile device location. Therefore, in this paper we propose a green WLAN indoor positioning system, which can recover RSS readings and achieve good localization performance based on singular value thresholding (SVT) theory. By solving the nuclear norm minimization problem, SVT recovers not only the radio map, but also online RSS readings from a sparse matrix by sensing only a fraction of the RSS readings. We have implemented the method in our lab and evaluated its performances. The experimental results indicate the proposed system could recover the RSS readings and achieve good localization performance. PMID:25587977
ABMapper: a suffix array-based tool for multi-location searching and splice-junction mapping.
Lou, Shao-Ke; Ni, Bing; Lo, Leung-Yau; Tsui, Stephen Kwok-Wing; Chan, Ting-Fung; Leung, Kwong-Sak
2011-02-01
Sequencing reads generated by RNA-sequencing (RNA-seq) must first be mapped back to the genome through alignment before they can be further analyzed. Current fast and memory-saving short-read mappers could give us a quick view of the transcriptome. However, they are neither designed for reads that span across splice junctions nor for repetitive reads, which can be mapped to multiple locations in the genome (multi-reads). Here, we describe a new software package: ABMapper, which is specifically designed for exploring all putative locations of reads that are mapped to splice junctions or repetitive in nature. The software is freely available at: http://abmapper.sourceforge.net/. The software is written in C++ and PERL. It runs on all major platforms and operating systems including Windows, Mac OS X and LINUX.
The Emergence of Route Map Reading Skills in Young Children.
ERIC Educational Resources Information Center
Frank, Rita E.
There is little agreement about how the ability to read route maps initially emerges and about how it should be stimulated by early childhood educators. This study assessed the route map reading behavior of young children and the basic skills that might contribute to that behavior. In individual videotaped sessions, 120 four, five, and six year…
ERIC Educational Resources Information Center
Liu, Pei-Lin; Chen, Chiu-Jung; Chang, Yu-Ju
2010-01-01
The purpose of this research was to investigate the effects of a computer-assisted concept mapping learning strategy on EFL college learners' English reading comprehension. The research questions were: (1) what was the influence of the computer-assisted concept mapping learning strategy on different learners' English reading comprehension? (2) did…
Experimental Evaluation of a Braille-Reading-Inspired Finger Motion Adaptive Algorithm.
Ulusoy, Melda; Sipahi, Rifat
2016-01-01
Braille reading is a complex process involving intricate finger-motion patterns and finger-rubbing actions across Braille letters for the stimulation of appropriate nerves. Although Braille reading is performed by smoothly moving the finger from left-to-right, research shows that even fluent reading requires right-to-left movements of the finger, known as "reversal". Reversals are crucial as they not only enhance stimulation of nerves for correctly reading the letters, but they also show one to re-read the letters that were missed in the first pass. Moreover, it is known that reversals can be performed as often as in every sentence and can start at any location in a sentence. Here, we report experimental results on the feasibility of an algorithm that can render a machine to automatically adapt to reversal gestures of one's finger. Through Braille-reading-analogous tasks, the algorithm is tested with thirty sighted subjects that volunteered in the study. We find that the finger motion adaptive algorithm (FMAA) is useful in achieving cooperation between human finger and the machine. In the presence of FMAA, subjects' performance metrics associated with the tasks have significantly improved as supported by statistical analysis. In light of these encouraging results, preliminary experiments are carried out with five blind subjects with the aim to put the algorithm to test. Results obtained from carefully designed experiments showed that subjects' Braille reading accuracy in the presence of FMAA was more favorable then when FMAA was turned off. Utilization of FMAA in future generation Braille reading devices thus holds strong promise.
Experimental Evaluation of a Braille-Reading-Inspired Finger Motion Adaptive Algorithm
2016-01-01
Braille reading is a complex process involving intricate finger-motion patterns and finger-rubbing actions across Braille letters for the stimulation of appropriate nerves. Although Braille reading is performed by smoothly moving the finger from left-to-right, research shows that even fluent reading requires right-to-left movements of the finger, known as “reversal”. Reversals are crucial as they not only enhance stimulation of nerves for correctly reading the letters, but they also show one to re-read the letters that were missed in the first pass. Moreover, it is known that reversals can be performed as often as in every sentence and can start at any location in a sentence. Here, we report experimental results on the feasibility of an algorithm that can render a machine to automatically adapt to reversal gestures of one’s finger. Through Braille-reading-analogous tasks, the algorithm is tested with thirty sighted subjects that volunteered in the study. We find that the finger motion adaptive algorithm (FMAA) is useful in achieving cooperation between human finger and the machine. In the presence of FMAA, subjects’ performance metrics associated with the tasks have significantly improved as supported by statistical analysis. In light of these encouraging results, preliminary experiments are carried out with five blind subjects with the aim to put the algorithm to test. Results obtained from carefully designed experiments showed that subjects’ Braille reading accuracy in the presence of FMAA was more favorable then when FMAA was turned off. Utilization of FMAA in future generation Braille reading devices thus holds strong promise. PMID:26849058
BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data
Ji, Yuan; Xu, Yanxun; Zhang, Qiong; Tsui, Kam-Wah; Yuan, Yuan; Norris, Clift; Liang, Shoudan; Liang, Han
2011-01-01
Summary Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information for various aspects of cellular activities and biological functions. A key step in NGS applications (e.g., RNA-Seq) is to map short reads to correct genomic locations within the source genome. While most reads are mapped to a unique location, a significant proportion of reads align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping the multireads may lead to bias in downstream analyses. Currently, most practitioners discard the multireads in their analysis, resulting in a loss of valuable information, especially for the genes with similar sequences. To refine the read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. The probabilities are used for downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA-Seq analysis of real life data that the Bayesian method yields better mapping than the current leading methods. We provide a C++ program for downloading that is being packaged into a user-friendly software. PMID:21517792
JVM: Java Visual Mapping tool for next generation sequencing read.
Yang, Ye; Liu, Juan
2015-01-01
We developed a program JVM (Java Visual Mapping) for mapping next generation sequencing read to reference sequence. The program is implemented in Java and is designed to deal with millions of short read generated by sequence alignment using the Illumina sequencing technology. It employs seed index strategy and octal encoding operations for sequence alignments. JVM is useful for DNA-Seq, RNA-Seq when dealing with single-end resequencing. JVM is a desktop application, which supports reads capacity from 1 MB to 10 GB.
Simultaneous isoform discovery and quantification from RNA-seq.
Hiller, David; Wong, Wing Hung
2013-05-01
RNA sequencing is a recent technology which has seen an explosion of methods addressing all levels of analysis, from read mapping to transcript assembly to differential expression modeling. In particular the discovery of isoforms at the transcript assembly stage is a complex problem and current approaches suffer from various limitations. For instance, many approaches use graphs to construct a minimal set of isoforms which covers the observed reads, then perform a separate algorithm to quantify the isoforms, which can result in a loss of power. Current methods also use ad-hoc solutions to deal with the vast number of possible isoforms which can be constructed from a given set of reads. Finally, while the need of taking into account features such as read pairing and sampling rate of reads has been acknowledged, most existing methods do not seamlessly integrate these features as part of the model. We present Montebello, an integrated statistical approach which performs simultaneous isoform discovery and quantification by using a Monte Carlo simulation to find the most likely isoform composition leading to a set of observed reads. We compare Montebello to Cufflinks, a popular isoform discovery approach, on a simulated data set and on 46.3 million brain reads from an Illumina tissue panel. On this data set Montebello appears to offer a modest improvement over Cufflinks when considering discovery and parsimony metrics. In addition Montebello mitigates specific difficulties inherent in the Cufflinks approach. Finally, Montebello can be fine-tuned depending on the type of solution desired.
ARYANA: Aligning Reads by Yet Another Approach
2014-01-01
Motivation Although there are many different algorithms and software tools for aligning sequencing reads, fast gapped sequence search is far from solved. Strong interest in fast alignment is best reflected in the $106 prize for the Innocentive competition on aligning a collection of reads to a given database of reference genomes. In addition, de novo assembly of next-generation sequencing long reads requires fast overlap-layout-concensus algorithms which depend on fast and accurate alignment. Contribution We introduce ARYANA, a fast gapped read aligner, developed on the base of BWA indexing infrastructure with a completely new alignment engine that makes it significantly faster than three other aligners: Bowtie2, BWA and SeqAlto, with comparable generality and accuracy. Instead of the time-consuming backtracking procedures for handling mismatches, ARYANA comes with the seed-and-extend algorithmic framework and a significantly improved efficiency by integrating novel algorithmic techniques including dynamic seed selection, bidirectional seed extension, reset-free hash tables, and gap-filling dynamic programming. As the read length increases ARYANA's superiority in terms of speed and alignment rate becomes more evident. This is in perfect harmony with the read length trend as the sequencing technologies evolve. The algorithmic platform of ARYANA makes it easy to develop mission-specific aligners for other applications using ARYANA engine. Availability ARYANA with complete source code can be obtained from http://github.com/aryana-aligner PMID:25252881
ARYANA: Aligning Reads by Yet Another Approach.
Gholami, Milad; Arbabi, Aryan; Sharifi-Zarchi, Ali; Chitsaz, Hamidreza; Sadeghi, Mehdi
2014-01-01
Although there are many different algorithms and software tools for aligning sequencing reads, fast gapped sequence search is far from solved. Strong interest in fast alignment is best reflected in the $10(6) prize for the Innocentive competition on aligning a collection of reads to a given database of reference genomes. In addition, de novo assembly of next-generation sequencing long reads requires fast overlap-layout-concensus algorithms which depend on fast and accurate alignment. We introduce ARYANA, a fast gapped read aligner, developed on the base of BWA indexing infrastructure with a completely new alignment engine that makes it significantly faster than three other aligners: Bowtie2, BWA and SeqAlto, with comparable generality and accuracy. Instead of the time-consuming backtracking procedures for handling mismatches, ARYANA comes with the seed-and-extend algorithmic framework and a significantly improved efficiency by integrating novel algorithmic techniques including dynamic seed selection, bidirectional seed extension, reset-free hash tables, and gap-filling dynamic programming. As the read length increases ARYANA's superiority in terms of speed and alignment rate becomes more evident. This is in perfect harmony with the read length trend as the sequencing technologies evolve. The algorithmic platform of ARYANA makes it easy to develop mission-specific aligners for other applications using ARYANA engine. ARYANA with complete source code can be obtained from http://github.com/aryana-aligner.
ERIC Educational Resources Information Center
Alturki, Nada
2017-01-01
The purpose of this study was to examine the effectiveness of using group story-mapping on ESL students with a learning disability in reading comprehension. The researcher focused on a specific graphic organizer in this study, called Group Story-Mapping. This strategy required students with learning disabilities involving reading comprehension to…
ReprDB and panDB: minimalist databases with maximal microbial representation.
Zhou, Wei; Gay, Nicole; Oh, Julia
2018-01-18
Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible to various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With the databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis on the same datasets. reprDB and panDB leverage the rapid increases in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory space and indexing or analyzing time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses.
A constraint optimization based virtual network mapping method
NASA Astrophysics Data System (ADS)
Li, Xiaoling; Guo, Changguo; Wang, Huaimin; Li, Zhendong; Yang, Zhiwen
2013-03-01
Virtual network mapping problem, maps different virtual networks onto the substrate network is an extremely challenging work. This paper proposes a constraint optimization based mapping method for solving virtual network mapping problem. This method divides the problem into two phases, node mapping phase and link mapping phase, which are all NP-hard problems. Node mapping algorithm and link mapping algorithm are proposed for solving node mapping phase and link mapping phase, respectively. Node mapping algorithm adopts the thinking of greedy algorithm, mainly considers two factors, available resources which are supplied by the nodes and distance between the nodes. Link mapping algorithm is based on the result of node mapping phase, adopts the thinking of distributed constraint optimization method, which can guarantee to obtain the optimal mapping with the minimum network cost. Finally, simulation experiments are used to validate the method, and results show that the method performs very well.
In vivo fluorescence lifetime tomography of a FRET probe expressed in mouse
McGinty, James; Stuckey, Daniel W.; Soloviev, Vadim Y.; Laine, Romain; Wylezinska-Arridge, Marzena; Wells, Dominic J.; Arridge, Simon R.; French, Paul M. W.; Hajnal, Joseph V.; Sardini, Alessandro
2011-01-01
Förster resonance energy transfer (FRET) is a powerful biological tool for reading out cell signaling processes. In vivo use of FRET is challenging because of the scattering properties of bulk tissue. By combining diffuse fluorescence tomography with fluorescence lifetime imaging (FLIM), implemented using wide-field time-gated detection of fluorescence excited by ultrashort laser pulses in a tomographic imaging system and applying inverse scattering algorithms, we can reconstruct the three dimensional spatial localization of fluorescence quantum efficiency and lifetime. We demonstrate in vivo spatial mapping of FRET between genetically expressed fluorescent proteins in live mice read out using FLIM. Following transfection by electroporation, mouse hind leg muscles were imaged in vivo and the emission of free donor (eGFP) in the presence of free acceptor (mCherry) could be clearly distinguished from the fluorescence of the donor when directly linked to the acceptor in a tandem (eGFP-mCherry) FRET construct. PMID:21750768
Indel variant analysis of short-read sequencing data with Scalpel
Fang, Han; Bergmann, Ewa A; Arora, Kanika; Vacic, Vladimir; Zody, Michael C; Iossifov, Ivan; O’Rawe, Jason A; Wu, Yiyang; Barron, Laura T Jimenez; Rosenbaum, Julie; Ronemus, Michael; Lee, Yoon-ha; Wang, Zihua; Dikoglu, Esra; Jobanputra, Vaidehi; Lyon, Gholson J; Wigler, Michael; Schatz, Michael C; Narzisi, Giuseppe
2017-01-01
As the second most common type of variation in the human genome, insertions and deletions (indels) have been linked to many diseases, but the discovery of indels of more than a few bases in size from short-read sequencing data remains challenging. Scalpel (http://scalpel.sourceforge.net) is an open-source software for reliable indel detection based on the microassembly technique. It has been successfully used to discover mutations in novel candidate genes for autism, and it is extensively used in other large-scale studies of human diseases. This protocol gives an overview of the algorithm and describes how to use Scalpel to perform highly accurate indel calling from whole-genome and whole-exome sequencing data. We provide detailed instructions for an exemplary family-based de novo study, but we also characterize the other two supported modes of operation: single-sample and somatic analysis. Indel normalization, visualization and annotation of the mutations are also illustrated. Using a standard server, indel discovery and characterization in the exonic regions of the example sequencing data can be completed in ~5 h after read mapping. PMID:27854363
Cognitive Processes in Orienteering: A Review.
ERIC Educational Resources Information Center
Seiler, Roland
1996-01-01
Reviews recent research on information processing and decision making in orienteering. The main cognitive demands investigated were selection of relevant map information for route choice, comparison between map and terrain in map reading and in relocation, and quick awareness of mistakes. Presents a model of map reading based on results. Contains…
Anatomy assisted PET image reconstruction incorporating multi-resolution joint entropy
NASA Astrophysics Data System (ADS)
Tang, Jing; Rahmim, Arman
2015-01-01
A promising approach in PET image reconstruction is to incorporate high resolution anatomical information (measured from MR or CT) taking the anato-functional similarity measures such as mutual information or joint entropy (JE) as the prior. These similarity measures only classify voxels based on intensity values, while neglecting structural spatial information. In this work, we developed an anatomy-assisted maximum a posteriori (MAP) reconstruction algorithm wherein the JE measure is supplied by spatial information generated using wavelet multi-resolution analysis. The proposed wavelet-based JE (WJE) MAP algorithm involves calculation of derivatives of the subband JE measures with respect to individual PET image voxel intensities, which we have shown can be computed very similarly to how the inverse wavelet transform is implemented. We performed a simulation study with the BrainWeb phantom creating PET data corresponding to different noise levels. Realistically simulated T1-weighted MR images provided by BrainWeb modeling were applied in the anatomy-assisted reconstruction with the WJE-MAP algorithm and the intensity-only JE-MAP algorithm. Quantitative analysis showed that the WJE-MAP algorithm performed similarly to the JE-MAP algorithm at low noise level in the gray matter (GM) and white matter (WM) regions in terms of noise versus bias tradeoff. When noise increased to medium level in the simulated data, the WJE-MAP algorithm started to surpass the JE-MAP algorithm in the GM region, which is less uniform with smaller isolated structures compared to the WM region. In the high noise level simulation, the WJE-MAP algorithm presented clear improvement over the JE-MAP algorithm in both the GM and WM regions. In addition to the simulation study, we applied the reconstruction algorithms to real patient studies involving DPA-173 PET data and Florbetapir PET data with corresponding T1-MPRAGE MRI images. Compared to the intensity-only JE-MAP algorithm, the WJE-MAP algorithm resulted in comparable regional mean values to those from the maximum likelihood algorithm while reducing noise. Achieving robust performance in various noise-level simulation and patient studies, the WJE-MAP algorithm demonstrates its potential in clinical quantitative PET imaging.
Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data
Degner, Jacob F.; Marioni, John C.; Pai, Athma A.; Pickrell, Joseph K.; Nkadori, Everlyne; Gilad, Yoav; Pritchard, Jonathan K.
2009-01-01
Motivation: Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here, we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE). Results: We generated 16 million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias toward higher mapping rates of the allele in the reference sequence, compared with the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, ∼5–10% of SNPs still have an inherent bias toward more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data. Availability: Scripts, written in Perl and R, for simulating short reads, masking SNP variation in a reference genome and analyzing the simulation output are available upon request from JFD. Raw short read data were deposited in GEO (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE18156. Contact: jdegner@uchicago.edu; marioni@uchicago.edu; gilad@uchicago.edu; pritch@uchicago.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19808877
High-confidence coding and noncoding transcriptome maps
2017-01-01
The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes. PMID:28396519
Accelerating Demand Paging for Local and Remote Out-of-Core Visualization
NASA Technical Reports Server (NTRS)
Ellsworth, David
2001-01-01
This paper describes a new algorithm that improves the performance of application-controlled demand paging for the out-of-core visualization of data sets that are on either local disks or disks on remote servers. The performance improvements come from better overlapping the computation with the page reading process, and by performing multiple page reads in parallel. The new algorithm can be applied to many different visualization algorithms since application-controlled demand paging is not specific to any visualization algorithm. The paper includes measurements that show that the new multi-threaded paging algorithm decreases the time needed to compute visualizations by one third when using one processor and reading data from local disk. The time needed when using one processor and reading data from remote disk decreased by up to 60%. Visualization runs using data from remote disk ran about as fast as ones using data from local disk because the remote runs were able to make use of the remote server's high performance disk array.
SCALCE: boosting sequence compression algorithms using locally consistent encoding.
Hach, Faraz; Numanagic, Ibrahim; Alkan, Can; Sahinalp, S Cenk
2012-12-01
The high throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for the computational infrastructure. Data management, storage and analysis have become major logistical obstacles for those adopting the new platforms. The requirement for large investment for this purpose almost signalled the end of the Sequence Read Archive hosted at the National Center for Biotechnology Information (NCBI), which holds most of the sequence data generated world wide. Currently, most HTS data are compressed through general purpose algorithms such as gzip. These algorithms are not designed for compressing data generated by the HTS platforms; for example, they do not take advantage of the specific nature of genomic sequence data, that is, limited alphabet size and high similarity among reads. Fast and efficient compression algorithms designed specifically for HTS data should be able to address some of the issues in data management, storage and communication. Such algorithms would also help with analysis provided they offer additional capabilities such as random access to any read and indexing for efficient sequence similarity search. Here we present SCALCE, a 'boosting' scheme based on Locally Consistent Parsing technique, which reorganizes the reads in a way that results in a higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome. Our tests indicate that SCALCE can improve the compression rate achieved through gzip by a factor of 4.19-when the goal is to compress the reads alone. In fact, on SCALCE reordered reads, gzip running time can improve by a factor of 15.06 on a standard PC with a single core and 6 GB memory. Interestingly even the running time of SCALCE + gzip improves that of gzip alone by a factor of 2.09. When compared with the recently published BEETL, which aims to sort the (inverted) reads in lexicographic order for improving bzip2, SCALCE + gzip provides up to 2.01 times better compression while improving the running time by a factor of 5.17. SCALCE also provides the option to compress the quality scores as well as the read names, in addition to the reads themselves. This is achieved by compressing the quality scores through order-3 Arithmetic Coding (AC) and the read names through gzip through the reordering SCALCE provides on the reads. This way, in comparison with gzip compression of the unordered FASTQ files (including reads, read names and quality scores), SCALCE (together with gzip and arithmetic encoding) can provide up to 3.34 improvement in the compression rate and 1.26 improvement in running time. Our algorithm, SCALCE (Sequence Compression Algorithm using Locally Consistent Encoding), is implemented in C++ with both gzip and bzip2 compression options. It also supports multithreading when gzip option is selected, and the pigz binary is available. It is available at http://scalce.sourceforge.net. fhach@cs.sfu.ca or cenk@cs.sfu.ca Supplementary data are available at Bioinformatics online.
ERIC Educational Resources Information Center
Izard, Véronique; O'Donnell, Evan; Spelke, Elizabeth S.
2014-01-01
Preschool children can navigate by simple geometric maps of the environment, but the nature of the geometric relations they use in map reading remains unclear. Here, children were tested specifically on their sensitivity to angle. Forty-eight children (age 47:15-53:30 months) were presented with fragments of geometric maps, in which angle sections…
Analogical Processes in Children's Understanding of Spatial Representations
ERIC Educational Resources Information Center
Yuan, Lei; Uttal, David; Gentner, Dedre
2017-01-01
We propose that map reading can be construed as a form of analogical mapping. We tested 2 predictions that follow from this claim: First, young children's patterns of performance in map reading tasks should parallel those found in analogical mapping tasks; and, second, children will benefit from guided alignment instructions that help them see the…
Wang, Mingjie; Ye, Yuzhen; Tang, Haixu
2012-06-01
The wide applications of next-generation sequencing (NGS) technologies in metagenomics have raised many computational challenges. One of the essential problems in metagenomics is to estimate the taxonomic composition of a microbial community, which can be approached by mapping shotgun reads acquired from the community to previously characterized microbial genomes followed by quantity profiling of these species based on the number of mapped reads. This procedure, however, is not as trivial as it appears at first glance. A shotgun metagenomic dataset often contains DNA sequences from many closely-related microbial species (e.g., within the same genus) or strains (e.g., within the same species), thus it is often difficult to determine which species/strain a specific read is sampled from when it can be mapped to a common region shared by multiple genomes at high similarity. Furthermore, high genomic variations are observed among individual genomes within the same species, which are difficult to be differentiated from the inter-species variations during reads mapping. To address these issues, a commonly used approach is to quantify taxonomic distribution only at the genus level, based on the reads mapped to all species belonging to the same genus; alternatively, reads are mapped to a set of representative genomes, each selected to represent a different genus. Here, we introduce a novel approach to the quantity estimation of closely-related species within the same genus by mapping the reads to their genomes represented by a de Bruijn graph, in which the common genomic regions among them are collapsed. Using simulated and real metagenomic datasets, we show the de Bruijn graph approach has several advantages over existing methods, including (1) it avoids redundant mapping of shotgun reads to multiple copies of the common regions in different genomes, and (2) it leads to more accurate quantification for the closely-related species (and even for strains within the same species).
ERIC Educational Resources Information Center
Zubaidah, Siti; Corebima, Aloysius Duran; Mahanal, Susriyati; Mistianah
2018-01-01
The aim of this research was to reveal the relationship between student's reading interest and critical thinking skills through Reading Concept Map Group Investigation (Remap GI) and Reading Concept Map Jigsaw (Remap Jigsaw) learning models. To do so, two science classes from first grade of two Senior High Schools in Malang, Indonesia were…
mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications.
Hach, Faraz; Sarrafi, Iman; Hormozdiari, Farhad; Alkan, Can; Eichler, Evan E; Sahinalp, S Cenk
2014-07-01
High throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for processing and downstream analysis. While tools that report the 'best' mapping location of each read provide a fast way to process HTS data, they are not suitable for many types of downstream analysis such as structural variation detection, where it is important to report multiple mapping loci for each read. For this purpose we introduce mrsFAST-Ultra, a fast, cache oblivious, SNP-aware aligner that can handle the multi-mapping of HTS reads very efficiently. mrsFAST-Ultra improves mrsFAST, our first cache oblivious read aligner capable of handling multi-mapping reads, through new and compact index structures that reduce not only the overall memory usage but also the number of CPU operations per alignment. In fact the size of the index generated by mrsFAST-Ultra is 10 times smaller than that of mrsFAST. As importantly, mrsFAST-Ultra introduces new features such as being able to (i) obtain the best mapping loci for each read, and (ii) return all reads that have at most n mapping loci (within an error threshold), together with these loci, for any user specified n. Furthermore, mrsFAST-Ultra is SNP-aware, i.e. it can map reads to reference genome while discounting the mismatches that occur at common SNP locations provided by db-SNP; this significantly increases the number of reads that can be mapped to the reference genome. Notice that all of the above features are implemented within the index structure and are not simple post-processing steps and thus are performed highly efficiently. Finally, mrsFAST-Ultra utilizes multiple available cores and processors and can be tuned for various memory settings. Our results show that mrsFAST-Ultra is roughly five times faster than its predecessor mrsFAST. In comparison to newly enhanced popular tools such as Bowtie2, it is more sensitive (it can report 10 times or more mappings per read) and much faster (six times or more) in the multi-mapping mode. Furthermore, mrsFAST-Ultra has an index size of 2GB for the entire human reference genome, which is roughly half of that of Bowtie2. mrsFAST-Ultra is open source and it can be accessed at http://mrsfast.sourceforge.net. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Tong, Xiuli; McBride, Catherine
2017-07-01
Following a review of contemporary models of word-level processing for reading and their limitations, we propose a new hypothetical model of Chinese character reading, namely, the graded lexical space mapping model that characterizes how sublexical radicals and lexical information are involved in Chinese character reading development. The underlying assumption of this model is that Chinese character recognition is a process of competitive mappings of phonology, semantics, and orthography in both lexical and sublexical systems, operating as functions of statistical properties of print input based on the individual's specific level of reading. This model leads to several testable predictions concerning how the quasiregularity and continuity of Chinese-specific radicals are organized in memory for both child and adult readers at different developmental stages of reading.
The Design and Implementation of a Read Prediction Buffer
1992-12-01
City, State, and ZIP Code) 7b ADDRESS (City, State. and ZIP Code) 8a. NAME OF FUNDING /SPONSORING 8b. OFFICE SYMBOL 9 PROCUREMENT INSTRUMENT... 9 E. THESIS STRUCTURE.. . .... ............... 9 II. READ PREDICTION ALGORITHM AND BUFFER DESIGN 10 A. THE READ PREDICTION ALGORITHM...29 Figure 9 . Basic Multiplexer Cell .... .......... .. 30 Figure 10. Block Diagram Simulation Labels ......... 38 viii I. INTRODUCTION A
High-fidelity projective read-out of a solid-state spin quantum register.
Robledo, Lucio; Childress, Lilian; Bernien, Hannes; Hensen, Bas; Alkemade, Paul F A; Hanson, Ronald
2011-09-21
Initialization and read-out of coupled quantum systems are essential ingredients for the implementation of quantum algorithms. Single-shot read-out of the state of a multi-quantum-bit (multi-qubit) register would allow direct investigation of quantum correlations (entanglement), and would give access to further key resources such as quantum error correction and deterministic quantum teleportation. Although spins in solids are attractive candidates for scalable quantum information processing, their single-shot detection has been achieved only for isolated qubits. Here we demonstrate the preparation and measurement of a multi-spin quantum register in a low-temperature solid-state system by implementing resonant optical excitation techniques originally developed in atomic physics. We achieve high-fidelity read-out of the electronic spin associated with a single nitrogen-vacancy centre in diamond, and use this read-out to project up to three nearby nuclear spin qubits onto a well-defined state. Conversely, we can distinguish the state of the nuclear spins in a single shot by mapping it onto, and subsequently measuring, the electronic spin. Finally, we show compatibility with qubit control: we demonstrate initialization, coherent manipulation and single-shot read-out in a single experiment on a two-qubit register, using techniques suitable for extension to larger registers. These results pave the way for a test of Bell's inequalities on solid-state spins and the implementation of measurement-based quantum information protocols. © 2011 Macmillan Publishers Limited. All rights reserved
Information-optimal genome assembly via sparse read-overlap graphs.
Shomorony, Ilan; Kim, Samuel H; Courtade, Thomas A; Tse, David N C
2016-09-01
In the context of third-generation long-read sequencing technologies, read-overlap-based approaches are expected to play a central role in the assembly step. A fundamental challenge in assembling from a read-overlap graph is that the true sequence corresponds to a Hamiltonian path on the graph, and, under most formulations, the assembly problem becomes NP-hard, restricting practical approaches to heuristics. In this work, we avoid this seemingly fundamental barrier by first setting the computational complexity issue aside, and seeking an algorithm that targets information limits In particular, we consider a basic feasibility question: when does the set of reads contain enough information to allow unambiguous reconstruction of the true sequence? Based on insights from this information feasibility question, we present an algorithm-the Not-So-Greedy algorithm-to construct a sparse read-overlap graph. Unlike most other assembly algorithms, Not-So-Greedy comes with a performance guarantee: whenever information feasibility conditions are satisfied, the algorithm reduces the assembly problem to an Eulerian path problem on the resulting graph, and can thus be solved in linear time. In practice, this theoretical guarantee translates into assemblies of higher quality. Evaluations on both simulated reads from real genomes and a PacBio Escherichia coli K12 dataset demonstrate that Not-So-Greedy compares favorably with standard string graph approaches in terms of accuracy of the resulting read-overlap graph and contig N50. Available at github.com/samhykim/nsg courtade@eecs.berkeley.edu or dntse@stanford.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Difficulties in the diagnosis of vertebral fracture in men: agreement between doctors.
Fechtenbaum, Jacques; Briot, Karine; Paternotte, Simon; Audran, Maurice; Breuil, Véronique; Cortet, Bernard; Debiais, Françoise; Grados, Franck; Guggenbuhl, Pascal; Laroche, Michel; Legrand, Erick; Lespessailles, Eric; Marcelli, Christian; Orcel, Philippe; Szulc, Pawel; Thomas, Thierry; Kolta, Sami; Roux, Christian
2014-03-01
The agreement for vertebral fracture (VF) diagnosis in men, between doctors is poor. To assess the agreement for VF diagnosis, in men, on standard radiographs, between experts, before and after consensual workshop and establishing an algorithm. The agreement between thirteen experimented rheumatologists has been calculated in thirty osteoporotic men. Then, the group discussed in a workshop and 28 other radiograph sets of osteoporotic men with follow-up radiographs and incident confirmed VF, have been reviewed. The experts identified and hierarchised 18 pathological features of vertebral deformation and established an algorithm of VF diagnosis. Eleven experts have realized a second reading of the first set of radiographs. We compared the agreement between the 2 readings without and with the algorithm. After consensus and the use of the algorithm the results are: number of fractured patients (with at least 1 VF) according to the experts varies from 13 to 26 patients out of 30 (13 to 28 during the first reading). The agreement between the experts at the patient level is 75% (70% at the first reading). Among the 390 vertebrae analyzed by the experts, the number of VF detected varies from 18 to 59 (18 to 98 at the first reading). The agreement between the experts at the vertebral level is 92% (89% at the first reading). The algorithm allows a good improvement of the agreement, especially for 8 of the 11 experts. Discrepancies for the VF diagnosis between experts exist. The algorithm improves the agreement. Copyright © 2013 Société française de rhumatologie. Published by Elsevier SAS. All rights reserved.
Construction of Cognitive Maps to Improve E-Book Reading and Navigation
ERIC Educational Resources Information Center
Li, Liang-Yi; Chen, Gwo-Dong; Yang, Sheng-Jie
2013-01-01
People have greater difficulty reading academic textbooks on screen than on paper. One notable problem is that they cannot construct an effective cognitive map because of the lack of contextual information cues and ineffective navigational mechanisms in e-books. To support the construction of cognitive maps, this paper proposes the visual cue map,…
Electrocardiograms with pacemakers: accuracy of computer reading.
Guglin, Maya E; Datwani, Neeta
2007-04-01
We analyzed the accuracy with which a computer algorithm reads electrocardiograms (ECGs) with electronic pacemakers (PMs). Electrocardiograms were screened for the presence of electronic pacing spikes. Computer-derived interpretations were compared with cardiologists' readings. Computer-drawn interpretations required revision by cardiologists in 61.3% of cases. In 18.4% of cases, the ECG reading algorithm failed to recognize the presence of a PM. The misinterpretation of paced beats as intrinsic beats led to multiple secondary errors, including myocardial infarctions in varying localization. The most common error in computer reading was the failure to identify an underlying rhythm. This error caused frequent misidentification of the PM type, especially when the presence of normal sinus rhythm was not recognized in a tracing with a DDD PM tracking the atrial activity. The increasing number of pacing devices, and the resulting number of ECGs with pacing spikes, mandates the refining of ECG reading algorithms. Improvement is especially needed in the recognition of the underlying rhythm, pacing spikes, and mode of pacing.
ERIC Educational Resources Information Center
Armbruster, Bonnie B.; Anderson, Thomas H.
Idea-mapping (i-mapping), a way of representing ideas from a text in the form of a diagram, is defined and illustrated in this document as a way to help students "see" how the ideas they read are linked to each other. The first portion of the document discusses the fundamental relationships found in texts (A is a characteristic of B, A…
Detection of microRNAs in color space.
Marco, Antonio; Griffiths-Jones, Sam
2012-02-01
Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color-space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3(') end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/ antonio.marco@manchester.ac.uk Supplementary data are available at Bioinformatics online.
SCALCE: boosting sequence compression algorithms using locally consistent encoding
Hach, Faraz; Numanagić, Ibrahim; Sahinalp, S Cenk
2012-01-01
Motivation: The high throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for the computational infrastructure. Data management, storage and analysis have become major logistical obstacles for those adopting the new platforms. The requirement for large investment for this purpose almost signalled the end of the Sequence Read Archive hosted at the National Center for Biotechnology Information (NCBI), which holds most of the sequence data generated world wide. Currently, most HTS data are compressed through general purpose algorithms such as gzip. These algorithms are not designed for compressing data generated by the HTS platforms; for example, they do not take advantage of the specific nature of genomic sequence data, that is, limited alphabet size and high similarity among reads. Fast and efficient compression algorithms designed specifically for HTS data should be able to address some of the issues in data management, storage and communication. Such algorithms would also help with analysis provided they offer additional capabilities such as random access to any read and indexing for efficient sequence similarity search. Here we present SCALCE, a ‘boosting’ scheme based on Locally Consistent Parsing technique, which reorganizes the reads in a way that results in a higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome. Results: Our tests indicate that SCALCE can improve the compression rate achieved through gzip by a factor of 4.19—when the goal is to compress the reads alone. In fact, on SCALCE reordered reads, gzip running time can improve by a factor of 15.06 on a standard PC with a single core and 6 GB memory. Interestingly even the running time of SCALCE + gzip improves that of gzip alone by a factor of 2.09. When compared with the recently published BEETL, which aims to sort the (inverted) reads in lexicographic order for improving bzip2, SCALCE + gzip provides up to 2.01 times better compression while improving the running time by a factor of 5.17. SCALCE also provides the option to compress the quality scores as well as the read names, in addition to the reads themselves. This is achieved by compressing the quality scores through order-3 Arithmetic Coding (AC) and the read names through gzip through the reordering SCALCE provides on the reads. This way, in comparison with gzip compression of the unordered FASTQ files (including reads, read names and quality scores), SCALCE (together with gzip and arithmetic encoding) can provide up to 3.34 improvement in the compression rate and 1.26 improvement in running time. Availability: Our algorithm, SCALCE (Sequence Compression Algorithm using Locally Consistent Encoding), is implemented in C++ with both gzip and bzip2 compression options. It also supports multithreading when gzip option is selected, and the pigz binary is available. It is available at http://scalce.sourceforge.net. Contact: fhach@cs.sfu.ca or cenk@cs.sfu.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23047557
Sparsity-constrained PET image reconstruction with learned dictionaries
NASA Astrophysics Data System (ADS)
Tang, Jing; Yang, Bao; Wang, Yanhua; Ying, Leslie
2016-09-01
PET imaging plays an important role in scientific and clinical measurement of biochemical and physiological processes. Model-based PET image reconstruction such as the iterative expectation maximization algorithm seeking the maximum likelihood solution leads to increased noise. The maximum a posteriori (MAP) estimate removes divergence at higher iterations. However, a conventional smoothing prior or a total-variation (TV) prior in a MAP reconstruction algorithm causes over smoothing or blocky artifacts in the reconstructed images. We propose to use dictionary learning (DL) based sparse signal representation in the formation of the prior for MAP PET image reconstruction. The dictionary to sparsify the PET images in the reconstruction process is learned from various training images including the corresponding MR structural image and a self-created hollow sphere. Using simulated and patient brain PET data with corresponding MR images, we study the performance of the DL-MAP algorithm and compare it quantitatively with a conventional MAP algorithm, a TV-MAP algorithm, and a patch-based algorithm. The DL-MAP algorithm achieves improved bias and contrast (or regional mean values) at comparable noise to what the other MAP algorithms acquire. The dictionary learned from the hollow sphere leads to similar results as the dictionary learned from the corresponding MR image. Achieving robust performance in various noise-level simulation and patient studies, the DL-MAP algorithm with a general dictionary demonstrates its potential in quantitative PET imaging.
Cognitive and artificial representations in handwriting recognition
NASA Astrophysics Data System (ADS)
Lenaghan, Andrew P.; Malyan, Ron
1996-03-01
Both cognitive processes and artificial recognition systems may be characterized by the forms of representation they build and manipulate. This paper looks at how handwriting is represented in current recognition systems and the psychological evidence for its representation in the cognitive processes responsible for reading. Empirical psychological work on feature extraction in early visual processing is surveyed to show that a sound psychological basis for feature extraction exists and to describe the features this approach leads to. The first stage of the development of an architecture for a handwriting recognition system which has been strongly influenced by the psychological evidence for the cognitive processes and representations used in early visual processing, is reported. This architecture builds a number of parallel low level feature maps from raw data. These feature maps are thresholded and a region labeling algorithm is used to generate sets of features. Fuzzy logic is used to quantify the uncertainty in the presence of individual features.
NASA Technical Reports Server (NTRS)
Banks, David C.
1994-01-01
This talk features two simple and useful tools for digital image processing in the UNIX environment. They are xv and pbmplus. The xv image viewer which runs under the X window system reads images in a number of different file formats and writes them out in different formats. The view area supports a pop-up control panel. The 'algorithms' menu lets you blur an image. The xv control panel also activates the color editor which displays the image's color map (if one exists). The xv image viewer is available through the internet. The pbmplus package is a set of tools designed to perform image processing from within a UNIX shell. The acronym 'pbm' stands for portable bit map. Like xv, the pbm plus tool can convert images from and to many different file formats. The source code and manual pages for pbmplus are also available through the internet. This software is in the public domain.
A teaching-learning sequence about weather map reading
NASA Astrophysics Data System (ADS)
Mandrikas, Achilleas; Stavrou, Dimitrios; Skordoulis, Constantine
2017-07-01
In this paper a teaching-learning sequence (TLS) introducing pre-service elementary teachers (PET) to weather map reading, with emphasis on wind assignment, is presented. The TLS includes activities about recognition of wind symbols, assignment of wind direction and wind speed on a weather map and identification of wind characteristics in a weather forecast. Sixty PET capabilities and difficulties in understanding weather maps were investigated, using inquiry-based learning activities. The results show that most PET became more capable of reading weather maps and assigning wind direction and speed on them. Our results also show that PET could be guided to understand meteorology concepts useful in everyday life and in teaching their future students.
Unified View of Backward Backtracking in Short Read Mapping
NASA Astrophysics Data System (ADS)
Mäkinen, Veli; Välimäki, Niko; Laaksonen, Antti; Katainen, Riku
Mapping short DNA reads to the reference genome is the core task in the recent high-throughput technologies to study e.g. protein-DNA interactions (ChIP-seq) and alternative splicing (RNA-seq). Several tools for the task (bowtie, bwa, SOAP2, TopHat) have been developed that exploit Burrows-Wheeler transform and the backward backtracking technique on it, to map the reads to their best approximate occurrences in the genome. These tools use different tailored mechanisms for small error-levels to prune the search phase significantly. We propose a new pruning mechanism that can be seen a generalization of the tailored mechanisms used so far. It uses a novel idea of storing all cyclic rotations of fixed length substrings of the reference sequence with a compressed index that is able to exploit the repetitions created to level out the growth of the input set. For RNA-seq we propose a new method that combines dynamic programming with backtracking to map efficiently and correctly all reads that span two exons. Same mechanism can also be used for mapping mate-pair reads.
De, Rajat K.
2015-01-01
Copy number variation (CNV) is a form of structural alteration in the mammalian DNA sequence, which are associated with many complex neurological diseases as well as cancer. The development of next generation sequencing (NGS) technology provides us a new dimension towards detection of genomic locations with copy number variations. Here we develop an algorithm for detecting CNVs, which is based on depth of coverage data generated by NGS technology. In this work, we have used a novel way to represent the read count data as a two dimensional geometrical point. A key aspect of detecting the regions with CNVs, is to devise a proper segmentation algorithm that will distinguish the genomic locations having a significant difference in read count data. We have designed a new segmentation approach in this context, using convex hull algorithm on the geometrical representation of read count data. To our knowledge, most algorithms have used a single distribution model of read count data, but here in our approach, we have considered the read count data to follow two different distribution models independently, which adds to the robustness of detection of CNVs. In addition, our algorithm calls CNVs based on the multiple sample analysis approach resulting in a low false discovery rate with high precision. PMID:26291322
Sinha, Rituparna; Samaddar, Sandip; De, Rajat K
2015-01-01
Copy number variation (CNV) is a form of structural alteration in the mammalian DNA sequence, which are associated with many complex neurological diseases as well as cancer. The development of next generation sequencing (NGS) technology provides us a new dimension towards detection of genomic locations with copy number variations. Here we develop an algorithm for detecting CNVs, which is based on depth of coverage data generated by NGS technology. In this work, we have used a novel way to represent the read count data as a two dimensional geometrical point. A key aspect of detecting the regions with CNVs, is to devise a proper segmentation algorithm that will distinguish the genomic locations having a significant difference in read count data. We have designed a new segmentation approach in this context, using convex hull algorithm on the geometrical representation of read count data. To our knowledge, most algorithms have used a single distribution model of read count data, but here in our approach, we have considered the read count data to follow two different distribution models independently, which adds to the robustness of detection of CNVs. In addition, our algorithm calls CNVs based on the multiple sample analysis approach resulting in a low false discovery rate with high precision.
Lucas Lledó, José Ignacio; Cáceres, Mario
2013-01-01
One of the most used techniques to study structural variation at a genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base-qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is very appropriate to detect a wide range of inversions, even with low coverage data. However, % of inversions located between segmental duplications are expected to go undetected by the most common sequencing strategies. In general, longer DNA libraries improve the detectability of inversions far better than increments of the coverage depth or the read length. Finally, we review the performance of three algorithms to detect inversions —SVDetect, GRIAL, and VariationHunter—, identify common pitfalls, and reveal important differences in their breakpoint precisions. These results stress the importance of the sequencing strategy for the detection of structural variants, especially inversions, and offer guidelines for the design of future genome sequencing projects. PMID:23637806
ERIC Educational Resources Information Center
Clements, Millard
1982-01-01
What should social studies teachers be trying to teach students how to do? Every culture provides its members with social "maps" that explain how things are--e.g., school materials, advertisements. Teaching students how to read these social "maps" should be the central task for social studies education. (RM)
A short note on dynamic programming in a band.
Gibrat, Jean-François
2018-06-15
Third generation sequencing technologies generate long reads that exhibit high error rates, in particular for insertions and deletions which are usually the most difficult errors to cope with. The only exact algorithm capable of aligning sequences with insertions and deletions is a dynamic programming algorithm. In this note, for the sake of efficiency, we consider dynamic programming in a band. We show how to choose the band width in function of the long reads' error rates, thus obtaining an [Formula: see text] algorithm in space and time. We also propose a procedure to decide whether this algorithm, when applied to semi-global alignments, provides the optimal score. We suggest that dynamic programming in a band is well suited to the problem of aligning long reads between themselves and can be used as a core component of methods for obtaining a consensus sequence from the long reads alone. The function implementing the dynamic programming algorithm in a band is available, as a standalone program, at: https://forgemia.inra.fr/jean-francois.gibrat/BAND_DYN_PROG.git.
Tramontano, A; Macchiato, M F
1986-01-01
An algorithm to determine the probability that a reading frame codifies for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of a translated reading frame. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or by the presence of translated reading frames on the complementary DNA strand. PMID:3753761
Algorithm for Identifying Erroneous Rain-Gauge Readings
NASA Technical Reports Server (NTRS)
Rickman, Doug
2005-01-01
An algorithm analyzes rain-gauge data to identify statistical outliers that could be deemed to be erroneous readings. Heretofore, analyses of this type have been performed in burdensome manual procedures that have involved subjective judgements. Sometimes, the analyses have included computational assistance for detecting values falling outside of arbitrary limits. The analyses have been performed without statistically valid knowledge of the spatial and temporal variations of precipitation within rain events. In contrast, the present algorithm makes it possible to automate such an analysis, makes the analysis objective, takes account of the spatial distribution of rain gauges in conjunction with the statistical nature of spatial variations in rainfall readings, and minimizes the use of arbitrary criteria. The algorithm implements an iterative process that involves nonparametric statistics.
Core Knowledge and the Emergence of Symbols: The Case of Maps
ERIC Educational Resources Information Center
Huang, Yi; Spelke, Elizabeth S.
2015-01-01
Map reading is unique to humans but is present in people of diverse cultures, at ages as young as 4 years old. Here, we explore the nature and sources of this ability and ask both what geometric information young children use in maps and what nonsymbolic systems are associated with their map-reading performance. Four-year-old children were given…
H-PoP and H-PoPG: heuristic partitioning algorithms for single individual haplotyping of polyploids.
Xie, Minzhu; Wu, Qiong; Wang, Jianxin; Jiang, Tao
2016-12-15
Some economically important plants including wheat and cotton have more than two copies of each chromosome. With the decreasing cost and increasing read length of next-generation sequencing technologies, reconstructing the multiple haplotypes of a polyploid genome from its sequence reads becomes practical. However, the computational challenge in polyploid haplotyping is much greater than that in diploid haplotyping, and there are few related methods. This article models the polyploid haplotyping problem as an optimal poly-partition problem of the reads, called the Polyploid Balanced Optimal Partition model. For the reads sequenced from a k-ploid genome, the model tries to divide the reads into k groups such that the difference between the reads of the same group is minimized while the difference between the reads of different groups is maximized. When the genotype information is available, the model is extended to the Polyploid Balanced Optimal Partition with Genotype constraint problem. These models are all NP-hard. We propose two heuristic algorithms, H-PoP and H-PoPG, based on dynamic programming and a strategy of limiting the number of intermediate solutions at each iteration, to solve the two models, respectively. Extensive experimental results on simulated and real data show that our algorithms can solve the models effectively, and are much faster and more accurate than the recent state-of-the-art polyploid haplotyping algorithms. The experiments also show that our algorithms can deal with long reads and deep read coverage effectively and accurately. Furthermore, H-PoP might be applied to help determine the ploidy of an organism. https://github.com/MinzhuXie/H-PoPG CONTACT: xieminzhu@hotmail.comSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
ERIC Educational Resources Information Center
Berry, Jaime; Potter, Jalene; Hollas, Victoria
2013-01-01
This quasi-experimental study compared the effects of concept mapping and teacher generated questioning on students' organization and retention of science knowledge when used along with interactive informational read-alouds. Fifty-eight third grade students completed an eight-day unit regarding "soil formation." Students who participated…
78 FR 7441 - Proposed Flood Hazard Determinations
Federal Register 2010, 2011, 2012, 2013, 2014
2013-02-01
... Online at:'' Link is corrected to read as follows: http://riskmap6.com/Community.aspx?cid=208&sid=4 3. On... ``Maps Available for Inspection Online at:'' Link is corrected to read as follows: http://riskmap6.com... Areas,'' the entry for the ``Maps Available for Inspection Online at:'' Link is corrected to read as...
Effects of Feedforward and Feedback Consistency on Reading and Spelling in Dyslexia
ERIC Educational Resources Information Center
Davies, Robert A. I.; Weekes, Brendan S.
2005-01-01
We investigated the effects of rime consistency on reading and spelling among dyslexic children and a group of matched reading age skilled readers by manipulating consistency of orthography-to-phonology (OP) mappings and consistency of mappings from phonology-to-orthography (PO). For both dyslexic and control children we found feedforward…
An optimized protocol for generation and analysis of Ion Proton sequencing reads for RNA-Seq.
Yuan, Yongxian; Xu, Huaiqian; Leung, Ross Ka-Kit
2016-05-26
Previous studies compared running cost, time and other performance measures of popular sequencing platforms. However, comprehensive assessment of library construction and analysis protocols for Proton sequencing platform remains unexplored. Unlike Illumina sequencing platforms, Proton reads are heterogeneous in length and quality. When sequencing data from different platforms are combined, this can result in reads with various read length. Whether the performance of the commonly used software for handling such kind of data is satisfactory is unknown. By using universal human reference RNA as the initial material, RNaseIII and chemical fragmentation methods in library construction showed similar result in gene and junction discovery number and expression level estimated accuracy. In contrast, sequencing quality, read length and the choice of software affected mapping rate to a much larger extent. Unspliced aligner TMAP attained the highest mapping rate (97.27 % to genome, 86.46 % to transcriptome), though 47.83 % of mapped reads were clipped. Long reads could paradoxically reduce mapping in junctions. With reference annotation guide, the mapping rate of TopHat2 significantly increased from 75.79 to 92.09 %, especially for long (>150 bp) reads. Sailfish, a k-mer based gene expression quantifier attained highly consistent results with that of TaqMan array and highest sensitivity. We provided for the first time, the reference statistics of library preparation methods, gene detection and quantification and junction discovery for RNA-Seq by the Ion Proton platform. Chemical fragmentation performed equally well with the enzyme-based one. The optimal Ion Proton sequencing options and analysis software have been evaluated.
A Teaching-Learning Sequence about Weather Map Reading
ERIC Educational Resources Information Center
Mandrikas, Achilleas; Stavrou, Dimitrios; Skordoulis, Constantine
2017-01-01
In this paper a teaching-learning sequence (TLS) introducing pre-service elementary teachers (PET) to weather map reading, with emphasis on wind assignment, is presented. The TLS includes activities about recognition of wind symbols, assignment of wind direction and wind speed on a weather map and identification of wind characteristics in a…
An Examination of the Effects of Argument Mapping on Students' Memory and Comprehension Performance
ERIC Educational Resources Information Center
Dwyer, Christopher P.; Hogan, Michael J.; Stewart, Ian
2013-01-01
Argument mapping (AM) is a method of visually diagramming arguments to allow for easy comprehension of core statements and relations. A series of three experiments compared argument map reading and construction with hierarchical outlining, text summarisation, and text reading as learning methods by examining subsequent memory and comprehension…
NASA Astrophysics Data System (ADS)
Donchenko, Sergey S.; Odinokov, Sergey B.; Betin, Alexandr U.; Hanevich, Pavel; Semishko, Sergey; Zlokazov, Evgenii Y.
2017-05-01
The holographic disk reading device for recovery of CGFH is described. Principle of its work is shown. Analyzed approaches for developing algorithms, used in this device: guidance and decoding. Listed results of experimental researches.
Arnold, David T; Rowen, Donna; Versteegh, Matthijs M; Morley, Anna; Hooper, Clare E; Maskell, Nicholas A
2015-01-23
In order to estimate utilities for cancer studies where the EQ-5D was not used, the EORTC QLQ-C30 can be used to estimate EQ-5D using existing mapping algorithms. Several mapping algorithms exist for this transformation, however, algorithms tend to lose accuracy in patients in poor health states. The aim of this study was to test all existing mapping algorithms of QLQ-C30 onto EQ-5D, in a dataset of patients with malignant pleural mesothelioma, an invariably fatal malignancy where no previous mapping estimation has been published. Health related quality of life (HRQoL) data where both the EQ-5D and QLQ-C30 were used simultaneously was obtained from the UK-based prospective observational SWAMP (South West Area Mesothelioma and Pemetrexed) trial. In the original trial 73 patients with pleural mesothelioma were offered palliative chemotherapy and their HRQoL was assessed across five time points. This data was used to test the nine available mapping algorithms found in the literature, comparing predicted against observed EQ-5D values. The ability of algorithms to predict the mean, minimise error and detect clinically significant differences was assessed. The dataset had a total of 250 observations across 5 timepoints. The linear regression mapping algorithms tested generally performed poorly, over-estimating the predicted compared to observed EQ-5D values, especially when observed EQ-5D was below 0.5. The best performing algorithm used a response mapping method and predicted the mean EQ-5D with accuracy with an average root mean squared error of 0.17 (Standard Deviation; 0.22). This algorithm reliably discriminated between clinically distinct subgroups seen in the primary dataset. This study tested mapping algorithms in a population with poor health states, where they have been previously shown to perform poorly. Further research into EQ-5D estimation should be directed at response mapping methods given its superior performance in this study.
ERIC Educational Resources Information Center
Faramarzi, Salar; Moradi, Mohammadreza; Abedi, Ahmad
2018-01-01
The present study aimed to develop the thinking maps training package and compare its training effect with the thinking maps method on the reading performance of second and fifth grade of elementary school male dyslexic students. For this mixed method exploratory study, from among the above mentioned grades' students in Isfahan, 90 students who…
A Novel Color Image Encryption Algorithm Based on Quantum Chaos Sequence
NASA Astrophysics Data System (ADS)
Liu, Hui; Jin, Cong
2017-03-01
In this paper, a novel algorithm of image encryption based on quantum chaotic is proposed. The keystreams are generated by the two-dimensional logistic map as initial conditions and parameters. And then general Arnold scrambling algorithm with keys is exploited to permute the pixels of color components. In diffusion process, a novel encryption algorithm, folding algorithm, is proposed to modify the value of diffused pixels. In order to get the high randomness and complexity, the two-dimensional logistic map and quantum chaotic map are coupled with nearest-neighboring coupled-map lattices. Theoretical analyses and computer simulations confirm that the proposed algorithm has high level of security.
State-of-the-Art Fusion-Finder Algorithms Sensitivity and Specificity
Carrara, Matteo; Beccuti, Marco; Lazzarato, Fulvio; Cavallo, Federica; Cordero, Francesca; Donatelli, Susanna; Calogero, Raffaele A.
2013-01-01
Background. Gene fusions arising from chromosomal translocations have been implicated in cancer. RNA-seq has the potential to discover such rearrangements generating functional proteins (chimera/fusion). Recently, many methods for chimeras detection have been published. However, specificity and sensitivity of those tools were not extensively investigated in a comparative way. Results. We tested eight fusion-detection tools (FusionHunter, FusionMap, FusionFinder, MapSplice, deFuse, Bellerophontes, ChimeraScan, and TopHat-fusion) to detect fusion events using synthetic and real datasets encompassing chimeras. The comparison analysis run only on synthetic data could generate misleading results since we found no counterpart on real dataset. Furthermore, most tools report a very high number of false positive chimeras. In particular, the most sensitive tool, ChimeraScan, reports a large number of false positives that we were able to significantly reduce by devising and applying two filters to remove fusions not supported by fusion junction-spanning reads or encompassing large intronic regions. Conclusions. The discordant results obtained using synthetic and real datasets suggest that synthetic datasets encompassing fusion events may not fully catch the complexity of RNA-seq experiment. Moreover, fusion detection tools are still limited in sensitivity or specificity; thus, there is space for further improvement in the fusion-finder algorithms. PMID:23555082
USDA-ARS?s Scientific Manuscript database
Assembled sequence contigs by SOAPdenova and Volvet algorithms from metagenomic short reads of a new bacterial isolate of gut origin. This study included 2 submissions with a total of 9.8 million bp of assembled contigs....
Liu, Jiemeng; Wang, Haifeng; Yang, Hongxing; Zhang, Yizhe; Wang, Jinfeng; Zhao, Fangqing; Qi, Ji
2013-01-01
Compared with traditional algorithms for long metagenomic sequence classification, characterizing microorganisms’ taxonomic and functional abundance based on tens of millions of very short reads are much more challenging. We describe an efficient composition and phylogeny-based algorithm [Metagenome Composition Vector (MetaCV)] to classify very short metagenomic reads (75–100 bp) into specific taxonomic and functional groups. We applied MetaCV to the Meta-HIT data (371-Gb 75-bp reads of 109 human gut metagenomes), and this single-read-based, instead of assembly-based, classification has a high resolution to characterize the composition and structure of human gut microbiota, especially for low abundance species. Most strikingly, it only took MetaCV 10 days to do all the computation work on a server with five 24-core nodes. To our knowledge, MetaCV, benefited from the strategy of composition comparison, is the first algorithm that can classify millions of very short reads within affordable time. PMID:22941634
ERIC Educational Resources Information Center
Riahi, Zahra; Pourdana, Natasha
2017-01-01
The present study attempted to investigate the possible impacts of Individual Concept Mapping (ICM) and Collaborative Concept Mapping (CCM) strategies on Iranian EFL learners' reading comprehension. For this purpose, 90 pre-intermediate female language learners ranged between 12 to 17 years of age were selected to randomly assign into ICM, CCM and…
Giese, Sven H; Zickmann, Franziska; Renard, Bernhard Y
2014-01-01
Accurate estimation, comparison and evaluation of read mapping error rates is a crucial step in the processing of next-generation sequencing data, as further analysis steps and interpretation assume the correctness of the mapping results. Current approaches are either focused on sensitivity estimation and thereby disregard specificity or are based on read simulations. Although continuously improving, read simulations are still prone to introduce a bias into the mapping error quantitation and cannot capture all characteristics of an individual dataset. We introduce ARDEN (artificial reference driven estimation of false positives in next-generation sequencing data), a novel benchmark method that estimates error rates of read mappers based on real experimental reads, using an additionally generated artificial reference genome. It allows a dataset-specific computation of error rates and the construction of a receiver operating characteristic curve. Thereby, it can be used for optimization of parameters for read mappers, selection of read mappers for a specific problem or for filtering alignments based on quality estimation. The use of ARDEN is demonstrated in a general read mapper comparison, a parameter optimization for one read mapper and an application example in single-nucleotide polymorphism discovery with a significant reduction in the number of false positive identifications. The ARDEN source code is freely available at http://sourceforge.net/projects/arden/.
A Model Critical Reading Lesson for Secondary High-Risk Students.
ERIC Educational Resources Information Center
Haney, Gail; Thistlethwaite, Linda
1991-01-01
This article defines critical reading, discusses associated frameworks, and lists considerations for choosing topics and reading materials. A sample critical reading lesson using a "mapping" approach with a reading on euthanasia demonstrates guiding secondary learning-disabled students in critical reading. (DB)
Zseq: An Approach for Preprocessing Next-Generation Sequencing Data.
Alkhateeb, Abedalrhman; Rueda, Luis
2017-08-01
Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts. Preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq finds the complexity of the sequences by counting the number of unique k-mers in each sequence as its corresponding score and also takes into the account other factors such as ambiguous nucleotides or high GC-content percentage in k-mers. Based on a z-score threshold, Zseq sweeps through the sequences again and filters those with a z-score less than the user-defined threshold. Zseq algorithm is able to provide a better mapping rate; it reduces the number of ambiguous bases significantly in comparison with other methods. Evaluation of the filtered reads has been conducted by aligning the reads and assembling the transcripts using the reference genome as well as de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Moreover, de novo assembled transcripts from the reads filtered by Zseq have longer genomic sequences than other tested methods. Estimating the threshold of the cutoff point is introduced using labeling rules with optimistic results.
Plasmid mapping computer program.
Nolan, G P; Maina, C V; Szalay, A A
1984-01-01
Three new computer algorithms are described which rapidly order the restriction fragments of a plasmid DNA which has been cleaved with two restriction endonucleases in single and double digestions. Two of the algorithms are contained within a single computer program (called MPCIRC). The Rule-Oriented algorithm, constructs all logical circular map solutions within sixty seconds (14 double-digestion fragments) when used in conjunction with the Permutation method. The program is written in Apple Pascal and runs on an Apple II Plus Microcomputer with 64K of memory. A third algorithm is described which rapidly maps double digests and uses the above two algorithms as adducts. Modifications of the algorithms for linear mapping are also presented. PMID:6320105
2015-01-01
We present and discuss philosophy and methodology of chaotic evolution that is theoretically supported by chaos theory. We introduce four chaotic systems, that is, logistic map, tent map, Gaussian map, and Hénon map, in a well-designed chaotic evolution algorithm framework to implement several chaotic evolution (CE) algorithms. By comparing our previous proposed CE algorithm with logistic map and two canonical differential evolution (DE) algorithms, we analyse and discuss optimization performance of CE algorithm. An investigation on the relationship between optimization capability of CE algorithm and distribution characteristic of chaotic system is conducted and analysed. From evaluation result, we find that distribution of chaotic system is an essential factor to influence optimization performance of CE algorithm. We propose a new interactive EC (IEC) algorithm, interactive chaotic evolution (ICE) that replaces fitness function with a real human in CE algorithm framework. There is a paired comparison-based mechanism behind CE search scheme in nature. A simulation experimental evaluation is conducted with a pseudo-IEC user to evaluate our proposed ICE algorithm. The evaluation result indicates that ICE algorithm can obtain a significant better performance than or the same performance as interactive DE. Some open topics on CE, ICE, fusion of these optimization techniques, algorithmic notation, and others are presented and discussed. PMID:25879067
Pei, Yan
2015-01-01
We present and discuss philosophy and methodology of chaotic evolution that is theoretically supported by chaos theory. We introduce four chaotic systems, that is, logistic map, tent map, Gaussian map, and Hénon map, in a well-designed chaotic evolution algorithm framework to implement several chaotic evolution (CE) algorithms. By comparing our previous proposed CE algorithm with logistic map and two canonical differential evolution (DE) algorithms, we analyse and discuss optimization performance of CE algorithm. An investigation on the relationship between optimization capability of CE algorithm and distribution characteristic of chaotic system is conducted and analysed. From evaluation result, we find that distribution of chaotic system is an essential factor to influence optimization performance of CE algorithm. We propose a new interactive EC (IEC) algorithm, interactive chaotic evolution (ICE) that replaces fitness function with a real human in CE algorithm framework. There is a paired comparison-based mechanism behind CE search scheme in nature. A simulation experimental evaluation is conducted with a pseudo-IEC user to evaluate our proposed ICE algorithm. The evaluation result indicates that ICE algorithm can obtain a significant better performance than or the same performance as interactive DE. Some open topics on CE, ICE, fusion of these optimization techniques, algorithmic notation, and others are presented and discussed.
Fault-tolerant processing system
NASA Technical Reports Server (NTRS)
Palumbo, Daniel L. (Inventor)
1996-01-01
A fault-tolerant, fiber optic interconnect, or backplane, which serves as a via for data transfer between modules. Fault tolerance algorithms are embedded in the backplane by dividing the backplane into a read bus and a write bus and placing a redundancy management unit (RMU) between the read bus and the write bus so that all data transmitted by the write bus is subjected to the fault tolerance algorithms before the data is passed for distribution to the read bus. The RMU provides both backplane control and fault tolerance.
Concept Mapping as a Reading Strategy: Does It Scaffold Comprehension and Recall?
ERIC Educational Resources Information Center
Tajeddin, Zia; Tabatabaei, Soudabeh
2016-01-01
Concept maps reflect the linkage of concepts or facts within a text. This study was set out to investigate whether concept mapping as a learning strategy would have any scaffolding effect on the reading comprehension and recall of propositions by L2 learners. Out of 60 high school students, 30 in the experimental group were exposed to concept…
ERIC Educational Resources Information Center
Alturki, Nada
2017-01-01
The purpose of this study was to examine the effectiveness of using group story-mapping of English as a second language (ESL) on students with learning disability while reading comprehension. The researcher focused on a specific graphic organizer in this study, called group story-mapping. This strategy required students with learning disabilities…
The Impact of Electronic Mind Maps on Students' Reading Comprehension
ERIC Educational Resources Information Center
Mohaidat, Mohammad Mahmoud Talal
2018-01-01
This study aimed to investigate the impact of the electronic mind map (IMindMap) on the development of reading comprehension among the ninth grade students in Jordan. The sample of the study consisted of two ninth grade sections from two public schools in Irbid First Directorate during the academic 2016-2017. Each section consisted of (30)…
Map reading tools for map libraries.
Greenberg, G.L.
1982-01-01
Engineers, navigators and military strategists employ a broad array of mechanical devices to facilitate map use. A larger number of map users such as educators, students, tourists, journalists, historians, politicians, economists and librarians are unaware of the available variety of tools which can be used with maps to increase the speed and efficiency of their application and interpretation. This paper identifies map reading tools such as coordinate readers, protractors, dividers, planimeters, and symbol-templets according to a functional classification. Particularly, arrays of tools are suggested for use in determining position, direction, distance, area and form (perimeter-shape-pattern-relief). -from Author
NASA Technical Reports Server (NTRS)
Lee, C. S. G.; Chen, C. L.
1989-01-01
Two efficient mapping algorithms for scheduling the robot inverse dynamics computation consisting of m computational modules with precedence relationship to be executed on a multiprocessor system consisting of p identical homogeneous processors with processor and communication costs to achieve minimum computation time are presented. An objective function is defined in terms of the sum of the processor finishing time and the interprocessor communication time. The minimax optimization is performed on the objective function to obtain the best mapping. This mapping problem can be formulated as a combination of the graph partitioning and the scheduling problems; both have been known to be NP-complete. Thus, to speed up the searching for a solution, two heuristic algorithms were proposed to obtain fast but suboptimal mapping solutions. The first algorithm utilizes the level and the communication intensity of the task modules to construct an ordered priority list of ready modules and the module assignment is performed by a weighted bipartite matching algorithm. For a near-optimal mapping solution, the problem can be solved by the heuristic algorithm with simulated annealing. These proposed optimization algorithms can solve various large-scale problems within a reasonable time. Computer simulations were performed to evaluate and verify the performance and the validity of the proposed mapping algorithms. Finally, experiments for computing the inverse dynamics of a six-jointed PUMA-like manipulator based on the Newton-Euler dynamic equations were implemented on an NCUBE/ten hypercube computer to verify the proposed mapping algorithms. Computer simulation and experimental results are compared and discussed.
2012-01-01
Background Structured association mapping is proving to be a powerful strategy to find genetic polymorphisms associated with disease. However, these algorithms are often distributed as command line implementations that require expertise and effort to customize and put into practice. Because of the difficulty required to use these cutting-edge techniques, geneticists often revert to simpler, less powerful methods. Results To make structured association mapping more accessible to geneticists, we have developed an automatic processing system called Auto-SAM. Auto-SAM enables geneticists to run structured association mapping algorithms automatically, using parallelization. Auto-SAM includes algorithms to discover gene-networks and find population structure. Auto-SAM can also run popular association mapping algorithms, in addition to five structured association mapping algorithms. Conclusions Auto-SAM is available through GenAMap, a front-end desktop visualization tool. GenAMap and Auto-SAM are implemented in JAVA; binaries for GenAMap can be downloaded from http://sailing.cs.cmu.edu/genamap. PMID:22471660
An Algorithm Enabling Blind Users to Find and Read Barcodes
Tekin, Ender; Coughlan, James M.
2010-01-01
Most camera-based systems for finding and reading barcodes are designed to be used by sighted users (e.g. the Red Laser iPhone app), and assume the user carefully centers the barcode in the image before the barcode is read. Blind individuals could benefit greatly from such systems to identify packaged goods (such as canned goods in a supermarket), but unfortunately in their current form these systems are completely inaccessible because of their reliance on visual feedback from the user. To remedy this problem, we propose a computer vision algorithm that processes several frames of video per second to detect barcodes from a distance of several inches; the algorithm issues directional information with audio feedback (e.g. “left,” “right”) and thereby guides a blind user holding a webcam or other portable camera to locate and home in on a barcode. Once the barcode is detected at sufficiently close range, a barcode reading algorithm previously developed by the authors scans and reads aloud the barcode and the corresponding product information. We demonstrate encouraging experimental results of our proposed system implemented on a desktop computer with a webcam held by a blindfolded user; ultimately the system will be ported to a camera phone for use by visually impaired users. PMID:20617114
Izard, Véronique; O'Donnell, Evan; Spelke, Elizabeth S
2014-01-01
Preschool children can navigate by simple geometric maps of the environment, but the nature of the geometric relations they use in map reading remains unclear. Here, children were tested specifically on their sensitivity to angle. Forty-eight children (age 47:15-53:30 months) were presented with fragments of geometric maps, in which angle sections appeared without any relevant length or distance information. Children were able to read these map fragments and compare two-dimensional to three-dimensional angles. However, this ability appeared both variable and fragile among the youngest children of the sample. These findings suggest that 4-year-old children begin to form an abstract concept of angle that applies both to two-dimensional and three-dimensional displays and that serves to interpret novel spatial symbols. © 2013 The Authors. Child Development © 2013 Society for Research in Child Development, Inc.
A new image enhancement algorithm with applications to forestry stand mapping
NASA Technical Reports Server (NTRS)
Kan, E. P. F. (Principal Investigator); Lo, J. K.
1975-01-01
The author has identified the following significant results. Results show that the new algorithm produced cleaner classification maps in which holes of small predesignated sizes were eliminated and significant boundary information was preserved. These cleaner post-processed maps better resemble true life timber stand maps and are thus more usable products than the pre-post-processing ones: Compared to an accepted neighbor-checking post-processing technique, the new algorithm is more appropriate for timber stand mapping.
Axe: rapid, competitive sequence read demultiplexing using a trie.
Murray, Kevin D; Borevitz, Justin O
2018-06-01
We describe a rapid algorithm for demultiplexing DNA sequence reads with in-read indices. Axe selects the optimal index present in a sequence read, even in the presence of sequencing errors. The algorithm is able to handle combinatorial indexing, indices of differing length, and several mismatches per index sequence. Axe is implemented in C, and is used as a command-line program on Unix-like systems. Axe is available online at https://github.com/kdmurray91/axe, and is available in Debian/Ubuntu distributions of GNU/Linux as the package axe-demultiplexer. Kevin Murray axe@kdmurray.id.au. Supplementary data are available at Bioinformatics online.
Ribeiro, Antonio; Golicz, Agnieszka; Hackett, Christine Anne; Milne, Iain; Stephen, Gordon; Marshall, David; Flavell, Andrew J; Bayer, Micha
2015-11-11
Single Nucleotide Polymorphisms (SNPs) are widely used molecular markers, and their use has increased massively since the inception of Next Generation Sequencing (NGS) technologies, which allow detection of large numbers of SNPs at low cost. However, both NGS data and their analysis are error-prone, which can lead to the generation of false positive (FP) SNPs. We explored the relationship between FP SNPs and seven factors involved in mapping-based variant calling - quality of the reference sequence, read length, choice of mapper and variant caller, mapping stringency and filtering of SNPs by read mapping quality and read depth. This resulted in 576 possible factor level combinations. We used error- and variant-free simulated reads to ensure that every SNP found was indeed a false positive. The variation in the number of FP SNPs generated ranged from 0 to 36,621 for the 120 million base pairs (Mbp) genome. All of the experimental factors tested had statistically significant effects on the number of FP SNPs generated and there was a considerable amount of interaction between the different factors. Using a fragmented reference sequence led to a dramatic increase in the number of FP SNPs generated, as did relaxed read mapping and a lack of SNP filtering. The choice of reference assembler, mapper and variant caller also significantly affected the outcome. The effect of read length was more complex and suggests a possible interaction between mapping specificity and the potential for contributing more false positives as read length increases. The choice of tools and parameters involved in variant calling can have a dramatic effect on the number of FP SNPs produced, with particularly poor combinations of software and/or parameter settings yielding tens of thousands in this experiment. Between-factor interactions make simple recommendations difficult for a SNP discovery pipeline but the quality of the reference sequence is clearly of paramount importance. Our findings are also a stark reminder that it can be unwise to use the relaxed mismatch settings provided as defaults by some read mappers when reads are being mapped to a relatively unfinished reference sequence from e.g. a non-model organism in its early stages of genomic exploration.
Mapping chemicals in air using an environmental CAT scanning system: evaluation of algorithms
NASA Astrophysics Data System (ADS)
Samanta, A.; Todd, L. A.
A new technique is being developed which creates near real-time maps of chemical concentrations in air for environmental and occupational environmental applications. This technique, we call Environmental CAT Scanning, combines the real-time measuring technique of open-path Fourier transform infrared spectroscopy with the mapping capabilitites of computed tomography to produce two-dimensional concentration maps. With this system, a network of open-path measurements is obtained over an area; measurements are then processed using a tomographic algorithm to reconstruct the concentrations. This research focussed on the process of evaluating and selecting appropriate reconstruction algorithms, for use in the field, by using test concentration data from both computer simultation and laboratory chamber studies. Four algorithms were tested using three types of data: (1) experimental open-path data from studies that used a prototype opne-path Fourier transform/computed tomography system in an exposure chamber; (2) synthetic open-path data generated from maps created by kriging point samples taken in the chamber studies (in 1), and; (3) synthetic open-path data generated using a chemical dispersion model to create time seires maps. The iterative algorithms used to reconstruct the concentration data were: Algebraic Reconstruction Technique without Weights (ART1), Algebraic Reconstruction Technique with Weights (ARTW), Maximum Likelihood with Expectation Maximization (MLEM) and Multiplicative Algebraic Reconstruction Technique (MART). Maps were evaluated quantitatively and qualitatively. In general, MART and MLEM performed best, followed by ARTW and ART1. However, algorithm performance varied under different contaminant scenarios. This study showed the importance of using a variety of maps, particulary those generated using dispersion models. The time series maps provided a more rigorous test of the algorithms and allowed distinctions to be made among the algorithms. A comprehensive evaluation of algorithms, for the environmental application of tomography, requires the use of a battery of test concentration data before field implementation, which models reality and tests the limits of the algorithms.
Electronic Thermometer Readings
NASA Technical Reports Server (NTRS)
2001-01-01
NASA Stennis' adaptive predictive algorithm for electronic thermometers uses sample readings during the initial rise in temperature and applies an algorithm that accurately and rapidly predicts the steady state temperature. The final steady state temperature of an object can be calculated based on the second-order logarithm of the temperature signals acquired by the sensor and predetermined variables from the sensor characteristics. These variables are calculated during tests of the sensor. Once the variables are determined, relatively little data acquisition and data processing time by the algorithm is required to provide a near-accurate approximation of the final temperature. This reduces the delay in the steady state response time of a temperature sensor. This advanced algorithm can be implemented in existing software or hardware with an erasable programmable read-only memory (EPROM). The capability for easy integration eliminates the expense of developing a whole new system that offers the benefits provided by NASA Stennis' technology.
Lee, Hayan; Schatz, Michael C
2012-08-15
Genome resequencing and short read mapping are two of the primary tools of genomics and are used for many important applications. The current state-of-the-art in mapping uses the quality values and mapping quality scores to evaluate the reliability of the mapping. These attributes, however, are assigned to individual reads and do not directly measure the problematic repeats across the genome. Here, we present the Genome Mappability Score (GMS) as a novel measure of the complexity of resequencing a genome. The GMS is a weighted probability that any read could be unambiguously mapped to a given position and thus measures the overall composition of the genome itself. We have developed the Genome Mappability Analyzer to compute the GMS of every position in a genome. It leverages the parallelism of cloud computing to analyze large genomes, and enabled us to identify the 5-14% of the human, mouse, fly and yeast genomes that are difficult to analyze with short reads. We examined the accuracy of the widely used BWA/SAMtools polymorphism discovery pipeline in the context of the GMS, and found discovery errors are dominated by false negatives, especially in regions with poor GMS. These errors are fundamental to the mapping process and cannot be overcome by increasing coverage. As such, the GMS should be considered in every resequencing project to pinpoint the 'dark matter' of the genome, including of known clinically relevant variations in these regions. The source code and profiles of several model organisms are available at http://gma-bio.sourceforge.net
2011-01-01
Background Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires an accurate identification of the different types of variation in individual genomes. Results We report the integration of the whole genome sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of resequencing SNP detection was assessed by combining SNPs that were identified to be either in identity by descent (IBD) or in copy number variation (CNV) with results from SNP array genotyping. Coding insertions and deletions (indels) were found to be enriched for size in multiples of 3 and were located near the N- and C-termini of proteins. For larger indels, a combination of split-read and read-pair approaches proved to be complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions Our results provide high resolution mapping of diverse classes of genomic variation in an individual bovine genome and demonstrate that structural variation surpasses sequence variation as the main component of genomic variability. Better accuracy of SNP detection was achieved with little loss of sensitivity when algorithms that implemented mapping quality were used. IBD regions were found to be instrumental for calculating resequencing SNP accuracy, while SNP detection within CNVs tended to be less reliable. CNV discovery was affected dramatically by platform resolution and coverage biases. The combined data for this study showed that at a moderate level of sequencing coverage, an ensemble of platforms and tools can be applied together to maximize the accurate detection of sequence and structural variants. PMID:22082336
Wright, Imogen A.; Travers, Simon A.
2014-01-01
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
GPU-BSM: A GPU-Based Tool to Map Bisulfite-Treated Reads
Manconi, Andrea; Orro, Alessandro; Manca, Emanuele; Armano, Giuliano; Milanesi, Luciano
2014-01-01
Cytosine DNA methylation is an epigenetic mark implicated in several biological processes. Bisulfite treatment of DNA is acknowledged as the gold standard technique to study methylation. This technique introduces changes in the genomic DNA by converting cytosines to uracils while 5-methylcytosines remain nonreactive. During PCR amplification 5-methylcytosines are amplified as cytosine, whereas uracils and thymines as thymine. To detect the methylation levels, reads treated with the bisulfite must be aligned against a reference genome. Mapping these reads to a reference genome represents a significant computational challenge mainly due to the increased search space and the loss of information introduced by the treatment. To deal with this computational challenge we devised GPU-BSM, a tool based on modern Graphics Processing Units. Graphics Processing Units are hardware accelerators that are increasingly being used successfully to accelerate general-purpose scientific applications. GPU-BSM is a tool able to map bisulfite-treated reads from whole genome bisulfite sequencing and reduced representation bisulfite sequencing, and to estimate methylation levels, with the goal of detecting methylation. Due to the massive parallelization obtained by exploiting graphics cards, GPU-BSM aligns bisulfite-treated reads faster than other cutting-edge solutions, while outperforming most of them in terms of unique mapped reads. PMID:24842718
Numerical Conformal Mapping Using Cross-Ratios and Delaunay Triangulation
NASA Technical Reports Server (NTRS)
Driscoll, Tobin A.; Vavasis, Stephen A.
1996-01-01
We propose a new algorithm for computing the Riemann mapping of the unit disk to a polygon, also known as the Schwarz-Christoffel transformation. The new algorithm, CRDT, is based on cross-ratios of the prevertices, and also on cross-ratios of quadrilaterals in a Delaunay triangulation of the polygon. The CRDT algorithm produces an accurate representation of the Riemann mapping even in the presence of arbitrary long, thin regions in the polygon, unlike any previous conformal mapping algorithm. We believe that CRDT can never fail to converge to the correct Riemann mapping, but the correctness and convergence proof depend on conjectures that we have so far not been able to prove. We demonstrate convergence with computational experiments. The Riemann mapping has applications to problems in two-dimensional potential theory and to finite-difference mesh generation. We use CRDT to produce a mapping and solve a boundary value problem on long, thin regions for which no other algorithm can solve these problems.
New segmentation-based tone mapping algorithm for high dynamic range image
NASA Astrophysics Data System (ADS)
Duan, Weiwei; Guo, Huinan; Zhou, Zuofeng; Huang, Huimin; Cao, Jianzhong
2017-07-01
The traditional tone mapping algorithm for the display of high dynamic range (HDR) image has the drawback of losing the impression of brightness, contrast and color information. To overcome this phenomenon, we propose a new tone mapping algorithm based on dividing the image into different exposure regions in this paper. Firstly, the over-exposure region is determined using the Local Binary Pattern information of HDR image. Then, based on the peak and average gray of the histogram, the under-exposure and normal-exposure region of HDR image are selected separately. Finally, the different exposure regions are mapped by differentiated tone mapping methods to get the final result. The experiment results show that the proposed algorithm achieve the better performance both in visual quality and objective contrast criterion than other algorithms.
Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shi, Xuanhua; Luo, Xuan; Liang, Junling
GPUs have been increasingly used to accelerate graph processing for complicated computational problems regarding graph theory. Many parallel graph algorithms adopt the asynchronous computing model to accelerate the iterative convergence. Unfortunately, the consistent asynchronous computing requires locking or atomic operations, leading to significant penalties/overheads when implemented on GPUs. As such, coloring algorithm is adopted to separate the vertices with potential updating conflicts, guaranteeing the consistency/correctness of the parallel processing. Common coloring algorithms, however, may suffer from low parallelism because of a large number of colors generally required for processing a large-scale graph with billions of vertices. We propose a light-weightmore » asynchronous processing framework called Frog with a preprocessing/hybrid coloring model. The fundamental idea is based on Pareto principle (or 80-20 rule) about coloring algorithms as we observed through masses of realworld graph coloring cases. We find that a majority of vertices (about 80%) are colored with only a few colors, such that they can be read and updated in a very high degree of parallelism without violating the sequential consistency. Accordingly, our solution separates the processing of the vertices based on the distribution of colors. In this work, we mainly answer three questions: (1) how to partition the vertices in a sparse graph with maximized parallelism, (2) how to process large-scale graphs that cannot fit into GPU memory, and (3) how to reduce the overhead of data transfers on PCIe while processing each partition. We conduct experiments on real-world data (Amazon, DBLP, YouTube, RoadNet-CA, WikiTalk and Twitter) to evaluate our approach and make comparisons with well-known non-preprocessed (such as Totem, Medusa, MapGraph and Gunrock) and preprocessed (Cusha) approaches, by testing four classical algorithms (BFS, PageRank, SSSP and CC). On all the tested applications and datasets, Frog is able to significantly outperform existing GPU-based graph processing systems except Gunrock and MapGraph. MapGraph gets better performance than Frog when running BFS on RoadNet-CA. The comparison between Gunrock and Frog is inconclusive. Frog can outperform Gunrock more than 1.04X when running PageRank and SSSP, while the advantage of Frog is not obvious when running BFS and CC on some datasets especially for RoadNet-CA.« less
Parallel processing considerations for image recognition tasks
NASA Astrophysics Data System (ADS)
Simske, Steven J.
2011-01-01
Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images. From this standpoint, then, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows-as diverse as optical character recognition [OCR], document classification and barcode reading-to parallel pipelines. This can substantially decrease time to completion for the document tasks. For this approach, each parallel pipeline is generally performing a different task. Parallel processing by image region allows a larger imaging task to be sub-divided into a set of parallel pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a map-reduce approach. Examples include document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously. This approach may result in improved accuracy.
Differing effects of two synthetic phonics programmes on early reading development.
Shapiro, Laura R; Solity, Jonathan
2016-06-01
Synthetic phonics is the widely accepted approach for teaching reading in English: Children are taught to sound out the letters in a word then blend these sounds together. We compared the impact of two synthetic phonics programmes on early reading. Children received Letters and Sounds (L&S; 7 schools) which teaches multiple letter-sound mappings or Early Reading Research (ERR; 10 schools) which teaches only the most consistent mappings plus frequent words by sight. We measured phonological awareness (PA) and reading from school entry to the end of the second (all schools) or third school year (4 ERR, 3 L&S schools). Phonological awareness was significantly related to all reading measures for the whole sample. However, there was a closer relationship between PA and exception word reading for children receiving the L&S programme. The programmes were equally effective overall, but their impact on reading significantly interacted with school-entry PA: Children with poor PA at school entry achieved higher reading attainments under ERR (significant group difference on exception word reading at the end of the first year), whereas children with good PA performed equally well under either programme. The more intensive phonics programme (L&S) heightened the association between PA and exception word reading. Although the programmes were equally effective for most children, results indicate potential benefits of ERR for children with poor PA. We suggest that phonics programmes could be simplified to teach only the most consistent mappings plus frequent words by sight. © 2015 The British Psychological Society.
Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin
2013-11-13
A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results.
Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin
2013-01-01
A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results. PMID:24233027
Multisensor Arrays for Greater Reliability and Accuracy
NASA Technical Reports Server (NTRS)
Immer, Christopher; Eckhoff, Anthony; Lane, John; Perotti, Jose; Randazzo, John; Blalock, Norman; Ree, Jeff
2004-01-01
Arrays of multiple, nominally identical sensors with sensor-output-processing electronic hardware and software are being developed in order to obtain accuracy, reliability, and lifetime greater than those of single sensors. The conceptual basis of this development lies in the statistical behavior of multiple sensors and a multisensor-array (MSA) algorithm that exploits that behavior. In addition, advances in microelectromechanical systems (MEMS) and integrated circuits are exploited. A typical sensor unit according to this concept includes multiple MEMS sensors and sensor-readout circuitry fabricated together on a single chip and packaged compactly with a microprocessor that performs several functions, including execution of the MSA algorithm. In the MSA algorithm, the readings from all the sensors in an array at a given instant of time are compared and the reliability of each sensor is quantified. This comparison of readings and quantification of reliabilities involves the calculation of the ratio between every sensor reading and every other sensor reading, plus calculation of the sum of all such ratios. Then one output reading for the given instant of time is computed as a weighted average of the readings of all the sensors. In this computation, the weight for each sensor is the aforementioned value used to quantify its reliability. In an optional variant of the MSA algorithm that can be implemented easily, a running sum of the reliability value for each sensor at previous time steps as well as at the present time step is used as the weight of the sensor in calculating the weighted average at the present time step. In this variant, the weight of a sensor that continually fails gradually decreases, so that eventually, its influence over the output reading becomes minimal: In effect, the sensor system "learns" which sensors to trust and which not to trust. The MSA algorithm incorporates a criterion for deciding whether there remain enough sensor readings that approximate each other sufficiently closely to constitute a majority for the purpose of quantifying reliability. This criterion is, simply, that if there do not exist at least three sensors having weights greater than a prescribed minimum acceptable value, then the array as a whole is deemed to have failed.
Improving depth maps of plants by using a set of five cameras
NASA Astrophysics Data System (ADS)
Kaczmarek, Adam L.
2015-03-01
Obtaining high-quality depth maps and disparity maps with the use of a stereo camera is a challenging task for some kinds of objects. The quality of these maps can be improved by taking advantage of a larger number of cameras. The research on the usage of a set of five cameras to obtain disparity maps is presented. The set consists of a central camera and four side cameras. An algorithm for making disparity maps called multiple similar areas (MSA) is introduced. The algorithm was specially designed for the set of five cameras. Experiments were performed with the MSA algorithm and the stereo matching algorithm based on the sum of sum of squared differences (sum of SSD, SSSD) measure. Moreover, the following measures were included in the experiments: sum of absolute differences (SAD), zero-mean SAD (ZSAD), zero-mean SSD (ZSSD), locally scaled SAD (LSAD), locally scaled SSD (LSSD), normalized cross correlation (NCC), and zero-mean NCC (ZNCC). Algorithms presented were applied to images of plants. Making depth maps of plants is difficult because parts of leaves are similar to each other. The potential usability of the described algorithms is especially high in agricultural applications such as robotic fruit harvesting.
Marinelli, Chiara Valeria; Cellini, Pamela; Zoccolotti, Pierluigi; Angelelli, Paola
This study examined the ability to master lexical processing and use knowledge of the relative frequency of sound-spelling mappings in both reading and spelling. Twenty-four dyslexic and dysgraphic children and 86 typically developing readers were followed longitudinally in 3rd and 5th grades. Effects of word regularity, word frequency, and probability of sound-spelling mappings were examined in two experimental tasks: (a) spelling to dictation; and (b) orthographic judgment. Dyslexic children showed larger regularity and frequency effects than controls in both tasks. Sensitivity to distributional information of sound-spelling mappings was already detected by third grade, indicating early acquisition even in children with dyslexia. Although with notable differences, knowledge of the relative frequencies of sound-spelling mapping influenced both reading and spelling. Results are discussed in terms of their theoretical and empirical implications.
Algorithm for Detecting a Bright Spot in an Image
NASA Technical Reports Server (NTRS)
2009-01-01
An algorithm processes the pixel intensities of a digitized image to detect and locate a circular bright spot, the approximate size of which is known in advance. The algorithm is used to find images of the Sun in cameras aboard the Mars Exploration Rovers. (The images are used in estimating orientations of the Rovers relative to the direction to the Sun.) The algorithm can also be adapted to tracking of circular shaped bright targets in other diverse applications. The first step in the algorithm is to calculate a dark-current ramp a correction necessitated by the scheme that governs the readout of pixel charges in the charge-coupled-device camera in the original Mars Exploration Rover application. In this scheme, the fraction of each frame period during which dark current is accumulated in a given pixel (and, hence, the dark-current contribution to the pixel image-intensity reading) is proportional to the pixel row number. For the purpose of the algorithm, the dark-current contribution to the intensity reading from each pixel is assumed to equal the average of intensity readings from all pixels in the same row, and the factor of proportionality is estimated on the basis of this assumption. Then the product of the row number and the factor of proportionality is subtracted from the reading from each pixel to obtain a dark-current-corrected intensity reading. The next step in the algorithm is to determine the best location, within the overall image, for a window of N N pixels (where N is an odd number) large enough to contain the bright spot of interest plus a small margin. (In the original application, the overall image contains 1,024 by 1,024 pixels, the image of the Sun is about 22 pixels in diameter, and N is chosen to be 29.)
ISRNA: an integrative online toolkit for short reads from high-throughput sequencing data.
Luo, Guan-Zheng; Yang, Wei; Ma, Ying-Ke; Wang, Xiu-Jie
2014-02-01
Integrative Short Reads NAvigator (ISRNA) is an online toolkit for analyzing high-throughput small RNA sequencing data. Besides the high-speed genome mapping function, ISRNA provides statistics for genomic location, length distribution and nucleotide composition bias analysis of sequence reads. Number of reads mapped to known microRNAs and other classes of short non-coding RNAs, coverage of short reads on genes, expression abundance of sequence reads as well as some other analysis functions are also supported. The versatile search functions enable users to select sequence reads according to their sub-sequences, expression abundance, genomic location, relationship to genes, etc. A specialized genome browser is integrated to visualize the genomic distribution of short reads. ISRNA also supports management and comparison among multiple datasets. ISRNA is implemented in Java/C++/Perl/MySQL and can be freely accessed at http://omicslab.genetics.ac.cn/ISRNA/.
Automatic Boosted Flood Mapping from Satellite Data
NASA Technical Reports Server (NTRS)
Coltin, Brian; McMichael, Scott; Smith, Trey; Fong, Terrence
2016-01-01
Numerous algorithms have been proposed to map floods from Moderate Resolution Imaging Spectroradiometer (MODIS) imagery. However, most require human input to succeed, either to specify a threshold value or to manually annotate training data. We introduce a new algorithm based on Adaboost which effectively maps floods without any human input, allowing for a truly rapid and automatic response. The Adaboost algorithm combines multiple thresholds to achieve results comparable to state-of-the-art algorithms which do require human input. We evaluate Adaboost, as well as numerous previously proposed flood mapping algorithms, on multiple MODIS flood images, as well as on hundreds of non-flood MODIS lake images, demonstrating its effectiveness across a wide variety of conditions.
Calculating Higher-Order Moments of Phylogenetic Stochastic Mapping Summaries in Linear Time.
Dhar, Amrit; Minin, Vladimir N
2017-05-01
Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences.
Calculating Higher-Order Moments of Phylogenetic Stochastic Mapping Summaries in Linear Time
Dhar, Amrit
2017-01-01
Abstract Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences. PMID:28177780
Introducing difference recurrence relations for faster semi-global alignment of long sequences.
Suzuki, Hajime; Kasahara, Masahiro
2018-02-19
The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith-Waterman-Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when sequences to be aligned are long; this situation hampers the use of the full SIMD width of modern processors. We proposed a faster semi-global alignment algorithm, "difference recurrence relations," that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm can output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. Our novel algorithm and optimized library implementation will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba .
A novel algorithm for fully automated mapping of geospatial ontologies
NASA Astrophysics Data System (ADS)
Chaabane, Sana; Jaziri, Wassim
2018-01-01
Geospatial information is collected from different sources thus making spatial ontologies, built for the same geographic domain, heterogeneous; therefore, different and heterogeneous conceptualizations may coexist. Ontology integrating helps creating a common repository of the geospatial ontology and allows removing the heterogeneities between the existing ontologies. Ontology mapping is a process used in ontologies integrating and consists in finding correspondences between the source ontologies. This paper deals with the "mapping" process of geospatial ontologies which consist in applying an automated algorithm in finding the correspondences between concepts referring to the definitions of matching relationships. The proposed algorithm called "geographic ontologies mapping algorithm" defines three types of mapping: semantic, topological and spatial.
Chung, Dongjun; Kuan, Pei Fen; Li, Bo; Sanalkumar, Rajendran; Liang, Kun; Bresnick, Emery H; Dewey, Colin; Keleş, Sündüz
2011-07-01
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
We developed the Generalized Read-Across (GenRA) approach to facilitate automated, algorithmic read across predictions. GenRA uses in vitro bioactivity data in conjunction with chemical information to predict up to 574 different apical outcomes from repeat-dose toxicity studies. ...
Evaluation of microRNA alignment techniques
Kaspi, Antony; El-Osta, Assam
2016-01-01
Genomic alignment of small RNA (smRNA) sequences such as microRNAs poses considerable challenges due to their short length (∼21 nucleotides [nt]) as well as the large size and complexity of plant and animal genomes. While several tools have been developed for high-throughput mapping of longer mRNA-seq reads (>30 nt), there are few that are specifically designed for mapping of smRNA reads including microRNAs. The accuracy of these mappers has not been systematically determined in the case of smRNA-seq. In addition, it is unknown whether these aligners accurately map smRNA reads containing sequence errors and polymorphisms. By using simulated read sets, we determine the alignment sensitivity and accuracy of 16 short-read mappers and quantify their robustness to mismatches, indels, and nontemplated nucleotide additions. These were explored in the context of a plant genome (Oryza sativa, ∼500 Mbp) and a mammalian genome (Homo sapiens, ∼3.1 Gbp). Analysis of simulated and real smRNA-seq data demonstrates that mapper selection impacts differential expression results and interpretation. These results will inform on best practice for smRNA mapping and enable more accurate smRNA detection and quantification of expression and RNA editing. PMID:27284164
ERIC Educational Resources Information Center
Soleimani, Hassan; Nabizadeh, Fatemeh
2012-01-01
Concept maps (CM) are powerful tools which have different uses in educational contexts; however, this study limited its extension and explored its impact on the reading comprehension skill of Iranian EFL students. To this purpose, a proficiency test was employed and 90 intermediate pre-university students were chosen and divided into three groups:…
Fast-SG: an alignment-free algorithm for hybrid assembly.
Di Genova, Alex; Ruz, Gonzalo A; Sagot, Marie-France; Maass, Alejandro
2018-05-01
Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short-read aligners when building the scaffoldinggraph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878). Fast-SG opens a door to achieve accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.
Cloud computing-based TagSNP selection algorithm for human genome data.
Hung, Che-Lun; Chen, Wen-Pei; Hua, Guan-Jie; Zheng, Huiru; Tsai, Suh-Jen Jane; Lin, Yaw-Ling
2015-01-05
Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed SNPs within certain haplotype blocks that introduce few distinct common haplotypes into most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. In chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To enhance its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-paralleled combinatorial algorithm performed well on real-world data obtained from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used.
Cloud Computing-Based TagSNP Selection Algorithm for Human Genome Data
Hung, Che-Lun; Chen, Wen-Pei; Hua, Guan-Jie; Zheng, Huiru; Tsai, Suh-Jen Jane; Lin, Yaw-Ling
2015-01-01
Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed SNPs within certain haplotype blocks that introduce few distinct common haplotypes into most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. In chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To enhance its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-paralleled combinatorial algorithm performed well on real-world data obtained from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used. PMID:25569088
Manifold absolute pressure estimation using neural network with hybrid training algorithm
Selamat, Hazlina; Alimin, Ahmad Jais; Haniff, Mohamad Fadzli
2017-01-01
In a modern small gasoline engine fuel injection system, the load of the engine is estimated based on the measurement of the manifold absolute pressure (MAP) sensor, which took place in the intake manifold. This paper present a more economical approach on estimating the MAP by using only the measurements of the throttle position and engine speed, resulting in lower implementation cost. The estimation was done via two-stage multilayer feed-forward neural network by combining Levenberg-Marquardt (LM) algorithm, Bayesian Regularization (BR) algorithm and Particle Swarm Optimization (PSO) algorithm. Based on the results found in 20 runs, the second variant of the hybrid algorithm yields a better network performance than the first variant of hybrid algorithm, LM, LM with BR and PSO by estimating the MAP closely to the simulated MAP values. By using a valid experimental training data, the estimator network that trained with the second variant of the hybrid algorithm showed the best performance among other algorithms when used in an actual retrofit fuel injection system (RFIS). The performance of the estimator was also validated in steady-state and transient condition by showing a closer MAP estimation to the actual value. PMID:29190779
Mapped Landmark Algorithm for Precision Landing
NASA Technical Reports Server (NTRS)
Johnson, Andrew; Ansar, Adnan; Matthies, Larry
2007-01-01
A report discusses a computer vision algorithm for position estimation to enable precision landing during planetary descent. The Descent Image Motion Estimation System for the Mars Exploration Rovers has been used as a starting point for creating code for precision, terrain-relative navigation during planetary landing. The algorithm is designed to be general because it handles images taken at different scales and resolutions relative to the map, and can produce mapped landmark matches for any planetary terrain of sufficient texture. These matches provide a measurement of horizontal position relative to a known landing site specified on the surface map. Multiple mapped landmarks generated per image allow for automatic detection and elimination of bad matches. Attitude and position can be generated from each image; this image-based attitude measurement can be used by the onboard navigation filter to improve the attitude estimate, which will improve the position estimates. The algorithm uses normalized correlation of grayscale images, producing precise, sub-pixel images. The algorithm has been broken into two sub-algorithms: (1) FFT Map Matching (see figure), which matches a single large template by correlation in the frequency domain, and (2) Mapped Landmark Refinement, which matches many small templates by correlation in the spatial domain. Each relies on feature selection, the homography transform, and 3D image correlation. The algorithm is implemented in C++ and is rated at Technology Readiness Level (TRL) 4.
STORMSeq: an open-source, user-friendly pipeline for processing personal genomics data in the cloud.
Karczewski, Konrad J; Fernald, Guy Haskin; Martin, Alicia R; Snyder, Michael; Tatonetti, Nicholas P; Dudley, Joel T
2014-01-01
The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5-10 hours to process a full exome sequence and $30 and 3-8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2.
Sun, Xiaoji; Wang, Xuya; Tang, Zuojian; Grivainis, Mark; Kahler, David; Yun, Chi; Mita, Paolo; Fenyö, David
2018-01-01
Transposable elements (TEs) represent a substantial fraction of many eukaryotic genomes, and transcriptional regulation of these factors is important to determine TE activities in human cells. However, due to the repetitive nature of TEs, identifying transcription factor (TF)-binding sites from ChIP-sequencing (ChIP-seq) datasets is challenging. Current algorithms are focused on subtle differences between TE copies and thus bias the analysis to relatively old and inactive TEs. Here we describe an approach termed “MapRRCon” (mapping repeat reads to a consensus) which allows us to identify proteins binding to TE DNA sequences by mapping ChIP-seq reads to the TE consensus sequence after whole-genome alignment. Although this method does not assign binding sites to individual insertions in the genome, it provides a landscape of interacting TFs by capturing factors that bind to TEs under various conditions. We applied this method to screen TFs’ interaction with L1 in human cells/tissues using ENCODE ChIP-seq datasets and identified 178 of the 512 TFs tested as bound to L1 in at least one biological condition with most of them (138) localized to the promoter. Among these L1-binding factors, we focused on Myc and CTCF, as they play important roles in cancer progression and 3D chromatin structure formation. Furthermore, we explored the transcriptomes of The Cancer Genome Atlas breast and ovarian tumor samples in which a consistent anti-/correlation between L1 and Myc/CTCF expression was observed, suggesting that these two factors may play roles in regulating L1 transcription during the development of such tumors. PMID:29802231
Optimal mapping of neural-network learning on message-passing multicomputers
NASA Technical Reports Server (NTRS)
Chu, Lon-Chan; Wah, Benjamin W.
1992-01-01
A minimization of learning-algorithm completion time is sought in the present optimal-mapping study of the learning process in multilayer feed-forward artificial neural networks (ANNs) for message-passing multicomputers. A novel approximation algorithm for mappings of this kind is derived from observations of the dominance of a parallel ANN algorithm over its communication time. Attention is given to both static and dynamic mapping schemes for systems with static and dynamic background workloads, as well as to experimental results obtained for simulated mappings on multicomputers with dynamic background workloads.
Preciat Gonzalez, German A.; El Assal, Lemmer R. P.; Noronha, Alberto; ...
2017-06-14
The mechanism of each chemical reaction in a metabolic network can be represented as a set of atom mappings, each of which relates an atom in a substrate metabolite to an atom of the same element in a product metabolite. Genome-scale metabolic network reconstructions typically represent biochemistry at the level of reaction stoichiometry. However, a more detailed representation at the underlying level of atom mappings opens the possibility for a broader range of biological, biomedical and biotechnological applications than with stoichiometry alone. Complete manual acquisition of atom mapping data for a genome-scale metabolic network is a laborious process. However, manymore » algorithms exist to predict atom mappings. How do their predictions compare to each other and to manually curated atom mappings? For more than four thousand metabolic reactions in the latest human metabolic reconstruction, Recon 3D, we compared the atom mappings predicted by six atom mapping algorithms. We also compared these predictions to those obtained by manual curation of atom mappings for over five hundred reactions distributed among all top level Enzyme Commission number classes. Five of the evaluated algorithms had similarly high prediction accuracy of over 91% when compared to manually curated atom mapped reactions. On average, the accuracy of the prediction was highest for reactions catalysed by oxidoreductases and lowest for reactions catalysed by ligases. In addition to prediction accuracy, the algorithms were evaluated on their accessibility, their advanced features, such as the ability to identify equivalent atoms, and their ability to map hydrogen atoms. In addition to prediction accuracy, we found that software accessibility and advanced features were fundamental to the selection of an atom mapping algorithm in practice.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Preciat Gonzalez, German A.; El Assal, Lemmer R. P.; Noronha, Alberto
The mechanism of each chemical reaction in a metabolic network can be represented as a set of atom mappings, each of which relates an atom in a substrate metabolite to an atom of the same element in a product metabolite. Genome-scale metabolic network reconstructions typically represent biochemistry at the level of reaction stoichiometry. However, a more detailed representation at the underlying level of atom mappings opens the possibility for a broader range of biological, biomedical and biotechnological applications than with stoichiometry alone. Complete manual acquisition of atom mapping data for a genome-scale metabolic network is a laborious process. However, manymore » algorithms exist to predict atom mappings. How do their predictions compare to each other and to manually curated atom mappings? For more than four thousand metabolic reactions in the latest human metabolic reconstruction, Recon 3D, we compared the atom mappings predicted by six atom mapping algorithms. We also compared these predictions to those obtained by manual curation of atom mappings for over five hundred reactions distributed among all top level Enzyme Commission number classes. Five of the evaluated algorithms had similarly high prediction accuracy of over 91% when compared to manually curated atom mapped reactions. On average, the accuracy of the prediction was highest for reactions catalysed by oxidoreductases and lowest for reactions catalysed by ligases. In addition to prediction accuracy, the algorithms were evaluated on their accessibility, their advanced features, such as the ability to identify equivalent atoms, and their ability to map hydrogen atoms. In addition to prediction accuracy, we found that software accessibility and advanced features were fundamental to the selection of an atom mapping algorithm in practice.« less
Preciat Gonzalez, German A; El Assal, Lemmer R P; Noronha, Alberto; Thiele, Ines; Haraldsdóttir, Hulda S; Fleming, Ronan M T
2017-06-14
The mechanism of each chemical reaction in a metabolic network can be represented as a set of atom mappings, each of which relates an atom in a substrate metabolite to an atom of the same element in a product metabolite. Genome-scale metabolic network reconstructions typically represent biochemistry at the level of reaction stoichiometry. However, a more detailed representation at the underlying level of atom mappings opens the possibility for a broader range of biological, biomedical and biotechnological applications than with stoichiometry alone. Complete manual acquisition of atom mapping data for a genome-scale metabolic network is a laborious process. However, many algorithms exist to predict atom mappings. How do their predictions compare to each other and to manually curated atom mappings? For more than four thousand metabolic reactions in the latest human metabolic reconstruction, Recon 3D, we compared the atom mappings predicted by six atom mapping algorithms. We also compared these predictions to those obtained by manual curation of atom mappings for over five hundred reactions distributed among all top level Enzyme Commission number classes. Five of the evaluated algorithms had similarly high prediction accuracy of over 91% when compared to manually curated atom mapped reactions. On average, the accuracy of the prediction was highest for reactions catalysed by oxidoreductases and lowest for reactions catalysed by ligases. In addition to prediction accuracy, the algorithms were evaluated on their accessibility, their advanced features, such as the ability to identify equivalent atoms, and their ability to map hydrogen atoms. In addition to prediction accuracy, we found that software accessibility and advanced features were fundamental to the selection of an atom mapping algorithm in practice.
How similar are forest disturbance maps derived from different Landsat time series algorithms?
Warren B. Cohen; Sean P. Healey; Zhiqiang Yang; Stephen V. Stehman; C. Kenneth Brewer; Evan B. Brooks; Noel Gorelick; Chengqaun Huang; M. Joseph Hughes; Robert E. Kennedy; Thomas R. Loveland; Gretchen G. Moisen; Todd A. Schroeder; James E. Vogelmann; Curtis E. Woodcock; Limin Yang; Zhe Zhu
2017-01-01
Disturbance is a critical ecological process in forested systems, and disturbance maps are important for understanding forest dynamics. Landsat data are a key remote sensing dataset for monitoring forest disturbance and there recently has been major growth in the development of disturbance mapping algorithms. Many of these algorithms take advantage of the high temporal...
A MAP read-routine for IBM 7094 Fortran II binary tapes
Robert S. Helfman
1966-01-01
Two MAP (Macro Assembly Program) language routines are descrived. They permit Fortran IV programs to read binary tapes generated by Fortran II programs, on the IBM 7090 and 7094 computers. One routine is for use with 7040/44-IBSYS, the other for 7090/94-IBSYS.
Large-scale seismic signal analysis with Hadoop
Addair, T. G.; Dodge, D. A.; Walter, W. R.; ...
2014-02-11
In seismology, waveform cross correlation has been used for years to produce high-precision hypocenter locations and for sensitive detectors. Because correlated seismograms generally are found only at small hypocenter separation distances, correlation detectors have historically been reserved for spotlight purposes. However, many regions have been found to produce large numbers of correlated seismograms, and there is growing interest in building next-generation pipelines that employ correlation as a core part of their operation. In an effort to better understand the distribution and behavior of correlated seismic events, we have cross correlated a global dataset consisting of over 300 million seismograms. Thismore » was done using a conventional distributed cluster, and required 42 days. In anticipation of processing much larger datasets, we have re-architected the system to run as a series of MapReduce jobs on a Hadoop cluster. In doing so we achieved a factor of 19 performance increase on a test dataset. We found that fundamental algorithmic transformations were required to achieve the maximum performance increase. Whereas in the original IO-bound implementation, we went to great lengths to minimize IO, in the Hadoop implementation where IO is cheap, we were able to greatly increase the parallelism of our algorithms by performing a tiered series of very fine-grained (highly parallelizable) transformations on the data. Each of these MapReduce jobs required reading and writing large amounts of data.« less
Large-scale seismic signal analysis with Hadoop
DOE Office of Scientific and Technical Information (OSTI.GOV)
Addair, T. G.; Dodge, D. A.; Walter, W. R.
In seismology, waveform cross correlation has been used for years to produce high-precision hypocenter locations and for sensitive detectors. Because correlated seismograms generally are found only at small hypocenter separation distances, correlation detectors have historically been reserved for spotlight purposes. However, many regions have been found to produce large numbers of correlated seismograms, and there is growing interest in building next-generation pipelines that employ correlation as a core part of their operation. In an effort to better understand the distribution and behavior of correlated seismic events, we have cross correlated a global dataset consisting of over 300 million seismograms. Thismore » was done using a conventional distributed cluster, and required 42 days. In anticipation of processing much larger datasets, we have re-architected the system to run as a series of MapReduce jobs on a Hadoop cluster. In doing so we achieved a factor of 19 performance increase on a test dataset. We found that fundamental algorithmic transformations were required to achieve the maximum performance increase. Whereas in the original IO-bound implementation, we went to great lengths to minimize IO, in the Hadoop implementation where IO is cheap, we were able to greatly increase the parallelism of our algorithms by performing a tiered series of very fine-grained (highly parallelizable) transformations on the data. Each of these MapReduce jobs required reading and writing large amounts of data.« less
Wright, Imogen A; Travers, Simon A
2014-07-01
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay
2013-01-01
Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms, however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E.coli (4.6 MB) S.kudriavzevii (11.18 MB) and C.elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.
A hybrid approach for text detection in natural scenes
NASA Astrophysics Data System (ADS)
Wang, Runmin; Sang, Nong; Wang, Ruolin; Kuang, Xiaoqin
2013-10-01
In this paper, a hybrid approach is proposed to detect texts in natural scenes. It is performed by the following steps: Firstly, the edge map and the text saliency region are obtained. Secondly, the text candidate regions are detected by connected components (CC) based method and are identified by an off-line trained HOG classifier. And then, the remaining CCs are grouped into text lines with some heuristic strategies to make up for the false negatives. Finally, the text lines are broken into separate words. The performance of the proposed approach is evaluated on the location detection database of ICDAR 2003 robust reading competition. Experimental results demonstrate the validity of our approach and are competitive with other state-of-the-art algorithms.
Conditional Random Field-Based Offline Map Matching for Indoor Environments
Bataineh, Safaa; Bahillo, Alfonso; Díez, Luis Enrique; Onieva, Enrique; Bataineh, Ikram
2016-01-01
In this paper, we present an offline map matching technique designed for indoor localization systems based on conditional random fields (CRF). The proposed algorithm can refine the results of existing indoor localization systems and match them with the map, using loose coupling between the existing localization system and the proposed map matching technique. The purpose of this research is to investigate the efficiency of using the CRF technique in offline map matching problems for different scenarios and parameters. The algorithm was applied to several real and simulated trajectories of different lengths. The results were then refined and matched with the map using the CRF algorithm. PMID:27537892
Conditional Random Field-Based Offline Map Matching for Indoor Environments.
Bataineh, Safaa; Bahillo, Alfonso; Díez, Luis Enrique; Onieva, Enrique; Bataineh, Ikram
2016-08-16
In this paper, we present an offline map matching technique designed for indoor localization systems based on conditional random fields (CRF). The proposed algorithm can refine the results of existing indoor localization systems and match them with the map, using loose coupling between the existing localization system and the proposed map matching technique. The purpose of this research is to investigate the efficiency of using the CRF technique in offline map matching problems for different scenarios and parameters. The algorithm was applied to several real and simulated trajectories of different lengths. The results were then refined and matched with the map using the CRF algorithm.
Hiding the Disk and Network Latency of Out-of-Core Visualization
NASA Technical Reports Server (NTRS)
Ellsworth, David
2001-01-01
This paper describes an algorithm that improves the performance of application-controlled demand paging for out-of-core visualization by hiding the latency of reading data from both local disks or disks on remote servers. The performance improvements come from better overlapping the computation with the page reading process, and by performing multiple page reads in parallel. The paper includes measurements that show that the new multithreaded paging algorithm decreases the time needed to compute visualizations by one third when using one processor and reading data from local disk. The time needed when using one processor and reading data from remote disk decreased by two thirds. Visualization runs using data from remote disk actually ran faster than ones using data from local disk because the remote runs were able to make use of the remote server's high performance disk array.
Range image registration based on hash map and moth-flame optimization
NASA Astrophysics Data System (ADS)
Zou, Li; Ge, Baozhen; Chen, Lei
2018-03-01
Over the past decade, evolutionary algorithms (EAs) have been introduced to solve range image registration problems because of their robustness and high precision. However, EA-based range image registration algorithms are time-consuming. To reduce the computational time, an EA-based range image registration algorithm using hash map and moth-flame optimization is proposed. In this registration algorithm, a hash map is used to avoid over-exploitation in registration process. Additionally, we present a search equation that is better at exploration and a restart mechanism to avoid being trapped in local minima. We compare the proposed registration algorithm with the registration algorithms using moth-flame optimization and several state-of-the-art EA-based registration algorithms. The experimental results show that the proposed algorithm has a lower computational cost than other algorithms and achieves similar registration precision.
AN AUTOMATIC DEVICE FOR READING TYPOGRAPHICAL TEXTS,
permissible. The system represents an attempt to apply the methods of machines designed for typescript reading to machines reading printed texts...Some characteristics by which typescript and typographical material differ are presented. The basic aspects of the recognition algorithm are given. A
Grados, F; Roux, C; de Vernejoul, M C; Utard, G; Sebert, J L; Fardellone, P
2001-01-01
The assessment of vertebral fracture in patients with osteoporosis by conventional radiography has been improved over the past 10 years using either the semiquantitative (SQ) method devised by Genant et al. or quantitative morphometry. However, there is still no internationally agreed definition for vertebral fracture and there have been few comparative studies between these different approaches. Our study assessed the reproducibility of the SQ method and of four commonly used morphometric algorithms (Melton's, Eastell's, Minne's and McCloskey's methods) for assessing prevalent vertebral fractures, and examined the agreement of each morphometric algorithm with a SQ consensus reading performed by three experts. With this consensus reading in place of a gold standard, we determined relative measures of sensitivity, specificity and optimal cutoff threshold for each morphometric algorithm. The study was conducted in 39 postmenopausal women who had at least one osteoporotic vertebral fracture. Normal values were derived from 84 healthy postmenopausal women with apparently normal vertebral bodies. Our results indicate that the concordance of SQ method was excellent (intraobserver agreement on serial radiographs = 96.4%, kappa = 0.91; agreement between individual readings and the consensus reading = 98%, kappa = 0.95). Three morphometric approaches demonstrated good intra- and interobserver concordance (Melton: intraobserver agreement on serial radiographs = 92.7%, kappa = 0.82, interobserver agreement = 91.1%, kappa = 0.79; Eastell: intraobserver agreement on serial radiographs = 87.6%, kappa = 0.66, interobserver agreement = 88.6%, kappa = 0.68; McCloskey: intraobserver agreement on serial radiographs = 91.5%, kappa = 0.72, interobserver agreement = 93.9%, kappa = 0.78). Except for McCloskey's method, the optimal cutoff thresholds defined in our study by highest kappa score or Youden index in comparison with the SQ consensus reading were near the cutoff thresholds that were arbitrarily fixed. The four morphometric algorithms provided a good agreement with the results of the SQ consensus reading, but the more complex algorithm did not provide better results and even if we adjusted the cutoff threshold, no morphometric algorithm agreed perfectly with the SQ consensus reading. We conclude that morphometric approaches currently used should not be employed alone to detect prevalent vertebral fractures in studies on osteoporosis, but should rather be used in combination with a visual assessment. The SQ approach that allows differential diagnosis of vertebral deformities and has demonstrated a better reproducibility can be employed alone when it is performed by experienced and well-trained readers.
An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq.
Azofeifa, Joseph G; Allen, Mary A; Lladser, Manuel E; Dowell, Robin D
2017-01-01
We present a fast and simple algorithm to detect nascent RNA transcription in global nuclear run-on sequencing (GRO-seq). GRO-seq is a relatively new protocol that captures nascent transcripts from actively engaged polymerase, providing a direct read-out on bona fide transcription. Most traditional assays, such as RNA-seq, measure steady state RNA levels which are affected by transcription, post-transcriptional processing, and RNA stability. GRO-seq data, however, presents unique analysis challenges that are only beginning to be addressed. Here, we describe a new algorithm, Fast Read Stitcher (FStitch), that takes advantage of two popular machine-learning techniques, hidden Markov models and logistic regression, to classify which regions of the genome are transcribed. Given a small user-defined training set, our algorithm is accurate, robust to varying read depth, annotation agnostic, and fast. Analysis of GRO-seq data without a priori need for annotation uncovers surprising new insights into several aspects of the transcription process.
A Comparison of Some Processing Time Measures Based on Eye Movements. Technical Report No. 285.
ERIC Educational Resources Information Center
Blanchard, Harry E.
A study was conducted to provide a replication of the gaze duration algorithm proposed by M. A. Just and P. A. Carpenter using a different kind of passage, to compare the three gaze duration algorithms that have been proposed by other researchers, and to measure processing time in reading. Fifty-one college students read a passage while their eye…
Weapons Storage Area Survey of 400 Series Buildings at Medina Annex, San Antonio, Texas
2013-06-03
due to build u p of radon daughters Initial Ins trument Readin~ Results Inst. 1 lnst. lnst. Field Lab Sample Gross alpha Gross beta Gross Map...readings are due to build up of radon daughters ReadiJlg Results lust. lust. Field Lab San1ple Gross alpha Map Area Room # Location Inst. 1 Sample...outside) NOTE : High alph readings are due to build up of radon daughters Initial Instrument Reading Results Area on Room # Inst . 1 Inst. 2 Field
2008-01-01
CCA-MAP algorithm are analyzed. Further, we discuss the design considerations of the discussed cooperative localization algorithms to compare and...MAP and CCA-MAP to compare and evaluate their performance. Then a preliminary design analysis is given to address the implementation requirements and...plus précis, avec un nombre inférieur de nœuds ancres, comparativement aux autres types de schémas de localisation. En réalité, les algorithmes de
STORMSeq: An Open-Source, User-Friendly Pipeline for Processing Personal Genomics Data in the Cloud
Karczewski, Konrad J.; Fernald, Guy Haskin; Martin, Alicia R.; Snyder, Michael; Tatonetti, Nicholas P.; Dudley, Joel T.
2014-01-01
The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5–10 hours to process a full exome sequence and $30 and 3–8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2. PMID:24454756
Using Concept Mapping and Paraphrasing for Reading Comprehension
ERIC Educational Resources Information Center
Marashi, Hamid; Bagheri, Nazanin
2015-01-01
This study investigated the comparative impact of two types of teaching techniques, namely concept mapping and paraphrasing, on the reading comprehension of EFL learners. For this purpose, 60 learners of a total number of 90 intermediate learners studying at a language school in Karaj, Iran, were chosen through taking a piloted PET for…
Using Geospatial Analysis to Align Little Free Library Locations with Community Literacy Needs
ERIC Educational Resources Information Center
Rebori, Marlene K.; Burge, Peter
2017-01-01
We used geospatial analysis tools to develop community maps depicting fourth-grade reading proficiency test scores and locations of facilities offering public access to reading materials (i.e., public libraries, elementary schools, and Little Free Libraries). The maps visually highlighted areas with struggling readers and areas without adequate…
Mester, David; Ronin, Yefim; Schnable, Patrick; Aluru, Srinivas; Korol, Abraham
2015-01-01
Our aim was to develop a fast and accurate algorithm for constructing consensus genetic maps for chip-based SNP genotyping data with a high proportion of shared markers between mapping populations. Chip-based genotyping of SNP markers allows producing high-density genetic maps with a relatively standardized set of marker loci for different mapping populations. The availability of a standard high-throughput mapping platform simplifies consensus analysis by ignoring unique markers at the stage of consensus mapping thereby reducing mathematical complicity of the problem and in turn analyzing bigger size mapping data using global optimization criteria instead of local ones. Our three-phase analytical scheme includes automatic selection of ~100-300 of the most informative (resolvable by recombination) markers per linkage group, building a stable skeletal marker order for each data set and its verification using jackknife re-sampling, and consensus mapping analysis based on global optimization criterion. A novel Evolution Strategy optimization algorithm with a global optimization criterion presented in this paper is able to generate high quality, ultra-dense consensus maps, with many thousands of markers per genome. This algorithm utilizes "potentially good orders" in the initial solution and in the new mutation procedures that generate trial solutions, enabling to obtain a consensus order in reasonable time. The developed algorithm, tested on a wide range of simulated data and real world data (Arabidopsis), outperformed two tested state-of-the-art algorithms by mapping accuracy and computation time. PMID:25867943
Clustering single cells: a review of approaches on high-and low-depth single-cell RNA-seq data.
Menon, Vilas
2017-12-11
Advances in single-cell RNA-sequencing technology have resulted in a wealth of studies aiming to identify transcriptomic cell types in various biological systems. There are multiple experimental approaches to isolate and profile single cells, which provide different levels of cellular and tissue coverage. In addition, multiple computational strategies have been proposed to identify putative cell types from single-cell data. From a data generation perspective, recent single-cell studies can be classified into two groups: those that distribute reads shallowly over large numbers of cells and those that distribute reads more deeply over a smaller cell population. Although there are advantages to both approaches in terms of cellular and tissue coverage, it is unclear whether different computational cell type identification methods are better suited to one or the other experimental paradigm. This study reviews three cell type clustering algorithms, each representing one of three broad approaches, and finds that PCA-based algorithms appear most suited to low read depth data sets, whereas gene clustering-based and biclustering algorithms perform better on high read depth data sets. In addition, highly related cell classes are better distinguished by higher-depth data, given the same total number of reads; however, simultaneous discovery of distinct and similar types is better served by lower-depth, higher cell number data. Overall, this study suggests that the depth of profiling should be determined by initial assumptions about the diversity of cells in the population, and that the selection of clustering algorithm(s) is subsequently based on the depth of profiling will allow for better identification of putative transcriptomic cell types. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Decision-level fusion of SAR and IR sensor information for automatic target detection
NASA Astrophysics Data System (ADS)
Cho, Young-Rae; Yim, Sung-Hyuk; Cho, Hyun-Woong; Won, Jin-Ju; Song, Woo-Jin; Kim, So-Hyeon
2017-05-01
We propose a decision-level architecture that combines synthetic aperture radar (SAR) and an infrared (IR) sensor for automatic target detection. We present a new size-based feature, called target-silhouette to reduce the number of false alarms produced by the conventional target-detection algorithm. Boolean Map Visual Theory is used to combine a pair of SAR and IR images to generate the target-enhanced map. Then basic belief assignment is used to transform this map into a belief map. The detection results of sensors are combined to build the target-silhouette map. We integrate the fusion mass and the target-silhouette map on the decision level to exclude false alarms. The proposed algorithm is evaluated using a SAR and IR synthetic database generated by SE-WORKBENCH simulator, and compared with conventional algorithms. The proposed fusion scheme achieves higher detection rate and lower false alarm rate than the conventional algorithms.
Read-across remains a popular data gap filling technique within category and analogue approaches for regulatory purposes. Acceptance of read-across is an ongoing challenge with several efforts underway for identifying and addressing uncertainties. Here we demonstrate an algorithm...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cao Daliang; Earl, Matthew A.; Luan, Shuang
2006-04-15
A new leaf-sequencing approach has been developed that is designed to reduce the number of required beam segments for step-and-shoot intensity modulated radiation therapy (IMRT). This approach to leaf sequencing is called continuous-intensity-map-optimization (CIMO). Using a simulated annealing algorithm, CIMO seeks to minimize differences between the optimized and sequenced intensity maps. Two distinguishing features of the CIMO algorithm are (1) CIMO does not require that each optimized intensity map be clustered into discrete levels and (2) CIMO is not rule-based but rather simultaneously optimizes both the aperture shapes and weights. To test the CIMO algorithm, ten IMRT patient cases weremore » selected (four head-and-neck, two pancreas, two prostate, one brain, and one pelvis). For each case, the optimized intensity maps were extracted from the Pinnacle{sup 3} treatment planning system. The CIMO algorithm was applied, and the optimized aperture shapes and weights were loaded back into Pinnacle. A final dose calculation was performed using Pinnacle's convolution/superposition based dose calculation. On average, the CIMO algorithm provided a 54% reduction in the number of beam segments as compared with Pinnacle's leaf sequencer. The plans sequenced using the CIMO algorithm also provided improved target dose uniformity and a reduced discrepancy between the optimized and sequenced intensity maps. For ten clinical intensity maps, comparisons were performed between the CIMO algorithm and the power-of-two reduction algorithm of Xia and Verhey [Med. Phys. 25(8), 1424-1434 (1998)]. When the constraints of a Varian Millennium multileaf collimator were applied, the CIMO algorithm resulted in a 26% reduction in the number of segments. For an Elekta multileaf collimator, the CIMO algorithm resulted in a 67% reduction in the number of segments. An average leaf sequencing time of less than one minute per beam was observed.« less
Comparison of the Equine Reference Sequence with Its Sanger Source Data and New Illumina Reads
Rebolledo-Mendez, Jovan; Hestand, Matthew S.; Coleman, Stephen J.; Zeng, Zheng; Orlando, Ludovic; MacLeod, James N.; Kalbfleisch, Ted
2015-01-01
The reference assembly for the domestic horse, EquCab2, published in 2009, was built using approximately 30 million Sanger reads from a Thoroughbred mare named Twilight. Contiguity in the assembly was facilitated using nearly 315 thousand BAC end sequences from Twilight’s half brother Bravo. Since then, it has served as the foundation for many genome-wide analyses that include not only the modern horse, but ancient horses and other equid species as well. As data mapped to this reference has accumulated, consistent variation between mapped datasets and the reference, in terms of regions with no read coverage, single nucleotide variants, and small insertions/deletions have become apparent. In many cases, it is not clear whether these differences are the result of true sequence variation between the research subjects’ and Twilight’s genome or due to errors in the reference. EquCab2 is regarded as “The Twilight Assembly.” The objective of this study was to identify inconsistencies between the EquCab2 assembly and the source Twilight Sanger data used to build it. To that end, the original Sanger and BAC end reads have been mapped back to this equine reference and assessed with the addition of approximately 40X coverage of new Illumina Paired-End sequence data. The resulting mapped datasets identify those regions with low Sanger read coverage, as well as variation in genomic content that is not consistent with either the original Twilight Sanger data or the new genomic sequence data generated from Twilight on the Illumina platform. As the haploid EquCab2 reference assembly was created using Sanger reads derived largely from a single individual, the vast majority of variation detected in a mapped dataset comprised of those same Sanger reads should be heterozygous. In contrast, homozygous variations would represent either errors in the reference or contributions from Bravo's BAC end sequences. Our analysis identifies 720,843 homozygous discrepancies between new, high throughput genomic sequence data generated for Twilight and the EquCab2 reference assembly. Most of these represent errors in the assembly, while approximately 10,000 are demonstrated to be contributions from another horse. Other results are presented that include the binary alignment map file of the mapped Sanger reads, a list of variants identified as discrepancies between the source data and resulting reference, and a BED annotation file that lists the regions of the genome whose consensus was likely derived from low coverage alignments. PMID:26107638
Parallel algorithms for mapping pipelined and parallel computations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1988-01-01
Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm sup 3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm sup 2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
ERIC Educational Resources Information Center
Bui, Yvonne N.; Fagan, Yvette M.
2013-01-01
The study evaluated the effects of the Integrated Reading Comprehension Strategy on two levels. The Integrated Reading Comprehension Strategy integrated story grammar instruction and story maps, prior knowledge and prediction method, and word webs through a culturally responsive teaching framework; the Integrated Reading Comprehension Strategy…
A Comparative Study of Hawaii Middle School Science Student Academic Achievement
NASA Astrophysics Data System (ADS)
Askew Cain, Peggy
The problem was middle-grade students with specific learning disabilities (SWDs) in reading comprehension perform less well than their peers on standardized assessments. The purpose of this quantitative comparative study was to examine the effect of electronic concept maps on reading comprehension of eighth grade students with SWD reading comprehension in a Hawaii middle school Grade 8 science class on the island of Oahu. The target population consisted of Grade 8 science students for school year 2015-2016. The sampling method was a purposeful sampling with a final sample size of 338 grade 8 science students. De-identified archival records of grade 8 Hawaii standardized science test scores were analyzed using a one way analysis of variance (ANOVA) in SPSS. The finding for hypothesis 1 indicated a significant difference in student achievement between SWDs and SWODs as measured by Hawaii State Assessment (HSA) in science scores (p < 0.05), and for hypothesis 2, a significant difference in instructional modality for SWDs who used concept maps and does who did not as measured by the Hawaii State Assessment in science (p < 0.05). The implications of the findings (a) SWDs performed less well in science achievement than their peers and consequently, and (b) SWODs appeared to remember greater degrees of science knowledge, and answered more questions correctly than SWDs as a result of reading comprehension. Recommendations for practice were for educational leadership and noted: (a) teachers should practice using concept maps with SWDs as a specific reading strategy to support reading comprehension in science classes, (b) involve a strong focus on vocabulary building and concept building during concept map construction because the construction of concept maps sometimes requires frontloading of vocabulary, and (c) model for teachers how concept maps are created and to explain their educational purpose as a tool for learning. Recommendations for future research were to conduct (a) a quantitative comparative study between groups for academic achievement of subtests mean scores of SWDs and SWODs in physical science, earth science, and space science, and (b) a quantitative correlation study to examine relationships and predictive values for academic achievement of SWDs and concept map integration on standardized science assessments.
Assessing Approaches to Teaching Systems Thinking: Reading Article vs. Game Play
NASA Astrophysics Data System (ADS)
Pfirman, S. L.; O'Garra, T.; Lee, J.; Bachrach, E.; Bachman, G.; Orlove, B. S.
2016-12-01
Problem-solving in the complex domain of climate change requires consideration of the dynamics of systems as wholes. The long time frame, coupled with multiple interacting elements is challenging to teach through traditionally linear approaches, such as lectures or reading. On the other hand some have claimed that games are potentially useful in teaching system skills, due to their iterative, interacting, and problem solving character. In this experiment, we evaluated the impact of the EcoChains: Arctic Crisis card game on participants' mental models using a `fuzzy cognitive mapping' approach. The study population included 41 participants randomly assigned to the treatment/game play n=21 and the control/reading illustrated article: n=20. To obtain cognitive maps from participants, the first step was explaining how to draw a map, using an unrelated map as an example. Following the explanation, participants were handed large sheets of paper and asked to write down all the concepts they could think of related to: Arctic marine & sea-ice ecosystems, including the species & inhabitants of these ecosystems, all the different factors that negatively affect the health of Arctic marine & sea-ice ecosystems, its species & inhabitants, all the different factors that positively affect the health of Arctic marine & sea-ice ecosystems, its species & inhabitants. Once participants had drafted their list of concepts, they were asked to construct maps with the concepts in the center followed by arrows drawn between them to represent the direction of relationships between concepts. After the intervention - either playing the EcoChains card game or reading the illustrated article - participants were handed back their maps, together with a different colored pencil from the one they used previously, and asked to adjust the maps based on what they had learned from playing Ecochains/reading the article. Results indicate that both playing EcoChains and reading an illustrated article with similar information, increased systems thinking. Assessment indicates that the text-based approach had greater quantitative gains as evaluated through the number of nodes and types of connections. Additional analysis is ongoing to examine potential differences in the quality of learning gains.
Lee, Soohyun; Seo, Chae Hwa; Alver, Burak Han; Lee, Sanghyuk; Park, Peter J
2015-09-03
RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
2010-04-12
computer graphics, to music . Map L systems [2, 6, 7] extend the parallel rewriting in L systems to planar graphs with cycles, called maps [8]. The maps...carbon. For easy of reading , in the results below the mass is represented by the percentage carbon laminate coverage. As explained previously, the...previously, that value can be read in the left vertical axis. There are nine designs in the Pareto front. Depending on design goals any of these individuals
Automated Speech Rate Measurement in Dysarthria.
Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc
2015-06-01
In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. The new algorithm was trained and tested using Dutch speech samples of 36 speakers with no history of speech impairment and 40 speakers with mild to moderate dysarthria. We tested the algorithm under various conditions: according to speech task type (sentence reading, passage reading, and storytelling) and algorithm optimization method (speaker group optimization and individual speaker optimization). Correlations between automated and human SR determination were calculated for each condition. High correlations between automated and human SR determination were found in the various testing conditions. The new algorithm measures SR in a sufficiently reliable manner. It is currently being integrated in a clinical software tool for assessing and managing prosody in dysarthric speech. Further research is needed to fine-tune the algorithm to severely dysarthric speech, to make the algorithm less sensitive to background noise, and to evaluate how the algorithm deals with syllabic consonants.
ERIC Educational Resources Information Center
Tzeng, Jeng-Yi
2010-01-01
From the perspective of the Fuzzy Trace Theory, this study investigated the impacts of concept maps with two strategic orientations (comprehensive and thematic representations) on readers' performance of cognitive operations (such as perception, verbatim memory, gist reasoning and syntheses) while the readers were reading two history articles that…
Mapping the Chapter: One Way to Tackle the CTE Textbook
ERIC Educational Resources Information Center
Laverick DeFelice, Catherine
2010-01-01
This reading specialist has come up with a strategy to help other CTE instructors map the CTE textbook, so that students can better comprehend the information in them and discover a joy of reading. CTE textbooks present a particular challenge because they are packed with information and can be quite different in structure than texts student have…
Effects of Story Mapping on Third-Grade Students with Attention Deficit Hyperactivity Disorder
ERIC Educational Resources Information Center
Chavez, Jaime N.; Martinez, James; Pienta, Rachel S.
2015-01-01
The purpose of this study was to examine the effects of story mapping on the reading comprehension scores, on-task behaviors, and attitudes of third- -grade students (N = 6) with ADHD. Students' reading grade equivalencies were assessed before and after the study. The teacher-researcher compared two other achievement measures before and during…
Linking Reading and Writing: Concept Mapping as an Organizing Tactic.
ERIC Educational Resources Information Center
Osman-Jouchoux, Rionda
Writers often must summarize others' texts as part of their own work. To succeed at this, they must first read and understand new information and then transform that information to fulfill a specific purpose. Concept mapping, used as a visual organizing technique, can be an effective link between the two processes. In a preliminary study, students…
Using Concept Mapping to Teach Young EFL Learners Reading Skills
ERIC Educational Resources Information Center
Teo, Adeline; Shaw, Yun F.; Chen, Jimmy; Wang, Derek
2016-01-01
Many English as a foreign language (EFL) students fail to be effective readers because they lack knowledge of vocabulary and appropriate reading strategies. We believe that teaching proper reading strategies can help second-language learners overcome their reading problems, especially when the instruction begins in elementary school. Effective…
A real time QRS detection using delay-coordinate mapping for the microcontroller implementation.
Lee, Jeong-Whan; Kim, Kyeong-Seop; Lee, Bongsoo; Lee, Byungchae; Lee, Myoung-Ho
2002-01-01
In this article, we propose a new algorithm using the characteristics of reconstructed phase portraits by delay-coordinate mapping utilizing lag rotundity for a real-time detection of QRS complexes in ECG signals. In reconstructing phase portrait the mapping parameters, time delay, and mapping dimension play important roles in shaping of portraits drawn in a new dimensional space. Experimentally, the optimal mapping time delay for detection of QRS complexes turned out to be 20 ms. To explore the meaning of this time delay and the proper mapping dimension, we applied a fill factor, mutual information, and autocorrelation function algorithm that were generally used to analyze the chaotic characteristics of sampled signals. From these results, we could find the fact that the performance of our proposed algorithms relied mainly on the geometrical property such as an area of the reconstructed phase portrait. For the real application, we applied our algorithm for designing a small cardiac event recorder. This system was to record patients' ECG and R-R intervals for 1 h to investigate HRV characteristics of the patients who had vasovagal syncope symptom and for the evaluation, we implemented our algorithm in C language and applied to MIT/BIH arrhythmia database of 48 subjects. Our proposed algorithm achieved a 99.58% detection rate of QRS complexes.
Holt, Kathryn E; Teo, Yik Y; Li, Heng; Nair, Satheesh; Dougan, Gordon; Wain, John; Parkhill, Julian
2009-08-15
Here, we present a method for estimating the frequencies of SNP alleles present within pooled samples of DNA using high-throughput short-read sequencing. The method was tested on real data from six strains of the highly monomorphic pathogen Salmonella Paratyphi A, sequenced individually and in a pool. A variety of read mapping and quality-weighting procedures were tested to determine the optimal parameters, which afforded > or =80% sensitivity of SNP detection and strong correlation with true SNP frequency at poolwide read depth of 40x, declining only slightly at read depths 20-40x. The method was implemented in Perl and relies on the opensource software Maq for read mapping and SNP calling. The Perl script is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/pools/.
Preliminary Assessment of the Impact of Culture on Understanding Cartographic Representations
NASA Astrophysics Data System (ADS)
Reolon Schmidt, Marcio Augusto; de Alencar Mendonça, André Luiz; Wieczorek, Małgorzata
2018-05-01
When users read a topographic map, they have to decode the represented information. This decoding passes through various processes in order to perceive, interpret, and understand the reported information. This set of processes is intrinsically a question that is influenced by culture. In particular, when one thinks of maps distributed across the internet or representations of audiences from different origins, the chance of efficient communication is reduced or at least influenced. Therefore, there should be some degree of common visual communication, which the symbology of maps can be applied in order to assure the adequate communication of phenomenon being represented on it. In this context, the present work aims at testing which evaluation factors influence the reading of maps, the understanding of space and reasoning of the map user, in particular national topographic maps. The assessment was through internet considering official map representation from Brazil and Poland and questionnaires. The results shown that conventional topographic maps on the same scale are not capable of producing the correct interpretation of the user from another culture. This means that formal training has a direct influence on the quality of the interpretation and spatial reasoning. Those results indicate that high levels of formal training positively influence the reading and interpretation results of the map and that there is no evidence that the specialists with the symbology of their own country have significantly positive results, when compared to those used maps with systematic mapping from another country.
Lip reading using neural networks
NASA Astrophysics Data System (ADS)
Kalbande, Dhananjay; Mishra, Akassh A.; Patil, Sanjivani; Nirgudkar, Sneha; Patel, Prashant
2011-10-01
Computerized lip reading, or speech reading, is concerned with the difficult task of converting a video signal of a speaking person to written text. It has several applications like teaching deaf and dumb to speak and communicate effectively with the other people, its crime fighting potential and invariance to acoustic environment. We convert the video of the subject speaking vowels into images and then images are further selected manually for processing. However, several factors like fast speech, bad pronunciation, and poor illumination, movement of face, moustaches and beards make lip reading difficult. Contour tracking methods and Template matching are used for the extraction of lips from the face. K Nearest Neighbor algorithm is then used to classify the 'speaking' images and the 'silent' images. The sequence of images is then transformed into segments of utterances. Feature vector is calculated on each frame for all the segments and is stored in the database with properly labeled class. Character recognition is performed using modified KNN algorithm which assigns more weight to nearer neighbors. This paper reports the recognition of vowels using KNN algorithms
Edwards, Jane U; Mauch, Lois; Winkelman, Mark R
2011-02-01
To support curriculum and policy, a midwest city school district assessed the association of selected categories of nutrition and physical activity (NUTR/PA) behaviors, fitness measures, and body mass index (BMI) with academic performance (AP) for 800 sixth graders. Students completed an adapted Youth Risk Behavior Surveillance Survey (NUTR/PA behaviors), fitness assessments (mile run, curl-ups, push-ups, height, and weight) with results matched to standardized scores (Measures of Academic Progress [MAP]), meal price status, and gender. Differences in mean MAP scores (math and reading) were compared by selected categories of each variable utilizing 1-way analysis of variance. Associations were determined by stepwise multiple regression utilizing mean MAP scores (for math and for reading) as the dependent variable and NUTR/PA behaviors, fitness, and BMI categories as independent variables. Significance was set at α = 0.05. Higher MAP math scores were associated with NUTR (more milk and breakfast; less 100% fruit juice and sweetened beverages [SB]) and PA (increased vigorous PA and sports teams; reduced television), and fitness (higher mile run performance). Higher MAP reading scores were associated with NUTR (fewer SB) and PA (increased vigorous PA, reduced television). Regression analysis indicated about 11.1% of the variation in the mean MAP math scores and 6.7% of the mean MAP reading scores could be accounted for by selected NUTR/PA behaviors, fitness, meal price status, and gender. Many positive NUTR/PA behaviors and fitness measures were associated with higher MAP scores supporting the school district focus on healthy lifestyles. Additional factors, including meal price status and gender, contribute to AP. © 2011, Fargo Public School.
Parietotemporal Stimulation Affects Acquisition of Novel Grapheme-Phoneme Mappings in Adult Readers
Younger, Jessica W.; Booth, James R.
2018-01-01
Neuroimaging work from developmental and reading intervention research has suggested a cause of reading failure may be lack of engagement of parietotemporal cortex during initial acquisition of grapheme-phoneme (letter-sound) mappings. Parietotemporal activation increases following grapheme-phoneme learning and successful reading intervention. Further, stimulation of parietotemporal cortex improves reading skill in lower ability adults. However, it is unclear whether these improvements following stimulation are due to enhanced grapheme-phoneme mapping abilities. To test this hypothesis, we used transcranial direct current stimulation (tDCS) to manipulate parietotemporal function in adult readers as they learned a novel artificial orthography with new grapheme-phoneme mappings. Participants received real or sham stimulation to the left inferior parietal lobe (L IPL) for 20 min before training. They received explicit training over the course of 3 days on 10 novel words each day. Learning of the artificial orthography was assessed at a pre-training baseline session, the end of each of the three training sessions, an immediate post-training session and a delayed post-training session about 4 weeks after training. Stimulation interacted with baseline reading skill to affect learning of trained words and transfer to untrained words. Lower skill readers showed better acquisition, whereas higher skill readers showed worse acquisition, when training was paired with real stimulation, as compared to readers who received sham stimulation. However, readers of all skill levels showed better maintenance of trained material following parietotemporal stimulation, indicating a differential effect of stimulation on initial learning and consolidation. Overall, these results indicate that parietotemporal stimulation can enhance learning of new grapheme-phoneme relationships in readers with lower reading skill. Yet, while parietotemporal function is critical to new learning, its role in continued reading improvement likely changes as readers progress in skill. PMID:29628882
An image-space parallel convolution filtering algorithm based on shadow map
NASA Astrophysics Data System (ADS)
Li, Hua; Yang, Huamin; Zhao, Jianping
2017-07-01
Shadow mapping is commonly used in real-time rendering. In this paper, we presented an accurate and efficient method of soft shadows generation from planar area lights. First this method generated a depth map from light's view, and analyzed the depth-discontinuities areas as well as shadow boundaries. Then these areas were described as binary values in the texture map called binary light-visibility map, and a parallel convolution filtering algorithm based on GPU was enforced to smooth out the boundaries with a box filter. Experiments show that our algorithm is an effective shadow map based method that produces perceptually accurate soft shadows in real time with more details of shadow boundaries compared with the previous works.
Hu, Zhihong; Medioni, Gerard G; Hernandez, Matthias; Sadda, Srinivas R
2015-01-01
Geographic atrophy (GA) is a manifestation of the advanced or late stage of age-related macular degeneration (AMD). AMD is the leading cause of blindness in people over the age of 65 in the western world. The purpose of this study is to develop a fully automated supervised pixel classification approach for segmenting GA, including uni- and multifocal patches in fundus autofluorescene (FAF) images. The image features include region-wise intensity measures, gray-level co-occurrence matrix measures, and Gaussian filter banks. A [Formula: see text]-nearest-neighbor pixel classifier is applied to obtain a GA probability map, representing the likelihood that the image pixel belongs to GA. Sixteen randomly chosen FAF images were obtained from 16 subjects with GA. The algorithm-defined GA regions are compared with manual delineation performed by a certified image reading center grader. Eight-fold cross-validation is applied to evaluate the algorithm performance. The mean overlap ratio (OR), area correlation (Pearson's [Formula: see text]), accuracy (ACC), true positive rate (TPR), specificity (SPC), positive predictive value (PPV), and false discovery rate (FDR) between the algorithm- and manually defined GA regions are [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text], respectively.
Improved liver R2* mapping by pixel-wise curve fitting with adaptive neighborhood regularization.
Wang, Changqing; Zhang, Xinyuan; Liu, Xiaoyun; He, Taigang; Chen, Wufan; Feng, Qianjin; Feng, Yanqiu
2018-08-01
To improve liver R2* mapping by incorporating adaptive neighborhood regularization into pixel-wise curve fitting. Magnetic resonance imaging R2* mapping remains challenging because of the serial images with low signal-to-noise ratio. In this study, we proposed to exploit the neighboring pixels as regularization terms and adaptively determine the regularization parameters according to the interpixel signal similarity. The proposed algorithm, called the pixel-wise curve fitting with adaptive neighborhood regularization (PCANR), was compared with the conventional nonlinear least squares (NLS) and nonlocal means filter-based NLS algorithms on simulated, phantom, and in vivo data. Visually, the PCANR algorithm generates R2* maps with significantly reduced noise and well-preserved tiny structures. Quantitatively, the PCANR algorithm produces R2* maps with lower root mean square errors at varying R2* values and signal-to-noise-ratio levels compared with the NLS and nonlocal means filter-based NLS algorithms. For the high R2* values under low signal-to-noise-ratio levels, the PCANR algorithm outperforms the NLS and nonlocal means filter-based NLS algorithms in the accuracy and precision, in terms of mean and standard deviation of R2* measurements in selected region of interests, respectively. The PCANR algorithm can reduce the effect of noise on liver R2* mapping, and the improved measurement precision will benefit the assessment of hepatic iron in clinical practice. Magn Reson Med 80:792-801, 2018. © 2018 International Society for Magnetic Resonance in Medicine. © 2018 International Society for Magnetic Resonance in Medicine.
Muller, Sara; Hider, Samantha L; Raza, Karim; Stack, Rebecca J; Hayward, Richard A; Mallen, Christian D
2015-01-01
Objective Rheumatoid arthritis (RA) is a multisystem, inflammatory disorder associated with increased levels of morbidity and mortality. While much research into the condition is conducted in the secondary care setting, routinely collected primary care databases provide an important source of research data. This study aimed to update an algorithm to define RA that was previously developed and validated in the General Practice Research Database (GPRD). Methods The original algorithm consisted of two criteria. Individuals meeting at least one were considered to have RA. Criterion 1: ≥1 RA Read code and a disease modifying antirheumatic drug (DMARD) without an alternative indication. Criterion 2: ≥2 RA Read codes, with at least one ‘strong’ code and no alternative diagnoses. Lists of codes for consultations and prescriptions were obtained from the authors of the original algorithm where these were available, or compiled based on the original description and clinical knowledge. 4161 people with a first Read code for RA between 1 January 2010 and 31 December 2012 were selected from the Clinical Practice Research Datalink (CPRD, successor to the GPRD), and the criteria applied. Results Code lists were updated for the introduction of new Read codes and biological DMARDs. 3577/4161 (86%) of people met the updated algorithm for RA, compared to 61% in the original development study. 62.8% of people fulfilled both Criterion 1 and Criterion 2. Conclusions Those wishing to define RA in the CPRD, should consider using this updated algorithm, rather than a single RA code, if they wish to identify only those who are most likely to have RA. PMID:26700281
HSA: a heuristic splice alignment tool.
Bu, Jingde; Chi, Xuebin; Jin, Zhong
2013-01-01
RNA-Seq methodology is a revolutionary transcriptomics sequencing technology, which is the representative of Next generation Sequencing (NGS). With the high throughput sequencing of RNA-Seq, we can acquire much more information like differential expression and novel splice variants from deep sequence analysis and data mining. But the short read length brings a great challenge to alignment, especially when the reads span two or more exons. A two steps heuristic splice alignment tool is generated in this investigation. First, map raw reads to reference with unspliced aligner--BWA; second, split initial unmapped reads into three equal short reads (seeds), align each seed to the reference, filter hits, search possible split position of read and extend hits to a complete match. Compare with other splice alignment tools like SOAPsplice and Tophat2, HSA has a better performance in call rate and efficiency, but its results do not as accurate as the other software to some extent. HSA is an effective spliced aligner of RNA-Seq reads mapping, which is available at https://github.com/vlcc/HSA.
Recursive approach to the moment-based phase unwrapping method.
Langley, Jason A; Brice, Robert G; Zhao, Qun
2010-06-01
The moment-based phase unwrapping algorithm approximates the phase map as a product of Gegenbauer polynomials, but the weight function for the Gegenbauer polynomials generates artificial singularities along the edge of the phase map. A method is presented to remove the singularities inherent to the moment-based phase unwrapping algorithm by approximating the phase map as a product of two one-dimensional Legendre polynomials and applying a recursive property of derivatives of Legendre polynomials. The proposed phase unwrapping algorithm is tested on simulated and experimental data sets. The results are then compared to those of PRELUDE 2D, a widely used phase unwrapping algorithm, and a Chebyshev-polynomial-based phase unwrapping algorithm. It was found that the proposed phase unwrapping algorithm provides results that are comparable to those obtained by using PRELUDE 2D and the Chebyshev phase unwrapping algorithm.
Greenberg, D; Istrail, S
1994-09-01
The Human Genome Project requires better software for the creation of physical maps of chromosomes. Current mapping techniques involve breaking large segments of DNA into smaller, more-manageable pieces, gathering information on all the small pieces, and then constructing a map of the original large piece from the information about the small pieces. Unfortunately, in the process of breaking up the DNA some information is lost and noise of various types is introduced; in particular, the order of the pieces is not preserved. Thus, the map maker must solve a combinatorial problem in order to reconstruct the map. Good software is indispensable for quick, accurate reconstruction. The reconstruction is complicated by various experimental errors. A major source of difficulty--which seems to be inherent to the recombination technology--is the presence of chimeric DNA clones. It is fairly common for two disjoint DNA pieces to form a chimera, i.e., a fusion of two pieces which appears as a single piece. Attempts to order chimera will fail unless they are algorithmically divided into their constituent pieces. Despite consensus within the genomic mapping community of the critical importance of correcting chimerism, algorithms for solving the chimeric clone problem have received only passing attention in the literature. Based on a model proposed by Lander (1992a, b) this paper presents the first algorithms for analyzing chimerism. We construct physical maps in the presence of chimerism by creating optimization functions which have minimizations which correlate with map quality. Despite the fact that these optimization functions are invariably NP-complete our algorithms are guaranteed to produce solutions which are close to the optimum. The practical import of using these algorithms depends on the strength of the correlation of the function to the map quality as well as on the accuracy of the approximations. We employ two fundamentally different optimization functions as a means of avoiding biases likely to decorrelate the solutions from the desired map. Experiments on simulated data show that both our algorithm which minimizes the number of chimeric fragments in a solution and our algorithm which minimizes the maximum number of fragments per clone in a solution do, in fact, correlate to high quality solutions. Furthermore, tests on simulated data using parameters set to mimic real experiments show that that the algorithms have the potential to find high quality solutions with real data. We plan to test our software against real data from the Whitehead Institute and from Los Alamos Genomic Research Center in the near future.
Read-across is a popular data gap filling technique within category and analogue approaches for regulatory purposes. Acceptance of read-across remains an ongoing challenge with several efforts underway for identifying and addressing uncertainties. Here we demonstrate an algorithm...
Quasi-conformal mapping with genetic algorithms applied to coordinate transformations
NASA Astrophysics Data System (ADS)
González-Matesanz, F. J.; Malpica, J. A.
2006-11-01
In this paper, piecewise conformal mapping for the transformation of geodetic coordinates is studied. An algorithm, which is an improved version of a previous algorithm published by Lippus [2004a. On some properties of piecewise conformal mappings. Eesti NSV Teaduste Akademmia Toimetised Füüsika-Matemaakika 53, 92-98; 2004b. Transformation of coordinates using piecewise conformal mapping. Journal of Geodesy 78 (1-2), 40] is presented; the improvement comes from using a genetic algorithm to partition the complex plane into convex polygons, whereas the original one did so manually. As a case study, the method is applied to the transformation of the Spanish datum ED50 and ETRS89, and both its advantages and disadvantages are discussed herein.
NASA Astrophysics Data System (ADS)
Janidarmian, Majid; Fekr, Atena Roshan; Bokharaei, Vahhab Samadi
2011-08-01
Mapping algorithm which means which core should be linked to which router is one of the key issues in the design flow of network-on-chip. To achieve an application-specific NoC design procedure that minimizes the communication cost and improves the fault tolerant property, first a heuristic mapping algorithm that produces a set of different mappings in a reasonable time is presented. This algorithm allows the designers to identify the set of most promising solutions in a large design space, which has low communication costs while yielding optimum communication costs in some cases. Another evaluated parameter, vulnerability index, is then considered as a principle of estimating the fault-tolerance property in all produced mappings. Finally, in order to yield a mapping which considers trade-offs between these two parameters, a linear function is defined and introduced. It is also observed that more flexibility to prioritize solutions within the design space is possible by adjusting a set of if-then rules in fuzzy logic.
NASA Technical Reports Server (NTRS)
Rompala, John T.
1992-01-01
Algorithms are presented for determining the size and location of electric charges which model storm systems and lightning strikes. The analysis utilizes readings from a grid of ground level field mills and geometric constraints on parameters to arrive at a representative set of charges. This set is used to generate three dimensional graphical depictions of the set as well as contour maps of the ground level electrical environment over the grid. The composite, analytic and graphic package is demonstrated and evaluated using controlled input data and archived data from a storm system. The results demonstrate the packages utility as: an operational tool in appraising adverse weather conditions; a research tool in studies of topics such as storm structure, storm dynamics, and lightning; and a tool in designing and evaluating grid systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sastry, S. S.; Desoer, C. A.
1980-01-01
Fixed point methods from nonlinear anaysis are used to establish conditions under which the uniform complete controllability of linear time-varying systems is preserved under non-linear perturbations in the state dynamics and the zero-input uniform complete observability of linear time-varying systems is preserved under non-linear perturbation in the state dynamics and output read out map. Algorithms for computing the specific input to steer the perturbed systems from a given initial state to a given final state are also presented. As an application, a very specific emergency control of an interconnected power system is formulated as a steering problem and it ismore » shown that this emergency control is indeed possible in finite time.« less
Asymmetric neighborhood functions accelerate ordering process of self-organizing maps
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ota, Kaiichiro; Aoki, Takaaki; Kurata, Koji
2011-02-15
A self-organizing map (SOM) algorithm can generate a topographic map from a high-dimensional stimulus space to a low-dimensional array of units. Because a topographic map preserves neighborhood relationships between the stimuli, the SOM can be applied to certain types of information processing such as data visualization. During the learning process, however, topological defects frequently emerge in the map. The presence of defects tends to drastically slow down the formation of a globally ordered topographic map. To remove such topological defects, it has been reported that an asymmetric neighborhood function is effective, but only in the simple case of mapping one-dimensionalmore » stimuli to a chain of units. In this paper, we demonstrate that even when high-dimensional stimuli are used, the asymmetric neighborhood function is effective for both artificial and real-world data. Our results suggest that applying the asymmetric neighborhood function to the SOM algorithm improves the reliability of the algorithm. In addition, it enables processing of complicated, high-dimensional data by using this algorithm.« less
Modeling the Effects of Reading Lessons on Text Processing.
ERIC Educational Resources Information Center
Omanson, Richard C.; And Others
A study evaluated the effectiveness of various models constructed to account for how children read and comprehended a story presented in a directed reading lesson. A commercial directed reading lesson was revised to introduce information related to the story and to help the children form a "map" of the central story content. Data were…
How Fifth-Grade Students Use Story Mapping to Aid Their Reading Comprehension
ERIC Educational Resources Information Center
Weih, Timothy G.
2000-01-01
This thesis examined the reading comprehension process of three fifth-grade students who demonstrated the ability to read and write fluently but had difficulties remembering and understanding important information about what they read. The aim of this research was to develop and implement an effective teaching strategy for low-achieving students…
Reading on Paper and Screen among Senior Adults: Cognitive Map and Technophobia
Hou, Jinghui; Wu, Yijie; Harrell, Erin
2017-01-01
While the senior population has been increasingly engaged with reading on mobile technologies, research that specifically documents the impact of technologies on reading for this age group has still been lacking. The present study investigated how different reading media (screen versus paper) might result in different reading outcomes among older adults due to both cognitive and psychological factors. Using a laboratory experiment with 81participants aged 57 to 85, our results supported past research and showed the influence of cognitive map formation on readers’ feelings of fatigue. We contributed empirical evidence to the contention that reading on a screen could match that of reading from paper if the presentation of the text on screen resemble that of the print. Our findings also suggested that individual levels of technophobia was an important barrier to older adults’ effective use of mobile technologies for reading. In the post hoc analyses, we further showed that technophobia was correlated with technology experience, certain personality traits, and age. The present study highlights the importance of providing tailored support that helps older adults overcome psychological obstacles in using technologies. PMID:29312073
S-MART, a software toolbox to aid RNA-Seq data analysis.
Zytnicki, Matthias; Quesneville, Hadi
2011-01-01
High-throughput sequencing is now routinely performed in many experiments. But the analysis of the millions of sequences generated, is often beyond the expertise of the wet labs who have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool which performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by all of the biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour even for Gb of data for most queries. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses on their computer for their RNA-Seq data, from the mapped data to the discovery of important loci.
S-MART, A Software Toolbox to Aid RNA-seq Data Analysis
Zytnicki, Matthias; Quesneville, Hadi
2011-01-01
High-throughput sequencing is now routinely performed in many experiments. But the analysis of the millions of sequences generated, is often beyond the expertise of the wet labs who have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool which performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by all of the biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour even for Gb of data for most queries. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses on their computer for their RNA-Seq data, from the mapped data to the discovery of important loci. PMID:21998740
Text image authenticating algorithm based on MD5-hash function and Henon map
NASA Astrophysics Data System (ADS)
Wei, Jinqiao; Wang, Ying; Ma, Xiaoxue
2017-07-01
In order to cater to the evidentiary requirements of the text image, this paper proposes a fragile watermarking algorithm based on Hash function and Henon map. The algorithm is to divide a text image into parts, get flippable pixels and nonflippable pixels of every lump according to PSD, generate watermark of non-flippable pixels with MD5-Hash, encrypt watermark with Henon map and select embedded blocks. The simulation results show that the algorithm with a good ability in tampering localization can be used to authenticate and forensics the authenticity and integrity of text images
Flattening maps for the visualization of multibranched vessels.
Zhu, Lei; Haker, Steven; Tannenbaum, Allen
2005-02-01
In this paper, we present two novel algorithms which produce flattened visualizations of branched physiological surfaces, such as vessels. The first approach is a conformal mapping algorithm based on the minimization of two Dirichlet functionals. From a triangulated representation of vessel surfaces, we show how the algorithm can be implemented using a finite element technique. The second method is an algorithm which adjusts the conformal mapping to produce a flattened representation of the original surface while preserving areas. This approach employs the theory of optimal mass transport. Furthermore, a new way of extracting center lines for vessel fly-throughs is provided.
Flattening Maps for the Visualization of Multibranched Vessels
Zhu, Lei; Haker, Steven; Tannenbaum, Allen
2013-01-01
In this paper, we present two novel algorithms which produce flattened visualizations of branched physiological surfaces, such as vessels. The first approach is a conformal mapping algorithm based on the minimization of two Dirichlet functionals. From a triangulated representation of vessel surfaces, we show how the algorithm can be implemented using a finite element technique. The second method is an algorithm which adjusts the conformal mapping to produce a flattened representation of the original surface while preserving areas. This approach employs the theory of optimal mass transport. Furthermore, a new way of extracting center lines for vessel fly-throughs is provided. PMID:15707245
Classification of fMRI resting-state maps using machine learning techniques: A comparative study
NASA Astrophysics Data System (ADS)
Gallos, Ioannis; Siettos, Constantinos
2017-11-01
We compare the efficiency of Principal Component Analysis (PCA) and nonlinear learning manifold algorithms (ISOMAP and Diffusion maps) for classifying brain maps between groups of schizophrenia patients and healthy from fMRI scans during a resting-state experiment. After a standard pre-processing pipeline, we applied spatial Independent component analysis (ICA) to reduce (a) noise and (b) spatial-temporal dimensionality of fMRI maps. On the cross-correlation matrix of the ICA components, we applied PCA, ISOMAP and Diffusion Maps to find an embedded low-dimensional space. Finally, support-vector-machines (SVM) and k-NN algorithms were used to evaluate the performance of the algorithms in classifying between the two groups.
Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees.
Mahmud, Md Pavel; Wiedenhoeft, John; Schliep, Alexander
2012-09-15
Mapping billions of reads from next generation sequencing experiments to reference genomes is a crucial task, which can require hundreds of hours of running time on a single CPU even for the fastest known implementations. Traditional approaches have difficulties dealing with matches of large edit distance, particularly in the presence of frequent or large insertions and deletions (indels). This is a serious obstacle both in determining the spectrum and abundance of genetic variations and in personal genomics. For the first time, we adopt the approximate string matching paradigm of geometric embedding to read mapping, thus rephrasing it to nearest neighbor queries in a q-gram frequency vector space. Using the L(1) distance between frequency vectors has the benefit of providing lower bounds for an edit distance with affine gap costs. Using a cache-oblivious kd-tree, we realize running times, which match the state-of-the-art. Additionally, running time and memory requirements are about constant for read lengths between 100 and 1000 bp. We provide a first proof-of-concept that geometric embedding is a promising paradigm for read mapping and that L(1) distance might serve to detect structural variations. TreQ, our initial implementation of that concept, performs more accurate than many popular read mappers over a wide range of structural variants. TreQ will be released under the GNU Public License (GPL), and precomputed genome indices will be provided for download at http://treq.sf.net. pavelm@cs.rutgers.edu Supplementary data are available at Bioinformatics online.
Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees
Mahmud, Md Pavel; Wiedenhoeft, John; Schliep, Alexander
2012-01-01
Motivation: Mapping billions of reads from next generation sequencing experiments to reference genomes is a crucial task, which can require hundreds of hours of running time on a single CPU even for the fastest known implementations. Traditional approaches have difficulties dealing with matches of large edit distance, particularly in the presence of frequent or large insertions and deletions (indels). This is a serious obstacle both in determining the spectrum and abundance of genetic variations and in personal genomics. Results: For the first time, we adopt the approximate string matching paradigm of geometric embedding to read mapping, thus rephrasing it to nearest neighbor queries in a q-gram frequency vector space. Using the L1 distance between frequency vectors has the benefit of providing lower bounds for an edit distance with affine gap costs. Using a cache-oblivious kd-tree, we realize running times, which match the state-of-the-art. Additionally, running time and memory requirements are about constant for read lengths between 100 and 1000 bp. We provide a first proof-of-concept that geometric embedding is a promising paradigm for read mapping and that L1 distance might serve to detect structural variations. TreQ, our initial implementation of that concept, performs more accurate than many popular read mappers over a wide range of structural variants. Availability and implementation: TreQ will be released under the GNU Public License (GPL), and precomputed genome indices will be provided for download at http://treq.sf.net. Contact: pavelm@cs.rutgers.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22962448
Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver.
Wymant, Chris; Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ong, Swee Hoe; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Berkhout, Ben; Cornelissen, Marion; Kellam, Paul; Reiss, Peter; Fraser, Christophe
2018-01-01
Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user's choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver's constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver.
Saurin, Jean-Christophe; Jacob, Philippe; Heyries, Laurent; Pesanti, Christian; Cholet, Franck; Fassler, Isaac; Boulant, James; Bramli, Slim; De Leusse, Antoin; Rahmi, Gabriel
2018-05-01
Reducing the reading time of capsule endoscopy films is of high priority for gastroenterologists. We report a prospective multicenter evaluation of an "express view" reading mode (Intromedic capsule system). Eighty-three patients with obscure gastrointestinal bleeding were prospectively included in 10 centers. All patients underwent small-bowel capsule endoscopy (Intromedic, Seoul, Republic of Korea). Films were read in standard mode, then a second reading was performed in express view mode at a second center. For each lesion, the precise location, nature, and relevance were collected. A consensus reading and review were done by three experts, and considered to be the gold standard. The mean reading time of capsule films was 39.7 minutes (11 - 180 minutes) and 19.7 minutes (4 - 40 minutes) by standard and express view mode, respectively ( P < 1 × 10 - 4 ). The consensus review identified a significant lesion in 44/83 patients (53.0 %). Standard reading and express view reading had a 93.3 % and 82.2 % sensitivity, respectively (NS). Consensus review identified 70 significant images from which standard reading and express view reading detected 58 (82.9 %) and 55 (78.6 %), respectively. The informatics algorithm detected 66/70 images (94.3 %) thus missing four small-bowel angiodysplasia. The express view algorithm allows an important shortening of Intromedic capsule film reading time with a high sensitivity.
Automatic and efficient methods applied to the binarization of a subway map
NASA Astrophysics Data System (ADS)
Durand, Philippe; Ghorbanzadeh, Dariush; Jaupi, Luan
2015-12-01
The purpose of this paper is the study of efficient methods for image binarization. The objective of the work is the metro maps binarization. the goal is to binarize, avoiding noise to disturb the reading of subway stations. Different methods have been tested. By this way, a method given by Otsu gives particularly interesting results. The difficulty of the binarization is the choice of this threshold in order to reconstruct. Image sticky as possible to reality. Vectorization is a step subsequent to that of the binarization. It is to retrieve the coordinates points containing information and to store them in the two matrices X and Y. Subsequently, these matrices can be exported to a file format 'CSV' (Comma Separated Value) enabling us to deal with them in a variety of software including Excel. The algorithm uses quite a time calculation in Matlab because it is composed of two "for" loops nested. But the "for" loops are poorly supported by Matlab, especially in each other. This therefore penalizes the computation time, but seems the only method to do this.
G-CNV: A GPU-Based Tool for Preparing Data to Detect CNVs with Read-Depth Methods.
Manconi, Andrea; Manca, Emanuele; Moscatelli, Marco; Gnocchi, Matteo; Orro, Alessandro; Armano, Giuliano; Milanesi, Luciano
2015-01-01
Copy number variations (CNVs) are the most prevalent types of structural variations (SVs) in the human genome and are involved in a wide range of common human diseases. Different computational methods have been devised to detect this type of SVs and to study how they are implicated in human diseases. Recently, computational methods based on high-throughput sequencing (HTS) are increasingly used. The majority of these methods focus on mapping short-read sequences generated from a donor against a reference genome to detect signatures distinctive of CNVs. In particular, read-depth based methods detect CNVs by analyzing genomic regions with significantly different read-depth from the other ones. The pipeline analysis of these methods consists of four main stages: (i) data preparation, (ii) data normalization, (iii) CNV regions identification, and (iv) copy number estimation. However, available tools do not support most of the operations required at the first two stages of this pipeline. Typically, they start the analysis by building the read-depth signal from pre-processed alignments. Therefore, third-party tools must be used to perform most of the preliminary operations required to build the read-depth signal. These data-intensive operations can be efficiently parallelized on graphics processing units (GPUs). In this article, we present G-CNV, a GPU-based tool devised to perform the common operations required at the first two stages of the analysis pipeline. G-CNV is able to filter low-quality read sequences, to mask low-quality nucleotides, to remove adapter sequences, to remove duplicated read sequences, to map the short-reads, to resolve multiple mapping ambiguities, to build the read-depth signal, and to normalize it. G-CNV can be efficiently used as a third-party tool able to prepare data for the subsequent read-depth signal generation and analysis. Moreover, it can also be integrated in CNV detection tools to generate read-depth signals.
Doble, Brett; Lorgelly, Paula
2016-04-01
To determine the external validity of existing mapping algorithms for predicting EQ-5D-3L utility values from EORTC QLQ-C30 responses and to establish their generalizability in different types of cancer. A main analysis (pooled) sample of 3560 observations (1727 patients) and two disease severity patient samples (496 and 93 patients) with repeated observations over time from Cancer 2015 were used to validate the existing algorithms. Errors were calculated between observed and predicted EQ-5D-3L utility values using a single pooled sample and ten pooled tumour type-specific samples. Predictive accuracy was assessed using mean absolute error (MAE) and standardized root-mean-squared error (RMSE). The association between observed and predicted EQ-5D utility values and other covariates across the distribution was tested using quantile regression. Quality-adjusted life years (QALYs) were calculated using observed and predicted values to test responsiveness. Ten 'preferred' mapping algorithms were identified. Two algorithms estimated via response mapping and ordinary least-squares regression using dummy variables performed well on number of validation criteria, including accurate prediction of the best and worst QLQ-C30 health states, predicted values within the EQ-5D tariff range, relatively small MAEs and RMSEs, and minimal differences between estimated QALYs. Comparison of predictive accuracy across ten tumour type-specific samples highlighted that algorithms are relatively insensitive to grouping by tumour type and affected more by differences in disease severity. Two of the 'preferred' mapping algorithms suggest more accurate predictions, but limitations exist. We recommend extensive scenario analyses if mapped utilities are used in cost-utility analyses.
Fatyga, Mirek; Dogan, Nesrin; Weiss, Elizabeth; Sleeman, William C; Zhang, Baoshe; Lehman, William J; Williamson, Jeffrey F; Wijesooriya, Krishni; Christensen, Gary E
2015-01-01
Commonly used methods of assessing the accuracy of deformable image registration (DIR) rely on image segmentation or landmark selection. These methods are very labor intensive and thus limited to relatively small number of image pairs. The direct voxel-by-voxel comparison can be automated to examine fluctuations in DIR quality on a long series of image pairs. A voxel-by-voxel comparison of three DIR algorithms applied to lung patients is presented. Registrations are compared by comparing volume histograms formed both with individual DIR maps and with a voxel-by-voxel subtraction of the two maps. When two DIR maps agree one concludes that both maps are interchangeable in treatment planning applications, though one cannot conclude that either one agrees with the ground truth. If two DIR maps significantly disagree one concludes that at least one of the maps deviates from the ground truth. We use the method to compare 3 DIR algorithms applied to peak inhale-peak exhale registrations of 4DFBCT data obtained from 13 patients. All three algorithms appear to be nearly equivalent when compared using DICE similarity coefficients. A comparison based on Jacobian volume histograms shows that all three algorithms measure changes in total volume of the lungs with reasonable accuracy, but show large differences in the variance of Jacobian distribution on contoured structures. Analysis of voxel-by-voxel subtraction of DIR maps shows differences between algorithms that exceed a centimeter for some registrations. Deformation maps produced by DIR algorithms must be treated as mathematical approximations of physical tissue deformation that are not self-consistent and may thus be useful only in applications for which they have been specifically validated. The three algorithms tested in this work perform fairly robustly for the task of contour propagation, but produce potentially unreliable results for the task of DVH accumulation or measurement of local volume change. Performance of DIR algorithms varies significantly from one image pair to the next hence validation efforts, which are exhaustive but performed on a small number of image pairs may not reflect the performance of the same algorithm in practical clinical situations. Such efforts should be supplemented by validation based on a longer series of images of clinical quality.
NASA Astrophysics Data System (ADS)
Jia, Duo; Wang, Cangjiao; Lei, Shaogang
2018-01-01
Mapping vegetation dynamic types in mining areas is significant for revealing the mechanisms of environmental damage and for guiding ecological construction. Dynamic types of vegetation can be identified by applying interannual normalized difference vegetation index (NDVI) time series. However, phase differences and time shifts in interannual time series decrease mapping accuracy in mining regions. To overcome these problems and to increase the accuracy of mapping vegetation dynamics, an interannual Landsat time series for optimum vegetation growing status was constructed first by using the enhanced spatial and temporal adaptive reflectance fusion model algorithm. We then proposed a Markov random field optimized semisupervised Gaussian dynamic time warping kernel-based fuzzy c-means (FCM) cluster algorithm for interannual NDVI time series to map dynamic vegetation types in mining regions. The proposed algorithm has been tested in the Shengli mining region and Shendong mining region, which are typical representatives of China's open-pit and underground mining regions, respectively. Experiments show that the proposed algorithm can solve the problems of phase differences and time shifts to achieve better performance when mapping vegetation dynamic types. The overall accuracies for the Shengli and Shendong mining regions were 93.32% and 89.60%, respectively, with improvements of 7.32% and 25.84% when compared with the original semisupervised FCM algorithm.
A novel image encryption algorithm based on chaos maps with Markov properties
NASA Astrophysics Data System (ADS)
Liu, Quan; Li, Pei-yue; Zhang, Ming-chao; Sui, Yong-xin; Yang, Huai-jiang
2015-02-01
In order to construct high complexity, secure and low cost image encryption algorithm, a class of chaos with Markov properties was researched and such algorithm was also proposed. The kind of chaos has higher complexity than the Logistic map and Tent map, which keeps the uniformity and low autocorrelation. An improved couple map lattice based on the chaos with Markov properties is also employed to cover the phase space of the chaos and enlarge the key space, which has better performance than the original one. A novel image encryption algorithm is constructed on the new couple map lattice, which is used as a key stream generator. A true random number is used to disturb the key which can dynamically change the permutation matrix and the key stream. From the experiments, it is known that the key stream can pass SP800-22 test. The novel image encryption can resist CPA and CCA attack and differential attack. The algorithm is sensitive to the initial key and can change the distribution the pixel values of the image. The correlation of the adjacent pixels can also be eliminated. When compared with the algorithm based on Logistic map, it has higher complexity and better uniformity, which is nearer to the true random number. It is also efficient to realize which showed its value in common use.
A Double Perturbation Method for Reducing Dynamical Degradation of the Digital Baker Map
NASA Astrophysics Data System (ADS)
Liu, Lingfeng; Lin, Jun; Miao, Suoxia; Liu, Bocheng
2017-06-01
The digital Baker map is widely used in different kinds of cryptosystems, especially for image encryption. However, any chaotic map which is realized on the finite precision device (e.g. computer) will suffer from dynamical degradation, which refers to short cycle lengths, low complexity and strong correlations. In this paper, a novel double perturbation method is proposed for reducing the dynamical degradation of the digital Baker map. Both state variables and system parameters are perturbed by the digital logistic map. Numerical experiments show that the perturbed Baker map can achieve good statistical and cryptographic properties. Furthermore, a new image encryption algorithm is provided as a simple application. With a rather simple algorithm, the encrypted image can achieve high security, which is competitive to the recently proposed image encryption algorithms.
Compression of next-generation sequencing reads aided by highly efficient de novo assembly
Jones, Daniel C.; Ruzzo, Walter L.; Peng, Xinxia
2012-01-01
We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. In addition to implementing reference-based compression, we have developed, to our knowledge, the first assembly-based compressor, using a novel de novo assembly algorithm. A probabilistic data structure is used to dramatically reduce the memory required by traditional de Bruijn graph assemblers, allowing millions of reads to be assembled very efficiently. Read sequences are then stored as positions within the assembled contigs. This is combined with statistical compression of read identifiers, quality scores, alignment information and sequences, effectively collapsing very large data sets to <15% of their original size with no loss of information. Availability: Quip is freely available under the 3-clause BSD license from http://cs.washington.edu/homes/dcjones/quip. PMID:22904078
Understanding disordered systems through numerical simulation and algorithm development
NASA Astrophysics Data System (ADS)
Sweeney, Sean Michael
Disordered systems arise in many physical contexts. Not all matter is uniform, and impurities or heterogeneities can be modeled by fixed random disorder. Numerous complex networks also possess fixed disorder, leading to applications in transportation systems, telecommunications, social networks, and epidemic modeling, to name a few. Due to their random nature and power law critical behavior, disordered systems are difficult to study analytically. Numerical simulation can help overcome this hurdle by allowing for the rapid computation of system states. In order to get precise statistics and extrapolate to the thermodynamic limit, large systems must be studied over many realizations. Thus, innovative algorithm development is essential in order reduce memory or running time requirements of simulations. This thesis presents a review of disordered systems, as well as a thorough study of two particular systems through numerical simulation, algorithm development and optimization, and careful statistical analysis of scaling properties. Chapter 1 provides a thorough overview of disordered systems, the history of their study in the physics community, and the development of techniques used to study them. Topics of quenched disorder, phase transitions, the renormalization group, criticality, and scale invariance are discussed. Several prominent models of disordered systems are also explained. Lastly, analysis techniques used in studying disordered systems are covered. In Chapter 2, minimal spanning trees on critical percolation clusters are studied, motivated in part by an analytic perturbation expansion by Jackson and Read that I check against numerical calculations. This system has a direct mapping to the ground state of the strongly disordered spin glass. We compute the path length fractal dimension of these trees in dimensions d = {2, 3, 4, 5} and find our results to be compatible with the analytic results suggested by Jackson and Read. In Chapter 3, the random bond Ising ferromagnet is studied, which is especially useful since it serves as a prototype for more complicated disordered systems such as the random field Ising model and spin glasses. We investigate the effect that changing boundary spins has on the locations of domain walls in the interior of the random ferromagnet system. We provide an analytic proof that ground state domain walls in the two dimensional system are decomposable, and we map these domain walls to a shortest paths problem. By implementing a multiple-source shortest paths algorithm developed by Philip Klein, we are able to efficiently probe domain wall locations for all possible configurations of boundary spins. We consider lattices with uncorrelated dis- order, as well as disorder that is spatially correlated according to a power law. We present numerical results for the scaling exponent governing the probability that a domain wall can be induced that passes through a particular location in the system's interior, and we compare these results to previous results on the directed polymer problem.
Growing a hypercubical output space in a self-organizing feature map.
Bauer, H U; Villmann, T
1997-01-01
Neural maps project data from an input space onto a neuron position in a (often lower dimensional) output space grid in a neighborhood preserving way, with neighboring neurons in the output space responding to neighboring data points in the input space. A map-learning algorithm can achieve an optimal neighborhood preservation only, if the output space topology roughly matches the effective structure of the data in the input space. We here present a growth algorithm, called the GSOM or growing self-organizing map, which enhances a widespread map self-organization process, Kohonen's self-organizing feature map (SOFM), by an adaptation of the output space grid during learning. The GSOM restricts the output space structure to the shape of a general hypercubical shape, with the overall dimensionality of the grid and its extensions along the different directions being subject of the adaptation. This constraint meets the demands of many larger information processing systems, of which the neural map can be a part. We apply our GSOM-algorithm to three examples, two of which involve real world data. Using recently developed methods for measuring the degree of neighborhood preservation in neural maps, we find the GSOM-algorithm to produce maps which preserve neighborhoods in a nearly optimal fashion.
USDA-ARS?s Scientific Manuscript database
BACKGROUND: Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain u...
The Structure-Mapping Engine: Algorithm and Examples.
ERIC Educational Resources Information Center
Falkenhainer, Brian; And Others
This description of the Structure-Mapping Engine (SME), a flexible, cognitive simulation program for studying analogical processing which is based on Gentner's Structure-Mapping theory of analogy, points out that the SME provides a "tool kit" for constructing matching algorithms consistent with this theory. This report provides: (1) a…
Hsu, Chia-Cheng; Chen, Hsin-Chin; Su, Yen-Ning; Huang, Kuo-Kuang; Huang, Yueh-Min
2012-10-22
A growing number of educational studies apply sensors to improve student learning in real classroom settings. However, how can sensors be integrated into classrooms to help instructors find out students' reading concentration rates and thus better increase learning effectiveness? The aim of the current study was to develop a reading concentration monitoring system for use with e-books in an intelligent classroom and to help instructors find out the students' reading concentration rates. The proposed system uses three types of sensor technologies, namely a webcam, heartbeat sensor, and blood oxygen sensor to detect the learning behaviors of students by capturing various physiological signals. An artificial bee colony (ABC) optimization approach is applied to the data gathered from these sensors to help instructors understand their students' reading concentration rates in a classroom learning environment. The results show that the use of the ABC algorithm in the proposed system can effectively obtain near-optimal solutions. The system has a user-friendly graphical interface, making it easy for instructors to clearly understand the reading status of their students.
Hsu, Chia-Cheng; Chen, Hsin-Chin; Su, Yen-Ning; Huang, Kuo-Kuang; Huang, Yueh-Min
2012-01-01
A growing number of educational studies apply sensors to improve student learning in real classroom settings. However, how can sensors be integrated into classrooms to help instructors find out students' reading concentration rates and thus better increase learning effectiveness? The aim of the current study was to develop a reading concentration monitoring system for use with e-books in an intelligent classroom and to help instructors find out the students' reading concentration rates. The proposed system uses three types of sensor technologies, namely a webcam, heartbeat sensor, and blood oxygen sensor to detect the learning behaviors of students by capturing various physiological signals. An artificial bee colony (ABC) optimization approach is applied to the data gathered from these sensors to help instructors understand their students' reading concentration rates in a classroom learning environment. The results show that the use of the ABC algorithm in the proposed system can effectively obtain near-optimal solutions. The system has a user-friendly graphical interface, making it easy for instructors to clearly understand the reading status of their students. PMID:23202042
Madan, Jason; Khan, Kamran A; Petrou, Stavros; Lamb, Sarah E
2017-05-01
Mapping algorithms are increasingly being used to predict health-utility values based on responses or scores from non-preference-based measures, thereby informing economic evaluations. We explored whether predictions in the EuroQol 5-dimension 3-level instrument (EQ-5D-3L) health-utility gains from mapping algorithms might differ if estimated using differenced versus raw scores, using the Roland-Morris Disability Questionnaire (RMQ), a widely used health status measure for low back pain, as an example. We estimated algorithms mapping within-person changes in RMQ scores to changes in EQ-5D-3L health utilities using data from two clinical trials with repeated observations. We also used logistic regression models to estimate response mapping algorithms from these data to predict within-person changes in responses to each EQ-5D-3L dimension from changes in RMQ scores. Predicted health-utility gains from these mappings were compared with predictions based on raw RMQ data. Using differenced scores reduced the predicted health-utility gain from a unit decrease in RMQ score from 0.037 (standard error [SE] 0.001) to 0.020 (SE 0.002). Analysis of response mapping data suggests that the use of differenced data reduces the predicted impact of reducing RMQ scores across EQ-5D-3L dimensions and that patients can experience health-utility gains on the EQ-5D-3L 'usual activity' dimension independent from improvements captured by the RMQ. Mappings based on raw RMQ data overestimate the EQ-5D-3L health utility gains from interventions that reduce RMQ scores. Where possible, mapping algorithms should reflect within-person changes in health outcome and be estimated from datasets containing repeated observations if they are to be used to estimate incremental health-utility gains.
Planning Readings: A Comparative Exploration of Basic Algorithms
ERIC Educational Resources Information Center
Piater, Justus H.
2009-01-01
Conventional introduction to computer science presents individual algorithmic paradigms in the context of specific, prototypical problems. To complement this algorithm-centric instruction, this study additionally advocates problem-centric instruction. I present an original problem drawn from students' life that is simply stated but provides rich…
NASA Technical Reports Server (NTRS)
Millman, Daniel R.
2017-01-01
Air Data Systems (FADS) are becoming more prevalent on re-entry vehicles, as evi- denced by the Mars Science Laboratory and the Orion Multipurpose Crew Vehicle. A FADS consists of flush-mounted pressure transducers located at various locations on the fore-body of a flight vehicle or the heat shield of a re-entry capsule. A pressure model converts the pressure readings into useful air data quantities. Two algorithms for converting pressure readings to air data have become predominant- the iterative Least Squares State Estimator (LSSE) and the Triples Algorithm. What follows herein is a new algorithm that takes advantage of the best features of both the Triples Algorithm and the LSSE. This approach employs the potential flow model and strategic differencing of the Triples Algorithm to obtain the defective flight angles; however, the requirements on port placement are far less restrictive, allowing for configurations that are considered optimal for a FADS.
Tang, Wenming; Liu, Guixiong; Li, Yuzhong; Tan, Daji
2017-01-01
High data transmission efficiency is a key requirement for an ultrasonic phased array with multi-group ultrasonic sensors. Here, a novel FIFOs scheduling algorithm was proposed and the data transmission efficiency with hardware technology was improved. This algorithm includes FIFOs as caches for the ultrasonic scanning data obtained from the sensors with the output data in a bandwidth-sharing way, on the basis of which an optimal length ratio of all the FIFOs is achieved, allowing the reading operations to be switched among all the FIFOs without time slot waiting. Therefore, this algorithm enhances the utilization ratio of the reading bandwidth resources so as to obtain higher efficiency than the traditional scheduling algorithms. The reliability and validity of the algorithm are substantiated after its implementation in the field programmable gate array (FPGA) technology, and the bandwidth utilization ratio and the real-time performance of the ultrasonic phased array are enhanced. PMID:29035345
Aircraft Detection in High-Resolution SAR Images Based on a Gradient Textural Saliency Map.
Tan, Yihua; Li, Qingyun; Li, Yansheng; Tian, Jinwen
2015-09-11
This paper proposes a new automatic and adaptive aircraft target detection algorithm in high-resolution synthetic aperture radar (SAR) images of airport. The proposed method is based on gradient textural saliency map under the contextual cues of apron area. Firstly, the candidate regions with the possible existence of airport are detected from the apron area. Secondly, directional local gradient distribution detector is used to obtain a gradient textural saliency map in the favor of the candidate regions. In addition, the final targets will be detected by segmenting the saliency map using CFAR-type algorithm. The real high-resolution airborne SAR image data is used to verify the proposed algorithm. The results demonstrate that this algorithm can detect aircraft targets quickly and accurately, and decrease the false alarm rate.
Zook, Justin M.; Samarov, Daniel; McDaniel, Jennifer; Sen, Shurjo K.; Salit, Marc
2012-01-01
While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set used to calculate association of SSEs with various features in the reads and sequence context. This data set is typically either from a part of the data set being “recalibrated” (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has less dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration. PMID:22859977
2012-01-01
Background Electronic health records are invaluable for medical research, but much information is stored as free text rather than in a coded form. For example, in the UK General Practice Research Database (GPRD), causes of death and test results are sometimes recorded only in free text. Free text can be difficult to use for research if it requires time-consuming manual review. Our aim was to develop an automated method for extracting coded information from free text in electronic patient records. Methods We reviewed the electronic patient records in GPRD of a random sample of 3310 patients who died in 2001, to identify the cause of death. We developed a computer program called the Freetext Matching Algorithm (FMA) to map diagnoses in text to the Read Clinical Terminology. The program uses lookup tables of synonyms and phrase patterns to identify diagnoses, dates and selected test results. We tested it on two random samples of free text from GPRD (1000 texts associated with death in 2001, and 1000 general texts from cases and controls in a coronary artery disease study), comparing the output to the U.S. National Library of Medicine’s MetaMap program and the gold standard of manual review. Results Among 3310 patients registered in the GPRD who died in 2001, the cause of death was recorded in coded form in 38.1% of patients, and in the free text alone in 19.4%. On the 1000 texts associated with death, FMA coded 683 of the 735 positive diagnoses, with precision (positive predictive value) 98.4% (95% confidence interval (CI) 97.2, 99.2) and recall (sensitivity) 92.9% (95% CI 90.8, 94.7). On the general sample, FMA detected 346 of the 447 positive diagnoses, with precision 91.5% (95% CI 88.3, 94.1) and recall 77.4% (95% CI 73.2, 81.2), which was similar to MetaMap. Conclusions We have developed an algorithm to extract coded information from free text in GP records with good precision. It may facilitate research using free text in electronic patient records, particularly for extracting the cause of death. PMID:22870911
DOE Office of Scientific and Technical Information (OSTI.GOV)
Englbrecht, F; Lindner, F; Bin, J
2016-06-15
Purpose: To measure and simulate well-defined electron spectra using a linear accelerator and a permanent-magnetic wide-angle spectrometer to test the performance of a novel reconstruction algorithm for retrieval of unknown electron-sources, in view of application to diagnostics of laser-driven particle acceleration. Methods: Six electron energies (6, 9, 12, 15, 18 and 21 MeV, 40cm × 40cm field-size) delivered by a Siemens Oncor linear accelerator were recorded using a permanent-magnetic wide-angle electron spectrometer (150mT) with a one dimensional slit (0.2mm × 5cm). Two dimensional maps representing beam-energy and entrance-position along the slit were measured using different scintillating screens, read by anmore » online CMOS detector of high resolution (0.048mm × 0.048mm pixels) and large field of view (5cm × 10cm). Measured energy-slit position maps were compared to forward FLUKA simulations of electron transport through the spectrometer, starting from IAEA phase-spaces of the accelerator. The latter ones were validated against measured depth-dose and lateral profiles in water. Agreement of forward simulation and measurement was quantified in terms of position and shape of the signal distribution on the detector. Results: Measured depth-dose distributions and lateral profiles in the water phantom showed good agreement with forward simulations of IAEA phase-spaces, thus supporting usage of this simulation source in the study. Measured energy-slit position maps and those obtained by forward Monte-Carlo simulations showed satisfactory agreement in shape and position. Conclusion: Well-defined electron beams of known energy and shape will provide an ideal scenario to study the performance of a novel reconstruction algorithm using measured and simulated signal. Future work will increase the stability and convergence of the reconstruction-algorithm for unknown electron sources, towards final application to the electrons which drive the interaction of TW-class laser pulses with nanometer thin target foils to accelerate protons and ions to multi-MeV kinetic energy. Cluster of Excellence of the German Research Foundation (DFG) “Munich-Centre for Advanced Photonics”.« less
ERIC Educational Resources Information Center
Khaghaninejad, Mohammad Saber; Arefinejad, Mansour
2015-01-01
This study was an attempt to examine the effect of concept mapping on reading comprehension of Iranian EFL learners. Pretest-posttest design was employed to scrutinize the possible improvement of the study's participants who were male and female learners whose ages ranged from 19 to 40 and had taken general English courses at Islamic Azad…
ERIC Educational Resources Information Center
Isikdogan, Necla; Kargin, Tevhide
2010-01-01
The purpose of this study was to investigate the effectiveness of the story-map technique on reading comprehension skills among students with mild mental retardation. The research group consisted of 14 students with mild mental retardation. The students in the research group were chosen from students who attended to an elementary school and a…
The Dilemmas of Teaching Reading. Eighth Yearbook of The American Reading Forum, 1988.
ERIC Educational Resources Information Center
Lumpkin, Donavon, Ed.; And Others
Articles in this eighth yearbook of the American Reading Forum address the dilemmas of teaching reading. Articles, listed with their authors, are as follows: (1) "Deepening a Dilemma: Stylus vs. Computer Writing at an Early Primary Level" (J. Heep); (2) "Concept Maps and Vee Diagrams: Strategies To Deal with the Dilemma of the Restricted…
X-MATE: a flexible system for mapping short read data
Pearson, John V.; Cloonan, Nicole; Grimmond, Sean M.
2011-01-01
Summary: Accurate and complete mapping of short-read sequencing to a reference genome greatly enhances the discovery of biological results and improves statistical predictions. We recently presented RNA-MATE, a pipeline for the recursive mapping of RNA-Seq datasets. With the rapid increase in genome re-sequencing projects, progression of available mapping software and the evolution of file formats, we now present X-MATE, an updated version of RNA-MATE, capable of mapping both RNA-Seq and DNA datasets and with improved performance, output file formats, configuration files, and flexibility in core mapping software. Availability: Executables, source code, junction libraries, test data and results and the user manual are available from http://grimmond.imb.uq.edu.au/X-MATE/. Contact: n.cloonan@uq.edu.au; s.grimmond@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics Online. PMID:21216778
Case study of rotating sonar sensor application in unmanned automated guided vehicle
NASA Astrophysics Data System (ADS)
Chandak, Pravin; Cao, Ming; Hall, Ernest L.
2001-10-01
A single rotating sonar element is used with a restricted angle of sweep to obtain readings to develop a range map for the unobstructed path of an autonomous guided vehicle (AGV). A Polaroid ultrasound transducer element is mounted on a micromotor with an encoder feedback. The motion of this motor is controlled using a Galil DMC 1000 motion control board. The encoder is interfaced with the DMC 1000 board using an intermediate IMC 1100 break-out board. By adjusting the parameters of the Polaroid element, it is possible to obtain range readings at known angles with respect to the center of the robot. The readings are mapped to obtain a range map of the unobstructed path in front of the robot. The idea can be extended to a 360 degree mapping by changing the assembly level programming on the Galil Motion control board. Such a system would be compact and reliable over a range of environments and AGV applications.
Backup Attitude Control Algorithms for the MAP Spacecraft
NASA Technical Reports Server (NTRS)
ODonnell, James R., Jr.; Andrews, Stephen F.; Ericsson-Jackson, Aprille J.; Flatley, Thomas W.; Ward, David K.; Bay, P. Michael
1999-01-01
The Microwave Anisotropy Probe (MAP) is a follow-on to the Differential Microwave Radiometer (DMR) instrument on the Cosmic Background Explorer (COBE) spacecraft. The MAP spacecraft will perform its mission, studying the early origins of the universe, in a Lissajous orbit around the Earth-Sun L(sub 2) Lagrange point. Due to limited mass, power, and financial resources, a traditional reliability concept involving fully redundant components was not feasible. This paper will discuss the redundancy philosophy used on MAP, describe the hardware redundancy selected (and why), and present backup modes and algorithms that were designed in lieu of additional attitude control hardware redundancy to improve the odds of mission success. Three of these modes have been implemented in the spacecraft flight software. The first onboard mode allows the MAP Kalman filter to be used with digital sun sensor (DSS) derived rates, in case of the failure of one of MAP's two two-axis inertial reference units. Similarly, the second onboard mode allows a star tracker only mode, using attitude and derived rate from one or both of MAP's star trackers for onboard attitude determination and control. The last backup mode onboard allows a sun-line angle offset to be commanded that will allow solar radiation pressure to be used for momentum management and orbit stationkeeping. In addition to the backup modes implemented on the spacecraft, two backup algorithms have been developed in the event of less likely contingencies. One of these is an algorithm for implementing an alternative scan pattern to MAP's nominal dual-spin science mode using only one or two reaction wheels and thrusters. Finally, an algorithm has been developed that uses thruster one shots while in science mode for momentum management. This algorithm has been developed in case system momentum builds up faster than anticipated, to allow adequate momentum management while minimizing interruptions to science. In this paper, each mode and algorithm will be discussed, and simulation results presented.
A hardware-oriented algorithm for floating-point function generation
NASA Technical Reports Server (NTRS)
O'Grady, E. Pearse; Young, Baek-Kyu
1991-01-01
An algorithm is presented for performing accurate, high-speed, floating-point function generation for univariate functions defined at arbitrary breakpoints. Rapid identification of the breakpoint interval, which includes the input argument, is shown to be the key operation in the algorithm. A hardware implementation which makes extensive use of read/write memories is used to illustrate the algorithm.
Aligner optimization increases accuracy and decreases compute times in multi-species sequence data.
Robinson, Kelly M; Hawkins, Aziah S; Santana-Cruz, Ivette; Adkins, Ricky S; Shetty, Amol C; Nagaraj, Sushma; Sadzewicz, Lisa; Tallon, Luke J; Rasko, David A; Fraser, Claire M; Mahurkar, Anup; Silva, Joana C; Dunning Hotopp, Julie C
2017-09-01
As sequencing technologies have evolved, the tools to analyze these sequences have made similar advances. However, for multi-species samples, we observed important and adverse differences in alignment specificity and computation time for bwa- mem (Burrows-Wheeler aligner-maximum exact matches) relative to bwa-aln. Therefore, we sought to optimize bwa-mem for alignment of data from multi-species samples in order to reduce alignment time and increase the specificity of alignments. In the multi-species cases examined, there was one majority member (i.e. Plasmodium falciparum or Brugia malayi ) and one minority member (i.e. human or the Wolbachia endosymbiont w Bm) of the sequence data. Increasing bwa-mem seed length from the default value reduced the number of read pairs from the majority sequence member that incorrectly aligned to the reference genome of the minority sequence member. Combining both source genomes into a single reference genome increased the specificity of mapping, while also reducing the central processing unit (CPU) time. In Plasmodium , at a seed length of 18 nt, 24.1 % of reads mapped to the human genome using 1.7±0.1 CPU hours, while 83.6 % of reads mapped to the Plasmodium genome using 0.2±0.0 CPU hours (total: 107.7 % reads mapping; in 1.9±0.1 CPU hours). In contrast, 97.1 % of the reads mapped to a combined Plasmodium- human reference in only 0.7±0.0 CPU hours. Overall, the results suggest that combining all references into a single reference database and using a 23 nt seed length reduces the computational time, while maximizing specificity. Similar results were found for simulated sequence reads from a mock metagenomic data set. We found similar improvements to computation time in a publicly available human-only data set.
NASA Astrophysics Data System (ADS)
Gruber, Thomas; Grim, Larry; Fauth, Ryan; Tercha, Brian; Powell, Chris; Steinhardt, Kristin
2011-05-01
Large networks of disparate chemical/biological (C/B) sensors, MET sensors, and intelligence, surveillance, and reconnaissance (ISR) sensors reporting to various command/display locations can lead to conflicting threat information, questions of alarm confidence, and a confused situational awareness. Sensor netting algorithms (SNA) are being developed to resolve these conflicts and to report high confidence consensus threat map data products on a common operating picture (COP) display. A data fusion algorithm design was completed in a Phase I SBIR effort and development continues in the Phase II SBIR effort. The initial implementation and testing of the algorithm has produced some performance results. The algorithm accepts point and/or standoff sensor data, and event detection data (e.g., the location of an explosion) from various ISR sensors (e.g., acoustic, infrared cameras, etc.). These input data are preprocessed to assign estimated uncertainty to each incoming piece of data. The data are then sent to a weighted tomography process to obtain a consensus threat map, including estimated threat concentration level uncertainty. The threat map is then tested for consistency and the overall confidence for the map result is estimated. The map and confidence results are displayed on a COP. The benefits of a modular implementation of the algorithm and comparisons of fused / un-fused data results will be presented. The metrics for judging the sensor-netting algorithm performance are warning time, threat map accuracy (as compared to ground truth), false alarm rate, and false alarm rate v. reported threat confidence level.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Demmel, James W.
This project addresses both communication-avoiding algorithms, and reproducible floating-point computation. Communication, i.e. moving data, either between levels of memory or processors over a network, is much more expensive per operation than arithmetic (measured in time or energy), so we seek algorithms that greatly reduce communication. We developed many new algorithms for both dense and sparse, and both direct and iterative linear algebra, attaining new communication lower bounds, and getting large speedups in many cases. We also extended this work in several ways: (1) We minimize writes separately from reads, since writes may be much more expensive than reads on emergingmore » memory technologies, like Flash, sometimes doing asymptotically fewer writes than reads. (2) We extend the lower bounds and optimal algorithms to arbitrary algorithms that may be expressed as perfectly nested loops accessing arrays, where the array subscripts may be arbitrary affine functions of the loop indices (eg A(i), B(i,j+k, k+3*m-7, …) etc.). (3) We extend our communication-avoiding approach to some machine learning algorithms, such as support vector machines. This work has won a number of awards. We also address reproducible floating-point computation. We define reproducibility to mean getting bitwise identical results from multiple runs of the same program, perhaps with different hardware resources or other changes that should ideally not change the answer. Many users depend on reproducibility for debugging or correctness. However, dynamic scheduling of parallel computing resources, combined with nonassociativity of floating point addition, makes attaining reproducibility a challenge even for simple operations like summing a vector of numbers, or more complicated operations like the Basic Linear Algebra Subprograms (BLAS). We describe an algorithm that computes a reproducible sum of floating point numbers, independent of the order of summation. The algorithm depends only on a subset of the IEEE Floating Point Standard 754-2008, uses just 6 words to represent a “reproducible accumulator,” and requires just one read-only pass over the data, or one reduction in parallel. New instructions based on this work are being considered for inclusion in the future IEEE 754-2018 floating-point standard, and new reproducible BLAS are being considered for the next version of the BLAS standard.« less
Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data
Kosugi, Shunichi; Natsume, Satoshi; Yoshida, Kentaro; MacLean, Daniel; Cano, Liliana; Kamoun, Sophien; Terauchi, Ryohei
2013-01-01
Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. PMID:24116042
Mapping and phasing of structural variation in patient genomes using nanopore sequencing.
Cretu Stancu, Mircea; van Roosmalen, Markus J; Renkens, Ivo; Nieboer, Marleen M; Middelkamp, Sjors; de Ligt, Joep; Pregno, Giulia; Giachino, Daniela; Mandrile, Giorgia; Espejo Valle-Inclan, Jose; Korzelius, Jerome; de Bruijn, Ewart; Cuppen, Edwin; Talkowski, Michael E; Marschall, Tobias; de Ridder, Jeroen; Kloosterman, Wigard P
2017-11-06
Despite improvements in genomics technology, the detection of structural variants (SVs) from short-read sequencing still poses challenges, particularly for complex variation. Here we analyse the genomes of two patients with congenital abnormalities using the MinION nanopore sequencer and a novel computational pipeline-NanoSV. We demonstrate that nanopore long reads are superior to short reads with regard to detection of de novo chromothripsis rearrangements. The long reads also enable efficient phasing of genetic variations, which we leveraged to determine the parental origin of all de novo chromothripsis breakpoints and to resolve the structure of these complex rearrangements. Additionally, genome-wide surveillance of inherited SVs reveals novel variants, missed in short-read data sets, a large proportion of which are retrotransposon insertions. We provide a first exploration of patient genome sequencing with a nanopore sequencer and demonstrate the value of long-read sequencing in mapping and phasing of SVs for both clinical and research applications.
First results in terrain mapping for a roving planetary explorer
NASA Technical Reports Server (NTRS)
Krotkov, E.; Caillas, C.; Hebert, M.; Kweon, I. S.; Kanade, Takeo
1989-01-01
To perform planetary exploration without human supervision, a complete autonomous rover must be able to model its environment while exploring its surroundings. Researchers present a new algorithm to construct a geometric terrain representation from a single range image. The form of the representation is an elevation map that includes uncertainty, unknown areas, and local features. By virtue of working in spherical-polar space, the algorithm is independent of the desired map resolution and the orientation of the sensor, unlike other algorithms that work in Cartesian space. They also describe new methods to evaluate regions of the constructed elevation maps to support legged locomotion over rough terrain.
Cao, Jianfang; Cui, Hongyan; Shi, Hao; Jiao, Lijuan
2016-01-01
A back-propagation (BP) neural network can solve complicated random nonlinear mapping problems; therefore, it can be applied to a wide range of problems. However, as the sample size increases, the time required to train BP neural networks becomes lengthy. Moreover, the classification accuracy decreases as well. To improve the classification accuracy and runtime efficiency of the BP neural network algorithm, we proposed a parallel design and realization method for a particle swarm optimization (PSO)-optimized BP neural network based on MapReduce on the Hadoop platform using both the PSO algorithm and a parallel design. The PSO algorithm was used to optimize the BP neural network's initial weights and thresholds and improve the accuracy of the classification algorithm. The MapReduce parallel programming model was utilized to achieve parallel processing of the BP algorithm, thereby solving the problems of hardware and communication overhead when the BP neural network addresses big data. Datasets on 5 different scales were constructed using the scene image library from the SUN Database. The classification accuracy of the parallel PSO-BP neural network algorithm is approximately 92%, and the system efficiency is approximately 0.85, which presents obvious advantages when processing big data. The algorithm proposed in this study demonstrated both higher classification accuracy and improved time efficiency, which represents a significant improvement obtained from applying parallel processing to an intelligent algorithm on big data.
A Probabilistic Feature Map-Based Localization System Using a Monocular Camera.
Kim, Hyungjin; Lee, Donghwa; Oh, Taekjun; Choi, Hyun-Taek; Myung, Hyun
2015-08-31
Image-based localization is one of the most widely researched localization techniques in the robotics and computer vision communities. As enormous image data sets are provided through the Internet, many studies on estimating a location with a pre-built image-based 3D map have been conducted. Most research groups use numerous image data sets that contain sufficient features. In contrast, this paper focuses on image-based localization in the case of insufficient images and features. A more accurate localization method is proposed based on a probabilistic map using 3D-to-2D matching correspondences between a map and a query image. The probabilistic feature map is generated in advance by probabilistic modeling of the sensor system as well as the uncertainties of camera poses. Using the conventional PnP algorithm, an initial camera pose is estimated on the probabilistic feature map. The proposed algorithm is optimized from the initial pose by minimizing Mahalanobis distance errors between features from the query image and the map to improve accuracy. To verify that the localization accuracy is improved, the proposed algorithm is compared with the conventional algorithm in a simulation and realenvironments.
A Probabilistic Feature Map-Based Localization System Using a Monocular Camera
Kim, Hyungjin; Lee, Donghwa; Oh, Taekjun; Choi, Hyun-Taek; Myung, Hyun
2015-01-01
Image-based localization is one of the most widely researched localization techniques in the robotics and computer vision communities. As enormous image data sets are provided through the Internet, many studies on estimating a location with a pre-built image-based 3D map have been conducted. Most research groups use numerous image data sets that contain sufficient features. In contrast, this paper focuses on image-based localization in the case of insufficient images and features. A more accurate localization method is proposed based on a probabilistic map using 3D-to-2D matching correspondences between a map and a query image. The probabilistic feature map is generated in advance by probabilistic modeling of the sensor system as well as the uncertainties of camera poses. Using the conventional PnP algorithm, an initial camera pose is estimated on the probabilistic feature map. The proposed algorithm is optimized from the initial pose by minimizing Mahalanobis distance errors between features from the query image and the map to improve accuracy. To verify that the localization accuracy is improved, the proposed algorithm is compared with the conventional algorithm in a simulation and realenvironments. PMID:26404284
An optimization method of VON mapping for energy efficiency and routing in elastic optical networks
NASA Astrophysics Data System (ADS)
Liu, Huanlin; Xiong, Cuilian; Chen, Yong; Li, Changping; Chen, Derun
2018-03-01
To improve resources utilization efficiency, network virtualization in elastic optical networks has been developed by sharing the same physical network for difference users and applications. In the process of virtual nodes mapping, longer paths between physical nodes will consume more spectrum resources and energy. To address the problem, we propose a virtual optical network mapping algorithm called genetic multi-objective optimize virtual optical network mapping algorithm (GM-OVONM-AL), which jointly optimizes the energy consumption and spectrum resources consumption in the process of virtual optical network mapping. Firstly, a vector function is proposed to balance the energy consumption and spectrum resources by optimizing population classification and crowding distance sorting. Then, an adaptive crossover operator based on hierarchical comparison is proposed to improve search ability and convergence speed. In addition, the principle of the survival of the fittest is introduced to select better individual according to the relationship of domination rank. Compared with the spectrum consecutiveness-opaque virtual optical network mapping-algorithm and baseline-opaque virtual optical network mapping algorithm, simulation results show the proposed GM-OVONM-AL can achieve the lowest bandwidth blocking probability and save the energy consumption.
A Radio-Map Automatic Construction Algorithm Based on Crowdsourcing
Yu, Ning; Xiao, Chenxian; Wu, Yinfeng; Feng, Renjian
2016-01-01
Traditional radio-map-based localization methods need to sample a large number of location fingerprints offline, which requires huge amount of human and material resources. To solve the high sampling cost problem, an automatic radio-map construction algorithm based on crowdsourcing is proposed. The algorithm employs the crowd-sourced information provided by a large number of users when they are walking in the buildings as the source of location fingerprint data. Through the variation characteristics of users’ smartphone sensors, the indoor anchors (doors) are identified and their locations are regarded as reference positions of the whole radio-map. The AP-Cluster method is used to cluster the crowdsourced fingerprints to acquire the representative fingerprints. According to the reference positions and the similarity between fingerprints, the representative fingerprints are linked to their corresponding physical locations and the radio-map is generated. Experimental results demonstrate that the proposed algorithm reduces the cost of fingerprint sampling and radio-map construction and guarantees the localization accuracy. The proposed method does not require users’ explicit participation, which effectively solves the resource-consumption problem when a location fingerprint database is established. PMID:27070623
Detection of breast cancer in automated 3D breast ultrasound
NASA Astrophysics Data System (ADS)
Tan, Tao; Platel, Bram; Mus, Roel; Karssemeijer, Nico
2012-03-01
Automated 3D breast ultrasound (ABUS) is a novel imaging modality, in which motorized scans of the breasts are made with a wide transducer through a membrane under modest compression. The technology has gained high interest and may become widely used in screening of dense breasts, where sensitivity of mammography is poor. ABUS has a high sensitivity for detecting solid breast lesions. However, reading ABUS images is time consuming, and subtle abnormalities may be missed. Therefore, we are developing a computer aided detection (CAD) system to help reduce reading time and errors. In the multi-stage system we propose, segmentations of the breast and nipple are performed, providing landmarks for the detection algorithm. Subsequently, voxel features characterizing coronal spiculation patterns, blobness, contrast, and locations with respect to landmarks are extracted. Using an ensemble of classifiers, a likelihood map indicating potential malignancies is computed. Local maxima in the likelihood map are determined using a local maxima detector and form a set of candidate lesions in each view. These candidates are further processed in a second detection stage, which includes region segmentation, feature extraction and a final classification. Region segmentation is performed using a 3D spiral-scanning dynamic programming method. Region features include descriptors of shape, acoustic behavior and texture. Performance was determined using a 78-patient dataset with 93 images, including 50 malignant lesions. We used 10-fold cross-validation. Using FROC analysis we found that the system obtains a lesion sensitivity of 60% and 70% at 2 and 4 false positives per image respectively.
PolyaPeak: Detecting Transcription Factor Binding Sites from ChIP-seq Using Peak Shape Information
Wu, Hao; Ji, Hongkai
2014-01-01
ChIP-seq is a powerful technology for detecting genomic regions where a protein of interest interacts with DNA. ChIP-seq data for mapping transcription factor binding sites (TFBSs) have a characteristic pattern: around each binding site, sequence reads aligned to the forward and reverse strands of the reference genome form two separate peaks shifted away from each other, and the true binding site is located in between these two peaks. While it has been shown previously that the accuracy and resolution of binding site detection can be improved by modeling the pattern, efficient methods are unavailable to fully utilize that information in TFBS detection procedure. We present PolyaPeak, a new method to improve TFBS detection by incorporating the peak shape information. PolyaPeak describes peak shapes using a flexible Pólya model. The shapes are automatically learnt from the data using Minorization-Maximization (MM) algorithm, then integrated with the read count information via a hierarchical model to distinguish true binding sites from background noises. Extensive real data analyses show that PolyaPeak is capable of robustly improving TFBS detection compared with existing methods. An R package is freely available. PMID:24608116
Topographic Maps and Coal Mining.
ERIC Educational Resources Information Center
Raitz, Karl B.
1984-01-01
Geography teachers can illustrate the patterns associated with mineral fuel production, especially coal, by using United States Geological Survey topographic maps, which are illustrated by symbols that indicate mine-related features, such as shafts and tailings. Map reading exercises are presented; an interpretative map key that can facilitate…
ERIC Educational Resources Information Center
Hofferber, Michael
1989-01-01
Orienteering--the game of following a map to find predetermined locations--can spark interest and develop skills in map making and map reading. This article gives background on orienteering; describes indoor and outdoor orienteering activities; offers suggestions for incorporating orienteering into science, math, and language arts; and provides a…
A Practical Framework for Cartographic Design
NASA Astrophysics Data System (ADS)
Denil, Mark
2018-05-01
Creation of a map artifact that can be recognized, accepted, read, and absorbed is the cartographer's chief responsibility. This involves bringing coherence and order out of chaos and randomness through the construction of map artifacts that mediate processes of social communication. Maps are artifacts, first and foremost: they are artifacts with particular formal attributes. It is the formal aspects of the map artifact that allows it to invoke and sustain a reading as a map. This paper examines Cartographic Design as the sole means at the cartographer's disposal for constructing the meaning bearing artifacts we know as maps, by placing it in a center of a practical analytic framework. The framework draws together the Theoretic and Craft aspects of map making, and examines how Style and Taste operate through the rubric of a schema of Mapicity to produce high quality maps. The role of the Cartographic Canon, and the role of Critique, are also explored, and a few design resources are identified.
Read-out electronics for DC squid magnetic measurements
Ganther, Jr., Kenneth R.; Snapp, Lowell D.
2002-01-01
Read-out electronics for DC SQUID sensor systems, the read-out electronics incorporating low Johnson noise radio-frequency flux-locked loop circuitry and digital signal processing algorithms in order to improve upon the prior art by a factor of at least ten, thereby alleviating problems caused by magnetic interference when operating DC SQUID sensor systems in magnetically unshielded environments.
Jiménez, Felipe; Monzón, Sergio; Naranjo, Jose Eugenio
2016-02-04
Vehicle positioning is a key factor for numerous information and assistance applications that are included in vehicles and for which satellite positioning is mainly used. However, this positioning process can result in errors and lead to measurement uncertainties. These errors come mainly from two sources: errors and simplifications of digital maps and errors in locating the vehicle. From that inaccurate data, the task of assigning the vehicle's location to a link on the digital map at every instant is carried out by map-matching algorithms. These algorithms have been developed to fulfil that need and attempt to amend these errors to offer the user a suitable positioning. In this research; an algorithm is developed that attempts to solve the errors in positioning when the Global Navigation Satellite System (GNSS) signal reception is frequently lost. The algorithm has been tested with satisfactory results in a complex urban environment of narrow streets and tall buildings where errors and signal reception losses of the GPS receiver are frequent.
Jiménez, Felipe; Monzón, Sergio; Naranjo, Jose Eugenio
2016-01-01
Vehicle positioning is a key factor for numerous information and assistance applications that are included in vehicles and for which satellite positioning is mainly used. However, this positioning process can result in errors and lead to measurement uncertainties. These errors come mainly from two sources: errors and simplifications of digital maps and errors in locating the vehicle. From that inaccurate data, the task of assigning the vehicle’s location to a link on the digital map at every instant is carried out by map-matching algorithms. These algorithms have been developed to fulfil that need and attempt to amend these errors to offer the user a suitable positioning. In this research; an algorithm is developed that attempts to solve the errors in positioning when the Global Navigation Satellite System (GNSS) signal reception is frequently lost. The algorithm has been tested with satisfactory results in a complex urban environment of narrow streets and tall buildings where errors and signal reception losses of the GPS receiver are frequent. PMID:26861320
Aircraft Detection in High-Resolution SAR Images Based on a Gradient Textural Saliency Map
Tan, Yihua; Li, Qingyun; Li, Yansheng; Tian, Jinwen
2015-01-01
This paper proposes a new automatic and adaptive aircraft target detection algorithm in high-resolution synthetic aperture radar (SAR) images of airport. The proposed method is based on gradient textural saliency map under the contextual cues of apron area. Firstly, the candidate regions with the possible existence of airport are detected from the apron area. Secondly, directional local gradient distribution detector is used to obtain a gradient textural saliency map in the favor of the candidate regions. In addition, the final targets will be detected by segmenting the saliency map using CFAR-type algorithm. The real high-resolution airborne SAR image data is used to verify the proposed algorithm. The results demonstrate that this algorithm can detect aircraft targets quickly and accurately, and decrease the false alarm rate. PMID:26378543
ERIC Educational Resources Information Center
Savage, Robert; Georgiou, George; Parrila, Rauno; Maiorino, Kristina
2018-01-01
We evaluated two experimenter-delivered, small-group word reading programs among at-risk poor readers in Grade 1 classes of regular elementary schools using a two-arm, dual-site-matched control trial intervention. At-risk poor word readers (n = 201) were allocated to either (a) Direct Mapping and Set-for-Variability (DMSfV) or (b) Current or…
Text Mapping Plus: Improving Comprehension through Supported Retellings
ERIC Educational Resources Information Center
Lapp, Diane; Fisher, Douglas; Johnson, Kelly
2010-01-01
Modeled in this column is the teaching of a text mapping routine that supports students reading and remembering the salient features of the text. The authors renamed the story mapping technique "text mapping plus" because they found that as students added relational words and graphics to their maps their retells of both fiction and nonnarrative…
Efficient design of nanoplasmonic waveguide devices using the space mapping algorithm.
Dastmalchi, Pouya; Veronis, Georgios
2013-12-30
We show that the space mapping algorithm, originally developed for microwave circuit optimization, can enable the efficient design of nanoplasmonic waveguide devices which satisfy a set of desired specifications. Space mapping utilizes a physics-based coarse model to approximate a fine model accurately describing a device. Here the fine model is a full-wave finite-difference frequency-domain (FDFD) simulation of the device, while the coarse model is based on transmission line theory. We demonstrate that simply optimizing the transmission line model of the device is not enough to obtain a device which satisfies all the required design specifications. On the other hand, when the iterative space mapping algorithm is used, it converges fast to a design which meets all the specifications. In addition, full-wave FDFD simulations of only a few candidate structures are required before the iterative process is terminated. Use of the space mapping algorithm therefore results in large reductions in the required computation time when compared to any direct optimization method of the fine FDFD model.
An improved dehazing algorithm of aerial high-definition image
NASA Astrophysics Data System (ADS)
Jiang, Wentao; Ji, Ming; Huang, Xiying; Wang, Chao; Yang, Yizhou; Li, Tao; Wang, Jiaoying; Zhang, Ying
2016-01-01
For unmanned aerial vehicle(UAV) images, the sensor can not get high quality images due to fog and haze weather. To solve this problem, An improved dehazing algorithm of aerial high-definition image is proposed. Based on the model of dark channel prior, the new algorithm firstly extracts the edges from crude estimated transmission map and expands the extracted edges. Then according to the expended edges, the algorithm sets a threshold value to divide the crude estimated transmission map into different areas and makes different guided filter on the different areas compute the optimized transmission map. The experimental results demonstrate that the performance of the proposed algorithm is substantially the same as the one based on dark channel prior and guided filter. The average computation time of the new algorithm is around 40% of the one as well as the detection ability of UAV image is improved effectively in fog and haze weather.
NASA Technical Reports Server (NTRS)
Kweon, In SO; Hebert, Martial; Kanade, Takeo
1989-01-01
A three-dimensional perception system for building a geometrical description of rugged terrain environments from range image data is presented with reference to the exploration of the rugged terrain of Mars. An intermediate representation consisting of an elevation map that includes an explicit representation of uncertainty and labeling of the occluded regions is proposed. The locus method used to convert range image to an elevation map is introduced, along with an uncertainty model based on this algorithm. Both the elevation map and the locus method are the basis of a terrain matching algorithm which does not assume any correspondences between range images. The two-stage algorithm consists of a feature-based matching algorithm to compute an initial transform and an iconic terrain matching algorithm to merge multiple range images into a uniform representation. Terrain modeling results on real range images of rugged terrain are presented. The algorithms considered are a fundamental part of the perception system for the Ambler, a legged locomotor.
Assembling short reads from jumping libraries with large insert sizes.
Vasilinetc, Irina; Prjibelski, Andrey D; Gurevich, Alexey; Korobeynikov, Anton; Pevzner, Pavel A
2015-10-15
Advances in Next-Generation Sequencing technologies and sample preparation recently enabled generation of high-quality jumping libraries that have a potential to significantly improve short read assemblies. However, assembly algorithms have to catch up with experimental innovations to benefit from them and to produce high-quality assemblies. We present a new algorithm that extends recently described exSPAnder universal repeat resolution approach to enable its applications to several challenging data types, including jumping libraries generated by the recently developed Illumina Nextera Mate Pair protocol. We demonstrate that, with these improvements, bacterial genomes often can be assembled in a few contigs using only a single Nextera Mate Pair library of short reads. Described algorithms are implemented in C++ as a part of SPAdes genome assembler, which is freely available at bioinf.spbau.ru/en/spades. ap@bioinf.spbau.ru Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
QSRA: a quality-value guided de novo short read assembler.
Bryant, Douglas W; Wong, Weng-Keen; Mockler, Todd C
2009-02-24
New rapid high-throughput sequencing technologies have sparked the creation of a new class of assembler. Since all high-throughput sequencing platforms incorporate errors in their output, short-read assemblers must be designed to account for this error while utilizing all available data. We have designed and implemented an assembler, Quality-value guided Short Read Assembler, created to take advantage of quality-value scores as a further method of dealing with error. Compared to previous published algorithms, our assembler shows significant improvements not only in speed but also in output quality. QSRA generally produced the highest genomic coverage, while being faster than VCAKE. QSRA is extremely competitive in its longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA provides a step closer to the goal of de novo assembly of complex genomes, improving upon the original VCAKE algorithm by not only drastically reducing runtimes but also increasing the viability of the assembly algorithm through further error handling capabilities.
A Modified Direct-Reading Azimuth Protractor
ERIC Educational Resources Information Center
Larson, William C.; Pugliese, Joseph M.
1977-01-01
Describes the construction of a direct-reading azimuth protractor (DRAP) used for mapping fracture and joint-surface orientations in underground mines where magnetic disturbances affect typical geologic pocket transit. (SL)
Map Reading beyond Information Given: The Expert Orienteers' Internal Knowledge about Terrain.
ERIC Educational Resources Information Center
Murakoshi, Shin
1990-01-01
Compares novice and expert orienteers' map interpretation skills. Subjects asked to judge terrain from maps, including conditions inferable without corresponding map symbols. Experts' interpretation of identical symbols implies use of experiential knowledge. Internal knowledge characteristics discussed in terms of episodic-semantic memory…
Comparison of three methods for materials identification and mapping with imaging spectroscopy
NASA Technical Reports Server (NTRS)
Clark, Roger N.; Swayze, Gregg; Boardman, Joe; Kruse, Fred
1993-01-01
We are comparing three methods of mapping analysis tools for imaging spectroscopy data. The purpose of this comparison is to understand the advantages and disadvantages of each algorithm so others would be better able to choose the best algorithm or combinations of algorithms for a particular problem. The three algorithms are: (1) the spectralfeature modified least squares mapping algorithm of Clark et al (1990, 1991): programs mbandmap and tricorder; (2) the Spectral Angle Mapper Algorithm(Boardman, 1993) found in the CU CSES SIPS package; and (3) the Expert System of Kruse et al. (1993). The comparison uses a ground-calibrated 1990 AVIRIS scene of 400 by 410 pixels over Cuprite, Nevada. Along with the test data set is a spectral library of 38 minerals. Each algorithm is tested with the same AVIRIS data set and spectral library. Field work has confirmed the presence of many of these minerals in the AVIRIS scene (Swayze et al. 1992).
Lossless Compression of Classification-Map Data
NASA Technical Reports Server (NTRS)
Hua, Xie; Klimesh, Matthew
2009-01-01
A lossless image-data-compression algorithm intended specifically for application to classification-map data is based on prediction, context modeling, and entropy coding. The algorithm was formulated, in consideration of the differences between classification maps and ordinary images of natural scenes, so as to be capable of compressing classification- map data more effectively than do general-purpose image-data-compression algorithms. Classification maps are typically generated from remote-sensing images acquired by instruments aboard aircraft (see figure) and spacecraft. A classification map is a synthetic image that summarizes information derived from one or more original remote-sensing image(s) of a scene. The value assigned to each pixel in such a map is the index of a class that represents some type of content deduced from the original image data for example, a type of vegetation, a mineral, or a body of water at the corresponding location in the scene. When classification maps are generated onboard the aircraft or spacecraft, it is desirable to compress the classification-map data in order to reduce the volume of data that must be transmitted to a ground station.
An improved image non-blind image deblurring method based on FoEs
NASA Astrophysics Data System (ADS)
Zhu, Qidan; Sun, Lei
2013-03-01
Traditional non-blind image deblurring algorithms always use maximum a posterior(MAP). MAP estimates involving natural image priors can reduce the ripples effectively in contrast to maximum likelihood(ML). However, they have been found lacking in terms of restoration performance. Based on this issue, we utilize MAP with KL penalty to replace traditional MAP. We develop an image reconstruction algorithm that minimizes the KL divergence between the reference distribution and the prior distribution. The approximate KL penalty can restrain over-smooth caused by MAP. We use three groups of images and Harris corner detection to prove our method. The experimental results show that our algorithm of non-blind image restoration can effectively reduce the ringing effect and exhibit the state-of-the-art deblurring results.
Olarerin-George, Anthony O.; Hogenesch, John B.
2015-01-01
Mycoplasmas are notorious contaminants of cell culture and can have profound effects on host cell biology by depriving cells of nutrients and inducing global changes in gene expression. Over the last two decades, sentinel testing has revealed wide-ranging contamination rates in mammalian culture. To obtain an unbiased assessment from hundreds of labs, we analyzed sequence data from 9395 rodent and primate samples from 884 series in the NCBI Sequence Read Archive. We found 11% of these series were contaminated (defined as ≥100 reads/million mapping to mycoplasma in one or more samples). Ninety percent of mycoplasma-mapped reads aligned to ribosomal RNA. This was unexpected given 37% of contaminated series used poly(A)-selection for mRNA enrichment. Lastly, we examined the relationship between mycoplasma contamination and host gene expression in a single cell RNA-seq dataset and found 61 host genes (P < 0.001) were significantly associated with mycoplasma-mapped read counts. In all, this study suggests mycoplasma contamination is still prevalent today and poses substantial risk to research quality. PMID:25712092
Long-read sequencing and de novo assembly of a Chinese genome
USDA-ARS?s Scientific Manuscript database
Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arr...
Koa-Wing, Michael; Nakagawa, Hiroshi; Luther, Vishal; Jamil-Copley, Shahnaz; Linton, Nick; Sandler, Belinda; Qureshi, Norman; Peters, Nicholas S; Davies, D Wyn; Francis, Darrel P; Jackman, Warren; Kanagaratnam, Prapa
2015-11-15
Ripple Mapping (RM) is designed to overcome the limitations of existing isochronal 3D mapping systems by representing the intracardiac electrogram as a dynamic bar on a surface bipolar voltage map that changes in height according to the electrogram voltage-time relationship, relative to a fiduciary point. We tested the hypothesis that standard approaches to atrial tachycardia CARTO™ activation maps were inadequate for RM creation and interpretation. From the results, we aimed to develop an algorithm to optimize RMs for future prospective testing on a clinical RM platform. CARTO-XP™ activation maps from atrial tachycardia ablations were reviewed by two blinded assessors on an off-line RM workstation. Ripple Maps were graded according to a diagnostic confidence scale (Grade I - high confidence with clear pattern of activation through to Grade IV - non-diagnostic). The RM-based diagnoses were corroborated against the clinical diagnoses. 43 RMs from 14 patients were classified as Grade I (5 [11.5%]); Grade II (17 [39.5%]); Grade III (9 [21%]) and Grade IV (12 [28%]). Causes of low gradings/errors included the following: insufficient chamber point density; window-of-interest<100% of cycle length (CL); <95% tachycardia CL mapped; variability of CL and/or unstable fiducial reference marker; and suboptimal bar height and scar settings. A data collection and map interpretation algorithm has been developed to optimize Ripple Maps in atrial tachycardias. This algorithm requires prospective testing on a real-time clinical platform. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Einstein, Daniel R.; Kuprat, Andrew P.; Jiao, Xiangmin
2013-01-01
Geometries for organ scale and multiscale simulations of organ function are now routinely derived from imaging data. However, medical images may also contain spatially heterogeneous information other than geometry that are relevant to such simulations either as initial conditions or in the form of model parameters. In this manuscript, we present an algorithm for the efficient and robust mapping of such data to imaging based unstructured polyhedral grids in parallel. We then illustrate the application of our mapping algorithm to three different mapping problems: 1) the mapping of MRI diffusion tensor data to an unstuctured ventricular grid; 2) the mappingmore » of serial cyro-section histology data to an unstructured mouse brain grid; and 3) the mapping of CT-derived volumetric strain data to an unstructured multiscale lung grid. Execution times and parallel performance are reported for each case.« less
Localization from Visual Landmarks on a Free-Flying Robot
NASA Technical Reports Server (NTRS)
Coltin, Brian; Fusco, Jesse; Moratto, Zack; Alexandrov, Oleg; Nakamura, Robert
2016-01-01
We present the localization approach for Astrobee, a new free-flying robot designed to navigate autonomously on the International Space Station (ISS). Astrobee will accommodate a variety of payloads and enable guest scientists to run experiments in zero-g, as well as assist astronauts and ground controllers. Astrobee will replace the SPHERES robots which currently operate on the ISS, whose use of fixed ultrasonic beacons for localization limits them to work in a 2 meter cube. Astrobee localizes with monocular vision and an IMU, without any environmental modifications. Visual features detected on a pre-built map, optical flow information, and IMU readings are all integrated into an extended Kalman filter (EKF) to estimate the robot pose. We introduce several modifications to the filter to make it more robust to noise, and extensively evaluate the localization algorithm.
NASA Technical Reports Server (NTRS)
Sanyal, Soumya; Jain, Amit; Das, Sajal K.; Biswas, Rupak
2003-01-01
In this paper, we propose a distributed approach for mapping a single large application to a heterogeneous grid environment. To minimize the execution time of the parallel application, we distribute the mapping overhead to the available nodes of the grid. This approach not only provides a fast mapping of tasks to resources but is also scalable. We adopt a hierarchical grid model and accomplish the job of mapping tasks to this topology using a scheduler tree. Results show that our three-phase algorithm provides high quality mappings, and is fast and scalable.
Wyoming Geology and Geography, Unit I.
ERIC Educational Resources Information Center
Robinson, Terry
This unit on the geology and geography of Wyoming for elementary school students provides activities for map and globe skills. Goals include reading and interpreting maps and globes, interpreting map symbols, comparing maps and drawing inferences, and understanding time and chronology. Outlines and charts are provided for Wyoming geology and…
ERIC Educational Resources Information Center
Kim, Kyung
2017-01-01
The present study considers the potential influence of first language (L1) in reading second language (L2) science text. University mixed proficiency Korean English language learners (n = 136) were asked to complete pre- and post-reading "sorting maps" in L1 or L2 (e.g., sort Korean, read text, sort English). All of the participants'…
Mobile robot motion estimation using Hough transform
NASA Astrophysics Data System (ADS)
Aldoshkin, D. N.; Yamskikh, T. N.; Tsarev, R. Yu
2018-05-01
This paper proposes an algorithm for estimation of mobile robot motion. The geometry of surrounding space is described with range scans (samples of distance measurements) taken by the mobile robot’s range sensors. A similar sample of space geometry in any arbitrary preceding moment of time or the environment map can be used as a reference. The suggested algorithm is invariant to isotropic scaling of samples or map that allows using samples measured in different units and maps made at different scales. The algorithm is based on Hough transform: it maps from measurement space to a straight-line parameters space. In the straight-line parameters, space the problems of estimating rotation, scaling and translation are solved separately breaking down a problem of estimating mobile robot localization into three smaller independent problems. The specific feature of the algorithm presented is its robustness to noise and outliers inherited from Hough transform. The prototype of the system of mobile robot orientation is described.
Cao, Jianfang; Cui, Hongyan; Shi, Hao; Jiao, Lijuan
2016-01-01
A back-propagation (BP) neural network can solve complicated random nonlinear mapping problems; therefore, it can be applied to a wide range of problems. However, as the sample size increases, the time required to train BP neural networks becomes lengthy. Moreover, the classification accuracy decreases as well. To improve the classification accuracy and runtime efficiency of the BP neural network algorithm, we proposed a parallel design and realization method for a particle swarm optimization (PSO)-optimized BP neural network based on MapReduce on the Hadoop platform using both the PSO algorithm and a parallel design. The PSO algorithm was used to optimize the BP neural network’s initial weights and thresholds and improve the accuracy of the classification algorithm. The MapReduce parallel programming model was utilized to achieve parallel processing of the BP algorithm, thereby solving the problems of hardware and communication overhead when the BP neural network addresses big data. Datasets on 5 different scales were constructed using the scene image library from the SUN Database. The classification accuracy of the parallel PSO-BP neural network algorithm is approximately 92%, and the system efficiency is approximately 0.85, which presents obvious advantages when processing big data. The algorithm proposed in this study demonstrated both higher classification accuracy and improved time efficiency, which represents a significant improvement obtained from applying parallel processing to an intelligent algorithm on big data. PMID:27304987
Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver
Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Cornelissen, Marion; Kellam, Paul; Reiss, Peter
2018-01-01
Abstract Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user’s choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver’s constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver. PMID:29876136
Design and application of star map simulation system for star sensors
NASA Astrophysics Data System (ADS)
Wu, Feng; Shen, Weimin; Zhu, Xifang; Chen, Yuheng; Xu, Qinquan
2013-12-01
Modern star sensors are powerful to measure attitude automatically which assure a perfect performance of spacecrafts. They achieve very accurate attitudes by applying algorithms to process star maps obtained by the star camera mounted on them. Therefore, star maps play an important role in designing star cameras and developing procession algorithms. Furthermore, star maps supply significant supports to exam the performance of star sensors completely before their launch. However, it is not always convenient to supply abundant star maps by taking pictures of the sky. Thus, star map simulation with the aid of computer attracts a lot of interests by virtue of its low price and good convenience. A method to simulate star maps by programming and extending the function of the optical design program ZEMAX is proposed. The star map simulation system is established. Firstly, based on analyzing the working procedures of star sensors to measure attitudes and the basic method to design optical system by ZEMAX, the principle of simulating star sensor imaging is given out in detail. The theory about adding false stars and noises, and outputting maps is discussed and the corresponding approaches are proposed. Then, by external programming, the star map simulation program is designed and produced. Its user interference and operation are introduced. Applications of star map simulation method in evaluating optical system, star image extraction algorithm and star identification algorithm, and calibrating system errors are presented completely. It was proved that the proposed simulation method provides magnificent supports to the study on star sensors, and improves the performance of star sensors efficiently.
ERIC Educational Resources Information Center
Hulse, Grace
2012-01-01
In this article, the author describes how her fourth graders made ceramic heart maps. The impetus for this project came from reading "My Map Book" by Sara Fanelli. This book is a collection of quirky, hand-drawn and collaged maps that diagram a child's world. There are maps of her stomach, her day, her family, and her heart, among others. The…
The World in Spatial Terms: Mapmaking and Map Reading
ERIC Educational Resources Information Center
Ekiss, Gale Olp; Trapido-Lurie, Barbara; Phillips, Judy; Hinde, Elizabeth
2007-01-01
Maps and mapping activities are essential in the primary grades. Maps are truly ubiquitous today, as evidenced by the popularity of websites such as Google Earth and Mapquest, and by devices such as Global Positioning System (GPS) units in cars, planes, and boats. Maps can give visual settings to travel stories and historical narratives and can…
Algorithmic Approaches for Place Recognition in Featureless, Walled Environments
2015-01-01
inertial measurement unit LIDAR light detection and ranging RANSAC random sample consensus SLAM simultaneous localization and mapping SUSAN smallest...algorithm 38 21 Typical input image for general junction based algorithm 39 22 Short exposure image of hallway junction taken by LIDAR 40 23...discipline of simultaneous localization and mapping ( SLAM ) has been studied intensively over the past several years. Many technical approaches
Baichoo, Shakuntala; Ouzounis, Christos A
A multitude of algorithms for sequence comparison, short-read assembly and whole-genome alignment have been developed in the general context of molecular biology, to support technology development for high-throughput sequencing, numerous applications in genome biology and fundamental research on comparative genomics. The computational complexity of these algorithms has been previously reported in original research papers, yet this often neglected property has not been reviewed previously in a systematic manner and for a wider audience. We provide a review of space and time complexity of key sequence analysis algorithms and highlight their properties in a comprehensive manner, in order to identify potential opportunities for further research in algorithm or data structure optimization. The complexity aspect is poised to become pivotal as we will be facing challenges related to the continuous increase of genomic data on unprecedented scales and complexity in the foreseeable future, when robust biological simulation at the cell level and above becomes a reality. Copyright © 2017 Elsevier B.V. All rights reserved.
Characterization of robotics parallel algorithms and mapping onto a reconfigurable SIMD machine
NASA Technical Reports Server (NTRS)
Lee, C. S. G.; Lin, C. T.
1989-01-01
The kinematics, dynamics, Jacobian, and their corresponding inverse computations are six essential problems in the control of robot manipulators. Efficient parallel algorithms for these computations are discussed and analyzed. Their characteristics are identified and a scheme on the mapping of these algorithms to a reconfigurable parallel architecture is presented. Based on the characteristics including type of parallelism, degree of parallelism, uniformity of the operations, fundamental operations, data dependencies, and communication requirement, it is shown that most of the algorithms for robotic computations possess highly regular properties and some common structures, especially the linear recursive structure. Moreover, they are well-suited to be implemented on a single-instruction-stream multiple-data-stream (SIMD) computer with reconfigurable interconnection network. The model of a reconfigurable dual network SIMD machine with internal direct feedback is introduced. A systematic procedure internal direct feedback is introduced. A systematic procedure to map these computations to the proposed machine is presented. A new scheduling problem for SIMD machines is investigated and a heuristic algorithm, called neighborhood scheduling, that reorders the processing sequence of subtasks to reduce the communication time is described. Mapping results of a benchmark algorithm are illustrated and discussed.
Injecting Errors for Testing Built-In Test Software
NASA Technical Reports Server (NTRS)
Gender, Thomas K.; Chow, James
2010-01-01
Two algorithms have been conceived to enable automated, thorough testing of Built-in test (BIT) software. The first algorithm applies to BIT routines that define pass/fail criteria based on values of data read from such hardware devices as memories, input ports, or registers. This algorithm simulates effects of errors in a device under test by (1) intercepting data from the device and (2) performing AND operations between the data and the data mask specific to the device. This operation yields values not expected by the BIT routine. This algorithm entails very small, permanent instrumentation of the software under test (SUT) for performing the AND operations. The second algorithm applies to BIT programs that provide services to users application programs via commands or callable interfaces and requires a capability for test-driver software to read and write the memory used in execution of the SUT. This algorithm identifies all SUT code execution addresses where errors are to be injected, then temporarily replaces the code at those addresses with small test code sequences to inject latent severe errors, then determines whether, as desired, the SUT detects the errors and recovers
PSO algorithm enhanced with Lozi Chaotic Map - Tuning experiment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pluhacek, Michal; Senkerik, Roman; Zelinka, Ivan
2015-03-10
In this paper it is investigated the effect of tuning of control parameters of the Lozi Chaotic Map employed as a chaotic pseudo-random number generator for the particle swarm optimization algorithm. Three different benchmark functions are selected from the IEEE CEC 2013 competition benchmark set. The Lozi map is extensively tuned and the performance of PSO is evaluated.
2006-01-01
information of the robot (Figure 1) acquired via laser-based localization techniques. The results are maps of the global soundscape . The algorithmic...environments than noise maps. Furthermore, provided the acoustic localization algorithm can detect the sources, the soundscape can be mapped with many...gathering information about the auditory soundscape in which it is working. In addition to robustness in the presence of noise, it has also been
NASA Astrophysics Data System (ADS)
Cui, Yang; Luo, Wang; Fan, Qiang; Peng, Qiwei; Cai, Yiting; Yao, Yiyang; Xu, Changfu
2018-01-01
This paper adopts a low power consumption ARM Hisilicon mobile processing platform and OV4689 camera, combined with a new skeleton extraction based on distance transform algorithm and the improved Hough algorithm for multi meters real-time reading. The design and implementation of the device were completed. Experimental results shows that The average error of measurement was 0.005MPa, and the average reading time was 5s. The device had good stability and high accuracy which meets the needs of practical application.
The Improved Locating Algorithm of Particle Filter Based on ROS Robot
NASA Astrophysics Data System (ADS)
Fang, Xun; Fu, Xiaoyang; Sun, Ming
2018-03-01
This paperanalyzes basic theory and primary algorithm of the real-time locating system and SLAM technology based on ROS system Robot. It proposes improved locating algorithm of particle filter effectively reduces the matching time of laser radar and map, additional ultra-wideband technology directly accelerates the global efficiency of FastSLAM algorithm, which no longer needs searching on the global map. Meanwhile, the re-sampling has been largely reduced about 5/6 that directly cancels the matching behavior on Roboticsalgorithm.
NASA Astrophysics Data System (ADS)
Gandomi, A. H.; Yang, X.-S.; Talatahari, S.; Alavi, A. H.
2013-01-01
A recently developed metaheuristic optimization algorithm, firefly algorithm (FA), mimics the social behavior of fireflies based on the flashing and attraction characteristics of fireflies. In the present study, we will introduce chaos into FA so as to increase its global search mobility for robust global optimization. Detailed studies are carried out on benchmark problems with different chaotic maps. Here, 12 different chaotic maps are utilized to tune the attractive movement of the fireflies in the algorithm. The results show that some chaotic FAs can clearly outperform the standard FA.
ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers.
Coombe, Lauren; Zhang, Jessica; Vandervalk, Benjamin P; Chu, Justin; Jackman, Shaun D; Birol, Inanc; Warren, René L
2018-06-20
The long-range sequencing information captured by linked reads, such as those available from 10× Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other read alignment-dependent linked read scaffolders, including our own (ARCS), and uses a kmer-based mapping approach. The kmer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, as it precludes the need for input sequence formatting and draft sequence assembly indexing. The reliance on kmers instead of read alignments for pairing sequences relaxes the workflow requirements, and drastically reduces the run time. Here, we show how linked reads, when used in conjunction with Hi-C data for scaffolding, improve a draft human genome assembly of PacBio long-read data five-fold (baseline vs. ARKS NG50 = 4.6 vs. 23.1 Mbp, respectively). We also demonstrate how the method provides further improvements of a megabase-scale Supernova human genome assembly (NG50 = 14.74 Mbp vs. 25.94 Mbp before and after ARKS), which itself exclusively uses linked read data for assembly, with an execution speed six to nine times faster than competitive linked read scaffolders (~ 10.5 h compared to 75.7 h, on average). Following ARKS scaffolding of a human genome 10xG Supernova assembly (of cell line NA12878), fewer than 9 scaffolds cover each chromosome, except the largest (chromosome 1, n = 13). ARKS uses a kmer mapping strategy instead of linked read alignments to record and associate the barcode information needed to order and orient draft assembly sequences. The simplified workflow, when compared to that of our initial implementation, ARCS, markedly improves run time performances on experimental human genome datasets. Furthermore, the novel distance estimator in ARKS utilizes barcoding information from linked reads to estimate gap sizes. It accomplishes this by modeling the relationship between known distances of a region within contigs and calculating associated Jaccard indices. ARKS has the potential to provide correct, chromosome-scale genome assemblies, promptly. We expect ARKS to have broad utility in helping refine draft genomes.
NRGC: a novel referential genome compression algorithm.
Saha, Subrata; Rajasekaran, Sanguthevar
2016-11-15
Next-generation sequencing techniques produce millions to billions of short reads. The procedure is not only very cost effective but also can be done in laboratory environment. The state-of-the-art sequence assemblers then construct the whole genomic sequence from these reads. Current cutting edge computing technology makes it possible to build genomic sequences from the billions of reads within a minimal cost and time. As a consequence, we see an explosion of biological sequences in recent times. In turn, the cost of storing the sequences in physical memory or transmitting them over the internet is becoming a major bottleneck for research and future medical applications. Data compression techniques are one of the most important remedies in this context. We are in need of suitable data compression algorithms that can exploit the inherent structure of biological sequences. Although standard data compression algorithms are prevalent, they are not suitable to compress biological sequencing data effectively. In this article, we propose a novel referential genome compression algorithm (NRGC) to effectively and efficiently compress the genomic sequences. We have done rigorous experiments to evaluate NRGC by taking a set of real human genomes. The simulation results show that our algorithm is indeed an effective genome compression algorithm that performs better than the best-known algorithms in most of the cases. Compression and decompression times are also very impressive. The implementations are freely available for non-commercial purposes. They can be downloaded from: http://www.engr.uconn.edu/~rajasek/NRGC.zip CONTACT: rajasek@engr.uconn.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
RC-MAPS: Bridging the Comprehension Gap in EAP Reading
ERIC Educational Resources Information Center
Sterzik, Angela Meyer; Fraser, Carol
2012-01-01
In academic environments, reading is assigned not simply to transmit information; students are required to take the information, and based on the task set by the instructor, assess, analyze, and critique it on the basis of personal experiences, prior knowledge, and other readings (Grabe, 2009). Thus text-based comprehension (Kintsch, 1998) alone…
Strategies Training in the Teaching of Reading Comprehension for EFL Learners in Indonesia
ERIC Educational Resources Information Center
Mistar, Junaidi; Zuhairi, Alfan; Yanti, Nofita
2016-01-01
This study investigated the effect of reading strategies training on the students' literal and inferential reading comprehension. The training involved three concrete strategies: predicting, text mapping, and summarizing. To achieve the purpose of this study, a quasi experimental design was selected with the experimental group being given reading…
Everyday Reading and Writing: English. 5112.24.
ERIC Educational Resources Information Center
Knowles, Marlene; Wardell, Arlene
A curriculum guide to help students improve their everyday English skills has been designed for the Dade County Public Schools. The course, for grades 8 through 12, is to help students learn to read, write, and interpret letters, business forms, instructions, signs, maps, and magazines. The practical subject matter emphasizes basic reading and…
Wehbe, Leila; Murphy, Brian; Talukdar, Partha; Fyshe, Alona; Ramdas, Aaditya; Mitchell, Tom
2014-01-01
Story understanding involves many perceptual and cognitive subprocesses, from perceiving individual words, to parsing sentences, to understanding the relationships among the story characters. We present an integrated computational model of reading that incorporates these and additional subprocesses, simultaneously discovering their fMRI signatures. Our model predicts the fMRI activity associated with reading arbitrary text passages, well enough to distinguish which of two story segments is being read with 74% accuracy. This approach is the first to simultaneously track diverse reading subprocesses during complex story processing and predict the detailed neural representation of diverse story features, ranging from visual word properties to the mention of different story characters and different actions they perform. We construct brain representation maps that replicate many results from a wide range of classical studies that focus each on one aspect of language processing and offer new insights on which type of information is processed by different areas involved in language processing. Additionally, this approach is promising for studying individual differences: it can be used to create single subject maps that may potentially be used to measure reading comprehension and diagnose reading disorders.
zUMIs - A fast and flexible pipeline to process RNA sequencing data with UMIs.
Parekh, Swati; Ziegenhain, Christoph; Vieth, Beate; Enard, Wolfgang; Hellmann, Ines
2018-06-01
Single-cell RNA-sequencing (scRNA-seq) experiments typically analyze hundreds or thousands of cells after amplification of the cDNA. The high throughput is made possible by the early introduction of sample-specific bar codes (BCs), and the amplification bias is alleviated by unique molecular identifiers (UMIs). Thus, the ideal analysis pipeline for scRNA-seq data needs to efficiently tabulate reads according to both BC and UMI. zUMIs is a pipeline that can handle both known and random BCs and also efficiently collapse UMIs, either just for exon mapping reads or for both exon and intron mapping reads. If BC annotation is missing, zUMIs can accurately detect intact cells from the distribution of sequencing reads. Another unique feature of zUMIs is the adaptive downsampling function that facilitates dealing with hugely varying library sizes but also allows the user to evaluate whether the library has been sequenced to saturation. To illustrate the utility of zUMIs, we analyzed a single-nucleus RNA-seq dataset and show that more than 35% of all reads map to introns. Also, we show that these intronic reads are informative about expression levels, significantly increasing the number of detected genes and improving the cluster resolution. zUMIs flexibility makes if possible to accommodate data generated with any of the major scRNA-seq protocols that use BCs and UMIs and is the most feature-rich, fast, and user-friendly pipeline to process such scRNA-seq data.
The remote sensing image segmentation mean shift algorithm parallel processing based on MapReduce
NASA Astrophysics Data System (ADS)
Chen, Xi; Zhou, Liqing
2015-12-01
With the development of satellite remote sensing technology and the remote sensing image data, traditional remote sensing image segmentation technology cannot meet the massive remote sensing image processing and storage requirements. This article put cloud computing and parallel computing technology in remote sensing image segmentation process, and build a cheap and efficient computer cluster system that uses parallel processing to achieve MeanShift algorithm of remote sensing image segmentation based on the MapReduce model, not only to ensure the quality of remote sensing image segmentation, improved split speed, and better meet the real-time requirements. The remote sensing image segmentation MeanShift algorithm parallel processing algorithm based on MapReduce shows certain significance and a realization of value.
NASA Astrophysics Data System (ADS)
He, Yaoyao; Yang, Shanlin; Xu, Qifa
2013-07-01
In order to solve the model of short-term cascaded hydroelectric system scheduling, a novel chaotic particle swarm optimization (CPSO) algorithm using improved logistic map is introduced, which uses the water discharge as the decision variables combined with the death penalty function. According to the principle of maximum power generation, the proposed approach makes use of the ergodicity, symmetry and stochastic property of improved logistic chaotic map for enhancing the performance of particle swarm optimization (PSO) algorithm. The new hybrid method has been examined and tested on two test functions and a practical cascaded hydroelectric system. The experimental results show that the effectiveness and robustness of the proposed CPSO algorithm in comparison with other traditional algorithms.
Wilson, Maximiliano A; Joubert, Sven; Ferré, Perrine; Belleville, Sylvie; Ansaldo, Ana Inés; Joanette, Yves; Rouleau, Isabelle; Brambati, Simona Maria
2012-05-01
Semantic dementia (SD) is a neurodegenerative disease that occurs following the atrophy of the anterior temporal lobes (ATLs). It is characterised by the degradation of semantic knowledge and difficulties in reading exception words (surface dyslexia). This disease has highlighted the role of the ATLs in the process of exception word reading. However, imaging studies in healthy subjects have failed to detect activation of the ATLs during exception word reading. The aim of the present study was to test whether the functional brain regions that mediate exception word reading in normal readers overlap those brain regions atrophied in SD. In Study One, we map the brain regions of grey matter atrophy in AF, a patient with mild SD and surface dyslexia profile. In Study Two, we map the activation pattern associated with exception word compared to pseudoword reading in young, healthy participants using fMRI. The results revealed areas of significant activation in healthy subjects engaged in the exception word reading task in the left anterior middle temporal gyrus, in a region observed to be atrophic in the patient AF. These results reconcile neuropsychological and functional imaging data, revealing the critical role of the left ATL in exception word reading. Copyright © 2012 Elsevier Inc. All rights reserved.
Observation of quantum criticality with ultracold atoms in optical lattices
NASA Astrophysics Data System (ADS)
Zhang, Xibo
As biological problems are becoming more complex and data growing at a rate much faster than that of computer hardware, new and faster algorithms are required. This dissertation investigates computational problems arising in two of the fields: comparative genomics and epigenomics, and employs a variety of computational techniques to address the problems. One fundamental question in the studies of chromosome evolution is whether the rearrangement breakpoints are happening at random positions or along certain hotspots. We investigate the breakpoint reuse phenomenon, and show the analyses that support the more recently proposed fragile breakage model as opposed to the conventional random breakage models for chromosome evolution. The identification of syntenic regions between chromosomes forms the basis for studies of genome architectures, comparative genomics, and evolutionary genomics. The previous synteny block reconstruction algorithms could not be scaled to a large number of mammalian genomes being sequenced; neither did they address the issue of generating non-overlapping synteny blocks suitable for analyzing rearrangements and evolutionary history of large-scale duplications prevalent in plant genomes. We present a new unified synteny block generation algorithm based on A-Bruijn graph framework that overcomes these shortcomings. In the epigenome sequencing, a sample may contain a mixture of epigenomes and there is a need to resolve the distinct methylation patterns from the mixture. Many sequencing applications, such as haplotype inference for diploid or polyploid genomes, and metagenomic sequencing, share the similar objective: to infer a set of distinct assemblies from reads that are sequenced from a heterogeneous sample and subsequently aligned to a reference genome. We model the problem from both a combinatorial and a statistical angles. First, we describe a theoretical framework. A linear-time algorithm is then given to resolve a minimum number of assemblies that are consistent with all reads, substantially improving on previous algorithms. An efficient algorithm is also described to determine a set of assemblies that is consistent with a maximum subset of the reads, a previously untreated problem. We then prove that allowing nested reads or permitting mismatches between reads and their assemblies renders these problems NP-hard. Second, we describe a mixture model-based approach, and applied the model for the detection of allele-specific methylations.
A MAP blind image deconvolution algorithm with bandwidth over-constrained
NASA Astrophysics Data System (ADS)
Ren, Zhilei; Liu, Jin; Liang, Yonghui; He, Yulong
2018-03-01
We demonstrate a maximum a posteriori (MAP) blind image deconvolution algorithm with bandwidth over-constrained and total variation (TV) regularization to recover a clear image from the AO corrected images. The point spread functions (PSFs) are estimated by bandwidth limited less than the cutoff frequency of the optical system. Our algorithm performs well in avoiding noise magnification. The performance is demonstrated on simulated data.
ERIC Educational Resources Information Center
Koren, Mike
2009-01-01
In this article, the author describes a bike trip which marks the culmination of a unit reviewing map-reading capabilities. In seventh grade, students develop various map skills, including cardinal and intermediate directions, how to measure distance on a map using a scale of miles, how to interpret the legend of a map, and how to locate places…
NASA Astrophysics Data System (ADS)
Liu, Zeyu; Xia, Tiecheng; Wang, Jinbo
2018-03-01
We propose a new fractional two-dimensional triangle function combination discrete chaotic map (2D-TFCDM) with the discrete fractional difference. Moreover, the chaos behaviors of the proposed map are observed and the bifurcation diagrams, the largest Lyapunov exponent plot, and the phase portraits are derived, respectively. Finally, with the secret keys generated by Menezes–Vanstone elliptic curve cryptosystem, we apply the discrete fractional map into color image encryption. After that, the image encryption algorithm is analyzed in four aspects and the result indicates that the proposed algorithm is more superior than the other algorithms. Project supported by the National Natural Science Foundation of China (Grant Nos. 61072147 and 11271008).
Libraries, the MAP, and Student Achievement.
ERIC Educational Resources Information Center
Jones, Cherri; Singer, Marietta; Miller, David W.; Makemson, Carroll; Elliott, Kara; Litsch, Diana; Irwin, Barbara; Hoemann, Cheryl; Elmore, Jennifer; Roe, Patty; Gregg, Diane; Needham, Joyce; Stanley, Jerri; Reinert, John; Holtz, Judy; Jenkins, Sandra; Giles, Paula
2002-01-01
Includes 17 articles that discuss the Missouri Assessment Program (MAP) and the role of school library media centers. Highlights include improving student achievement; improving student scores on the MAP; graphic organizers; programs for volunteer student library workers; research process; research skills; reading initiatives; collaborative…
NASA Astrophysics Data System (ADS)
Tatar, N.; Saadatseresht, M.; Arefi, H.
2017-09-01
Semi Global Matching (SGM) algorithm is known as a high performance and reliable stereo matching algorithm in photogrammetry community. However, there are some challenges using this algorithm especially for high resolution satellite stereo images over urban areas and images with shadow areas. As it can be seen, unfortunately the SGM algorithm computes highly noisy disparity values for shadow areas around the tall neighborhood buildings due to mismatching in these lower entropy areas. In this paper, a new method is developed to refine the disparity map in shadow areas. The method is based on the integration of potential of panchromatic and multispectral image data to detect shadow areas in object level. In addition, a RANSAC plane fitting and morphological filtering are employed to refine the disparity map. The results on a stereo pair of GeoEye-1 captured over Qom city in Iran, shows a significant increase in the rate of matched pixels compared to standard SGM algorithm.
An algorithmic approach to the brain biopsy--part I.
Kleinschmidt-DeMasters, B K; Prayson, Richard A
2006-11-01
The formulation of appropriate differential diagnoses for a slide is essential to the practice of surgical pathology but can be particularly challenging for residents and fellows. Algorithmic flow charts can help the less experienced pathologist to systematically consider all possible choices and eliminate incorrect diagnoses. They can assist pathologists-in-training in developing orderly, sequential, and logical thinking skills when confronting difficult cases. To present an algorithmic flow chart as an approach to formulating differential diagnoses for lesions seen in surgical neuropathology. An algorithmic flow chart to be used in teaching residents. Algorithms are not intended to be final diagnostic answers on any given case. Algorithms do not substitute for training received from experienced mentors nor do they substitute for comprehensive reading by trainees of reference textbooks. Algorithmic flow diagrams can, however, direct the viewer to the correct spot in reference texts for further in-depth reading once they hone down their diagnostic choices to a smaller number of entities. The best feature of algorithms is that they remind the user to consider all possibilities on each case, even if they can be quickly eliminated from further consideration. In Part I, we assist the resident in learning how to handle brain biopsies in general and how to distinguish nonneoplastic lesions that mimic tumors from true neoplasms.
An algorithmic approach to the brain biopsy--part II.
Prayson, Richard A; Kleinschmidt-DeMasters, B K
2006-11-01
The formulation of appropriate differential diagnoses for a slide is essential to the practice of surgical pathology but can be particularly challenging for residents and fellows. Algorithmic flow charts can help the less experienced pathologist to systematically consider all possible choices and eliminate incorrect diagnoses. They can assist pathologists-in-training in developing orderly, sequential, and logical thinking skills when confronting difficult cases. To present an algorithmic flow chart as an approach to formulating differential diagnoses for lesions seen in surgical neuropathology. An algorithmic flow chart to be used in teaching residents. Algorithms are not intended to be final diagnostic answers on any given case. Algorithms do not substitute for training received from experienced mentors nor do they substitute for comprehensive reading by trainees of reference textbooks. Algorithmic flow diagrams can, however, direct the viewer to the correct spot in reference texts for further in-depth reading once they hone down their diagnostic choices to a smaller number of entities. The best feature of algorithms is that they remind the user to consider all possibilities on each case, even if they can be quickly eliminated from further consideration. In Part II, we assist the resident in arriving at the correct diagnosis for neuropathologic lesions containing granulomatous inflammation, macrophages, or abnormal blood vessels.
Pomareda, Víctor; Magrans, Rudys; Jiménez-Soto, Juan M; Martínez, Dani; Tresánchez, Marcel; Burgués, Javier; Palacín, Jordi; Marco, Santiago
2017-04-20
We present the estimation of a likelihood map for the location of the source of a chemical plume dispersed under atmospheric turbulence under uniform wind conditions. The main contribution of this work is to extend previous proposals based on Bayesian inference with binary detections to the use of concentration information while at the same time being robust against the presence of background chemical noise. For that, the algorithm builds a background model with robust statistics measurements to assess the posterior probability that a given chemical concentration reading comes from the background or from a source emitting at a distance with a specific release rate. In addition, our algorithm allows multiple mobile gas sensors to be used. Ten realistic simulations and ten real data experiments are used for evaluation purposes. For the simulations, we have supposed that sensors are mounted on cars which do not have among its main tasks navigating toward the source. To collect the real dataset, a special arena with induced wind is built, and an autonomous vehicle equipped with several sensors, including a photo ionization detector (PID) for sensing chemical concentration, is used. Simulation results show that our algorithm, provides a better estimation of the source location even for a low background level that benefits the performance of binary version. The improvement is clear for the synthetic data while for real data the estimation is only slightly better, probably because our exploration arena is not able to provide uniform wind conditions. Finally, an estimation of the computational cost of the algorithmic proposal is presented.
Evaluating progressive-rendering algorithms in appearance design tasks.
Jiawei Ou; Karlik, Ondrej; Křivánek, Jaroslav; Pellacini, Fabio
2013-01-01
Progressive rendering is becoming a popular alternative to precomputational approaches to appearance design. However, progressive algorithms create images exhibiting visual artifacts at early stages. A user study investigated these artifacts' effects on user performance in appearance design tasks. Novice and expert subjects performed lighting and material editing tasks with four algorithms: random path tracing, quasirandom path tracing, progressive photon mapping, and virtual-point-light rendering. Both the novices and experts strongly preferred path tracing to progressive photon mapping and virtual-point-light rendering. None of the participants preferred random path tracing to quasirandom path tracing or vice versa; the same situation held between progressive photon mapping and virtual-point-light rendering. The user workflow didn’t differ significantly with the four algorithms. The Web Extras include a video showing how four progressive-rendering algorithms converged (at http://youtu.be/ck-Gevl1e9s), the source code used, and other supplementary materials.
A simple algorithm for large-scale mapping of evergreen forests in tropical America, Africa and Asia
Xiangming Xiao; Chandrashekhar M. Biradar; Christina Czarnecki; Tunrayo Alabi; Michael Keller
2009-01-01
The areal extent and spatial distribution of evergreen forests in the tropical zones are important for the study of climate, carbon cycle and biodiversity. However, frequent cloud cover in the tropical regions makes mapping evergreen forests a challenging task. In this study we developed a simple and novel mapping algorithm that is based on the temporal profile...
Threshold automatic selection hybrid phase unwrapping algorithm for digital holographic microscopy
NASA Astrophysics Data System (ADS)
Zhou, Meiling; Min, Junwei; Yao, Baoli; Yu, Xianghua; Lei, Ming; Yan, Shaohui; Yang, Yanlong; Dan, Dan
2015-01-01
Conventional quality-guided (QG) phase unwrapping algorithm is hard to be applied to digital holographic microscopy because of the long execution time. In this paper, we present a threshold automatic selection hybrid phase unwrapping algorithm that combines the existing QG algorithm and the flood-filled (FF) algorithm to solve this problem. The original wrapped phase map is divided into high- and low-quality sub-maps by selecting a threshold automatically, and then the FF and QG unwrapping algorithms are used in each level to unwrap the phase, respectively. The feasibility of the proposed method is proved by experimental results, and the execution speed is shown to be much faster than that of the original QG unwrapping algorithm.
Concept of AHRS Algorithm Designed for Platform Independent Imu Attitude Alignment
NASA Astrophysics Data System (ADS)
Tomaszewski, Dariusz; Rapiński, Jacek; Pelc-Mieczkowska, Renata
2017-12-01
Nowadays, along with the advancement of technology one can notice the rapid development of various types of navigation systems. So far the most popular satellite navigation, is now supported by positioning results calculated with use of other measurement system. The method and manner of integration will depend directly on the destination of system being developed. To increase the frequency of readings and improve the operation of outdoor navigation systems, one will support satellite navigation systems (GPS, GLONASS ect.) with inertial navigation. Such method of navigation consists of several steps. The first stage is the determination of initial orientation of inertial measurement unit, called INS alignment. During this process, on the basis of acceleration and the angular velocity readings, values of Euler angles (pitch, roll, yaw) are calculated allowing for unambiguous orientation of the sensor coordinate system relative to external coordinate system. The following study presents the concept of AHRS (Attitude and heading reference system) algorithm, allowing to define the Euler angles.The study were conducted with the use of readings from low-cost MEMS cell phone sensors. Subsequently the results of the study were analyzed to determine the accuracy of featured algorithm. On the basis of performed experiments the legitimacy of developed algorithm was stated.
Apriori Versions Based on MapReduce for Mining Frequent Patterns on Big Data.
Luna, Jose Maria; Padillo, Francisco; Pechenizkiy, Mykola; Ventura, Sebastian
2017-09-27
Pattern mining is one of the most important tasks to extract meaningful and useful information from raw data. This task aims to extract item-sets that represent any type of homogeneity and regularity in data. Although many efficient algorithms have been developed in this regard, the growing interest in data has caused the performance of existing pattern mining techniques to be dropped. The goal of this paper is to propose new efficient pattern mining algorithms to work in big data. To this aim, a series of algorithms based on the MapReduce framework and the Hadoop open-source implementation have been proposed. The proposed algorithms can be divided into three main groups. First, two algorithms [Apriori MapReduce (AprioriMR) and iterative AprioriMR] with no pruning strategy are proposed, which extract any existing item-set in data. Second, two algorithms (space pruning AprioriMR and top AprioriMR) that prune the search space by means of the well-known anti-monotone property are proposed. Finally, a last algorithm (maximal AprioriMR) is also proposed for mining condensed representations of frequent patterns. To test the performance of the proposed algorithms, a varied collection of big data datasets have been considered, comprising up to 3 · 10#x00B9;⁸ transactions and more than 5 million of distinct single-items. The experimental stage includes comparisons against highly efficient and well-known pattern mining algorithms. Results reveal the interest of applying MapReduce versions when complex problems are considered, and also the unsuitability of this paradigm when dealing with small data.
Geological Mapping Uses Landsat 4-5TM Satellite Data in Manlai Soum of Omnogovi Aimag
NASA Astrophysics Data System (ADS)
Norovsuren, B.
2014-12-01
Author: Bayanmonkh N1, Undram.G1, Tsolmon.R2, Ariunzul.Ya1, Bayartungalag B31 Environmental Research Information and Study Center 2NUM-ITC-UNESCO Space Science and Remote Sensing International Laboratory, National University of Mongolia 3Geology and Hydrology School, Korea University KEY WORDS: geology, mineral resources, fracture, structure, lithologyABSTRACTGeologic map is the most important map for mining when it does exploration job. In Mongolia geological map completed by Russian geologists which is done by earlier technology. Those maps doesn't satisfy for present requirements. Thus we want to study improve geological map which includes fracture, structural map and lithology use Landsat TM4-5 satellite data. If we can produce a geological map from satellite data with more specification then geologist can explain or read mineralogy very easily. We searched all methodology and researches of every single element of geological mapping. Then we used 3 different remote sensing methodologies to produce structural and lithology and fracture map based on geographic information system's softwares. There can be found a visible lithology border improvement and understandable structural map and we found fracture of the Russian geological map has a lot of distortion. The result of research geologist can read mineralogy elements very easy and discovered 3 unfound important elements from satellite image.
Improvement of the cost-benefit analysis algorithm for high-rise construction projects
NASA Astrophysics Data System (ADS)
Gafurov, Andrey; Skotarenko, Oksana; Plotnikov, Vladimir
2018-03-01
The specific nature of high-rise investment projects entailing long-term construction, high risks, etc. implies a need to improve the standard algorithm of cost-benefit analysis. An improved algorithm is described in the article. For development of the improved algorithm of cost-benefit analysis for high-rise construction projects, the following methods were used: weighted average cost of capital, dynamic cost-benefit analysis of investment projects, risk mapping, scenario analysis, sensitivity analysis of critical ratios, etc. This comprehensive approach helped to adapt the original algorithm to feasibility objectives in high-rise construction. The authors put together the algorithm of cost-benefit analysis for high-rise construction projects on the basis of risk mapping and sensitivity analysis of critical ratios. The suggested project risk management algorithms greatly expand the standard algorithm of cost-benefit analysis in investment projects, namely: the "Project analysis scenario" flowchart, improving quality and reliability of forecasting reports in investment projects; the main stages of cash flow adjustment based on risk mapping for better cost-benefit project analysis provided the broad range of risks in high-rise construction; analysis of dynamic cost-benefit values considering project sensitivity to crucial variables, improving flexibility in implementation of high-rise projects.
An algorithm for automated layout of process description maps drawn in SBGN.
Genc, Begum; Dogrusoz, Ugur
2016-01-01
Evolving technology has increased the focus on genomics. The combination of today's advanced techniques with decades of molecular biology research has yielded huge amounts of pathway data. A standard, named the Systems Biology Graphical Notation (SBGN), was recently introduced to allow scientists to represent biological pathways in an unambiguous, easy-to-understand and efficient manner. Although there are a number of automated layout algorithms for various types of biological networks, currently none specialize on process description (PD) maps as defined by SBGN. We propose a new automated layout algorithm for PD maps drawn in SBGN. Our algorithm is based on a force-directed automated layout algorithm called Compound Spring Embedder (CoSE). On top of the existing force scheme, additional heuristics employing new types of forces and movement rules are defined to address SBGN-specific rules. Our algorithm is the only automatic layout algorithm that properly addresses all SBGN rules for drawing PD maps, including placement of substrates and products of process nodes on opposite sides, compact tiling of members of molecular complexes and extensively making use of nested structures (compound nodes) to properly draw cellular locations and molecular complex structures. As demonstrated experimentally, the algorithm results in significant improvements over use of a generic layout algorithm such as CoSE in addressing SBGN rules on top of commonly accepted graph drawing criteria. An implementation of our algorithm in Java is available within ChiLay library (https://github.com/iVis-at-Bilkent/chilay). ugur@cs.bilkent.edu.tr or dogrusoz@cbio.mskcc.org Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
An algorithm for automated layout of process description maps drawn in SBGN
Genc, Begum; Dogrusoz, Ugur
2016-01-01
Motivation: Evolving technology has increased the focus on genomics. The combination of today’s advanced techniques with decades of molecular biology research has yielded huge amounts of pathway data. A standard, named the Systems Biology Graphical Notation (SBGN), was recently introduced to allow scientists to represent biological pathways in an unambiguous, easy-to-understand and efficient manner. Although there are a number of automated layout algorithms for various types of biological networks, currently none specialize on process description (PD) maps as defined by SBGN. Results: We propose a new automated layout algorithm for PD maps drawn in SBGN. Our algorithm is based on a force-directed automated layout algorithm called Compound Spring Embedder (CoSE). On top of the existing force scheme, additional heuristics employing new types of forces and movement rules are defined to address SBGN-specific rules. Our algorithm is the only automatic layout algorithm that properly addresses all SBGN rules for drawing PD maps, including placement of substrates and products of process nodes on opposite sides, compact tiling of members of molecular complexes and extensively making use of nested structures (compound nodes) to properly draw cellular locations and molecular complex structures. As demonstrated experimentally, the algorithm results in significant improvements over use of a generic layout algorithm such as CoSE in addressing SBGN rules on top of commonly accepted graph drawing criteria. Availability and implementation: An implementation of our algorithm in Java is available within ChiLay library (https://github.com/iVis-at-Bilkent/chilay). Contact: ugur@cs.bilkent.edu.tr or dogrusoz@cbio.mskcc.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26363029
A trace map comparison algorithm for the discrete fracture network models of rock masses
NASA Astrophysics Data System (ADS)
Han, Shuai; Wang, Gang; Li, Mingchao
2018-06-01
Discrete fracture networks (DFN) are widely used to build refined geological models. However, validating whether a refined model can match to reality is a crucial problem, concerning whether the model can be used for analysis. The current validation methods include numerical validation and graphical validation. However, the graphical validation, aiming at estimating the similarity between a simulated trace map and the real trace map by visual observation, is subjective. In this paper, an algorithm for the graphical validation of DFN is set up. Four main indicators, including total gray, gray grade curve, characteristic direction and gray density distribution curve, are presented to assess the similarity between two trace maps. A modified Radon transform and loop cosine similarity are presented based on Radon transform and cosine similarity respectively. Besides, how to use Bézier curve to reduce the edge effect is described. Finally, a case study shows that the new algorithm can effectively distinguish which simulated trace map is more similar to the real trace map.
Development of a Two-Wheel Contingency Mode for the MAP Spacecraft
NASA Technical Reports Server (NTRS)
Starin, Scott R.; ODonnell, James R., Jr.; Bauer, Frank (Technical Monitor)
2002-01-01
The Microwave Anisotropy Probe (MAP) is a follow-on mission to the Cosmic Background Explorer (COBE), and is currently collecting data from its orbit near the second Sun-Earth libration point. Due to limited mass, power, and financial resources, a traditional reliability concept including fully redundant components was not feasible for MAP. Instead, the MAP design employs selective hardware redundancy in tandem with contingency software modes and algorithms to improve the odds of mission success. One direction for such improvement has been the development of a two-wheel backup control strategy. This strategy would allow MAP to position itself for maneuvers and collect science data should one of its three reaction wheels fail. Along with operational considerations, the strategy includes three new control algorithms. These algorithms would use the remaining attitude control actuators-thrusters and two reaction wheels-in ways that achieve control goals while minimizing adverse impacts on the functionality of other subsystems and software.
Reading in Two Writing Systems: Accommodation and Assimilation of the Brain's Reading Network
ERIC Educational Resources Information Center
Perfetti, Charles A.; Liu, Ying; Fiez, Julie; Nelson, Jessica; Bolger, Donald J.; Tan, Li-Hai
2007-01-01
Bilingual reading can require more than knowing two languages. Learners must acquire also the writing conventions of their second language, which can differ in its deep mapping principles (writing system) and its visual configurations (script). We review ERP (event-related potential) and fMRI studies of both Chinese-English bilingualism and…
ERIC Educational Resources Information Center
Steacy, Laura M.; Elleman, Amy M.; Lovett, Maureen W.; Compton, Donald L.
2016-01-01
In English, gains in decoding skill do not map directly onto increases in word reading. However, beyond the Self-Teaching Hypothesis, little is known about the transfer of decoding skills to word reading. In this study, we offer a new approach to testing specific decoding elements on transfer to word reading. To illustrate, we modeled word-reading…
Improving the MODIS Global Snow-Mapping Algorithm
NASA Technical Reports Server (NTRS)
Klein, Andrew G.; Hall, Dorothy K.; Riggs, George A.
1997-01-01
An algorithm (Snowmap) is under development to produce global snow maps at 500 meter resolution on a daily basis using data from the NASA MODIS instrument. MODIS, the Moderate Resolution Imaging Spectroradiometer, will be launched as part of the first Earth Observing System (EOS) platform in 1998. Snowmap is a fully automated, computationally frugal algorithm that will be ready to implement at launch. Forests represent a major limitation to the global mapping of snow cover as a forest canopy both obscures and shadows the snow underneath. Landsat Thematic Mapper (TM) and MODIS Airborne Simulator (MAS) data are used to investigate the changes in reflectance that occur as a forest stand becomes snow covered and to propose changes to the Snowmap algorithm that will improve snow classification accuracy forested areas.
NASA Astrophysics Data System (ADS)
Oza, Nikunj
2012-03-01
A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. A set of training examples— examples with known output values—is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate’s measurements. The generalization performance of a learned model (how closely the target outputs and the model’s predicted outputs agree for patterns that have not been presented to the learning algorithm) would provide an indication of how well the model has learned the desired mapping. More formally, a classification learning algorithm L takes a training set T as its input. The training set consists of |T| examples or instances. It is assumed that there is a probability distribution D from which all training examples are drawn independently—that is, all the training examples are independently and identically distributed (i.i.d.). The ith training example is of the form (x_i, y_i), where x_i is a vector of values of several features and y_i represents the class to be predicted.* In the sunspot classification example given above, each training example would represent one sunspot’s classification (y_i) and the corresponding set of measurements (x_i). The output of a supervised learning algorithm is a model h that approximates the unknown mapping from the inputs to the outputs. In our example, h would map from the sunspot measurements to the type of sunspot. We may have a test set S—a set of examples not used in training that we use to test how well the model h predicts the outputs on new examples. Just as with the examples in T, the examples in S are assumed to be independent and identically distributed (i.i.d.) draws from the distribution D. We measure the error of h on the test set as the proportion of test cases that h misclassifies: 1/|S| Sigma(x,y union S)[I(h(x)!= y)] where I(v) is the indicator function—it returns 1 if v is true and 0 otherwise. In our sunspot classification example, we would identify additional examples of sunspots that were not used in generating the model, and use these to determine how accurate the model is—the fraction of the test samples that the model classifies correctly. An example of a classification model is the decision tree shown in Figure 23.1. We will discuss the decision tree learning algorithm in more detail later—for now, we assume that, given a training set with examples of sunspots, this decision tree is derived. This can be used to classify previously unseen examples of sunpots. For example, if a new sunspot’s inputs indicate that its "Group Length" is in the range 10-15, then the decision tree would classify the sunspot as being of type “E,” whereas if the "Group Length" is "NULL," the "Magnetic Type" is "bipolar," and the "Penumbra" is "rudimentary," then it would be classified as type "C." In this chapter, we will add to the above description of classification problems. We will discuss decision trees and several other classification models. In particular, we will discuss the learning algorithms that generate these classification models, how to use them to classify new examples, and the strengths and weaknesses of these models. We will end with pointers to further reading on classification methods applied to astronomy data.
Inoue, Kentaro; Shimozono, Shinichi; Yoshida, Hideaki; Kurata, Hiroyuki
2012-01-01
Background For visualizing large-scale biochemical network maps, it is important to calculate the coordinates of molecular nodes quickly and to enhance the understanding or traceability of them. The grid layout is effective in drawing compact, orderly, balanced network maps with node label spaces, but existing grid layout algorithms often require a high computational cost because they have to consider complicated positional constraints through the entire optimization process. Results We propose a hybrid grid layout algorithm that consists of a non-grid, fast layout (preprocessor) algorithm and an approximate pattern matching algorithm that distributes the resultant preprocessed nodes on square grid points. To demonstrate the feasibility of the hybrid layout algorithm, it is characterized in terms of the calculation time, numbers of edge-edge and node-edge crossings, relative edge lengths, and F-measures. The proposed algorithm achieves outstanding performances compared with other existing grid layouts. Conclusions Use of an approximate pattern matching algorithm quickly redistributes the laid-out nodes by fast, non-grid algorithms on the square grid points, while preserving the topological relationships among the nodes. The proposed algorithm is a novel use of the pattern matching, thereby providing a breakthrough for grid layout. This application program can be freely downloaded from http://www.cadlive.jp/hybridlayout/hybridlayout.html. PMID:22679486
Virtual Network Embedding via Monte Carlo Tree Search.
Haeri, Soroush; Trajkovic, Ljiljana
2018-02-01
Network virtualization helps overcome shortcomings of the current Internet architecture. The virtualized network architecture enables coexistence of multiple virtual networks (VNs) on an existing physical infrastructure. VN embedding (VNE) problem, which deals with the embedding of VN components onto a physical network, is known to be -hard. In this paper, we propose two VNE algorithms: MaVEn-M and MaVEn-S. MaVEn-M employs the multicommodity flow algorithm for virtual link mapping while MaVEn-S uses the shortest-path algorithm. They formalize the virtual node mapping problem by using the Markov decision process (MDP) framework and devise action policies (node mappings) for the proposed MDP using the Monte Carlo tree search algorithm. Service providers may adjust the execution time of the MaVEn algorithms based on the traffic load of VN requests. The objective of the algorithms is to maximize the profit of infrastructure providers. We develop a discrete event VNE simulator to implement and evaluate performance of MaVEn-M, MaVEn-S, and several recently proposed VNE algorithms. We introduce profitability as a new performance metric that captures both acceptance and revenue to cost ratios. Simulation results show that the proposed algorithms find more profitable solutions than the existing algorithms. Given additional computation time, they further improve embedding solutions.
Inoue, Kentaro; Shimozono, Shinichi; Yoshida, Hideaki; Kurata, Hiroyuki
2012-01-01
For visualizing large-scale biochemical network maps, it is important to calculate the coordinates of molecular nodes quickly and to enhance the understanding or traceability of them. The grid layout is effective in drawing compact, orderly, balanced network maps with node label spaces, but existing grid layout algorithms often require a high computational cost because they have to consider complicated positional constraints through the entire optimization process. We propose a hybrid grid layout algorithm that consists of a non-grid, fast layout (preprocessor) algorithm and an approximate pattern matching algorithm that distributes the resultant preprocessed nodes on square grid points. To demonstrate the feasibility of the hybrid layout algorithm, it is characterized in terms of the calculation time, numbers of edge-edge and node-edge crossings, relative edge lengths, and F-measures. The proposed algorithm achieves outstanding performances compared with other existing grid layouts. Use of an approximate pattern matching algorithm quickly redistributes the laid-out nodes by fast, non-grid algorithms on the square grid points, while preserving the topological relationships among the nodes. The proposed algorithm is a novel use of the pattern matching, thereby providing a breakthrough for grid layout. This application program can be freely downloaded from http://www.cadlive.jp/hybridlayout/hybridlayout.html.
Concept Maps: Practice Applications in Adult Education and Human Resource Development
ERIC Educational Resources Information Center
Daley, Barbara J.
2010-01-01
Concept maps can be used as both a cognitive and constructivist learning strategy in teaching and learning in adult education and human resource development. The maps can be used to understand course readings, analyze case studies, develop reflective thinking and enhance research skills. The creation of concept maps can also be supported by the…
2015-03-13
Maps of magnesium/silicon (left) and thermal neutron absorption (right) across Mercury's surface (red indicates high values, blue low) are shown. These maps, together with maps of other elemental abundances, reveal the presence of distinct geochemical terranes. Volcanic smooth plains deposits are outlined in white. Read the mission news story to learn more! http://photojournal.jpl.nasa.gov/catalog/PIA19242
Distance-Based Phylogenetic Methods Around a Polytomy.
Davidson, Ruth; Sullivant, Seth
2014-01-01
Distance-based phylogenetic algorithms attempt to solve the NP-hard least-squares phylogeny problem by mapping an arbitrary dissimilarity map representing biological data to a tree metric. The set of all dissimilarity maps is a Euclidean space properly containing the space of all tree metrics as a polyhedral fan. Outputs of distance-based tree reconstruction algorithms such as UPGMA and neighbor-joining are points in the maximal cones in the fan. Tree metrics with polytomies lie at the intersections of maximal cones. A phylogenetic algorithm divides the space of all dissimilarity maps into regions based upon which combinatorial tree is reconstructed by the algorithm. Comparison of phylogenetic methods can be done by comparing the geometry of these regions. We use polyhedral geometry to compare the local nature of the subdivisions induced by least-squares phylogeny, UPGMA, and neighbor-joining when the true tree has a single polytomy with exactly four neighbors. Our results suggest that in some circumstances, UPGMA and neighbor-joining poorly match least-squares phylogeny.
Mapreduce is Good Enough? If All You Have is a Hammer, Throw Away Everything That's Not a Nail!
Lin, Jimmy
2013-03-01
Hadoop is currently the large-scale data analysis "hammer" of choice, but there exist classes of algorithms that aren't "nails" in the sense that they are not particularly amenable to the MapReduce programming model. To address this, researchers have proposed MapReduce extensions or alternative programming models in which these algorithms can be elegantly expressed. This article espouses a very different position: that MapReduce is "good enough," and that instead of trying to invent screwdrivers, we should simply get rid of everything that's not a nail. To be more specific, much discussion in the literature surrounds the fact that iterative algorithms are a poor fit for MapReduce. The simple solution is to find alternative, noniterative algorithms that solve the same problem. This article captures my personal experiences as an academic researcher as well as a software engineer in a "real-world" production analytics environment. From this combined perspective, I reflect on the current state and future of "big data" research.
Liu, Zhong; Gao, Xiaoguang; Fu, Xiaowei
2018-05-08
In this paper, we mainly study a cooperative search and coverage algorithm for a given bounded rectangle region, which contains several unknown stationary targets, by a team of unmanned aerial vehicles (UAVs) with non-ideal sensors and limited communication ranges. Our goal is to minimize the search time, while gathering more information about the environment and finding more targets. For this purpose, a novel cooperative search and coverage algorithm with controllable revisit mechanism is presented. Firstly, as the representation of the environment, the cognitive maps that included the target probability map (TPM), the uncertain map (UM), and the digital pheromone map (DPM) are constituted. We also design a distributed update and fusion scheme for the cognitive map. This update and fusion scheme can guarantee that each one of the cognitive maps converges to the same one, which reflects the targets’ true existence or absence in each cell of the search region. Secondly, we develop a controllable revisit mechanism based on the DPM. This mechanism can concentrate the UAVs to revisit sub-areas that have a large target probability or high uncertainty. Thirdly, in the frame of distributed receding horizon optimizing, a path planning algorithm for the multi-UAVs cooperative search and coverage is designed. In the path planning algorithm, the movement of the UAVs is restricted by the potential fields to meet the requirements of avoiding collision and maintaining connectivity constraints. Moreover, using the minimum spanning tree (MST) topology optimization strategy, we can obtain a tradeoff between the search coverage enhancement and the connectivity maintenance. The feasibility of the proposed algorithm is demonstrated by comparison simulations by way of analyzing the effects of the controllable revisit mechanism and the connectivity maintenance scheme. The Monte Carlo method is employed to validate the influence of the number of UAVs, the sensing radius, the detection and false alarm probabilities, and the communication range on the proposed algorithm.
Boyer, Nicole R S; Miller, Sarah; Connolly, Paul; McIntosh, Emma
2016-04-01
The Strengths and Difficulties Questionnaire (SDQ) is a behavioural screening tool for children. The SDQ is increasingly used as the primary outcome measure in population health interventions involving children, but it is not preference based; therefore, its role in allocative economic evaluation is limited. The Child Health Utility 9D (CHU9D) is a generic preference-based health-related quality of-life measure. This study investigates the applicability of the SDQ outcome measure for use in economic evaluations and examines its relationship with the CHU9D by testing previously published mapping algorithms. The aim of the paper is to explore the feasibility of using the SDQ within economic evaluations of school-based population health interventions. Data were available from children participating in a cluster randomised controlled trial of the school-based roots of empathy programme in Northern Ireland. Utility was calculated using the original and alternative CHU9D tariffs along with two SDQ mapping algorithms. t tests were performed for pairwise differences in utility values from the preference-based tariffs and mapping algorithms. Mean (standard deviation) SDQ total difficulties and prosocial scores were 12 (3.2) and 8.3 (2.1). Utility values obtained from the original tariff, alternative tariff, and mapping algorithms using five and three SDQ subscales were 0.84 (0.11), 0.80 (0.13), 0.84 (0.05), and 0.83 (0.04), respectively. Each method for calculating utility produced statistically significantly different values except the original tariff and five SDQ subscale algorithm. Initial evidence suggests the SDQ and CHU9D are related in some of their measurement properties. The mapping algorithm using five SDQ subscales was found to be optimal in predicting mean child health utility. Future research valuing changes in the SDQ scores would contribute to this research.
Heuristic and algorithmic processing in English, mathematics, and science education.
Sharps, Matthew J; Hess, Adam B; Price-Sharps, Jana L; Teh, Jane
2008-01-01
Many college students experience difficulties in basic academic skills. Recent research suggests that much of this difficulty may lie in heuristic competency--the ability to use and successfully manage general cognitive strategies. In the present study, the authors evaluated this possibility. They compared participants' performance on a practice California Basic Educational Skills Test and on a series of questions in the natural sciences with heuristic and algorithmic performance on a series of mathematics and reading comprehension exercises. Heuristic competency in mathematics was associated with better scores in science and mathematics. Verbal and algorithmic skills were associated with better reading comprehension. These results indicate the importance of including heuristic training in educational contexts and highlight the importance of a relatively domain-specific approach to questions of cognition in higher education.
Towards Unmanned Systems for Dismounted Operations in the Canadian Forces
2011-01-01
LIDAR , and RADAR) and lower power/mass, passive imaging techniques such as structure from motion and simultaneous localisation and mapping ( SLAM ...sensors and learning algorithms. 5.1.2 Simultaneous localisation and mapping SLAM algorithms concurrently estimate a robot pose and a map of unique...locations and vehicle pose are part of the SLAM state vector and are estimated in each update step. AISS developed a monocular camera-based SLAM
The present and future of de novo whole-genome assembly.
Sohn, Jang-Il; Nam, Jin-Wu
2018-01-01
As the advent of next-generation sequencing (NGS) technology, various de novo assembly algorithms based on the de Bruijn graph have been developed to construct chromosome-level sequences. However, numerous technical or computational challenges in de novo assembly still remain, although many bright ideas and heuristics have been suggested to tackle the challenges in both experimental and computational settings. In this review, we categorize de novo assemblers on the basis of the type of de Bruijn graphs (Hamiltonian and Eulerian) and discuss the challenges of de novo assembly for short NGS reads regarding computational complexity and assembly ambiguity. Then, we discuss how the limitations of the short reads can be overcome by using a single-molecule sequencing platform that generates long reads of up to several kilobases. In fact, the long read assembly has caused a paradigm shift in whole-genome assembly in terms of algorithms and supporting steps. We also summarize (i) hybrid assemblies using both short and long reads and (ii) overlap-based assemblies for long reads and discuss their challenges and future prospects. This review provides guidelines to determine the optimal approach for a given input data type, computational budget or genome. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Handling Data Skew in MapReduce Cluster by Using Partition Tuning
Gao, Yufei; Zhou, Yanjie; Zhou, Bing; Shi, Lei; Zhang, Jiacai
2017-01-01
The healthcare industry has generated large amounts of data, and analyzing these has emerged as an important problem in recent years. The MapReduce programming model has been successfully used for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we have in the past proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In comparison with the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and the partition tuning method to disperse key-value pairs in virtual partitions and recombines each partition in case of data skew. The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that PTSH algorithm can handle data skew in MapReduce efficiently and improve the performance of MapReduce jobs in comparison with the native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN). We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data. © 2017 Yufei Gao et al.
Handling Data Skew in MapReduce Cluster by Using Partition Tuning.
Gao, Yufei; Zhou, Yanjie; Zhou, Bing; Shi, Lei; Zhang, Jiacai
2017-01-01
The healthcare industry has generated large amounts of data, and analyzing these has emerged as an important problem in recent years. The MapReduce programming model has been successfully used for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we have in the past proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In comparison with the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and the partition tuning method to disperse key-value pairs in virtual partitions and recombines each partition in case of data skew. The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that PTSH algorithm can handle data skew in MapReduce efficiently and improve the performance of MapReduce jobs in comparison with the native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN). We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data.
Handling Data Skew in MapReduce Cluster by Using Partition Tuning
Zhou, Yanjie; Zhou, Bing; Shi, Lei
2017-01-01
The healthcare industry has generated large amounts of data, and analyzing these has emerged as an important problem in recent years. The MapReduce programming model has been successfully used for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we have in the past proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In comparison with the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and the partition tuning method to disperse key-value pairs in virtual partitions and recombines each partition in case of data skew. The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that PTSH algorithm can handle data skew in MapReduce efficiently and improve the performance of MapReduce jobs in comparison with the native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN). We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data. PMID:29065568
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.
Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal
2010-11-15
Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ(nlog(n/B)Blog(M/B)) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster--both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. The bi-directed de Bruijn graph is a fundamental data structure for any sequence assembly program based on Eulerian approach. Our algorithms for constructing Bi-directed de Bruijn graphs are efficient in parallel and out of core settings. These algorithms can be used in building large scale bi-directed de Bruijn graphs. Furthermore, our algorithms do not employ any all-to-all communications in a parallel setting and perform better than the prior algorithms. Finally our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET.
Biomedical Terminology Mapper for UML projects.
Thibault, Julien C; Frey, Lewis
2013-01-01
As the biomedical community collects and generates more and more data, the need to describe these datasets for exchange and interoperability becomes crucial. This paper presents a mapping algorithm that can help developers expose local implementations described with UML through standard terminologies. The input UML class or attribute name is first normalized and tokenized, then lookups in a UMLS-based dictionary are performed. For the evaluation of the algorithm 142 UML projects were extracted from caGrid and automatically mapped to National Cancer Institute (NCI) terminology concepts. Resulting mappings at the UML class and attribute levels were compared to the manually curated annotations provided in caGrid. Results are promising and show that this type of algorithm could speed-up the tedious process of mapping local implementations to standard biomedical terminologies.
Biomedical Terminology Mapper for UML projects
Thibault, Julien C.; Frey, Lewis
As the biomedical community collects and generates more and more data, the need to describe these datasets for exchange and interoperability becomes crucial. This paper presents a mapping algorithm that can help developers expose local implementations described with UML through standard terminologies. The input UML class or attribute name is first normalized and tokenized, then lookups in a UMLS-based dictionary are performed. For the evaluation of the algorithm 142 UML projects were extracted from caGrid and automatically mapped to National Cancer Institute (NCI) terminology concepts. Resulting mappings at the UML class and attribute levels were compared to the manually curated annotations provided in caGrid. Results are promising and show that this type of algorithm could speed-up the tedious process of mapping local implementations to standard biomedical terminologies. PMID:24303278
Texture Analysis of Chaotic Coupled Map Lattices Based Image Encryption Algorithm
NASA Astrophysics Data System (ADS)
Khan, Majid; Shah, Tariq; Batool, Syeda Iram
2014-09-01
As of late, data security is key in different enclosures like web correspondence, media frameworks, therapeutic imaging, telemedicine and military correspondence. In any case, a large portion of them confronted with a few issues, for example, the absence of heartiness and security. In this letter, in the wake of exploring the fundamental purposes of the chaotic trigonometric maps and the coupled map lattices, we have presented the algorithm of chaos-based image encryption based on coupled map lattices. The proposed mechanism diminishes intermittent impact of the ergodic dynamical systems in the chaos-based image encryption. To assess the security of the encoded image of this scheme, the association of two nearby pixels and composition peculiarities were performed. This algorithm tries to minimize the problems arises in image encryption.
Xu, Kebiao; Xie, Tianyu; Li, Zhaokai; Xu, Xiangkun; Wang, Mengqi; Ye, Xiangyu; Kong, Fei; Geng, Jianpei; Duan, Changkui; Shi, Fazhan; Du, Jiangfeng
2017-03-31
The adiabatic quantum computation is a universal and robust method of quantum computing. In this architecture, the problem can be solved by adiabatically evolving the quantum processor from the ground state of a simple initial Hamiltonian to that of a final one, which encodes the solution of the problem. Adiabatic quantum computation has been proved to be a compatible candidate for scalable quantum computation. In this Letter, we report on the experimental realization of an adiabatic quantum algorithm on a single solid spin system under ambient conditions. All elements of adiabatic quantum computation, including initial state preparation, adiabatic evolution (simulated by optimal control), and final state read-out, are realized experimentally. As an example, we found the ground state of the problem Hamiltonian S_{z}I_{z} on our adiabatic quantum processor, which can be mapped to the factorization of 35 into its prime factors 5 and 7.
NASA Astrophysics Data System (ADS)
Xu, Kebiao; Xie, Tianyu; Li, Zhaokai; Xu, Xiangkun; Wang, Mengqi; Ye, Xiangyu; Kong, Fei; Geng, Jianpei; Duan, Changkui; Shi, Fazhan; Du, Jiangfeng
2017-03-01
The adiabatic quantum computation is a universal and robust method of quantum computing. In this architecture, the problem can be solved by adiabatically evolving the quantum processor from the ground state of a simple initial Hamiltonian to that of a final one, which encodes the solution of the problem. Adiabatic quantum computation has been proved to be a compatible candidate for scalable quantum computation. In this Letter, we report on the experimental realization of an adiabatic quantum algorithm on a single solid spin system under ambient conditions. All elements of adiabatic quantum computation, including initial state preparation, adiabatic evolution (simulated by optimal control), and final state read-out, are realized experimentally. As an example, we found the ground state of the problem Hamiltonian SzIz on our adiabatic quantum processor, which can be mapped to the factorization of 35 into its prime factors 5 and 7.
ABACAS: algorithm-based automatic contiguation of assembled sequences
Assefa, Samuel; Keane, Thomas M.; Otto, Thomas D.; Newbold, Chris; Berriman, Matthew
2009-01-01
Summary: Due to the availability of new sequencing technologies, we are now increasingly interested in sequencing closely related strains of existing finished genomes. Recently a number of de novo and mapping-based assemblers have been developed to produce high quality draft genomes from new sequencing technology reads. New tools are necessary to take contigs from a draft assembly through to a fully contiguated genome sequence. ABACAS is intended as a tool to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. The input to ABACAS is a set of contigs which will be aligned to the reference genome, ordered and orientated, visualized in the ACT comparative browser, and optimal primer sequences are automatically generated. Availability and Implementation: ABACAS is implemented in Perl and is freely available for download from http://abacas.sourceforge.net Contact: sa4@sanger.ac.uk PMID:19497936
A CT and MRI scan to MCNP input conversion program.
Van Riper, Kenneth A
2005-01-01
We describe a new program to read a sequence of tomographic scans and prepare the geometry and material sections of an MCNP input file. Image processing techniques include contrast controls and mapping of grey scales to colour. The user interface provides several tools with which the user can associate a range of image intensities to an MCNP material. Materials are loaded from a library. A separate material assignment can be made to a pixel intensity or range of intensities when that intensity dominates the image boundaries; this material is assigned to all pixels with that intensity contiguous with the boundary. Material fractions are computed in a user-specified voxel grid overlaying the scans. New materials are defined by mixing the library materials using the fractions. The geometry can be written as an MCNP lattice or as individual cells. A combination algorithm can be used to join neighbouring cells with the same material.
Hernandez, Penni; Podchiyska, Tanya; Weber, Susan; Ferris, Todd; Lowe, Henry
2009-11-14
The Stanford Translational Research Integrated Database Environment (STRIDE) clinical data warehouse integrates medication information from two Stanford hospitals that use different drug representation systems. To merge this pharmacy data into a single, standards-based model supporting research we developed an algorithm to map HL7 pharmacy orders to RxNorm concepts. A formal evaluation of this algorithm on 1.5 million pharmacy orders showed that the system could accurately assign pharmacy orders in over 96% of cases. This paper describes the algorithm and discusses some of the causes of failures in mapping to RxNorm.
NASA Technical Reports Server (NTRS)
Eberhardt, D. S.; Baganoff, D.; Stevens, K.
1984-01-01
Implicit approximate-factored algorithms have certain properties that are suitable for parallel processing. A particular computational fluid dynamics (CFD) code, using this algorithm, is mapped onto a multiple-instruction/multiple-data-stream (MIMD) computer architecture. An explanation of this mapping procedure is presented, as well as some of the difficulties encountered when trying to run the code concurrently. Timing results are given for runs on the Ames Research Center's MIMD test facility which consists of two VAX 11/780's with a common MA780 multi-ported memory. Speedups exceeding 1.9 for characteristic CFD runs were indicated by the timing results.
Picking ChIP-seq peak detectors for analyzing chromatin modification experiments
Micsinai, Mariann; Parisi, Fabio; Strino, Francesco; Asp, Patrik; Dynlacht, Brian D.; Kluger, Yuval
2012-01-01
Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads to identify enriched regions of any length. To objectively assess its performance relative to other 14 ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that typically optimal parameters in one dataset do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development. PMID:22307239
Picking ChIP-seq peak detectors for analyzing chromatin modification experiments.
Micsinai, Mariann; Parisi, Fabio; Strino, Francesco; Asp, Patrik; Dynlacht, Brian D; Kluger, Yuval
2012-05-01
Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads to identify enriched regions of any length. To objectively assess its performance relative to other 14 ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that typically optimal parameters in one dataset do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development.
Personalized Medicine in Veterans with Traumatic Brain Injuries
2011-05-01
UPGMA algorithm with cosine correlation as the similarity metric. Results are present as a heat map (left panel) demonstrating that the panel of 18... UPGMA algorithm with cosine correlation as the similarity metric. Results are presented as heat maps demonstrating the efficacy of using all 13
Campbell, J R; Carpenter, P; Sneiderman, C; Cohn, S; Chute, C G; Warren, J
1997-01-01
To compare three potential sources of controlled clinical terminology (READ codes version 3.1, SNOMED International, and Unified Medical Language System (UMLS) version 1.6) relative to attributes of completeness, clinical taxonomy, administrative mapping, term definitions and clarity (duplicate coding rate). The authors assembled 1929 source concept records from a variety of clinical information taken from four medical centers across the United States. The source data included medical as well as ample nursing terminology. The source records were coded in each scheme by an investigator and checked by the coding scheme owner. The codings were then scored by an independent panel of clinicians for acceptability. Codes were checked for definitions provided with the scheme. Codes for a random sample of source records were analyzed by an investigator for "parent" and "child" codes within the scheme. Parent and child pairs were scored by an independent panel of medical informatics specialists for clinical acceptability. Administrative and billing code mapping from the published scheme were reviewed for all coded records and analyzed by independent reviewers for accuracy. The investigator for each scheme exhaustively searched a sample of coded records for duplications. SNOMED was judged to be significantly more complete in coding the source material than the other schemes (SNOMED* 70%; READ 57%; UMLS 50%; *p < .00001). SNOMED also had a richer clinical taxonomy judged by the number of acceptable first-degree relatives per coded concept (SNOMED* 4.56, UMLS 3.17; READ 2.14, *p < .005). Only the UMLS provided any definitions; these were found for 49% of records which had a coding assignment. READ and UMLS had better administrative mappings (composite score: READ* 40.6%; UMLS* 36.1%; SNOMED 20.7%, *p < .00001), and SNOMED had substantially more duplications of coding assignments (duplication rate: READ 0%; UMLS 4.2%; SNOMED* 13.9%, *p < .004) associated with a loss of clarity. No major terminology source can lay claim to being the ideal resource for a computer-based patient record. However, based upon this analysis of releases for April 1995, SNOMED International is considerably more complete, has a compositional nature and a richer taxonomy. Is suffers from less clarity, resulting from a lack of syntax and evolutionary changes in its coding scheme. READ has greater clarity and better mapping to administrative schemes (ICD-10 and OPCS-4), is rapidly changing and is less complete. UMLS is a rich lexical resource, with mappings to many source vocabularies. It provides definitions for many of its terms. However, due to the varying granularities and purposes of its source schemes, it has limitations for representation of clinical concepts within a computer-based patient record.
NASA Technical Reports Server (NTRS)
Clark, Roger N.; Swayze, Gregg A.; Gallagher, Andrea
1992-01-01
The sedimentary sections exposed in the Canyonlands and Arches National Parks region of Utah (generally referred to as 'Canyonlands') consist of sandstones, shales, limestones, and conglomerates. Reflectance spectra of weathered surfaces of rocks from these areas show two components: (1) variations in spectrally detectable mineralogy, and (2) variations in the relative ratios of the absorption bands between minerals. Both types of information can be used together to map each major lithology and the Clark spectral features mapping algorithm is applied to do the job.
Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G
2014-11-29
Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.
Yang, Jianfeng; Shu, Hua; McCandliss, Bruce D.; Zevin, Jason D.
2013-01-01
Learning to read any language requires learning to map among print, sound and meaning. Writing systems differ in a number of factors that influence both the ease and rate with which reading skill can be acquired, as well as the eventual division of labor between phonological and semantic processes. Further, developmental reading disability manifests differently across writing systems, and may be related to different deficits in constitutive processes. Here we simulate some aspects of reading acquisition in Chinese and English using the same model architecture for both writing systems. The contribution of semantic and phonological processing to literacy acquisition in the two languages is simulated, including specific effects of phonological and semantic deficits. Further, we demonstrate that similar patterns of performance are observed when the same model is trained on both Chinese and English as an "early bilingual." The results are consistent with the view that reading skill is acquired by the application of statistical learning rules to mappings among print, sound and meaning, and that differences in the typical and disordered acquisition of reading skill between writing systems are driven by differences in the statistical patterns of the writing systems themselves, rather than differences in cognitive architecture of the learner. PMID:24587693
Biomedical cloud computing with Amazon Web Services.
Fusaro, Vincent A; Patil, Prasad; Gafni, Erik; Wall, Dennis P; Tonellato, Peter J
2011-08-01
In this overview to biomedical computing in the cloud, we discussed two primary ways to use the cloud (a single instance or cluster), provided a detailed example using NGS mapping, and highlighted the associated costs. While many users new to the cloud may assume that entry is as straightforward as uploading an application and selecting an instance type and storage options, we illustrated that there is substantial up-front effort required before an application can make full use of the cloud's vast resources. Our intention was to provide a set of best practices and to illustrate how those apply to a typical application pipeline for biomedical informatics, but also general enough for extrapolation to other types of computational problems. Our mapping example was intended to illustrate how to develop a scalable project and not to compare and contrast alignment algorithms for read mapping and genome assembly. Indeed, with a newer aligner such as Bowtie, it is possible to map the entire African genome using one m2.2xlarge instance in 48 hours for a total cost of approximately $48 in computation time. In our example, we were not concerned with data transfer rates, which are heavily influenced by the amount of available bandwidth, connection latency, and network availability. When transferring large amounts of data to the cloud, bandwidth limitations can be a major bottleneck, and in some cases it is more efficient to simply mail a storage device containing the data to AWS (http://aws.amazon.com/importexport/). More information about cloud computing, detailed cost analysis, and security can be found in references.
MapSentinel: Can the Knowledge of Space Use Improve Indoor Tracking Further?
Jia, Ruoxi; Jin, Ming; Zou, Han; Yesilata, Yigitcan; Xie, Lihua; Spanos, Costas
2016-01-01
Estimating an occupant’s location is arguably the most fundamental sensing task in smart buildings. The applications for fine-grained, responsive building operations require the location sensing systems to provide location estimates in real time, also known as indoor tracking. Existing indoor tracking systems require occupants to carry specialized devices or install programs on their smartphone to collect inertial sensing data. In this paper, we propose MapSentinel, which performs non-intrusive location sensing based on WiFi access points and ultrasonic sensors. MapSentinel combines the noisy sensor readings with the floormap information to estimate locations. One key observation supporting our work is that occupants exhibit distinctive motion characteristics at different locations on the floormap, e.g., constrained motion along the corridor or in the cubicle zones, and free movement in the open space. While extensive research has been performed on using a floormap as a tool to obtain correct walking trajectories without wall-crossings, there have been few attempts to incorporate the knowledge of space use available from the floormap into the location estimation. This paper argues that the knowledge of space use as an additional information source presents new opportunities for indoor tracking. The fusion of heterogeneous information is theoretically formulated within the Factor Graph framework, and the Context-Augmented Particle Filtering algorithm is developed to efficiently solve real-time walking trajectories. Our evaluation in a large office space shows that the MapSentinel can achieve accuracy improvement of 31.3% compared with the purely WiFi-based tracking system. PMID:27049387
MapSentinel: Can the Knowledge of Space Use Improve Indoor Tracking Further?
Jia, Ruoxi; Jin, Ming; Zou, Han; Yesilata, Yigitcan; Xie, Lihua; Spanos, Costas
2016-04-02
Estimating an occupant's location is arguably the most fundamental sensing task in smart buildings. The applications for fine-grained, responsive building operations require the location sensing systems to provide location estimates in real time, also known as indoor tracking. Existing indoor tracking systems require occupants to carry specialized devices or install programs on their smartphone to collect inertial sensing data. In this paper, we propose MapSentinel, which performs non-intrusive location sensing based on WiFi access points and ultrasonic sensors. MapSentinel combines the noisy sensor readings with the floormap information to estimate locations. One key observation supporting our work is that occupants exhibit distinctive motion characteristics at different locations on the floormap, e.g., constrained motion along the corridor or in the cubicle zones, and free movement in the open space. While extensive research has been performed on using a floormap as a tool to obtain correct walking trajectories without wall-crossings, there have been few attempts to incorporate the knowledge of space use available from the floormap into the location estimation. This paper argues that the knowledge of space use as an additional information source presents new opportunities for indoor tracking. The fusion of heterogeneous information is theoretically formulated within the Factor Graph framework, and the Context-Augmented Particle Filtering algorithm is developed to efficiently solve real-time walking trajectories. Our evaluation in a large office space shows that the MapSentinel can achieve accuracy improvement of 31.3% compared with the purely WiFi-based tracking system.
Genotyping in the cloud with Crossbow.
Gurtowski, James; Schatz, Michael C; Langmead, Ben
2012-09-01
Crossbow is a scalable, portable, and automatic cloud computing tool for identifying SNPs from high-coverage, short-read resequencing data. It is built on Apache Hadoop, an implementation of the MapReduce software framework. Hadoop allows Crossbow to distribute read alignment and SNP calling subtasks over a cluster of commodity computers. Two robust tools, Bowtie and SOAPsnp, implement the fundamental alignment and variant calling operations respectively, and have demonstrated capabilities within Crossbow of analyzing approximately one billion short reads per hour on a commodity Hadoop cluster with 320 cores. Through protocol examples, this unit will demonstrate the use of Crossbow for identifying variations in three different operating modes: on a Hadoop cluster, on a single computer, and on the Amazon Elastic MapReduce cloud computing service.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cunliffe, Alexandra R.; Armato, Samuel G.; White, Bradley
2015-01-15
Purpose: To characterize the effects of deformable image registration of serial computed tomography (CT) scans on the radiation dose calculated from a treatment planning scan. Methods: Eighteen patients who received curative doses (≥60 Gy, 2 Gy/fraction) of photon radiation therapy for lung cancer treatment were retrospectively identified. For each patient, a diagnostic-quality pretherapy (4–75 days) CT scan and a treatment planning scan with an associated dose map were collected. To establish correspondence between scan pairs, a researcher manually identified anatomically corresponding landmark point pairs between the two scans. Pretherapy scans then were coregistered with planning scans (and associated dose maps)more » using the demons deformable registration algorithm and two variants of the Fraunhofer MEVIS algorithm (“Fast” and “EMPIRE10”). Landmark points in each pretherapy scan were automatically mapped to the planning scan using the displacement vector field output from each of the three algorithms. The Euclidean distance between manually and automatically mapped landmark points (d{sub E}) and the absolute difference in planned dose (|ΔD|) were calculated. Using regression modeling, |ΔD| was modeled as a function of d{sub E}, dose (D), dose standard deviation (SD{sub dose}) in an eight-pixel neighborhood, and the registration algorithm used. Results: Over 1400 landmark point pairs were identified, with 58–93 (median: 84) points identified per patient. Average |ΔD| across patients was 3.5 Gy (range: 0.9–10.6 Gy). Registration accuracy was highest using the Fraunhofer MEVIS EMPIRE10 algorithm, with an average d{sub E} across patients of 5.2 mm (compared with >7 mm for the other two algorithms). Consequently, average |ΔD| was also lowest using the Fraunhofer MEVIS EMPIRE10 algorithm. |ΔD| increased significantly as a function of d{sub E} (0.42 Gy/mm), D (0.05 Gy/Gy), SD{sub dose} (1.4 Gy/Gy), and the algorithm used (≤1 Gy). Conclusions: An average error of <4 Gy in radiation dose was introduced when points were mapped between CT scan pairs using deformable registration, with the majority of points yielding dose-mapping error <2 Gy (approximately 3% of the total prescribed dose). Registration accuracy was highest using the Fraunhofer MEVIS EMPIRE10 algorithm, resulting in the smallest errors in mapped dose. Dose differences following registration increased significantly with increasing spatial registration errors, dose, and dose gradient (i.e., SD{sub dose}). This model provides a measurement of the uncertainty in the radiation dose when points are mapped between serial CT scans through deformable registration.« less
ERIC Educational Resources Information Center
Pirnay-Dummer, Pablo; Ifenthaler, Dirk
2011-01-01
Our study integrates automated natural language-oriented assessment and analysis methodologies into feasible reading comprehension tasks. With the newly developed T-MITOCAR toolset, prose text can be automatically converted into an association net which has similarities to a concept map. The "text to graph" feature of the software is based on…
Diffraction gratings used as identifying markers
Deason, Vance A.; Ward, Michael B.
1991-01-01
A finely detailed defraction grating is applied to an object as an identifier or tag which is unambiguous, difficult to duplicate, or remove and transfer to another item, and can be read and compared with prior readings with relative ease. The exact pattern of the defraction grating is mapped by diffraction moire techniques and recorded for comparison with future readings of the same grating.
A pseudoinverse deformation vector field generator and its applications
Yan, C.; Zhong, H.; Murphy, M.; Weiss, E.; Siebers, J. V.
2010-01-01
Purpose: To present, implement, and test a self-consistent pseudoinverse displacement vector field (PIDVF) generator, which preserves the location of information mapped back-and-forth between image sets. Methods: The algorithm is an iterative scheme based on nearest neighbor interpolation and a subsequent iterative search. Performance of the algorithm is benchmarked using a lung 4DCT data set with six CT images from different breathing phases and eight CT images for a single prostrate patient acquired on different days. A diffeomorphic deformable image registration is used to validate our PIDVFs. Additionally, the PIDVF is used to measure the self-consistency of two nondiffeomorphic algorithms which do not use a self-consistency constraint: The ITK Demons algorithm for the lung patient images and an in-house B-Spline algorithm for the prostate patient images. Both Demons and B-Spline have been QAed through contour comparison. Self-consistency is determined by using a DIR to generate a displacement vector field (DVF) between reference image R and study image S (DVFR–S). The same DIR is used to generate DVFS–R. Additionally, our PIDVF generator is used to create PIDVFS–R. Back-and-forth mapping of a set of points (used as surrogates of contours) using DVFR–S and DVFS–R is compared to back-and-forth mapping performed with DVFR–S and PIDVFS–R. The Euclidean distances between the original unmapped points and the mapped points are used as a self-consistency measure. Results: Test results demonstrate that the consistency error observed in back-and-forth mappings can be reduced two to nine times in point mapping and 1.5 to three times in dose mapping when the PIDVF is used in place of the B-Spline algorithm. These self-consistency improvements are not affected by the exchanging of R and S. It is also demonstrated that differences between DVFS–R and PIDVFS–R can be used as a criteria to check the quality of the DVF. Conclusions: Use of DVF and its PIDVF will improve the self-consistency of points, contour, and dose mappings in image guided adaptive therapy. PMID:20384247
Vera-Sánchez, Juan Antonio; Ruiz-Morales, Carmen; González-López, Antonio
2018-03-01
To provide a multi-stage model to calculate uncertainty in radiochromic film dosimetry with Monte-Carlo techniques. This new approach is applied to single-channel and multichannel algorithms. Two lots of Gafchromic EBT3 are exposed in two different Varian linacs. They are read with an EPSON V800 flatbed scanner. The Monte-Carlo techniques in uncertainty analysis provide a numerical representation of the probability density functions of the output magnitudes. From this numerical representation, traditional parameters of uncertainty analysis as the standard deviations and bias are calculated. Moreover, these numerical representations are used to investigate the shape of the probability density functions of the output magnitudes. Also, another calibration film is read in four EPSON scanners (two V800 and two 10000XL) and the uncertainty analysis is carried out with the four images. The dose estimates of single-channel and multichannel algorithms show a Gaussian behavior and low bias. The multichannel algorithms lead to less uncertainty in the final dose estimates when the EPSON V800 is employed as reading device. In the case of the EPSON 10000XL, the single-channel algorithms provide less uncertainty in the dose estimates for doses higher than four Gy. A multi-stage model has been presented. With the aid of this model and the use of the Monte-Carlo techniques, the uncertainty of dose estimates for single-channel and multichannel algorithms are estimated. The application of the model together with Monte-Carlo techniques leads to a complete characterization of the uncertainties in radiochromic film dosimetry. Copyright © 2018 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
Aphasia in Persian: Implications for cognitive models of lexical processing.
Bakhtiar, Mehdi; Jafary, Reyhane; Weekes, Brendan S
2017-09-01
Current models of oral reading assume that different routes (sublexical, lexical, and semantic) mediate oral reading performance and reliance on different routes during oral reading depends on the characteristics of print to sound mappings. Studies of single cases of acquired dyslexia in aphasia have contributed to the development of such models by revealing patterns of double dissociation in object naming and oral reading skill that follow brain damage in Indo-European and Sino-Tibetan languages. Print to sound mapping in Persian varies in transparency because orthography to phonology translation depends uniquely on the presence or absence of vowel letters in print. Here a hypothesis is tested that oral reading in Persian requires a semantic reading pathway that is independent of a direct non-semantic reading pathway, by investigating whether Persian speakers with aphasia show selective impairments to object naming and reading aloud. A sample of 21 Persian speakers with aphasia ranging in age from 18 to 77 (mean = 53, SD = 16.9) was asked to name a same set of 200 objects and to read aloud the printed names of these objects in different sessions. As an additional measure of sublexical reading, patients were asked to read aloud 30 non-word stimuli. Results showed that oral reading is significantly more preserved than object naming in Persian speakers with aphasia. However, more preserved object naming than oral reading was also observed in some cases. There was a moderate positive correlation between picture naming and oral reading success (p < .05). Mixed-effects logistic regression revealed that word frequency, age of acquisition and imageability predict success across both tasks and there is an interaction between these variables and orthographic transparency in oral reading. Furthermore, opaque words were read less accurately than transparent words. The results reveal different patterns of acquired dyslexia in some cases that closely resemble phonological, deep, and surface dyslexia in other scripts - reported here in Persian for the first time. © 2016 The British Psychological Society.
State Highway Maps: A Route to a Learning Adventure
ERIC Educational Resources Information Center
McDuffie, Thomas E.; Cifelli, Joseph
2006-01-01
Science within the folds of highway maps is explored through a series of hands-on experiences designed to reinforce and extend map-reading skills in grades 6-8. The increasingly sophisticated, standards-related activities include measuring distances between population centers, finding communities named after trees, animals, and geologic features,…
The Impact of the Measures of Academic Progress (MAP) Program on Student Reading Achievement
ERIC Educational Resources Information Center
Cordray, David S.; Pion, Georgine M.; Brandt, Chris; Molefe, Ayrin
2013-01-01
One of the most widely used commercially available systems incorporating benchmark assessment and training in differentiated instruction is the Northwest Evaluation Association's (NWEA) Measures of Academic Progress (MAP) program. The MAP program involves two components: (1) computer-adaptive assessments administered to students three to four…
Changes in running pattern due to fatigue and cognitive load in orienteering.
Millet, Guillaume Y; Divert, Caroline; Banizette, Marion; Morin, Jean-Benoit
2010-01-01
The aim of this study was to examine the influence of fatigue on running biomechanics in normal running, in normal running with a cognitive task, and in running while map reading. Nineteen international and less experienced orienteers performed a fatiguing running exercise of duration and intensity similar to a classic distance orienteering race on an instrumented treadmill while performing mental arithmetic, an orienteering simulation, and control running at regular intervals. Two-way repeated-measures analysis of variance did not reveal any significant difference between mental arithmetic and control running for any of the kinematic and kinetic parameters analysed eight times over the fatiguing protocol. However, these parameters were systematically different between the orienteering simulation and the other two conditions (mental arithmetic and control running). The adaptations in orienteering simulation running were significantly more pronounced in the elite group when step frequency, peak vertical ground reaction force, vertical stiffness, and maximal downward displacement of the centre of mass during contact were considered. The effects of fatigue on running biomechanics depended on whether the orienteers read their map or ran normally. It is concluded that adding a cognitive load does not modify running patterns. Therefore, all changes in running pattern observed during the orienteering simulation, particularly in elite orienteers, are the result of adaptations to enable efficient map reading and/or potentially prevent injuries. Finally, running patterns are not affected to the same extent by fatigue when a map reading task is added.
SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read
2010-01-01
Background High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms. Results SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming. Conclusions SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts. PMID:20089148
Doing Mathematics with Purpose: Mathematical Text Types
ERIC Educational Resources Information Center
Dostal, Hannah M.; Robinson, Richard
2018-01-01
Mathematical literacy includes learning to read and write different types of mathematical texts as part of purposeful mathematical meaning making. Thus in this article, we describe how learning to read and write mathematical texts (proof text, algorithmic text, algebraic/symbolic text, and visual text) supports the development of students'…
Fast imputation using medium- or low-coverage sequence data
USDA-ARS?s Scientific Manuscript database
Direct imputation from raw sequence reads can be more accurate than calling genotypes first and then imputing, especially if read depth is low or error rates high, but different imputation strategies are required than those used for data from genotyping chips. A fast algorithm to impute from lower t...
A model-based 3D phase unwrapping algorithm using Gegenbauer polynomials.
Langley, Jason; Zhao, Qun
2009-09-07
The application of a two-dimensional (2D) phase unwrapping algorithm to a three-dimensional (3D) phase map may result in an unwrapped phase map that is discontinuous in the direction normal to the unwrapped plane. This work investigates the problem of phase unwrapping for 3D phase maps. The phase map is modeled as a product of three one-dimensional Gegenbauer polynomials. The orthogonality of Gegenbauer polynomials and their derivatives on the interval [-1, 1] are exploited to calculate the expansion coefficients. The algorithm was implemented using two well-known Gegenbauer polynomials: Chebyshev polynomials of the first kind and Legendre polynomials. Both implementations of the phase unwrapping algorithm were tested on 3D datasets acquired from a magnetic resonance imaging (MRI) scanner. The first dataset was acquired from a homogeneous spherical phantom. The second dataset was acquired using the same spherical phantom but magnetic field inhomogeneities were introduced by an external coil placed adjacent to the phantom, which provided an additional burden to the phase unwrapping algorithm. Then Gaussian noise was added to generate a low signal-to-noise ratio dataset. The third dataset was acquired from the brain of a human volunteer. The results showed that Chebyshev implementation and the Legendre implementation of the phase unwrapping algorithm give similar results on the 3D datasets. Both implementations of the phase unwrapping algorithm compare well to PRELUDE 3D, 3D phase unwrapping software well recognized for functional MRI.
Evaluation of algorithms used to order markers on genetic maps.
Mollinari, M; Margarido, G R A; Vencovsky, R; Garcia, A A F
2009-12-01
When building genetic maps, it is necessary to choose from several marker ordering algorithms and criteria, and the choice is not always simple. In this study, we evaluate the efficiency of algorithms try (TRY), seriation (SER), rapid chain delineation (RCD), recombination counting and ordering (RECORD) and unidirectional growth (UG), as well as the criteria PARF (product of adjacent recombination fractions), SARF (sum of adjacent recombination fractions), SALOD (sum of adjacent LOD scores) and LHMC (likelihood through hidden Markov chains), used with the RIPPLE algorithm for error verification, in the construction of genetic linkage maps. A linkage map of a hypothetical diploid and monoecious plant species was simulated containing one linkage group and 21 markers with fixed distance of 3 cM between them. In all, 700 F(2) populations were randomly simulated with 100 and 400 individuals with different combinations of dominant and co-dominant markers, as well as 10 and 20% of missing data. The simulations showed that, in the presence of co-dominant markers only, any combination of algorithm and criteria may be used, even for a reduced population size. In the case of a smaller proportion of dominant markers, any of the algorithms and criteria (except SALOD) investigated may be used. In the presence of high proportions of dominant markers and smaller samples (around 100), the probability of repulsion linkage increases between them and, in this case, use of the algorithms TRY and SER associated to RIPPLE with criterion LHMC would provide better results.
A new chaotic multi-verse optimization algorithm for solving engineering optimization problems
NASA Astrophysics Data System (ADS)
Sayed, Gehad Ismail; Darwish, Ashraf; Hassanien, Aboul Ella
2018-03-01
Multi-verse optimization algorithm (MVO) is one of the recent meta-heuristic optimization algorithms. The main inspiration of this algorithm came from multi-verse theory in physics. However, MVO like most optimization algorithms suffers from low convergence rate and entrapment in local optima. In this paper, a new chaotic multi-verse optimization algorithm (CMVO) is proposed to overcome these problems. The proposed CMVO is applied on 13 benchmark functions and 7 well-known design problems in the engineering and mechanical field; namely, three-bar trust, speed reduce design, pressure vessel problem, spring design, welded beam, rolling element-bearing and multiple disc clutch brake. In the current study, a modified feasible-based mechanism is employed to handle constraints. In this mechanism, four rules were used to handle the specific constraint problem through maintaining a balance between feasible and infeasible solutions. Moreover, 10 well-known chaotic maps are used to improve the performance of MVO. The experimental results showed that CMVO outperforms other meta-heuristic optimization algorithms on most of the optimization problems. Also, the results reveal that sine chaotic map is the most appropriate map to significantly boost MVO's performance.
Deriving health utilities from the MacNew Heart Disease Quality of Life Questionnaire.
Chen, Gang; McKie, John; Khan, Munir A; Richardson, Jeff R
2015-10-01
Quality of life is included in the economic evaluation of health services by measuring the preference for health states, i.e. health state utilities. However, most intervention studies include a disease-specific, not a utility, instrument. Consequently, there has been increasing use of statistical mapping algorithms which permit utilities to be estimated from a disease-specific instrument. The present paper provides such algorithms between the MacNew Heart Disease Quality of Life Questionnaire (MacNew) instrument and six multi-attribute utility (MAU) instruments, the Euroqol (EQ-5D), the Short Form 6D (SF-6D), the Health Utilities Index (HUI) 3, the Quality of Wellbeing (QWB), the 15D (15 Dimension) and the Assessment of Quality of Life (AQoL-8D). Heart disease patients and members of the healthy public were recruited from six countries. Non-parametric rank tests were used to compare subgroup utilities and MacNew scores. Mapping algorithms were estimated using three separate statistical techniques. Mapping algorithms achieved a high degree of precision. Based on the mean absolute error and the intra class correlation the preferred mapping is MacNew into SF-6D or 15D. Using the R squared statistic the preferred mapping is MacNew into AQoL-8D. The algorithms reported in this paper enable MacNew data to be mapped into utilities predicted from any of six instruments. This permits studies which have included the MacNew to be used in cost utility analyses which, in turn, allows the comparison of services with interventions across the health system. © The European Society of Cardiology 2014.
A Novel Real-Time Reference Key Frame Scan Matching Method.
Mohamed, Haytham; Moussa, Adel; Elhabiby, Mohamed; El-Sheimy, Naser; Sesay, Abu
2017-05-07
Unmanned aerial vehicles represent an effective technology for indoor search and rescue operations. Typically, most indoor missions' environments would be unknown, unstructured, and/or dynamic. Navigation of UAVs in such environments is addressed by simultaneous localization and mapping approach using either local or global approaches. Both approaches suffer from accumulated errors and high processing time due to the iterative nature of the scan matching method. Moreover, point-to-point scan matching is prone to outlier association processes. This paper proposes a low-cost novel method for 2D real-time scan matching based on a reference key frame (RKF). RKF is a hybrid scan matching technique comprised of feature-to-feature and point-to-point approaches. This algorithm aims at mitigating errors accumulation using the key frame technique, which is inspired from video streaming broadcast process. The algorithm depends on the iterative closest point algorithm during the lack of linear features which is typically exhibited in unstructured environments. The algorithm switches back to the RKF once linear features are detected. To validate and evaluate the algorithm, the mapping performance and time consumption are compared with various algorithms in static and dynamic environments. The performance of the algorithm exhibits promising navigational, mapping results and very short computational time, that indicates the potential use of the new algorithm with real-time systems.
Interpreting Quantifier Scope Ambiguity: Evidence of Heuristic First, Algorithmic Second Processing
Dwivedi, Veena D.
2013-01-01
The present work suggests that sentence processing requires both heuristic and algorithmic processing streams, where the heuristic processing strategy precedes the algorithmic phase. This conclusion is based on three self-paced reading experiments in which the processing of two-sentence discourses was investigated, where context sentences exhibited quantifier scope ambiguity. Experiment 1 demonstrates that such sentences are processed in a shallow manner. Experiment 2 uses the same stimuli as Experiment 1 but adds questions to ensure deeper processing. Results indicate that reading times are consistent with a lexical-pragmatic interpretation of number associated with context sentences, but responses to questions are consistent with the algorithmic computation of quantifier scope. Experiment 3 shows the same pattern of results as Experiment 2, despite using stimuli with different lexical-pragmatic biases. These effects suggest that language processing can be superficial, and that deeper processing, which is sensitive to structure, only occurs if required. Implications for recent studies of quantifier scope ambiguity are discussed. PMID:24278439
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tessier, Francois; Vishwanath, Venkatram
2017-11-28
Reading and writing data efficiently from different tiers of storage is necessary for most scientific simulations to achieve good performance at scale. Many software solutions have been developed to decrease the I/O bottleneck. One wellknown strategy, in the context of collective I/O operations, is the two-phase I/O scheme. This strategy consists of selecting a subset of processes to aggregate contiguous pieces of data before performing reads/writes. In our previous work, we implemented the two-phase I/O scheme with a MPI-based topology-aware algorithm. Our algorithm showed very good performance at scale compared to the standard I/O libraries such as POSIX I/O andmore » MPI I/O. However, the algorithm had several limitations hindering a satisfying reproducibility of our experiments. In this paper, we extend our work by 1) identifying the obstacles we face to reproduce our experiments and 2) discovering solutions that reduce the unpredictability of our results.« less
Dyslexia: Neuroanatomical/Neurolinguistic Perspectives.
ERIC Educational Resources Information Center
Hynd, George W.; Hynd, Cynthia R.
1984-01-01
Reviews attempts to adequately define dyslexia with a focus on recent efforts at developing a nosology of dyslexia and discusses the neurological basis of reading and severe reading failure with an emphasis on validating evidence provided through brain-mapping procedures and postmortem studies. (HOD)
Stereo-vision-based terrain mapping for off-road autonomous navigation
NASA Astrophysics Data System (ADS)
Rankin, Arturo L.; Huertas, Andres; Matthies, Larry H.
2009-05-01
Successful off-road autonomous navigation by an unmanned ground vehicle (UGV) requires reliable perception and representation of natural terrain. While perception algorithms are used to detect driving hazards, terrain mapping algorithms are used to represent the detected hazards in a world model a UGV can use to plan safe paths. There are two primary ways to detect driving hazards with perception sensors mounted to a UGV: binary obstacle detection and traversability cost analysis. Binary obstacle detectors label terrain as either traversable or non-traversable, whereas, traversability cost analysis assigns a cost to driving over a discrete patch of terrain. In uncluttered environments where the non-obstacle terrain is equally traversable, binary obstacle detection is sufficient. However, in cluttered environments, some form of traversability cost analysis is necessary. The Jet Propulsion Laboratory (JPL) has explored both approaches using stereo vision systems. A set of binary detectors has been implemented that detect positive obstacles, negative obstacles, tree trunks, tree lines, excessive slope, low overhangs, and water bodies. A compact terrain map is built from each frame of stereo images. The mapping algorithm labels cells that contain obstacles as nogo regions, and encodes terrain elevation, terrain classification, terrain roughness, traversability cost, and a confidence value. The single frame maps are merged into a world map where temporal filtering is applied. In previous papers, we have described our perception algorithms that perform binary obstacle detection. In this paper, we summarize the terrain mapping capabilities that JPL has implemented during several UGV programs over the last decade and discuss some challenges to building terrain maps with stereo range data.
NASA Astrophysics Data System (ADS)
Kosugi, Akito; Takemi, Mitsuaki; Tia, Banty; Castagnola, Elisa; Ansaldo, Alberto; Sato, Kenta; Awiszus, Friedemann; Seki, Kazuhiko; Ricci, Davide; Fadiga, Luciano; Iriki, Atsushi; Ushiba, Junichi
2018-06-01
Objective. Motor map has been widely used as an indicator of motor skills and learning, cortical injury, plasticity, and functional recovery. Cortical stimulation mapping using epidural electrodes is recently adopted for animal studies. However, several technical limitations still remain. Test-retest reliability of epidural cortical stimulation (ECS) mapping has not been examined in detail. Many previous studies defined evoked movements and motor thresholds by visual inspection, and thus, lacked quantitative measurements. A reliable and quantitative motor map is important to elucidate the mechanisms of motor cortical reorganization. The objective of the current study was to perform reliable ECS mapping of motor representations based on the motor thresholds, which were stochastically estimated by motor evoked potentials and chronically implanted micro-electrocorticographical (µECoG) electrode arrays, in common marmosets. Approach. ECS was applied using the implanted µECoG electrode arrays in three adult common marmosets under awake conditions. Motor evoked potentials were recorded through electromyographical electrodes implanted in upper limb muscles. The motor threshold was calculated through a modified maximum likelihood threshold-hunting algorithm fitted with the recorded data from marmosets. Further, a computer simulation confirmed reliability of the algorithm. Main results. Computer simulation suggested that the modified maximum likelihood threshold-hunting algorithm enabled to estimate motor threshold with acceptable precision. In vivo ECS mapping showed high test-retest reliability with respect to the excitability and location of the cortical forelimb motor representations. Significance. Using implanted µECoG electrode arrays and a modified motor threshold-hunting algorithm, we were able to achieve reliable motor mapping in common marmosets with the ECS system.
Kosugi, Akito; Takemi, Mitsuaki; Tia, Banty; Castagnola, Elisa; Ansaldo, Alberto; Sato, Kenta; Awiszus, Friedemann; Seki, Kazuhiko; Ricci, Davide; Fadiga, Luciano; Iriki, Atsushi; Ushiba, Junichi
2018-06-01
Motor map has been widely used as an indicator of motor skills and learning, cortical injury, plasticity, and functional recovery. Cortical stimulation mapping using epidural electrodes is recently adopted for animal studies. However, several technical limitations still remain. Test-retest reliability of epidural cortical stimulation (ECS) mapping has not been examined in detail. Many previous studies defined evoked movements and motor thresholds by visual inspection, and thus, lacked quantitative measurements. A reliable and quantitative motor map is important to elucidate the mechanisms of motor cortical reorganization. The objective of the current study was to perform reliable ECS mapping of motor representations based on the motor thresholds, which were stochastically estimated by motor evoked potentials and chronically implanted micro-electrocorticographical (µECoG) electrode arrays, in common marmosets. ECS was applied using the implanted µECoG electrode arrays in three adult common marmosets under awake conditions. Motor evoked potentials were recorded through electromyographical electrodes implanted in upper limb muscles. The motor threshold was calculated through a modified maximum likelihood threshold-hunting algorithm fitted with the recorded data from marmosets. Further, a computer simulation confirmed reliability of the algorithm. Computer simulation suggested that the modified maximum likelihood threshold-hunting algorithm enabled to estimate motor threshold with acceptable precision. In vivo ECS mapping showed high test-retest reliability with respect to the excitability and location of the cortical forelimb motor representations. Using implanted µECoG electrode arrays and a modified motor threshold-hunting algorithm, we were able to achieve reliable motor mapping in common marmosets with the ECS system.
Stereo Vision Based Terrain Mapping for Off-Road Autonomous Navigation
NASA Technical Reports Server (NTRS)
Rankin, Arturo L.; Huertas, Andres; Matthies, Larry H.
2009-01-01
Successful off-road autonomous navigation by an unmanned ground vehicle (UGV) requires reliable perception and representation of natural terrain. While perception algorithms are used to detect driving hazards, terrain mapping algorithms are used to represent the detected hazards in a world model a UGV can use to plan safe paths. There are two primary ways to detect driving hazards with perception sensors mounted to a UGV: binary obstacle detection and traversability cost analysis. Binary obstacle detectors label terrain as either traversable or non-traversable, whereas, traversability cost analysis assigns a cost to driving over a discrete patch of terrain. In uncluttered environments where the non-obstacle terrain is equally traversable, binary obstacle detection is sufficient. However, in cluttered environments, some form of traversability cost analysis is necessary. The Jet Propulsion Laboratory (JPL) has explored both approaches using stereo vision systems. A set of binary detectors has been implemented that detect positive obstacles, negative obstacles, tree trunks, tree lines, excessive slope, low overhangs, and water bodies. A compact terrain map is built from each frame of stereo images. The mapping algorithm labels cells that contain obstacles as no-go regions, and encodes terrain elevation, terrain classification, terrain roughness, traversability cost, and a confidence value. The single frame maps are merged into a world map where temporal filtering is applied. In previous papers, we have described our perception algorithms that perform binary obstacle detection. In this paper, we summarize the terrain mapping capabilities that JPL has implemented during several UGV programs over the last decade and discuss some challenges to building terrain maps with stereo range data.
Satellite-map position estimation for the Mars rover
NASA Technical Reports Server (NTRS)
Hayashi, Akira; Dean, Thomas
1989-01-01
A method for locating the Mars rover using an elevation map generated from satellite data is described. In exploring its environment, the rover is assumed to generate a local rover-centered elevation map that can be used to extract information about the relative position and orientation of landmarks corresponding to local maxima. These landmarks are integrated into a stochastic map which is then matched with the satellite map to obtain an estimate of the robot's current location. The landmarks are not explicitly represented in the satellite map. The results of the matching algorithm correspond to a probabilistic assessment of whether or not the robot is located within a given region of the satellite map. By assigning a probabilistic interpretation to the information stored in the satellite map, researchers are able to provide a precise characterization of the results computed by the matching algorithm.
Autonomous exploration and mapping of unknown environments
NASA Astrophysics Data System (ADS)
Owens, Jason; Osteen, Phil; Fields, MaryAnne
2012-06-01
Autonomous exploration and mapping is a vital capability for future robotic systems expected to function in arbitrary complex environments. In this paper, we describe an end-to-end robotic solution for remotely mapping buildings. For a typical mapping system, an unmanned system is directed to enter an unknown building at a distance, sense the internal structure, and, barring additional tasks, while in situ, create a 2-D map of the building. This map provides a useful and intuitive representation of the environment for the remote operator. We have integrated a robust mapping and exploration system utilizing laser range scanners and RGB-D cameras, and we demonstrate an exploration and metacognition algorithm on a robotic platform. The algorithm allows the robot to safely navigate the building, explore the interior, report significant features to the operator, and generate a consistent map - all while maintaining localization.
The MAP Spacecraft Angular State Estimation After Sensor Failure
NASA Technical Reports Server (NTRS)
Bar-Itzhack, Itzhack Y.; Harman, Richard R.
2003-01-01
This work describes two algorithms for computing the angular rate and attitude in case of a gyro and a Star Tracker failure in the Microwave Anisotropy Probe (MAP) satellite, which was placed in the L2 parking point from where it collects data to determine the origin of the universe. The nature of the problem is described, two algorithms are suggested, an observability study is carried out and real MAP data are used to determine the merit of the algorithms. It is shown that one of the algorithms yields a good estimate of the rates but not of the attitude whereas the other algorithm yields a good estimate of the rate as well as two of the three attitude angles. The estimation of the third angle depends on the initial state estimate. There is a contradiction between this result and the outcome of the observability analysis. An explanation of this contradiction is given in the paper. Although this work treats a particular spacecraft, the conclusions have a far reaching consequence.
The Effect of Sensor Failure on the Attitude and Rate Estimation of MAP Spacecraft
NASA Technical Reports Server (NTRS)
Bar-Itzhack, Itzhack Y.; Harman, Richard R.
2003-01-01
This work describes two algorithms for computing the angular rate and attitude in case of a gyro and a Star Tracker failure in the Microwave Anisotropy Probe (MAP) satellite, which was placed in the L2 parking point from where it collects data to determine the origin of the universe. The nature of the problem is described, two algorithms are suggested, an observability study is carried out and real MAP data are used to determine the merit of the algorithms. It is shown that one of the algorithms yields a good estimate of the rates but not of the attitude whereas the other algorithm yields a good estimate of the rate as well as two of the three attitude angles. The estimation of the third angle depends on the initial state estimate. There is a contradiction between this result and the outcome of the observability analysis. An explanation of this contradiction is given in the paper. Although this work treats a particular spacecraft, its conclusions are more general.
Topological mappings of video and audio data.
Fyfe, Colin; Barbakh, Wesam; Ooi, Wei Chuan; Ko, Hanseok
2008-12-01
We review a new form of self-organizing map which is based on a nonlinear projection of latent points into data space, identical to that performed in the Generative Topographic Mapping (GTM).(1) But whereas the GTM is an extension of a mixture of experts, this model is an extension of a product of experts.(2) We show visualisation and clustering results on a data set composed of video data of lips uttering 5 Korean vowels. Finally we note that we may dispense with the probabilistic underpinnings of the product of experts and derive the same algorithm as a minimisation of mean squared error between the prototypes and the data. This leads us to suggest a new algorithm which incorporates local and global information in the clustering. Both ot the new algorithms achieve better results than the standard Self-Organizing Map.
Taylor, J S H; Davis, Matthew H; Rastle, Kathleen
2017-06-01
There is strong scientific consensus that emphasizing print-to-sound relationships is critical when learning to read alphabetic languages. Nevertheless, reading instruction varies across English-speaking countries, from intensive phonic training to multicuing environments that teach sound- and meaning-based strategies. We sought to understand the behavioral and neural consequences of these differences in relative emphasis. We taught 24 English-speaking adults to read 2 sets of 24 novel words (e.g., /buv/, /sig/), written in 2 different unfamiliar orthographies. Following pretraining on oral vocabulary, participants learned to read the novel words over 8 days. Training in 1 language was biased toward print-to-sound mappings while training in the other language was biased toward print-to-meaning mappings. Results showed striking benefits of print-sound training on reading aloud, generalization, and comprehension of single words. Univariate analyses of fMRI data collected at the end of training showed that print-meaning relative to print-sound relative training increased neural effort in dorsal pathway regions involved in reading aloud. Conversely, activity in ventral pathway brain regions involved in reading comprehension was no different following print-meaning versus print-sound training. Multivariate analyses validated our artificial language approach, showing high similarity between the spatial distribution of fMRI activity during artificial and English word reading. Our results suggest that early literacy education should focus on the systematicities present in print-to-sound relationships in alphabetic languages, rather than teaching meaning-based strategies, in order to enhance both reading aloud and comprehension of written words. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
2017-01-01
There is strong scientific consensus that emphasizing print-to-sound relationships is critical when learning to read alphabetic languages. Nevertheless, reading instruction varies across English-speaking countries, from intensive phonic training to multicuing environments that teach sound- and meaning-based strategies. We sought to understand the behavioral and neural consequences of these differences in relative emphasis. We taught 24 English-speaking adults to read 2 sets of 24 novel words (e.g., /buv/, /sig/), written in 2 different unfamiliar orthographies. Following pretraining on oral vocabulary, participants learned to read the novel words over 8 days. Training in 1 language was biased toward print-to-sound mappings while training in the other language was biased toward print-to-meaning mappings. Results showed striking benefits of print–sound training on reading aloud, generalization, and comprehension of single words. Univariate analyses of fMRI data collected at the end of training showed that print–meaning relative to print–sound relative training increased neural effort in dorsal pathway regions involved in reading aloud. Conversely, activity in ventral pathway brain regions involved in reading comprehension was no different following print–meaning versus print–sound training. Multivariate analyses validated our artificial language approach, showing high similarity between the spatial distribution of fMRI activity during artificial and English word reading. Our results suggest that early literacy education should focus on the systematicities present in print-to-sound relationships in alphabetic languages, rather than teaching meaning-based strategies, in order to enhance both reading aloud and comprehension of written words. PMID:28425742
Du, Jia; Younes, Laurent; Qiu, Anqi
2011-01-01
This paper introduces a novel large deformation diffeomorphic metric mapping algorithm for whole brain registration where sulcal and gyral curves, cortical surfaces, and intensity images are simultaneously carried from one subject to another through a flow of diffeomorphisms. To the best of our knowledge, this is the first time that the diffeomorphic metric from one brain to another is derived in a shape space of intensity images and point sets (such as curves and surfaces) in a unified manner. We describe the Euler–Lagrange equation associated with this algorithm with respect to momentum, a linear transformation of the velocity vector field of the diffeomorphic flow. The numerical implementation for solving this variational problem, which involves large-scale kernel convolution in an irregular grid, is made feasible by introducing a class of computationally friendly kernels. We apply this algorithm to align magnetic resonance brain data. Our whole brain mapping results show that our algorithm outperforms the image-based LDDMM algorithm in terms of the mapping accuracy of gyral/sulcal curves, sulcal regions, and cortical and subcortical segmentation. Moreover, our algorithm provides better whole brain alignment than combined volumetric and surface registration (Postelnicu et al., 2009) and hierarchical attribute matching mechanism for elastic registration (HAMMER) (Shen and Davatzikos, 2002) in terms of cortical and subcortical volume segmentation. PMID:21281722
Friedman, Lee; Rigas, Ioannis; Abdulin, Evgeny; Komogortsev, Oleg V
2018-05-15
Nystrӧm and Holmqvist have published a method for the classification of eye movements during reading (ONH) (Nyström & Holmqvist, 2010). When we applied this algorithm to our data, the results were not satisfactory, so we modified the algorithm (now the MNH) to better classify our data. The changes included: (1) reducing the amount of signal filtering, (2) excluding a new type of noise, (3) removing several adaptive thresholds and replacing them with fixed thresholds, (4) changing the way that the start and end of each saccade was determined, (5) employing a new algorithm for detecting PSOs, and (6) allowing a fixation period to either begin or end with noise. A new method for the evaluation of classification algorithms is presented. It was designed to provide comprehensive feedback to an algorithm developer, in a time-efficient manner, about the types and numbers of classification errors that an algorithm produces. This evaluation was conducted by three expert raters independently, across 20 randomly chosen recordings, each classified by both algorithms. The MNH made many fewer errors in determining when saccades start and end, and it also detected some fixations and saccades that the ONH did not. The MNH fails to detect very small saccades. We also evaluated two additional algorithms: the EyeLink Parser and a more current, machine-learning-based algorithm. The EyeLink Parser tended to find more saccades that ended too early than did the other methods, and we found numerous problems with the output of the machine-learning-based algorithm.
NASA Astrophysics Data System (ADS)
Liu, Zhi; Zhou, Baotong; Zhang, Changnian
2017-03-01
Vehicle-mounted panoramic system is important safety assistant equipment for driving. However, traditional systems only render fixed top-down perspective view of limited view field, which may have potential safety hazard. In this paper, a texture mapping algorithm for 3D vehicle-mounted panoramic system is introduced, and an implementation of the algorithm utilizing OpenGL ES library based on Android smart platform is presented. Initial experiment results show that the proposed algorithm can render a good 3D panorama, and has the ability to change view point freely.
Validation of a standardized mapping system of the hip joint for radial MRA sequencing.
Klenke, Frank M; Hoffmann, Daniel B; Cross, Brian J; Siebenrock, Klaus A
2015-03-01
Intraarticular gadolinium-enhanced magnetic resonance arthrography (MRA) is commonly applied to characterize morphological disorders of the hip. However, the reproducibility of retrieving anatomic landmarks on MRA scans and their correlation with intraarticular pathologies is unknown. A precise mapping system for the exact localization of hip pathomorphologies with radial MRA sequences is lacking. Therefore, the purpose of the study was the establishment and validation of a reproducible mapping system for radial sequences of hip MRA. Sixty-nine consecutive intraarticular gadolinium-enhanced hip MRAs were evaluated. Radial sequencing consisted of 14 cuts orientated along the axis of the femoral neck. Three orthopedic surgeons read the radial sequences independently. Each MRI was read twice with a minimum interval of 7 days from the first reading. The intra- and inter-observer reliability of the mapping procedure was determined. A clockwise system for hip MRA was established. The teardrop figure served to determine the 6 o'clock position of the acetabulum; the center of the greater trochanter served to determine the 12 o'clock position of the femoral head-neck junction. The intra- and inter-observer ICCs to retrieve the correct 6/12 o'clock positions were 0.906-0.996 and 0.978-0.988, respectively. The established mapping system for radial sequences of hip joint MRA is reproducible and easy to perform.
Algebraic grid generation using tensor product B-splines. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Saunders, B. V.
1985-01-01
Finite difference methods are more successful if the accompanying grid has lines which are smooth and nearly orthogonal. The development of an algorithm which produces such a grid when given the boundary description. Topological considerations in structuring the grid generation mapping are discussed. The concept of the degree of a mapping and how it can be used to determine what requirements are necessary if a mapping is to produce a suitable grid is examined. The grid generation algorithm uses a mapping composed of bicubic B-splines. Boundary coefficients are chosen so that the splines produce Schoenberg's variation diminishing spline approximation to the boundary. Interior coefficients are initially chosen to give a variation diminishing approximation to the transfinite bilinear interpolant of the function mapping the boundary of the unit square onto the boundary grid. The practicality of optimizing the grid by minimizing a functional involving the Jacobian of the grid generation mapping at each interior grid point and the dot product of vectors tangent to the grid lines is investigated. Grids generated by using the algorithm are presented.
2002-01-01
UNITY program that implements exactly the same algorithm as Specification 1.1. The correctness of this program is proven in amanner sim- 4 program...chapter, we introduce the Dynamic UNITY formalism, which allows us to reason about algorithms and protocols in which the sets of participating processes...implements Euclid’s algorithm for calculating the greatest common divisor (GCD) of two integers; it repeat- edly reads an integer message from each of its
Algorithms and Complexity Results for Genome Mapping Problems.
Rajaraman, Ashok; Zanetti, Joao Paulo Pereira; Manuch, Jan; Chauve, Cedric
2017-01-01
Genome mapping algorithms aim at computing an ordering of a set of genomic markers based on local ordering information such as adjacencies and intervals of markers. In most genome mapping models, markers are assumed to occur uniquely in the resulting map. We introduce algorithmic questions that consider repeats, i.e., markers that can have several occurrences in the resulting map. We show that, provided with an upper bound on the copy number of repeated markers and with intervals that span full repeat copies, called repeat spanning intervals, the problem of deciding if a set of adjacencies and repeat spanning intervals admits a genome representation is tractable if the target genome can contain linear and/or circular chromosomal fragments. We also show that extracting a maximum cardinality or weight subset of repeat spanning intervals given a set of adjacencies that admits a genome realization is NP-hard but fixed-parameter tractable in the maximum copy number and the number of adjacent repeats, and tractable if intervals contain a single repeated marker.
ERIC Educational Resources Information Center
Torres Soto, Nayda E.
2013-01-01
Learning to read consists of processes that allow the reader to recognize print, understand words and comprehend text. Thus, vocabulary is one of the most fundamental elements for reading comprehension. Learners can acquire new words from text upon the first encounter with a new word on the basis of fast mapping. Teachers also contribute to…
Diffraction gratings used as identifying markers
Deason, V.A.; Ward, M.B.
1991-03-26
A finely detailed diffraction grating is applied to an object as an identifier or tag which is unambiguous, difficult to duplicate, or remove and transfer to another item, and can be read and compared with prior readings with relative ease. The exact pattern of the diffraction grating is mapped by diffraction moire techniques and recorded for comparison with future readings of the same grating. 7 figures.
ERIC Educational Resources Information Center
School Library Media Activities Monthly, 2003
2003-01-01
Provides five fully developed library media activities that are designed for use with specific curriculum units in dramatics, reading, language arts, science, and social studies. Library media skills, curriculum objectives, grade levels, resources, instructional roles, activities and procedures, evaluation, and follow-up are describes for each…
Shapes and sounds as self-objects in learning geography.
Baum, E A
1978-01-01
The pleasure which some children find in maps and map reading is manifold in origin. Children cathect patterns of configuration and color and derive joy from the visual mastery of these. This gratification is enhanced by the child's knowledge that the map represents something bigger than and external to itself. Likewise, some children take pleasure in the pronunciation of names themselves. The phonetic transcription of multisyllabic names is often a plearurable challenge. The vocalized name has its origin in the self, becomes barely external to self, and is self-monitored. Thus, in children both the configurations and the vocalizations associated with map reading have the properties of "self=objects" (Kohut, 1971). From the author's observation the delight which some children take in sounding out geographic names on a map may, in some instances, indicate pre-existing gratifying sound associations. Childish amusement in punning on cognomens may be an even greater stimulant for learning than visual configurations or artificial cognitive devices.
Consensus generation and variant detection by Celera Assembler.
Denisov, Gennady; Walenz, Brian; Halpern, Aaron L; Miller, Jason; Axelrod, Nelson; Levy, Samuel; Sutton, Granger
2008-04-15
We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms. Being applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the total number 2,033311 detected regions of sequence variation. In 33,269 out of 460,373 detected regions of size >1 bp, it fixes the constructed errors of a mosaic haploid representation of a diploid locus as produced by the original Celera Assembler consensus algorithm. Using an optimized procedure calibrated against 1 506 344 known SNPs, it detects 438 814 new heterozygous SNPs with false positive rate 12%. The open source code is available at: http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/
Fast object detection algorithm based on HOG and CNN
NASA Astrophysics Data System (ADS)
Lu, Tongwei; Wang, Dandan; Zhang, Yanduo
2018-04-01
In the field of computer vision, object classification and object detection are widely used in many fields. The traditional object detection have two main problems:one is that sliding window of the regional selection strategy is high time complexity and have window redundancy. And the other one is that Robustness of the feature is not well. In order to solve those problems, Regional Proposal Network (RPN) is used to select candidate regions instead of selective search algorithm. Compared with traditional algorithms and selective search algorithms, RPN has higher efficiency and accuracy. We combine HOG feature and convolution neural network (CNN) to extract features. And we use SVM to classify. For TorontoNet, our algorithm's mAP is 1.6 percentage points higher. For OxfordNet, our algorithm's mAP is 1.3 percentage higher.
Fast Mapping by Bilingual Children: Storybooks and Cartoons
ERIC Educational Resources Information Center
Van Horn, Danielle; Kan, Pui Fong
2016-01-01
The purpose of this study was to examine the fast mapping skills in Spanish-English bilingual preschool children in two learning contexts: storybook reading and cartoon viewing. Eighteen typically developing Spanish-English bilingual preschool children completed a fast mapping task in Spanish (L1) and in English (L2). In 4 different sessions, each…
ERIC Educational Resources Information Center
Hsu, Hsiao-Ping; Tsai, Bor-Wen; Chen, Che-Ming
2018-01-01
Teaching high-school geomorphological concepts and topographic map reading entails many challenges. This research reports the applicability and effectiveness of Google Earth in teaching topographic map skills and geomorphological concepts, by a single teacher, in a one-computer classroom. Compared to learning via a conventional instructional…
An Exploratory Study of Civil Servants Spatial Thinking, Awareness and Use of Maps in Africa-Nigeria
NASA Astrophysics Data System (ADS)
Asiyanbola, R. A.
2018-04-01
The paper is an exploratory study of spatial thinking, awareness and use of maps among civil servants in Nigeria with a view towards enhancing capacity building in the development and use of global mapping and geospatial information technologies products and services. The data used in the paper was from administration of 152 questionnaires to civil servants in Ibadan, Oyo State, Nigeria between February and August, 2017. Descriptive statistics were used to analyse the data. The study shows among others that majority of the civil servants had situations in their daily lives or specialty that require spatial thinking; the three top situations in their daily lives or specialty that require spatial thinking were identification of places, wayfinding and walking; majority of them asked from people information about location, direction, distances and other needed information about places they do not know; majority of them were aware of maps; majority of them could read maps; majority of them had interest to learn more how to read maps and were willing to pay for the training.
NASA Astrophysics Data System (ADS)
Niazmardi, S.; Safari, A.; Homayouni, S.
2017-09-01
Crop mapping through classification of Satellite Image Time-Series (SITS) data can provide very valuable information for several agricultural applications, such as crop monitoring, yield estimation, and crop inventory. However, the SITS data classification is not straightforward. Because different images of a SITS data have different levels of information regarding the classification problems. Moreover, the SITS data is a four-dimensional data that cannot be classified using the conventional classification algorithms. To address these issues in this paper, we presented a classification strategy based on Multiple Kernel Learning (MKL) algorithms for SITS data classification. In this strategy, initially different kernels are constructed from different images of the SITS data and then they are combined into a composite kernel using the MKL algorithms. The composite kernel, once constructed, can be used for the classification of the data using the kernel-based classification algorithms. We compared the computational time and the classification performances of the proposed classification strategy using different MKL algorithms for the purpose of crop mapping. The considered MKL algorithms are: MKL-Sum, SimpleMKL, LPMKL and Group-Lasso MKL algorithms. The experimental tests of the proposed strategy on two SITS data sets, acquired by SPOT satellite sensors, showed that this strategy was able to provide better performances when compared to the standard classification algorithm. The results also showed that the optimization method of the used MKL algorithms affects both the computational time and classification accuracy of this strategy.
Assignment Of Finite Elements To Parallel Processors
NASA Technical Reports Server (NTRS)
Salama, Moktar A.; Flower, Jon W.; Otto, Steve W.
1990-01-01
Elements assigned approximately optimally to subdomains. Mapping algorithm based on simulated-annealing concept used to minimize approximate time required to perform finite-element computation on hypercube computer or other network of parallel data processors. Mapping algorithm needed when shape of domain complicated or otherwise not obvious what allocation of elements to subdomains minimizes cost of computation.
Efficient Bit-to-Symbol Likelihood Mappings
NASA Technical Reports Server (NTRS)
Moision, Bruce E.; Nakashima, Michael A.
2010-01-01
This innovation is an efficient algorithm designed to perform bit-to-symbol and symbol-to-bit likelihood mappings that represent a significant portion of the complexity of an error-correction code decoder for high-order constellations. Recent implementation of the algorithm in hardware has yielded an 8- percent reduction in overall area relative to the prior design.
An Alternative Retrieval Algorithm for the Ozone Mapping and Profiler Suite Limb Profiler
2012-05-01
behavior of aerosol extinction from the upper troposphere through the stratosphere is critical for retrieving ozone in this region. Aerosol scattering is......include area code) b. ABSTRACT c. THIS PAGE 18. NUMBER OF PAGES 17. LIMITATION OF ABSTRACT An Alternative Retrieval Algorithm for the Ozone Mapping and
Power Control and Optimization of Photovoltaic and Wind Energy Conversion Systems
NASA Astrophysics Data System (ADS)
Ghaffari, Azad
Power map and Maximum Power Point (MPP) of Photovoltaic (PV) and Wind Energy Conversion Systems (WECS) highly depend on system dynamics and environmental parameters, e.g., solar irradiance, temperature, and wind speed. Power optimization algorithms for PV systems and WECS are collectively known as Maximum Power Point Tracking (MPPT) algorithm. Gradient-based Extremum Seeking (ES), as a non-model-based MPPT algorithm, governs the system to its peak point on the steepest descent curve regardless of changes of the system dynamics and variations of the environmental parameters. Since the power map shape defines the gradient vector, then a close estimate of the power map shape is needed to create user assignable transients in the MPPT algorithm. The Hessian gives a precise estimate of the power map in a neighborhood around the MPP. The estimate of the inverse of the Hessian in combination with the estimate of the gradient vector are the key parts to implement the Newton-based ES algorithm. Hence, we generate an estimate of the Hessian using our proposed perturbation matrix. Also, we introduce a dynamic estimator to calculate the inverse of the Hessian which is an essential part of our algorithm. We present various simulations and experiments on the micro-converter PV systems to verify the validity of our proposed algorithm. The ES scheme can also be used in combination with other control algorithms to achieve desired closed-loop performance. The WECS dynamics is slow which causes even slower response time for the MPPT based on the ES. Hence, we present a control scheme, extended from Field-Oriented Control (FOC), in combination with feedback linearization to reduce the convergence time of the closed-loop system. Furthermore, the nonlinear control prevents magnetic saturation of the stator of the Induction Generator (IG). The proposed control algorithm in combination with the ES guarantees the closed-loop system robustness with respect to high level parameter uncertainty in the IG dynamics. The simulation results verify the effectiveness of the proposed algorithm.
Artificial Intelligence Software for Assessing Postural Stability
NASA Technical Reports Server (NTRS)
Lieberman, Erez; Forth, Katharine; Paloski, William
2013-01-01
A software package reads and analyzes pressure distributions from sensors mounted under a person's feet. Pressure data from sensors mounted in shoes, or in a platform, can be used to provide a description of postural stability (assessing competence to deficiency) and enables the determination of the person's present activity (running, walking, squatting, falling). This package has three parts: a preprocessing algorithm for reading input from pressure sensors; a Hidden Markov Model (HMM), which is used to determine the person's present activity and level of sensing-motor competence; and a suite of graphical algorithms, which allows visual representation of the person's activity and vestibular function over time.
SlideSort: all pairs similarity search for short reads
Shimizu, Kana; Tsuda, Koji
2011-01-01
Motivation: Recent progress in DNA sequencing technologies calls for fast and accurate algorithms that can evaluate sequence similarity for a huge amount of short reads. Searching similar pairs from a string pool is a fundamental process of de novo genome assembly, genome-wide alignment and other important analyses. Results: In this study, we designed and implemented an exact algorithm SlideSort that finds all similar pairs from a string pool in terms of edit distance. Using an efficient pattern growth algorithm, SlideSort discovers chains of common k-mers to narrow down the search. Compared to existing methods based on single k-mers, our method is more effective in reducing the number of edit distance calculations. In comparison to backtracking methods such as BWA, our method is much faster in finding remote matches, scaling easily to tens of millions of sequences. Our software has an additional function of single link clustering, which is useful in summarizing short reads for further processing. Availability: Executable binary files and C++ libraries are available at http://www.cbrc.jp/~shimizu/slidesort/ for Linux and Windows. Contact: slidesort@m.aist.go.jp; shimizu-kana@aist.go.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21148542
Novel ID-based anti-collision approach for RFID
NASA Astrophysics Data System (ADS)
Zhang, De-Gan; Li, Wen-Bin
2016-09-01
Novel correlation ID-based (CID) anti-collision approach for RFID under the banner of the Internet of Things (IOT) has been presented in this paper. The key insights are as follows: according to the deterministic algorithms which are based on the binary search tree, we propose a method to increase the association between tags so that tags can initiatively send their own ID under certain trigger conditions, at the same time, we present a multi-tree search method for querying. When the number of tags is small, by replacing the actual ID with the temporary ID, it can greatly reduce the number of times that the reader reads and writes to tag's ID. Active tags send data to the reader by the way of modulation binary pulses. When applying this method to the uncertain ALOHA algorithms, the reader can determine the locations of the empty slots according to the position of the binary pulse, so it can avoid the decrease in efficiency which is caused by reading empty slots when reading slots. Theory and experiment show that this method can greatly improve the recognition efficiency of the system when applied to either the search tree or the ALOHA anti-collision algorithms.
Tile-Based Two-Dimensional Phase Unwrapping for Digital Holography Using a Modular Framework
Antonopoulos, Georgios C.; Steltner, Benjamin; Heisterkamp, Alexander; Ripken, Tammo; Meyer, Heiko
2015-01-01
A variety of physical and biomedical imaging techniques, such as digital holography, interferometric synthetic aperture radar (InSAR), or magnetic resonance imaging (MRI) enable measurement of the phase of a physical quantity additionally to its amplitude. However, the phase can commonly only be measured modulo 2π, as a so called wrapped phase map. Phase unwrapping is the process of obtaining the underlying physical phase map from the wrapped phase. Tile-based phase unwrapping algorithms operate by first tessellating the phase map, then unwrapping individual tiles, and finally merging them to a continuous phase map. They can be implemented computationally efficiently and are robust to noise. However, they are prone to failure in the presence of phase residues or erroneous unwraps of single tiles. We tried to overcome these shortcomings by creating novel tile unwrapping and merging algorithms as well as creating a framework that allows to combine them in modular fashion. To increase the robustness of the tile unwrapping step, we implemented a model-based algorithm that makes efficient use of linear algebra to unwrap individual tiles. Furthermore, we adapted an established pixel-based unwrapping algorithm to create a quality guided tile merger. These original algorithms as well as previously existing ones were implemented in a modular phase unwrapping C++ framework. By examining different combinations of unwrapping and merging algorithms we compared our method to existing approaches. We could show that the appropriate choice of unwrapping and merging algorithms can significantly improve the unwrapped result in the presence of phase residues and noise. Beyond that, our modular framework allows for efficient design and test of new tile-based phase unwrapping algorithms. The software developed in this study is freely available. PMID:26599984
Tile-Based Two-Dimensional Phase Unwrapping for Digital Holography Using a Modular Framework.
Antonopoulos, Georgios C; Steltner, Benjamin; Heisterkamp, Alexander; Ripken, Tammo; Meyer, Heiko
2015-01-01
A variety of physical and biomedical imaging techniques, such as digital holography, interferometric synthetic aperture radar (InSAR), or magnetic resonance imaging (MRI) enable measurement of the phase of a physical quantity additionally to its amplitude. However, the phase can commonly only be measured modulo 2π, as a so called wrapped phase map. Phase unwrapping is the process of obtaining the underlying physical phase map from the wrapped phase. Tile-based phase unwrapping algorithms operate by first tessellating the phase map, then unwrapping individual tiles, and finally merging them to a continuous phase map. They can be implemented computationally efficiently and are robust to noise. However, they are prone to failure in the presence of phase residues or erroneous unwraps of single tiles. We tried to overcome these shortcomings by creating novel tile unwrapping and merging algorithms as well as creating a framework that allows to combine them in modular fashion. To increase the robustness of the tile unwrapping step, we implemented a model-based algorithm that makes efficient use of linear algebra to unwrap individual tiles. Furthermore, we adapted an established pixel-based unwrapping algorithm to create a quality guided tile merger. These original algorithms as well as previously existing ones were implemented in a modular phase unwrapping C++ framework. By examining different combinations of unwrapping and merging algorithms we compared our method to existing approaches. We could show that the appropriate choice of unwrapping and merging algorithms can significantly improve the unwrapped result in the presence of phase residues and noise. Beyond that, our modular framework allows for efficient design and test of new tile-based phase unwrapping algorithms. The software developed in this study is freely available.
Gottlieb, Klaus; Hussain, Fez
2015-02-19
Independent central reading or off-site reading of imaging endpoints is increasingly used in clinical trials. Clinician-reported outcomes, such as endoscopic disease activity scores, have been shown to be subject to bias and random error. Central reading attempts to limit bias and improve accuracy of the assessment, two factors that are critical to trial success. Whether one central reader is sufficient and how to best integrate the input of more than one central reader into one output measure, is currently not known.In this concept paper we develop the theoretical foundations of a reading algorithm that can achieve both objectives without jeopardizing operational efficiency We examine the role of expert versus competent reader, frame scoring of imaging as a classification task, and propose a voting algorithm (VISA: Voting for Image Scoring and Assessment) as the most appropriate solution which could also be used to operationally define imaging gold standards. We propose two image readers plus an optional third reader in cases of disagreement (2 + 1) for ordinary scoring tasks. We argue that it is critical in trials with endoscopically determined endpoints to include the score determined by the site reader, at least in endoscopy clinical trials. Juries with more than 3 readers could define a reference standard that would allow a transition from measuring reader agreement to measuring reader accuracy. We support VISA by applying concepts from engineering (triple-modular redundancy) and voting theory (Condorcet's jury theorem) and illustrate our points with examples from inflammatory bowel disease trials, specifically, the endoscopy component of the Mayo Clinic Score of ulcerative colitis disease activity. Detailed flow-diagrams (pseudo-code) are provided that can inform program design.The VISA "2 + 1" reading algorithm, based on voting, can translate individual reader scores into a final score in a fashion that is both mathematically sound (by avoiding averaging of ordinal data) and in a manner that is consistent with the scoring task at hand (based on decisions about the presence or absence of features, a subjective classification task). While the VISA 2 + 1 algorithm is currently being used in clinical trials, empirical data of its performance have not yet been reported.
Molecular surface mesh generation by filtering electron density map.
Giard, Joachim; Macq, Benoît
2010-01-01
Bioinformatics applied to macromolecules are now widely spread and in continuous expansion. In this context, representing external molecular surface such as the Van der Waals Surface or the Solvent Excluded Surface can be useful for several applications. We propose a fast and parameterizable algorithm giving good visual quality meshes representing molecular surfaces. It is obtained by isosurfacing a filtered electron density map. The density map is the result of the maximum of Gaussian functions placed around atom centers. This map is filtered by an ideal low-pass filter applied on the Fourier Transform of the density map. Applying the marching cubes algorithm on the inverse transform provides a mesh representation of the molecular surface.
Improving the interoperability of biomedical ontologies with compound alignments.
Oliveira, Daniela; Pesquita, Catia
2018-01-09
Ontologies are commonly used to annotate and help process life sciences data. Although their original goal is to facilitate integration and interoperability among heterogeneous data sources, when these sources are annotated with distinct ontologies, bridging this gap can be challenging. In the last decade, ontology matching systems have been evolving and are now capable of producing high-quality mappings for life sciences ontologies, usually limited to the equivalence between two ontologies. However, life sciences research is becoming increasingly transdisciplinary and integrative, fostering the need to develop matching strategies that are able to handle multiple ontologies and more complex relations between their concepts. We have developed ontology matching algorithms that are able to find compound mappings between multiple biomedical ontologies, in the form of ternary mappings, finding for instance that "aortic valve stenosis"(HP:0001650) is equivalent to the intersection between "aortic valve"(FMA:7236) and "constricted" (PATO:0001847). The algorithms take advantage of search space filtering based on partial mappings between ontology pairs, to be able to handle the increased computational demands. The evaluation of the algorithms has shown that they are able to produce meaningful results, with precision in the range of 60-92% for new mappings. The algorithms were also applied to the potential extension of logical definitions of the OBO and the matching of several plant-related ontologies. This work is a first step towards finding more complex relations between multiple ontologies. The evaluation shows that the results produced are significant and that the algorithms could satisfy specific integration needs.
A Bayesian approach to tracking patients having changing pharmacokinetic parameters
NASA Technical Reports Server (NTRS)
Bayard, David S.; Jelliffe, Roger W.
2004-01-01
This paper considers the updating of Bayesian posterior densities for pharmacokinetic models associated with patients having changing parameter values. For estimation purposes it is proposed to use the Interacting Multiple Model (IMM) estimation algorithm, which is currently a popular algorithm in the aerospace community for tracking maneuvering targets. The IMM algorithm is described, and compared to the multiple model (MM) and Maximum A-Posteriori (MAP) Bayesian estimation methods, which are presently used for posterior updating when pharmacokinetic parameters do not change. Both the MM and MAP Bayesian estimation methods are used in their sequential forms, to facilitate tracking of changing parameters. Results indicate that the IMM algorithm is well suited for tracking time-varying pharmacokinetic parameters in acutely ill and unstable patients, incurring only about half of the integrated error compared to the sequential MM and MAP methods on the same example.
A floor-map-aided WiFi/pseudo-odometry integration algorithm for an indoor positioning system.
Wang, Jian; Hu, Andong; Liu, Chunyan; Li, Xin
2015-03-24
This paper proposes a scheme for indoor positioning by fusing floor map, WiFi and smartphone sensor data to provide meter-level positioning without additional infrastructure. A topology-constrained K nearest neighbor (KNN) algorithm based on a floor map layout provides the coordinates required to integrate WiFi data with pseudo-odometry (P-O) measurements simulated using a pedestrian dead reckoning (PDR) approach. One method of further improving the positioning accuracy is to use a more effective multi-threshold step detection algorithm, as proposed by the authors. The "go and back" phenomenon caused by incorrect matching of the reference points (RPs) of a WiFi algorithm is eliminated using an adaptive fading-factor-based extended Kalman filter (EKF), taking WiFi positioning coordinates, P-O measurements and fused heading angles as observations. The "cross-wall" problem is solved based on the development of a floor-map-aided particle filter algorithm by weighting the particles, thereby also eliminating the gross-error effects originating from WiFi or P-O measurements. The performance observed in a field experiment performed on the fourth floor of the School of Environmental Science and Spatial Informatics (SESSI) building on the China University of Mining and Technology (CUMT) campus confirms that the proposed scheme can reliably achieve meter-level positioning.