Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding.
Lan, Freeman; Demaree, Benjamin; Ahmed, Noorsher; Abate, Adam R
2017-07-01
The application of single-cell genome sequencing to large cell populations has been hindered by technical challenges in isolating single cells during genome preparation. Here we present single-cell genomic sequencing (SiC-seq), which uses droplet microfluidics to isolate, fragment, and barcode the genomes of single cells, followed by Illumina sequencing of pooled DNA. We demonstrate ultra-high-throughput sequencing of >50,000 cells per run in a synthetic community of Gram-negative and Gram-positive bacteria and fungi. The sequenced genomes can be sorted in silico based on characteristic sequences. We use this approach to analyze the distributions of antibiotic-resistance genes, virulence factors, and phage sequences in microbial communities from an environmental sample. The ability to routinely sequence large populations of single cells will enable the deconvolution of genetic heterogeneity in diverse cell populations.
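The in silico sorting step described above groups reads by droplet barcode and then assigns each barcode group to a community member via characteristic sequences. A minimal sketch in Python, using hypothetical marker sequences (a real analysis would match barcode groups against reference genomes or gene databases):

```python
from collections import defaultdict

# Hypothetical marker sequences used to label a barcode group with a taxon;
# these short strings are placeholders, not real diagnostic sequences.
MARKERS = {
    "ecoli": "ACGTACGTAC",
    "saureus": "TTGGCCAATT",
}

def sort_in_silico(reads):
    """Group (barcode, sequence) pairs by droplet barcode, then label
    each group by the first marker found in any of its reads."""
    groups = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)
    labels = {}
    for barcode, seqs in groups.items():
        labels[barcode] = "unassigned"
        for taxon, marker in MARKERS.items():
            if any(marker in s for s in seqs):
                labels[barcode] = taxon
                break
    return labels

reads = [
    ("BC01", "GGGACGTACGTACGGG"),  # contains the ecoli marker
    ("BC01", "CCCCCCCC"),
    ("BC02", "AATTGGCCAATTAA"),    # contains the saureus marker
    ("BC03", "ACACACAC"),          # no marker -> unassigned
]
print(sort_in_silico(reads))
```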
A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA
USDA-ARS's Scientific Manuscript database
A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and the sequence data were assembled. A large-scale identification of simple sequence repeats (SSRs) was performed and the SSR sequence data were used for the develo...
Kim, Kwondo; Jung, Jaehoon; Caetano-Anollés, Kelsey; Sung, Samsun; Yoo, DongAhn; Choi, Bong-Hwan; Kim, Hyung-Chul; Jeong, Jin-Young; Cho, Yong-Min; Park, Eung-Woo; Choi, Tae-Jeong; Park, Byoungho; Lim, Dajeong
2018-01-01
Artificial selection has been demonstrated to have a rapid and significant effect on the phenotype and genome of an organism. However, most previous studies of artificial selection have focused solely on genomic sequences modified by artificial selection or on genomic sequences associated with a specific trait. In this study, we generated whole-genome sequencing data for 126 cattle under artificial selection and identified 24,973,862 single-nucleotide variants to investigate the relationships among artificial selection, genomic sequences, and traits. Using runs of homozygosity detected from these variants, we showed an increase in inbreeding over recent decades, while at the same time demonstrating little influence of recent inbreeding on body weight. We also identified ~0.2 Mb runs-of-homozygosity segments that may have been created by recent artificial selection. This approach may aid in the development of genetic markers directly influenced by artificial selection and provide insight into the process of artificial selection. PMID:29561881
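Runs of homozygosity of the kind used above can be detected by scanning sorted genotype calls for stretches of consecutive homozygous variants. A minimal sketch, assuming VCF-style genotype strings and a hypothetical 200 kb minimum span (real ROH callers also model genotyping error and variant density):

```python
def runs_of_homozygosity(calls, min_bp=200_000):
    """calls: position-sorted (position, genotype) tuples for one
    chromosome, with genotypes like '0/0', '0/1', '1/1'.
    Returns (start, end) spans of consecutive homozygous calls whose
    genomic extent is at least min_bp."""
    roh, start, prev = [], None, None
    for pos, gt in calls:
        hom = gt in ("0/0", "1/1")
        if hom and start is None:
            start = pos           # open a candidate run
        elif not hom and start is not None:
            if prev - start >= min_bp:
                roh.append((start, prev))
            start = None          # heterozygote breaks the run
        if hom:
            prev = pos
    if start is not None and prev - start >= min_bp:
        roh.append((start, prev))  # run extends to the last call
    return roh

# 30 homozygous calls spaced 10 kb apart, then a heterozygote.
calls = [(i * 10_000, "1/1") for i in range(30)] + [(310_000, "0/1")]
print(runs_of_homozygosity(calls))  # one run spanning 0-290,000
```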
Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske
2007-02-14
The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide-tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate < 0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5'-labelled with a cytosine are heavily overrepresented among the final sequences, while those 5'-labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method by which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products.
The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
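The 5'-tag assignment step can be sketched as a simple prefix match: each read is traced back to its specimen by its leading tag, and the tag is stripped before downstream analysis. The 4-nt tags below are hypothetical placeholders, not the tags used in the study:

```python
def demultiplex(reads, tags):
    """Assign each read to its source specimen by its 5' tag
    (tags: tag sequence -> specimen name); reads matching no tag
    are collected under 'unassigned'."""
    assigned = {name: [] for name in tags.values()}
    assigned["unassigned"] = []
    for read in reads:
        for tag, name in tags.items():
            if read.startswith(tag):
                assigned[name].append(read[len(tag):])  # strip the tag
                break
        else:  # no tag matched
            assigned["unassigned"].append(read)
    return assigned

# Hypothetical tags; the study used 5'-nucleotide-tagged primers.
tags = {"CACG": "specimen_1", "TGTC": "specimen_2"}
reads = ["CACGAAAA", "TGTCGGGG", "GGGGTTTT"]
print(demultiplex(reads, tags))
```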
PANGEA: pipeline for analysis of next generation amplicons
Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz FW; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W
2010-01-01
High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including preprocessing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ2 step, are joined into one program called the ‘backbone’. PMID:20182525
Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing.
Vega-Arreguín, Julio C; Ibarra-Laclette, Enrique; Jiménez-Moraila, Beatriz; Martínez, Octavio; Vielle-Calzada, Jean Philippe; Herrera-Estrella, Luis; Herrera-Estrella, Alfredo
2009-07-06
In-depth sequencing analysis has not been able to determine the overall complexity of the transcriptional activity of a plant organ or tissue sample. In some cases, deep parallel sequencing of Expressed Sequence Tags (ESTs), although not yet optimized for the sequencing of cDNAs, has represented an efficient procedure for validating gene predictions and estimating overall gene coverage. This approach could be very valuable for complex plant genomes. In addition, little emphasis has been given to efforts aiming at an estimation of the overall transcriptional universe found in a multicellular organism at a specific developmental stage. To explore in depth the transcriptional diversity in an ancient maize landrace, we developed a protocol to optimize the sequencing of cDNAs and performed 4 consecutive GS20-454 pyrosequencing runs of a cDNA library obtained from 2-week-old Palomero Toluqueño maize plants. The protocol reported here yielded over 90% informative sequences. These GS20-454 runs generated over 1.5 million reads, representing the largest number of sequences reported from a single plant cDNA library. A collection of 367,391 quality-filtered reads (30.09 Mb) from a single run was sufficient to identify transcripts corresponding to 34% of public maize EST databases; the total sequence generated after 4 filtered runs increased this coverage to 50%. Comparisons of all 1.5 million reads to the Maize Assembled Genomic Islands (MAGIs) provided evidence for the transcriptional activity of 11% of MAGIs. We estimate that 5.67% (86,069 sequences) do not align with public ESTs or annotated genes, potentially representing new maize transcripts. Following the assembly of 74.4% of the reads into 65,493 contigs, real-time PCR of selected genes confirmed a predicted correlation between the abundance of GS20-454 sequences and the corresponding levels of gene expression.
A protocol was developed that significantly increases the number, length and quality of cDNA reads using massive 454 parallel sequencing. We show that recurrent 454 pyrosequencing of a single cDNA sample is necessary to attain a thorough representation of the transcriptional universe present in maize, which can also be used to estimate transcript abundance of specific genes. These data suggest that the molecular and functional diversity contained in the vast native landraces remains to be explored, and that large-scale transcriptional sequencing of a presumed ancestor of modern maize varieties represents a valuable approach to characterizing the functional diversity of maize for future agricultural and evolutionary studies.
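The reported correlation between read abundance and gene expression rests on counting reads per transcript and normalizing by library size. A minimal reads-per-million sketch (the contig names are placeholders, and real analyses would count from alignment files):

```python
from collections import Counter

def reads_per_million(assignments):
    """assignments: one contig/gene id per aligned read.
    Returns contig -> reads per million mapped reads, a simple
    depth-normalized abundance proxy."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {c: n * 1_000_000 / total for c, n in counts.items()}

# Placeholder contig ids; in practice these come from read alignments.
hits = ["contig1"] * 6 + ["contig2"] * 3 + ["contig3"]
print(reads_per_million(hits))
```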
Canary: an atomic pipeline for clinical amplicon assays.
Doig, Kenneth D; Ellul, Jason; Fellowes, Andrew; Thompson, Ella R; Ryland, Georgina; Blombery, Piers; Papenfuss, Anthony T; Fox, Stephen B
2017-12-15
High throughput sequencing requires bioinformatics pipelines to process large volumes of data into meaningful variants that can be translated into a clinical report. These pipelines often suffer from a number of shortcomings: they lack robustness and have many components written in multiple languages, each with a variety of resource requirements. Pipeline components must be linked together with a workflow system to achieve the processing of FASTQ files through to a VCF file of variants. Crafting these pipelines requires considerable bioinformatics and IT skills beyond the reach of many clinical laboratories. Here we present Canary, a single program that can be run on a laptop, which takes FASTQ files from amplicon assays through to an annotated VCF file ready for clinical analysis. Canary can be installed and run with a single command using Docker containerization or run as a single JAR file on a wide range of platforms. Although it is a single utility, Canary performs all the functions present in more complex and unwieldy pipelines. All variants identified by Canary are 3' shifted and represented in their most parsimonious form to provide a consistent nomenclature, irrespective of sequencing variation. Further, proximate in-phase variants are represented as a single HGVS 'delins' variant. This allows for correct nomenclature and consequences to be ascribed to complex multi-nucleotide polymorphisms (MNPs), which are otherwise difficult to represent and interpret. Variants can also be annotated with hundreds of attributes sourced from MyVariant.info to give up-to-date details on pathogenicity, population statistics and in silico predictors. Canary has been used at the Peter MacCallum Cancer Centre in Melbourne for the last 2 years for the processing of clinical sequencing data. By encapsulating clinical features in a single, easily installed executable, Canary makes sequencing more accessible to all pathology laboratories.
Canary is available for download as source or a Docker image at https://github.com/PapenfussLab/Canary under a GPL-3.0 License.
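The 3'-shifting that Canary applies is a standard variant normalization: an indel inside a repeat is slid toward the 3' end until it can move no further, so the same event is always reported at the same position. A minimal sketch for a deletion, not Canary's actual implementation:

```python
def shift_deletion_3prime(ref, start, length):
    """Right-align a deletion of `length` bases beginning at 0-based
    `start` in `ref`: while the base entering the deleted window from
    the right equals the base leaving it on the left, the deletion is
    equivalent one position further 3', so slide right."""
    while start + length < len(ref) and ref[start] == ref[start + length]:
        start += 1
    return start

# Deleting one 'A' from the homopolymer run in GGAAAATC is reported
# at the run's 3'-most A (index 5), wherever the caller placed it.
ref = "GGAAAATC"
print(shift_deletion_3prime(ref, 2, 1))
```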
Rapid evaluation and quality control of next generation sequencing data with FaQCs.
Lo, Chien-Chi; Chain, Patrick S G
2014-11-19
Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.
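Quality trimming of the kind described above can be illustrated by its simplest variant: dropping 3'-terminal bases whose Phred quality falls below a threshold. FaQCs itself offers several trimming modes; this sketch shows only the basic idea:

```python
def trim_3prime(seq, quals, min_q=20):
    """Trim low-quality bases from the 3' end of a read: drop trailing
    positions whose Phred quality score is below min_q.
    seq: base string; quals: per-base Phred scores, same length."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]

# Read quality typically declines toward the 3' end.
seq, quals = trim_3prime("ACGTACGT", [30, 32, 31, 28, 25, 18, 12, 9])
print(seq)  # ACGTA
```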
Kaneda, Shohei; Ono, Koichi; Fukuba, Tatsuhiro; Nojima, Takahiko; Yamamoto, Takatoki; Fujii, Teruo
2011-01-01
In this paper, a rapid and simple method to determine the optimal temperature conditions for denaturant electrophoresis using a temperature-controlled on-chip capillary electrophoresis (CE) device is presented. Since on-chip CE operations including sample loading, injection and separation are carried out just by switching the electric field, we can repeat consecutive run-to-run CE operations on a single on-chip CE device by programming the voltage sequences. By utilizing the high-speed separation and the repeatability of the on-chip CE, a series of electrophoretic operations with different running temperatures can be implemented. Using separations of reaction products of single-stranded DNA (ssDNA) with a peptide nucleic acid (PNA) oligomer, the effectiveness of the presented method to determine the optimal temperature conditions required to discriminate a single-base substitution (SBS) between two different ssDNAs is demonstrated. It is shown that a single run for one temperature condition can be executed within 4 min, and the optimal temperature to discriminate the SBS could be successfully found using the present method. PMID:21845077
ERIC Educational Resources Information Center
Shah, Kushani; Thomas, Shelby; Stein, Arnold
2013-01-01
In this report, we describe a 5-week laboratory exercise for undergraduate biology and biochemistry students in which students learn to sequence DNA and to genotype their DNA for selected single nucleotide polymorphisms (SNPs). Students use miniaturized DNA sequencing gels that require approximately 8 min to run. The students perform G, A, T, C…
Mount, D W; Conrad, B
1986-01-01
We have previously described programs for a variety of types of sequence analysis (1-4). These programs have now been integrated into a single package. They are written in the standard C programming language and run on virtually any computer system with a C compiler, such as the IBM/PC and other computers running under the MS/DOS and UNIX operating systems. The programs are widely distributed and may be obtained from the authors as described below. PMID:3753780
AlexSys: a knowledge-based expert system for multiple sequence alignment construction and analysis
Aniba, Mohamed Radhouene; Poch, Olivier; Marchler-Bauer, Aron; Thompson, Julie Dawn
2010-01-01
Multiple sequence alignment (MSA) is a cornerstone of modern molecular biology and represents a unique means of investigating the patterns of conservation and diversity in complex biological systems. Many different algorithms have been developed to construct MSAs, but previous studies have shown that no single aligner consistently outperforms the rest. This has led to the development of a number of ‘meta-methods’ that systematically run several aligners and merge the output into one single solution. Although these methods generally produce more accurate alignments, they are inefficient because all the aligners need to be run first and the choice of the best solution is made a posteriori. Here, we describe the development of a new expert system, AlexSys, for the multiple alignment of protein sequences. AlexSys incorporates an intelligent inference engine to automatically select an appropriate aligner a priori, depending only on the nature of the input sequences. The inference engine was trained on a large set of reference multiple alignments, using a novel machine learning approach. Applying AlexSys to a test set of 178 alignments, we show that the expert system represents a good compromise between alignment quality and running time, making it suitable for high throughput projects. AlexSys is freely available from http://alnitak.u-strasbg.fr/~aniba/alexsys. PMID:20530533
Bio-Docklets: virtualization containers for single-step execution of NGS pipelines.
Kim, Baekdoo; Ali, Thahmina; Lijeron, Carlos; Afgan, Enis; Krampis, Konstantinos
2017-08-01
Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint and, from a user perspective, running the pipelines is as simple as running a single bioinformatics tool. This is achieved using a "meta-script" that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, Bio-Docklets also enables developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets. © The Authors 2017. Published by Oxford University Press.
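The single-input, single-output container design can be sketched as a wrapper that bind-mounts one input and one output directory into a `docker run` invocation. The image name below is a placeholder, not an actual Bio-Docklets image, and this is not the authors' BioBlend/Galaxy meta-script, just the general pattern:

```python
import subprocess

def run_docklet(image, input_dir, output_dir, execute=False):
    """Build (and optionally run) a `docker run` command exposing a
    single input endpoint and a single output endpoint inside the
    container, mirroring the one-in, one-out design described above."""
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{input_dir}:/data/input:ro",   # read-only input mount
        "-v", f"{output_dir}:/data/output",    # writable output mount
        image,
    ]
    if execute:
        subprocess.run(cmd, check=True)  # requires Docker installed
    return cmd

# Build the command without executing it (no Docker needed).
print(run_docklet("example/rnaseq-docklet", "/tmp/reads", "/tmp/results"))
```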
Gupta, Parth Sarthi Sen; Banerjee, Shyamashree; Islam, Rifat Nawaz Ul; Mondal, Sudipta; Mondal, Buddhadev; Bandyopadhyay, Amal K
2014-01-01
In the genomic and proteomic era, efficient and automated analysis of protein sequence properties has become an important task in bioinformatics. General-public-licensed (GPL) software tools exist to perform parts of this job. However, computation of the mean properties of large numbers of orthologous sequences is not possible with the above-mentioned GPL tools. Further, there is no GPL software or server that can calculate window-dependent sequence properties for a large number of sequences in a single run. To overcome these limitations, we have developed a standalone program, PHYSICO, which performs all stages of computation in a single run based on the type of input provided, in either RAW-FASTA or BLOCK-FASTA format, and produces Excel output for: a) composition, class composition, mean molecular weight, isoelectric point, aliphatic index and GRAVY; b) column-based compositions, variability and difference matrices; and c) 25 kinds of window-dependent sequence properties. The program is fast, efficient, error-free and user-friendly. Calculation of the mean and standard deviation of homologous sequence sets, for comparison where relevant, is another attribute of the program, a property seldom seen in existing GPL software. PHYSICO is freely available to non-commercial/academic users on formal request to the corresponding author (akbanerjee@biotech.buruniv.ac.in).
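A window-dependent property of the kind PHYSICO computes can be illustrated with GRAVY (grand average of hydropathy) over a sliding window, using the standard Kyte-Doolittle hydropathy scale. This is an independent sketch, not PHYSICO's code:

```python
# Standard Kyte-Doolittle hydropathy values for the 20 amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def windowed_gravy(seq, window=9):
    """Mean Kyte-Doolittle hydropathy (GRAVY) over each sliding window
    of the given width; one value per window position."""
    return [sum(KD[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]

# A hydrophobic tripeptide: (4.5 + 3.8 + 4.2) / 3
print(round(windowed_gravy("ILV", window=3)[0], 2))  # 4.17
```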
Short-Read Sequencing for Genomic Analysis of the Brown Rot Fungus Fibroporia radiculosa
J. D. Tang; A. D. Perkins; T. S. Sonstegard; S. G. Schroeder; S. C. Burgess; S. V. Diehl
2012-01-01
The feasibility of short-read sequencing for genomic analysis was demonstrated for Fibroporia radiculosa, a copper-tolerant fungus that causes brown rot decay of wood. The effect of read quality on genomic assembly was assessed by filtering Illumina GAIIx reads from a single run of a paired-end library (75-nucleotide read length and 300-bp fragment...
High-throughput sequence alignment using Graphics Processing Units
Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh
2007-01-01
Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.
2013-01-01
Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates, with resolution down to the strain level. Often single-biomarker sensitivity is augmented by the use of multiple biomarkers or panels of biomarkers. In parallel with single-biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate the use of direct sequencing, as do differences introduced by sample preparation protocols, including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness, and sought to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate the use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204
Jayashree, B; Hanspal, Manindra S; Srinivasan, Rajgopal; Vigneshwaran, R; Varshney, Rajeev K; Spurthi, N; Eshwar, K; Ramesh, N; Chandra, S; Hoisington, David A
2007-01-01
The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence level.
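The core of such a SNP-to-CAPS step can be illustrated with a toy example (hypothetical sequences and function names; the real pipeline operates on clustered and assembled ESTs): polymorphic alignment columns are called as SNPs, and a SNP is CAPS-assayable when a restriction site is present in some alleles but not all, so the enzyme digest distinguishes genotypes.

```python
ECORI = "GAATTC"  # EcoRI recognition site

def find_snps(alignment):
    """Return (column, alleles) for each polymorphic column of
    equal-length aligned sequences."""
    snps = []
    for col in range(len(alignment[0])):
        alleles = {seq[col] for seq in alignment}
        if len(alleles) > 1:
            snps.append((col, sorted(alleles)))
    return snps

def caps_assayable(alignment, site=ECORI):
    """A SNP is CAPS-assayable if the restriction site occurs in some
    alleles but not all, so digestion patterns differ."""
    cut = [site in seq for seq in alignment]
    return any(cut) and not all(cut)

aln = ["TTGAATTCAA",
       "TTGAGTTCAA"]        # the SNP at column 4 destroys the EcoRI site
print(find_snps(aln))       # [(4, ['A', 'G'])]
print(caps_assayable(aln))  # True
```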
The current status and portability of our sequence handling software.
Staden, R
1986-01-01
I describe the current status of our sequence analysis software. The package contains a comprehensive suite of programs for managing large shotgun sequencing projects, a program containing 61 functions for analysing single sequences and a program for comparing pairs of sequences for similarity. The programs that have been described before have been improved by the addition of new functions and by being made very much easier to use. The major interactive programs have 125 pages of online help available from within them. Several new programs are described, including screen editing of aligned gel readings for shotgun sequencing projects, a method to highlight errors in aligned gel readings, and new methods for searching for putative signals in sequences. We use the programs on a VAX computer, but the whole package has been rewritten to make it easy to transport to other machines. I believe the programs will now run on any machine with a FORTRAN77 compiler and sufficient memory. We are currently putting the programs onto an IBM PC XT/AT and another micro running under UNIX. PMID:3511446
ERIC Educational Resources Information Center
Wood, David
2006-01-01
Formulaic sequences are fixed combinations of words that have a range of functions and uses in speech production and communication, and seem to be cognitively stored and retrieved by speakers as if they were single words. They can facilitate fluency in speech by making pauses shorter and less frequent, and allowing longer runs of speech between…
Rapid and accurate pyrosequencing of angiosperm plastid genomes
Moore, Michael J; Dhingra, Amit; Soltis, Pamela S; Shaw, Regina; Farmerie, William G; Folta, Kevin M; Soltis, Douglas E
2006-01-01
Background Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae). Results More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. 
More importantly, the high accuracy observed in the GS 20 plastid genome sequence was achieved at a significant reduction in time and cost relative to traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically. PMID:16934154
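Since roughly 60% of the observed errors fell in homopolymer runs of 5 or more nucleotides, flagging such runs is a natural first screening step for pyrosequencing data. A minimal sketch (hypothetical function name, not code from the paper):

```python
import re

def homopolymer_runs(seq, min_len=5):
    """Locate homopolymer runs of min_len or more nucleotides, the
    regions most prone to pyrosequencing errors in the study above."""
    pattern = r"(.)\1{%d,}" % (min_len - 1)   # a base repeated >= min_len times
    return [(m.start(), m.group()) for m in re.finditer(pattern, seq)]

print(homopolymer_runs("ACGTTTTTTGCAAAAAC"))  # [(3, 'TTTTTT'), (11, 'AAAAA')]
```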
CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline.
Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V; Mahurkar, Anup; Fricke, W Florian
2017-04-27
The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data upload and download, pipeline configuration and monitoring, and access to Sybil are managed through the CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.
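Once CDSs have been clustered across genomes, the unique/shared/core classification the pipeline reports reduces to set operations over per-genome gene sets. A toy sketch with made-up strain and gene names:

```python
# Hypothetical per-genome CDS cluster memberships (names invented for
# illustration; a real pipeline derives these from whole-genome alignment).
genomes = {
    "K12":    {"dnaA", "recA", "lacZ"},
    "O157":   {"dnaA", "recA", "stx2"},
    "CFT073": {"dnaA", "recA", "papA"},
}

core = set.intersection(*genomes.values())   # present in every genome
pan = set.union(*genomes.values())           # present in at least one genome
unique = {g: cds - set.union(*(v for k, v in genomes.items() if k != g))
          for g, cds in genomes.items()}     # exclusive to a single genome

print(sorted(core))            # ['dnaA', 'recA']
print(sorted(unique["K12"]))   # ['lacZ']
```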
Spotlight-8 Image Analysis Software
NASA Technical Reports Server (NTRS)
Klimek, Robert; Wright, Ted
2006-01-01
Spotlight is a cross-platform GUI-based software package designed to perform image analysis on sequences of images generated by combustion and fluid physics experiments run in a microgravity environment. Spotlight can perform analysis on a single image in an interactive mode or perform analysis on a sequence of images in an automated fashion. Image processing operations can be employed to enhance the image before various statistics and measurement operations are performed. An arbitrarily large number of objects can be analyzed simultaneously with independent areas of interest. Spotlight saves results in a text file that can be imported into other programs for graphing or further analysis. Spotlight can be run on Microsoft Windows, Linux, and Apple OS X platforms.
Gries, Jasmin; Schumacher, Dirk; Arand, Julia; Lutsik, Pavlo; Markelova, Maria Rivera; Fichtner, Iduna; Walter, Jörn; Sers, Christine; Tierling, Sascha
2013-01-01
The use of next generation sequencing has expanded our view on whole mammalian methylome patterns. In particular, it provides genome-wide insight into local DNA methylation diversity at the single-nucleotide level and enables the examination of single chromosome sequence sections at a sufficient statistical power. We describe a bisulfite-based sequence profiling pipeline, Bi-PROF, which is based on the 454 GS-FLX Titanium technology and allows up to one million sequence stretches to be obtained at single-base-pair resolution without laborious subcloning. To illustrate the performance of the experimental workflow connected to a bioinformatics program pipeline (BiQ Analyzer HT) we present a test analysis set of 68 different epigenetic marker regions (amplicons) in five individual patient-derived xenograft tissue samples of colorectal cancer and one healthy colon epithelium sample as a control. After the 454 GS-FLX Titanium run, sequence read processing and sample decoding, the obtained alignments are quality controlled and statistically evaluated. Comprehensive methylation pattern interpretation (profiling) assessed by analyzing 10²-10⁴ sequence reads per amplicon allows an unprecedentedly deep view of pattern formation and methylation marker heterogeneity in tissues affected by complex diseases such as cancer. PMID:23803588
POSTMan (POST-translational modification analysis), a software application for PTM discovery.
Arntzen, Magnus Ø; Osland, Christoffer Leif; Raa, Christopher Rasch-Olsen; Kopperud, Reidun; Døskeland, Stein-Ove; Lewis, Aurélia E; D'Santos, Clive S
2009-03-01
Post-translationally modified peptides present in low concentrations are often not selected for CID, resulting in no sequence information for these peptides. We have developed POSTMan (POST-translational Modification analysis), software that allows post-translationally modified peptides to be targeted for fragmentation. The software aligns LC-MS runs (MS1 data) between individual runs or within a single run and isolates pairs of peptides which differ by a user-defined mass difference (post-translationally modified peptides). The method was validated for acetylated peptides and allowed an assessment of even the basal protein phosphorylation of phenylalanine hydroxylase (PAH) in intact cells.
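The pairing step can be sketched as follows (a hypothetical simplification of the idea, not POSTMan's code): sort the peptide masses observed in a run and report every pair whose difference matches a user-defined modification mass within a tolerance.

```python
PHOSPHO = 79.96633  # monoisotopic mass shift of phosphorylation, in Da

def find_modified_pairs(masses, delta=PHOSPHO, tol=0.01):
    """Return (light, heavy) mass pairs differing by delta +/- tol,
    i.e. candidate unmodified/modified peptide pairs."""
    masses = sorted(masses)
    return [(a, b) for i, a in enumerate(masses)
            for b in masses[i + 1:] if abs((b - a) - delta) <= tol]

# Toy LC-MS feature masses: two peptides, each with a phospho-shifted partner.
run = [800.40, 880.37, 1024.55, 1104.51]
print(find_modified_pairs(run))  # [(800.4, 880.37), (1024.55, 1104.51)]
```

A real implementation would also require co-elution (similar retention times) before calling two features a pair; the mass filter alone is shown here.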
Optically Mapping Multiple Bacterial Genomes Simultaneously in a Single Run
2011-11-21
sequence orientation. We have demonstrated mapping of Shigella dysenteriae and Escherichia coli simultaneously, despite their very close phylogenetic relationship (Shigella and Escherichia coli are generally considered to be within a single species, but are segregated at the genus level for historical reasons [4]); two clones of Shigella would likely not map together successfully using the mixed DNA method. Similarly, based on reference maps being
Chang, Chia-Wei; Lai, Yi-Shin; Pawlik, Kevin M; Liu, Kaimao; Sun, Chiao-Wang; Li, Chao; Schoeb, Trenton R; Townes, Tim M
2009-05-01
We report the derivation of induced pluripotent stem (iPS) cells from adult skin fibroblasts using a single, polycistronic lentiviral vector encoding the reprogramming factors Oct4, Sox2, and Klf4. Porcine teschovirus-1 2A sequences that trigger ribosome skipping were inserted between human cDNAs for these factors, and the polycistron was subcloned downstream of the elongation factor 1 alpha promoter in a self-inactivating (SIN) lentiviral vector containing a loxP site in the truncated 3' long terminal repeat (LTR). Adult skin fibroblasts from a humanized mouse model of sickle cell disease were transduced with this single lentiviral vector, and iPS cell colonies were picked within 30 days. These cells expressed endogenous Oct4, Sox2, Nanog, alkaline phosphatase, stage-specific embryonic antigen-1, and other markers of pluripotency. The iPS cells produced teratomas containing tissue derived from all three germ layers after injection into immunocompromised mice and formed high-level chimeras after injection into murine blastocysts. iPS cell lines with as few as three lentiviral insertions were obtained. Expression of Cre recombinase in these iPS cells resulted in deletion of the lentiviral vector, and sequencing of insertion sites demonstrated that remnant 291-bp SIN LTRs containing a single loxP site did not interrupt coding sequences, promoters, or known regulatory elements. These results suggest that a single, polycistronic "hit and run" vector can safely and effectively reprogram adult dermal fibroblasts into iPS cells.
Shah, Kushani; Thomas, Shelby; Stein, Arnold
2013-01-01
In this report, we describe a 5-week laboratory exercise for undergraduate biology and biochemistry students in which students learn to sequence DNA and to genotype their DNA for selected single nucleotide polymorphisms (SNPs). Students use miniaturized DNA sequencing gels that require approximately 8 min to run. The students perform G, A, T, C Sanger sequencing reactions. They prepare and run the gels, perform Southern blots (which require only 10 min), and detect sequencing ladders using a colorimetric detection system. Students enlarge their sequencing ladders from digital images of their small nylon membranes, and read the sequence manually. They compare their reads with the actual DNA sequence using BLAST2. After mastering the DNA sequencing system, students prepare their own DNA from a cheek swab, polymerase chain reaction-amplify a region of their DNA that encompasses a SNP of interest, and perform sequencing to determine their genotype at the SNP position. A family pedigree can also be constructed. The SNP chosen by the instructor was rs17822931, which is in the ABCC11 gene and is the determinant of human earwax type. Genotypes at the rs17822931 site vary in different ethnic populations. © 2013 by The International Union of Biochemistry and Molecular Biology.
SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws
NASA Technical Reports Server (NTRS)
Cooke, Daniel; Rushton, Nelson
2013-01-01
With the introduction of new parallel architectures like the Cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a language closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever.
In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.
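The Normalize-Transpose idea can be approximated in ordinary Python to convey the flavor (this is an illustration, not SequenceL): a function written purely for scalars is lifted over collections automatically, and each elementwise application is an independent unit that the SequenceL compiler is free to run in parallel.

```python
def nt_apply(f, arg):
    """Normalize-Transpose sketch: lift a scalar function over (nested)
    lists transparently. Each elementwise call is independent, which is
    what makes automatic parallelization possible in SequenceL itself."""
    if isinstance(arg, list):
        return [nt_apply(f, x) for x in arg]   # elementwise, recursively
    return f(arg)

def square_plus_one(x):        # written with no notion of collections
    return x * x + 1

print(nt_apply(square_plus_one, 3))            # 10
print(nt_apply(square_plus_one, [1, 2, 3]))    # [2, 5, 10]
```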
Accelerating calculations of RNA secondary structure partition functions using GPUs
2013-01-01
Background RNA performs many diverse functions in the cell in addition to its role as a messenger of genetic information. These functions depend on its ability to fold to a unique three-dimensional structure determined by the sequence. The conformation of RNA is in part determined by its secondary structure, or the particular set of contacts between pairs of complementary bases. Prediction of the secondary structure of RNA from its sequence is therefore of great interest, but can be computationally expensive. In this work we accelerate computations of base-pair probabilities using parallel graphics processing units (GPUs). Results Calculation of the probabilities of base pairs in RNA secondary structures using nearest-neighbor standard free energy change parameters has been implemented using CUDA to run on hardware with multiprocessor GPUs. A modified set of recursions was introduced, which reduces memory usage by about 25%. GPUs are fastest in single precision, and for some hardware, restricted to single precision. This may introduce significant roundoff error. However, deviations in base-pair probabilities calculated using single precision were found to be negligible compared to those resulting from shifting the nearest-neighbor parameters by a random amount of magnitude similar to their experimental uncertainties. For large sequences running on our particular hardware, the GPU implementation reduces execution time by a factor of close to 60 compared with an optimized serial implementation, and by a factor of 116 compared with the original code. Conclusions Using GPUs can greatly accelerate computation of RNA secondary structure partition functions, allowing calculation of base-pair probabilities for large sequences in a reasonable amount of time, with a negligible compromise in accuracy due to working in single precision. The source code is integrated into the RNAstructure software package and available for download at http://rna.urmc.rochester.edu.
PMID:24180434
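The table-filling structure that the GPU implementation parallelizes can be conveyed with the simpler Nussinov recursion, which maximizes base-pair counts instead of summing Boltzmann weights (an illustrative stand-in, not the partition-function code in RNAstructure):

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_pairs(seq, min_loop=3):
    """Nussinov dynamic program: maximum number of nested base pairs,
    forbidding hairpin loops shorter than min_loop nucleotides. Cells on
    the same diagonal are independent, which is what a GPU parallelizes."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):        # fill diagonal by diagonal
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]                # case: i left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)   # case: i pairs j
            for k in range(i + 1, j):          # case: bifurcation at k
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]

print(max_pairs("GGGAAAUCCC"))  # 3
```

The partition-function version replaces max with a sum over weighted structures but fills the same O(n²) table with the same O(n³) dependencies.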
Psychophysical spectro-temporal receptive fields in an auditory task.
Shub, Daniel E; Richards, Virginia M
2009-05-01
Psychophysical relative weighting functions, which provide information about the importance of different regions of a stimulus in forming decisions, are traditionally estimated using trial-based procedures, where a single stimulus is presented and a single response is recorded. Everyday listening is much more "free-running" in that we often must detect randomly occurring signals in the presence of a continuous background. Psychophysical relative weighting functions have not been measured with free-running paradigms. Here, we combine a free-running paradigm with the reverse correlation technique used to estimate physiological spectro-temporal receptive fields (STRFs) to generate psychophysical relative weighting functions that are analogous to physiological STRFs. The psychophysical task required the detection of a fixed target signal (a sequence of spectro-temporally coherent tone pips with a known frequency) in the presence of a continuously presented informational masker (spectro-temporally random tone pips). A comparison of psychophysical relative weighting functions estimated with the current free-running paradigm and trial-based paradigms, suggests that in informational-masking tasks subjects' decision strategies are similar in both free-running and trial-based paradigms. For more cognitively challenging tasks there may be differences in the decision strategies with free-running and trial-based paradigms.
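The reverse-correlation estimate itself is simple: the relative weight of each spectro-temporal bin is the mean stimulus level preceding signal reports minus the mean level preceding non-reports. A sketch on simulated data (hypothetical observer model and bin layout, not the paper's stimuli):

```python
import random

random.seed(1)
N_BINS = 4                      # flattened spectro-temporal bins
TRUE_W = [1.0, 0.0, 0.0, -1.0]  # simulated listener: driven by bin 0, inhibited by bin 3

trials, reports = [], []
for _ in range(5000):
    stim = [random.gauss(0, 1) for _ in range(N_BINS)]
    # The simulated observer reports "signal" when the weighted sum is positive.
    reports.append(sum(w * s for w, s in zip(TRUE_W, stim)) > 0)
    trials.append(stim)

# Reverse correlation: mean stimulus before reports minus mean before non-reports.
weights = []
for b in range(N_BINS):
    yes = [t[b] for t, r in zip(trials, reports) if r]
    no = [t[b] for t, r in zip(trials, reports) if not r]
    weights.append(sum(yes) / len(yes) - sum(no) / len(no))

print([round(w, 2) for w in weights])
```

The recovered weights are proportional to the true decision weights, positive for the attended bin and negative for the inhibitory one, with the irrelevant bins near zero.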
AfterQC: automatic filtering, trimming, error removing and quality control for fastq data.
Chen, Shifu; Huang, Tanxiao; Zhou, Yanqing; Han, Yue; Xu, Mingyan; Gu, Jia
2017-03-14
Some applications, especially clinical applications requiring highly accurate sequencing data, must contend with unavoidable sequencing errors. Several tools have been proposed to profile sequencing quality, but few of them can quantify or correct the sequencing errors. This unmet requirement motivated us to develop AfterQC, a tool that profiles sequencing errors and corrects most of them, and that adds highly automated quality control and data filtering features. Unlike most tools, AfterQC analyses the overlapping of paired sequences for pair-end sequencing data. Based on overlapping analysis, AfterQC can detect and cut adapters, and furthermore it provides a novel function to correct erroneous bases in the overlapping regions. Another new feature is to detect and visualise sequencing bubbles, which are commonly found on flowcell lanes and may cause sequencing errors. Besides normal per-cycle quality and base content plotting, AfterQC also provides features like polyX (a long run of the same base X) filtering, automatic trimming and K-MER based strand bias profiling. For each single or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates sequencer bubble effects, trims reads at front and tail, detects the sequencing errors and corrects part of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support; it can run with a single FastQ file, a single pair of FastQ files (for pair-end sequencing), or a folder of FastQ files to be processed automatically. Based on overlapping analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent.
Much more than just another quality control (QC) tool, AfterQC performs quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC helps to eliminate sequencing errors in pair-end data, providing much cleaner outputs and consequently reducing false-positive variants, especially for low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all options automatically and requires no arguments in most cases.
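The overlap-based correction can be sketched for the simplest case, where read 1 and reverse-complemented read 2 cover the same fragment end-to-end (hypothetical example; AfterQC additionally finds the overlap offset and handles adapters and bubbles):

```python
COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMP)[::-1]

def correct_by_overlap(r1, r2_rc, q1, q2_rc):
    """Resolve each mismatch between read 1 and reverse-complemented
    read 2 in favor of the base with the higher Phred quality character."""
    return "".join(b1 if s1 >= s2 else b2
                   for b1, b2, s1, s2 in zip(r1, r2_rc, q1, q2_rc))

r1 = "ACGTTGCA"   # read 1: correct call at every position
r2 = "TGCAAGGT"   # read 2 carries one error, at a low-quality base
# The quality string below is already reversed to line up with revcomp(r2).
print(correct_by_overlap(r1, revcomp(r2), "IIIIIIII", "II#IIIII"))  # ACGTTGCA
```

Because the two reads observe the same bases independently, a disagreement at a low-quality position can be repaired using the mate's high-quality call, which is the basis of the error-rate estimation as well.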
Li, Ruichao; Xie, Miaomiao; Dong, Ning; Lin, Dachuan; Yang, Xuemei; Wong, Marcus Ho Yin; Chan, Edward Wai-Chi; Chen, Sheng
2018-03-01
Multidrug resistance (MDR)-encoding plasmids are considered major molecular vehicles responsible for transmission of antibiotic resistance genes among bacteria of the same or different species. Delineating the complete sequences of such plasmids could provide valuable insight into the evolution and transmission mechanisms underlying bacterial antibiotic resistance development. However, due to the presence of multiple repeats of mobile elements, complete sequencing of MDR plasmids remains technically complicated, expensive, and time-consuming. Here, we demonstrate a rapid and efficient approach to obtaining multiple MDR plasmid sequences through the use of the MinION nanopore sequencing platform, which is incorporated in a portable device. By assembling the long sequencing reads generated by a single MinION run according to a rapid barcoding sequencing protocol, we obtained the complete sequences of 20 plasmids harbored by multiple bacterial strains. Importantly, single long reads covering a plasmid end-to-end were recorded, indicating that de novo assembly may be unnecessary if the single reads exhibit high accuracy. This workflow represents a convenient and cost-effective approach for systematic assessment of MDR plasmids responsible for treatment failure of bacterial infections, offering the opportunity to perform detailed molecular epidemiological studies to probe the evolutionary and transmission mechanisms of MDR-encoding elements.
SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information
2014-01-01
Background The recent introduction of the Pacific Biosciences RS single molecule sequencing technology has opened new doors to scaffolding genome assemblies in a cost-effective manner. The long read sequence information promises to enhance the quality of incomplete and inaccurate draft assemblies constructed from Next Generation Sequencing (NGS) data. Results Here we propose a novel hybrid assembly methodology that aims to scaffold pre-assembled contigs in an iterative manner using PacBio RS long read information as a backbone. On a test set comprising six bacterial draft genomes, assembled using either a single Illumina MiSeq or Roche 454 library, we show that even a 50× coverage of uncorrected PacBio RS long reads is sufficient to drastically reduce the number of contigs. Comparisons to the AHA scaffolder indicate our strategy is better capable of producing (nearly) complete bacterial genomes. Conclusions The current work describes our SSPACE-LongRead software which is designed to upgrade incomplete draft genomes using single molecule sequences. We conclude that the recent advances of the PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run our program, allow genomes to be scaffolded in a fast and reliable manner. PMID:24950923
DOE Office of Scientific and Technical Information (OSTI.GOV)
With the flood of finished and draft whole-genome microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs or raw, unassembled reads. The method is fast to compute, finding SNPs and building a SNP phylogeny in minutes to hours, depending on the size and diversity of the input sequences. The SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle many gigabases of sequence in a single run. The algorithm is based on k-mer analysis.
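The k-mer idea can be conveyed with a toy version (hypothetical code, not the algorithm described above): positions whose flanking context is shared between two genomes but whose center base differs are SNP candidates, with no assembly or alignment required.

```python
def center_kmers(seq, k):
    """Map (left flank, right flank) -> center base for every
    odd-length k-mer in the sequence."""
    half = k // 2
    out = {}
    for i in range(half, len(seq) - half):
        out[(seq[i - half:i], seq[i + 1:i + half + 1])] = seq[i]
    return out

def kmer_snps(g1, g2, k=7):
    """SNP candidates: shared flanking context, differing center base."""
    a, b = center_kmers(g1, k), center_kmers(g2, k)
    return sorted((ctx, a[ctx], b[ctx]) for ctx in a.keys() & b.keys()
                  if a[ctx] != b[ctx])

g1 = "TTACGGATCCGT"
g2 = "TTACGGTTCCGT"   # one substitution (A -> T) in a shared context
print(kmer_snps(g1, g2))  # [(('CGG', 'TCC'), 'A', 'T')]
```

A production tool must also handle contexts that occur more than once per genome and k-mers from both strands; the sketch assumes unique contexts on one strand.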
Rapid and Accurate Sequencing of Enterovirus Genomes Using MinION Nanopore Sequencer.
Wang, Ji; Ke, Yue Hua; Zhang, Yong; Huang, Ke Qiang; Wang, Lei; Shen, Xin Xin; Dong, Xiao Ping; Xu, Wen Bo; Ma, Xue Jun
2017-10-01
Knowledge of an enterovirus genome sequence is very important in epidemiological investigation to identify transmission patterns and ascertain the extent of an outbreak. The MinION sequencer is increasingly used to sequence various viral pathogens in many clinical situations because of its long reads, portability, real-time accessibility of sequenced data, and very low initial costs. However, information is lacking on MinION sequencing of enterovirus genomes. In this proof-of-concept study using Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) strains as examples, we established an amplicon-based whole genome sequencing method using MinION. We explored the accuracy, minimum sequencing time, discrimination and high-throughput sequencing ability of MinION, and compared its performance with Sanger sequencing. Within the first minute (min) of sequencing, the accuracy of MinION was 98.5% for the single EV71 strain and 94.12%-97.33% for 10 genetically-related CA16 strains. In as little as 14 min, 99% identity was reached for the single EV71 strain, and in 17 min (on average), 99% identity was achieved for 10 CA16 strains in a single run. MinION is suitable for whole genome sequencing of enteroviruses with sufficient accuracy and fine discrimination and has the potential as a fast, reliable and convenient method for routine use. Copyright © 2017 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.
Traverse, Charles C.
2017-01-01
ABSTRACT Advances in sequencing technologies have enabled direct quantification of genome-wide errors that occur during RNA transcription. These errors occur at rates that are orders of magnitude higher than rates during DNA replication, but due to technical difficulties such measurements have been limited to single-base substitutions and have not yet quantified the scope of transcription insertions and deletions. Previous reporter gene assay findings suggested that transcription indels are produced exclusively by elongation complex slippage at homopolymeric runs, so we enumerated indels across the protein-coding transcriptomes of Escherichia coli and Buchnera aphidicola, which differ widely in their genomic base compositions and incidence of repeat regions. As anticipated from prior assays, transcription insertions prevailed in homopolymeric runs of A and T; however, transcription deletions arose in much more complex sequences and were rarely associated with homopolymeric runs. By reconstructing the relocated positions of the elongation complex as inferred from the sequences inserted or deleted during transcription, we show that continuation of transcription after slippage hinges on the degree of nucleotide complementarity within the RNA:DNA hybrid at the new DNA template location. PMID:28851848
Blank-Landeshammer, Bernhard; Kollipara, Laxmikanth; Biß, Karsten; Pfenninger, Markus; Malchow, Sebastian; Shuvaev, Konstantin; Zahedi, René P; Sickmann, Albert
2017-09-01
Complex mass spectrometry-based proteomics data sets are mostly analyzed by protein database searches. While this approach performs considerably well for sequenced organisms, direct inference of peptide sequences from tandem mass spectra, i.e., de novo peptide sequencing, is oftentimes the only way to obtain information when protein databases are absent. However, available algorithms suffer from drawbacks such as lack of validation and often high rates of false positive hits (FP). Here we present a simple method of combining results from commonly available de novo peptide sequencing algorithms, which in conjunction with minor tweaks in data acquisition yields a lower empirical FDR compared to analysis using single algorithms. Results were validated using state-of-the-art database search algorithms as well as specifically synthesized reference peptides. Thus, we could increase the number of PSMs meeting a stringent FDR of 5% more than 3-fold compared to the single best de novo sequencing algorithm alone, accounting for an average of 11,120 PSMs (combined) instead of 3,476 PSMs (alone) in triplicate 2 h LC-MS runs of tryptic HeLa digestion.
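The combination strategy can be sketched as simple voting over the calls of several algorithms. This is an illustration of the idea, not the authors' exact procedure; the spectrum identifiers, peptide strings and the two-of-three threshold are invented for the example.

```python
from collections import Counter

def normalize(peptide):
    # Leucine and isoleucine are isobaric, so de novo tools cannot tell
    # them apart; collapse both to L before comparing calls.
    return peptide.replace("I", "L")

def consensus_psms(results, min_agree=2):
    """results: one dict per algorithm, mapping spectrum id -> peptide call.
    Keep a PSM when at least `min_agree` algorithms agree on the peptide."""
    votes = Counter()
    for tool in results:
        for spectrum, peptide in tool.items():
            votes[(spectrum, normalize(peptide))] += 1
    return {spec: pep for (spec, pep), n in votes.items() if n >= min_agree}
```

Requiring agreement between independent algorithms discards calls that any single tool gets wrong on its own, which is what drives the empirical FDR down.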
Diagnostic Applications of Next Generation Sequencing in Immunogenetics and Molecular Oncology
Grumbt, Barbara; Eck, Sebastian H.; Hinrichsen, Tanja; Hirv, Kaimo
2013-01-01
Summary With the introduction of next generation sequencing (NGS) technologies, remarkable new diagnostic applications have been established in daily routine. Implementation of NGS is challenging in clinical diagnostics, but definite advantages and new diagnostic possibilities make the switch to the technology inevitable. In addition to the higher sequencing capacity, clonal sequencing of single molecules, multiplexing of samples, higher diagnostic sensitivity, workflow miniaturization, and cost benefits are some of the valuable features of the technology. Following recent advances, NGS has emerged as a proven alternative to classical Sanger sequencing in the typing of human leukocyte antigens (HLA). By virtue of the clonal amplification of single DNA molecules, ambiguous typing results can be avoided. Simultaneously, a higher sample throughput can be achieved by tagging of DNA molecules with multiplex identifiers and pooling of PCR products before sequencing. In our experience, up to 380 samples can be typed for HLA-A, -B, and -DRB1 at high resolution during every sequencing run. In molecular oncology, NGS shows a markedly increased sensitivity in comparison to conventional Sanger sequencing and is developing into the standard diagnostic tool for detection of somatic mutations in cancer cells, with great impact on personalized treatment of patients. PMID:23922545
Rayner, Simon; Brignac, Stafford; Bumeister, Ron; Belosludtsev, Yuri; Ward, Travis; Grant, O’dell; O’Brien, Kevin; Evans, Glen A.; Garner, Harold R.
1998-01-01
We have designed and constructed a machine that synthesizes two standard 96-well plates of oligonucleotides in a single run using standard phosphoramidite chemistry. The machine is capable of making a combination of standard, degenerate, or modified oligos in a single plate. The run time is typically 17 hr for two plates of 20-mers and a reaction scale of 40 nmol. The reaction vessel is a standard polypropylene 96-well plate with a hole drilled in the bottom of each well. The two plates are placed in separate vacuum chucks and mounted on an xy table. Each well in turn is positioned under the appropriate reagent injection line and the reagent is injected by switching a dedicated valve. All aspects of machine operation are controlled by a Macintosh computer, which also guides the user through the startup and shutdown procedures, provides a continuous update on the status of the run, and facilitates a number of service procedures that need to be carried out periodically. Over 25,000 oligos have been synthesized for use in dye terminator sequencing reactions, polymerase chain reactions (PCRs), hybridization, and RT–PCR. Oligos up to 100 bases in length have been made with a coupling efficiency in excess of 99%. These machines, working in conjunction with our oligo prediction code, are particularly well suited to application in automated high throughput genomic sequencing. PMID:9685322
GRAbB: Selective Assembly of Genomic Regions, a New Niche for Genomic Research
Brankovics, Balázs; Zhang, Hao; van Diepeningen, Anne D.; van der Lee, Theo A. J.; Waalwijk, Cees; de Hoog, G. Sybren
2016-01-01
GRAbB (Genomic Region Assembly by Baiting) is a new program dedicated to assembling specific genomic regions from NGS data. This approach is especially useful for multi-copy regions, such as the mitochondrial genome and the rDNA repeat region, parts of the genome that are often neglected or poorly assembled even though they contain information of phylogenetic or epidemiologic interest; single-copy regions can also be assembled. The program is capable of targeting multiple regions within a single run. Furthermore, GRAbB can be used to extract specific loci from NGS data based on homology, such as sequences used for barcoding. To make the assembly specific, a known part of the region, such as the sequence of a PCR amplicon or a homologous sequence from a related species, must be specified. By assembling only the region of interest, the assembly process is computationally much less demanding and may lead to assemblies of better quality. In this study the different applications and functionalities of the program are demonstrated: exhaustive assembly (rDNA region and mitochondrial genome), extraction of homologous regions or genes (IGS, RPB1, RPB2 and TEF1a), and extraction of multiple regions within a single run. The program is also compared with MITObim, which is meant for the exhaustive assembly of a single target based on a similar query sequence. GRAbB is shown to be more efficient than MITObim in terms of speed, memory and disk usage. The other functionalities (handling multiple targets simultaneously and extracting homologous regions) are not matched by other programs. The program is available with explanatory documentation at https://github.com/b-brankovics/grabb. GRAbB has been tested on Ubuntu (12.04 and 14.04), Fedora (23), CentOS (7.1.1503) and Mac OS X (10.7). Furthermore, GRAbB is available as a Docker repository: brankovics/grabb (https://hub.docker.com/r/brankovics/grabb/).
PMID:27308864
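The baiting idea can be sketched as iterative k-mer recruitment. This is a toy illustration, not GRAbB's implementation: real runs use much larger k, run an assembler between rounds, and apply explicit completion criteria.

```python
def kmers(seq, k):
    """All k-length substrings of seq, as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def bait_reads(reads, bait, k=5, rounds=3):
    """Iteratively recruit reads that share at least one k-mer with the
    growing target set, starting from a known bait sequence."""
    target = kmers(bait, k)
    recruited, remaining = [], list(reads)
    for _ in range(rounds):
        hits = [r for r in remaining if target & kmers(r, k)]
        if not hits:
            break
        recruited.extend(hits)
        remaining = [r for r in remaining if r not in hits]
        for r in hits:
            target |= kmers(r, k)  # grow the bait with recruited reads
    return recruited
```

Each round extends the reach of the bait, so reads that overlap only the newly recruited reads (not the original bait) are still pulled in, which is how the assembly walks outward from the seed region.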
Lu, Emily; Elizondo-Riojas, Miguel-Angel; Chang, Jeffrey T; Volk, David E
2014-06-10
Next-generation sequencing results from bead-based aptamer libraries have demonstrated that traditional DNA/RNA alignment software is insufficient. This is particularly true for X-aptamers containing specialty bases (W, X, Y, Z, ...) that are identified by special encoding. Thus, we sought an automated program that uses the inherent design scheme of bead-based X-aptamers to create a hypothetical reference library and Markov modeling techniques to provide improved alignments. Aptaligner provides this feature as well as length error and noise level cutoff features, is parallelized to run on multiple central processing units (cores), and sorts sequences from a single chip into projects and subprojects.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gardner, Shea; Slezak, Tom
With the flood of finished and draft whole-genome microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs. The method is fast to compute, finding SNPs and building a SNP phylogeny in seconds to hours. We use it to identify thousands of putative SNPs from all publicly available Filoviridae, Poxviridae, foot-and-mouth disease virus, Bacillus, and Escherichia coli genomes and plasmids. The SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle as input hundreds of gigabases of sequence in a single run. The algorithm is based on k-mer analysis using a suffix array, so we call it saSNP.
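The k-mer idea behind this kind of reference-free SNP discovery can be sketched in a few lines. This is a toy illustration, not the saSNP implementation: it ignores reverse complements, repeats and sequencing error, and the flank size k is arbitrary.

```python
from collections import defaultdict

def center_kmers(seq, k):
    """Yield (left flank, center base, right flank) for each (2k+1)-mer."""
    w = 2 * k + 1
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        yield window[:k], window[k], window[k + 1:]

def putative_snps(genome_a, genome_b, k=4):
    """Report center bases that differ between identical flanking contexts."""
    contexts = defaultdict(lambda: [set(), set()])
    for left, mid, right in center_kmers(genome_a, k):
        contexts[(left, right)][0].add(mid)
    for left, mid, right in center_kmers(genome_b, k):
        contexts[(left, right)][1].add(mid)
    snps = []
    for (left, right), (a_mid, b_mid) in sorted(contexts.items()):
        # Call a putative SNP when both genomes cover the same context
        # unambiguously but with different center bases.
        if len(a_mid) == 1 and len(b_mid) == 1 and a_mid != b_mid:
            snps.append((left, min(a_mid), min(b_mid), right))
    return snps
```

Because the comparison is purely k-mer based, no alignment or assembly is needed, which is what lets this style of method scale to hundreds of genomes in a single run.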
DistMap: a toolkit for distributed short read mapping on a Hadoop cluster.
Pandey, Ram Vinay; Schlötterer, Christian
2013-01-01
With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/
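The split-map-merge workflow shape can be sketched with threads standing in for Hadoop worker nodes. This is an illustration of the flow, not DistMap itself; `map_one` stands in for an invocation of a real short-read mapper.

```python
from concurrent.futures import ThreadPoolExecutor

def split_reads(reads, n_workers):
    """Split a read list into roughly equal chunks, one per worker."""
    size = -(-len(reads) // n_workers)  # ceiling division
    return [reads[i:i + size] for i in range(0, len(reads), size)]

def distributed_map(reads, map_one, n_workers=4):
    """Map each read with `map_one` across workers, then merge the results
    in submission order, mirroring DistMap's split -> map -> merge flow."""
    chunks = split_reads(reads, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        mapped = pool.map(lambda chunk: [map_one(r) for r in chunk], chunks)
    merged = []
    for part in mapped:
        merged.extend(part)
    return merged
```

The merge step is trivial here because `Executor.map` returns results in submission order; in a real Hadoop deployment the merge of per-node SAM/BAM output is the part the framework handles.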
Lee, Sejoon; Lee, Soohyun; Ouellette, Scott; Park, Woong-Yang; Lee, Eunjung A; Park, Peter J
2017-06-20
In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. https://github.com/parklab/NGSCheckMate. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Quick, Joshua; Quinlan, Aaron R; Loman, Nicholas J
2014-01-01
The MinION™ is a new, portable single-molecule sequencer developed by Oxford Nanopore Technologies. It measures four inches in length and is powered from the USB 3.0 port of a laptop computer. The MinION™ measures the change in current resulting from DNA strands interacting with a charged protein nanopore. These measurements can then be used to deduce the underlying nucleotide sequence. We present a read dataset from whole-genome shotgun sequencing of the model organism Escherichia coli K-12 substr. MG1655 generated on a MinION™ device during the early-access MinION™ Access Program (MAP). Sequencing runs of the MinION™ are presented, one generated using R7 chemistry (released in July 2014) and one using R7.3 (released in September 2014). Base-called sequence data are provided to demonstrate the nature of data produced by the MinION™ platform and to encourage the development of customised methods for alignment, consensus and variant calling, de novo assembly and scaffolding. FAST5 files containing event data within the HDF5 container format are provided to assist with the development of improved base-calling methods.
Kidd, Kenneth K; Pakstis, Andrew J; Speed, William C; Lagacé, Robert; Chang, Joseph; Wootton, Sharon; Haigh, Eva; Kidd, Judith R
2014-09-01
SNPs that are molecularly very close (<10 kb) will generally have extremely low recombination rates, much less than 10^-4. Multiple haplotypes will often exist because of the history of the origins of the variants at the different sites, rare recombinants, and the vagaries of random genetic drift and/or selection. Such multiallelic haplotype loci are potentially important in forensic work for individual identification, for defining ancestry, and for identifying familial relationships. The new DNA sequencing capabilities currently available make possible continuous runs of a few hundred base pairs so that we can now determine the allelic combination of multiple SNPs on each chromosome of an individual, i.e., the phase, for multiple SNPs within a small segment of DNA. Therefore, we have begun to identify regions, encompassing two to four SNPs with an extent of <200 bp, that define multiallelic haplotype loci. We have identified candidate regions and have collected pilot data on many candidate microhaplotype loci. Here we present 31 microhaplotype loci that have at least three alleles, have high heterozygosity, are globally informative, and are statistically independent at the population level. This study of microhaplotype loci (microhaps) provides proof of principle that such markers exist and validates their usefulness for ancestry inference, lineage-clan-family inference, and individual identification. The true value of microhaplotypes will come with sequencing methods that can establish alleles unambiguously, including disentangling of mixtures, because a single sequencing run on a single strand of DNA will encompass all of the SNPs. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
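The informativeness that makes a microhaplotype useful is easy to quantify from haplotype frequencies, and phasing follows directly from a read that spans all the SNPs. A short sketch; the frequencies and read offsets below are invented for illustration.

```python
def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) over haplotype frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def effective_alleles(freqs):
    """Effective number of alleles A_e = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)

def phase_haplotype(read, snp_offsets):
    """Read the phased allele combination directly off a single sequencing
    read that spans every SNP of the microhaplotype (read-relative offsets)."""
    return "".join(read[i] for i in snp_offsets)
```

Because a single strand carries all the SNPs, `phase_haplotype` needs no statistical phasing at all, which is the key advantage the abstract points to for mixtures.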
Java bioinformatics analysis web services for multiple sequence alignment--JABAWS:MSA.
Troshin, Peter V; Procter, James B; Barton, Geoffrey J
2011-07-15
JABAWS is a web services framework that simplifies the deployment of web services for bioinformatics. JABAWS:MSA provides services for five multiple sequence alignment (MSA) methods (Probcons, T-coffee, Muscle, Mafft and ClustalW), and is the system employed by the Jalview multiple sequence analysis workbench since version 2.6. A fully functional, easy-to-set-up server is provided as a Virtual Appliance (VA), which can be run on most operating systems that support a virtualization environment such as VMware or Oracle VirtualBox. JABAWS is also distributed as a Web Application aRchive (WAR) and can be configured to run on a single computer and/or a cluster managed by Grid Engine, LSF or other queuing systems that support DRMAA. JABAWS:MSA provides clients full access to each application's parameters and allows administrators to specify named parameter preset combinations and execution limits for each application through simple configuration files. The JABAWS command-line client allows integration of JABAWS services into conventional scripts. JABAWS is made freely available under the Apache 2 license and can be obtained from: http://www.compbio.dundee.ac.uk/jabaws.
A reference human genome dataset of the BGISEQ-500 sequencer.
Huang, Jie; Liang, Xinming; Xuan, Yuankai; Geng, Chunyu; Li, Yuxiang; Lu, Haorong; Qu, Shoufang; Mei, Xianglin; Chen, Hongbo; Yu, Ting; Sun, Nan; Rao, Junhua; Wang, Jiahao; Zhang, Wenwei; Chen, Ying; Liao, Sha; Jiang, Hui; Liu, Xin; Yang, Zhaopeng; Mu, Feng; Gao, Shangxian
2017-05-01
BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinatorial probe anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated the accuracy of the variations, and compared it to that of variations identified from similar amounts of publicly available HiSeq2500 data. We found similar single nucleotide polymorphism (SNP) detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) compared to the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%), and better SNP detection accuracy than the PE50 data (FPR = 0.0006%, sensitivity = 94.15%). But for insertions and deletions (indels), we found lower accuracy for BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50, respectively; sensitivity = 88.52% and 70.93%) than for the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as a reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform. © The Authors 2017. Published by Oxford University Press.
NO PLIF imaging in the CUBRC 48-inch shock tunnel
NASA Astrophysics Data System (ADS)
Jiang, N.; Bruzzese, J.; Patton, R.; Sutton, J.; Yentsch, R.; Gaitonde, D. V.; Lempert, W. R.; Miller, J. D.; Meyer, T. R.; Parker, R.; Wadham, T.; Holden, M.; Danehy, P. M.
2012-12-01
Nitric oxide planar laser-induced fluorescence (NO PLIF) imaging is demonstrated at a 10-kHz repetition rate in the Calspan University at Buffalo Research Center's (CUBRC) 48-inch Mach 9 hypervelocity shock tunnel using a pulse burst laser-based high frame rate imaging system. Sequences of up to ten images are obtained internal to a supersonic combustor model, located within the shock tunnel, during a single ~10-millisecond duration run of the ground test facility. Comparison with a CFD simulation shows good overall qualitative agreement in the jet penetration and spreading observed with an average of forty individual PLIF images obtained during several facility runs.
Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian
2011-08-30
Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing.
Acceleration of the Smith-Waterman algorithm using single and multiple graphics processors
NASA Astrophysics Data System (ADS)
Khajeh-Saeed, Ali; Poole, Stephen; Blair Perot, J.
2010-06-01
Finding regions of similarity between two very long data streams is a computationally intensive problem referred to as sequence alignment. Alignment algorithms must allow for imperfect sequence matching with different starting locations and some gaps and errors between the two data sequences. Perhaps the most well known application of sequence matching is the testing of DNA or protein sequences against genome databases. The Smith-Waterman algorithm is a method for precisely characterizing how well two sequences can be aligned and for determining the optimal alignment of those two sequences. Like many applications in computational science, the Smith-Waterman algorithm is constrained by the memory access speed and can be accelerated significantly by using graphics processors (GPUs) as the compute engine. In this work we show that effective use of the GPU requires a novel reformulation of the Smith-Waterman algorithm. The performance of this new version of the algorithm is demonstrated using the SSCA#1 (Bioinformatics) benchmark running on one GPU and on up to four GPUs executing in parallel. The results indicate that for large problems a single GPU is up to 45 times faster than a CPU for this application, and the parallel implementation shows linear speed up on up to 4 GPUs.
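For reference, the serial dynamic-programming recurrence that such GPU work reformulates looks like this (scoring parameters here are illustrative). The GPU version instead sweeps anti-diagonals of the matrix, whose cells are mutually independent and can therefore be computed in parallel.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local-alignment score between sequences a and b
    using the classic Smith-Waterman recurrence (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Each cell depends only on its left, upper and upper-left neighbors, so every cell on a given anti-diagonal (i + j constant) depends only on the two previous anti-diagonals; that independence is the parallelism a GPU exploits.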
Head direction cells in the postsubiculum do not show replay of prior waking sequences during sleep
Brandon, Mark P.; Bogaard, Andrew; Andrews, Chris M.; Hasselmo, Michael E.
2011-01-01
During slow-wave sleep and REM sleep, hippocampal place cells in the rat show replay of sequences previously observed during waking. We tested the hypothesis from computational modelling that the temporal structure of REM sleep replay could arise from an interplay of place cells with head direction cells in the postsubiculum. Physiological single-unit recording was performed simultaneously from five or more head direction or place by head direction cells in the postsubiculum during running on a circular track allowing sampling of a full range of head directions, and during sleep periods before and after running on the circular track. Data analysis compared the spiking activity during individual REM periods with waking as in previous analysis procedures for REM sleep. We also used a new procedure comparing groups of similar runs during waking with REM sleep periods. There was no consistent evidence for a statistically significant correlation of the temporal structure of spiking during REM sleep with spiking during waking running periods. Thus, the spiking activity of head direction cells during REM sleep does not show replay of head direction cell activity occurring during a previous waking period of running on the task. In addition, we compared the spiking of postsubiculum neurons during hippocampal sharp wave ripple events. We show that head direction cells are not activated during sharp wave ripples, while neurons responsive to place in the postsubiculum show reliable spiking at ripple events. PMID:21509854
Advances in DNA sequencing technologies for high resolution HLA typing.
Cereb, Nezih; Kim, Hwa Ran; Ryu, Jaejun; Yang, Soo Young
2015-12-01
This communication describes our experience in large-scale G group-level high resolution HLA typing using three different DNA sequencing platforms - ABI 3730 xl, Illumina MiSeq and PacBio RS II. Recent advances in DNA sequencing technologies, so-called next generation sequencing (NGS), have brought breakthroughs in deciphering the genetic information in all living species at a large scale and at an affordable level. The NGS DNA indexing system allows sequencing multiple genes for large number of individuals in a single run. Our laboratory has adopted and used these technologies for HLA molecular testing services. We found that each sequencing technology has its own strengths and weaknesses, and their sequencing performances complement each other. HLA genes are highly complex and genotyping them is quite challenging. Using these three sequencing platforms, we were able to meet all requirements for G group-level high resolution and high volume HLA typing. Copyright © 2015 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
Auer, Lucas; Mariadassou, Mahendra; O'Donohue, Michael; Klopp, Christophe; Hernandez-Raquet, Guillermina
2017-11-01
Next-generation sequencing technologies give access to large data sets, which are extremely useful in the study of microbial diversity based on the 16S rRNA gene. However, the production of such large data sets is not only marred by technical biases and sequencing noise but also increases computation time and disc space use. To improve the accuracy of OTU predictions and overcome computation, storage and noise issues, recent studies and tools have suggested removing all single reads and low-abundance OTUs, considering them as noise. Although the effect of applying an OTU abundance threshold on α- and β-diversity has been well documented, the consequences of removing single reads have been poorly studied. Here, we test the effect of singleton read filtering (SRF) on microbial community composition using in silico simulated data sets as well as sequencing data from synthetic and real communities displaying different levels of diversity and abundance profiles. Scalability to large data sets is also assessed using a complete MiSeq run. We show that SRF drastically reduces the chimera content and computational time, enabling the analysis of a complete MiSeq run in just a few minutes. Moreover, SRF accurately determines the actual community diversity: the differences in α- and β-community diversity obtained with SRF and standard procedures are much smaller than the intrinsic variability of technical and biological replicates. © 2017 John Wiley & Sons Ltd.
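Singleton read filtering itself reduces to a count-then-filter pass over exact sequences. A minimal sketch of the idea (real pipelines typically dereplicate after quality trimming, and this sketch uses exact string identity only):

```python
from collections import Counter

def filter_singletons(reads):
    """Drop reads whose exact sequence occurs only once in the data set.
    Most sequencing errors and chimeras yield unique sequences, so removing
    singletons cuts noise, at the cost of some genuinely rare sequences."""
    counts = Counter(reads)
    return [r for r in reads if counts[r] > 1]
```

The filter is linear in the number of reads, which is why it can shrink a full MiSeq run to a tractable size before the expensive clustering and chimera-checking steps.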
Program Synthesizes UML Sequence Diagrams
NASA Technical Reports Server (NTRS)
Barry, Matthew R.; Osborne, Richard N.
2006-01-01
A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse-engineering function that aids in the design documentation of the target Java program. Whereas the construction of sequence diagrams was previously a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.
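Rational Sequence instruments the JVM itself; the same reverse-engineering idea can be sketched in miniature in Python, where a tracing decorator records the order of calls, which is the raw material of a sequence diagram. All function names here are illustrative:

```python
import functools

calls = []  # recorded invocations, in the order a sequence diagram would show

def traced(fn):
    """Record each invocation so the call sequence can be rendered later."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        calls.append(fn.__name__)
        return fn(*args, **kwargs)
    return wrapper

@traced
def fetch():
    return parse()

@traced
def parse():
    return "ok"

fetch()
# calls now holds ["fetch", "parse"]: fetch sends a "message" to parse
```

A real tool additionally records the caller of each call (e.g. via the interpreter's tracing hooks) so that each entry becomes a caller-to-callee arrow rather than a bare name.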
Laurie, Matthew T; Bertout, Jessica A; Taylor, Sean D; Burton, Joshua N; Shendure, Jay A; Bielas, Jason H
2013-08-01
Due to the high cost of failed runs and suboptimal data yields, quantification and determination of fragment size range are crucial steps in the library preparation process for massively parallel sequencing (or next-generation sequencing). Current library quality control methods commonly involve quantification using real-time quantitative PCR and size determination using gel or capillary electrophoresis. These methods are laborious and subject to a number of significant limitations that can make library calibration unreliable. Herein, we propose and test an alternative method for quality control of sequencing libraries using droplet digital PCR (ddPCR). By exploiting a correlation we have discovered between droplet fluorescence and amplicon size, we achieve the joint quantification and size determination of target DNA with a single ddPCR assay. We demonstrate the accuracy and precision of applying this method to the preparation of sequencing libraries.
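The proposed quality-control method rests on a calibration between droplet fluorescence and amplicon size. A minimal sketch of such a calibration with entirely hypothetical data points and a plain least-squares fit (the authors' actual correlation model and units are not reproduced here):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b. Here: hypothetical calibration
    of droplet fluorescence (y) against amplicon size in bp (x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical calibration points: (amplicon size in bp, droplet fluorescence)
sizes = [100, 200, 400, 800]
fluor = [1.2, 1.9, 3.3, 6.1]
a, b = fit_line(sizes, fluor)

def estimate_size(f):
    """Invert the calibration to size an unknown library fragment."""
    return (f - b) / a

approx_size = estimate_size(3.3)  # ~400 bp for this toy data
```

The attraction of the single-assay approach is that the droplet count gives the quantification while the same fluorescence readout, passed through the calibration, gives the size estimate.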
Pattern Discovery and Change Detection of Online Music Query Streams
NASA Astrophysics Data System (ADS)
Li, Hua-Fu
In this paper, an efficient stream mining algorithm, called FTP-stream (Frequent Temporal Pattern mining of streams), is proposed to find the frequent temporal patterns over melody sequence streams. In the framework of the proposed algorithm, an effective bit-sequence representation is used to reduce the time and memory needed to slide the windows. The FTP-stream algorithm can calculate pattern supports in only a single pass based on this bit-sequence representation, taking advantage of the representation's left-shift and bitwise AND operations. Experiments show that the proposed algorithm scans the music query stream only once, and runs significantly faster and consumes less memory than existing algorithms such as SWFI-stream and Moment.
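The bit-sequence idea can be sketched briefly: each item keeps one bit per transaction in the sliding window, so sliding the window is a left shift and the support of an itemset is the popcount of the AND of its members' bit sequences. A toy illustration (the window size and stream are made up, not FTP-stream's actual parameters):

```python
# Each item keeps one bit per window position (1 = item occurs in that
# transaction). Sliding in a new transaction is a left shift; itemset
# support is the popcount of the AND of the members' bit sequences.
WINDOW = 4

def slide(bits, occurs):
    """Shift in the newest transaction, dropping the oldest bit."""
    return ((bits << 1) | int(occurs)) & ((1 << WINDOW) - 1)

def support(*bit_seqs):
    combined = bit_seqs[0]
    for seq in bit_seqs[1:]:
        combined &= seq
    return bin(combined).count("1")

a = b = 0
stream = [{"a", "b"}, {"a"}, {"a", "b"}, {"b"}]
for txn in stream:
    a = slide(a, "a" in txn)
    b = slide(b, "b" in txn)

# support({a}) = 3, support({b}) = 3, support({a, b}) = 2 in this window
```

Because support counting reduces to machine-word bit operations, the whole window can be maintained in a single pass over the stream, which is the source of the reported speed and memory gains.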
Lin, Hsin-Hung; Liao, Yu-Chieh
2015-01-01
Despite the ever-increasing output of next-generation sequencing data and continually improving assemblers, dozens to hundreds of gaps still exist in de novo microbial assemblies due to uneven coverage and large genomic repeats. Third-generation single-molecule, real-time (SMRT) sequencing technology avoids amplification artifacts and generates kilobase-long reads with the potential to complete microbial genome assembly. However, due to the low accuracy (~85%) of third-generation sequences, considerable long-read coverage (>50X) is required for self-correction and for subsequent de novo assembly. Recently developed hybrid approaches, using next-generation sequencing data and as few as 5X long reads, have been proposed to improve the completeness of microbial assembly. In this study we have evaluated the contemporary hybrid approaches and demonstrated that assembling corrected long reads (by runCA) produced the best assembly compared to long-read scaffolding (e.g., AHA, Cerulean and SSPACE-LongRead) and gap-filling (SPAdes). For generating corrected long reads, we further examined long-read correction tools such as ECTools, LSC, LoRDEC, the PBcR pipeline and proovread. We have demonstrated that three microbial genomes, Escherichia coli K12 MG1655, Meiothermus ruber DSM1279 and Pedobacter heparinus DSM2366, were successfully hybrid assembled by runCA into near-perfect assemblies using ECTools-corrected long reads. In addition, we developed a tool, Patch, which takes corrected long reads and pre-assembled contigs as inputs to enhance microbial genome assemblies. With an additional 20X of long reads, short reads of S. cerevisiae W303 were hybrid assembled into 115 contigs using the verified strategy, ECTools + runCA. Patch was subsequently applied to upgrade the assembly to a 35-contig draft genome.
Our evaluation of the hybrid approaches shows that assembling ECTools-corrected long reads via runCA generates near-complete microbial genomes, suggesting that genome assembly could benefit from re-analyzing available hybrid datasets that were not assembled in an optimal fashion.
Qualitative and quantitative assessment of Illumina's forensic STR and SNP kits on MiSeq FGx™.
Sharma, Vishakha; Chow, Hoi Yan; Siegel, Donald; Wurmbach, Elisa
2017-01-01
Massively parallel sequencing (MPS) is a powerful tool transforming DNA analysis in multiple fields ranging from medicine, to environmental science, to evolutionary biology. In forensic applications, MPS offers the ability to significantly increase the discriminatory power of human identification as well as aid in mixture deconvolution. However, before the benefits of any new technology can be realized, its quality, consistency, sensitivity, and specificity must be rigorously evaluated in order to gain a detailed understanding of the technique, including sources of error, error rates, and other restrictions and limitations. This extensive study assessed the performance of Illumina's MiSeq FGx MPS system and ForenSeq™ kit in nine experimental runs comprising 314 reaction samples. In-depth data analysis evaluated the consequences of different assay conditions on test results. Variables included: sample numbers per run, targets per run, DNA input per sample, and replications. Results are presented as heat maps revealing patterns for each locus. Data analysis focused on read numbers (allele coverage), drop-outs, drop-ins, and sequence analysis. The study revealed that loci with high read numbers performed better, resulting in fewer drop-outs and well-balanced heterozygous alleles. Several loci were prone to drop-outs, which led to falsely typed homozygotes and therefore to genotype errors. Sequence analysis of allele drop-ins typically revealed a single nucleotide change (deletion, insertion, or substitution). Analyses of sequences, no-template controls, and spurious alleles suggest no contamination during library preparation, pooling, and sequencing, but indicate that sequencing or PCR errors may have occurred due to DNA polymerase infidelities.
Finally, we found that utilizing Illumina's FGx System at recommended conditions does not guarantee complete profiles for all samples tested, including the positive control, and that results required manual editing due to low read numbers and/or allele drop-ins. These findings are important for progressing towards implementation of MPS in forensic DNA testing.
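The reported relationship between read numbers, drop-outs, and heterozygote balance suggests a simple decision rule at each locus. The sketch below is illustrative only, with made-up thresholds; it is not the logic of Illumina's analysis software:

```python
def genotype_call(read_counts, min_reads=30, het_ratio=0.3):
    """Toy genotype caller from per-allele read counts at one STR locus.

    Hypothetical rule: a locus with low total coverage is flagged as a
    possible drop-out rather than typed, and a heterozygote is called only
    when the minor allele is sufficiently balanced against the major one.
    """
    total = sum(read_counts.values())
    if total < min_reads:
        return "low-coverage"
    ranked = sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True)
    major = ranked[0]
    if len(ranked) > 1 and ranked[1][1] / major[1] >= het_ratio:
        return "het:" + "/".join(sorted([major[0], ranked[1][0]]))
    return "hom:" + major[0]

# Well-balanced alleles -> heterozygote; a grossly imbalanced minor allele
# would instead be treated as drop-in or the locus typed homozygous.
call = genotype_call({"11": 200, "12": 180})
```

This is exactly the failure mode the study describes: when coverage drops, the minor allele falls below the balance threshold and a true heterozygote is mistyped as a homozygote.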
NASA Technical Reports Server (NTRS)
1981-01-01
Technical readiness for the production of photovoltaic modules using single-crystal silicon dendritic web sheet material is demonstrated by: (1) selection, design and implementation of a solar cell and photovoltaic module process sequence in a Module Experimental Process System Development Unit; (2) demonstration runs; (3) passing of acceptance and qualification tests; and (4) achievement of a cost-effective module.
BONSAI Garden: Parallel knowledge discovery system for amino acid sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shoudai, T.; Miyano, S.; Shinohara, A.
1995-12-31
We have developed a machine discovery system, BONSAI, which receives positive and negative examples as inputs and produces as a hypothesis a pair consisting of a decision tree over regular patterns and an alphabet indexing. This system has succeeded in discovering reasonable knowledge on transmembrane domain sequences and signal peptide sequences in computer experiments. However, when several kinds of sequences are mixed in the data, it does not seem reasonable for a single BONSAI system to find a hypothesis of reasonably small size with high accuracy. For this purpose, we have designed a system, BONSAI Garden, in which several BONSAIs and a program called Gardener run over a network in parallel, to partition the data into some number of classes together with hypotheses explaining these classes accurately.
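BONSAI's hypotheses combine an alphabet indexing with regular patterns evaluated over the indexed sequence. A toy sketch of those two ingredients, with an illustrative hydrophobic/polar indexing rather than one discovered by the system:

```python
# Toy sketch of BONSAI's two ingredients: an alphabet indexing that maps
# the 20 amino acids onto a small alphabet, and a regular pattern (here a
# simple substring-with-wildcards test) evaluated on the indexed sequence.
# The indexing below (hydrophobic vs. other) is illustrative only; BONSAI
# searches for a good indexing automatically.
HYDROPHOBIC = set("AVLIMFWC")

def index_sequence(seq):
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in seq)

def matches(pattern, indexed):
    """Pattern '*HHHH*' means: four consecutive hydrophobic residues."""
    return pattern.strip("*") in indexed

seq = "MKTLLVAVAVAKQD"
indexed = index_sequence(seq)
is_membrane_like = matches("*HHHH*", indexed)  # hydrophobic stretch present
```

A decision tree over such pattern tests, applied to indexed sequences, is exactly the hypothesis shape BONSAI outputs; the Garden version partitions mixed data so that each tree only has to explain one class well.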
DNA Sequencing Using capillary Electrophoresis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dr. Barry Karger
2011-05-09
The overall goal of this program was to develop capillary electrophoresis as the tool to sequence, for the first time, the Human Genome. Our program was part of the Human Genome Project. In this work, we were highly successful, and the replaceable polymer we developed, linear polyacrylamide, was used by the DOE sequencing lab in California to sequence a significant portion of the human genome using the MegaBACE multiple-capillary array electrophoresis instrument. In this final report, we summarize our efforts and successes. We began our work by separating double-stranded oligonucleotides by capillary electrophoresis using cross-linked polyacrylamide gels in fused silica capillaries. This work showed the potential of the methodology. However, preparation of such cross-linked gel capillaries was difficult with poor reproducibility, and, even more important, the columns were not very stable. We improved stability by using non-cross-linked linear polyacrylamide. Here, the entangled linear chains could move when osmotic pressure (e.g. sample injection) was imposed on the polymer matrix. This relaxation of the polymer dissipated the stress in the column. Our next advance was to use significantly lower concentrations of the linear polyacrylamide, such that the polymer could be automatically blown out after each run and replaced with fresh linear polymer solution. In this way, a new column was available for each analytical run. Finally, after testing many linear polymers, we selected linear polyacrylamide as the best matrix because it was the most hydrophilic polymer available. Under our DOE program, we initially demonstrated the success of linear polyacrylamide in separating double-stranded DNA. We note that the method is used even today to assay the purity of double-stranded DNA fragments. Our focus, of course, was on the separation of single-stranded DNA for sequencing purposes. In one paper, we demonstrated the success of our approach in sequencing up to 500 bases.
Other papers reporting sequencing at this level were also published in the mid-1990s. A major interest of the sequencing community has always been read length: the longer the sequence read per run, the more efficient the process, as well as the ability to read repeat sequences. We therefore devoted a great deal of time to studying the factors influencing read length in capillary electrophoresis, including polymer type and molecular weight, capillary column temperature, applied electric field, etc. In our initial optimization, we were able to demonstrate, for the first time, the sequencing of over 1000 bases with 90% accuracy. The run required 80 minutes for separation. Sequencing of 1000 bases per column was next demonstrated on a multiple-capillary instrument. Our studies revealed that linear polyacrylamide produced the longest read lengths because the hydrophilic single-stranded DNA had minimal interaction with the very hydrophilic linear polyacrylamide. Any interaction of the DNA with the polymer would lead to broader peaks and shorter read lengths. Another important parameter was the molecular weight of the linear chains. High molecular weight (>1 MDa) was important to allow the long single-stranded DNA to reptate through the entangled polymer matrix. In an important paper, we showed an inverse emulsion method to reproducibly prepare linear polyacrylamide with an average molecular weight of 9 MDa. This approach was used in the polymer for sequencing the human genome. Another critical factor in the successful use of capillary electrophoresis for sequencing was the sample preparation method. In the Sanger sequencing reaction, high concentrations of salts and dideoxynucleotides remained. Since the sample was introduced to the capillary column by electrokinetic injection, these salt ions would be preferentially injected into the column over the sequencing fragments, thus reducing the signal for longer fragments and hence the read length.
In two papers, we examined the role of individual components from the sequencing reaction and then developed a protocol to reduce the deleterious salts. We demonstrated a robust method for achieving long-read-length DNA sequencing. Continuing our advances, we next demonstrated the sequencing of over 1000 bases in less than one hour with a base-calling accuracy between 98 and 99%. In this work, we implemented energy-transfer dyes, which allowed cleaner differentiation of the four dye-labeled terminator nucleotides. In addition, we developed improved base-calling software to help read sequence when peak separation was minimal, as occurs at long read lengths. Another critical parameter we studied was column temperature. We demonstrated that read lengths improved as the column temperature was increased from room temperature to 60°C or 70°C. The higher temperature relaxed the DNA chains under the influence of the high electric field.
Scaling exponents for ordered maxima
Ben-Naim, E.; Krapivsky, P. L.; Lemons, N. W.
2015-12-22
We study extreme value statistics of multiple sequences of random variables. For each sequence with N variables, independently drawn from the same distribution, the running maximum is defined as the largest variable to date. We compare the running maxima of m independent sequences and investigate the probability S_N that the maxima are perfectly ordered, that is, the running maximum of the first sequence is always larger than that of the second sequence, which is always larger than the running maximum of the third sequence, and so on. The probability S_N is universal: it does not depend on the distribution from which the random variables are drawn. For two sequences, S_N ~ N^(-1/2), and in general the decay is algebraic, S_N ~ N^(-σ_m), for large N. We analytically obtain the exponent σ_3 ≅ 1.302931 as the root of a transcendental equation. Moreover, the exponents σ_m grow with m, and we show that σ_m ~ m for large m.
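The universality of S_N makes it easy to probe numerically, since any continuous distribution gives the same answer. A Monte Carlo sketch for checking the N^(-1/2) decay at m = 2 (trial counts and the uniform distribution are arbitrary choices):

```python
import random

def ordered_maxima_prob(n, m=2, trials=20000, seed=7):
    """Monte Carlo estimate of S_N: the probability that the running
    maxima of m i.i.d. sequences of length n stay perfectly ordered
    (first strictly above second, second above third, ...)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        maxima = [0.0] * m
        ok = True
        for _ in range(n):
            for j in range(m):
                maxima[j] = max(maxima[j], rng.random())
            if any(maxima[j] <= maxima[j + 1] for j in range(m - 1)):
                ok = False
                break
        if ok:
            hits += 1
    return hits / trials

# For m = 2 the decay is S_N ~ N^(-1/2), so the estimate at N = 100
# should be roughly a third of the estimate at N = 10.
s10, s100 = ordered_maxima_prob(10), ordered_maxima_prob(100)
```

Uniform variates suffice precisely because S_N is distribution-free: only the relative ordering of draws matters, not their values.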
2011-01-01
Background: Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. Results: We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole-genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion: The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing. PMID:21878105
Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.
2013-01-01
SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
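The SSR search itself can be sketched with a regular expression, although SSR_pipeline's own search module uses a far more efficient dedicated algorithm; all parameters below are illustrative:

```python
import re

def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Locate simple sequence repeats: a motif of min_motif..max_motif bases
    tandemly repeated at least min_repeats times. Overlapping hits for the
    same repeat tract are reported separately in this toy version."""
    hits = []
    for size in range(min_motif, max_motif + 1):
        # Lookahead so overlapping candidate start positions are examined;
        # \2 backreferences the captured motif.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (size, min_repeats - 1))
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.group(2), len(m.group(1)) // size))
    return hits

# A single (AC)n microsatellite embedded in flanking sequence
hits = find_ssrs("TTACACACACACGGGT")
```

A production tool additionally merges the overlapping reports into one maximal repeat tract per locus and handles compound repeats, which this sketch omits.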
Schult, Janette; von Stülpnagel, Rul; Steffens, Melanie C.
2014-01-01
What are the memory-related consequences of learning actions (such as “apply the patch”) by enactment during study, as compared to action observation? Theories converge in postulating that enactment encoding increases item-specific processing, but not the processing of relational information. Typically, in the laboratory enactment encoding is studied for lists of unrelated single actions in which one action execution has no overarching purpose or relation with other actions. In contrast, real-life actions are usually carried out with the intention to achieve such a purpose. When actions are embedded in action sequences, relational information provides efficient retrieval cues. We contrasted memory for single actions with memory for action sequences in three experiments. We found more reliance on relational processing for action-sequences than single actions. To what degree can this relational information be used after enactment versus after the observation of an actor? We found indicators of superior relational processing after observation than enactment in ordered pair recall (Experiment 1A) and in emerging subjective organization of repeated recall protocols (recall runs 2–3, Experiment 2). An indicator of superior item-specific processing after enactment compared to observation was recognition (Experiment 1B, Experiment 2). Similar net recall suggests that observation can be as good a learning strategy as enactment. We discuss possible reasons why these findings only partly converge with previous research and theorizing. PMID:24927279
Brassac, Jonathan; Blattner, Frank R
2015-09-01
Polyploidization is an important speciation mechanism in the barley genus Hordeum. To analyze evolutionary changes after allopolyploidization, knowledge of parental relationships is essential. One chloroplast and 12 nuclear single-copy loci were amplified by polymerase chain reaction (PCR) in all Hordeum plus six out-group species. Amplicons from each of 96 individuals were pooled, sheared, labeled with individual-specific barcodes and sequenced in a single run on a 454 platform. Reference sequences were obtained by cloning and Sanger sequencing of all loci for nine supplementary individuals. The 454 reads were assembled into contigs representing the 13 loci and, for polyploids, also homoeologues. Phylogenetic analyses were conducted for all loci separately and for a concatenated data matrix of all loci. For diploid taxa, a Bayesian concordance analysis and a coalescent-based dated species tree was inferred from all gene trees. Chloroplast matK was used to determine the maternal parent in allopolyploid taxa. The relative performance of different multilocus analyses in the presence of incomplete lineage sorting and hybridization was also assessed. The resulting multilocus phylogeny reveals for the first time species phylogeny and progenitor-derivative relationships of all di- and polyploid Hordeum taxa within a single analysis. Our study proves that it is possible to obtain a multilocus species-level phylogeny for di- and polyploid taxa by combining PCR with next-generation sequencing, without cloning and without creating a heavy load of sequence data. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
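The pooling-and-barcoding step implies a demultiplexing stage once reads come back from the sequencer: each read is assigned to its individual by its barcode prefix. A minimal sketch with hypothetical 6-base barcodes (real 454 demultiplexers also tolerate sequencing errors within the tag):

```python
def demultiplex(reads, barcodes, bc_len=6):
    """Assign pooled reads to individuals by barcode prefix and trim the
    barcode off. Exact-match only; 'barcodes' maps tag -> individual ID."""
    bins = {individual: [] for individual in barcodes.values()}
    unassigned = []
    for read in reads:
        tag = read[:bc_len]
        if tag in barcodes:
            bins[barcodes[tag]].append(read[bc_len:])
        else:
            unassigned.append(read)
    return bins, unassigned

# Hypothetical barcodes for two of the 96 pooled individuals
barcodes = {"ACGTAC": "ind01", "TGCATG": "ind02"}
reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "NNNNNNTTTT"]
bins, unassigned = demultiplex(reads, barcodes)
```

After binning, the per-individual reads are what get assembled into the per-locus (and, for polyploids, per-homoeologue) contigs described above.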
Customisation of the exome data analysis pipeline using a combinatorial approach.
Pattnaik, Swetansu; Vaidyanathan, Srividya; Pooja, Durgad G; Deepak, Sa; Panda, Binay
2012-01-01
The advent of next-generation sequencing (NGS) technologies has revolutionised the way biologists produce, analyse and interpret data. Although NGS platforms provide a cost-effective way to discover genome-wide variants from a single experiment, variants discovered by NGS need follow-up validation due to the high error rates associated with various sequencing chemistries. Recently, whole-exome sequencing has been proposed as an affordable option compared to whole-genome runs, but it still requires follow-up validation of all the novel exomic variants. Customarily, a consensus approach is used to overcome the systematic errors inherent to the sequencing technology, alignment and post-alignment variant detection algorithms. However, this approach warrants the use of multiple sequencing chemistries, multiple alignment tools and multiple variant callers, which may not be viable in terms of time and money for individual investigators with limited informatics know-how. Biologists often lack the requisite training to deal with the huge amount of data produced by NGS runs and face difficulty in choosing from the list of freely available analytical tools for NGS data analysis. Hence, there is a need to customise the NGS data analysis pipeline to preferentially retain true variants by minimising the incidence of false positives, and to make the choice of the right analytical tools easier. To this end, we have sampled different freely available tools used at the alignment and post-alignment stages, suggesting the use of the most suitable combination determined by a simple framework of pre-existing metrics to create significant datasets.
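The consensus approach described here reduces to a set operation: keep only variants reported by enough independent callers. A minimal sketch (the variant representation and the threshold of two callers are illustrative):

```python
from collections import Counter

def consensus_variants(callsets, min_callers=2):
    """Keep a variant when at least min_callers independent callers report
    it. Variants are (chrom, pos, ref, alt) tuples; each callset is the
    output of one aligner/caller combination."""
    counts = Counter(v for calls in callsets for v in set(calls))
    return {v for v, n in counts.items() if n >= min_callers}

# Hypothetical call sets from three caller pipelines
caller_a = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
caller_b = [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")]
caller_c = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
confident = consensus_variants([caller_a, caller_b, caller_c])
```

Raising `min_callers` trades sensitivity for specificity, which is the same cost/benefit balance the authors' metric framework is meant to navigate.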
FPGA Sequencer for Radar Altimeter Applications
NASA Technical Reports Server (NTRS)
Berkun, Andrew C.; Pollard, Brian D.; Chen, Curtis W.
2011-01-01
A sequencer for a radar altimeter provides accurate attitude information for a reliable soft landing of the Mars Science Laboratory (MSL). This is a field-programmable-gate-array (FPGA)-only implementation. A table loaded externally into the FPGA controls timing, processing, and decision structures. The radar is memoryless and does not use previous acquisitions to assist in the current acquisition. All cycles complete in exactly 50 milliseconds, regardless of range or whether a target was found. A RAM (random access memory) within the FPGA holds instructions for up to 15 sets. For each set, timing is run, echoes are processed, and a comparison is made. If a target is seen, more detailed processing is run on that set. If no target is seen, the next set is tried. When all sets have been run, the FPGA terminates and waits for the next 50-millisecond event. This setup simplifies testing and improves reliability. A single Virtex chip does the work of an entire assembly. Output products require only minor processing to become range and velocity. This technology is the heart of the Terminal Descent Sensor, which is an integral part of the Entry, Descent and Landing system for MSL. In addition, it is a strong candidate for manned landings on Mars or the Moon.
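The table-driven acquisition loop described above can be modeled in a few lines. This is a toy software analogue of the FPGA's decision structure, with invented parameter sets, echo model, and threshold:

```python
def run_cycle(sets, echo_power, threshold=10.0):
    """One 50 ms acquisition cycle: try each parameter set in table order;
    on the first set whose processed echo clears the threshold, hand that
    set to detailed processing; otherwise report no target this cycle."""
    for params in sets:                # RAM table of up to 15 sets
        power = echo_power(params)     # "process echoes" for this set
        if power > threshold:          # comparison / decision structure
            return {"set": params, "power": power}  # detailed processing
    return None                        # all sets tried, wait for next cycle

# Hypothetical table of three sets; the echo model is a stand-in.
sets = [{"prf": 1}, {"prf": 2}, {"prf": 3}]
result = run_cycle(sets, echo_power=lambda p: 4.0 * p["prf"])
# the third set (power 12.0) is the first to clear the threshold
```

The memoryless property corresponds to `run_cycle` carrying no state between invocations: every 50 ms event starts from the top of the table.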
Wilson, Kitchener D; Shen, Peidong; Fung, Eula; Karakikes, Ioannis; Zhang, Angela; InanlooRahatloo, Kolsoum; Odegaard, Justin; Sallam, Karim; Davis, Ronald W; Lui, George K; Ashley, Euan A; Scharfe, Curt; Wu, Joseph C
2015-09-11
Thousands of mutations across >50 genes have been implicated in inherited cardiomyopathies. However, options for sequencing this rapidly evolving gene set are limited because many sequencing services and off-the-shelf kits suffer from slow turnaround, inefficient capture of genomic DNA, and high cost. Furthermore, customization of these assays to cover emerging targets that suit individual needs is often expensive and time consuming. We sought to develop a custom high-throughput, clinical-grade next-generation sequencing assay for detecting cardiac disease gene mutations with improved accuracy, flexibility, turnaround, and cost. We used double-stranded probes (complementary long padlock probes), an inexpensive and customizable capture technology, to efficiently capture and amplify the entire coding region and flanking intronic and regulatory sequences of 88 genes and 40 microRNAs associated with inherited cardiomyopathies, congenital heart disease, and cardiac development. Multiplexing 11 samples per sequencing run resulted in a mean per-base coverage of 420×; 97% of bases had >20× coverage, and >99% were concordant with known heterozygous single nucleotide polymorphisms. The assay correctly detected germline variants in 24 individuals and revealed several polymorphic regions in miR-499. Total run time was 3 days at an approximate cost of $100 per sample. Accurate, high-throughput detection of mutations across numerous cardiac genes is achievable with complementary long padlock probe technology. Moreover, this format allows facile insertion of additional probes as more cardiomyopathy and congenital heart disease genes are discovered, giving researchers a powerful new tool for DNA mutation detection and discovery. © 2015 American Heart Association, Inc.
DNA Multiple Sequence Alignment Guided by Protein Domains: The MSA-PAD 2.0 Method.
Balech, Bachir; Monaco, Alfonso; Perniola, Michele; Santamaria, Monica; Donvito, Giacinto; Vicario, Saverio; Maggi, Giorgio; Pesole, Graziano
2018-01-01
Multiple sequence alignment (MSA) is a fundamental component of many DNA sequence analyses, including metagenomics studies and phylogeny inference. When guided by protein profiles, DNA multiple alignments achieve higher precision and robustness. Here we present details of the use of the upgraded version of MSA-PAD (2.0), a DNA multiple sequence alignment framework able to align DNA sequences coding for single or multiple protein domains guided by PFAM or user-defined annotations. MSA-PAD has two alignment strategies, called "Gene" and "Genome," accounting for coding-domain order and genomic rearrangements, respectively. Novel options were added in the present version, where the MSA can be guided by protein profiles provided by the user. This allows MSA-PAD 2.0 to run faster and to use custom protein profiles, sometimes absent from the PFAM database, according to the user's interests. MSA-PAD 2.0 is currently freely available as a Web application at https://recasgateway.cloud.ba.infn.it/.
Seneca, Sara; Vancampenhout, Kim; Van Coster, Rudy; Smet, Joél; Lissens, Willy; Vanlander, Arnaud; De Paepe, Boel; Jonckheere, An; Stouffs, Katrien; De Meirleir, Linda
2015-01-01
Next-generation sequencing (NGS), an innovative sequencing technology that enables the successful analysis of numerous gene sequences in a massively parallel sequencing approach, has revolutionized the field of molecular biology. Although NGS was introduced only recently, the technology has already demonstrated its potential and effectiveness in many research projects, and is now on the verge of being introduced into the diagnostic setting of routine laboratories to delineate the molecular basis of genetic disease in undiagnosed patient samples. We tested a benchtop device on retrospective genomic DNA (gDNA) samples of controls and patients with a clinical suspicion of a mitochondrial DNA disorder. This Ion Torrent Personal Genome Machine (PGM) platform is a high-throughput sequencer with a fast turnaround time and reasonable running costs. We challenged the chemistry and technology with the analysis and processing of a mutational spectrum composed of samples with single-nucleotide substitutions, indels (insertions and deletions) and large single or multiple deletions, occasionally in heteroplasmy. The output data were compared with previously obtained conventional dideoxy sequencing results and the mitochondrial revised Cambridge Reference Sequence (rCRS). We were able to identify the majority of nucleotide alterations, but three false-negative results were also encountered in the data set. At the same time, the poor performance of the PGM instrument in regions associated with homopolymeric stretches generated many false-positive miscalls, demanding additional manual curation of the data.
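Since homopolymeric stretches are the PGM's known weak spot, a natural first curation step is simply to flag calls that fall in such regions. A run-length sketch (the length threshold is illustrative):

```python
import itertools

def homopolymer_runs(seq, min_len=5):
    """Flag homopolymer stretches of at least min_len identical bases,
    returning (start, base, length) tuples. Variant calls landing in or
    near these regions would be queued for manual review."""
    runs, pos = [], 0
    for base, group in itertools.groupby(seq):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs

# mtDNA-like toy read with one 6-base C stretch
flagged = homopolymer_runs("ACGTCCCCCCATG")
# → [(4, "C", 6)]
```

Cross-referencing the flagged intervals against the rCRS coordinates of each call is one way to prioritize which of the many false-positive miscalls to inspect first.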
De novo characterization of Lentinula edodes C(91-3) transcriptome by deep Solexa sequencing.
Zhong, Mintao; Liu, Ben; Wang, Xiaoli; Liu, Lei; Lun, Yongzhi; Li, Xingyun; Ning, Anhong; Cao, Jing; Huang, Min
2013-02-01
Lentinula edodes has been utilized as food as well as in popular medicine; moreover, extracts isolated from its mycelium and fruiting body have shown several therapeutic properties. Yet little is understood about the genes involved in these properties, and the absence of an L. edodes genome has been a barrier to the development of functional genomics research. However, high-throughput sequencing technologies are now being widely applied to non-model species. To facilitate research on L. edodes, we leveraged Solexa sequencing technology for de novo assembly of the L. edodes C(91-3) transcriptome. In a single run, we produced more than 57 million sequencing reads. These reads were assembled into 28,923 unigene sequences (mean size = 689 bp), including 18,120 unigenes with a coding sequence (CDS). Based on similarity searches against known proteins, assembled unigene sequences were annotated with gene descriptions, gene ontology (GO) and clusters of orthologous groups (COG) terms. Our data provide the first comprehensive sequence resource available for functional genomics studies in L. edodes, and demonstrate the utility of Illumina/Solexa sequencing for de novo transcriptome characterization and gene discovery in a non-model mushroom. Copyright © 2012 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brinson, E.C.; Adriano, T.; Bloch, W.
1994-09-01
We have developed a rapid, single-tube, non-isotopic assay that screens a patient sample for the presence of 31 cystic fibrosis (CF) mutations. This assay can identify these mutations in a single reaction tube and a single electrophoresis run. Sample preparation is a simple, boil-and-go procedure, completed in less than an hour. The assay is composed of a 15-plex PCR, followed by a 61-plex oligonucleotide ligation assay (OLA), and incorporates a novel detection scheme, Sequence-Coded Separation. Initially, the multiplex PCR amplifies 15 relevant segments of the CFTR gene, simultaneously. These PCR amplicons serve as templates for the multiplex OLA, which detects the normal or mutant allele at all loci, simultaneously. Each polymorphic site is interrogated by three oligonucleotide probes, a common probe and two allele-specific probes. Each common probe is tagged with a fluorescent dye, and the competing normal and mutant allelic probes incorporate different, non-nucleotide, mobility modifiers. These modifiers are composed of hexaethylene oxide (HEO) units, incorporated as HEO phosphoramidite monomers during automated DNA synthesis. The OLA is based on both probe hybridization and the ability of DNA ligase to discriminate single base mismatches at the junction between paired probes. Each single tube assay is electrophoresed in a single gel lane of a 4-color fluorescent DNA sequencer (Applied Biosystems, Model 373A). Each of the ligation products is identified by its unique combination of electrophoretic mobility and one of three colors. The fourth color is reserved for the in-lane size standard, used by GENESCAN™ software (Applied Biosystems) to size the OLA electrophoresis products. The Genotyper™ software (Applied Biosystems) decodes these Sequence-Coded-Separation data to create a patient summary report for all loci tested.
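The decoding step — mapping each peak's unique (dye color, product size) combination back to a locus and allele — can be sketched with a small lookup table. The loci and sizes below are hypothetical stand-ins, not the assay's actual Genotyper decoding data:

```python
def decode_ola_products(peaks, lookup):
    """Translate electrophoresis peaks into genotype calls.

    Each ligation product is identified by its combination of dye color
    and product size; `lookup` maps (color, size) pairs to
    (locus, allele) pairs. Both tables are illustrative only."""
    calls = {}
    for color, size in peaks:
        locus, allele = lookup[(color, size)]
        calls[locus] = allele
    return calls

# Hypothetical two-locus decoding table.
table = {
    ("blue", 60): ("deltaF508", "normal"),
    ("blue", 63): ("deltaF508", "mutant"),
    ("green", 71): ("G542X", "normal"),
    ("green", 74): ("G542X", "mutant"),
}
print(decode_ola_products([("blue", 63), ("green", 71)], table))
```

The same color can recur at different sizes, which is what lets one gel lane carry many loci.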
Muth, Thilo; García-Martín, Juan A; Rausell, Antonio; Juan, David; Valencia, Alfonso; Pazos, Florencio
2012-02-15
We have implemented in a single package all the features required for extracting, visualizing and manipulating fully conserved positions as well as those with a family-dependent conservation pattern in multiple sequence alignments. The program allows, among other things, to run different methods for extracting these positions, combine the results and visualize them in protein 3D structures and sequence spaces. JDet is a multiplatform application written in Java. It is freely available, including the source code, at http://csbg.cnb.csic.es/JDet. The package includes two of our recently developed programs for detecting functional positions in protein alignments (Xdet and S3Det), and support for other methods can be added as plug-ins. A help file and a guided tutorial for JDet are also available.
CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment
Manavski, Svetlin A; Valle, Giorgio
2008-01-01
Background Searching for similarities in protein and DNA databases has become a routine procedure in Molecular Biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two sequences; as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the length of two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm unrealistic for searching similarities in large sets of sequences. For these reasons heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution times at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphic cards, to develop high performance solutions for sequence alignment. Results In this paper we present what we believe is the fastest solution of the exact Smith-Waterman algorithm running on commodity hardware. It is implemented in the recently released CUDA programming environment by NVidia. CUDA allows direct access to the hardware primitives of the last-generation Graphics Processing Units (GPU) G80. Speeds of more than 3.5 GCUPS (Giga Cell Updates Per Second) are achieved on a workstation running two GeForce 8800 GTX. Exhaustive tests have been done to compare our implementation to SSEARCH and BLAST, running on a 3 GHz Intel Pentium IV processor. Our solution was also compared to a recently published GPU implementation and to a Single Instruction Multiple Data (SIMD) solution. These tests show that our implementation performs from 2 to 30 times faster than any other previous attempt available on commodity hardware. 
Conclusions The results show that graphic cards are now sufficiently advanced to be used as efficient hardware accelerators for sequence alignment. Their performance is better than any alternative available on commodity hardware platforms. The solution presented in this paper allows large scale alignments to be performed at low cost, using the exact Smith-Waterman algorithm instead of the largely adopted heuristic approaches. PMID:18387198
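The dynamic-programming recurrence at the heart of the algorithm is compact; a minimal Python sketch (with arbitrary illustrative scoring, not the substitution matrices a real search would use) shows why the cost is proportional to the product of the sequence lengths — every matrix cell is updated once, which is exactly what the GCUPS figure counts:

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between sequences a and b.

    Fills the full (len(a)+1) x (len(b)+1) Smith-Waterman matrix, so
    the work is len(a) * len(b) cell updates -- the quantity measured
    in 'cell updates per second' (CUPS) benchmarks."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

The zero floor in the `max` is what makes the alignment local: a bad prefix never drags down a later high-scoring region.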
Volume sharing of reservoir water
NASA Astrophysics Data System (ADS)
Dudley, Norman J.
1988-05-01
Previous models optimize short-, intermediate-, and long-run irrigation decision making in a simplified river valley system characterized by highly variable water supplies and demands for a single decision maker controlling both reservoir releases and farm water use. A major problem in relaxing the assumption of one decision maker is communicating the stochastic nature of supplies and demands between reservoir and farm managers. In this paper, an optimizing model is used to develop release rules for reservoir management when all users share equally in releases, and computer simulation is used to generate an historical time sequence of announced releases. These announced releases become a state variable in a farm management model which optimizes farm area-to-irrigate decisions through time. Such modeling envisages the use of growing area climatic data by the reservoir authority to gauge water demand and the transfer of water supply data from reservoir to farm managers via computer data files. Alternative model forms, including allocating water on a priority basis, are discussed briefly. Results show lower mean aggregate farm income and lower variance of aggregate farm income than in the single decision-maker case. This short-run economic efficiency loss coupled with likely long-run economic efficiency losses due to the attenuated nature of property rights indicates the need for quite different ways of integrating reservoir and farm management.
Churkin, Alexander; Barash, Danny
2008-01-01
Background RNAmute is an interactive Java application which, given an RNA sequence, calculates the secondary structure of all single point mutations and organizes them into categories according to their similarity to the predicted structure of the wild type. The secondary structure predictions are performed using the Vienna RNA package. A more efficient implementation of RNAmute is needed, however, to extend from the case of single point mutations to the general case of multiple point mutations, which may often be desired for computational predictions alongside mutagenesis experiments. But analyzing multiple point mutations, a process that requires traversing all possible mutations, becomes highly expensive since the running time is O(n^m) for a sequence of length n with m-point mutations. Using Vienna's RNAsubopt, we present a method that selects only those mutations, based on stability considerations, which are likely to be conformationally rearranging. The approach is best examined using the dot plot representation for RNA secondary structure. Results Using RNAsubopt, the suboptimal solutions for a given wild-type sequence are calculated once. Then, specific mutations are selected that are most likely to cause a conformational rearrangement. For an RNA sequence of about 100 nts and 3-point mutations (n = 100, m = 3), for example, the proposed method reduces the running time from several hours or even days to several minutes, thus enabling the practical application of RNAmute to the analysis of multiple-point mutations. Conclusion A highly efficient addition to RNAmute that is as user friendly as the original application but that facilitates the practical analysis of multiple-point mutations is presented. Such an extension can now be exploited prior to site-directed mutagenesis experiments by virologists, for example, who investigate the change of function in an RNA virus via mutations that disrupt important motifs in its secondary structure.
A complete explanation of the application, called MultiRNAmute, is available at [1]. PMID:18445289
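The combinatorial explosion that the stability-based filtering avoids is easy to quantify: a length-n RNA has C(n, m) · 3^m distinct m-point mutants (choose m positions, then one of 3 alternative bases at each), hence the O(n^m) growth. A standard-library sketch makes it concrete:

```python
from math import comb

def num_mutants(n, m):
    """Number of distinct m-point mutants of a length-n sequence:
    C(n, m) position choices times 3 alternative bases per site."""
    return comb(n, m) * 3 ** m

# For the paper's example size (n = 100): single-point mutants are cheap
# to enumerate, but 3-point mutants already number in the millions.
print(num_mutants(100, 1))  # 300
print(num_mutants(100, 3))  # 4365900
```

Folding millions of variants is what turns hours into days without the RNAsubopt-based pre-selection.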
2016-10-27
Complete Genome Sequence of Pigmentation Negative Yersinia pestis strain Cadman
Sean Lovett, Kitty Chase, Galina Koroleva, Gustavo…
Institute of Infectious Diseases, Fort Detrick, Frederick, Maryland, USA
Here we report the genome sequence of Yersinia pestis strain Cadman, an attenuated strain lacking the pgm locus. Y. pestis is the causative agent of
Taverna: a tool for building and running workflows of services
Hull, Duncan; Wolstencroft, Katy; Stevens, Robert; Goble, Carole; Pocock, Mathew R.; Li, Peter; Oinn, Tom
2006-01-01
Taverna is an application that eases the use and integration of the growing number of molecular biology tools and databases available on the web, especially web services. It allows bioinformaticians to construct workflows or pipelines of services to perform a range of different analyses, such as sequence analysis and genome annotation. These high-level workflows can integrate many different resources into a single analysis. Taverna is available freely under the terms of the GNU Lesser General Public License (LGPL) from . PMID:16845108
Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.
Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan
2013-06-27
Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. 
Rainbow is available for third-party implementation and use, and can be downloaded from http://s3.amazonaws.com/jnj_rainbow/index.html.
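Improvement (2) — splitting large sequence files for better downstream load balance — amounts to chunking the input while keeping each 4-line FASTQ record intact. A minimal sketch of such a splitter (illustrative only; Rainbow's own implementation is not described beyond this level):

```python
from itertools import islice

def split_fastq(lines, reads_per_chunk):
    """Yield chunks of a FASTQ stream for parallel processing.

    Each FASTQ record is exactly 4 lines (header, sequence, '+',
    qualities), so chunk boundaries fall on multiples of 4 lines and
    no record is ever split across workers."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        yield chunk
```

Cutting on record boundaries rather than raw byte offsets is what makes the chunks independently mappable.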
Scaling bioinformatics applications on HPC.
Mikailov, Mike; Luo, Fu-Jyh; Barkley, Stuart; Valleru, Lohit; Whitney, Stephen; Liu, Zhichao; Thakkar, Shraddha; Tong, Weida; Petrick, Nicholas
2017-12-28
Recent breakthroughs in molecular biology and next generation sequencing technologies have led to the exponential growth of sequence databases. Researchers use BLAST for processing these sequences. However, traditional software parallelization techniques (threads, message passing interface) applied in newer versions of BLAST are not adequate for processing these sequences in a timely manner. A new method for array job parallelization has been developed which offers O(T) theoretical speed-up in comparison to multi-threading and MPI techniques. Here T is the number of array job tasks. (The number of CPUs that will be used to complete the job equals the product of T multiplied by the number of CPUs used by a single task.) The approach is based on segmentation of both input datasets to the BLAST process, combining partial solutions published earlier (Dhanker and Gupta, Int J Comput Sci Inf Technol 5:4818-4820, 2014), (Grant et al., Bioinformatics 18:765-766, 2002), (Mathog, Bioinformatics 19:1865-1866, 2003). It is accordingly referred to as a "dual segmentation" method. In order to implement the new method, the BLAST source code was modified to allow the researcher to pass to the program the number of records (effective number of sequences) in the original database. The team also developed methods to manage and consolidate the large number of partial results that get produced. Dual segmentation allows for massive parallelization, which lifts the scaling ceiling in exciting ways. BLAST jobs that hitherto failed or slogged inefficiently to completion now finish with speeds that characteristically reduce wallclock time from 27 days on 40 CPUs to a single day using 4104 tasks, each task utilizing eight CPUs and taking less than 7 minutes to complete. The massive increase in the number of tasks when running an analysis job with dual segmentation reduces the size, scope and execution time of each task. 
Besides significant speed of completion, additional benefits include fine-grained checkpointing and increased flexibility of job submission. "Trickling in" a swarm of individual small tasks tempers competition for CPU time in the shared HPC environment, and jobs submitted during quiet periods can complete in extraordinarily short time frames. The smaller task size also allows the use of older and less powerful hardware. The CDRH workhorse cluster was commissioned in 2010, yet its eight-core CPUs with only 24GB RAM work well in 2017 for these dual segmentation jobs. Finally, these techniques are excitingly friendly to budget conscious scientific research organizations where probabilistic algorithms such as BLAST might discourage attempts at greater certainty because single runs represent a major resource drain. If a job that used to take 24 days can now be completed in less than an hour or on a space available basis (which is the case at CDRH), repeated runs for more exhaustive analyses can be usefully contemplated.
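The resource arithmetic of dual segmentation is simple: segmenting both the query set and the database yields one array-job task per (query chunk, database chunk) pair, and the CPU footprint is the task count times the CPUs per task, exactly as the abstract states. A sketch of that bookkeeping (a hypothetical helper, not the authors' submission scripts):

```python
def dual_segmentation_plan(query_chunks, db_chunks, cpus_per_task):
    """Task layout for a dual-segmentation BLAST array job.

    Every pairing of a query chunk with a database chunk becomes one
    independent array-job task, so T = query_chunks * db_chunks and
    the total CPU count is T * cpus_per_task."""
    tasks = query_chunks * db_chunks
    return {"tasks": tasks, "total_cpus": tasks * cpus_per_task}
```

Because tasks are independent, they can trickle into the scheduler as slots free up, which is the checkpointing and fair-sharing benefit described above.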
NO PLIF Imaging in the CUBRC 48 Inch Shock Tunnel
NASA Technical Reports Server (NTRS)
Jiang, N.; Bruzzese, J.; Patton, R.; Sutton, J.; Lempert, W.; Miller, J. D.; Meyer, T. R.; Parker, R.; Wadham, T.; Holden, M.
2011-01-01
Nitric Oxide Planar Laser-Induced Fluorescence (NO PLIF) imaging is demonstrated at a 10 kHz repetition rate in the Calspan-University at Buffalo Research Center's (CUBRC) 48-inch Mach 9 hypervelocity shock tunnel using a pulse burst laser-based high frame rate imaging system. Sequences of up to ten images are obtained internal to a supersonic combustor model, located within the shock tunnel, during a single approx. 10-millisecond duration run of the ground test facility. This represents over an order of magnitude improvement in data rate from previous PLIF-based diagnostic approaches. Comparison with a preliminary CFD simulation shows good overall qualitative agreement between the prediction of the mean NO density field and the observed PLIF image intensity, averaged over forty individual images obtained during several facility runs.
Stahl, Robert; Luke, Anthony; Ma, C Benjamin; Krug, Roland; Steinbach, Lynne; Majumdar, Sharmila; Link, Thomas M
2008-07-01
To determine the prevalence of pathologic findings in asymptomatic knees of marathon runners before and after a competition in comparison with physically active subjects. To compare the diagnostic performance of cartilage-dedicated magnetic resonance imaging (MRI) sequences at 3.0 T. Ten marathon runners underwent 3.0 T MRI 2-3 days before and after competition. Twelve physically active asymptomatic subjects not performing long-distance running were examined as controls. Pathologic condition was assessed with the whole-organ magnetic resonance imaging score (WORMS). Cartilage abnormalities and bone marrow edema pattern (BMEP) were quantified. Visualization of cartilage pathology was assessed with intermediate-weighted fast spin-echo (IM-w FSE), fast imaging employing steady-state acquisition (FIESTA) and T1-weighted three-dimensional (3D) high-spatial-resolution volumetric fat-suppressed spoiled gradient-echo (SPGR) MRI sequences. Eight of ten marathon runners and 7/12 controls showed knee abnormality. Slightly more and larger cartilage abnormalities, and BMEP, in marathon runners yielded higher but not significantly different WORMS (P > 0.05) than in controls. Running a single marathon did not alter MR findings substantially. Cartilage abnormalities were best visualized with IM-w FSE images (P < 0.05). A high prevalence of knee abnormalities was found in marathon runners and also in active subjects participating in other recreational sports. IM-w FSE sequences delineated more cartilage MR imaging abnormalities than did FIESTA and SPGR sequences.
Pandey, Ram Vinay; Pabinger, Stephan; Kriegner, Albert; Weinhäusel, Andreas
2016-01-01
Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid.
Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing.
Euskirchen, Philipp; Bielle, Franck; Labreche, Karim; Kloosterman, Wigard P; Rosenberg, Shai; Daniau, Mailys; Schmitt, Charlotte; Masliah-Planchon, Julien; Bourdeaut, Franck; Dehais, Caroline; Marie, Yannick; Delattre, Jean-Yves; Idbaih, Ahmed
2017-11-01
Molecular classification of cancer has entered clinical routine to inform diagnosis, prognosis, and treatment decisions. At the same time, new tumor entities have been identified that cannot be defined histologically. For central nervous system tumors, the current World Health Organization classification explicitly demands molecular testing, e.g., for 1p/19q-codeletion or IDH mutations, to make an integrated histomolecular diagnosis. However, a plethora of sophisticated technologies is currently needed to assess different genomic and epigenomic alterations and turnaround times are in the range of weeks, which makes standardized and widespread implementation difficult and hinders timely decision making. Here, we explored the potential of a pocket-size nanopore sequencing device for multimodal and rapid molecular diagnostics of cancer. Low-pass whole genome sequencing was used to simultaneously generate copy number (CN) and methylation profiles from native tumor DNA in the same sequencing run. Single nucleotide variants in IDH1, IDH2, TP53, H3F3A, and the TERT promoter region were identified using deep amplicon sequencing. Nanopore sequencing yielded ~0.1X genome coverage within 6 h and resulting CN and epigenetic profiles correlated well with matched microarray data. Diagnostically relevant alterations, such as 1p/19q codeletion, and focal amplifications could be recapitulated. Using ad hoc random forests, we could perform supervised pan-cancer classification to distinguish gliomas, medulloblastomas, and brain metastases of different primary sites. Single nucleotide variants in IDH1, IDH2, and H3F3A were identified using deep amplicon sequencing within minutes of sequencing. Detection of TP53 and TERT promoter mutations shows that sequencing of entire genes and GC-rich regions is feasible. Nanopore sequencing allows same-day detection of structural variants, point mutations, and methylation profiling using a single device with negligible capital cost. 
It outperforms hybridization-based and current sequencing technologies with respect to time to diagnosis and required laboratory equipment and expertise, aiming to make precision medicine possible for every cancer patient, even in resource-restricted settings.
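The copy-number half of the workflow reduces to binned read counting: at ~0.1X coverage, reads are tallied in fixed-size genomic windows and each window's count is normalized against the average, so dips suggest deletions and spikes suggest amplifications. A naive single-chromosome sketch (an illustration of the idea, not the authors' pipeline):

```python
from collections import Counter

def cn_profile(read_starts, chrom_length, bin_size):
    """Per-bin relative coverage from low-pass read start positions.

    Counts reads per fixed-size bin, then divides each count by the
    mean bin count; values near 1.0 indicate normal copy number,
    while sustained dips and spikes suggest CN losses and gains."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = Counter(pos // bin_size for pos in read_starts)
    per_bin = [counts.get(i, 0) for i in range(n_bins)]
    mean = sum(per_bin) / n_bins
    return [c / mean if mean else 0.0 for c in per_bin]
```

Because only relative counts matter, even very shallow sequencing suffices, which is why CN calls arrive within hours of starting the run.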
HangOut: generating clean PSI-BLAST profiles for domains with long insertions.
Kim, Bong-Hyun; Cong, Qian; Grishin, Nick V
2010-06-15
Profile-based similarity search is an essential step in structure-function studies of proteins. However, inclusion of non-homologous sequence segments into a profile causes its corruption and results in false positives. Profile corruption is common in multidomain proteins, and single domains with long insertions are a significant source of errors. We developed a procedure (HangOut) that, for a single domain with specified insertion position, cleans erroneously extended PSI-BLAST alignments to generate better profiles. HangOut is implemented in Python 2.3 and runs on all Unix-compatible platforms. The source code is available under the GNU GPL license at http://prodata.swmed.edu/HangOut/. Supplementary data are available at Bioinformatics online.
Software manual for operating particle displacement tracking data acquisition and reduction system
NASA Technical Reports Server (NTRS)
Wernet, Mark P.
1991-01-01
The software manual is presented. The necessary steps required to record, analyze, and reduce Particle Image Velocimetry (PIV) data using the Particle Displacement Tracking (PDT) technique are described. The new PDT system is an all electronic technique employing a CCD video camera and a large memory buffer frame-grabber board to record low velocity (less than or equal to 20 cm/s) flows. Using a simple encoding scheme, a time sequence of single exposure images are time coded into a single image and then processed to track particle displacements and determine 2-D velocity vectors. All the PDT data acquisition, analysis, and data reduction software is written to run on an 80386 PC.
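The final reduction step — turning tracked particle displacements into 2-D velocity vectors — is straightforward once a particle's positions in successive time-coded exposures have been matched. A minimal sketch (positions in cm, interval in seconds; a hypothetical helper, not the manual's code):

```python
def velocity_2d(p0, p1, dt):
    """2-D velocity vector from one particle's position in two
    successive exposures separated by dt seconds."""
    return ((p1[0] - p0[0]) / dt, (p1[1] - p0[1]) / dt)

# A particle moving 1 cm right and 0.5 cm up over 0.5 s:
print(velocity_2d((2.0, 3.0), (3.0, 3.5), 0.5))  # (2.0, 1.0)
```

The time coding in the single composite image is what supplies both the particle correspondence and the dt between exposures.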
Genetically improved BarraCUDA.
Langdon, W B; Lam, Brian Yee Hong
2017-01-01
BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised using "Genetic Improvement". The genetically improved (GI) code is up to three times faster on short paired end reads from The 1000 Genomes Project and 60% more accurate on a short BioPlanet.com GCAT alignment benchmark. GPGPU BarraCUDA running on a single K80 Tesla GPU can align short paired end nextGen sequences up to ten times faster than bwa on a 12 core server. The speed up was such that the GI version was adopted and has been regularly downloaded from SourceForge for more than 12 months.
Software for Analyzing Sequences of Flow-Related Images
NASA Technical Reports Server (NTRS)
Klimek, Robert; Wright, Ted
2004-01-01
Spotlight is a computer program for analysis of sequences of images generated in combustion and fluid physics experiments. Spotlight can perform analysis of a single image in an interactive mode or a sequence of images in an automated fashion. The primary type of analysis is tracking of positions of objects over sequences of frames. Features and objects that are typically tracked include flame fronts, particles, droplets, and fluid interfaces. Spotlight automates the analysis of object parameters, such as centroid position, velocity, acceleration, size, shape, intensity, and color. Images can be processed to enhance them before statistical and measurement operations are performed. An unlimited number of objects can be analyzed simultaneously. Spotlight saves results of analyses in a text file that can be exported to other programs for graphing or further analysis. Spotlight is a graphical-user-interface-based program that at present can be executed on Microsoft Windows and Linux operating systems. A version that runs on Macintosh computers is being considered.
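Of the object parameters listed, centroid position is the simplest to state precisely: the intensity-weighted mean of an object's pixel coordinates. A sketch of that one computation (illustrative; Spotlight's internals are not described beyond this level):

```python
def centroid(pixels):
    """Intensity-weighted centroid of a tracked object.

    `pixels` is a list of (x, y, intensity) triples belonging to the
    object; brighter pixels pull the centroid toward themselves."""
    total = sum(w for _, _, w in pixels)
    cx = sum(x * w for x, _, w in pixels) / total
    cy = sum(y * w for _, y, w in pixels) / total
    return (cx, cy)
```

Differencing centroids between consecutive frames then yields the velocity and acceleration estimates the program reports.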
NASA Astrophysics Data System (ADS)
Gong, Liang; Wu, Yu; Jian, Qijie; Yin, Chunxiao; Li, Taotao; Gupta, Vijai Kumar; Duan, Xuewu; Jiang, Yueming
2018-01-01
Vibrio qinghaiensis sp.-Q67 (Vqin-Q67) is a freshwater luminescent bacterium that continuously emits blue-green light (485 nm). The bacterium has been widely used for detecting toxic contaminants. Here, we report the complete genome sequence of Vqin-Q67, obtained using third-generation PacBio sequencing technology. Continuous long reads were attained from three PacBio sequencing runs and reads >500 bp with a quality value of >0.75 were merged together into a single dataset. This resultant highly-contiguous de novo assembly has no genome gaps, and comprises two chromosomes with substantial genetic information, including protein-coding genes, non-coding RNA, transposon and gene islands. Our dataset can be useful as a comparative genome for evolution and speciation studies, as well as for the analysis of protein-coding gene families, the pathogenicity of different Vibrio species in fish, the evolution of non-coding RNA and transposon, and the regulation of gene expression in relation to the bioluminescence of Vqin-Q67.
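The stated merge criterion — keep reads over 500 bp with a quality value above 0.75 — is a simple filter over (sequence, quality) pairs. A sketch under exactly those thresholds (the data layout is assumed, not taken from the paper):

```python
def filter_reads(reads, min_len=500, min_qv=0.75):
    """Keep reads longer than min_len bases with quality value above
    min_qv, mirroring the merge criteria stated in the abstract."""
    return [(seq, qv) for seq, qv in reads
            if len(seq) > min_len and qv > min_qv]
```

Both conditions must hold, so a long low-quality read is dropped just like a short high-quality one.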
160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA)
Li, Isaac TS; Shum, Warren; Truong, Kevin
2007-01-01
Background To infer homology and subsequently gene function, the Smith-Waterman (SW) algorithm is used to find the optimal local alignment between two sequences. When searching sequence databases that may contain hundreds of millions of sequences, this algorithm becomes computationally expensive. Results In this paper, we focused on accelerating the Smith-Waterman algorithm by using FPGA-based hardware that implemented a module for computing the score of a single cell of the SW matrix. Then using a grid of this module, the entire SW matrix was computed at the speed of field propagation through the FPGA circuit. These modifications dramatically accelerated the algorithm's computation time by up to 160 folds compared to a pure software implementation running on the same FPGA with an Altera Nios II softprocessor. Conclusion This design of FPGA accelerated hardware offers a new promising direction to seeking computation improvement of genomic database searching. PMID:17555593
Kapeller, Christoph; Kamada, Kyousuke; Ogawa, Hiroshi; Prueckl, Robert; Scharinger, Josef; Guger, Christoph
2014-01-01
A brain-computer interface (BCI) allows the user to control a device or software with brain activity. Many BCIs rely on visual stimuli with constant stimulation cycles that elicit steady-state visual evoked potentials (SSVEP) in the electroencephalogram (EEG). This EEG response can be generated with an LED or a computer screen flashing at a constant frequency, and similar EEG activity can be elicited with pseudo-random stimulation sequences on a screen (code-based BCI). Using electrocorticography (ECoG) instead of EEG promises higher spatial and temporal resolution and leads to more dominant evoked potentials due to visual stimulation. This work focuses on BCIs based on visual evoked potentials (VEP) and their capability as a continuous control interface for augmentation of video applications. One 35-year-old female subject with implanted subdural grids participated in the study. The task was to select one out of four visual targets, while each was flickering with a code sequence. After a calibration run including 200 code sequences, a linear classifier was used during an evaluation run to identify the selected visual target based on the generated code-based VEPs over 20 trials. Multiple ECoG buffer lengths were tested, and the subject reached a mean online classification accuracy of 99.21% for a window length of 3.15 s. Finally, the subject performed an unsupervised free run in combination with visual feedback of the current selection. Additionally, an algorithm was implemented to suppress false positive selections, which allowed the subject to start and stop the BCI at any time. The code-based BCI system attained very high online accuracy, making this approach very promising for control applications where a continuous control signal is needed. PMID:25147509
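The target-identification step can be illustrated with simple correlation-based template matching, a hypothetical stand-in for the calibrated linear classifier the study actually used:

```python
def classify(buffer, codes):
    """Pick the target whose stimulation code best correlates with the
    recorded buffer (simplified stand-in for a trained classifier)."""
    def corr(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = sum((a - mx) ** 2 for a in x) ** 0.5
        vy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (vx * vy) if vx and vy else 0.0
    # Return the index of the best-matching code sequence.
    return max(range(len(codes)), key=lambda i: corr(buffer, codes[i]))
```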
NASA Astrophysics Data System (ADS)
Qu, Zijie; Temel, Fatma; Henderikx, Rene; Breuer, Kenneth
2017-11-01
The motility of the bacterium E. coli in viscous fluids has been widely studied, although conflicting results on the effect of viscosity on swimming speed abound. The swimming mode of wild-type E. coli is idealized as a run-and-tumble sequence in which periods of straight swimming at a constant speed are randomly interrupted by a tumble, defined as a sudden change of direction at very low speed. Using a tracking microscope, we follow cells for extended periods and find that the swimming behavior of a single cell can exhibit a variety of behaviors, including run-and-tumble and a ``slow random walk'' in which the cell moves at relatively low speed without the characteristic run. Although the characteristic swimming speed varies between individuals and in different polymer solutions, we find that the skewness of the speed distribution is solely a function of viscosity and uniquely determines the ratio of the average speed to the characteristic run speed. Using Resistive Force Theory and the cell-specific measured characteristic run speed, we show that differences in the swimming behavior observed in solutions of different viscosity are due to changes in the flagellar bundling time, which increases as the viscosity rises due to the lower rotation rate of the flagellar motor. Supported by the National Science Foundation.
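The skewness statistic referred to above is the standardized third central moment of the measured speed distribution; a minimal sketch:

```python
def skewness(speeds):
    """Sample skewness: third central moment over the 3/2 power of the
    second central moment. Positive values indicate a right-skewed
    speed distribution."""
    n = len(speeds)
    mean = sum(speeds) / n
    m2 = sum((v - mean) ** 2 for v in speeds) / n
    m3 = sum((v - mean) ** 3 for v in speeds) / n
    return m3 / m2 ** 1.5
```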
Grégoire, Catherine-Alexandra; Tobin, Stephanie; Goldenstein, Brianna L; Samarut, Éric; Leclerc, Andréanne; Aumont, Anne; Drapeau, Pierre; Fulton, Stephanie; Fernandes, Karl J L
2018-01-01
Environmental enrichment (EE) is a powerful stimulus of brain plasticity and is among the most accessible treatment options for brain disease. In rodents, EE is modeled using multi-factorial environments that include running, social interactions, and/or complex surroundings. Here, we show that running and running-independent EE differentially affect the hippocampal dentate gyrus (DG), a brain region critical for learning and memory. Outbred male CD1 mice housed individually with a voluntary running disk showed improved spatial memory in the radial arm maze compared to individually- or socially-housed mice with a locked disk. We therefore used RNA sequencing to perform an unbiased interrogation of DG gene expression in mice exposed to either a voluntary running disk (RUN), a locked disk (LD), or a locked disk plus social enrichment and tunnels [i.e., a running-independent complex environment (CE)]. RNA sequencing revealed that RUN and CE mice showed distinct, non-overlapping patterns of transcriptomic changes versus the LD control. Bioinformatic analysis revealed that the RUN and CE environments modulate separate transcriptional networks, biological processes, cellular compartments, and molecular pathways, with RUN preferentially regulating synaptic and growth-related pathways and CE altering extracellular matrix-related functions. Within the RUN group, high-distance runners also showed selective stress pathway alterations that correlated with a drastic decline in overall transcriptional changes, suggesting that excess running causes a stress-induced suppression of running's genetic effects. Our findings reveal stimulus-dependent transcriptional signatures of EE in the DG and provide a resource for generating unbiased, data-driven hypotheses for novel mediators of EE-induced cognitive changes.
TEA: the epigenome platform for Arabidopsis methylome study.
Su, Sheng-Yao; Chen, Shu-Hwa; Lu, I-Hsuan; Chiang, Yih-Shien; Wang, Yu-Bin; Chen, Pao-Yang; Lin, Chung-Yen
2016-12-22
Bisulfite sequencing (BS-seq) has become a standard technology for profiling genome-wide DNA methylation at single-base resolution. It allows researchers to conduct genome-wide cytosine methylation analyses addressing genomic imprinting, transcriptional regulation, and cellular development and differentiation. A single BS-seq dataset is resolved into many features according to sequence context, making methylome data analysis and visualization a complex task. We developed a streamlined platform, TEA, for analyzing and visualizing data from whole-genome BS-seq (WGBS) experiments conducted in the model plant Arabidopsis thaliana. To capture the essence of the genome methylation level and to run efficiently online, we introduce a straightforward method for measuring genome methylation in each sequence context by gene. The method is scripted in Java to process BS-seq mapping results. Through a simple data-uploading process, the TEA server deploys a web-based platform for deep analysis by linking data to an updated Arabidopsis annotation database and toolkits. TEA is an intuitive and efficient online platform for analyzing the Arabidopsis genomic DNA methylation landscape and provides several ways to help users exploit WGBS data. TEA is freely accessible to academic users at: http://tea.iis.sinica.edu.tw .
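A per-gene, per-context methylation measure of the kind described can be sketched as a weighted methylation level (methylated read count over total read count within one context); the exact formulation TEA uses is not spelled out in the abstract, so this is an assumed version:

```python
def methylation_level(calls, context):
    """Weighted methylation level for one sequence context (e.g. CpG,
    CHG, CHH) within a gene.

    calls: list of (context, methylated_reads, total_reads) tuples,
    one per cytosine site in the gene.
    """
    meth = sum(m for c, m, _ in calls if c == context)
    total = sum(t for c, _, t in calls if c == context)
    return meth / total if total else 0.0
```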
MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach
Brown, Bonnie L.; Watson, Mick; Minot, Samuel S.; Rivera, Maria C.; Franklin, Rima B.
2017-01-01
Background: Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10⁴ bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level.
Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at <0.05% of DNA in the total mixture), 91% of reads were assigned to the correct species, 93% of reads were assigned to the correct genus, and >99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 10³ 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included: high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis. Together, these factors limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads.
Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve. With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities. PMID:28327976
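Observed community proportions of the kind reported above can be tallied directly from per-read taxonomic assignments; a minimal sketch with hypothetical taxon labels:

```python
from collections import Counter

def community_proportions(assignments):
    """Fraction of classified reads assigned to each taxon.

    assignments: one taxon label per classified read (e.g. the output
    of a classifier such as Kraken or One Codex).
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}
```

Comparing these observed fractions against the known input proportions of a mock community gives the deviation figures quoted above.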
acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data
Lux, Markus; Kruger, Jan; Rinke, Christian; ...
2016-12-20
A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically sound confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as from a dedicated command-line application, which allows easy integration into large sequencing project analysis workflows. Acdc can reliably detect contamination in single-cell genome data.
In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.
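The oligonucleotide signatures that feed acdc's reference-free clustering are typically k-mer frequency vectors; a minimal tetranucleotide-signature sketch (the dimensionality reduction and clustering steps that follow are omitted):

```python
from itertools import product

def tetranucleotide_signature(seq):
    """Normalized 4-mer frequency vector over the 256 canonical-order
    tetranucleotides -- the kind of oligonucleotide signature that
    acdc-style tools feed into dimensionality reduction."""
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = {k: 0 for k in kmers}
    for i in range(len(seq) - 3):
        k = seq[i:i + 4]
        if k in counts:  # skip windows containing ambiguous bases
            counts[k] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in kmers]
```

Contigs from the same organism tend to have similar signatures, so contaminants show up as separate clusters in the reduced signature space.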
Rinke, Jenny; Schäfer, Vivien; Schmidt, Mathias; Ziermann, Janine; Kohlmann, Alexander; Hochhaus, Andreas; Ernst, Thomas
2013-08-01
We sought to establish a convenient, sensitive next-generation sequencing (NGS) method for genotyping the 26 most commonly mutated leukemia-associated genes in a single workflow and to optimize this method for low amounts of input template DNA. We designed 184 PCR amplicons that cover all of the candidate genes. NGS was performed with genomic DNA (gDNA) from a cohort of 10 individuals with chronic myelomonocytic leukemia. The results were compared with NGS data obtained from sequencing of DNA generated by whole-genome amplification (WGA) of 20 ng template gDNA. Differences between gDNA and WGA samples in variant frequencies were determined for 2 different WGA kits. For gDNA samples, 25 of 26 genes were successfully sequenced with a sensitivity of 5%, which was achieved by a median coverage of 492 reads (range, 308-636 reads) per amplicon. We identified 24 distinct mutations in 11 genes. With WGA samples, we reliably detected all mutations above 5% sensitivity with a median coverage of 506 reads (range, 256-653 reads) per amplicon. With all variants included in the analysis, WGA amplification by the 2 kits tested yielded differences in variant frequencies that ranged from -28.19% to +9.94% [mean (SD) difference, -0.2% (4.08%)] and from -35.03% to +18.67% [mean difference, -0.75% (5.12%)]. Our method permits simultaneous analysis of a wide range of leukemia-associated target genes in a single sequencing run. NGS can be performed after WGA of template DNA for reliable detection of variants without introducing appreciable bias.
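The gDNA-versus-WGA comparison reduces to per-variant frequency differences; a minimal sketch with hypothetical variant identifiers and frequencies:

```python
def frequency_differences(gdna, wga):
    """Per-variant difference in allele frequency (WGA minus gDNA),
    in percentage points. Only variants seen in both samples are
    compared.

    gdna, wga: dicts mapping variant id -> variant frequency (%).
    """
    shared = gdna.keys() & wga.keys()
    return {v: wga[v] - gdna[v] for v in shared}

def mean(values):
    values = list(values)
    return sum(values) / len(values)
```

Summarizing the resulting differences (range, mean, SD) per WGA kit gives the bias figures reported above.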
Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.
Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut
2018-05-03
Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable to a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we (a) distribute many independent alignments on multiple threads and (b) inherently parallelize a single alignment computation using a work-stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors and use cases, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2, and AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module to further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. rene.rahn@fu-berlin.de.
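The inter-sequence layout can be illustrated in NumPy: the DP recurrence still runs cell by cell, but each cell update is applied across a whole batch of alignments in one vector operation. This is a global-alignment, score-only sketch of the idea, not SeqAn's actual kernel:

```python
import numpy as np

def batch_scores(pairs, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch-style scores for many equal-length sequence
    pairs at once. Each H[:, i, j] update touches every alignment in
    the batch simultaneously -- the inter-sequence vectorization layout
    (NumPy array operations stand in for SIMD instructions)."""
    a = np.array([list(x) for x, _ in pairs])
    b = np.array([list(y) for _, y in pairs])
    n, la, lb = len(pairs), a.shape[1], b.shape[1]
    H = np.zeros((n, la + 1, lb + 1))
    H[:, :, 0] = gap * np.arange(la + 1)  # leading gaps in b
    H[:, 0, :] = gap * np.arange(lb + 1)  # leading gaps in a
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            s = np.where(a[:, i - 1] == b[:, j - 1], match, mismatch)
            H[:, i, j] = np.maximum(H[:, i - 1, j - 1] + s,
                                    np.maximum(H[:, i - 1, j],
                                               H[:, i, j - 1]) + gap)
    return H[:, la, lb]
```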
Medeiros, J D; Leite, L R; Pylro, V S; Oliveira, F S; Almeida, V M; Fernandes, G R; Salim, A C M; Araújo, F M G; Volpini, A C; Oliveira, G; Cuadros-Orellana, S
2017-10-01
Acid mine drainage (AMD) is characterized by an acid and metal-rich run-off that originates from mining systems. Despite having been studied for many decades, much remains unknown about the microbial community dynamics in AMD sites, especially during their early development, when the acidity is moderate. Here, we describe draft genome assemblies from single cells retrieved from an early-stage AMD sample. These cells belong to the genus Hydrotalea and are closely related to Hydrotalea flava. The phylogeny and average nucleotide identity analysis suggest that all single amplified genomes (SAGs) form two clades that may represent different strains. These cells have the genomic potential for denitrification, copper and other metal resistance. Two coexisting CRISPR-Cas loci were recovered across SAGs, and we observed heterogeneity in the population with regard to the spacer sequences, together with the loss of trailer-end spacers. Our results suggest that the genomes of Hydrotalea sp. strains studied here are adjusting to a quickly changing selective pressure at the microhabitat scale, and an important form of this selective pressure is infection by foreign DNA. © 2017 John Wiley & Sons Ltd.
NASA Technical Reports Server (NTRS)
Uffelman, Hal; Goodson, Troy; Pellegrin, Michael; Stavert, Lynn; Burk, Thomas; Beach, David; Signorelli, Joel; Jones, Jeremy; Hahn, Yungsun; Attiyah, Ahlam;
2009-01-01
The Maneuver Automation Software (MAS) automates the process of generating commands for maneuvers to keep the spacecraft of the Cassini-Huygens mission on a predetermined prime mission trajectory. Before MAS became available, a team of approximately 10 members had to work about two weeks to design, test, and implement each maneuver in a process that involved running many maneuver-related application programs and then serially handing off data products to other parts of the team. MAS enables a three-member team to design, test, and implement a maneuver in about one-half hour after Navigation has processed tracking data. MAS accepts more than 60 parameters and 22 files as input directly from users. MAS consists of Practical Extraction and Reporting Language (PERL) scripts that link, sequence, and execute the maneuver-related application programs: "pushing a single button" on a graphical user interface causes MAS to run navigation programs that design a maneuver; programs that create sequences of commands to execute the maneuver on the spacecraft; and a program that generates predictions about maneuver performance and generates reports and other files that enable users to quickly review and verify the maneuver design. MAS can also generate presentation materials, initiate electronic command request forms, and archive all data products for future reference.
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.
Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard
2004-09-09
Parallel computing is frequently used to speed up computationally expensive tasks in bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
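Strategy (a), distributing independent pairwise alignments to parallel workers, can be sketched as follows; a thread pool and a trivial match-count score stand in for the worker processors and the real DIALIGN alignment routine:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def pair_score(pair):
    """Stand-in for one pairwise alignment job: count matching
    positions. (A real DIALIGN run would compute a segment-based
    alignment here.)"""
    a, b = pair
    return (a, b), sum(x == y for x, y in zip(a, b))

def all_pair_scores(seqs, workers=4):
    """Score every sequence pair on a pool of workers. Because the
    pairs are independent, the parallel result is identical to the
    serial one."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return dict(ex.map(pair_score, combinations(seqs, 2)))
```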
StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics
Ramirez-Gonzalez, Ricardo H.; Leggett, Richard M.; Waite, Darren; Thanki, Anil; Drou, Nizar; Caccamo, Mario; Davey, Robert
2014-01-01
Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow one run to contain hundreds of different samples. With such data comes a significant challenge to understand its quality and to understand how the quality and yield are changing across instruments and over time. As well as the desire to understand historical data, sequencing centres often have a duty to provide clear summaries of individual run performance to collaborators or customers. We present StatsDB, an open-source software package for storage and analysis of next generation sequencing run metrics. The system has been designed for incorporation into a primary analysis pipeline, either at the programmatic level or via integration into existing user interfaces. Statistics are stored in an SQL database and APIs provide the ability to store and access the data while abstracting the underlying database design. This abstraction allows simpler, wider querying across multiple fields than is possible by the manual steps and calculation required to dissect individual reports, e.g. "provide metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month". The software is supplied with modules for storage of statistics from FastQC, a commonly used tool for analysis of sequence reads, but the open nature of the database schema means it can be easily adapted to other tools. Currently at The Genome Analysis Centre (TGAC), reports are accessed through our LIMS system or through a standalone GUI tool, but the API and supplied examples make it easy to develop custom reports and to interface with other packages. PMID:24627795
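The store-and-query pattern can be sketched with SQLite; the single-table schema below is a simplified stand-in for StatsDB's actual SQL schema and API:

```python
import sqlite3

def create_store():
    """In-memory stand-in for a run-metrics store: one row per metric
    per run (StatsDB itself uses a richer schema behind an API)."""
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE metrics (
        run TEXT, instrument TEXT, barcode TEXT,
        metric TEXT, value REAL)""")
    return db

def add_metric(db, run, instrument, barcode, metric, value):
    db.execute("INSERT INTO metrics VALUES (?, ?, ?, ?, ?)",
               (run, instrument, barcode, metric, value))

def mean_metric(db, metric, instrument):
    """e.g. mean GC content across all runs on one sequencer --
    the kind of cross-run query the abstract describes."""
    row = db.execute(
        "SELECT AVG(value) FROM metrics WHERE metric=? AND instrument=?",
        (metric, instrument)).fetchone()
    return row[0]
```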
Flexbar 3.0 - SIMD and multicore parallelization.
Roehr, Johannes T; Dieterich, Christoph; Reinert, Knut
2017-09-15
High-throughput sequencing machines can process many samples in a single run. For Illumina systems, sequencing reads are barcoded with an additional DNA tag contained in the respective sequencing adapters. Recognition of barcode and adapter sequences is hence commonly needed for the analysis of next-generation sequencing data. Flexbar performs demultiplexing based on barcodes and adapter trimming for such data. The massive amounts of data generated on modern sequencing machines demand that this preprocessing be done as efficiently as possible. We present Flexbar 3.0, the successor of the popular program Flexbar. It now employs twofold parallelism: multi-threading and, additionally, SIMD vectorization. Both types of parallelism are used to speed up the computation of pair-wise sequence alignments, which are used for the detection of barcodes and adapters. Furthermore, new features were included to cover a wide range of applications. We evaluated the performance of Flexbar on a simulated sequencing dataset. Our program outcompetes other tools in terms of speed and is among the best tools in the presented quality benchmark. https://github.com/seqan/flexbar. johannes.roehr@fu-berlin.de or knut.reinert@fu-berlin.de. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
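Demultiplexing by barcode can be sketched as follows; this simplified version matches the read's 5' end against each barcode by Hamming distance, whereas Flexbar itself uses pair-wise alignment and therefore also tolerates indels:

```python
def hamming(a, b):
    """Mismatch count over the overlapping prefix (zip truncates)."""
    return sum(x != y for x, y in zip(a, b))

def demultiplex(read, barcodes, max_mismatches=1):
    """Assign a read to the sample whose barcode best matches its
    5' end, and trim the barcode off.

    barcodes: dict mapping sample name -> barcode sequence.
    Returns (sample, trimmed_read), or (None, read) if no barcode
    matches within the mismatch budget.
    """
    best = min(barcodes, key=lambda s: hamming(read, barcodes[s]))
    tag = barcodes[best]
    if hamming(read[:len(tag)], tag) <= max_mismatches:
        return best, read[len(tag):]
    return None, read
```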
SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.
Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael
2018-05-25
Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both the RNA primary sequence and its local secondary structure play a role in specific protein-RNA recognition and binding. Despite the great advances in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy to interpret visually. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated with different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, efficient in run time and freely accessible via http://smartiv.technion.ac.il/.
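A combined sequence-and-structure alphabet of the sort SMARTIV's PWMs use can be sketched by treating each motif position as a (base, structure) pair. Everything below, including the two structure states 'P' (paired) and 'U' (unpaired) and the toy k-mers, is illustrative rather than SMARTIV's actual symbol set or weighting.

```python
from collections import Counter

# Toy enriched k-mers: each is a sequence plus a per-position structure
# annotation ('P' = paired, 'U' = unpaired). Data are invented.
kmers = [("GCAU", "PPUU"), ("GCAU", "PUUU"), ("GGAU", "PPUU")]

k = 4
pwm = []  # one column per position, over (base, structure) symbols
for i in range(k):
    counts = Counter((seq[i], struct[i]) for seq, struct in kmers)
    total = sum(counts.values())
    pwm.append({sym: n / total for sym, n in counts.items()})

print(pwm[0])  # position 1 is always a paired G: {('G', 'P'): 1.0}
print(pwm[1])  # position 2 splits across C-paired, C-unpaired, G-paired
```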
Hanson-Smith, Victor; Johnson, Alexander
2016-07-01
The method of phylogenetic ancestral sequence reconstruction is a powerful approach for studying evolutionary relationships among protein sequence, structure, and function. In particular, this approach allows investigators to (1) reconstruct and "resurrect" (that is, synthesize in vivo or in vitro) extinct proteins to study how they differ from modern proteins, (2) identify key amino acid changes that, over evolutionary timescales, have altered the function of the protein, and (3) order historical events in the evolution of protein function. Widespread use of this approach has been slow among molecular biologists, in part because the methods require significant computational expertise. Here we present PhyloBot, a web-based software tool that makes ancestral sequence reconstruction easy. Designed for non-experts, it integrates all the necessary software into a single user interface. Additionally, PhyloBot provides interactive tools to explore evolutionary trajectories between ancestors, enabling the rapid generation of hypotheses that can be tested using genetic or biochemical approaches. Early versions of this software were used in previous studies to discover genetic mechanisms underlying the functions of diverse protein families, including V-ATPase ion pumps, DNA-binding transcription regulators, and serine/threonine protein kinases. PhyloBot runs in a web browser, and is available at the following URL: http://www.phylobot.com. The software is implemented in Python using the Django web framework, and runs on elastic cloud computing resources from Amazon Web Services. Users can create and submit jobs on our free server (at the URL listed above), or use our open-source code to launch their own PhyloBot server.
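As a flavour of what ancestral sequence reconstruction computes, the sketch below runs Fitch parsimony for a single alignment column on a toy tree of nested tuples. This is a deliberate simplification: PhyloBot's pipeline uses probabilistic (maximum-likelihood) reconstruction over whole sequences, not parsimony, so treat this only as an illustration of inferring an ancestral state.

```python
def fitch(node):
    """Fitch parsimony for one alignment column on a binary tree given as
    nested tuples; leaves are single-character states. Returns the root's
    candidate state set and the minimum number of substitutions implied."""
    if isinstance(node, str):
        return {node}, 0
    left, right = node
    sl, cl = fitch(left)
    sr, cr = fitch(right)
    inter = sl & sr
    if inter:
        return inter, cl + cr          # children agree: no extra change
    return sl | sr, cl + cr + 1        # children disagree: one substitution

# Tree ((A,A),(A,T)): the ancestral state is 'A', with one change implied.
states, changes = fitch((("A", "A"), ("A", "T")))
print(states, changes)  # -> {'A'} 1
```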
Scala, Giovanni; Affinito, Ornella; Palumbo, Domenico; Florio, Ermanno; Monticelli, Antonella; Miele, Gennaro; Chiariotti, Lorenzo; Cocozza, Sergio
2016-11-25
CpG sites in an individual molecule may exist in a binary state (methylated or unmethylated), and each individual DNA molecule, containing a certain number of CpGs, is a combination of these states defining an epihaplotype. Classic quantification-based approaches to study DNA methylation are intrinsically unable to fully represent the complexity of the underlying methylation substrate. Epihaplotype-based approaches, on the other hand, allow methylation profiles of cell populations to be studied at the single molecule level. For such investigations, next-generation sequencing techniques can be used, both for quantitative and for epihaplotype analysis. Currently available tools for methylation analysis lack output formats that explicitly report CpG methylation profiles at the single molecule level, as well as statistical tools suited to their interpretation. Here we present ampliMethProfiler, a Python-based pipeline for the extraction and statistical epihaplotype analysis of amplicons from targeted deep bisulfite sequencing of multiple DNA regions. The ampliMethProfiler tool provides an easy, user-friendly way to extract and analyze the epihaplotype composition of reads from targeted bisulfite sequencing experiments. ampliMethProfiler is written in the Python language and requires a local installation of BLAST and (optionally) QIIME tools. It can be run on Linux and OS X platforms. The software is open source and freely available at http://amplimethprofiler.sourceforge.net.
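The epihaplotype idea is easy to state in code: each bisulfite read covering an amplicon is a binary string with one bit per CpG, and counting whole strings preserves co-methylation patterns that per-site averages lose. The reads below are invented for illustration.

```python
from collections import Counter

# One bit per CpG (1 = methylated); each read is one molecule's epihaplotype.
reads = ["1101", "1101", "0000", "1111", "0000", "1101"]

epihaplotypes = Counter(reads)
per_cpg_mean = [sum(int(r[i]) for r in reads) / len(reads) for i in range(4)]

print(epihaplotypes.most_common(1))  # the dominant molecule-level profile
print(per_cpg_mean)                  # site averages cannot recover which CpGs co-methylate
```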
Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment
Nagar, Anurag; Hahsler, Michael
2013-01-01
Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and their analysis is becoming a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so-called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. We show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets, with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences.
Since the run time is linear and the sequences are converted into a compact clustering model, we are able to identify conserved regions fast or even interactively using a standard PC. Our method has many potential applications such as finding characteristic signature sequences for families of organisms and studying conserved and variable regions in, for example, 16S rRNA. PMID:24564200
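The alignment-free core of the method, comparing segments via p-mer frequency counts instead of edit distance, can be sketched in a few lines. The Manhattan distance between count vectors used here is a simple illustrative proxy; the paper's exact weighting and clustering step may differ.

```python
from collections import Counter
from itertools import product

def pmer_profile(segment, p=2):
    # Frequency vector of all p-mers occurring in the segment.
    counts = Counter(segment[i:i + p] for i in range(len(segment) - p + 1))
    return [counts["".join(m)] for m in product("ACGT", repeat=p)]

def profile_distance(a, b, p=2):
    # Cheap, alignment-free proxy for edit distance between two segments.
    return sum(abs(x - y) for x, y in zip(pmer_profile(a, p), pmer_profile(b, p)))

# Similar segments have close profiles; dissimilar ones do not.
print(profile_distance("ACGTACGT", "ACGTACGA"))   # -> 2 (small)
print(profile_distance("ACGTACGT", "GGGGGGGG"))   # -> 14 (large)
```

Segments whose profiles fall within a distance threshold would then be grouped by the stream-clustering step into a quasi-alignment.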
preAssemble: a tool for automatic sequencer trace data processing.
Adzhubei, Alexei A; Laerdahl, Jon K; Vlasova, Anna V
2006-01-17
Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment, or sequencers. Each file contains information which can be interpreted by specialised software to reveal the sequence (base calling). This is done by the sequencer's proprietary software or publicly available programs. Depending on the size of a sequencing project, the number of trace files can vary from just a few to thousands of files. Sequencing quality assessment on various criteria is important at the stage preceding clustering and contig assembly. Two major publicly available packages, Phred and Staden, are used by preAssemble to perform sequence quality processing. The preAssemble pre-assembly sequence processing pipeline has been developed for small to large scale automatic processing of DNA sequencer chromatogram (trace) data. The Staden Package Pregap4 module and the base-calling program Phred are utilized in the pipeline, which produces detailed and self-explanatory output that can be displayed with a web browser. preAssemble can be used successfully with very little previous experience; however, options for parameter tuning are provided for advanced users. preAssemble runs under UNIX and LINUX operating systems. It is available for downloading and will run as stand-alone software. It can also be accessed on the Norwegian Salmon Genome Project web site, where preAssemble jobs can be run on the project server. preAssemble is a tool for quality assessment of sequences generated by automatic sequencing equipment. preAssemble is flexible, since both interactive jobs on the preAssemble server and the stand-alone downloadable version are available. Virtually no previous experience is necessary to run a default preAssemble job; on the other hand, options for parameter tuning are provided. Consequently preAssemble can be used as efficiently for just several trace files as for large scale sequence processing.
Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns
2013-01-01
It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enough for searching large sequence databases, because of the high computational costs of the underlying sequence-structure alignment problem. We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme to optimally reuse the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by up to a factor of 45. Our best new index-based algorithm achieves a speedup factor of 560. The presented methods achieve considerable speedups compared to the best previous method.
This, together with the expected sublinear running time of the presented index-based algorithms, allows for the first time approximate matching of RNA sequence-structure patterns in large sequence databases. Beyond the algorithmic contributions, we provide RaligNAtor, a robust and well-documented open-source software package implementing the algorithms presented in this manuscript. The RaligNAtor software is available at http://www.zbh.uni-hamburg.de/ralignator. PMID:23865810
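The sequence-level core of such a search, finding every position where a pattern matches a substring of the target within a user-defined edit distance, is a small dynamic program. The sketch below is sequence-only (it has no base-pair edit operations, which are the paper's key extension) and uses a free match start, so each text position reports the best edit cost of the pattern against any substring ending there.

```python
def semiglobal_matches(pattern, text, k):
    # Column DP over pattern positions; cur[0] = 0 makes the match start
    # free, so cur[m] at text position j is the best edit cost of the
    # pattern against any substring of text ending at j.
    m = len(pattern)
    prev = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, 1):
        cur = [0]
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur.append(min(prev[i - 1] + cost,  # match / substitution
                           prev[i] + 1,         # extra text character
                           cur[i - 1] + 1))     # unmatched pattern character
        if cur[m] <= k:
            hits.append(j)
        prev = cur
    return hits

print(semiglobal_matches("ACGU", "GGACGUCCACGACC", 1))  # -> [5, 6, 7, 11, 12]
```

The paper's computing scheme additionally reuses these matrix entries across all substrings and prunes substrings that provably cannot match.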
2012-01-01
Adaptive divergence driven by environmental heterogeneity has long been a fascinating topic in ecology and evolutionary biology. The study of the genetic basis of adaptive divergence has, however, been greatly hampered by a lack of genomic information. The recent development of transcriptome sequencing provides an unprecedented opportunity to generate large amounts of genomic data for detailed investigations of the genetics of adaptive divergence in non-model organisms. Herein, we used the Illumina sequencing platform to sequence the transcriptome of brain and liver tissues from a single individual of the Vinous-throated Parrotbill, Paradoxornis webbianus bulomachus, an ecologically important avian species in Taiwan with a wide elevational range of sea level to 3100 m. Our 10.1 Gbp of sequences were first assembled based on Zebra Finch (Taeniopygia guttata) and chicken (Gallus gallus) RNA references. The remaining reads were then de novo assembled. After filtering out contigs with low coverage (<10X), we retained 67,791 of 487,336 contigs, which covered approximately 5.3% of the P. w. bulomachus genome. Of 7,779 contigs retained for a top-hit species distribution analysis, the majority (about 86%) were matched to known Zebra Finch and chicken transcripts. We also annotated 6,365 contigs to gene ontology (GO) terms: in total, 122 GO-slim terms were assigned, including biological process (41%), molecular function (32%), and cellular component (27%). Many potential genetic markers for future adaptive genomic studies were also identified: 8,589 single nucleotide polymorphisms, 1,344 simple sequence repeats and 109 candidate genes that might be involved in elevational or climate adaptation. Our study shows that transcriptome data can serve as a rich genetic resource, even for a single run of short-read sequencing from a single individual of a non-model species.
This is the first study providing transcriptomic information for species in the avian superfamily Sylvioidea, which comprises more than 1,000 species. Our data can be used to study adaptive divergence in heterogeneous environments and investigate other important ecological and evolutionary questions in parrotbills from different populations and even in other species in the Sylvioidea. PMID:22530590
HEP Computing Tools, Grid and Supercomputers for Genome Sequencing Studies
NASA Astrophysics Data System (ADS)
De, K.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Novikov, A.; Poyda, A.; Tertychnyy, I.; Wenaus, T.
2017-10-01
PanDA, the Production and Distributed Analysis workload management system, has been developed to address the data processing and analysis challenges of the ATLAS experiment at the LHC. Recently PanDA has been extended to run HEP scientific applications on Leadership Class Facilities and supercomputers. The success of the projects using PanDA beyond HEP and the Grid has drawn attention from other compute-intensive sciences such as bioinformatics. Recent advances in Next Generation Genome Sequencing (NGS) technology have led to increasing streams of sequencing data that need to be processed, analysed and made available for bioinformaticians worldwide. Analysis of genome sequencing data using the popular software pipeline PALEOMIX can take a month even when run on a powerful computing resource. In this paper we describe the adaptation of the PALEOMIX pipeline to run in a distributed computing environment powered by PanDA. To run the pipeline, we split input files into chunks which are processed separately on different nodes as independent PALEOMIX inputs, and finally merge the output files; this closely mirrors how ATLAS processes and simulates data. We dramatically decreased the total wall time thanks to automated job (re)submission and brokering within PanDA. Using software tools developed initially for HEP and the Grid can reduce payload execution time for mammoth DNA samples from weeks to days.
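The chunking strategy described above is a classic scatter/gather pattern. The sketch below stands in for it with an in-process thread pool and a toy per-chunk payload (GC counting); in the real deployment each chunk is a separate PanDA job running PALEOMIX, and the merge step combines the pipeline's output files.

```python
from concurrent.futures import ThreadPoolExecutor

def worker(chunk):
    # Toy payload standing in for a PALEOMIX run on one chunk of reads.
    return sum(seq.count("G") + seq.count("C") for seq in chunk)

def scatter_gather(reads, chunk_size=2):
    # Scatter: split the input into fixed-size chunks.
    chunks = [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]
    # Process: one worker per chunk (threads here, grid jobs in reality).
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(worker, chunks))
    # Gather: merge the partial outputs into one result.
    return sum(partials)

print(scatter_gather(["ACGT", "GGCC", "ATAT", "CGCG", "TTGG"]))  # -> 12
```

The result is independent of the chunking, which is what makes automated (re)submission of individual chunk jobs safe.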
Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca
2015-01-01
Few studies have investigated the donkey (Equus asinus) at the whole-genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor-based sequencing platform (the Ion Proton sequencer) and compared the obtained sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than the number aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionarily important in equids. Comparing Y-chromosome regions, we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated by combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450
SCALCE: boosting sequence compression algorithms using locally consistent encoding.
Hach, Faraz; Numanagic, Ibrahim; Alkan, Can; Sahinalp, S Cenk
2012-12-01
High-throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for the computational infrastructure. Data management, storage and analysis have become major logistical obstacles for those adopting the new platforms. The requirement for large investment for this purpose almost signalled the end of the Sequence Read Archive hosted at the National Center for Biotechnology Information (NCBI), which holds most of the sequence data generated worldwide. Currently, most HTS data are compressed through general purpose algorithms such as gzip. These algorithms are not designed for compressing data generated by the HTS platforms; for example, they do not take advantage of the specific nature of genomic sequence data, that is, limited alphabet size and high similarity among reads. Fast and efficient compression algorithms designed specifically for HTS data should be able to address some of the issues in data management, storage and communication. Such algorithms would also help with analysis, provided they offer additional capabilities such as random access to any read and indexing for efficient sequence similarity search. Here we present SCALCE, a 'boosting' scheme based on the Locally Consistent Parsing technique, which reorganizes the reads in a way that results in a higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome. Our tests indicate that SCALCE can improve the compression rate achieved through gzip by a factor of 4.19 when the goal is to compress the reads alone. In fact, on SCALCE-reordered reads, gzip running time can improve by a factor of 15.06 on a standard PC with a single core and 6 GB memory. Interestingly, even the running time of SCALCE + gzip improves on that of gzip alone by a factor of 2.09.
When compared with the recently published BEETL, which aims to sort the (inverted) reads in lexicographic order for improving bzip2, SCALCE + gzip provides up to 2.01 times better compression while improving the running time by a factor of 5.17. SCALCE also provides the option to compress the quality scores as well as the read names, in addition to the reads themselves. This is achieved by compressing the quality scores through order-3 Arithmetic Coding (AC) and the read names through gzip, exploiting the reordering SCALCE provides on the reads. This way, in comparison with gzip compression of the unordered FASTQ files (including reads, read names and quality scores), SCALCE (together with gzip and arithmetic encoding) can provide up to a 3.34-fold improvement in the compression rate and a 1.26-fold improvement in running time. Our algorithm, SCALCE (Sequence Compression Algorithm using Locally Consistent Encoding), is implemented in C++ with both gzip and bzip2 compression options. It also supports multithreading when the gzip option is selected and the pigz binary is available. It is available at http://scalce.sourceforge.net. fhach@cs.sfu.ca or cenk@cs.sfu.ca Supplementary data are available at Bioinformatics online.
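The 'boosting' idea, reordering reads so similar ones sit next to each other before handing the stream to a generic compressor, can be sketched with a toy core function. Choosing each read's lexicographically smallest k-mer as its core is an invented stand-in for SCALCE's Locally Consistent Parsing; only the group-then-compress pattern is the point.

```python
from collections import defaultdict

def core_kmer(read, k=4):
    # Cheap, deterministic 'core' for bucketing: the lexicographically
    # smallest k-mer. SCALCE derives cores via Locally Consistent Parsing.
    return min(read[i:i + k] for i in range(len(read) - k + 1))

def reorder(reads):
    # Bucket reads sharing a core, then emit buckets contiguously so that
    # similar reads are adjacent in the stream fed to gzip or bzip2.
    buckets = defaultdict(list)
    for r in reads:
        buckets[core_kmer(r)].append(r)
    return [r for core in sorted(buckets) for r in buckets[core]]

reads = ["TTGGGGAC", "AAAACGTT", "GGGGTTTA", "AAAACGTA"]
print(reorder(reads))  # the two AAAA-core reads end up adjacent
```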
Comparison of muscle synergies for running between different foot strike patterns
Nishida, Koji; Hagio, Shota; Kibushi, Benio; Moritani, Toshio; Kouzaki, Motoki
2017-01-01
It is well known that humans run with a fore-foot strike (FFS), a mid-foot strike (MFS) or a rear-foot strike (RFS). A modular neural control mechanism of human walking and running has been discussed in terms of muscle synergies. However, the neural control mechanisms for different foot strike patterns during running have been overlooked even though kinetic and kinematic differences between different foot strike patterns have been reported. Thus, we examined the differences in the neural control mechanisms of human running between FFS and RFS by comparing the muscle synergies extracted from each foot strike pattern during running. Muscle synergies were extracted using non-negative matrix factorization with electromyogram activity recorded bilaterally from 12 limb and trunk muscles in ten male subjects during FFS and RFS running at different speeds (5–15 km/h). Six muscle synergies were extracted from all conditions, and each synergy had a specific function and a single main peak of activity in a cycle. The six muscle synergies were similar between FFS and RFS as well as across subjects and speeds. However, some muscle weightings showed significant differences between FFS and RFS, especially the weightings of the tibialis anterior of the landing leg in synergies activated just before touchdown. The activation patterns of the synergies were also different for each foot strike pattern in terms of the timing, duration, and magnitude of the main peak of activity. These results suggest that the central nervous system controls running by sending a sequence of signals to six muscle synergies. Furthermore, a change in the foot strike pattern is accomplished by modulating the timing, duration and magnitude of the muscle synergy activity and by selectively activating other muscle synergies or subsets of the muscle synergies. PMID:28158258
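The synergy-extraction step named above, non-negative matrix factorization, can be illustrated with a tiny pure-Python version of the Lee-Seung multiplicative updates. This is a toy stand-in with invented two-'muscle' data; real EMG studies use vetted implementations (e.g. scikit-learn's NMF) on 12-muscle recordings.

```python
import random

def nmf(V, k, iters=200, seed=0):
    """Multiplicative-update NMF on nested lists: V (m x n) is approximated
    by W (m x k) @ H (k x n), with all entries kept non-negative."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]

    def matmul(A, B):
        return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def T(A):
        return [list(col) for col in zip(*A)]

    eps = 1e-9
    for _ in range(iters):
        num, den = matmul(T(W), V), matmul(T(W), matmul(W, H))
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        num, den = matmul(V, T(H)), matmul(matmul(W, H), T(H))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H

def frob_err(V, W, H):
    WH = [[sum(W[i][t] * H[t][j] for t in range(len(H)))
           for j in range(len(V[0]))] for i in range(len(V))]
    return sum((V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))

# Two 'muscles' sharing one burst-like activation: a single synergy (k=1)
# reconstructs the data almost exactly.
V = [[0.0, 1.0, 2.0, 1.0, 0.0],
     [0.0, 2.0, 4.0, 2.0, 0.0]]
W, H = nmf(V, k=1)
print(frob_err(V, W, H))  # near zero for this rank-1 matrix
```

Here W plays the role of the muscle weightings and H the synergy activation pattern over time.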
Nanopore DNA Sequencing and Genome Assembly on the International Space Station.
Castro-Wallace, Sarah L; Chiu, Charles Y; John, Kristen K; Stahl, Sarah E; Rubins, Kathleen H; McIntyre, Alexa B R; Dworkin, Jason P; Lupisella, Mark L; Smith, David J; Botkin, Douglas J; Stephenson, Timothy A; Juul, Sissel; Turner, Daniel J; Izquierdo, Fernando; Federman, Scot; Stryke, Doug; Somasekar, Sneha; Alexander, Noah; Yu, Guixia; Mason, Christopher E; Burton, Aaron S
2017-12-21
We evaluated the performance of the MinION DNA sequencer in-flight on the International Space Station (ISS), and benchmarked its performance off-Earth against the MinION, Illumina MiSeq, and PacBio RS II sequencing platforms in terrestrial laboratories. Samples contained equimolar mixtures of genomic DNA from lambda bacteriophage, Escherichia coli (strain K12, MG1655) and Mus musculus (female BALB/c mouse). Nine sequencing runs were performed aboard the ISS over a 6-month period, yielding a total of 276,882 reads with no apparent decrease in performance over time. From sequence data collected aboard the ISS, we constructed directed assemblies of the ~4.6 Mb E. coli genome, ~48.5 kb lambda genome, and a representative M. musculus sequence (the ~16.3 kb mitochondrial genome), at 100%, 100%, and 96.7% consensus pairwise identity, respectively; de novo assembly of the E. coli genome from raw reads yielded a single contig comprising 99.9% of the genome at 98.6% consensus pairwise identity. Simulated real-time analyses of in-flight sequence data using an automated bioinformatic pipeline and laptop-based genomic assembly demonstrated the feasibility of sequencing analysis and microbial identification aboard the ISS. These findings illustrate the potential for sequencing applications including disease diagnosis, environmental monitoring, and elucidating the molecular basis for how organisms respond to spaceflight.
Next-Generation Molecular Testing of Newborn Dried Blood Spots for Cystic Fibrosis.
Lefterova, Martina I; Shen, Peidong; Odegaard, Justin I; Fung, Eula; Chiang, Tsoyu; Peng, Gang; Davis, Ronald W; Wang, Wenyi; Kharrazi, Martin; Schrijver, Iris; Scharfe, Curt
2016-03-01
Newborn screening for cystic fibrosis enables early detection and management of this debilitating genetic disease. Implementing comprehensive CFTR analysis using Sanger sequencing as a component of confirmatory testing of all screen-positive newborns has remained impractical due to relatively lengthy turnaround times and high cost. Here, we describe CFseq, a highly sensitive, specific, rapid (<3 days), and cost-effective assay for comprehensive CFTR gene analysis from dried blood spots, the common newborn screening specimen. The unique design of CFseq integrates optimized dried blood spot sample processing, a novel multiplex amplification method from as little as 1 ng of genomic DNA, and multiplex next-generation sequencing of 96 samples in a single run to detect all relevant CFTR mutation types. Sequence data analysis utilizes publicly available software supplemented by an expert-curated compendium of >2000 CFTR variants. Validation studies across 190 dried blood spots demonstrated 100% sensitivity and a positive predictive value of 100% for single-nucleotide variants and insertions and deletions and complete concordance across the polymorphic poly-TG and consecutive poly-T tracts. Additionally, we accurately detected both a known exon 2,3 deletion and a previously undetected exon 22,23 deletion. CFseq is thus able to replace all existing CFTR molecular assays with a single robust, definitive assay at significant cost and time savings and could be adapted to high-throughput screening of other inherited conditions. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Using video-oriented instructions to speed up sequence comparison.
Wozniak, A
1997-04-01
This document presents an implementation of the well-known Smith-Waterman algorithm for comparison of protein and nucleic acid sequences, using specialized video instructions. These instructions, SIMD-like in their design, make it possible to parallelize the algorithm at the instruction level. Benchmarks on an UltraSPARC running at 167 MHz show a speed-up factor of two compared to the same algorithm implemented with integer instructions on the same machine. Performance reaches over 18 million matrix cells per second on a single processor, giving, to our knowledge, the fastest implementation of the Smith-Waterman algorithm on a workstation. The accelerated procedure was introduced in LASSAP--a LArge Scale Sequence compArison Package developed at INRIA--which handles parallelism at a higher level. On a SUN Enterprise 6000 server with 12 processors, a speed of nearly 200 million matrix cells per second has been obtained. A sequence of length 300 amino acids is scanned against SWISSPROT R33 (18,531,385 residues) in 29 s. The procedure is not restricted to databank scanning; it applies to all cases handled by LASSAP (intra- and inter-bank comparisons, Z-score computation, etc.).
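The core of the comparison being accelerated is the Smith-Waterman dynamic-programming recurrence. A minimal scalar reference version can be sketched as follows; the scoring parameters are illustrative assumptions, not the ones used in LASSAP:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # DP matrix, first row/col = 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cell scores are floored at zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

The SIMD acceleration described in the paper computes many such cells per instruction; the recurrence itself is unchanged.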
Discovering Motifs in Biological Sequences Using the Micron Automata Processor.
Roy, Indranil; Aluru, Srinivas
2016-01-01
Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26, 11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability to solve much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39, 18) and (40, 17). The paper serves as a useful guide to solving problems using this new accelerator technology.
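Although searching the full candidate space is NP-complete, the (l, d) criterion itself is easy to verify for a single candidate motif. A brute-force checker, as a sketch of the definition used above:

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within_d(motif, seq, d):
    """True if motif occurs in seq with at most d substitutions."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))

def is_ld_motif(motif, seqs, d, q):
    """True if motif occurs (within d substitutions) in at least q sequences."""
    return sum(occurs_within_d(motif, s, d) for s in seqs) >= q
```

The automata processor effectively runs one NFA per candidate in parallel, replacing the inner loops above with hardware pattern matching.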
Devesse, Laurence; Ballard, David; Davenport, Lucinda; Riethorst, Immy; Mason-Buck, Gabriella; Syndercombe Court, Denise
2018-05-01
By using sequencing technology to genotype loci of forensic interest, it is possible to simultaneously target autosomal, X and Y STRs as well as identity, ancestry and phenotype informative SNPs, resulting in a breadth of data obtained from a single run that is considerable when compared to that generated with standard technologies. It is important, however, that this information aligns with the genotype data currently obtained using commercially available kits for CE-based investigations, so that results are compatible with existing databases and hence can be of use to the forensic community. In this work, 400 samples were typed using commercially available STR kits and CE, as well as the Illumina ForenSeq™ DNA Signature Prep Kit and MiSeq® FGx, to assess concordance of autosomal STRs and population variability. Results show a concordance rate between the two technologies exceeding 99.98%, while numerous novel sequence-based alleles are described. In order to make use of the sequence variation observed, sequence-specific allele frequencies were generated for White British and British Chinese populations. Copyright © 2017 Elsevier B.V. All rights reserved.
Herbold, Craig W.; Pelikan, Claus; Kuzyk, Orest; Hausmann, Bela; Angel, Roey; Berry, David; Loy, Alexander
2015-01-01
High throughput sequencing of phylogenetic and functional gene amplicons provides tremendous insight into the structure and functional potential of complex microbial communities. Here, we introduce a highly adaptable and economical PCR approach to barcoding and pooling libraries of numerous target genes. In this approach, we replace gene- and sequencing-platform-specific fusion primers with general, interchangeable barcoding primers, enabling nearly limitless customized barcode-primer combinations. Compared to barcoding with long fusion primers, our multiple-target-gene approach is more economical because it requires fewer primers overall and is based on short primers with generally lower synthesis and purification costs. To highlight our approach, we pooled over 900 different small-subunit rRNA and functional gene amplicon libraries obtained from various environmental or host-associated microbial community samples into a single, paired-end Illumina MiSeq run. Although the amplicon regions ranged in size from approximately 290 to 720 bp, we found no significant systematic sequencing bias related to amplicon length or gene target. Our results indicate that this flexible multiplexing approach produces large, diverse, and high quality sets of amplicon sequence data for modern studies in microbial ecology. PMID:26236305
Romer, Katherine A.; Kayombya, Guy-Richard; Fraenkel, Ernest
2007-01-01
WebMOTIFS provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Several studies have shown that the accuracy of motif discovery can be significantly improved by using multiple de novo motif discovery programs and using randomized control calculations to identify the most significant motifs or by using Bayesian approaches. WebMOTIFS makes it easy to apply these strategies. Using a single submission form, users can run several motif discovery programs and score, cluster and visualize the results. In addition, the Bayesian motif discovery program THEME can be used to determine the class of transcription factors that is most likely to regulate a set of sequences. Input can be provided as a list of gene or probe identifiers. Used with the default settings, WebMOTIFS accurately identifies biologically relevant motifs from diverse data in several species. WebMOTIFS is freely available at http://fraenkel.mit.edu/webmotifs. PMID:17584794
Mulder, Kevin P.; Cortazar-Chinarro, Maria; Harris, D. James; Crottini, Angelica; Grant, Evan H. Campbell; Fleischer, Robert C.; Savage, Anna E.
2017-01-01
The Major Histocompatibility Complex (MHC) is a genomic region encoding immune loci that are important and frequently used markers in studies of adaptive genetic variation and disease resistance. Given the primary role of infectious diseases in contributing to global amphibian declines, we characterized the hypervariable exon 2 and flanking introns of the MHC Class IIβ chain for 17 species of frogs in the Ranidae, a speciose and cosmopolitan family facing widespread pathogen infections and declines. We find high levels of genetic variation concentrated in the Peptide Binding Region (PBR) of the exon. Ten codons are under positive selection, nine of which are located in the mammal-defined PBR. We hypothesize that the tenth codon (residue 21) is an amphibian-specific PBR site that may be important in disease resistance. Trans-species and trans-generic polymorphisms are evident from exon-based genealogies, and co-phylogenetic analyses between intron, exon and mitochondrial based reconstructions reveal incongruent topologies, likely due to different locus histories. We developed two sets of barcoded adapters that reliably amplify a single and likely functional locus in all screened species using both 454 and Illumina based sequencing methods. These primers provide a resource for multiplexing and directly sequencing hundreds of samples in a single sequencing run, avoiding the labour and chimeric sequences associated with cloning, and enabling MHC population genetic analyses. Although the primers are currently limited to the 17 species we tested, these sequences and protocols provide a useful genetic resource and can serve as a starting point for future disease, adaptation and conservation studies across a range of anuran taxa.
Integrated Building Management System (IBMS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anita Lewis
This project provides a combination of software and services that more easily and cost-effectively help to achieve optimized building performance and energy efficiency. Featuring an open-platform, cloud-hosted application suite and an intuitive user experience, this solution simplifies a traditionally very complex process by collecting data from disparate building systems and creating a single, integrated view of building and system performance. The Fault Detection and Diagnostics algorithms developed within the IBMS have been designed and tested as an integrated component of the control algorithms running the equipment being monitored. The algorithms identify the normal control behaviors of the equipment without interfering with the equipment control sequences. The algorithms also work without interfering with any cooperative control sequences operating between different pieces of equipment or building systems. In this manner the FDD algorithms create an integrated building management system.
Traverse, Charles C; Ochman, Howard
2017-08-29
Advances in sequencing technologies have enabled direct quantification of genome-wide errors that occur during RNA transcription. These errors occur at rates that are orders of magnitude higher than rates during DNA replication, but due to technical difficulties such measurements have been limited to single-base substitutions and have not yet quantified the scope of transcription insertions and deletions. Previous reporter gene assay findings suggested that transcription indels are produced exclusively by elongation complex slippage at homopolymeric runs, so we enumerated indels across the protein-coding transcriptomes of Escherichia coli and Buchnera aphidicola, which differ widely in their genomic base compositions and incidence of repeat regions. As anticipated from prior assays, transcription insertions prevailed in homopolymeric runs of A and T; however, transcription deletions arose in much more complex sequences and were rarely associated with homopolymeric runs. By reconstructing the relocated positions of the elongation complex as inferred from the sequences inserted or deleted during transcription, we show that continuation of transcription after slippage hinges on the degree of nucleotide complementarity within the RNA:DNA hybrid at the new DNA template location. IMPORTANCE The high level of mistakes generated during transcription can result in the accumulation of malfunctioning and misfolded proteins which can alter global gene regulation and in the expenditure of energy to degrade these nonfunctional proteins. The transcriptome-wide occurrence of base substitutions has been elucidated in bacteria, but information on transcription insertions and deletions, errors that potentially have more dire effects on protein function, is limited to reporter gene constructs. Here, we capture the transcriptome-wide spectrum of insertions and deletions in Escherichia coli and Buchnera aphidicola and show that they occur at rates approaching those of base substitutions.
Knowledge of the full extent of sequences subject to transcription indels supports a new model of bacterial transcription slippage, one that relies on the number of complementary bases between the transcript and the DNA template to which it slipped. Copyright © 2017 Traverse and Ochman.
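Locating homopolymeric runs, the sequence feature the study associates with transcription insertions, can be sketched with a minimal scanner:

```python
from itertools import groupby

def homopolymer_runs(seq, min_len=3):
    """Return (start, base, length) for each run of min_len or more
    identical bases in seq."""
    runs, pos = [], 0
    for base, group in groupby(seq):
        n = sum(1 for _ in group)  # length of this run of identical bases
        if n >= min_len:
            runs.append((pos, base, n))
        pos += n
    return runs
```

Given such run coordinates, one can ask (as the study does at transcriptome scale) whether observed indel positions fall inside or outside homopolymeric runs.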
A massive parallel sequencing workflow for diagnostic genetic testing of mismatch repair genes
Hansen, Maren F; Neckmann, Ulrike; Lavik, Liss A S; Vold, Trine; Gilde, Bodil; Toft, Ragnhild K; Sjursen, Wenche
2014-01-01
The purpose of this study was to develop a massive parallel sequencing (MPS) workflow for diagnostic analysis of mismatch repair (MMR) genes using the GS Junior system (Roche). A pathogenic variant in one of four MMR genes, (MLH1, PMS2, MSH6, and MSH2), is the cause of Lynch Syndrome (LS), which mainly predispose to colorectal cancer. We used an amplicon-based sequencing method allowing specific and preferential amplification of the MMR genes including PMS2, of which several pseudogenes exist. The amplicons were pooled at different ratios to obtain coverage uniformity and maximize the throughput of a single-GS Junior run. In total, 60 previously identified and distinct variants (substitutions and indels), were sequenced by MPS and successfully detected. The heterozygote detection range was from 19% to 63% and dependent on sequence context and coverage. We were able to distinguish between false-positive and true-positive calls in homopolymeric regions by cross-sample comparison and evaluation of flow signal distributions. In addition, we filtered variants according to a predefined status, which facilitated variant annotation. Our study shows that implementation of MPS in routine diagnostics of LS can accelerate sample throughput and reduce costs without compromising sensitivity, compared to Sanger sequencing. PMID:24689082
Structator: fast index-based search for RNA sequence-structure patterns
2011-01-01
Background The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires matching sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. Results We present a novel method and readily applicable software for time-efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This allows base-pairing information to be exploited for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. Conclusions The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases.
RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator. PMID:21619640
Compacting de Bruijn graphs from sequencing data quickly and in low memory.
Chikhi, Rayan; Limasset, Antoine; Medvedev, Paul
2016-06-15
As the quantity of data per sequencing experiment increases, the challenges of fragment assembly are becoming increasingly computational. The de Bruijn graph is a widely used data structure in fragment assembly algorithms, used to represent the information from a set of reads. Compaction is an important data reduction step in most de Bruijn graph based algorithms, where long simple paths are compacted into single vertices. Compaction has recently become the bottleneck in assembly pipelines, and improving its running time and memory usage is an important problem. We present an algorithm and a tool, bcalm 2, for the compaction of de Bruijn graphs. bcalm 2 is a parallel algorithm that distributes the input based on a minimizer hashing technique, allowing for good balance of memory usage throughout its execution. For human sequencing data, bcalm 2 reduces the computational burden of compacting the de Bruijn graph to roughly an hour and 3 GB of memory. We also applied bcalm 2 to the 22 Gbp loblolly pine and 20 Gbp white spruce sequencing datasets. Compacted graphs were constructed from raw reads in less than 2 days and 40 GB of memory on a single machine. Hence, bcalm 2 is at least an order of magnitude more efficient than other available methods. Source code of bcalm 2 is freely available at https://github.com/GATB/bcalm. Contact: rayan.chikhi@univ-lille1.fr. © The Author 2016. Published by Oxford University Press.
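The compaction operation itself, collapsing maximal non-branching paths of a node-centric de Bruijn graph into single unitigs, can be sketched as follows. This toy quadratic version ignores cycles and reverse complements and uses none of bcalm 2's minimizer-based distribution; it only illustrates the step being optimized:

```python
def compact(kmers):
    """Compact maximal non-branching paths of a de Bruijn graph whose
    nodes are the given k-mers (edge u->v when u[1:] == v[:-1])."""
    kmers = set(kmers)
    succ = {u: [v for v in kmers if u[1:] == v[:-1]] for u in kmers}
    pred = {u: [v for v in kmers if v[1:] == u[:-1]] for u in kmers}
    unitigs, seen = [], set()
    for u in sorted(kmers):
        if u in seen:
            continue
        # Skip interior nodes of a simple path; they are reached from its start.
        if len(pred[u]) == 1 and len(succ[pred[u][0]]) == 1:
            continue
        path = [u]
        while len(succ[path[-1]]) == 1:
            v = succ[path[-1]][0]
            if len(pred[v]) != 1 or v in seen or v == u:
                break
            path.append(v)
        seen.update(path)
        # Spell the unitig: first k-mer plus one new base per extension.
        unitigs.append(path[0] + "".join(p[-1] for p in path[1:]))
    return sorted(unitigs)
```

On real data the succ/pred dictionaries cannot be built by all-pairs comparison; bcalm 2's contribution is doing this at genome scale with bounded memory.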
Computational Prediction of miRNA Genes from Small RNA Sequencing Data
Kang, Wenjing; Friedländer, Marc R.
2015-01-01
Next-generation sequencing now for the first time allows researchers to gauge the depth and variation of entire transcriptomes. However, now that rare transcripts present in cells at single copies can be detected, more advanced computational tools are needed to accurately annotate and profile them. microRNAs (miRNAs) are 22-nucleotide small RNAs (sRNAs) that post-transcriptionally reduce the output of protein coding genes. They have established roles in numerous biological processes, including cancers and other diseases. During miRNA biogenesis, the sRNAs are sequentially cleaved from precursor molecules that have a characteristic hairpin RNA structure. The vast majority of new miRNA genes that are discovered are mined from small RNA sequencing (sRNA-seq), which can detect more than a billion RNAs in a single run. However, given that many of the detected RNAs are degradation products from all types of transcripts, the accurate identification of miRNAs remains a non-trivial computational problem. Here, we review the tools available to predict animal miRNAs from sRNA sequencing data. We present tools for generalist and specialist use cases, including prediction from massively pooled data or in species without a reference genome. We also present wet-lab methods used to validate predicted miRNAs, and approaches to computationally benchmark prediction accuracy. For each tool, we reference validation experiments and benchmarking efforts. Last, we discuss the future of the field. PMID:25674563
Myers, E W; Mount, D W
1986-01-01
We describe a program which may be used to find approximate matches to a short predefined DNA sequence in a larger target DNA sequence. The program predicts the usefulness of specific DNA probes and sequencing primers and finds nearly identical sequences that might represent the same regulatory signal. The program is written in the C programming language and will run on virtually any computer system with a C compiler, such as the IBM/PC and other computers running under the MS/DOS and UNIX operating systems. The program has been integrated into an existing software package for the IBM personal computer (see article by Mount and Conrad, this volume). Some examples of its use are given. PMID:3753785
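Approximate matching of a short pattern against a larger target, as the program above performs, is classically done with a semi-global edit-distance recurrence in which an alignment may begin anywhere in the target. A sketch, assuming unit costs for substitutions, insertions and deletions (the 1986 program's exact scoring is not specified here):

```python
def best_approx_matches(pattern, text, max_diffs):
    """Return 1-based end positions in text where pattern matches with
    at most max_diffs edits, via semi-global dynamic programming."""
    m = len(pattern)
    prev = list(range(m + 1))  # DP column for the empty text prefix
    hits = []
    for j, c in enumerate(text, 1):
        cur = [0]  # free start: the match may begin at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur.append(min(prev[i - 1] + cost,  # substitution / match
                           prev[i] + 1,         # deletion in pattern
                           cur[i - 1] + 1))     # insertion in pattern
        if cur[m] <= max_diffs:
            hits.append(j)
        prev = cur
    return hits
```

With max_diffs set to 0 this reduces to exact substring search; raising it reveals near-matches such as degenerate primer sites.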
Conformational flexibility of two RNA trimers explored by computational tools and database search.
Fadrná, Eva; Koca, Jaroslav
2003-04-01
Two RNA sequences, AAA and AUG, were studied by the conformational search program CICADA and by molecular dynamics (MD) in the framework of the AMBER force field, and also via a thorough PDB database search. CICADA was used to provide detailed information about conformers and conformational interconversions on the energy surfaces of the above molecules. Several conformational families were found for both sequences. Analysis of the results shows differences, especially in the energies of the individual families, and also in flexibility and concerted conformational movement. Therefore, several MD trajectories (altogether 16 ns) were run to obtain more details about both the stability of conformers belonging to different conformational families and the dynamics of the two systems. Results show that the trajectories strongly depend on the starting structure. When the MD simulations start from the global minimum found by CICADA, they provide a stable run, while MD starting from another conformational family generates a trajectory in which several different conformational families are visited. The results obtained by theoretical methods are compared with the thorough database search data. It is concluded that all but the highest-energy conformational families found in the theoretical results also appear in the experimental data. Registry numbers: adenylyl-(3' --> 5')-adenylyl-(3' --> 5')-adenosine [917-44-2]; adenylyl-(3' --> 5')-uridylyl-(3' --> 5')-guanosine [3494-35-7].
Implicit Learning of a Finger Motor Sequence by Patients with Cerebral Palsy After Neurofeedback.
Alves-Pinto, Ana; Turova, Varvara; Blumenstein, Tobias; Hantuschke, Conny; Lampe, Renée
2017-03-01
Facilitation of implicit learning of a hand motor sequence after a single session of neurofeedback training of alpha power recorded from the motor cortex has been shown in healthy individuals (Ros et al., Biological Psychology 95:54-58, 2014). This facilitation effect could be potentially applied to improve the outcome of rehabilitation in patients with impaired hand motor function. In the current study a group of ten patients diagnosed with cerebral palsy trained reduction of alpha power derived from brain activity recorded from right and left motor areas. Training was distributed in three periods of 8 min each. In between, participants performed a serial reaction time task with their non-dominant hand, to a total of five runs. A similar procedure was repeated a week or more later but this time training was based on simulated brain activity. Reaction times pooled across participants decreased on each successive run faster after neurofeedback training than after the simulation training. Also recorded were two 3-min baseline conditions, once with the eyes open, another with the eyes closed, at the beginning and end of the experimental session. No significant changes in alpha power with neurofeedback or with simulation training were obtained and no correlation with the reductions in reaction time could be established. Contributions for this are discussed.
Monitoring Error Rates In Illumina Sequencing.
Manley, Leigh J; Ma, Duanduan; Levine, Stuart S
2016-12-01
Guaranteeing high-quality next-generation sequencing data in a rapidly changing environment is an ongoing challenge. The introduction of the Illumina NextSeq 500 and the deprecation of specific metrics from Illumina's Sequencing Analysis Viewer (SAV; Illumina, San Diego, CA, USA) have made it more difficult to determine directly the baseline error rate of sequencing runs. To improve our ability to measure base quality, we have created an open-source tool to construct the Percent Perfect Reads (PPR) plot, previously provided by the Illumina sequencers. The PPR program is compatible with HiSeq 2000/2500, MiSeq, and NextSeq 500 instruments and provides an alternative to Illumina's quality value (Q) scores for determining run quality. Whereas Q scores are representative of run quality, they are often overestimated and are sourced from different look-up tables for each platform. The PPR's unique capabilities as a cross-instrument comparison device, as a troubleshooting tool, and as a tool for monitoring instrument performance can provide an increase in clarity over SAV metrics that is often crucial for maintaining instrument health. These capabilities are highlighted.
Skill-dependent proximal-to-distal sequence in team-handball throwing.
Wagner, Herbert; Pfusterschmied, Jürgen; Von Duvillard, Serge P; Müller, Erich
2012-01-01
The importance of proximal-to-distal sequencing in human performance throwing has been reported previously. However, a comprehensive comparison of the proximal-to-distal sequence in team-handball throwing in athletes with different training experience and competition is lacking. Therefore, the aim of the study was to compare the ball velocity and proximal-to-distal sequence in the team-handball standing throw with run-up of players of different skill (less experienced, experienced, and elite). Twenty-four male team-handball players (n = 8 for each group) performed five standing throws with run-up with maximal ball velocity and accuracy. Kinematics and ball trajectories were recorded with a Vicon motion capture system and joint movements were calculated. A specific proximal-to-distal sequence, where elbow flexion occurred before shoulder internal rotation, was found in all three groups. These results are in line with previous studies in team-handball. Furthermore, the results of the present study suggest that in the team-handball standing throw with run-up, increased playing experience is associated with an increase in ball velocity as well as a delayed start to trunk flexion.
Parvin, C A
1993-03-01
The error detection characteristics of quality-control (QC) rules that use control observations within a single analytical run are investigated. Unlike the evaluation of QC rules that span multiple analytical runs, most of the fundamental results regarding the performance of QC rules applied within a single analytical run can be obtained from statistical theory, without the need for simulation studies. The case of two control observations per run is investigated for ease of graphical display, but the conclusions can be extended to more than two control observations per run. Results are summarized in a graphical format that offers many interesting insights into the relations among the various QC rules. The graphs provide heuristic support to the theoretical conclusions that no QC rule is best under all error conditions, but the multirule that combines the mean rule and a within-run standard deviation rule offers an attractive compromise.
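The kind of within-run power result the paper obtains from statistical theory can be reproduced analytically. This sketch computes the probability that a simple mean rule on n control observations rejects a run under a systematic shift; the 3-SD control limit and the single mean rule are illustrative assumptions, not the paper's exact multirule:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def mean_rule_power(shift_sd, n=2, limit_sd=3.0):
    """P(the mean of n control observations falls outside +/- limit_sd,
    in SD units of the mean) given a systematic shift of shift_sd
    analytical SDs."""
    mu = shift_sd * sqrt(n)  # shift expressed in SDs of the mean
    return (1.0 - norm_cdf(limit_sd - mu)) + norm_cdf(-limit_sd - mu)
```

With no shift this returns the false-rejection rate of the limit (about 0.27% at 3 SD); power then rises with the size of the systematic error, which is the trade-off the paper's graphs summarize across rules.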
CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment.
Chen, Xi; Wang, Chen; Tang, Shanjiang; Yu, Ce; Zou, Quan
2017-06-24
The multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time at an acceptable level. Although there is a large body of work on MSA problems, existing approaches are either insufficient or contain implicit assumptions that limit their generality. First, the characteristics of users' sequences, including the sizes of datasets and the lengths of sequences, can take arbitrary values and are generally unknown before submission, which previous work has unfortunately ignored. Second, the center star strategy is suited for aligning similar sequences, but its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, given a heterogeneous CPU/GPU platform, prior studies consider MSA parallelization on GPU devices only, leaving the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of the computing resources by enabling workload computation on both CPU and GPU simultaneously. This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn^2) to O(mn). The experimental results show that CMSA achieves an up to 11× speedup and outperforms the state-of-the-art software. CMSA focuses on the multiple similar RNA/DNA sequence alignment and proposes a novel bitmap-based algorithm to improve the center star strategy.
We can conclude that harvesting the high performance of modern GPU is a promising approach to accelerate multiple sequence alignment. Besides, adopting the co-run computation model can maximize the entire system utilization significantly. The source code is available at https://github.com/wangvsa/CMSA .
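The center sequence selection step being optimized picks the sequence most similar to all the others, which then serves as the star center for pairwise alignments. In the sketch below, shared k-mer counts stand in for CMSA's bitmap-based similarity; this substitution is an assumption made purely for illustration:

```python
from itertools import combinations

def kmer_set(seq, k=4):
    """All k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def pick_center(seqs, k=4):
    """Return the index of the sequence with the greatest total k-mer
    overlap with all other sequences (a cheap proxy for alignment score)."""
    sets = [kmer_set(s, k) for s in seqs]
    scores = [0] * len(seqs)
    for i, j in combinations(range(len(seqs)), 2):
        shared = len(sets[i] & sets[j])
        scores[i] += shared
        scores[j] += shared
    return max(range(len(seqs)), key=scores.__getitem__)
```

The pairwise loop here is still quadratic in the number of sequences; CMSA's contribution is restructuring this stage so that selection cost drops from O(mn^2) to O(mn).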
Genome sequencing in microfabricated high-density picolitre reactors.
Margulies, Marcel; Egholm, Michael; Altman, William E; Attiya, Said; Bader, Joel S; Bemben, Lisa A; Berka, Jan; Braverman, Michael S; Chen, Yi-Ju; Chen, Zhoutao; Dewell, Scott B; Du, Lei; Fierro, Joseph M; Gomes, Xavier V; Godwin, Brian C; He, Wen; Helgesen, Scott; Ho, Chun Heen; Ho, Chun He; Irzyk, Gerard P; Jando, Szilveszter C; Alenquer, Maria L I; Jarvie, Thomas P; Jirage, Kshama B; Kim, Jong-Bum; Knight, James R; Lanza, Janna R; Leamon, John H; Lefkowitz, Steven M; Lei, Ming; Li, Jing; Lohman, Kenton L; Lu, Hong; Makhijani, Vinod B; McDade, Keith E; McKenna, Michael P; Myers, Eugene W; Nickerson, Elizabeth; Nobile, John R; Plant, Ramona; Puc, Bernard P; Ronan, Michael T; Roth, George T; Sarkis, Gary J; Simons, Jan Fredrik; Simpson, John W; Srinivasan, Maithreyan; Tartaro, Karrie R; Tomasz, Alexander; Vogt, Kari A; Volkmer, Greg A; Wang, Shally H; Wang, Yong; Weiner, Michael P; Yu, Pengguang; Begley, Richard F; Rothberg, Jonathan M
2005-09-15
The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.
NASA Astrophysics Data System (ADS)
Derkachov, G.; Jakubczyk, T.; Jakubczyk, D.; Archer, J.; Woźniak, M.
2017-07-01
Utilising the Compute Unified Device Architecture (CUDA) platform for Graphics Processing Units (GPUs) enables a significant reduction of computation time at moderate cost by means of parallel computing. In the paper [Jakubczyk et al., Opto-Electron. Rev., 2016] we reported using a GPU for Mie scattering inverse problem solving (up to an 800-fold speed-up). Here we report the development of two subroutines utilising the GPU at the data preprocessing stages of the inversion procedure: (i) a subroutine, based on ray tracing, for finding the spherical aberration correction function; (ii) a subroutine performing the conversion of an image to a 1D distribution of light intensity versus azimuth angle (i.e. a scattering diagram), fed from a movie-reading CPU subroutine running in parallel. All subroutines are incorporated in the PikeReader application, which we make available in a GitHub repository. PikeReader returns a sequence of intensity distributions versus a common azimuth angle vector, corresponding to the recorded movie. We obtained an overall ∼400-fold speed-up of calculations at the data preprocessing stages using CUDA code running on the GPU in comparison to single-thread MATLAB-only code running on a CPU.
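The second preprocessing subroutine (converting an image into an intensity-versus-azimuth distribution) can be illustrated on the CPU with NumPy. This is a sketch of the idea only; the function name, centre handling, and 1-degree binning are assumptions, not the authors' CUDA code:

```python
import numpy as np

def azimuthal_profile(image, center, n_bins=360):
    """Reduce a 2D image to mean intensity versus azimuth angle
    around a given (row, col) center -- a 'scattering diagram'."""
    cy, cx = center
    ys, xs = np.indices(image.shape)
    # Azimuth of every pixel, mapped into [0, 2*pi)
    theta = np.arctan2(ys - cy, xs - cx) % (2 * np.pi)
    edges = np.linspace(0.0, 2 * np.pi, n_bins + 1)
    idx = np.digitize(theta.ravel(), edges) - 1
    idx = np.minimum(idx, n_bins - 1)  # guard the upper edge
    sums = np.bincount(idx, weights=image.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return sums / np.maximum(counts, 1)  # empty bins stay 0

# A uniform image must give a flat profile (1 in every populated bin).
profile = azimuthal_profile(np.ones((64, 64)), center=(32, 32))
```

In the paper's pipeline this reduction runs per movie frame, which is why moving it to the GPU pays off; the NumPy version above is just the reference semantics.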
SCALCE: boosting sequence compression algorithms using locally consistent encoding
Hach, Faraz; Numanagić, Ibrahim; Sahinalp, S Cenk
2012-01-01
Motivation: High-throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for the computational infrastructure. Data management, storage and analysis have become major logistical obstacles for those adopting the new platforms. The requirement for large investment for this purpose almost signalled the end of the Sequence Read Archive hosted at the National Center for Biotechnology Information (NCBI), which holds most of the sequence data generated worldwide. Currently, most HTS data are compressed through general-purpose algorithms such as gzip. These algorithms are not designed for compressing data generated by the HTS platforms; for example, they do not take advantage of the specific nature of genomic sequence data, that is, limited alphabet size and high similarity among reads. Fast and efficient compression algorithms designed specifically for HTS data should be able to address some of the issues in data management, storage and communication. Such algorithms would also help with analysis, provided they offer additional capabilities such as random access to any read and indexing for efficient sequence similarity search. Here we present SCALCE, a ‘boosting’ scheme based on the Locally Consistent Parsing technique, which reorganizes the reads in a way that results in a higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome. Results: Our tests indicate that SCALCE can improve the compression rate achieved through gzip by a factor of 4.19 when the goal is to compress the reads alone. In fact, on SCALCE-reordered reads, gzip running time can improve by a factor of 15.06 on a standard PC with a single core and 6 GB memory. Interestingly, even the running time of SCALCE + gzip improves on that of gzip alone by a factor of 2.09.
When compared with the recently published BEETL, which aims to sort the (inverted) reads in lexicographic order to improve bzip2, SCALCE + gzip provides up to 2.01 times better compression while improving the running time by a factor of 5.17. SCALCE also provides the option to compress the quality scores and the read names in addition to the reads themselves. This is achieved by compressing the quality scores through order-3 Arithmetic Coding (AC) and the read names through gzip, exploiting the reordering SCALCE provides on the reads. This way, in comparison with gzip compression of the unordered FASTQ files (including reads, read names and quality scores), SCALCE (together with gzip and arithmetic encoding) can provide up to a 3.34-fold improvement in compression rate and a 1.26-fold improvement in running time. Availability: Our algorithm, SCALCE (Sequence Compression Algorithm using Locally Consistent Encoding), is implemented in C++ with both gzip and bzip2 compression options. It also supports multithreading when the gzip option is selected and the pigz binary is available. It is available at http://scalce.sourceforge.net. Contact: fhach@cs.sfu.ca or cenk@cs.sfu.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23047557
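Why reordering boosts a generic compressor can be shown with a toy experiment. The sketch below is not SCALCE's Locally Consistent Parsing (plain sorting stands in for the reordering step, and the read counts are arbitrary): duplicated reads scattered across a large file fall outside gzip's 32 KB match window, while after reordering they sit next to each other.

```python
import gzip
import random

random.seed(0)
# 2,000 distinct simulated 100-bp reads, each "sequenced" twice; in a real
# FASTQ the two members of a duplicate pair usually land far apart in the file.
distinct = ["".join(random.choice("ACGT") for _ in range(100))
            for _ in range(2000)]
reads = distinct * 2
random.shuffle(reads)

def gz_size(seqs):
    return len(gzip.compress("".join(seqs).encode()))

original = gz_size(reads)           # most duplicates fall outside the window
reordered = gz_size(sorted(reads))  # identical reads are now adjacent
print(original, reordered)
```

The reordered stream compresses substantially better with the very same compressor, which is the sense in which SCALCE "boosts" gzip or bzip2 without touching their internals.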
NASA Technical Reports Server (NTRS)
Backes, Paul G. (Inventor); Tso, Kam S. (Inventor)
1993-01-01
This invention relates to an operator interface for controlling a telerobot to perform tasks in a poorly modeled environment and/or within unplanned scenarios. The telerobot control system includes a remote robot manipulator linked to an operator interface. The operator interface includes a setup terminal, simulation terminal, and execution terminal for the control of the graphics simulator and local robot actuator as well as the remote robot actuator. These terminals may be combined in a single terminal. Complex tasks are developed from sequential combinations of parameterized task primitives and recorded teleoperations, and are tested by execution on a graphics simulator and/or local robot actuator, together with adjustable time delays. The novel features of this invention include the shared and supervisory control of the remote robot manipulator via the operator interface by pretested complex task sequences based on sequences of parameterized task primitives, combined with further teleoperation and run-time binding of parameters based on task context.
Optimized Next-Generation Sequencing Genotype-Haplotype Calling for Genome Variability Analysis
Navarro, Javier; Nevado, Bruno; Hernández, Porfidio; Vera, Gonzalo; Ramos-Onsins, Sebastián E
2017-01-01
The accurate estimation of nucleotide variability using next-generation sequencing data is challenged by the high number of sequencing errors produced by new sequencing technologies, especially for nonmodel species, where reference sequences may not be available and the read depth may be low due to limited budgets. The most popular single-nucleotide polymorphism (SNP) callers are designed to obtain a high SNP recovery and a low false discovery rate but are not designed to appropriately account for the frequency of the variants. Instead, algorithms designed to account for the frequency of SNPs give precise results for estimating the levels and the patterns of variability. These algorithms are focused on the unbiased estimation of the variability and not on the high recovery of SNPs. Here, we implemented a fast and optimized parallel algorithm that includes the method developed by Roesti et al and Lynch, which estimates the genotype of each individual at each site, considering the possibility of calling both bases of the genotype, a single one, or none. This algorithm does not consider the reference and is therefore independent of biases related to the specified reference nucleotide. The pipeline starts from a BAM file converted to pileup or mpileup format, and the software outputs a FASTA file. The new program not only reduces running times but also, given its improved use of resources, can be run on both smaller computers and large parallel machines, expanding its benefits to a wider range of researchers. The output file can be analyzed using software for population genetics analysis, such as the R library PopGenome, the software VariScan, and the program mstatspop for analyses considering positions with missing data. PMID:28894353
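A minimal per-site genotype caller in this spirit might look as follows. The depth and allele-frequency thresholds are illustrative assumptions, not the Roesti et al./Lynch likelihood method the pipeline actually implements; the point is only the three possible outcomes (both bases, a single base, or no call):

```python
from collections import Counter

def call_genotype(bases, min_depth=8, min_frac=0.2):
    """Toy diploid call from the pileup of bases observed at one site."""
    counts = Counter(bases)
    depth = sum(counts.values())
    if depth < min_depth:
        return None                      # too little evidence: no call
    alleles = [b for b, c in counts.most_common(2) if c / depth >= min_frac]
    if len(alleles) == 2:
        return tuple(sorted(alleles))    # call both bases (heterozygote)
    if len(alleles) == 1:
        return (alleles[0], alleles[0])  # call a single base (homozygote)
    return None

print(call_genotype("AAAAATTTTT"))  # ('A', 'T')
print(call_genotype("AAAAAAAAAT"))  # ('A', 'A'): the lone T looks like error
print(call_genotype("AAT"))         # None: depth below threshold
```

A likelihood-based caller replaces the hard thresholds with a per-base error model, but the output alphabet (two bases, one base, or no call per individual per site) is the same.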
Verbist, Bie M P; Thys, Kim; Reumers, Joke; Wetzels, Yves; Van der Borght, Koen; Talloen, Willem; Aerssens, Jeroen; Clement, Lieven; Thas, Olivier
2015-01-01
In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single base substitutions are the main error source and impede powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that are useful for differentiating errors from the real low-frequency mutations. A variant calling tool, Q-cpileup, is proposed, which exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is embedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from fastq files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup is able to reduce the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for individual samples in each sequencing run. Additionally, linkage information is kept between single-nucleotide polymorphisms as variants are called at the codon level. This enables virologists to have an immediate biological interpretation of the reported variants with respect to their antiviral drug responses. A comparison with existing SNP caller tools reveals that calling variants at the codon level with Q-cpileup results in an outstanding sensitivity while maintaining a good specificity for variants with frequencies down to 0.5%. VirVarSeq is available, together with a user's guide and test data, at SourceForge: http://sourceforge.net/projects/virtools/?source=directory.
Veeranagouda, Yaligara; Debono-Lagneaux, Delphine; Fournet, Hamida; Thill, Gilbert; Didier, Michel
2018-01-16
The emergence of clustered regularly interspaced short palindromic repeats-Cas9 (CRISPR-Cas9) gene editing systems has enabled the creation of specific mutants at low cost, in a short time and with high efficiency, in eukaryotic cells. Since a CRISPR-Cas9 system typically creates an array of mutations in targeted sites, a successful gene editing project requires careful selection of edited clones. This process can be very challenging, especially when working with multiallelic genes and/or polyploid cells (such as cancer and plant cells). Here we describe a next-generation sequencing method called CRISPR-Cas9 Edited Site Sequencing (CRES-Seq) for the efficient and high-throughput screening of CRISPR-Cas9-edited clones. CRES-Seq facilitates the precise genotyping of up to 96 CRISPR-Cas9-edited sites (CRES) in a single MiniSeq (Illumina) run at an approximate sequencing cost of $6/clone. CRES-Seq is particularly useful when multiple genes are simultaneously targeted by CRISPR-Cas9, and also for screening clones generated from multiallelic genes/polyploid cells.
GenomeGems: evaluation of genetic variability from deep sequencing data
2012-01-01
Background Detection of disease-causing mutations using deep sequencing technologies poses great challenges. In particular, organizing the large number of sequences generated so that potentially biologically relevant mutations are easily identified is a difficult task, yet only a limited number of automated, accessible tools exist for this purpose. Findings We developed GenomeGems to fill this gap by enabling the user to view and compare Single Nucleotide Polymorphisms (SNPs) from multiple datasets and to load the data onto the UCSC Genome Browser for an expanded and familiar visualization. As such, via automatic, clear and accessible presentation of processed deep sequencing data, our tool aims to facilitate ranking of genomic SNP calls. GenomeGems runs on a local Personal Computer (PC) and is freely available at http://www.tau.ac.il/~nshomron/GenomeGems. Conclusions GenomeGems enables researchers to identify potential disease-causing SNPs in an efficient manner. This enables rapid turnover of information and leads to further experimental SNP validation. The tool allows the user to compare and visualize SNPs from multiple experiments and to easily load SNP data onto the UCSC Genome Browser for further detailed information. PMID:22748151
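The compare-and-export idea reduces to set operations over SNP calls plus a BED-style serialization for the UCSC Genome Browser. Everything below (sample names, the (chrom, position, alt) tuple layout) is a hypothetical illustration of that workflow, not GenomeGems code:

```python
# Each dataset: a set of (chrom, 1-based position, alternate allele) calls.
sample_a = {("chr1", 101, "T"), ("chr1", 250, "G"), ("chr2", 77, "A")}
sample_b = {("chr1", 101, "T"), ("chr2", 77, "C")}

shared = sample_a & sample_b   # candidate SNPs seen in both experiments
only_a = sample_a - sample_b   # calls private to dataset A

# BED uses 0-based, half-open intervals; lines like these can be loaded
# into the UCSC Genome Browser as a custom track.
bed = ["%s\t%d\t%d\tSNP_%s" % (c, p - 1, p, alt) for c, p, alt in sorted(shared)]
print(bed)
```

Recurrence across independent datasets (the `shared` set) is one simple signal for ranking candidate disease-causing SNPs before experimental validation.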
Selecting sequence variants to improve genomic predictions for dairy cattle
USDA-ARS's Scientific Manuscript database
Millions of genetic variants have been identified by population-scale sequencing projects, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Methods of selecting sequence variants were compared using both simulated sequence genotypes and actual data from run ...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tan, H.
1999-03-31
The purpose of this research is to develop a multiplexed sample processing system in conjunction with multiplexed capillary electrophoresis for high-throughput DNA sequencing. The concept from DNA template to called bases was first demonstrated with a manually operated single-capillary system. Later, an automated microfluidic system with 8 channels based on the same principle was successfully constructed. The instrument automatically processes 8 templates through reaction, purification, denaturation, pre-concentration, injection, separation and detection in a parallel fashion. A multiplexed freeze/thaw switching principle and a distribution network were implemented to manage flow direction and sample transportation. Dye-labeled terminator cycle-sequencing reactions are performed in an 8-capillary array in a hot-air thermal cycler. Subsequently, the sequencing ladders are directly loaded into a corresponding size-exclusion chromatographic column operated at approximately 60 °C for purification. On-line denaturation and stacking injection for capillary electrophoresis are simultaneously accomplished at a cross assembly set at approximately 70 °C. Not only the separation capillary array but also the reaction capillary array and purification columns can be regenerated after every run. DNA sequencing data from this system allow base calling up to 460 bases with an accuracy of 98%.
BarraCUDA - a fast short read sequence aligner using graphics processing units
2012-01-01
Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General-purpose computing on graphics processing units (GPGPU) extracts computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software package based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computation-intensive alignment component of BWA to the GPU to take advantage of its massive parallelism. As a result, BarraCUDA offers an order-of-magnitude boost in alignment throughput compared to a CPU core, while delivering the same level of alignment fidelity. The software is also capable of using multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPUs to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline so that the wider scientific community can benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497
Bremner, P D; Blacklock, C J; Paganga, G; Mullen, W; Rice-Evans, C A; Crozier, A
2000-06-01
After minimal sample preparation, two different HPLC methodologies, one based on a single gradient reversed-phase HPLC step, the other on multiple HPLC runs each optimised for specific components, were used to investigate the composition of flavonoids and phenolic acids in apple and tomato juices. The principal components in apple juice were identified as chlorogenic acid, phloridzin, caffeic acid and p-coumaric acid. Tomato juice was found to contain chlorogenic acid, caffeic acid, p-coumaric acid, naringenin and rutin. The quantitative estimates of the levels of these compounds, obtained with the two HPLC procedures, were very similar, demonstrating that either method can be used to analyse accurately the phenolic components of apple and tomato juices. Chlorogenic acid in tomato juice was the only component not fully resolved in the single run study and the multiple run analysis prior to enzyme treatment. The single run system of analysis is recommended for the initial investigation of plant phenolics and the multiple run approach for analyses where chromatographic resolution requires improvement.
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.
Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro
2010-05-07
Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.
Stegelmann, Frank; Bullinger, Lars; Griesshammer, Martin; Holzmann, Karlheinz; Habdank, Marianne; Kuhn, Susanne; Maile, Carmen; Schauer, Stefanie; Döhner, Hartmut; Döhner, Konstanze
2010-01-01
Single-nucleotide polymorphism arrays allow for genome-wide profiling of copy-number alterations and copy-neutral runs of homozygosity at high resolution. To identify novel genetic lesions in myeloproliferative neoplasms, a large series of 151 clinically well-characterized patients was analyzed in our study. Copy-number alterations were rare in essential thrombocythemia and polycythemia vera. In contrast, approximately one third of myelofibrosis patients exhibited small genomic losses (less than 5 Mb). In 2 secondary myelofibrosis cases the tumor suppressor gene NF1 in 17q11.2 was affected. Sequencing analyses revealed a mutation in the remaining NF1 allele of one patient. In terms of copy-neutral aberrations, no chromosome arms other than 9p were recurrently affected. In conclusion, novel genomic aberrations were identified in our study, in particular in patients with myelofibrosis. Further analyses at the single-gene level are necessary to uncover the mechanisms involved in the pathogenesis of myeloproliferative neoplasms. PMID:20015882
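In its simplest form, detecting a copy-neutral run of homozygosity is a scan for stretches of consecutive homozygous genotype calls. The sketch below uses a toy "no heterozygote allowed" criterion and a site-count threshold; real array-based callers tolerate occasional genotyping errors and apply physical-length (e.g. Mb) cut-offs instead:

```python
def find_roh(genotypes, min_sites=5):
    """Return [start, end) index ranges of runs of homozygous calls."""
    runs, start = [], None
    for i, (a, b) in enumerate(genotypes + [("A", "B")]):  # sentinel ends last run
        if a == b:                       # homozygous site
            if start is None:
                start = i
        else:                            # heterozygous site breaks the run
            if start is not None and i - start >= min_sites:
                runs.append((start, i))
            start = None
    return runs

gts = [("A", "A")] * 6 + [("A", "G")] + [("C", "C")] * 3
print(find_roh(gts))  # [(0, 6)]: the 3-site tail is below min_sites
```

Mapping the surviving index ranges back to marker coordinates then gives the genomic extent of each copy-neutral segment.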
GeoBoost: accelerating research involving the geospatial metadata of virus GenBank records.
Tahsin, Tasnia; Weissenbacher, Davy; O'Connor, Karen; Magge, Arjun; Scotch, Matthew; Gonzalez-Hernandez, Graciela
2018-05-01
GeoBoost is a command-line software package developed to address sparse or incomplete metadata in GenBank sequence records that relate to the location of the infected host (LOIH) of viruses. Given a set of GenBank accession numbers corresponding to virus GenBank records, GeoBoost extracts, integrates and normalizes geographic information reflecting the LOIH of the viruses using integrated information from GenBank metadata and related full-text publications. In addition, to facilitate probabilistic geospatial modeling, GeoBoost assigns probability scores for each possible LOIH. Binaries and resources required for running GeoBoost are packed into a single zipped file and freely available for download at https://tinyurl.com/geoboost. A video tutorial is included to help users quickly and easily install and run the software. The software is implemented in Java 1.8, and supported on MS Windows and Linux platforms. gragon@upenn.edu. Supplementary data are available at Bioinformatics online.
Autonomous proximity operations using machine vision for trajectory control and pose estimation
NASA Technical Reports Server (NTRS)
Cleghorn, Timothy F.; Sternberg, Stanley R.
1991-01-01
A machine vision algorithm was developed which permits guidance control to be maintained during autonomous proximity operations. At present this algorithm exists as a simulation, running on an 80386-based personal computer and using a ModelMATE CAD package to render the target vehicle. However, the algorithm is sufficiently simple that, following off-line training on a known target vehicle, it should run in real time with existing vision hardware. The basis of the algorithm is a sequence of single-camera images of the target vehicle, upon which radial transforms are performed. Selected points of the resulting radial signatures are fed through a decision tree to determine whether the signature matches the known reference signatures for a particular view of the target. Based upon recognized scenes, the position of the maneuvering vehicle with respect to the target vehicle can be calculated, and adjustments made in the former's trajectory. In addition, the pose and spin rates of the target satellite can be estimated using this method.
A Machine Learning Method for Power Prediction on the Mobile Devices.
Chen, Da-Ren; Chen, You-Shyang; Chen, Lin-Chih; Hsu, Ming-Yang; Chiang, Kai-Feng
2015-10-01
Energy profiling and estimation have been popular areas of research in multicore mobile architectures. While short sequences of system calls have been recognized by machine learning as pattern descriptions for anomaly detection, the power consumption of running processes with respect to system-call patterns is not well studied. In this paper, we propose a fuzzy neural network (FNN) for training and analyzing process execution behaviour with respect to series of system calls, their parameters and their power consumptions. On the basis of the patterns of a series of system calls, we develop a power estimation daemon (PED) to analyze and predict the energy consumption of the running process. In the initial stage, PED categorizes sequences of system calls into functional groups and predicts their energy consumptions with the FNN. In the operational stage, PED is applied to identify the predefined sequences of system calls invoked by running processes and estimates their energy consumption.
ParDRe: faster parallel duplicated reads removal tool for sequencing studies.
González-Domínguez, Jorge; Schmidt, Bertil
2016-05-15
Current next-generation sequencing technologies often generate duplicated or near-duplicated reads that (depending on the application scenario) do not provide any interesting biological information but can increase the memory requirements and computational time of downstream analysis. In this work we present ParDRe, a de novo parallel tool to remove duplicated and near-duplicated reads through the clustering of Single-End or Paired-End sequences from fasta or fastq files. It uses a novel bitwise approach to compare the suffixes of DNA strings and employs hybrid MPI/multithreading to reduce runtime on multicore systems. We show that ParDRe is up to 27.29 times faster than Fulcrum (a representative state-of-the-art tool) on a platform with two 8-core Sandy Bridge processors. Source code in C++ and MPI running on Linux systems, as well as a reference manual, are available at https://sourceforge.net/projects/pardre/ Contact: jgonzalezd@udc.es
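The clustering step can be sketched in a few lines. This toy version buckets reads by a short prefix and collapses any read within a Hamming-distance threshold of an already-kept read in its bucket; ParDRe itself compares suffixes with a bitwise encoding and parallelizes the work with MPI/multithreading, so treat the parameters below as assumptions:

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def dedup_reads(reads, prefix=4, max_mismatch=1):
    """Keep one representative per group of (near-)duplicated reads."""
    buckets, unique = {}, []
    for read in reads:
        group = buckets.setdefault(read[:prefix], [])  # candidates share a prefix
        if not any(len(r) == len(read) and hamming(r, read) <= max_mismatch
                   for r in group):
            group.append(read)
            unique.append(read)
    return unique

reads = ["ACGTACGT", "ACGTACGA", "ACGTTTTT", "GGGGCCCC"]
result = dedup_reads(reads)
print(result)  # ACGTACGA is dropped: 1 mismatch from ACGTACGT
```

The prefix bucketing keeps the pairwise comparisons local, which is also what makes the workload easy to distribute across threads or MPI ranks.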
Hippocampal replay of extended experience
Davidson, Thomas J.; Kloosterman, Fabian; Wilson, Matthew A.
2009-01-01
Summary During pauses in exploration, ensembles of place cells in the rat hippocampus re-express firing sequences corresponding to recent spatial experience. Such ‘replay’ co-occurs with ripple events: short-lasting (~50–120 ms), high frequency (~200 Hz) oscillations that are associated with increased hippocampal-cortical communication. In previous studies, rats explored small environments, and replay was found to be anchored to the rat’s current location, and compressed in time such that replay of the complete environment occurred during a single ripple event. It is not known whether or how longer behavioral sequences are replayed in the hippocampus. Here we show, using a neural decoding approach, that firing sequences corresponding to long runs through a large environment are replayed with high fidelity (in both forward and reverse order), and that such replay can begin at remote locations on the track. Extended replay proceeds at a characteristic virtual speed of ~8 m/s, and remains coherent across trains of ripple events. These results suggest that extended replay is composed of chains of shorter subsequences, which may reflect a strategy for the storage and flexible expression of memories of prolonged experience. PMID:19709631
Combining fluorescence imaging with Hi-C to study 3D genome architecture of the same single cell.
Lando, David; Basu, Srinjan; Stevens, Tim J; Riddell, Andy; Wohlfahrt, Kai J; Cao, Yang; Boucher, Wayne; Leeb, Martin; Atkinson, Liam P; Lee, Steven F; Hendrich, Brian; Klenerman, Dave; Laue, Ernest D
2018-05-01
Fluorescence imaging and chromosome conformation capture assays such as Hi-C are key tools for studying genome organization. However, traditionally, they have been carried out independently, making integration of the two types of data difficult to perform. By trapping individual cell nuclei inside a well of a 384-well glass-bottom plate with an agarose pad, we have established a protocol that allows both fluorescence imaging and Hi-C processing to be carried out on the same single cell. The protocol identifies 30,000-100,000 chromosome contacts per single haploid genome in parallel with fluorescence images. Contacts can be used to calculate intact genome structures to better than 100-kb resolution, which can then be directly compared with the images. Preparation of 20 single-cell Hi-C libraries using this protocol takes 5 d of bench work by researchers experienced in molecular biology techniques. Image acquisition and analysis require basic understanding of fluorescence microscopy, and some bioinformatics knowledge is required to run the sequence-processing tools described here.
Influence of Number of Contact Efforts on Running Performance During Game-Based Activities.
Johnston, Rich D; Gabbett, Tim J; Jenkins, David G
2015-09-01
To determine the influence the number of contact efforts during a single bout has on running intensity during game-based activities and assess relationships between physical qualities and distances covered in each game. Eighteen semiprofessional rugby league players (age 23.6 ± 2.8 y) competed in 3 off-side small-sided games (2 × 10-min halves) with a contact bout performed every 2 min. The rules of each game were identical except for the number of contact efforts performed in each bout. Players performed 1, 2, or 3 × 5-s wrestles in the single-, double-, and triple-contact game, respectively. The movement demands (including distance covered and intensity of exercise) in each game were monitored using global positioning system units. Bench-press and back-squat 1-repetition maximum and the 30-15 Intermittent Fitness Test (30-15IFT) assessed muscle strength and high-intensity-running ability, respectively. There was little change in distance covered during the single-contact game (ES = -0.16 to -0.61), whereas there were larger reductions in the double- (ES = -0.52 to -0.81) and triple-contact (ES = -0.50 to -1.15) games. Significant relationships (P < .05) were observed between 30-15IFT and high-speed running during the single- (r = .72) and double- (r = .75), but not triple-contact (r = .20) game. There is little change in running intensity when only single contacts are performed each bout; however, when multiple contacts are performed, greater reductions in running intensity result. In addition, high-intensity-running ability is only associated with running performance when contact demands are low.
Genome sequencing of the redbanded stink bug (Piezodorus guildinii)
USDA-ARS's Scientific Manuscript database
We assembled a partial genome sequence from the redbanded stink bug, Piezodorus guildinii from Illumina MiSeq sequencing runs. The sequence has been submitted and published under NCBI GenBank Accession Number JTEQ01000000. The BioProject and BioSample Accession numbers are PRJNA263369 and SAMN030997...
Wone, Bernard W M; Yim, Won C; Schutz, Heidi; Meek, Thomas H; Garland, Theodore
2018-04-04
Mitochondrial haplotypes have been associated with human and rodent phenotypes, including nonshivering thermogenesis capacity, learning capability, and disease risk. Although the mammalian mitochondrial D-loop is highly polymorphic, D-loops in laboratory mice are identical, and variation occurs elsewhere mainly between nucleotides 9820 and 9830. Part of this region codes for the tRNA Arg gene and is associated with mitochondrial densities and number of mtDNA copies. We hypothesized that the capacity for high levels of voluntary wheel-running behavior would be associated with mitochondrial haplotype. Here, we analyzed the mtDNA polymorphic region in mice from each of four replicate lines selectively bred for 54 generations for high voluntary wheel running (HR) and from four control lines (Control) randomly bred for 54 generations. Sequencing the polymorphic region revealed a variable number of adenine repeats. Single nucleotide polymorphisms (SNPs) varied from 2 to 3 adenine insertions, resulting in three haplotypes. We found significant genetic differentiations between the HR and Control groups (F st = 0.779, p ≤ 0.0001), as well as among the replicate lines of mice within groups (F sc = 0.757, p ≤ 0.0001). Haplotypes, however, were not strongly associated with voluntary wheel running (revolutions run per day), nor with either body mass or litter size. This system provides a useful experimental model to dissect the physiological processes linking mitochondrial, genomic SNPs, epigenetics, or nuclear-mitochondrial cross-talk to exercise activity.
Low-Bandwidth and Non-Compute Intensive Remote Identification of Microbes from Raw Sequencing Reads
Gautier, Laurent; Lund, Ole
2013-01-01
Cheap DNA sequencing may soon become routine not only for human genomes but also for practically anything requiring the identification of living organisms from their DNA: tracking of infectious agents, control of food products, bioreactors, or environmental samples. We propose a novel general approach to the analysis of sequencing data where a reference genome does not have to be specified. Using a distributed architecture we are able to query a remote server for hints about what the reference might be, transferring a relatively small amount of data. Our system consists of a server with known reference DNA indexed, and a client with raw sequencing reads. The client sends a sample of unidentified reads, and in return receives a list of matching references. Sequences for the references can be retrieved and used for exhaustive computation on the reads, such as alignment. To demonstrate this approach we have implemented a web server, indexing tens of thousands of publicly available genomes and genomic regions from various organisms and returning lists of matching hits from query sequencing reads. We have also implemented two clients: one running in a web browser, and one as a Python script. Both are able to handle a large number of sequencing reads and, even from portable devices (the browser-based client running on a tablet), perform their task within seconds while consuming an amount of bandwidth compatible with mobile broadband networks. Such client-server approaches could develop in the future, allowing fully automated processing of sequencing data and routine instant quality checks of sequencing runs from desktop sequencers. Web access is available at http://tapir.cbs.dtu.dk. The source code for a Python command-line client, a server, and supplementary data are available at http://bit.ly/1aURxkc. PMID:24391826
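As a rough illustration of the client-server idea described above, the sketch below builds a tiny in-memory "server" that indexes reference genomes by their k-mers; a "client" submits a small sample of reads and receives a ranked list of matching references. All names and parameters here (`ReferenceServer`, `K = 8`, the toy genomes) are hypothetical illustrations, not part of the actual TAPIR implementation or its network protocol.

```python
import random
from collections import Counter

K = 8  # k-mer length (illustrative; real systems index far longer k-mers)

def kmers(seq, k=K):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

class ReferenceServer:
    """Toy stand-in for the remote index of known reference genomes."""
    def __init__(self, references):
        self.index = {name: kmers(seq) for name, seq in references.items()}

    def query(self, read_sample):
        # Count how many sampled reads share at least one k-mer with each reference.
        hits = Counter()
        for read in read_sample:
            rk = kmers(read)
            for name, ref_kmers in self.index.items():
                if rk & ref_kmers:
                    hits[name] += 1
        return hits.most_common()

random.seed(0)
genome_a = ''.join(random.choice('ACGT') for _ in range(500))
genome_b = ''.join(random.choice('ACGT') for _ in range(500))
server = ReferenceServer({'org_A': genome_a, 'org_B': genome_b})

# Client side: sample a handful of short reads from an "unknown" organism (org_A).
reads = [genome_a[i:i + 50] for i in range(0, 300, 60)]
ranking = server.query(reads)
print(ranking[0][0])  # org_A ranks first
```

Only the small read sample crosses the (here simulated) network boundary; full sequences would be fetched afterwards for exhaustive alignment.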
Monkoondee, Sarawut; Kuntiya, Ampin; Chaiyaso, Thanongsak; Leksawasdi, Noppol; Techapun, Charin; Kawee-Ai, Arthitaya; Seesuriyachan, Phisit
2016-07-03
This study aimed to investigate the efficiency of an aerobic sequencing batch reactor (aerobic SBR) in a nonsterile system using an experimental design via central composite design (CCD). The acidic whey obtained from lactic acid fermentation by immobilized Lactobacillus plantarum sp. TISTR 2265 was fed into the bioreactor of the aerobic SBR in an appropriate ratio of acidic whey to cheese whey to produce an acidic environment below pH 4.5, and was then used to support the growth of Dioszegia sp. TISTR 5792 by inhibiting bacterial contamination. At the optimal condition for a high yield of biomass production, the system was run with a hydraulic retention time (HRT) of 4 days, a solid retention time (SRT) of 8.22 days, and an acidic whey concentration of 80% feeding. The chemical oxygen demand (COD) decreased from 25,230 mg/L to 6,928 mg/L, a COD removal of 72.15%. The yield of biomass production and lactose utilization by Dioszegia sp. TISTR 5792 were 13.14 g/L and 33.36%, respectively, over a long run of up to 180 cycles, and the pH of the effluent rose to 8.32 without any pH adjustment.
Web-Beagle: a web server for the alignment of RNA secondary structures.
Mattei, Eugenio; Pietrosanto, Marco; Ferrè, Fabrizio; Helmer-Citterich, Manuela
2015-07-01
Web-Beagle (http://beagle.bio.uniroma2.it) is a web server for the pairwise global or local alignment of RNA secondary structures. The server exploits a new encoding for RNA secondary structure and a substitution matrix of RNA structural elements to perform RNA structural alignments. The web server allows the user to compute up to 10 000 alignments in a single run, taking as input sets of RNA sequences and structures or primary sequences alone. In the latter case, the server computes the secondary structure prediction for the RNAs on-the-fly using RNAfold (free energy minimization). The user can also compare a set of input RNAs to one of five pre-compiled RNA datasets including lncRNAs and 3' UTRs. All types of comparison produce in output the pairwise alignments along with structural similarity and statistical significance measures for each resulting alignment. A graphical color-coded representation of the alignments allows the user to easily identify structural similarities between RNAs. Web-Beagle can be used for finding structurally related regions in two or more RNAs, for the identification of homologous regions or for functional annotation. Benchmark tests show that Web-Beagle has lower computational complexity, running time and better performances than other available methods. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Gardner, Shea N.; Hall, Barry G.
2013-01-01
Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory-efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1 including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and user-friendly use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from GenBank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four “raw read” genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths. PMID:24349125
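The alignment-free, k-mer-based SNP discovery that kSNP performs can be caricatured in a few lines: a SNP is reported wherever two genomes contain k-mers whose flanking bases are identical but whose central base differs. This is only a minimal sketch of the idea under stated assumptions (k = 7, toy sequences, forward strand only); the real tool also handles reverse complements, raw reads, and much larger values of k.

```python
from collections import defaultdict

K = 7  # odd k so that each k-mer has a single central base

def central_kmers(genome, k=K):
    """Map (left flank, right flank) -> set of central bases seen in this genome."""
    half = k // 2
    out = defaultdict(set)
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        out[(kmer[:half], kmer[half + 1:])].add(kmer[half])
    return out

def ksnp_sketch(genomes):
    """Report flank contexts whose central base varies across the genomes."""
    merged = defaultdict(set)
    for g in genomes:
        for ctx, bases in central_kmers(g).items():
            merged[ctx] |= bases
    return {ctx: bases for ctx, bases in merged.items() if len(bases) > 1}

g1 = "ACGTACGGTTACGATC"
g2 = "ACGTACGCTTACGATC"   # single substitution: G -> C at position 7
snps = ksnp_sketch([g1, g2])
print(snps)  # one SNP: flanks ACG|TTA with central base G or C
```

No multiple sequence alignment and no reference genome are needed; the shared flanks alone anchor the variant position, which is why the approach scales to unassembled reads.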
40 CFR 86.1772-99 - Road load power, test weight, and inertia weight class determination.
Code of Federal Regulations, 2012 CFR
2012-07-01
... vehicle under all-electric power to complete the running loss test fuel tank temperature profile test sequence without air conditioning and the same vehicle tested over the running loss test fuel tank... fan modes with the system set at 72 deg. F. The running loss test fuel tank temperature profile test...
40 CFR 86.1772-99 - Road load power, test weight, and inertia weight class determination.
Code of Federal Regulations, 2011 CFR
2011-07-01
... vehicle under all-electric power to complete the running loss test fuel tank temperature profile test sequence without air conditioning and the same vehicle tested over the running loss test fuel tank... fan modes with the system set at 72 deg. F. The running loss test fuel tank temperature profile test...
40 CFR 86.1772-99 - Road load power, test weight, and inertia weight class determination.
Code of Federal Regulations, 2013 CFR
2013-07-01
... vehicle under all-electric power to complete the running loss test fuel tank temperature profile test sequence without air conditioning and the same vehicle tested over the running loss test fuel tank... fan modes with the system set at 72 deg. F. The running loss test fuel tank temperature profile test...
Wu, Xin; Koslowski, Axel; Thiel, Walter
2012-07-10
In this work, we demonstrate that semiempirical quantum chemical calculations can be accelerated significantly by leveraging the graphics processing unit (GPU) as a coprocessor on a hybrid multicore CPU-GPU computing platform. Semiempirical calculations using the MNDO, AM1, PM3, OM1, OM2, and OM3 model Hamiltonians were systematically profiled for three types of test systems (fullerenes, water clusters, and solvated crambin) to identify the most time-consuming sections of the code. The corresponding routines were ported to the GPU and optimized employing both existing library functions and a GPU kernel that carries out a sequence of noniterative Jacobi transformations during pseudodiagonalization. The overall computation times for single-point energy calculations and geometry optimizations of large molecules were reduced by one order of magnitude for all methods, as compared to runs on a single CPU core.
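The pseudodiagonalization step mentioned above is built from sequences of Jacobi plane rotations. As a hedged, CPU-only illustration of what such a sequence does (this is the textbook Jacobi eigenvalue algorithm, not the authors' GPU kernel), the sketch below diagonalizes a small symmetric matrix by repeatedly annihilating its largest off-diagonal element:

```python
import numpy as np

def jacobi_eigen(A, tol=1e-10, max_rot=100):
    """Classical Jacobi: zero the largest off-diagonal element with a plane
    rotation, repeatedly, until the matrix is numerically diagonal."""
    A = A.astype(float).copy()
    n = A.shape[0]
    V = np.eye(n)  # accumulated rotations = eigenvectors
    for _ in range(max_rot):
        off = np.abs(A - np.diag(np.diag(A)))
        p, q = divmod(off.argmax(), n)
        if off[p, q] < tol:
            break
        # rotation angle that annihilates A[p, q]
        theta = 0.5 * np.arctan2(2 * A[p, q], A[q, q] - A[p, p])
        c, s = np.cos(theta), np.sin(theta)
        J = np.eye(n)
        J[p, p] = J[q, q] = c
        J[p, q], J[q, p] = s, -s
        A = J.T @ A @ J
        V = V @ J
    return np.diag(A), V

M = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
evals, evecs = jacobi_eigen(M)
print(np.sort(evals))
```

Each rotation touches only two rows and two columns, which is what makes the transformation sequence amenable to batched GPU execution.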
Alvarado, David M; Yang, Ping; Druley, Todd E; Lovett, Michael; Gurnett, Christina A
2014-06-01
Despite declining sequencing costs, few methods are available for cost-effective single-nucleotide polymorphism (SNP), insertion/deletion (INDEL) and copy number variation (CNV) discovery in a single assay. Commercially available methods require a high upfront investment in a specific region and are only cost-effective for large sample numbers. Here, we introduce a novel, flexible approach for multiplexed targeted sequencing and CNV analysis of large genomic regions called multiplexed direct genomic selection (MDiGS). MDiGS combines biotinylated bacterial artificial chromosome (BAC) capture and multiplexed pooled capture for SNP/INDEL and CNV detection of 96 multiplexed samples on a single MiSeq run. MDiGS is advantageous over other methods for CNV detection because pooled sample capture and hybridization to large contiguous BAC baits reduce the sample and probe hybridization variability inherent in other methods. We performed MDiGS capture for three chromosomal regions consisting of ∼550 kb of coding and non-coding sequence with DNA from 253 patients with congenital lower limb disorders. PITX1 nonsense and HOXC11 S191F missense mutations were identified that segregate in clubfoot families. Using a novel pooled-capture reference strategy, we identified recurrent chromosome chr17q23.1q23.2 duplications and small HOXC 5' cluster deletions (51 kb and 12 kb). Given the current interest in coding and non-coding variants in human disease, MDiGS fulfills a niche for comprehensive and low-cost evaluation of CNVs, coding, and non-coding variants across candidate regions of interest. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
SW#db: GPU-Accelerated Exact Sequence Similarity Database Search.
Korpar, Matija; Šošić, Martin; Blažeka, Dino; Šikić, Mile
2015-01-01
In recent years we have witnessed growth in sequencing yield and in the number of samples sequenced, and, as a result, growth of publicly maintained sequence databases. This increase in available data has placed high demands on protein similarity search algorithms, which face two ever-opposing goals: keeping running times acceptable while maintaining a high enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, alignment of a query to the whole database is usually too slow. Therefore, the majority of protein similarity search methods apply heuristics to reduce the number of candidate sequences in the database before doing the exact local alignment. However, there is still a need to align a query sequence to this reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times, as a standalone tool, are comparable to the running times of BLAST, it is primarily intended to be used for the exact local alignment phase in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++ and more than 20 times faster than SSW at the time of writing, using multiple queries on the Swiss-Prot and UniRef90 databases.
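For readers unfamiliar with the exact local alignment step discussed above, here is a deliberately minimal Smith-Waterman scorer with a linear gap penalty; the quadratic work it performs over the DP matrix is precisely what SW#db parallelizes on the GPU. The scoring parameters are illustrative assumptions, and real tools return the alignment itself, not just the score.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Minimal Smith-Waterman local alignment score (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell can always restart at 0.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

score = smith_waterman("GATTACA", "ATTAC")
print(score)  # 10: the exact substring "ATTAC" aligns at full score (5 matches x 2)
```

Every cell depends only on its three upper-left neighbors, so anti-diagonals of the matrix can be computed in parallel, which is the standard GPU decomposition.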
The swimming behavior of flagellated bacteria in viscous and viscoelastic media
NASA Astrophysics Data System (ADS)
Qu, Zijie; Henderikx, Rene; Breuer, Kenneth
2016-11-01
The motility of the bacterium E. coli in viscous and viscoelastic fluids has been widely studied, although full understanding remains elusive. The swimming mode of wild-type E. coli is well described by a run-and-tumble sequence in which periods of straight swimming at a constant speed are randomly interrupted by a tumble, defined as a sudden change of direction at very low speed. Using a tracking microscope, we follow cells for extended periods of time and find that the swimming behavior can be more complex, including a wider variety of behaviors such as a "slow random walk" in which the cells move at relatively low speed without the characteristic run. Significant variation between individual cells is observed, and furthermore, a single cell can change its motility during the course of a tracking event. Changing the viscosity and viscoelasticity of the swimming media also has profound effects on the average swimming speed and the run-tumble nature of the cell motility, including the distribution and duration of tumbling and slow-random-walk events. The reasons for these changes are explained using a Purcell-style resistive force model for the cell and flagellar behavior, as well as a model for the changes in flagellar bundling in different fluid viscosities. National Science Foundation.
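The run-and-tumble swimming mode described above is easy to caricature as a stochastic simulation: straight runs at constant speed, interrupted by Poisson-distributed tumbles that reorient the cell. All parameter values below (speed, tumble rate, time step) are generic illustrative assumptions, not values fitted to the tracked cells in the abstract.

```python
import math
import random

def run_and_tumble(n_steps, run_speed=20.0, dt=0.1, tumble_rate=1.0, seed=1):
    """2D run-and-tumble walk: straight runs at constant speed, interrupted
    by tumbles (probability tumble_rate*dt per step) that pick a new heading."""
    rng = random.Random(seed)
    x = y = 0.0
    theta = rng.uniform(0, 2 * math.pi)
    path = [(x, y)]
    for _ in range(n_steps):
        if rng.random() < tumble_rate * dt:       # tumble: reorient, negligible displacement
            theta = rng.uniform(0, 2 * math.pi)
        x += run_speed * math.cos(theta) * dt     # run segment
        y += run_speed * math.sin(theta) * dt
        path.append((x, y))
    return path

path = run_and_tumble(1000)  # 100 s of simulated swimming
print(f"net displacement: {math.hypot(*path[-1]):.1f} (same units as run_speed*dt)")
```

Because tumbles decorrelate the heading, the net displacement grows diffusively and is far smaller than the total path length, mirroring the random-walk statistics measured in tracking experiments.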
Grape RNA-Seq analysis pipeline environment
Knowles, David G.; Röder, Maik; Merkel, Angelika; Guigó, Roderic
2013-01-01
Motivation: The avalanche of data arriving since the development of NGS technologies has prompted the need for fast, accurate and easily automated bioinformatic tools capable of dealing with massive datasets. Among the most productive applications of NGS technologies is the sequencing of cellular RNA, known as RNA-Seq. Although RNA-Seq provides a similar or superior dynamic range to microarrays at similar or lower cost, the lack of standard and user-friendly pipelines is a bottleneck preventing RNA-Seq from becoming the standard for transcriptome analysis. Results: In this work we present a pipeline for processing and analyzing RNA-Seq data, which we have named Grape (Grape RNA-Seq Analysis Pipeline Environment). Grape supports raw sequencing reads produced by a variety of technologies, either in FASTA or FASTQ format, or as prealigned reads in SAM/BAM format. A minimal Grape configuration consists of the file location of the raw sequencing reads, the genome of the species and the corresponding gene and transcript annotation. Grape first runs a set of quality control steps, and then aligns the reads to the genome, a step that is omitted for prealigned read formats. Grape next estimates gene and transcript expression levels, calculates exon inclusion levels and identifies novel transcripts. Grape can be run on a single computer or in parallel on a computer cluster. It is distributed with specific mapping and quantification tools, but given its modular design, any tool supporting popular data interchange formats can be integrated. Availability: Grape can be obtained from the Bioinformatics and Genomics website at: http://big.crg.cat/services/grape. Contact: david.gonzalez@crg.eu or roderic.guigo@crg.eu PMID:23329413
Perceived synchrony for realistic and dynamic audiovisual events
Eg, Ragnhild; Behne, Dawn M.
2015-01-01
In well-controlled laboratory experiments, researchers have found that humans can perceive delays between auditory and visual signals as short as 20 ms. Conversely, other experiments have shown that humans can tolerate audiovisual asynchrony that exceeds 200 ms. This seeming contradiction in human temporal sensitivity can be attributed to a number of factors such as experimental approaches and precedence of the asynchronous signals, along with the nature, duration, location, complexity and repetitiveness of the audiovisual stimuli, and even individual differences. In order to better understand how temporal integration of audiovisual events occurs in the real world, we need to close the gap between the experimental setting and the complex setting of everyday life. With this work, we aimed to contribute one brick to the bridge that will close this gap. We compared perceived synchrony for long-running and eventful audiovisual sequences to shorter sequences that contain a single audiovisual event, for three types of content: action, music, and speech. The resulting windows of temporal integration showed that participants were better at detecting asynchrony for the longer stimuli, possibly because the long-running sequences contain multiple corresponding events that offer audiovisual timing cues. Moreover, the points of subjective simultaneity differ between content types, suggesting that the nature of a visual scene could influence the temporal perception of events. An expected outcome from this type of experiment was the rich variation among participants' distributions and the derived points of subjective simultaneity. Hence, the designs of similar experiments call for more participants than traditional psychophysical studies. Heeding this caution, we conclude that existing theories on multisensory perception are ready to be tested on more natural and representative stimuli. PMID:26082738
Introducing difference recurrence relations for faster semi-global alignment of long sequences.
Suzuki, Hajime; Kasahara, Masahiro
2018-02-19
The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith-Waterman-Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when sequences to be aligned are long; this situation hampers the use of the full SIMD width of modern processors. We proposed a faster semi-global alignment algorithm, "difference recurrence relations," that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm can output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. Our novel algorithm and optimized library implementation will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba .
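The key observation behind difference recurrences can be demonstrated on an ordinary global-alignment DP: while the scores themselves grow with sequence length (forcing 16- or 32-bit storage), the differences between adjacent cells stay within a small range determined only by the scoring scheme, so they fit comfortably in 8-bit integers. The sketch below checks this empirically for a linear-gap Needleman-Wunsch matrix; it is an illustration of the bounded-difference property, not the paper's affine-gap recurrences or its SIMD layout.

```python
import random

def nw_matrix(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global-alignment DP matrix with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    S = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        S[i][0] = i * gap
    for j in range(1, cols):
        S[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            S[i][j] = max(S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    return S

random.seed(2)
a = ''.join(random.choice('ACGT') for _ in range(200))
b = ''.join(random.choice('ACGT') for _ in range(200))
S = nw_matrix(a, b)

# Differences between horizontally adjacent cells stay in a tiny range
# (between gap and match - gap), independent of sequence length.
h_diffs = {S[i][j] - S[i][j - 1] for i in range(len(S)) for j in range(1, len(S[0]))}
print(min(h_diffs), max(h_diffs))
```

Storing these bounded differences instead of absolute scores is what lets a SIMD lane hold four times as many cells as a 32-bit formulation.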
Measurement and Modeling of Fugitive Dust from Off Road DoD Activities
2017-12-08
Field sampling covered each soil and vehicle type (see Table 2); no tracked vehicles were run at YTC. CT denotes the curve track sampling location and CR the curve ridge; soil type SL = sandy loam. Figures 35-36 show example Single-event Wind Erosion Evaluation Program (SWEEP) run results and threshold-run results.
High-Throughput Next-Generation Sequencing of Polioviruses
Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.
2016-01-01
The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially vaccine-derived polioviruses (VDPVs), circulating VDPVs, and immunodeficiency-related VDPVs. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929
Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly
Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka
2010-01-01
Background: Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. Methodology: We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. Conclusions: The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877
Tvete, Ingunn F; Olsen, Inge C; Fagerland, Morten W; Meland, Nils; Aldrin, Magne; Smerud, Knut T; Holden, Lars
2012-04-01
In active run-in trials, where patients may be excluded after a run-in period based on their response to the treatment, it is implicitly assumed that patients have individual treatment effects. If individual patient data are available, active run-in trials can be modelled using patient-specific random effects. With more than one trial on the same medication available, one can obtain a more precise overall treatment effect estimate. We present a model for joint analysis of a two-sequence, four-period cross-over trial (AABB/BBAA) and a three-sequence, two-period active run-in trial (AB/AA/A), where the aim is to investigate the effect of a new treatment for patients with pain due to osteoarthritis. Our approach enables us to separately estimate the direct treatment effect for all patients, for the patients excluded after the active run-in trial prior to randomisation, and for the patients who completed the active run-in trial. A similar model approach can be used to analyse other types of run-in trials, but this depends on the data and type of other trials available. We assume equality of the various carry-over effects over time. The proposed approach is flexible and can be modified to handle other designs. Our results should be encouraging for those responsible for planning cost-efficient clinical development programmes.
Renard, Bernhard Y.; Xu, Buote; Kirchner, Marc; Zickmann, Franziska; Winter, Dominic; Korten, Simone; Brattig, Norbert W.; Tzur, Amit; Hamprecht, Fred A.; Steen, Hanno
2012-01-01
Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian information criterion-driven error-tolerant peptide search (BICEPS) and offer an open source implementation based on this statistical criterion to automatically balance the information of each single spectrum and the database, while limiting the run time. We show that BICEPS performs as well as current database search algorithms when such algorithms are applied to sequenced organisms, whereas BICEPS only uses a remotely related organism database. For instance, we use a chicken instead of a human database corresponding to an evolutionary distance of more than 300 million years (International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716). We demonstrate the successful application to cross-species proteomics with a 33% increase in the number of identified proteins for a filarial nematode sample of Litomosoides sigmodontis. PMID:22493179
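The Bayesian information criterion that drives BICEPS trades goodness of fit against model complexity: a model with more free parameters must raise the likelihood enough to pay for them. As a generic illustration of the criterion itself (the numbers are made up and unrelated to the paper's spectra or its actual scoring):

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*ln(L); lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# A richer model must improve the log-likelihood enough to justify its extra
# parameters, otherwise the simpler model wins under BIC.
simple = bic(log_likelihood=-120.0, n_params=2, n_obs=50)
rich = bic(log_likelihood=-118.5, n_params=5, n_obs=50)
print(simple < rich)  # True: 3 extra parameters are not worth 1.5 log-units here
```

In BICEPS this balance is struck per spectrum, limiting how far an error-tolerant search may drift from the database before the added sequence flexibility stops being statistically justified.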
Single Common Powertrain Lubricant Development
2012-01-01
Engine durability testing was performed in the General Engine Products 6.5L(T) test cell installation. Repeatability runs 1-3 gave engine oil consumption of 0.061, 0.082, and 0.086 lb/hr, for a 3-run average of 0.076 lb/hr.
Hydraulic logic gates: building a digital water computer
NASA Astrophysics Data System (ADS)
Taberlet, Nicolas; Marsal, Quentin; Ferrand, Jérémy; Plihon, Nicolas
2018-03-01
In this article, we propose an easy-to-build hydraulic machine which serves as a digital binary computer. We first explain how an elementary adder can be built from test tubes and pipes (a cup filled with water representing a 1, an empty cup a 0). Using a siphon and a slow drain, the proposed setup combines AND and XOR logic gates in a single device which can add two binary digits. We then show how these elementary units can be combined to construct a full 4-bit adder. The sequencing of the computation is discussed, and a water clock can be incorporated so that the machine can run without any exterior intervention.
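The AND/XOR combination described above is a half adder; a minimal sketch in Python (logic only, no hydraulics) shows how chaining such elementary units yields the full 4-bit adder:

```python
def half_adder(a, b):
    """One elementary unit: XOR gives the sum digit, AND the carry."""
    return a ^ b, a & b

def full_adder(a, b, carry_in):
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2

def add_4bit(x, y):
    """Ripple-carry 4-bit adder built from full adders, mirroring how
    the elementary hydraulic units are chained."""
    carry, result = 0, 0
    for i in range(4):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result | (carry << 4)  # 5th bit holds the final overflow carry

print(add_4bit(0b1011, 0b0110))  # 11 + 6 → 17
```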
3D reconstruction software comparison for short sequences
NASA Astrophysics Data System (ADS)
Strupczewski, Adam; Czupryński, Błażej
2014-11-01
Large scale multiview reconstruction is recently a very popular area of research. There are many open source tools that can be downloaded and run on a personal computer. However, there are few, if any, comparisons between all the available software in terms of accuracy on small datasets that a single user can create. The typical datasets for testing of the software are archeological sites or cities, comprising thousands of images. This paper presents a comparison of currently available open source multiview reconstruction software for small datasets. It also compares the open source solutions with a simple structure from motion pipeline developed by the authors from scratch with the use of OpenCV and Eigen libraries.
USDA-ARS?s Scientific Manuscript database
Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole...
Aozan: an automated post-sequencing data-processing pipeline.
Perrin, Sandrine; Firmo, Cyril; Lemoine, Sophie; Le Crom, Stéphane; Jourdren, Laurent
2017-07-15
Data management and quality control of output from Illumina sequencers is a disk space- and time-consuming task. Thus, we developed Aozan to automatically handle data transfer, demultiplexing, conversion and quality control once a run has finished. This software greatly improves run data management and the monitoring of run statistics via automatic emails and HTML web reports. Aozan is implemented in Java and Python, supported on Linux systems, and distributed under the GPLv3 License at: http://www.outils.genomique.biologie.ens.fr/aozan/ . Aozan source code is available on GitHub: https://github.com/GenomicParisCentre/aozan . aozan@biologie.ens.fr. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
ADEPT, a dynamic next generation sequencing data error-detection program with trimming
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feng, Shihai; Lo, Chien-Chi; Li, Po-E
2016-02-29
Illumina is the most widely used next generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors constitute a major problem in applications such as de novo genome assembly, metagenomics analysis and single nucleotide polymorphism discovery. In this study, we present ADEPT, a dynamic error detection method based on the quality scores of each nucleotide and its neighboring nucleotides, together with their positions within the read, compared against the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in terms of the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads. We conclude that ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed. The result is a method that is less prone to position-dependent under-prediction, which is one of the most prominent issues in error prediction. The outcome is that ADEPT improves upon prior efforts in identifying true errors, primarily within the middle of reads, while reducing the false positive rate.
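A greatly simplified sketch of the underlying idea, comparing each base quality to the run's position-specific quality distribution (a plain z-score stand-in, not ADEPT's actual dynamic, neighbor-aware test), might look like:

```python
from statistics import mean, stdev

def positional_quality_stats(reads_quals):
    """Per-position mean and stdev of Phred quality scores across a run;
    `reads_quals` is a list of equal-length quality-score lists."""
    return [(mean(col), stdev(col)) for col in zip(*reads_quals)]

def flag_suspect_bases(read_quals, stats, z_cutoff=-2.0):
    """Flag positions whose quality falls far below the run's
    position-specific distribution."""
    flagged = []
    for i, q in enumerate(read_quals):
        mu, sd = stats[i]
        if sd > 0 and (q - mu) / sd < z_cutoff:
            flagged.append(i)
    return flagged

# Tiny illustrative "run" of four 3-base reads.
run = [[30, 30, 28], [31, 29, 27], [29, 30, 26], [30, 31, 28]]
stats = positional_quality_stats(run)
print(flag_suspect_bases([30, 30, 20], stats))  # → [2]
```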
Turton, Jane F; Wright, Laura; Underwood, Anthony; Witney, Adam A; Chan, Yuen-Ting; Al-Shahib, Ali; Arnold, Catherine; Doumith, Michel; Patel, Bharat; Planche, Timothy D; Green, Jonathan; Holliman, Richard; Woodford, Neil
2015-08-01
Whole-genome sequencing (WGS) was carried out on 87 isolates of sequence type 111 (ST-111) of Pseudomonas aeruginosa collected between 2005 and 2014 from 65 patients and 12 environmental isolates from 24 hospital laboratories across the United Kingdom on an Illumina HiSeq instrument. Most isolates (73) carried VIM-2, but others carried IMP-1 or IMP-13 (5) or NDM-1 (1); one isolate had VIM-2 and IMP-18, and 7 carried no metallo-beta-lactamase (MBL) gene. Single nucleotide polymorphism analysis divided the isolates into distinct clusters; the NDM-1 isolate was an outlier, and the IMP isolates and 6/7 MBL-negative isolates clustered separately from the main set of 73 VIM-2 isolates. Within the VIM-2 set, there were at least 3 distinct clusters, including a tightly clustered set of isolates from 3 hospital laboratories consistent with an outbreak from a single introduction that was quickly brought under control and a much broader set dominated by isolates from a long-running outbreak in a London hospital likely seeded from an environmental source, requiring different control measures; isolates from 7 other hospital laboratories in London and southeast England were also included. Bayesian evolutionary analysis indicated that all the isolates shared a common ancestor dating back ∼50 years (1960s), with the main VIM-2 set separating approximately 20 to 30 years ago. Accessory gene profiling revealed blocks of genes associated with particular clusters, with some having high similarity (≥95%) to bacteriophage genes. WGS of widely found international lineages such as ST-111 provides the necessary resolution to inform epidemiological investigations and intervention policies. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Moser, Lindsey A.; Ramirez-Carvajal, Lisbeth; Puri, Vinita; Pauszek, Steven J.; Matthews, Krystal; Dilley, Kari A.; Mullan, Clancy; McGraw, Jennifer; Khayat, Michael; Beeri, Karen; Yee, Anthony; Dugan, Vivien; Heise, Mark T.; Frieman, Matthew B.; Rodriguez, Luis L.; Bernard, Kristen A.; Wentworth, David E.
2016-01-01
ABSTRACT Several biosafety level 3 and/or 4 (BSL-3/4) pathogens are high-consequence, single-stranded RNA viruses, and their genomes, when introduced into permissive cells, are infectious. Moreover, many of these viruses are select agents (SAs), and their genomes are also considered SAs. For this reason, cDNAs and/or their derivatives must be tested to ensure the absence of infectious virus and/or viral RNA before transfer out of the BSL-3/4 and/or SA laboratory. This tremendously limits the capacity to conduct viral genomic research, particularly the application of next-generation sequencing (NGS). Here, we present a sequence-independent method to rapidly amplify viral genomic RNA while simultaneously abolishing both viral and genomic RNA infectivity across multiple single-stranded positive-sense RNA (ssRNA+) virus families. The process generates barcoded DNA amplicons that range in length from 300 to 1,000 bp, which cannot be used to rescue a virus and are stable to transport at room temperature. Our barcoding approach allows for up to 288 barcoded samples to be pooled into a single library and run across various NGS platforms without potential reconstitution of the viral genome. Our data demonstrate that this approach provides full-length genomic sequence information not only from high-titer virion preparations but it can also recover specific viral sequence from samples with limited starting material in the background of cellular RNA, and it can be used to identify pathogens from unknown samples. In summary, we describe a rapid, universal standard operating procedure that generates high-quality NGS libraries free of infectious virus and infectious viral RNA. IMPORTANCE This report establishes and validates a standard operating procedure (SOP) for select agents (SAs) and other biosafety level 3 and/or 4 (BSL-3/4) RNA viruses to rapidly generate noninfectious, barcoded cDNA amenable for next-generation sequencing (NGS). 
This eliminates the burden of testing all processed samples derived from high-consequence pathogens prior to transfer from high-containment laboratories to lower-containment facilities for sequencing. Our established protocol can be scaled up for high-throughput sequencing of hundreds of samples simultaneously, which can dramatically reduce the cost and effort required for NGS library construction. NGS data from this SOP can provide complete genome coverage from viral stocks and can also detect virus-specific reads from limited starting material. Our data suggest that the procedure can be implemented and easily validated by institutional biosafety committees across research laboratories. PMID:27822536
NASA Technical Reports Server (NTRS)
Horvath, Joan C.; Alkalaj, Leon J.; Schneider, Karl M.; Amador, Arthur V.; Spitale, Joseph N.
1993-01-01
Robotic spacecraft are controlled by sets of commands called 'sequences.' These sequences must be checked against mission constraints. Making our existing constraint checking program faster would enable new capabilities in our uplink process. Therefore, we are rewriting this program to run on a parallel computer. To do so, we had to determine how to run constraint-checking algorithms in parallel and create a new method of specifying spacecraft models and constraints. This new specification gives us a means of representing flight systems and their predicted response to commands which could be used in a variety of applications throughout the command process, particularly during anomaly or high-activity operations. This commonality could reduce operations cost and risk for future complex missions. Lessons learned in applying some parts of this system to the TOPEX/Poseidon mission will be described.
High-throughput full-length single-cell mRNA-seq of rare cells.
Ooi, Chin Chun; Mantalas, Gary L; Koh, Winston; Neff, Norma F; Fuchigami, Teruaki; Wong, Dawson J; Wilson, Robert J; Park, Seung-Min; Gambhir, Sanjiv S; Quake, Stephen R; Wang, Shan X
2017-01-01
Single-cell characterization techniques, such as mRNA-seq, have been applied to a diverse range of applications in cancer biology, yielding great insight into mechanisms leading to therapy resistance and tumor clonality. While single-cell techniques can yield a wealth of information, a common bottleneck is the lack of throughput, with many current processing methods being limited to the analysis of small volumes of single cell suspensions with cell densities on the order of 10^7 per mL. In this work, we present a high-throughput full-length mRNA-seq protocol incorporating a magnetic sifter and magnetic nanoparticle-antibody conjugates for rare cell enrichment, and Smart-seq2 chemistry for sequencing. We evaluate the efficiency and quality of this protocol with a simulated circulating tumor cell system, whereby non-small-cell lung cancer cell lines (NCI-H1650 and NCI-H1975) are spiked into whole blood, before being enriched for single-cell mRNA-seq by EpCAM-functionalized magnetic nanoparticles and the magnetic sifter. We obtain high efficiency (> 90%) capture and release of these simulated rare cells via the magnetic sifter, with reproducible transcriptome data. In addition, while mRNA-seq data is typically only used for gene expression analysis of transcriptomic data, we demonstrate the use of full-length mRNA-seq chemistries like Smart-seq2 to facilitate variant analysis of expressed genes. This enables the use of mRNA-seq data for differentiating cells in a heterogeneous population by both their phenotypic and variant profile. In a simulated heterogeneous mixture of circulating tumor cells in whole blood, we utilize this high-throughput protocol to differentiate these heterogeneous cells by both their phenotype (lung cancer versus white blood cells), and mutational profile (H1650 versus H1975 cells), in a single sequencing run.
This high-throughput method can help facilitate single-cell analysis of rare cell populations, such as circulating tumor or endothelial cells, with demonstrably high-quality transcriptomic data.
[Evaluation on running status of Chinese Polio Laboratories Network in 2008].
Zhu, Shuang-li; Yan, Dong-mei; Zhu, Hui
2010-04-01
In order to evaluate the running status of, and provide laboratory data for maintaining, polio-free status in China, the 2008 virology surveillance database of the Chinese Polio Laboratories Network (excluding Hong Kong, Macao, and Taiwan) was analyzed. Case investigation data for Acute Flaccid Paralysis (AFP) cases reported by 31 provinces (municipalities, autonomous regions) through the EPI surveillance information management system, together with the database of the National Polio Laboratory (NPL), were analyzed, and indicators of the network's running status were evaluated. 10,116 stool samples were collected from 5,116 AFP cases by the Chinese Polio Laboratories Network in 2008, and viral isolation and identification of all stool samples were performed according to the 4th World Health Organization (WHO) Polio Laboratory Manual. The rate of viral isolation and identification performed within 28 days was 94.9%. 189 polioviruses (PV) and 597 non-polio enteroviruses (NPEV) were isolated from AFP cases, with isolation rates of 3.72% and 11.74%, respectively. 251 polio-positive isolates were sent to the NPL from the 31 provincial polio laboratories, and VP1 sequencing was performed on 318 single-serotype PVs. No wild polioviruses or vaccine-derived polioviruses (VDPVs) were found in 2008. The NPL passed the proficiency test and received full accreditation in the on-site review by WHO experts in 2008. All 31 provincial polio laboratories passed the proficiency test with the same panel as the NPL, and 13 provincial polio laboratories underwent and passed on-site review by WHO experts. The running status of the Chinese Polio Laboratories Network was good: the network was running normally, the laboratory surveillance system was sensitive, and laboratory data were provided for maintaining the polio-free status of China in 2008.
A better sequence-read simulator program for metagenomics.
Johnson, Stephen; Trost, Brett; Long, Jeffrey R; Pittet, Vanessa; Kusalik, Anthony
2014-01-01
There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their use to the authors' original intentions. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. To our knowledge, there are no programs that allow a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirically-derived information from metagenomics sequencing data. We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR can emulate reads from various sequencing platforms, including Illumina, 454, and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and extracts useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology; however, BEAR is not restricted in this way. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which no dedicated sequencing simulators are currently available. BEAR is also the first metagenomic sequencing simulator program that automates the process of generating abundances, which can be an arduous task. BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work.
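The non-parametric read-length emulation can be sketched as follows (a toy illustration of the idea, not BEAR's implementation): resample read lengths in proportion to their empirical frequencies instead of drawing from a fitted uniform or normal distribution.

```python
import random
from collections import Counter

def empirical_length_sampler(observed_lengths):
    """Build a non-parametric read-length sampler from observed data,
    i.e. the empirically-derived distribution that BEAR-style emulation
    uses instead of assuming a parametric form."""
    counts = Counter(observed_lengths)
    lengths = sorted(counts)
    weights = [counts[l] for l in lengths]
    def sample(n):
        return random.choices(lengths, weights=weights, k=n)
    return sample

random.seed(1)
# Hypothetical observed read lengths from a sequencing run.
sampler = empirical_length_sampler([100] * 50 + [250] * 30 + [400] * 20)
draws = sampler(1000)
```

Simulated reads then inherit whatever multi-modal or skewed length profile the real platform produced, which is what makes the emulation technology-independent.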
A comprehensive quality control workflow for paired tumor-normal NGS experiments.
Schroeder, Christopher M; Hilke, Franz J; Löffler, Markus W; Bitzer, Michael; Lenz, Florian; Sturm, Marc
2017-06-01
Quality control (QC) is an important part of all NGS data analysis stages. Many available tools calculate QC metrics from different analysis steps of single sample experiments (raw reads, mapped reads and variant lists). Multi-sample experiments, such as sequencing of tumor-normal pairs, require additional QC metrics to ensure validity of results. These multi-sample QC metrics still lack standardization. We therefore suggest a new workflow for QC of DNA sequencing of tumor-normal pairs. With this workflow, well-known single-sample QC metrics and additional metrics specific to tumor-normal pairs can be calculated. The segmentation into different tools offers high flexibility and allows reuse for other purposes. All tools produce qcML, a generic XML format for QC of -omics experiments. qcML uses quality metrics defined in an ontology, which was adapted for NGS. All QC tools are implemented in C++ and run under both Linux and Windows. Plotting requires python 2.7 and matplotlib. The software is available under the 'GNU General Public License version 2' as part of the ngs-bits project: https://github.com/imgag/ngs-bits. christopher.schroeder@med.uni-tuebingen.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
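The one-ontology-term-per-metric pattern behind qcML can be sketched with a small XML writer. The element, attribute, and accession names below are illustrative placeholders, not the official qcML schema:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def write_quality_metrics(metrics, path):
    """Write QC metrics as a small qcML-style XML document: one
    quality-parameter element per metric, each carrying an ontology
    accession, a human-readable name, and a value. Names are
    illustrative, not the real qcML schema."""
    root = ET.Element("qcML")
    run = ET.SubElement(root, "runQuality")
    for accession, (name, value) in metrics.items():
        ET.SubElement(run, "qualityParameter",
                      accession=accession, name=name, value=str(value))
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

path = os.path.join(tempfile.gettempdir(), "example_qcml.xml")
write_quality_metrics(
    {"QC:0000041": ("target region read depth", 152.7),        # hypothetical accession
     "QC:0000025": ("properly paired read percentage", 98.4)},  # hypothetical accession
    path)
```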
On the role of verbalization during task set selection: switching or serial order control?
Bryck, Richard L; Mayr, Ulrich
2005-06-01
Recent task-switching work in which paper-and-pencil administered single-task lists were compared with task-alternation lists has demonstrated large increases in task-switch costs with concurrent articulatory suppression (AS), implicating a crucial role for verbalization during switching (Baddeley, Chincotta, & Adlam, 2001; Emerson & Miyake, 2003). Experiment 1 replicated this result, using computerized assessment, albeit with much smaller effect sizes than in the original reports. In Experiment 2, AS interference was reduced when a sequential cue (spatial location) that indicated the current position in the sequence of task alternations was given. Finally, in Experiment 3, switch trials and no-switch trials were compared within a block of alternating runs of two tasks. Again, AS interference was obtained mainly when the endogenous sequencing demand was high, and it was comparable for no-switch and switch trials. These results suggest that verbalization may be critical for endogenous maintenance and updating of a sequential plan, rather than exclusively for the actual switching process.
MOCAT: A Metagenomics Assembly and Gene Prediction Toolkit
Kultima, Jens Roat; Sunagawa, Shinichi; Li, Junhua; Chen, Weineng; Chen, Hua; Mende, Daniel R.; Arumugam, Manimozhiyan; Pan, Qi; Liu, Binghang; Qin, Junjie; Wang, Jun; Bork, Peer
2012-01-01
MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/. PMID:23082188
Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko
2017-07-12
Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.
Chen, Gang; Wang, Feng; Dillenburger, Barbara C.; Friedman, Robert M.; Chen, Li M.; Gore, John C.; Avison, Malcolm J.; Roe, Anna W.
2011-01-01
Functional magnetic resonance imaging (fMRI) at high magnetic field strength can suffer from serious degradation of image quality because of motion and physiological noise, as well as spatial distortions and signal losses due to susceptibility effects. Overcoming such limitations is essential for sensitive detection and reliable interpretation of fMRI data. These issues are particularly problematic in studies of awake animals. As part of our initial efforts to study functional brain activations in awake, behaving monkeys using fMRI at 4.7T, we have developed acquisition and analysis procedures to improve image quality with encouraging results. We evaluated the influence of two main variables on image quality. First, we show how important the level of behavioral training is for obtaining good data stability and high temporal signal-to-noise ratios. In initial sessions, our typical scan session lasted 1.5 hours, partitioned into short (<10 minutes) runs. During reward periods and breaks between runs, the monkey exhibited movements resulting in considerable image misregistrations. After a few months of extensive behavioral training, we were able to increase the length of individual runs and the total length of each session. The monkey learned to wait until the end of a block for fluid reward, resulting in longer periods of continuous acquisition. Each additional 60 training sessions extended the duration of each session by 60 minutes, culminating, after about 140 training sessions, in sessions that last about four hours. As a result, the average translational movement decreased from over 500 μm to less than 80 μm, a displacement close to that observed in anesthetized monkeys scanned in a 7 T horizontal scanner. Another major source of distortion at high fields arises from susceptibility variations. To reduce such artifacts, we used segmented gradient-echo echo-planar imaging (EPI) sequences.
Increasing the number of segments significantly decreased susceptibility artifacts and image distortion. Comparisons of images from functional runs using four segments with those using a single-shot EPI sequence revealed a roughly two-fold improvement in functional signal-to-noise-ratio and 50% decrease in distortion. These methods enabled reliable detection of neural activation and permitted blood-oxygenation-level-dependent (BOLD) based mapping of early visual areas in monkeys using a volume coil. In summary, both extensive behavioral training of monkeys and application of segmented gradient-echo EPI sequence improved signal-to-noise and image quality. Understanding the effects these factors have is important for the application of high field imaging methods to the detection of sub-millimeter functional structures in the awake monkey brain. PMID:22055855
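The data-stability gain described above is commonly quantified as temporal signal-to-noise ratio (tSNR), the mean of a voxel's time series over its standard deviation across the run; a minimal sketch with invented values, not the study's data:

```python
from statistics import mean, pstdev

def temporal_snr(voxel_timeseries):
    """Temporal SNR of one voxel: mean signal divided by its standard
    deviation across the run. Head motion inflates the temporal stdev,
    so reducing movement from >500 um to <80 um raises tSNR."""
    m = mean(voxel_timeseries)
    s = pstdev(voxel_timeseries)
    return m / s if s > 0 else float("inf")

steady = [1000, 1002, 998, 1001, 999]  # hypothetical well-trained, still subject
noisy = [1000, 1060, 940, 1030, 970]   # hypothetical motion-corrupted series
```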
Preparation of protein samples for mass spectrometry and N-terminal sequencing.
Glenn, Gary
2014-01-01
The preparation of protein samples for mass spectrometry and N-terminal sequencing is a key step in successfully identifying proteins. Mass spectrometry is a very sensitive technique, and as such, samples must be prepared carefully since they can be subject to contamination of the sample (e.g., due to incomplete subcellular fractionation or purification of a multiprotein complex), overwhelming of the sample by highly abundant proteins, and contamination from skin or hair (keratin can be a very common hit). One goal of sample preparation for mass spec is to reduce the complexity of the sample - in the example presented here, mitochondria are purified, solubilized, and fractionated by sucrose density gradient sedimentation prior to preparative 1D SDS-PAGE. It is important to verify the purity and integrity of the sample so that you can have confidence in the hits obtained. More protein is needed for N-terminal sequencing and ideally it should be purified to a single band when run on an SDS-polyacrylamide gel. The example presented here involves stably expressing a tagged protein in HEK293 cells and then isolating the protein by affinity purification and SDS-PAGE. © 2014 Elsevier Inc. All rights reserved.
McCann, Joshua C.; Wickersham, Tryon A.; Loor, Juan J.
2014-01-01
Diversity in the forestomach microbiome is one of the key features of ruminant animals. The diverse microbial community adapts to a wide array of dietary feedstuffs and management strategies. Understanding rumen microbiome composition, adaptation, and function has global implications ranging from climatology to applied animal production. Classical knowledge of rumen microbiology was based on anaerobic, culture-dependent methods. Next-generation sequencing and other molecular techniques have uncovered novel features of the rumen microbiome. For instance, pyrosequencing of the 16S ribosomal RNA gene has revealed the taxonomic identity of bacteria and archaea to the genus level, and when complemented with barcoding adds multiple samples to a single run. Whole genome shotgun sequencing generates true metagenomic sequences to predict the functional capability of a microbiome, and can also be used to construct genomes of isolated organisms. Integration of high-throughput data describing the rumen microbiome with classic fermentation and animal performance parameters has produced meaningful advances and opened additional areas for study. In this review, we highlight recent studies of the rumen microbiome in the context of cattle production focusing on nutrition, rumen development, animal efficiency, and microbial function. PMID:24940050
An improved Four-Russians method and sparsified Four-Russians algorithm for RNA folding.
Frid, Yelena; Gusfield, Dan
2016-01-01
The basic RNA secondary structure prediction problem or single sequence folding problem (SSF) was solved 35 years ago by a now well-known [Formula: see text]-time dynamic programming method. Recently three methodologies-Valiant, Four-Russians, and Sparsification-have been applied to speedup RNA secondary structure prediction. The sparsification method exploits two properties of the input: the number of subsequence Z with the endpoints belonging to the optimal folding set and the maximum number base-pairs L. These sparsity properties satisfy [Formula: see text] and [Formula: see text], and the method reduces the algorithmic running time to O(LZ). While the Four-Russians method utilizes tabling partial results. In this paper, we explore three different algorithmic speedups. We first expand the reformulate the single sequence folding Four-Russians [Formula: see text]-time algorithm, to utilize an on-demand lookup table. Second, we create a framework that combines the fastest Sparsification and new fastest on-demand Four-Russians methods. This combined method has worst-case running time of [Formula: see text], where [Formula: see text] and [Formula: see text]. Third we update the Four-Russians formulation to achieve an on-demand [Formula: see text]-time parallel algorithm. This then leads to an asymptotic speedup of [Formula: see text] where [Formula: see text] and [Formula: see text] the number of subsequence with the endpoint j belonging to the optimal folding set. The on-demand formulation not only removes all extraneous computation and allows us to incorporate more realistic scoring schemes, but leads us to take advantage of the sparsity properties. 
Through asymptotic analysis and empirical testing on the base-pair maximization variant and a more biologically informative scoring scheme, we show that this Sparse Four-Russians framework is able to achieve a speedup on every problem instance, that is asymptotically never worse, and empirically better than achieved by the minimum of the two methods alone.
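The base-pair maximization variant evaluated above is the classic cubic-time dynamic program; a minimal sketch of that textbook baseline (not the Four-Russians or sparsified algorithms the paper develops) might look like:

```python
# Minimal O(n^3) dynamic program for the base-pair maximization
# variant of single-sequence RNA folding (the textbook baseline,
# NOT the Four-Russians or sparsified methods described above).

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(seq, min_loop=0):
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = maximum number of base pairs in seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # position i left unpaired
            for k in range(i + 1, j + 1):  # position i paired with k
                if (seq[i], seq[k]) in PAIRS and k - i > min_loop:
                    left = dp[i + 1][k - 1] if k - i > 1 else 0
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, 1 + left + right)
            dp[i][j] = best
    return dp[0][n - 1]
```

The sparsification discussed above prunes exactly the inner loop over k, restricting it to candidate subsequences in the optimal folding set.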
García-Cañas, Virginia; Mondello, Monica; Cifuentes, Alejandro
2010-07-01
In this work, an innovative method for the simultaneous analysis of multiple genetically modified organisms is described. The developed method combines multiplex ligation-dependent genome amplification (MLGA) with CGE and LIF detection using bare-fused silica capillaries. The MLGA process is based on oligonucleotide constructs, formed by a universal sequence (vector) and long specific oligonucleotides (selectors), that facilitate the circularization of specific DNA target regions. Subsequently, the circularized target sequences are simultaneously amplified with the same pair of primers and analyzed by CGE-LIF using a bare-fused silica capillary and a run electrolyte containing 2-hydroxyethyl cellulose acting as both sieving matrix and dynamic capillary coating. CGE-LIF is shown to be very useful and informative for optimizing MLGA parameters such as annealing temperature, number of ligation cycles, and selector probe concentration. We demonstrate the specificity of the method in detecting the presence of transgenic DNA in certified reference and raw commercial samples. The method developed is sensitive and allows the simultaneous detection in a single run of percentages of transgenic maize as low as 1% of GA21, 1% of MON863, and 1% of MON810 in maize samples, with signal-to-noise ratios for the corresponding DNA peaks of 15, 12, and 26, respectively. These results demonstrate, to our knowledge for the first time, the great possibilities of MLGA techniques for genetically modified organism analysis.
Highly multiplexed targeted DNA sequencing from single nuclei.
Leung, Marco L; Wang, Yong; Kim, Charissa; Gao, Ruli; Jiang, Jerry; Sei, Emi; Navin, Nicholas E
2016-02-01
Single-cell DNA sequencing methods are challenged by poor physical coverage, high technical error rates and low throughput. To address these issues, we developed a single-cell DNA sequencing protocol that combines flow-sorting of single nuclei, time-limited multiple-displacement amplification (MDA), low-input library preparation, DNA barcoding, targeted capture and next-generation sequencing (NGS). This approach represents a major improvement over our previously published single nucleus sequencing (SNS) protocol in terms of generating higher-coverage data (>90%), thereby enabling the detection of genome-wide variants in single mammalian cells at base-pair resolution. Furthermore, by pooling 48-96 single-cell libraries together for targeted capture, this approach can be used to sequence many single-cell libraries in parallel in a single reaction. This protocol greatly reduces the cost of single-cell DNA sequencing, and it can be completed in 5-6 d by advanced users. This single-cell DNA sequencing protocol has broad applications for studying rare cells and complex populations in diverse fields of biological research and medicine.
Applications of Single-Cell Sequencing for Multiomics.
Xu, Yungang; Zhou, Xiaobo
2018-01-01
Single-cell sequencing interrogates the sequence or chromatin information from individual cells with advanced next-generation sequencing technologies. It provides a higher resolution of cellular differences and a better understanding of the underlying genetic and epigenetic mechanisms of an individual cell in the context of its survival and adaptation to its microenvironment. However, it is more challenging to perform single-cell sequencing and downstream data analysis, owing to the minimal amount of starting materials, sample loss, and contamination. In addition, because only picogram quantities of nucleic acids are available, heavy amplification is often needed during sample preparation of single-cell sequencing, resulting in uneven coverage, noise, and inaccurate quantification of sequencing data. All these unique properties raise challenges for, and thus place high demands on, computational methods that specifically fit single-cell sequencing data. We here comprehensively survey the current strategies and challenges for multiple single-cell sequencing, including single-cell transcriptome, genome, and epigenome, beginning with a brief introduction to multiple sequencing techniques for single cells.
Northern Cascadia Subduction Zone Earthquake Records from Onshore and Offshore Core Data
NASA Astrophysics Data System (ADS)
Hausmann, R. B.; Goldfinger, C.; Black, B.; Romsos, C. G.; Galer, S.; Collins, T.
2016-12-01
We are investigating the paleoseismic record at Bull Run Lake, at the latitude of Portland, Oregon, central Cascadia margin. Bull Run is a landslide-dammed lake in a cirque basin on the western flanks of Mt. Hood, 65 km east of Portland, and is the City of Portland's primary water supply. We collected full-coverage high-resolution multibeam and backscatter data, high-resolution CHIRP sub-bottom profiles, and seven sediment cores which contain a correlative turbidite sequence of post-Mazama beds. The continuity of the turbidite record shows little or no relationship to the minor stream inlets, suggesting the disturbance beds are not likely to be storm related. CT and physical property data were used to separate major visible beds and background sedimentation, which also contain thin laminae. The XRF element Compton scattering may show grading due to mineralogical variation and a change in wave profile, commonly found at bed boundaries. We have identified 27 post-Mazama event beds and 5 ashes in the lake, and constructed an OxCal age model anchored by radiocarbon ages, the Mazama ash, and the twin Timberline ash beds. The radiocarbon ages, age model results, as well as electron microprobe (EMP) data clearly identify the Mazama ash at the base of our cores. Two closely-spaced ash beds in our cores likely correlate to the Timberline eruptive period at 1.5 ka. The number, timing and sequence of the event beds, the physical property log correlation, and key bed characteristics closely match offshore turbidite sequences off northern Oregon. For example, key regional bed T11, observed as a thick two-pulse bed in all offshore cores, also anchors the Bull Run sequence. One difference is that the twin Timberline ash occupies the stratigraphic position of regional offshore paleoseismic bed T4, which is also a two-pulse event at this latitude. 
The cores also contain many faint laminae that may record storms; however, the identification of small beds is complicated by the low sedimentation rate and low resolution of the Bull Run cores. The watershed and lake may also contain evidence of crustal faulting, though the event sequence appears to be primarily that of the Cascadia subduction zone earthquake sequence. See also Goldfinger et al. for investigation of slope stability and ground motions at Bull Run and other Cascadia lakes.
Mercury BLASTP: Accelerating Protein Sequence Alignment
Jacob, Arpith; Lancaster, Joseph; Buhler, Jeremy; Harris, Brandon; Chamberlain, Roger D.
2008-01-01
Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more running time or a cluster of machines to keep pace. To address this problem, we have designed and built a high-performance FPGA-accelerated version of BLASTP, Mercury BLASTP. In this paper, we describe the architecture of the portions of the application that are accelerated in the FPGA, and we also describe the integration of these FPGA-accelerated portions with the existing BLASTP software. We have implemented Mercury BLASTP on a commodity workstation with two Xilinx Virtex-II 6000 FPGAs. We show that the new design runs 11-15 times faster than software BLASTP on a modern CPU while delivering close to 99% identical results. PMID:19492068
Mismatch and G-Stack Modulated Probe Signals on SNP Microarrays
Binder, Hans; Fasold, Mario; Glomb, Torsten
2009-01-01
Background Single nucleotide polymorphism (SNP) arrays are important tools widely used for genotyping and copy number estimation. This technology utilizes the specific affinity of fragmented DNA for binding to surface-attached oligonucleotide DNA probes. We analyze the variability of the probe signals of Affymetrix GeneChip SNP arrays as a function of the probe sequence to identify relevant sequence motifs which potentially cause systematic biases of genotyping and copy number estimates. Methodology/Principal Findings The probe design of GeneChip SNP arrays enables us to disentangle different sources of intensity modulations such as the number of mismatches per duplex, matched and mismatched base pairings including nearest and next-nearest neighbors and their position along the probe sequence. The effect of probe sequence was estimated in terms of triple-motifs with central matches and mismatches which include all 256 combinations of possible base pairings. The probe/target interactions on the chip can be decomposed into nearest neighbor contributions which correlate well with free energy terms of DNA/DNA-interactions in solution. The effect of mismatches is about twice as large as that of canonical pairings. Runs of guanines (G) and the particular type of mismatched pairings formed in cross-allelic probe/target duplexes constitute sources of systematic biases of the probe signals with consequences for genotyping and copy number estimates. The poly-G effect seems to be related to the crowded arrangement of probes which facilitates complex formation of neighboring probes with at least three adjacent G's in their sequence. Conclusions The applied method of “triple-averaging” represents a model-free approach to estimate the mean intensity contributions of different sequence motifs which can be applied in calibration algorithms to correct signal values for sequence effects. Rules for appropriate sequence corrections are suggested. PMID:19924253
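The "triple-averaging" idea, computing a model-free mean intensity contribution per central base triple, can be sketched as below. The probe representation, field layout, and 25-mer centre position are illustrative assumptions, not the GeneChip format or the authors' pipeline:

```python
# Model-free "triple-averaging" sketch: mean log-intensity per
# central three-base motif across a probe set. The (sequence,
# intensity) tuples and centre position are illustrative assumptions.
from collections import defaultdict
import math

def triple_averages(probes, center=12):
    """probes: iterable of (sequence, intensity) pairs; `center` is
    the index of the middle base of the triple (25-mer probes assumed).
    Returns {motif: mean log10 intensity}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, intensity in probes:
        motif = seq[center - 1 : center + 2]  # three bases around the centre
        if len(motif) == 3:
            sums[motif] += math.log10(intensity)
            counts[motif] += 1
    return {m: sums[m] / counts[m] for m in sums}
```

Such per-motif means could then serve as correction terms in a calibration step, as the conclusions suggest.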
Single-Cell Sequencing for Drug Discovery and Drug Development.
Wu, Hongjin; Wang, Charles; Wu, Shixiu
2017-01-01
Next-generation sequencing (NGS), particularly single-cell sequencing, has revolutionized the scale and scope of genomic and biomedical research. Recent technological advances in NGS and single-cell studies have made deep whole-genome (DNA-seq), whole-epigenome, and whole-transcriptome (RNA-seq) sequencing at the single-cell level feasible. NGS at the single-cell level expands our view of the genome, epigenome and transcriptome and allows the genome, epigenome and transcriptome of any organism to be explored without a priori assumptions and with unprecedented throughput, at single-nucleotide resolution. NGS is also a very powerful tool for drug discovery and drug development. In this review, we describe the current state of single-cell sequencing techniques, which can provide a new, more powerful and precise approach for analyzing effects of drugs on treated cells and tissues. Our review discusses single-cell whole genome/exome sequencing (scWGS/scWES), single-cell transcriptome sequencing (scRNA-seq), single-cell bisulfite sequencing (scBS), and multiple omics of single-cell sequencing. We also highlight the advantages and challenges of each of these approaches. Finally, we describe, elaborate on, and speculate about the potential applications of single-cell sequencing for drug discovery and drug development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
DOT National Transportation Integrated Search
2012-09-01
The Center for Health and Safety Culture conducted research for the Idaho Transportation Department to develop media messages and tools to reduce fatalities and serious injuries related to Run-Off-the-Road, single-vehicle crashes in Idaho using the P...
Lidierth, Malcolm
2005-02-15
This paper describes software that runs in the Spike2 for Windows environment and provides a versatile tool for generating stimuli during data acquisition from the 1401 family of interfaces (CED, UK). A graphical user interface (GUI) is used to provide dynamic control of stimulus timing. Both single stimuli and trains of stimuli can be generated. The pulse generation routines make use of programmable variables within the interface and allow these to be rapidly changed during an experiment. The routines therefore provide the ease-of-use associated with external, stand-alone pulse generators. Complex stimulus protocols can be loaded from an external text file and facilities are included to create these files through the GUI. The software consists of a Spike2 script that runs in the host PC, and accompanying routines written in the 1401 sequencer control code, that run in the 1401 interface. Handshaking between the PC and the interface card is built into the routines and provides for full integration of sampling, analysis and stimulus generation during an experiment. Control of the 1401 digital-to-analogue converters is also provided; this allows control of stimulus amplitude as well as timing and also provides a sample-hold feature that may be used to remove DC offsets and drift from recorded data.
NG6: Integrated next generation sequencing storage and processing environment.
Mariette, Jérôme; Escudié, Frédéric; Allias, Nicolas; Salin, Gérald; Noirot, Céline; Thomas, Sylvain; Klopp, Christophe
2012-09-09
Next generation sequencing platforms are now well established in sequencing centres and some laboratories. Upcoming smaller-scale machines such as the 454 junior from Roche or the MiSeq from Illumina will increase the number of laboratories hosting a sequencer. In such a context, it is important to provide these teams with an easily manageable environment to store and process the produced reads. We describe a user-friendly information system able to manage large sets of sequencing data. It includes, on one hand, a workflow environment already containing pipelines adapted to different input formats (sff, fasta, fastq and qseq), different sequencers (Roche 454, Illumina HiSeq) and various analyses (quality control, assembly, alignment, diversity studies,…) and, on the other hand, a secured web site giving access to the results. The connected user will be able to download raw and processed data and browse through the analysis result statistics. The provided workflows can easily be modified or extended and new ones can be added. Ergatis is used as a workflow building, running and monitoring system. The analyses can be run locally or in a cluster environment using Sun Grid Engine. NG6 is a complete information system designed to answer the needs of a sequencing platform. It provides a user-friendly interface to process, store and download high-throughput sequencing data.
Ergodicity of financial indices
NASA Astrophysics Data System (ADS)
Kolesnikov, A. V.; Rühl, T.
2010-05-01
We introduce the concept of ensemble averaging for financial markets. We address the question of the equality of ensemble and time averages and investigate whether these averages are equivalent for a large set of equity indices and branches. We start with a model of Gaussian-distributed returns, equal-weighted stocks in each index and an absence of correlations within a single day, and show that even this oversimplified model already captures the run of the corresponding index reasonably well due to its self-averaging properties. We introduce the concept of the instant cross-sectional volatility and discuss its relation to the ordinary time-resolved counterpart. The role of the cross-sectional volatility for the description of the corresponding index, the role of correlations between the single stocks, and the role of non-Gaussianity of stock distributions are briefly discussed. Our model reveals quickly and efficiently some anomalies or bubbles in a particular financial market and gives an estimate of how large these effects can be and how quickly they disappear.
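Under the oversimplified model described (Gaussian returns, equal-weighted stocks, no intraday correlations), the near-equivalence of ensemble and time averages can be checked with a short simulation. This is a hedged sketch of that toy model, not the authors' code; all parameter values are illustrative:

```python
# Toy model: equal-weighted index of i.i.d. Gaussian stock returns.
# Compares the time average of the index return with the ensemble
# (cross-sectional) average on a single day. Parameters are illustrative.
import random

def simulate(n_stocks=500, n_days=2000, mu=0.0005, sigma=0.01, seed=7):
    rng = random.Random(seed)
    daily_index = []
    last_day = None
    for _ in range(n_days):
        day = [rng.gauss(mu, sigma) for _ in range(n_stocks)]
        daily_index.append(sum(day) / n_stocks)  # ensemble average that day
        last_day = day
    time_avg = sum(daily_index) / n_days  # time average of the index
    ensemble_avg = sum(last_day) / n_stocks
    return time_avg, ensemble_avg
```

Both averages converge to mu, the time average much faster (it self-averages over both stocks and days), which is the self-averaging property mentioned above.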
Phyx: phylogenetic tools for unix.
Brown, Joseph W; Walker, Joseph F; Smith, Stephen A
2017-06-15
The ease with which phylogenomic data can be generated has drastically escalated the computational burden for even routine phylogenetic investigations. To address this, we present phyx: a collection of programs written in C++ to explore, manipulate, analyze and simulate phylogenetic objects (alignments, trees and MCMC logs). Modelled after Unix/GNU/Linux command line tools, individual programs perform a single task and operate on standard I/O streams that can be piped to quickly and easily form complex analytical pipelines. Because of the stream-centric paradigm, memory requirements are minimized (often only a single tree or sequence in memory at any instance), and hence phyx is capable of efficiently processing very large datasets. phyx runs on POSIX-compliant operating systems. Source code, installation instructions, documentation and example files are freely available under the GNU General Public License at https://github.com/FePhyFoFum/phyx. eebsmith@umich.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
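The stream-centric paradigm described above, holding only a single record in memory at any instant so tools can be piped, can be illustrated with lazy generators (phyx itself is C++; the FASTA handling and function names here are illustrative, not phyx code):

```python
# Stream-centric processing sketch: each stage lazily consumes and
# yields one record at a time, so stages can be chained like Unix
# pipes with O(1) memory. Illustrative code, not part of phyx.

def fasta_records(lines):
    """Lazily yield (header, sequence) pairs from a FASTA stream."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def reverse_complement(records):
    """A pipeline stage: transform each record as it streams past."""
    comp = str.maketrans("ACGTacgt", "TGCAtgca")
    for header, seq in records:
        yield header, seq.translate(comp)[::-1]
```

Chaining `reverse_complement(fasta_records(open("in.fa")))` processes an arbitrarily large file while holding one sequence at a time, mirroring how the phyx programs compose on standard I/O streams.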
Application of next generation sequencing in clinical microbiology and infection prevention.
Deurenberg, Ruud H; Bathoorn, Erik; Chlebowicz, Monika A; Couto, Natacha; Ferdous, Mithila; García-Cobos, Silvia; Kooistra-Smid, Anna M D; Raangs, Erwin C; Rosema, Sigrid; Veloo, Alida C M; Zhou, Kai; Friedrich, Alexander W; Rossen, John W A
2017-02-10
Current molecular diagnostics of human pathogens provide limited information that is often not sufficient for outbreak and transmission investigation. Next generation sequencing (NGS) determines the DNA sequence of a complete bacterial genome in a single sequence run, and from these data, information on resistance and virulence, as well as information for typing is obtained, useful for outbreak investigation. The obtained genome data can be further used for the development of an outbreak-specific screening test. In this review, a general introduction to NGS is presented, including the library preparation and the major characteristics of the most common NGS platforms, such as the MiSeq (Illumina) and the Ion PGM™ (ThermoFisher). An overview of the software used for NGS data analyses used at the medical microbiology diagnostic laboratory in the University Medical Center Groningen in The Netherlands is given. Furthermore, applications of NGS in the clinical setting are described, such as outbreak management, molecular case finding, characterization and surveillance of pathogens, rapid identification of bacteria using the 16S-23S rRNA region, taxonomy, metagenomics approaches on clinical samples, and the determination of the transmission of zoonotic micro-organisms from animals to humans. Finally, we share our vision on the use of NGS in personalised microbiology in the near future, pointing out specific requirements. Copyright © 2016 The Author(s). Published by Elsevier B.V. All rights reserved.
Origin of noncoding DNA sequences: molecular fossils of genome evolution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Naora, H.; Miyahara, K.; Curnow, R.N.
The total amount of noncoding sequences on chromosomes of contemporary organisms varies significantly from species to species. The authors propose a hypothesis for the origin of these noncoding sequences that assumes that (i) an approx. 0.55-kilobase (kb)-long reading frame composed the primordial gene and (ii) a 20-kb-long single-stranded polynucleotide is the longest molecule (as a genome) that was polymerized at random and without a specific template in the primordial soup/cell. The statistical distribution of stop codons allows examination of the probability of generating reading frames of approx. 0.55 kb in this primordial polynucleotide. This analysis reveals that with three stop codons, a run of at least 0.55-kb equivalent length of nonstop codons would occur in 4.6% of 20-kb-long polynucleotide molecules. They attempt to estimate the total amount of noncoding sequences that would be present on the chromosomes of contemporary species assuming that present-day chromosomes retain the prototype primordial genome structure. Theoretical estimates thus obtained for most eukaryotes do not differ significantly from those reported for these specific organisms, with only a few exceptions. Furthermore, analysis of possible stop-codon distributions suggests that life on earth would not exist, at least in its present form, had two or four stop codons been selected early in evolution.
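The 4.6% figure can be reproduced approximately with a Poisson-type estimate for runs: a 20-kb strand holds ~6,666 codons in one frame, a 0.55-kb frame is ~183 codons, and the expected number of stop-terminated runs of that length is small, so P(at least one) ≈ 1 − e^(−λ). A back-of-the-envelope sketch under the simplifying assumptions of one reading frame and uniform codon usage (not the authors' exact calculation):

```python
# Poisson approximation for the chance that a random 20-kb strand
# contains a run of >= ~183 consecutive nonstop codons (~0.55 kb).
# Assumes one reading frame and uniform codon usage (simplifications).
import math

def prob_long_orf(genome_nt=20_000, orf_nt=550, stop_codons=3, codons=64):
    n_codons = genome_nt // 3
    run = orf_nt // 3
    p_nonstop = (codons - stop_codons) / codons
    # Each codon position starts a qualifying run with probability
    # (a stop precedes it) * (next `run` codons are all nonstop).
    lam = n_codons * (stop_codons / codons) * p_nonstop ** run
    return 1 - math.exp(-lam)
```

With the default values this evaluates to roughly 0.047, in line with the 4.6% reported above; with two stop codons the probability rises sharply, illustrating the paper's closing point.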
Fredholm, Daniel V; Coleman, James K; Childress, April L; Wellehan, James F X
2015-03-01
Agamid adenovirus 1 (AgAdv-1) is a significant cause of disease in bearded dragons (Pogona sp.). Clinical manifestations of AgAdv-1 infection are variable and often nonspecific; the manifestations range from lethargy, weight loss, and inappetence, to severe enteritis, hepatitis, and sudden death. Currently, diagnosis of AgAdv-1 infection is achieved through a single published method: standard nested polymerase chain reaction (nPCR) and sequencing. Standard nPCR with sequencing provides reliable sensitivity, specificity, and validation of PCR products. However, this process is comparatively expensive, laborious, and slow. Probe hybridization, as used in a TaqMan assay, represents the best option for validating PCR products aside from the time-consuming process of sequencing. This study developed a real-time PCR (qPCR) assay using a TaqMan probe, targeting a highly conserved region of the AgAdv-1 genome. Standard curves were generated, detection results were compared with the gold standard conventional PCR and sequencing assay, and limits of detection were determined. Additionally, the qPCR assay was run on samples known to be positive for AgAdv-1 and samples known to be positive for other adenoviruses. Based on the results of these evaluations, this assay allows for a less expensive, rapid, quantitative detection of AgAdv-1 in bearded dragons. © 2015 The Author(s).
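Standard curves like those generated here relate quantification cycle (Cq) to log10 input copies; the slope of that line gives the amplification efficiency, E = 10^(−1/slope) − 1, with a perfectly efficient assay at slope ≈ −3.32. A sketch of that standard calculation (the numbers in the example are invented, not data from this study):

```python
# Standard-curve fit for qPCR: least-squares line Cq = slope*log10(copies)
# + intercept, and the derived amplification efficiency. Illustrative
# helper, not the assay-specific analysis from the study above.

def fit_standard_curve(log10_copies, cq_values):
    """Returns (slope, intercept, efficiency); efficiency ~= 1.0
    means perfect doubling per cycle (slope ~= -3.32)."""
    n = len(log10_copies)
    mx = sum(log10_copies) / n
    my = sum(cq_values) / n
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, cq_values))
    slope = sxy / sxx
    intercept = my - slope * mx
    efficiency = 10 ** (-1 / slope) - 1
    return slope, intercept, efficiency
```

Fitting a dilution series of known copy numbers this way is also what allows the assay to return quantitative (copy-number) results rather than a simple positive/negative call.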
elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling
Decap, Dries; Fostier, Jan; Reumers, Joke
2015-01-01
elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost. PMID:26182406
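The single-pass architecture described above, composing all preparation steps into one traversal of the data instead of rereading the file per step, can be sketched with chained generators (an illustrative simplification; elPrep itself is a multithreaded in-memory application, and real duplicate marking uses richer criteria):

```python
# Single-pass pipeline sketch: each preparation step is a generator,
# and composing them means the read stream is traversed exactly once.
# Field names and the simplified duplicate rule are illustrative.

def filter_unmapped(reads):
    for r in reads:
        if r["mapped"]:
            yield r

def mark_duplicates(reads):
    # Simplified: flags later reads sharing a start position.
    seen = set()
    for r in reads:
        key = (r["chrom"], r["pos"])
        r["duplicate"] = key in seen
        seen.add(key)
        yield r

def run_pipeline(reads, *steps):
    """Compose steps so the data makes a single pass through all of them."""
    stream = iter(reads)
    for step in steps:
        stream = step(stream)
    return list(stream)
```

Adding more steps to `run_pipeline` lengthens the per-read work but never adds another traversal of the file, which is the property elPrep exploits to collapse a multi-tool pipeline's I/O cost.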
Assessment of replicate bias in 454 pyrosequencing and a multi-purpose read-filtering tool.
Jérôme, Mariette; Noirot, Céline; Klopp, Christophe
2011-05-26
The Roche 454 pyrosequencing platform is often considered the most versatile of the Next Generation Sequencing technology platforms, permitting the sequencing of large genomes, the analysis of variations or the study of transcriptomes. A recently reported bias leads to the production of multiple reads for a unique DNA fragment in a random manner within a run. This bias has a direct impact on the quality of the measurement of the representation of the fragments using the reads. Other cleaning steps are usually performed on the reads before assembly or alignment. PyroCleaner is a software module intended to clean 454 pyrosequencing reads in order to ease the assembly process. This program is free software and is distributed under the terms of the GNU General Public License as published by the Free Software Foundation. It implements several filters using criteria such as read duplication, length, complexity, base-pair quality and number of undetermined bases. It can also clean flowgram files (.sff) of paired-end sequences, generating a validated paired-end file on the one hand and a single-read file on the other. Read cleaning has always been an important step in sequence analysis. The PyroCleaner Python module is a Swiss Army knife dedicated to 454 read cleaning. It includes commonly used filters as well as specialised ones such as duplicated-read removal and paired-end read verification.
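One of the filters described, removal of the artificially duplicated reads produced by the 454 replicate bias, can be approximated by grouping reads that share a long identical 5' prefix and keeping one representative per group. This is a hedged simplification (real tools, PyroCleaner included, use fuzzier duplicate criteria; the field layout is illustrative):

```python
# Simplified duplicate-read filter: 454 artificial replicates share
# their 5' start, so group reads by a fixed-length prefix and keep
# the longest read per group. Illustrative, not PyroCleaner's logic.

def remove_duplicates(reads, prefix_len=50):
    """reads: iterable of (name, sequence) pairs.
    Returns one (name, sequence) per identical prefix."""
    best = {}
    for name, seq in reads:
        key = seq[:prefix_len]
        if key not in best or len(seq) > len(best[key][1]):
            best[key] = (name, seq)
    return list(best.values())
```

Because the replicates are produced randomly within a run, a filter of this kind restores the read counts toward a faithful measurement of fragment representation.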
OpenKnowledge for peer-to-peer experimentation in protein identification by MS/MS
2011-01-01
Background Traditional scientific workflow platforms usually run individual experiments with little evaluation and analysis of performance as required by automated experimentation in which scientists are being allowed to access numerous applicable workflows rather than being committed to a single one. Experimental protocols and data under a peer-to-peer environment could potentially be shared freely without any single point of authority to dictate how experiments should be run. In such environment it is necessary to have mechanisms by which each individual scientist (peer) can assess, locally, how he or she wants to be involved with others in experiments. This study aims to implement and demonstrate simple peer ranking under the OpenKnowledge peer-to-peer infrastructure by both simulated and real-world bioinformatics experiments involving multi-agent interactions. Methods A simulated experiment environment with a peer ranking capability was specified by the Lightweight Coordination Calculus (LCC) and automatically executed under the OpenKnowledge infrastructure. The peers such as MS/MS protein identification services (including web-enabled and independent programs) were made accessible as OpenKnowledge Components (OKCs) for automated execution as peers in the experiments. The performance of the peers in these automated experiments was monitored and evaluated by simple peer ranking algorithms. Results Peer ranking experiments with simulated peers exhibited characteristic behaviours, e.g., power law effect (a few dominant peers dominate), similar to that observed in the traditional Web. Real-world experiments were run using an interaction model in LCC involving two different types of MS/MS protein identification peers, viz., peptide fragment fingerprinting (PFF) and de novo sequencing with another peer ranking algorithm simply based on counting the successful and failed runs. 
This study demonstrated a novel integration and useful evaluation of specific proteomic peers and found MASCOT to be a dominant peer as judged by peer ranking. Conclusion The simulated and real-world experiments in the present study demonstrated that the OpenKnowledge infrastructure with peer ranking capability can serve as an evaluative environment for automated experimentation. PMID:22192521
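The simple counting-based peer ranking used for the real-world runs, scoring each peer by its successful versus failed runs, can be sketched as follows (illustrative scoring only; the LCC interaction models and OpenKnowledge machinery are not reproduced):

```python
# Counting-based peer ranking sketch: score each peer by the fraction
# of its runs that succeeded, then rank best-first. Peer names and
# the scoring rule are illustrative of the approach described above.
from collections import defaultdict

def rank_peers(run_log):
    """run_log: iterable of (peer, succeeded) pairs."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for peer, ok in run_log:
        total[peer] += 1
        if ok:
            wins[peer] += 1
    return sorted(total, key=lambda p: wins[p] / total[p], reverse=True)
```

A dominant peer, as MASCOT was judged to be here, simply accumulates the highest success fraction across the monitored interactions.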
GEANT4 distributed computing for compact clusters
NASA Astrophysics Data System (ADS)
Harrawood, Brian P.; Agasthya, Greeshma A.; Lakshmanan, Manu N.; Raterman, Gretchen; Kapadia, Anuj J.
2014-11-01
A new technique for distribution of GEANT4 processes is introduced to simplify running a simulation in a parallel environment such as a tightly coupled computer cluster. Using a new C++ class derived from the GEANT4 toolkit, multiple runs forming a single simulation are managed across a local network of computers with a simple inter-node communication protocol. The class is integrated with the GEANT4 toolkit and is designed to scale from a single symmetric multiprocessing (SMP) machine to compact clusters ranging in size from tens to thousands of nodes. User designed 'work tickets' are distributed to clients using a client-server work flow model to specify the parameters for each individual run of the simulation. The new g4DistributedRunManager class was developed and well tested in the course of our Neutron Stimulated Emission Computed Tomography (NSECT) experiments. It will be useful for anyone running GEANT4 for large discrete data sets such as covering a range of angles in computed tomography, calculating dose delivery with multiple fractions or simply speeding the through-put of a single model.
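The client-server work-ticket flow described, a server handing each client the parameters for one run of the simulation, can be mimicked locally with a thread-safe queue. This is a Python stand-in for the idea only; the g4DistributedRunManager class itself is C++ and uses a network protocol, and all names here are illustrative:

```python
# Work-ticket distribution sketch: a queue of per-run parameter
# dictionaries is drained by worker threads, one ticket per run.
# A local stand-in for the networked client-server model above.
import queue
import threading

def serve_tickets(tickets, n_workers, run_fn):
    """Distribute tickets to n_workers workers; collect one result each."""
    work = queue.Queue()
    for t in tickets:
        work.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                ticket = work.get_nowait()
            except queue.Empty:
                return  # no tickets left; worker exits
            r = run_fn(ticket)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

For a tomography-style sweep, each ticket might carry one projection angle, so adding nodes simply drains the ticket queue faster, which is the scaling behaviour the class is designed for.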
A 1-minute full brain MR exam using a multicontrast EPI sequence.
Skare, Stefan; Sprenger, Tim; Norbeck, Ola; Rydén, Henric; Blomberg, Lars; Avventi, Enrico; Engström, Mathias
2018-06-01
A new multicontrast echo-planar imaging (EPI)-based sequence is proposed for brain MRI, which can directly generate six MR contrasts (T1-FLAIR, T2-w, diffusion-weighted (DWI), apparent diffusion coefficient (ADC), T2*-w, T2-FLAIR) in 1 min with full brain coverage. This could enable clinical MR screening in a similar time to a conventional CT exam but with more soft-tissue information. Eleven sequence modules were created as dynamic building blocks for the sequence. Two EPI readout modules were reused throughout the sequence and were prepended by other modules to form the desired MR contrasts. Two scan protocols were optimized with scan times of 55-75 s. Motion experiments were carried out on two volunteers to investigate robustness against head motion. Scans on patients were carried out and compared to conventional clinical images. The pulse sequence is found to be robust against motion, given the single-shot nature of each contrast. For excessive out-of-plane head motion, the T1-FLAIR and T2-FLAIR contrasts suffer from incomplete inversion. Despite lower signal-to-noise ratio (SNR) and resolution, the 1-min multicontrast EPI data show promising correspondence with conventional diagnostic scans on patients. A 1-min multicontrast brain MRI scan based on EPI readouts has been presented in this feasibility study. Preliminary data show potential for clinical brain MRI use with minimal bore time for the patient. Such a short examination time could be useful, e.g., for screening and acute stroke. The sequence may also help in planning conventional brain MRI scans if run at the beginning of an examination. Magn Reson Med 79:3045-3054, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments
Yim, Won Cheol; Cushman, John C.
2017-07-22
Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs perform searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
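The query-distribution idea - split the input FASTA into balanced chunks and run one BLAST process per chunk - can be sketched as follows. The `split_fasta` helper is illustrative only, not part of DCBLAST.

```python
def split_fasta(text, n_chunks):
    """Split a multi-FASTA string into roughly equal chunks of whole
    records, one chunk per worker (illustrative query-distribution sketch)."""
    # Re-attach the '>' stripped by split(); skip any leading empty piece.
    records = ['>' + r for r in text.split('>') if r.strip()]
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)   # round-robin keeps chunks balanced
    return [''.join(c) for c in chunks if c]

fasta = ">q1\nACGT\n>q2\nGGCC\n>q3\nTTAA\n>q4\nATAT\n"
parts = split_fasta(fasta, 2)
# Each chunk would then be written to chunk_i.fa and submitted to its own
# node, e.g. `blastn -query chunk_i.fa -db nt ...`, and results concatenated.
```

Because records are never split mid-sequence, concatenating the per-chunk outputs is equivalent to one monolithic search over the full query set.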
History of Satellite Orbit Determination at NSWCDD
2018-01-31
run. Segment 40 did pass editing and its use was optional after Segment 20. Segment 30 needed to be run before Segment 80. Segment 70 was run as… control cards required to run the program. These included a CHARGE card related to usage charges and various REQUEST, ATTACH, and CATALOG cards… each) could be done in a single run after the long-arc solution had converged. These short arcs used the pass matrices from the long-arc run in their
Design and Analysis of Single-Cell Sequencing Experiments.
Grün, Dominic; van Oudenaarden, Alexander
2015-11-05
Recent advances in single-cell sequencing hold great potential for exploring biological systems with unprecedented resolution. Sequencing the genome of individual cells can reveal somatic mutations and allows the investigation of clonal dynamics. Single-cell transcriptome sequencing can elucidate the cell type composition of a sample. However, single-cell sequencing comes with major technical challenges and yields complex data output. In this Primer, we provide an overview of available methods and discuss experimental design and single-cell data analysis. We hope that these guidelines will enable a growing number of researchers to leverage the power of single-cell sequencing. Copyright © 2015 Elsevier Inc. All rights reserved.
Zook, Justin M.; Samarov, Daniel; McDaniel, Jennifer; Sen, Shurjo K.; Salit, Marc
2012-01-01
While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set used to calculate association of SSEs with various features in the reads and sequence context. This data set is typically either from a part of the data set being “recalibrated” (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has less dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration. PMID:22859977
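The Phred scale underlying these base quality scores maps an error probability p to Q = -10·log10(p), so the reported mean improvement of 5 quality units corresponds to roughly a 3.2-fold reduction in the implied error rate. A quick sanity check of that arithmetic:

```python
import math

def phred(p_error):
    """Error probability -> Phred quality score."""
    return -10 * math.log10(p_error)

def error_prob(q):
    """Phred quality score -> error probability."""
    return 10 ** (-q / 10)

# A 5-unit gain (e.g. Q30 -> Q35) shrinks the implied error rate by
# a factor of 10**0.5, about 3.16x; 13 units (the CpG case) is ~20x.
ratio_5 = error_prob(30) / error_prob(35)
ratio_13 = error_prob(30) / error_prob(43)
```

This is standard Phred arithmetic, not anything specific to the spike-in method itself.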
CMB constraints on running non-Gaussianity
NASA Astrophysics Data System (ADS)
Oppizzi, F.; Liguori, M.; Renzi, A.; Arroja, F.; Bartolo, N.
2018-05-01
We develop a complete set of tools for CMB forecasting, simulation and estimation of primordial running bispectra, arising from a variety of curvaton and single-field (DBI) models of Inflation. We validate our pipeline using mock CMB running non-Gaussianity realizations and test it on real data by obtaining experimental constraints on the fNL running spectral index, nNG, using WMAP 9-year data. Our final bounds (68% C.L.) read ‑0.6< nNG<1.4}, ‑0.3< nNG<1.2, ‑1.1
Robotic Enrichment Processing of Roche 454 Titanium Emulsion PCR at the DOE Joint Genome Institute
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hamilton, Matthew; Wilson, Steven; Bauer, Diane
2010-05-28
Enrichment of emulsion PCR product is the most laborious and pipette-intensive step in the 454 Titanium process, posing the biggest obstacle for production-oriented scale-up. The Joint Genome Institute has developed a pair of custom-made robots based on the Microlab Star liquid handling deck manufactured by Hamilton to mitigate the complexity and ergonomic demands of the 454 enrichment process. The robot includes a custom-built centrifuge, magnetic deck positions, and heating and cooling elements. At present these robots process eight emulsion cup samples in a single 2.5-hour run, and they are capable of processing up to 24 emulsion cup samples. Sample emulsions are broken using the standard 454 breaking process, transferred from a pair of 50 ml conical tubes to a single 2 ml tube, and loaded on the robot. The robot performs the enrichment protocol and produces beads in 2 ml tubes ready for counting. The robot follows the Roche 454 enrichment protocol with slight exceptions: beads are resuspended via pipette mixing rather than vortexing, and a set number of null bead removal washes is used. The robotic process is broken down into similar discrete steps: First Melt and Neutralization, Enrichment Primer Annealing, Enrichment Bead Incubation, Null Bead Removal, Second Melt and Neutralization, and Sequencing Primer Annealing. Data indicating our improvements in enrichment efficiency and total number of bases per run will also be shown.
Single molecule sequencing of the M13 virus genome without amplification
Zhao, Luyang; Deng, Liwei; Li, Gailing; Jin, Huan; Cai, Jinsen; Shang, Huan; Li, Yan; Wu, Haomin; Xu, Weibin; Zeng, Lidong; Zhang, Renli; Zhao, Huan; Wu, Ping; Zhou, Zhiliang; Zheng, Jiao; Ezanno, Pierre; Yang, Andrew X.; Yan, Qin; Deem, Michael W.; He, Jiankui
2017-01-01
Next generation sequencing (NGS) has revolutionized life sciences research. However, GC bias and costly, time-intensive library preparation make NGS an ill fit for increasing sequencing demands in the clinic. A new class of third-generation sequencing platforms has arrived to meet this need, capable of directly measuring DNA and RNA sequences at the single-molecule level without amplification. Here, we use the new GenoCare single-molecule sequencing platform from Direct Genomics to sequence the genome of the M13 virus. Our platform detects single-molecule fluorescence by total internal reflection microscopy, with sequencing-by-synthesis chemistry. We sequenced the genome of M13 to a depth of 316x, with 100% coverage. We determined a consensus sequence accuracy of 100%. In contrast to GC bias inherent to NGS results, we demonstrated that our single-molecule sequencing method yields minimal GC bias. PMID:29253901
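Depth and coverage figures like the 316x depth and 100% coverage reported above come from counting read alignments per reference base. A toy calculation with made-up alignments (not the M13 data):

```python
def depth_profile(genome_len, reads):
    """reads: list of (start, length) alignments on a circular-free toy
    reference; returns the per-base depth array."""
    depth = [0] * genome_len
    for start, length in reads:
        for i in range(start, min(start + length, genome_len)):
            depth[i] += 1
    return depth

# Three hypothetical reads over a 10 bp reference
depth = depth_profile(10, [(0, 6), (4, 6), (2, 5)])
coverage = sum(d > 0 for d in depth) / len(depth)  # fraction of bases covered
mean_depth = sum(depth) / len(depth)               # the "Nx" depth figure
```

"100% coverage at 316x" means every position has depth > 0 and the mean of the depth array is about 316; the toy example above covers every base at a mean depth of 1.7.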
Carrillo-de-la-Peña, M T
1999-05-01
(1) To achieve a better understanding of the intensity dependence function of AEPs recorded at fronto-central and temporal electrode sites; (2) to assess the possible influence of the order of stimulus presentation on this function; and (3) to investigate whether a subject's AEP augmenting or reducing (A/R) tendency is consistent across two intra-session runs. Two sequences of 288 stimuli of different intensities (60, 80, 90 and 110 dB SPL) were delivered to 29 psychology students. In the first run, stimuli were presented in 4 consecutive blocks of 72 tones of each intensity, in either ascending (from quietest to loudest) or descending (from loudest to quietest) order. In the second run, a pseudo-randomized sequence of stimuli of the 4 intensities was presented. (1) AEPs recorded at fronto-central electrodes showed a stronger intensity dependence than those recorded at temporal leads; (2) the delivery of tones of different intensities in a random sequence provoked higher amplitudes at Fz and Cz - especially for the loudest tones - but not at temporal leads; (3) an individual's AEP responses to stimuli of increasing intensity are highly consistent across two intra-session runs. The different findings obtained for the fronto-central N1P2 and the T complex in relation to the effects of intensity and order of stimulus presentation may be explained in terms of the cortical origin of those components. The higher amplitudes found with a random sequence, especially for the highest-intensity stimuli, may reflect that these stimuli capture the subject's attention and provoke an enhancement of the N1 component. The implications of the present results for investigation into A/R and the clinical relevance of this phenomenon are discussed.
DRUMS: Disk Repository with Update Management and Select option for high throughput sequencing data.
Nettling, Martin; Thieme, Nils; Both, Andreas; Grosse, Ivo
2014-02-04
New technologies for analyzing biological samples, like next-generation sequencing, are producing a growing amount of data together with quality scores. Moreover, software tools (e.g., for mapping sequence reads, calculating transcription factor binding probabilities, estimating regions enriched for epigenetic modifications, or determining single nucleotide polymorphisms) increase this amount of position-specific DNA-related data even further. Hence, requesting data becomes challenging and expensive and is often implemented using specialised hardware. In addition, retrieving specific data as fast as possible is becoming increasingly important in many fields of science. The general problem of handling big data sets was addressed by the development of specialized databases like HBase, HyperTable and Cassandra. However, these database solutions also require specialized or distributed hardware, leading to expensive investments. To the best of our knowledge, there is no database capable of (i) storing billions of position-specific DNA-related records, (ii) performing fast and resource-saving requests, and (iii) running on single standard computer hardware. Here, we present DRUMS (Disk Repository with Update Management and Select option), satisfying demands (i)-(iii). It tackles the weaknesses of traditional databases while handling position-specific DNA-related data in an efficient manner. DRUMS is capable of storing up to billions of records. Moreover, it focuses on optimizing related single lookups such as range requests, which are needed constantly for computations in bioinformatics. To validate the power of DRUMS, we compare it to the widely used MySQL database. The test setting considers two biological data sets. We use standard desktop hardware as the test environment. DRUMS outperforms MySQL in writing and reading records by a factor of two up to a factor of 10000. Furthermore, it can work with significantly larger data sets.
Our work focuses on mid-sized data sets up to several billion records without requiring cluster technology. Storing position-specific data is a general problem and the concept we present here is a generalized approach. Hence, it can be easily applied to other fields of bioinformatics.
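The range-request pattern the comparison emphasizes - one logarithmic seek into position-sorted records followed by a sequential scan - can be sketched with Python's `bisect` module. This illustrates the access pattern only, not DRUMS's on-disk layout; the class and sample records are invented.

```python
import bisect

class PositionStore:
    """Toy in-memory store of position-specific records, sorted by
    genomic position to make range requests cheap."""
    def __init__(self, records):
        # records: iterable of (position, value) pairs
        self.records = sorted(records)
        self.positions = [p for p, _ in self.records]

    def range(self, start, end):
        """All records with start <= position <= end: O(log n) to locate
        the window, then a contiguous slice (the sequential scan)."""
        lo = bisect.bisect_left(self.positions, start)
        hi = bisect.bisect_right(self.positions, end)
        return self.records[lo:hi]

store = PositionStore([(150, 'C>T'), (42, 'A>G'), (90, 'SNP'), (500, 'indel')])
hits = store.range(50, 200)
```

Keeping records position-sorted is what turns a range request into one seek plus a linear read, which is the access pattern a generic row store like MySQL cannot exploit as directly.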
Streaming fragment assignment for real-time analysis of sequencing experiments
Roberts, Adam; Pachter, Lior
2013-01-01
We present eXpress, a software package for highly efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time, and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data, showing greater efficiency than other quantification methods. PMID:23160280
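The streaming idea - a single pass over the fragments in constant memory, with each ambiguous fragment's weight split across compatible targets in proportion to the current abundance estimates - can be caricatured as follows. This is a toy online-assignment step, not the eXpress model, which also accounts for errors, fragment lengths and biases.

```python
def stream_assign(fragments, targets):
    """One pass over fragments; each fragment is a list of the target
    names it maps to. Memory use is fixed at one counter per target."""
    counts = {t: 1.0 for t in targets}           # pseudocount prior
    for compatible in fragments:                 # streaming: single pass
        total = sum(counts[t] for t in compatible)
        for t in compatible:
            # Split this fragment's unit weight by current estimates.
            counts[t] += counts[t] / total
    norm = sum(counts.values())
    return {t: c / norm for t, c in counts.items()}

# tx1 gets two unambiguous fragments and a share of one ambiguous one.
abund = stream_assign([["tx1"], ["tx1", "tx2"], ["tx1"]], ["tx1", "tx2"])
```

Because each fragment is processed once and then discarded, memory stays constant in the number of fragments, which is the property that lets abundances be reported in real time as a sequencing run progresses.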
Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis
2012-01-01
Background Multiplexing has become the major limitation of next-generation sequencing (NGS) in its application to low-complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach only permits simultaneous analysis of up to several dozen samples. Results Here we introduce pair-barcode sequencing (PBS), an economic and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using a SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously sequenced using the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated and about 64% of them were assigned to both barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns were captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrate that the PBS approach is valid. Conclusions By employing the PBS approach in NGS, large-scale multiplexed pooled samples can be practically analyzed in parallel, so that high-throughput sequencing economically meets the needs of samples with low sequencing-throughput demand. PMID:22276739
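The combinatorial arithmetic behind the design - 4 forward × 8 reverse barcodes addressing 32 libraries, with a read assigned only when both barcodes match - can be sketched as follows. The barcode sequences themselves are invented for illustration.

```python
from itertools import product

fwd = ["ACGT", "TGCA", "GATC", "CTAG"]             # 4 forward barcodes (hypothetical)
rev = ["AAAA", "CCCC", "GGGG", "TTTT",
       "ACAC", "GTGT", "AGAG", "TCTC"]             # 8 reverse barcodes (hypothetical)

# 4 x 8 = 32 pair-barcode combinations, one per library
sample_of = {pair: i for i, pair in enumerate(product(fwd, rev))}

def demultiplex(read):
    """Assign a read to a library only if BOTH barcode ends match;
    returns None otherwise (the ~36% of unassigned reads)."""
    head, tail = read[:4], read[-4:]
    return sample_of.get((head, tail))

lib = demultiplex("ACGT" + "NNNNNNNN" + "CCCC")    # forward #0, reverse #1
unassigned = demultiplex("XXXX" + "NNNNNNNN" + "CCCC")
```

The economy comes from the product: n + m barcode oligos address n·m libraries, so 12 oligos suffice for 32 samples where a flat scheme would need 32.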
Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis.
Tu, Jing; Ge, Qinyu; Wang, Shengqin; Wang, Lei; Sun, Beili; Yang, Qi; Bai, Yunfei; Lu, Zuhong
2012-01-25
Leonard, Laurence B.; Fey, Marc E.; Deevy, Patricia; Bredin-Oja, Shelley L.
2015-01-01
We tested four predictions based on the assumption that optional infinitives can be attributed to properties of the input whereby children inappropriately extract nonfinite subject-verb sequences (e.g. the girl run) from larger input utterances (e.g. Does the girl run? Let’s watch the girl run). Thirty children with specific language impairment (SLI) and 30 typically developing children heard novel and familiar verbs that appeared exclusively either in utterances containing nonfinite subject-verb sequences or in simple sentences with the verb inflected for third person singular –s. Subsequent testing showed strong input effects, especially for the SLI group. The results provide support for input-based factors as significant contributors not only to the optional infinitive period in typical development, but also to the especially protracted optional infinitive period seen in SLI. PMID:25076070
Single-cell sequencing technologies: current and future.
Liang, Jialong; Cai, Wanshi; Sun, Zhongsheng
2014-10-20
Intensively developed in the last few years, single-cell sequencing technologies now present numerous advantages over traditional sequencing methods for solving the problems of biological heterogeneity and low quantities of available biological materials. The application of single-cell sequencing technologies has profoundly changed our understanding of a series of biological phenomena, including gene transcription, embryo development, and carcinogenesis. However, before single-cell sequencing technologies can be used extensively, researchers face the serious challenge of overcoming inherent issues of high amplification bias, low accuracy and reproducibility. Here, we simply summarize the techniques used for single-cell isolation, and review the current technologies used in single-cell genomic, transcriptomic, and epigenomic sequencing. We discuss the merits, defects, and scope of application of single-cell sequencing technologies and then speculate on the direction of future developments. Copyright © 2014 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Ltd. All rights reserved.
Karamanidis, Kiros; Arampatzis, Adamantios
2007-01-01
The goals of this study were to investigate whether the lower muscle-tendon unit (MTU) capacities of older adults affect their ability to recover balance with a single step after a fall, and to examine whether running experience enhances and protects this motor skill in young and old adults. The investigation was conducted on 30 older and 19 younger adults, divided into two subgroups: runners versus non-active. In previous studies we documented that the older adults had lower leg extensor muscle strength and tendon stiffness, while running had no effect on MTU capacities. The current study examined recovery mechanics of the same individuals after an induced forward fall. Younger adults were better able to recover balance with a single step than older adults (P < 0.001); this ability was associated with a more effective body configuration at touchdown (more posterior COM position relative to the recovery foot, P < 0.001). MTU capacities classified 88.6% of the subjects as single- or multiple-steppers. Runners showed a superior ability to recover balance with a single step (P < 0.001) compared to non-active subjects, due to a more effective mechanical response during the stance phase (greater knee joint flexion, P < 0.05). We conclude that the age-related degeneration of the MTUs significantly diminished the older adults' ability to restore balance with a single step. Running seems to enhance and protect this motor skill. We suggest that runners, owing to their running experience, could update the internal representation of the mechanisms responsible for the control of dynamic stability during a forward fall and were thus able to restore balance more often with a single step than the non-active subjects.
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.
Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf
2015-08-01
RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of [Formula: see text]. Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics, have been limited to high complexity ([Formula: see text] quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurate than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.
SCOPE: a web server for practical de novo motif discovery.
Carlson, Jonathan M; Chakravarty, Arijit; DeZiel, Charles E; Gross, Robert H
2007-07-01
SCOPE is a novel parameter-free method for the de novo identification of potential regulatory motifs in sets of coordinately regulated genes. The SCOPE algorithm combines the output of three component algorithms, each designed to identify a particular class of motifs. Using an ensemble learning approach, SCOPE identifies the best candidate motifs from its component algorithms. In tests on experimentally determined datasets, SCOPE identified motifs with a significantly higher level of accuracy than a number of other web-based motif finders run with their default parameters. Because SCOPE has no adjustable parameters, the web server has an intuitive interface, requiring only a set of gene names or FASTA sequences and a choice of species. The most significant motifs found by SCOPE are displayed graphically on the main results page with a table containing summary statistics for each motif. Detailed motif information, including the sequence logo, PWM, consensus sequence and specific matching sites can be viewed through a single click on a motif. SCOPE's efficient, parameter-free search strategy has enabled the development of a web server that is readily accessible to the practising biologist while providing results that compare favorably with those of other motif finders. The SCOPE web server is at
Process in manufacturing high efficiency AlGaAs/GaAs solar cells by MO-CVD
NASA Technical Reports Server (NTRS)
Yeh, Y. C. M.; Chang, K. I.; Tandon, J.
1984-01-01
Manufacturing technology for mass-producing high-efficiency GaAs solar cells is discussed, including progress in using a high-throughput MO-CVD reactor to produce high-efficiency GaAs solar cells. Thickness and doping concentration uniformity of metalorganic chemical vapor deposition (MO-CVD) GaAs and AlGaAs layer growth are discussed. In addition, new tooling designs are given which increase the throughput of solar cell processing. To date, 2 cm x 2 cm AlGaAs/GaAs solar cells with efficiencies up to 16.5% have been produced. In order to meet throughput goals for mass-producing GaAs solar cells, a large MO-CVD system (Cambridge Instrument Model MR-200) was installed, with a susceptor initially capable of processing 20 wafers (up to 75 mm diameter) in a single growth run. In the MR-200, the sequencing of the gases and the heating power are controlled by a microprocessor-based programmable control console. Hence, operator errors can be reduced, leading to a more reproducible production sequence.
2012-01-01
Background In the last 30 years, a number of DNA fingerprinting methods such as RFLP, RAPD, AFLP, SSR, DArT, have been extensively used in marker development for molecular plant breeding. However, it remains a daunting task to identify highly polymorphic and closely linked molecular markers for a target trait for molecular marker-assisted selection. The next-generation sequencing (NGS) technology is far more powerful than any existing generic DNA fingerprinting methods in generating DNA markers. In this study, we employed a grain legume crop Lupinus angustifolius (lupin) as a test case, and examined the utility of an NGS-based method of RAD (restriction-site associated DNA) sequencing as DNA fingerprinting for rapid, cost-effective marker development tagging a disease resistance gene for molecular breeding. Results Twenty informative plants from a cross of RxS (disease resistant x susceptible) in lupin were subjected to RAD single-end sequencing by multiplex identifiers. The entire RAD sequencing products were resolved in two lanes of the 16-lanes per run sequencing platform Solexa HiSeq2000. A total of 185 million raw reads, approximately 17 Gb of sequencing data, were collected. Sequence comparison among the 20 test plants discovered 8207 SNP markers. Filtration of DNA sequencing data with marker identification parameters resulted in the discovery of 38 molecular markers linked to the disease resistance gene Lanr1. Five randomly selected markers were converted into cost-effective, simple PCR-based markers. Linkage analysis using marker genotyping data and disease resistance phenotyping data on a F8 population consisting of 186 individual plants confirmed that all these five markers were linked to the R gene. 
Two of these newly developed sequence-specific PCR markers, AnSeq3 and AnSeq4, flanked the target R gene at a genetic distance of 0.9 centiMorgan (cM) and are now replacing the markers previously developed by a traditional DNA fingerprinting method for marker-assisted selection in the Australian national lupin breeding program. Conclusions We demonstrated that more than 30 molecular markers linked to a target gene for an agronomic trait of interest can be identified from a small portion (1/8) of one HiSeq2000 sequencing run by applying NGS-based RAD sequencing to marker development. The markers developed by the strategy described in this study are all co-dominant SNP markers, which can readily be converted into a high-throughput multiplex format or into low-cost, simple PCR-based markers desirable for large-scale marker implementation in plant breeding programs. The high-density, closely linked molecular markers associated with a target trait help to overcome a major bottleneck for implementation of molecular markers on a wide range of germplasm in breeding programs. We conclude that application of NGS-based RAD sequencing as DNA fingerprinting is a very rapid and cost-effective strategy for marker development in molecular plant breeding. The strategy does not require any prior genome knowledge or molecular information for the species under investigation, and it is applicable to other plant species. PMID:22805587
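The 0.9 cM map distance reported above is derived from the recombination fraction observed among offspring. As an illustration only (the function and parameter names below are ours, not from the paper), the standard Haldane mapping function converts a recombination fraction into centiMorgans:

```python
import math

def recomb_fraction_from_counts(recombinants, total):
    """Estimate the recombination fraction r as the share of
    recombinant offspring among all scored offspring."""
    return recombinants / total

def haldane_cM(r):
    """Convert a recombination fraction r (0 <= r < 0.5) to map distance
    in centiMorgans via the Haldane mapping function: d = -50 * ln(1 - 2r)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)
```

For small recombination fractions the map distance is close to 100 * r, so roughly 9 recombinants in 1000 offspring would correspond to about 0.9 cM.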
Yang, Huaan; Tao, Ye; Zheng, Zequn; Li, Chengdao; Sweetingham, Mark W; Howieson, John G
2012-07-17
Spontaneous Entrainment of Running Cadence to Music Tempo.
Van Dyck, Edith; Moens, Bart; Buhmann, Jeska; Demey, Michiel; Coorevits, Esther; Dalla Bella, Simone; Leman, Marc
Since accumulating evidence suggests that step rate is strongly associated with running-related injuries, it is important for runners to exercise at an appropriate running cadence. As music tempo has been shown to be capable of impacting exercise performance of repetitive endurance activities, it might also serve as a means to (re)shape running cadence. The aim of this study was to validate the impact of music tempo on running cadence. Sixteen recreational runners ran four laps of 200 m (i.e. 800 m in total); this task was repeated 11 times with a short break in between each four-lap sequence. During the first lap of a sequence, participants ran at a self-paced tempo without musical accompaniment. Running cadence of the first lap was registered, and during the second lap, music with a tempo matching the assessed cadence was played. In the final two laps, the music tempo was either increased/decreased by 3.00, 2.50, 2.00, 1.50, or 1.00% or was kept stable. This range was chosen since the aim of this study was to test spontaneous entrainment (an average person can distinguish tempo variations of about 4%). Each participant performed all conditions. Imperceptible shifts in musical tempi in proportion to the runner's self-paced running tempo significantly influenced running cadence (p < .001). Contrasts revealed a linear relation between the tempo conditions and adaptation in running cadence (p < .001). In addition, a significant effect of condition on the level of entrainment was revealed (p < .05), which suggests that maximal effects of music tempo on running cadence can only be obtained up to a certain level of tempo modification. Finally, significantly higher levels of tempo entrainment were found for female participants compared to their male counterparts (p < .05). The applicable contribution of these novel findings is that music tempo could serve as an unprompted means to impact running cadence.
As increases in step rate may prove beneficial in the prevention and treatment of common running-related injuries, this finding could be especially relevant for treatment purposes, such as exercise prescription and gait retraining. Music tempo can spontaneously impact running cadence. A basin for unsolicited entrainment of running cadence to music tempo was discovered. The effect of music tempo on running cadence proves to be stronger for women than for men.
Yeo, Thong-Hiang; Ho, Mer-Lin; Loke, Weng-Keong
2008-01-01
A novel liquid chromatography-multiple reaction monitoring (LC-MRM) procedure has been developed for retrospective diagnosis of exposure to different forms of mustard agents. This concise method is able to validate prior exposure to nitrogen mustards (HN-1, HN-2, and HN-3) or sulfur mustard (HD) in a single run, which significantly reduces analysis time compared to separate runs to screen for different mustards' biomarkers based on tandem mass spectrometry. Belonging to one of the more toxic classes of chemical warfare agents, these potent vesicants bind covalently to the cysteine-34 residue of human serum albumin. This results in the formation of stable adducts whose identities were confirmed by a de novo sequencing bioinformatics software package. Our developed technique tracks these albumin-derived adduct biomarkers in blood samples which persist in vitro following exposure, enabling a detection limit of 200 nM of HN-1, 100 nM of HN-2, 200 nM of HN-3, or 50 nM of HD in human blood. The CWA-adducts formed in blood samples can be conveniently and sensitively analyzed by this MRM technique to allow rapid and reliable screening.
Simulating maar-diatreme volcanic systems in bench-scale experiments
NASA Astrophysics Data System (ADS)
Andrews, R. G.; White, J. D. L.; Dürig, T.; Zimanowski, B.
2015-12-01
Maar-diatreme eruptions are incompletely understood, and explanations for the processes involved in them have been debated for decades. This study extends bench-scale analogue experiments previously conducted on maar-diatreme systems and attempts to scale the results up to both field-scale experimentation and natural volcanic systems in order to produce a reconstructive toolkit for maar volcanoes. These experimental runs produced, via multiple mechanisms, complex deposits that match many features seen in natural maar-diatreme deposits. The runs include deeper single blasts, series of descending discrete blasts, and series of ascending blasts. This study indicates that debris-jet inception and diatreme formation involve multiple types of granular fountains within diatreme deposits produced under varying initial conditions. The individual energies of blasts in a multiple-blast series cannot be inferred from the final deposits. The depositional record of blast sequences can be ascertained from the proportion of fallback sedimentation versus maar ejecta rim material, the final crater size, and the degree of overturning or slumping of accessory strata. Quantitatively, deeper blasts involve a roughly equal partitioning of energy into crater excavation versus mass movement of juvenile material, whereas shallower blasts expend a much greater proportion of energy in crater excavation.
The Impact of Odor--Reward Memory on Chemotaxis in Larval "Drosophila"
ERIC Educational Resources Information Center
Schleyer, Michael; Reid, Samuel F.; Pamir, Evren; Saumweber, Timo; Paisios, Emmanouil; Davies, Alexander; Gerber, Bertram; Louis, Matthieu
2015-01-01
How do animals adaptively integrate innate with learned behavioral tendencies? We tackle this question using chemotaxis as a paradigm. Chemotaxis in the "Drosophila" larva largely results from a sequence of runs and oriented turns. Thus, the larvae minimally need to determine (i) how fast to run, (ii) when to initiate a turn, and (iii)…
Code of Federal Regulations, 2010 CFR
2010-07-01
... zero and span settings of the smokemeter. (If a recorder is used, a chart speed of approximately one... collection, it shall be run at a minimum chart speed of one inch per minute during the idle mode and... zero and full scale response may be rechecked and reset during the idle mode of each test sequence. (v...
Benchmarking short sequence mapping tools
2013-01-01
Background The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time consuming and demands the development of fast and accurate alignment tools. However, the currently proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked while comparing the performance of a newly developed tool to the state of the art. Therefore, there is a need for an objective evaluation method that covers all the aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison. Results We applied our benchmarking tests on 9 well-known mapping tools, namely, Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST) using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests, while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be further applied to others. Conclusion The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, there is no tool that outperforms all of the others in all the tests. Therefore, the end user should clearly specify their needs in order to choose the tool that provides the best results. PMID:23758764
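A core metric in such benchmarks on synthetic data is the fraction of reads mapped back to (near) their simulated origin. A minimal sketch of that accuracy measure (the function and names below are illustrative, not from the suite):

```python
def mapping_accuracy(true_pos, mapped_pos, tolerance=5):
    """Fraction of simulated reads whose reported mapping position lies
    within `tolerance` bases of the true origin; unmapped reads (None or
    missing) count as incorrect."""
    correct = sum(
        1 for read_id, true in true_pos.items()
        if mapped_pos.get(read_id) is not None
        and abs(mapped_pos[read_id] - true) <= tolerance
    )
    return correct / len(true_pos)
```

A benchmark would compute this per tool and per read length, alongside throughput, to expose the accuracy/speed trade-offs the abstract describes.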
2013-01-01
Background The revolution in DNA sequencing technology continues unabated, and is affecting all aspects of the biological and medical sciences. The training and recruitment of the next generation of researchers who are able to use and exploit the new technology is severely lacking and potentially negatively influencing research and development efforts to advance genome biology. Here we present a cross-disciplinary course that provides undergraduate students with practical experience in running a next-generation sequencing instrument through to the analysis and annotation of the generated DNA sequences. Results Many labs across the world are installing next-generation sequencing technology, and we show that undergraduate students can produce quality sequence data and were excited to participate in cutting-edge research. The students conducted the work flow from DNA extraction, library preparation, and running the sequencing instrument, to the extraction and analysis of the data. They sequenced microbes, metagenomes, and a marine mammal, the Californian sea lion, Zalophus californianus. The students met sequencing quality controls, had no detectable contamination in the targeted DNA sequences, provided publication-quality data, and became part of an international collaboration to investigate carcinomas in carnivores. Conclusions Students learned important skills for their future education and career opportunities, and a perceived increase in students’ ability to conduct independent scientific research was measured. DNA sequencing is rapidly expanding in the life sciences. Teaching undergraduates to use the latest technology to sequence genomic DNA ensures they are ready to meet the challenges of the genomic era and allows them to participate in annotating the tree of life. PMID:24007365
Single-cell genomic sequencing using Multiple Displacement Amplification.
Lasken, Roger S
2007-10-01
Single microbial cells can now be sequenced using DNA amplified by the Multiple Displacement Amplification (MDA) reaction. The few femtograms of DNA in a bacterium are amplified into micrograms of high molecular weight DNA suitable for DNA library construction and Sanger sequencing. The MDA-generated DNA also performs well when used directly as template for pyrosequencing by the 454 Life Sciences method. While MDA from single cells loses some of the genomic sequence, this approach will greatly accelerate the pace of sequencing from uncultured microbes. The genetically linked sequences from single cells are also a powerful tool to be used in guiding genomic assembly of shotgun sequences of multiple organisms from environmental DNA extracts (metagenomic sequences).
Sequence alignment visualization in HTML5 without Java.
Gille, Christoph; Birgit, Weyand; Gille, Andreas
2014-01-01
Java has been extensively used for the visualization of biological data on the web. However, the Java runtime environment is an additional layer of software with its own set of technical problems and security risks. HTML in its new version 5 provides features that for some tasks may render Java unnecessary. Alignment-To-HTML is the first HTML-based interactive visualization for annotated multiple sequence alignments. The server-side script interpreter can perform all tasks, such as (i) sequence retrieval, (ii) alignment computation, (iii) rendering, (iv) identification of homologous structural models and (v) communication with BioDAS servers. The rendered alignment can be included in web pages and is displayed in all browsers on all platforms, including touch-screen tablets. The functionality of the user interface is similar to legacy Java applets and includes color schemes, highlighting of conserved and variable alignment positions, row reordering by drag and drop, interlinked 3D visualization, and sequence groups. Novel features are (i) support for multiple overlapping residue annotations, such as chemical modifications, single nucleotide polymorphisms and mutations, (ii) mechanisms to quickly hide residue annotations, (iii) export to MS Word and (iv) sequence icons. Alignment-To-HTML, the first interactive alignment visualization that runs in web browsers without additional software, confirms that to some extent HTML5 is already sufficient to display complex biological data. The low speed at which programs are executed in browsers is still the main obstacle. Nevertheless, we envision an increased use of HTML and JavaScript for interactive biological software. Under GPL at: http://www.bioinformatics.org/strap/toHTML/.
Pollen, Alex A; Nowakowski, Tomasz J; Shuga, Joe; Wang, Xiaohui; Leyrat, Anne A; Lui, Jan H; Li, Nianzhen; Szpankowski, Lukasz; Fowler, Brian; Chen, Peilin; Ramalingam, Naveen; Sun, Gang; Thu, Myo; Norris, Michael; Lebofsky, Ronald; Toppani, Dominique; Kemp, Darnell W; Wong, Michael; Clerkson, Barry; Jones, Brittnee N; Wu, Shiquan; Knutsson, Lawrence; Alvarado, Beatriz; Wang, Jing; Weaver, Lesley S; May, Andrew P; Jones, Robert C; Unger, Marc A; Kriegstein, Arnold R; West, Jay A A
2014-10-01
Large-scale surveys of single-cell gene expression have the potential to reveal rare cell populations and lineage relationships but require efficient methods for cell capture and mRNA sequencing. Although cellular barcoding strategies allow parallel sequencing of single cells at ultra-low depths, the limitations of shallow sequencing have not been investigated directly. By capturing 301 single cells from 11 populations using microfluidics and analyzing single-cell transcriptomes across downsampled sequencing depths, we demonstrate that shallow single-cell mRNA sequencing (~50,000 reads per cell) is sufficient for unbiased cell-type classification and biomarker identification. In the developing cortex, we identify diverse cell types, including multiple progenitor and neuronal subtypes, and we identify EGR1 and FOS as previously unreported candidate targets of Notch signaling in human but not mouse radial glia. Our strategy establishes an efficient method for unbiased analysis and comparison of cell populations from heterogeneous tissue by microfluidic single-cell capture and low-coverage sequencing of many cells.
CLAST: CUDA implemented large-scale alignment search tool.
Yano, Masahiro; Mori, Hiroshi; Akiyama, Yutaka; Yamada, Takuji; Kurokawa, Ken
2014-12-11
Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. Therefore, there is a clear need for a sequence similarity search tool that can rapidly detect weak similarity in large datasets. We developed a tool, which we named CLAST (CUDA implemented large-scale alignment search tool), that enables analyses of millions of reads and thousands of reference genome sequences, and runs on NVIDIA Fermi architecture graphics processing units. CLAST has four main advantages over existing alignment tools. First, CLAST was capable of identifying sequence similarities ~80.8 times faster than BLAST and 9.6 times faster than BLAT. Second, CLAST executes global alignment as the default (local alignment is also an option), enabling CLAST to assign reads to taxonomic and functional groups based on evolutionarily distant nucleotide sequences with high accuracy. Third, CLAST does not need a preprocessed sequence database like Burrows-Wheeler Transform-based tools, and this enables CLAST to incorporate large, frequently updated sequence databases. Fourth, CLAST requires <2 GB of main memory, making it possible to run CLAST on a standard desktop computer or server node. CLAST achieved very high speed (similar to the Burrows-Wheeler Transform-based Bowtie 2 for long reads) and sensitivity (equal to BLAST, BLAT, and FR-HIT) without the need for extensive database preprocessing or a specialized computing platform. 
Our results demonstrate that CLAST has the potential to be one of the most powerful and realistic approaches to analyze the massive amount of sequence data from next-generation sequencing technologies.
Reilly, Kevin J.; Spencer, Kristie A.
2013-01-01
The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absent length effects observed in the speakers with ataxic dysarthria are consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involve processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121
Di Marino, Daniele; Oteri, Francesco; della Rocca, Blasco Morozzo; D'Annessa, Ilda; Falconi, Mattia
2012-06-01
The mitochondrial adenosine diphosphate/adenosine triphosphate (ADP/ATP) carrier (AAC) was crystallized in complex with its specific inhibitor carboxyatractyloside (CATR). The protein consists of a six-transmembrane helix bundle that defines the nucleotide translocation pathway, which is closed towards the matrix side due to sharp kinks in the odd-numbered helices. In this paper, we describe the interaction between the matrix side of the AAC transporter and the ATP(4-) molecule using carrier structures obtained through classical molecular dynamics simulation (MD) and a protein-ligand docking procedure. Fifteen structures were extracted from a previously published MD trajectory through clustering analysis, and 50 docking runs were carried out for each carrier conformation, for a total of 750 runs ("MD docking"). The results were compared to those from 750 docking runs performed on the X-ray structure ("X docking"). The docking procedure indicated the presence of a single interaction site in the X-ray structure that was conserved in the structures extracted from the MD trajectory. MD docking showed the presence of a second binding site that was not found in the X docking. The interaction strategy between the AAC transporter and the ATP(4-) molecule was analyzed by investigating the composition and 3D arrangement of the interaction pockets, together with the orientations of the substrate inside them. A relationship between sequence repeats and the ATP(4-) binding sites in the AAC carrier structure is proposed.
29 CFR 1910.305 - Wiring methods, components, and equipment for general use.
Code of Federal Regulations, 2010 CFR
2010-07-01
... distribution center. (B) Conductors shall be run as multiconductor cord or cable assemblies. However, if... persons, feeders may be run as single insulated conductors. (v) The following requirements apply to branch... shall be multiconductor cord or cable assemblies or open conductors. If run as open conductors, they...
Ultraaccurate genome sequencing and haplotyping of single human cells.
Chu, Wai Keung; Edge, Peter; Lee, Ho Suk; Bansal, Vikas; Bafna, Vineet; Huang, Xiaohua; Zhang, Kun
2017-11-21
Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10^-8 and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs.
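SISSOR's error suppression rests on sequencing the Watson and Crick strands of the same molecule independently and trusting only concordant calls, since a polymerase error on one strand is unlikely to be mirrored on its complement. A toy sketch of that consensus idea (illustrative only, not the published pipeline; names are ours):

```python
def strand_consensus(watson_calls, crick_calls):
    """Per-position consensus between independently sequenced strand calls:
    keep the base where both strands agree, emit 'N' where they conflict
    or where either strand has no call (None)."""
    consensus = []
    for w, c in zip(watson_calls, crick_calls):
        if w is not None and c is not None and w == c:
            consensus.append(w)
        else:
            consensus.append('N')
    return ''.join(consensus)
```

Because independent errors rarely coincide, requiring agreement at each position multiplies the per-strand error rates, which is the intuition behind the very low error rates reported above.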
Sensitivity to sequencing depth in single-cell cancer genomics.
Alves, João M; Posada, David
2018-04-16
Querying cancer genomes at single-cell resolution is expected to provide a powerful framework to understand in detail the dynamics of cancer evolution. However, given the high costs currently associated with single-cell sequencing, together with the inevitable technical noise arising from single-cell genome amplification, cost-effective strategies that maximize the quality of single-cell data are critically needed. Taking advantage of previously published single-cell whole-genome and whole-exome cancer datasets, we studied the impact of sequencing depth and sampling effort towards single-cell variant detection. Five single-cell whole-genome and whole-exome cancer datasets were independently downscaled to 25, 10, 5, and 1× sequencing depth. For each depth level, ten technical replicates were generated, resulting in a total of 6280 single-cell BAM files. The sensitivity of variant detection, including structural and driver mutations, genotyping, clonal inference, and phylogenetic reconstruction to sequencing depth was evaluated using recent tools specifically designed for single-cell data. Altogether, our results suggest that for relatively large sample sizes (25 or more cells) sequencing single tumor cells at depths > 5× does not drastically improve somatic variant discovery, characterization of clonal genotypes, or estimation of single-cell phylogenies. We suggest that sequencing multiple individual tumor cells at a modest depth represents an effective alternative to explore the mutational landscape and clonal evolutionary patterns of cancer genomes.
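The downscaling described above can be emulated by thinning reads at random, keeping each with probability proportional to the target depth. A minimal sketch under the assumption of roughly uniform coverage (function and parameter names are ours, not from the study):

```python
import random

def downsample_reads(reads, original_depth, target_depth, seed=0):
    """Keep each read independently with probability
    target_depth / original_depth, emulating a shallower sequencing run.
    A fixed seed makes replicates reproducible."""
    if target_depth > original_depth:
        raise ValueError("target depth cannot exceed original depth")
    keep_prob = target_depth / original_depth
    rng = random.Random(seed)
    return [r for r in reads if rng.random() < keep_prob]
```

Running this with several seeds per depth level mirrors the study's ten technical replicates at 25, 10, 5, and 1x.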
Zhang, Changsheng; Cai, Hongmin; Huang, Jingying; Song, Yan
2016-09-17
Variations in DNA copy number make an important contribution to the development of several diseases, including autism, schizophrenia and cancer. Single-cell sequencing technology allows the dissection of genomic heterogeneity at the single-cell level, thereby providing important evolutionary information about cancer cells. In contrast to traditional bulk sequencing, single-cell sequencing requires the amplification of the whole genome of a single cell to accumulate enough material for sequencing. However, the amplification process inevitably introduces amplification bias, resulting in an over-dispersed portion of the sequencing data. A recent study has shown that the over-dispersed portion of single-cell sequencing data can be well modelled by negative binomial distributions. We developed a read-depth based method, nbCNV, to detect copy number variants (CNVs). The nbCNV method uses two constraints, sparsity and smoothness, to fit the CNV patterns under the assumption that the read signals are negative binomially distributed. The problem of CNV detection was formulated as a quadratic optimization problem and was solved by an efficient numerical solution based on the classical alternating direction minimization method. Extensive experiments comparing nbCNV with existing benchmark models were conducted on both simulated data and empirical single-cell sequencing data. The results of those experiments demonstrate that nbCNV achieves superior performance and high robustness for the detection of CNVs in single-cell sequencing data.
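nbCNV itself fits a negative-binomial model via quadratic optimization; as a loose stand-in that illustrates only the smoothness idea, the sketch below median-filters a binned read-depth signal (suppressing isolated noisy bins) and rounds to integer copy number. This is not the published method, and all names are ours:

```python
def smooth_copy_number(depth, normal_depth, window=5):
    """Median-filter a per-bin read-depth signal, then round to integer
    copy number, assuming the diploid state (copy 2) corresponds to a
    read depth of `normal_depth`."""
    half = window // 2
    smoothed = []
    for i in range(len(depth)):
        lo, hi = max(0, i - half), min(len(depth), i + half + 1)
        neighborhood = sorted(depth[lo:hi])
        smoothed.append(neighborhood[len(neighborhood) // 2])
    return [round(2 * d / normal_depth) for d in smoothed]
```

A real method must also model the over-dispersion noted above, which is precisely why nbCNV replaces this naive rounding with a negative-binomial likelihood.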
This research evaluates a recently developed comprehensive 2-D GC coupled with a time-of-flight (TOF) mass spectrometer for the potential separation of 209 PCB congeners, using a sequence of 1-D and 2-D chromatographic modes. In two consecutive chromatographic runs, using a 40 m,...
Automated ensemble assembly and validation of microbial genomes.
Koren, Sergey; Treangen, Todd J; Hill, Christopher M; Pop, Mihai; Phillippy, Adam M
2014-05-03
The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often infeasible. To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome.
Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.
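One of the simplest validation metrics such ensemble pipelines score assemblies on is N50, sketched below. This uses N50 alone as a stand-in for iMetAMOS's multi-metric scoring, and the assembler names in the example are hypothetical:

```python
def n50(contig_lengths):
    """N50: the contig length L such that contigs of length >= L cover half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

def rank_assemblies(assemblies):
    """Rank candidate (name, contig_lengths) pairs by N50, best first."""
    return sorted(assemblies, key=lambda kv: n50(kv[1]), reverse=True)

# Two hypothetical assemblies of the same genome:
candidates = {"spades": [500, 300, 200], "velvet": [400, 100, 100, 100]}
ranking = rank_assemblies(list(candidates.items()))
```

In practice contiguity metrics like N50 are combined with reference-free likelihood scores and contamination checks, since a high N50 alone can reward misassemblies.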
Defensive Swarm: An Agent Based Modeling Analysis
2017-12-01
INITIAL ALGORITHM (SINGLE-RUN) TESTING ... 1. Patrol Algorithm—Passive ... scalability is therefore quite important to modeling in this highly variable domain. One can force the software to run the gamut of options to see ... changes in operating constructs or procedures. Additionally, modelers can run thousands of iterations, testing the model under different circumstances.
Magro, Elsa; Gentric, Jean-Christophe; Talagas, Matthieu; Alavi, Zarrin; Nonent, Michel; Dam-Hieu, Phong; Seizeur, Romuald
2015-07-01
The anatomical arrangement of the venous system within the transverse foramen is controversial; there is disagreement as to whether the anatomy consists of a single vertebral vein or a confluence of venous plexuses. Precise knowledge of this arrangement is necessary in imaging when vertebral artery dissection is suspected, as well as in surgical approaches to the cervical spine. This study aimed to better explain the anatomical organization of the venous system within the transverse foramen according to the Trolard hypothesis of a transverse vertebral sinus. This was an anatomical and radiological study. For the anatomical study, 10 specimens were analyzed after vascular injection. After dissection, histological cuts were prepared. For the radiological study, a high-resolution MRI study with 2D time-of-flight segment MR venography sequences was performed on 10 healthy volunteers. Vertebral veins are arranged in a plexiform manner within the transverse canal. This arrangement begins at the upper part of the transverse canal before the vertebral vein turns into a single vein along with the vertebral artery running from the transverse foramen of C-6. This venous system runs somewhat ventrolaterally to the vertebral artery. In most cases, this arrangement is symmetrical and facilitates radiological readings. The anastomoses between vertebral veins and ventral longitudinal veins are uniform and arranged segmentally at each vertebra. These findings confirm some recent and previous anatomical descriptions and invalidate others. It is hard to come up with a common description of the arrangement of vertebral veins. The authors suggest providing clinicians as well as anatomists with a well-detailed description of the components essential to the understanding of this organization.
The RNASeq-er API-a gateway to systematically updated analysis of public RNA-seq data.
Petryszak, Robert; Fonseca, Nuno A; Füllgrabe, Anja; Huerta, Laura; Keays, Maria; Tang, Y Amy; Brazma, Alvis
2017-07-15
The exponential growth of publicly available RNA-sequencing (RNA-Seq) data poses an increasing challenge to researchers wishing to discover, analyse and store such data, particularly those based in institutions with limited computational resources. EMBL-EBI is in an ideal position to address these challenges and to allow the scientific community easy access to not just raw, but also processed RNA-Seq data. We present a Web service to access the results of a systematically and continually updated standardized alignment as well as gene and exon expression quantification of all public bulk (and in the near future also single-cell) RNA-Seq runs in 264 species in the European Nucleotide Archive, using Representational State Transfer. The RNASeq-er API (Application Programming Interface) enables ontology-powered search for and retrieval of CRAM, bigwig and bedGraph files, gene and exon expression quantification matrices (Fragments Per Kilobase Of Exon Per Million Fragments Mapped, Transcripts Per Million, raw counts) as well as sample attributes annotated with ontology terms. To date over 270 000 RNA-Seq runs in nearly 10 000 studies (1 PB of raw FASTQ data) in 264 species in ENA have been processed and made available via the API. The RNASeq-er API can be accessed at http://www.ebi.ac.uk/fg/rnaseq/api . The commands used to analyse the data are available in supplementary materials and at https://github.com/nunofonseca/irap/wiki/iRAP-single-library . rnaseq@ebi.ac.uk ; rpetry@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
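The quantification units the API serves (FPKM and TPM) can be computed from raw counts with the standard textbook formulas; this is an illustration of the units, not EMBL-EBI's pipeline code:

```python
def fpkm(counts, lengths_bp, total_fragments):
    """Fragments Per Kilobase of exon Per Million mapped fragments."""
    return [c / (l / 1e3) / (total_fragments / 1e6)
            for c, l in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts Per Million: length-normalise first, then rescale so values sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    s = sum(rates)
    return [r / s * 1e6 for r in rates]

# Two genes, the second twice as long, with proportionally more fragments:
fpkm_vals = fpkm([100, 200], [1000, 2000], 1_000_000)
tpm_vals = tpm([100, 200], [1000, 2000])
```

Unlike FPKM, TPM values always sum to one million within a sample, which makes them directly comparable across runs.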
Integrated sequencing of exome and mRNA of large-sized single cells.
Wang, Lily Yan; Guo, Jiajie; Cao, Wei; Zhang, Meng; He, Jiankui; Li, Zhoufang
2018-01-10
Current approaches to integrated single-cell DNA-RNA sequencing make it difficult to call SNPs, because a large amount of DNA and RNA is lost during DNA-RNA separation. Here, we performed simultaneous single-cell exome and transcriptome sequencing on individual mouse oocytes. Using microinjection, we kept the nuclei intact to avoid DNA loss, while retaining the cytoplasm inside the cell membrane, to maximize the amount of DNA and RNA captured from the single cell. We then conducted exome sequencing on the isolated nuclei and mRNA sequencing on the enucleated cytoplasm. For single oocytes, exome-seq can cover up to 92% of the exome region with an average sequencing depth of 10+, while mRNA sequencing reveals more than 10,000 expressed genes in the enucleated cytoplasm, with similar performance for intact oocytes. This approach provides unprecedented opportunities to study DNA-RNA regulation, such as RNA editing at the single-nucleotide level in oocytes. In the future, this method can also be applied to other large cells, including neurons, large dendritic cells and large tumour cells, for integrated exome and transcriptome sequencing.
Bressel, Eadric; Louder, Talin J; Hoover, James P; Roberts, Luke C; Dolny, Dennis G
2017-11-01
The aim of this study was to determine if selected kinematic measures (foot strike index [SI], knee contact angle and overstride angle) were different between aquatic treadmill (ATM) and land treadmill (LTM) running, and to determine if these measures were altered during LTM running as a result of 6 weeks of ATM training. Acute effects were tested using 15 competitive distance runners who completed 1 session of running on each treadmill type at 5 different running speeds. Subsequently, three recreational runners completed 6 weeks of ATM training following a single-subject baseline, intervention and withdrawal experiment. Kinematic measures were quantified from digitisation of video. Regardless of speed, SI values during ATM running (61.3 ± 17%) were significantly greater (P = 0.002) than LTM running (42.7 ± 23%). Training on the ATM did not change (pre/post) the SI (26 ± 3.2/27 ± 3.1), knee contact angle (165 ± 0.3/164 ± 0.8) or overstride angle (89 ± 0.4/89 ± 0.1) during LTM running. Although SI values were different between acute ATM and LTM running, 6 weeks of ATM training did not appear to alter LTM running kinematics as evidenced by no change in kinematic values from baseline to post intervention assessments.
Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods.
Dal Molin, Alessandra; Baruzzo, Giacomo; Di Camillo, Barbara
2017-01-01
The sequencing of the transcriptomes of single cells, or single-cell RNA-sequencing, has now become the dominant technology for the identification of novel cell types and for the study of stochastic gene expression. In recent years, various tools for analyzing single-cell RNA-sequencing data have been proposed, many of them with the purpose of performing differential expression analysis. In this work, we compare four different tools for single-cell RNA-sequencing differential expression analysis, together with two popular methods originally developed for the analysis of bulk RNA-sequencing data but largely applied to single-cell data. We discuss results obtained on two real and one synthetic dataset, along with considerations about the perspectives of single-cell differential expression analysis. In particular, we explore the methods' performance in four different scenarios, mimicking different unimodal or bimodal distributions of the data, as characteristic of single-cell transcriptomics. We observed marked differences between the selected methods in terms of precision and recall, the number of detected differentially expressed genes and the overall performance. Globally, the results obtained in our study suggest that it is difficult to identify a single best-performing tool and that efforts are needed to improve the methodologies for single-cell RNA-sequencing data analysis and to gain better accuracy of results.
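The precision and recall used to compare the methods above reduce to simple set arithmetic over the called and truly differentially expressed gene sets. A minimal sketch (the function name and toy gene labels are ours):

```python
def precision_recall(called_de, true_de):
    """Precision and recall of a differential-expression call set against known truth."""
    called, truth = set(called_de), set(true_de)
    tp = len(called & truth)  # true positives: correctly called DE genes
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

# A method calls 3 genes DE; 4 genes are truly DE in the synthetic data:
p, r = precision_recall(["a", "b", "c"], ["a", "b", "d", "e"])
```

Such ground-truth comparisons are only possible on synthetic or spike-in data, which is why benchmark studies pair real datasets with simulated ones.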
ChronQC: a quality control monitoring system for clinical next generation sequencing.
Tawari, Nilesh R; Seow, Justine Jia Wen; Perumal, Dharuman; Ow, Jack L; Ang, Shimin; Devasia, Arun George; Ng, Pauline C
2018-05-15
ChronQC is a quality control (QC) tracking system for clinical implementation of next-generation sequencing (NGS). ChronQC generates time series plots for various QC metrics to allow comparison of current runs to historical runs. ChronQC has multiple features for tracking QC data including Westgard rules for clinical validity, laboratory-defined thresholds and historical observations within a specified time period. Users can record their notes and corrective actions directly onto the plots for long-term recordkeeping. ChronQC facilitates regular monitoring of clinical NGS to enable adherence to high quality clinical standards. ChronQC is freely available on GitHub (https://github.com/nilesh-tawari/ChronQC), Docker (https://hub.docker.com/r/nileshtawari/chronqc/) and the Python Package Index. ChronQC is implemented in Python and runs on all common operating systems (Windows, Linux and Mac OS X). tawari.nilesh@gmail.com or pauline.c.ng@gmail.com. Supplementary data are available at Bioinformatics online.
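A Westgard rule check against historical runs, of the kind such QC trackers apply, fits in a few lines. This sketches only the 1-3s rule with invented example values; it is not ChronQC's implementation:

```python
import statistics

def westgard_1_3s(history, current):
    """Westgard 1-3s rule: reject a run whose QC metric falls more than
    3 standard deviations from the historical mean."""
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)
    return abs(current - mean) > 3 * sd

# Hypothetical per-run mean coverage values from previous sequencing runs:
history = [100, 102, 98, 101, 99]
```

Additional Westgard rules (e.g. 2-2s, R-4s) look at pairs or runs of points, which is why a time-series store of historical metrics is needed rather than a single threshold.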
Research Techniques Made Simple: Single-Cell RNA Sequencing and its Applications in Dermatology.
Wu, Xiaojun; Yang, Bin; Udo-Inyang, Imo; Ji, Suyun; Ozog, David; Zhou, Li; Mi, Qing-Sheng
2018-05-01
RNA sequencing is one of the most highly reliable and reproducible methods of assessing the cell transcriptome. As high-throughput RNA sequencing libraries at the single cell level have recently been developed, single cell RNA sequencing has become more feasible and popular in biology research. Single cell RNA sequencing allows investigators to evaluate cell transcriptional profiles at the single cell level. It has become a very useful tool for performing investigations that could not be addressed by other methodologies, such as the assessment of cell-to-cell variation, the identification of rare populations, and the determination of heterogeneity within a cell population. So far, the single cell RNA sequencing technique has been widely applied to embryonic development, immune cell development, and human disease progression and treatment. Here, we describe the history of single cell technology development and its potential application in the field of dermatology. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Pease, Anthony; Sullivan, Stacey; Olby, Natasha; Galano, Heather; Cerda-Gonzalez, Sophia; Robertson, Ian D; Gavin, Patrick; Thrall, Donald
2006-01-01
Three case history reports are presented to illustrate the value of the single-shot turbo spin-echo pulse sequence for assessment of the subarachnoid space. The use of the single-shot turbo spin-echo pulse sequence, which is a heavily T2-weighted sequence, allows for a rapid, noninvasive evaluation of the subarachnoid space by using the high signal from cerebrospinal fluid. This sequence can be completed in seconds rather than the several minutes required for a T2-fast spin-echo sequence. Unlike the standard T2-fast spin-echo sequence, a single-shot turbo spin-echo pulse sequence also provides qualitative information about the protein and the cellular content of the cerebrospinal fluid, such as in patients with inflammatory debris or hemorrhage in the cerebrospinal fluid. Although the resolution of the single-shot turbo spin-echo pulse sequence images is relatively poor compared with more conventional sequences, the qualitative information about the subarachnoid space and cerebrospinal fluid and the rapid acquisition time, make it a useful sequence to include in standard protocols of spinal magnetic resonance imaging.
Modeling genome coverage in single-cell sequencing
Daley, Timothy; Smith, Andrew D.
2014-01-01
Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. But because they begin with such small amounts of starting material, the amount of information that is obtained from a single-cell sequencing experiment is highly sensitive to the choice of protocol employed and variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873
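The idealised baseline behind such coverage prediction is the simple Poisson (Lander-Waterman) model sketched below. preseq replaces the single Poisson with a non-parametric empirical Bayes mixture to capture amplification bias; this sketch shows only the baseline, with our own function names:

```python
import math

def covered_fraction(mean_depth):
    """Under a homogeneous Poisson model, the fraction of the genome
    covered by at least one read at the given mean depth."""
    return 1.0 - math.exp(-mean_depth)

def coverage_gain(shallow_depth, scale):
    """Predicted covered fraction after sequencing `scale` times deeper
    than an initial shallow run."""
    return covered_fraction(shallow_depth * scale)
```

Real single-cell libraries fall well short of this Poisson prediction because amplification makes some loci vastly over-represented and others absent, which is exactly the gap the empirical Bayes model corrects for.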
NASA Astrophysics Data System (ADS)
Felder, Thomas; Gambogi, William; Stika, Katherine; Yu, Bao-Ling; Bradley, Alex; Hu, Hongjie; Garreau-Iles, Lucie; Trout, T. John
2016-09-01
DuPont has been working steadily to develop accelerated backsheet tests that correlate with solar panel observations in the field. This report updates efforts in sequential testing. Single exposure tests are more commonly used and can be completed more quickly, and certain tests provide helpful predictions of certain backsheet failure modes. DuPont recommendations for single exposure tests are based on 25-year exposure levels for UV and humidity/temperature, and form a good basis for sequential test development. We recommend a sequential exposure of damp heat followed by UV, then repetitions of thermal cycling and UVA. This sequence preserves 25-year exposure levels for humidity/temperature and UV, and correlates well with a large body of field observations. Measurements can be taken at intervals in the test, although the full test runs 10 months. A second, shorter sequential test based on damp heat and thermal cycling tests mechanical durability and correlates with loss of mechanical properties seen in the field. Ongoing work is directed toward shorter sequential tests that preserve good correlation to field data.
Automatic Prediction of Protein 3D Structures by Probabilistic Multi-template Homology Modeling.
Meier, Armin; Söding, Johannes
2015-10-01
Homology modeling predicts the 3D structure of a query protein based on the sequence alignment with one or more template proteins of known structure. Its great importance for biological research is owed to its speed, simplicity, reliability and wide applicability, covering more than half of the residues in protein sequence space. Although multiple templates have been shown to generally increase model quality over single templates, the information from multiple templates has so far been combined using empirically motivated, heuristic approaches. We present here a rigorous statistical framework for multi-template homology modeling. First, we find that the query proteins' atomic distance restraints can be accurately described by two-component Gaussian mixtures. This insight allowed us to apply the standard laws of probability theory to combine restraints from multiple templates. Second, we derive theoretically optimal weights to correct for the redundancy among related templates. Third, a heuristic template selection strategy is proposed. We improve the average GDT-ha model quality score by 11% over single template modeling and by 6.5% over a conventional multi-template approach on a set of 1000 query proteins. Robustness with respect to wrong constraints is likewise improved. We have integrated our multi-template modeling approach with the popular MODELLER homology modeling software in our free HHpred server http://toolkit.tuebingen.mpg.de/hhpred and also offer open source software for running MODELLER with the new restraints at https://bitbucket.org/soedinglab/hh-suite.
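The core probabilistic step, combining per-template distance restraints by the standard laws of probability, can be illustrated for the single-Gaussian case: the product of Gaussians is again Gaussian, with a precision-weighted mean. The paper itself uses two-component mixtures and redundancy-corrected weights; this sketch shows only the simplest case, with our own function name:

```python
def combine_gaussian_restraints(means, sigmas):
    """Combine per-template distance restraints modelled as Gaussians.
    The product of Gaussian densities is Gaussian with precision-weighted
    mean and combined precision equal to the sum of precisions."""
    precisions = [1.0 / (s * s) for s in sigmas]
    total = sum(precisions)
    mean = sum(m * p for m, p in zip(means, precisions)) / total
    sigma = (1.0 / total) ** 0.5
    return mean, sigma

# Two templates suggest 10 A and 12 A for the same atom pair, equally reliable:
m, s = combine_gaussian_restraints([10.0, 12.0], [1.0, 1.0])
```

Note how the combined restraint is both centred between the templates and tighter (smaller sigma) than either alone, which is why multiple templates improve model quality.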
Kelly, Scott A.; Bell, Timothy A.; Selitsky, Sara R.; Buus, Ryan J.; Hua, Kunjie; Weinstock, George M.; Garland, Theodore; Pardo-Manuel de Villena, Fernando; Pomp, Daniel
2013-01-01
Replicated artificial selection for high levels of voluntary wheel running in an outbred strain of mice favored an autosomal recessive allele whose primary phenotypic effect is a 50% reduction in hind-limb muscle mass. Within the High Runner (HR) lines of mice, the numerous pleiotropic effects (e.g., larger hearts, reduced total body mass and fat mass, longer hind-limb bones) of this hypothesized adaptive allele include functional characteristics that facilitate high levels of voluntary wheel running (e.g., doubling of mass-specific muscle aerobic capacity, increased fatigue resistance of isolated muscles, longer hind-limb bones). Previously, we created a backcross population suitable for mapping the responsible locus. We phenotypically characterized the population and mapped the Minimsc locus to a 2.6-Mb interval on MMU11, a region containing ∼100 known or predicted genes. Here, we present a novel strategy to identify the genetic variant causing the mini-muscle phenotype. Using high-density genotyping and whole-genome sequencing of key backcross individuals and HR mice with and without the mini-muscle mutation, from both recent and historical generations of the HR lines, we show that a SNP representing a C-to-T transition located in a 709-bp intron between exons 11 and 12 of the Myosin heavy polypeptide 4 (Myh4) skeletal muscle gene (position 67,244,850 on MMU11; assembly, December 2011, GRCm38/mm10; ENSMUSG00000057003) is responsible for the mini-muscle phenotype, Myh4Minimsc. Using next-generation sequencing, our approach can be extended to identify causative mutations arising in mouse inbred lines and thus offers a great avenue to overcome one of the most challenging steps in quantitative genetics. PMID:24056412
Industrial applications of high-performance computing for phylogeny reconstruction
NASA Astrophysics Data System (ADS)
Bader, David A.; Moret, Bernard M.; Vawter, Lisa
2001-07-01
Phylogenies (that is, tree-of-life relationships) derived from gene order data may prove crucial in answering some fundamental open questions in biomolecular evolution. Real-world interest is strong in determining these relationships. For example, pharmaceutical companies may use phylogeny reconstruction in drug discovery for discovering synthetic pathways unique to organisms that they wish to target. Health organizations study the phylogenies of organisms such as HIV in order to understand their epidemiologies and to aid in predicting the behaviors of future outbreaks. And governments are interested in aiding the production of such foodstuffs as rice, wheat and potatoes via genetics through understanding of the phylogenetic distribution of genetic variation in wild populations. Yet few techniques are available for difficult phylogenetic reconstruction problems. Appropriate tools for analysis of such data may aid in resolving some of the phylogenetic problems that have been analyzed without much resolution for decades. With the rapid accumulation of whole genome sequences for a wide diversity of taxa, especially microbial taxa, phylogenetic reconstruction based on changes in gene order and gene content is showing promise, particularly for resolving deep (i.e., ancient) branch splits. However, reconstruction from gene-order data is even more computationally expensive than reconstruction from sequence data, particularly in groups with large numbers of genes and highly-rearranged genomes. We have developed a software suite, GRAPPA, that extends the breakpoint analysis (BPAnalysis) method of Sankoff and Blanchette while running much faster: in a recent analysis of chloroplast genome data for species of Campanulaceae on a 512-processor Linux supercluster with Myrinet, we achieved a one-million-fold speedup over BPAnalysis. 
GRAPPA can use either breakpoint or inversion distance (computed exactly) for its computation and runs on single-processor machines as well as parallel and high-performance computers.
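The breakpoint distance GRAPPA can score trees with counts adjacencies of one gene order that are absent from another. A minimal sketch for unsigned gene orders (signed permutations and circular genomes need slightly more bookkeeping):

```python
def breakpoint_distance(order_a, order_b):
    """Number of adjacent gene pairs in genome A that are not adjacent
    (in either orientation) in genome B."""
    adjacent_b = set()
    for x, y in zip(order_b, order_b[1:]):
        adjacent_b.add((x, y))
        adjacent_b.add((y, x))
    return sum((x, y) not in adjacent_b for x, y in zip(order_a, order_a[1:]))

# Swapping genes 2 and 3 breaks two adjacencies relative to the identity order:
d = breakpoint_distance([1, 2, 3, 4, 5], [1, 3, 2, 4, 5])
```

Breakpoint analysis then searches for ancestral gene orders minimising the total of such distances over a candidate tree, which is what makes the problem so computationally expensive.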
Jou, Jonathan D; Jain, Swati; Georgiev, Ivelin S; Donald, Bruce R
2016-06-01
Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions only compute the GMEC. This disproportionately favors the sequence of a single, static conformation and overlooks better binding sequences with multiple low-energy conformations. Provable, ensemble-based algorithms such as A* avoid this problem, but A* cannot guarantee better performance than exhaustive enumeration. We propose a novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) to enumerate a gap-free ensemble of conformations in order of increasing energy. Given a branch-decomposition of branch-width w for an n-residue protein design with at most q discrete side-chain conformations per residue, BWM* returns the sparse GMEC in O([Formula: see text]) time and enumerates each additional conformation in merely O([Formula: see text]) time. We define a new measure, Total Effective Search Space (TESS), which can be computed efficiently a priori before BWM* or A* is run. We ran BWM* on 67 protein design problems and found that TESS discriminated between BWM*-efficient and A*-efficient cases with 100% accuracy. As predicted by TESS and validated experimentally, BWM* outperforms A* in 73% of the cases and computes the full ensemble or a close approximation faster than A*, enumerating each additional conformation in milliseconds. Unlike A*, the performance of BWM* can be predicted in polynomial time before running the algorithm, which gives protein designers the power to choose the most efficient algorithm for their particular design problem.
Single-Molecule Electrical Random Resequencing of DNA and RNA
NASA Astrophysics Data System (ADS)
Ohshiro, Takahito; Matsubara, Kazuki; Tsutsui, Makusu; Furuhashi, Masayuki; Taniguchi, Masateru; Kawai, Tomoji
2012-07-01
Two paradigm shifts in DNA sequencing technologies, from bulk to single molecules and from optical to electrical detection, are expected to realize label-free, low-cost DNA sequencing that does not require PCR amplification. It will lead to the development of high-throughput third-generation sequencing technologies for personalized medicine. Although nanopore devices have been proposed as third-generation DNA-sequencing devices, a significant milestone in these technologies has been attained by demonstrating a novel technique for resequencing DNA using electrical signals. Here we report single-molecule electrical resequencing of DNA and RNA using a hybrid method of identifying single-base molecules via tunneling currents and random sequencing. Our method reads sequences of nine types of DNA oligomers. The complete sequence of 5'-UGAGGUA-3' from the let-7 microRNA family was also identified by creating a composite of overlapping fragment sequences, which was randomly determined using tunneling current conducted by single-base molecules as they passed between a pair of nanoelectrodes.
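Building a composite sequence from randomly read overlapping fragments can be sketched as a greedy suffix-prefix merge. This toy illustrates the idea on the let-7 fragment mentioned above; it is not the authors' reconstruction algorithm:

```python
def merge_overlapping(fragments):
    """Greedily merge reads by the longest suffix-prefix overlap until
    a single composite sequence remains."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, 0, 1, "")  # (overlap length, index a, index b, merged string)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i == j:
                    continue
                for k in range(min(len(a), len(b)), 0, -1):
                    if a.endswith(b[:k]):
                        if k > best[0]:
                            best = (k, i, j, a + b[k:])
                        break  # largest overlap for this pair found
        k, i, j, merged = best
        if k == 0:  # no overlap anywhere: concatenate arbitrarily
            i, j, merged = 0, 1, frags[0] + frags[1]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags[0]

# Three overlapping reads of 5'-UGAGGUA-3':
composite = merge_overlapping(["UGAGG", "AGGUA", "GAGGU"])
```

Greedy merging works here because the toy reads are error-free; real random resequencing must also reconcile repeated and misread bases across many fragments.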
New operator assistance features in the CMS Run Control System
NASA Astrophysics Data System (ADS)
Andre, J.-M.; Behrens, U.; Branson, J.; Brummer, P.; Chaze, O.; Cittolin, S.; Contescu, C.; Craigs, B. G.; Darlea, G.-L.; Deldicque, C.; Demiragli, Z.; Dobson, M.; Doualot, N.; Erhan, S.; Fulcher, J. R.; Gigi, D.; Gładki, M.; Glege, F.; Gomez-Ceballos, G.; Hegeman, J.; Holzner, A.; Janulis, M.; Jimenez-Estupiñán, R.; Masetti, L.; Meijers, F.; Meschi, E.; Mommsen, R. K.; Morovic, S.; O'Dell, V.; Orsini, L.; Paus, C.; Petrova, P.; Pieri, M.; Racz, A.; Reis, T.; Sakulin, H.; Schwick, C.; Simelevicius, D.; Vougioukas, M.; Zejdl, P.
2017-10-01
During Run-1 of the LHC, many operational procedures have been automated in the run control system of the Compact Muon Solenoid (CMS) experiment. When detector high voltages are ramped up or down or upon certain beam mode changes of the LHC, the DAQ system is automatically partially reconfigured with new parameters. Certain types of errors such as errors caused by single-event upsets may trigger an automatic recovery procedure. Furthermore, the top-level control node continuously performs cross-checks to detect sub-system actions becoming necessary because of changes in configuration keys, changes in the set of included front-end drivers or because of potential clock instabilities. The operator is guided to perform the necessary actions through graphical indicators displayed next to the relevant command buttons in the user interface. Through these indicators, consistent configuration of CMS is ensured. However, manually following the indicators can still be inefficient at times. A new assistant to the operator has therefore been developed that can automatically perform all the necessary actions in a streamlined order. If additional problems arise, the new assistant tries to automatically recover from these. With the new assistant, a run can be started from any state of the sub-systems with a single click. An ongoing run may be recovered with a single click, once the appropriate recovery action has been selected. We review the automation features of CMS Run Control and discuss the new assistant in detail including first operational experience.
NASA Technical Reports Server (NTRS)
Smith, David J.; Burton, Aaron; Castro-Wallace, Sarah; John, Kristen; Stahl, Sarah E.; Dworkin, Jason Peter; Lupisella, Mark L.
2016-01-01
On the International Space Station (ISS), technologies capable of rapid microbial identification and disease diagnostics are not currently available. NASA still relies upon sample return for comprehensive, molecular-based sample characterization. Next-generation DNA sequencing is a powerful approach for identifying microorganisms in air, water, and surfaces onboard spacecraft. The Biomolecule Sequencer payload, manifested to SpaceX-9 and scheduled on the Increment 4748 research plan (June 2016), will assess the functionality of a commercially-available next-generation DNA sequencer in the microgravity environment of ISS. The MinION device from Oxford Nanopore Technologies (Oxford, UK) measures picoamp changes in electrical current dependent on nucleotide sequences of the DNA strand migrating through nanopores in the system. The hardware is exceptionally small (9.5 x 3.2 x 1.6 cm), lightweight (120 grams), and powered only by a USB connection. For the ISS technology demonstration, the Biomolecule Sequencer will be powered by a Microsoft Surface Pro3. Ground-prepared samples containing lambda bacteriophage, Escherichia coli, and mouse genomic DNA, will be launched and stored frozen on the ISS until experiment initiation. Immediately prior to sequencing, a crew member will collect and thaw frozen DNA samples, connect the sequencer to the Surface Pro3, inject thawed samples into a MinION flow cell, and initiate sequencing. At the completion of the sequencing run, data will be downlinked for ground analysis. Identical, synchronous ground controls will be used for data comparisons to determine sequencer functionality, run-time sequence, current dynamics, and overall accuracy. 
We will present our latest results from the ISS flight experiment, the first time DNA has ever been sequenced in space, and discuss the many potential applications of the Biomolecule Sequencer for environmental monitoring, medical diagnostics, higher-fidelity and more adaptable Space Biology and Human Research Program investigations, and even life-detection experiments for astrobiology missions.
Single-Cell RNA Sequencing of the Bronchial Epithelium in Smokers With Lung Cancer
2015-07-01
AWARD NUMBER: W81XWH-14-1-0234. TITLE: Single-Cell RNA Sequencing of the Bronchial Epithelium in Smokers With Lung Cancer. The project performs single-cell RNA sequencing on airway epithelial cells obtained from smokers with and without lung cancer to identify cell-type-dependent gene expression.
Helaers, Raphaël; Milinkovitch, Michel C
2010-07-15
The development, in the last decade, of stochastic heuristics implemented in robust application software has made large phylogeny inference a key step in most comparative studies involving molecular sequences. Still, the choice of a phylogeny inference software is often dictated by a combination of parameters not related to the raw performance of the implemented algorithm(s) but rather by practical issues such as ergonomics and/or the availability of specific functionalities. Here, we present MetaPIGA v2.0, a robust implementation of several stochastic heuristics for large phylogeny inference (under maximum likelihood), including a Simulated Annealing algorithm, a classical Genetic Algorithm, and the Metapopulation Genetic Algorithm (metaGA), together with complex substitution models, discrete Gamma rate heterogeneity, and the possibility to partition data. MetaPIGA v2.0 also implements the Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for automated selection of substitution models that best fit the data. Heuristics and substitution models are highly customizable through manual batch files and command line processing. However, MetaPIGA v2.0 also offers an extensive graphical user interface for parameter setting, generating and running batch files, following run progress, and manipulating result trees. MetaPIGA v2.0 uses standard formats for data sets and trees, is platform independent, runs on 32- and 64-bit systems, and takes advantage of multiprocessor and multicore computers. The metaGA resolves the major problem inherent to classical Genetic Algorithms by maintaining high inter-population variation even under strong intra-population selection. Implementation of the metaGA together with additional stochastic heuristics into a single software package will allow rigorous optimization of each heuristic as well as a meaningful comparison of performances among these algorithms.
MetaPIGA v2.0 gives access both to high customization for the phylogeneticist and to an ergonomic interface and functionalities assisting the non-specialist for sound inference of large phylogenetic trees using nucleotide sequences. MetaPIGA v2.0 and its extensive user manual are freely available to academics at http://www.metapiga.org. PMID:20633263
Quantum-Sequencing: Fast electronic single DNA molecule sequencing
NASA Astrophysics Data System (ADS)
Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant
2014-03-01
A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of a unique "electronic fingerprint" for all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic states of the nucleobases shift depending on the pH, with the most distinct states identified at acidic pH. We also demonstrate identification of single-nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of the beta-lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over a 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.
Yanagi, Tomohiro; Shirasawa, Kenta; Terachi, Mayuko; Isobe, Sachiko
2017-01-01
Cultivated strawberry (Fragaria × ananassa Duch.) has homoeologous chromosomes because of allo-octoploidy. For example, two homoeologous chromosomes that belong to different sub-genomes of an allopolyploid have similar base sequences. Thus, when conducting de novo assembly of DNA sequences, it is difficult to determine whether these sequences are derived from the same chromosome. To avoid the difficulties associated with homoeologous chromosomes and demonstrate the possibility of sequencing allopolyploids using single chromosomes, we conducted sequence analysis using microdissected single somatic chromosomes of cultivated strawberry. Three hundred and ten somatic chromosomes of the Japanese octoploid strawberry 'Reiko' were individually selected under a light microscope using a microdissection system. DNA from 288 of the dissected chromosomes was successfully amplified using a DNA amplification kit. Using next-generation sequencing, we decoded the base sequences of the amplified DNA segments, and on the basis of mapping, we identified DNA sequences from 144 samples that best matched the reference genomes of the octoploid strawberry, F. × ananassa, and the diploid strawberry, F. vesca. The 144 samples were classified into seven pseudo-molecules of F. vesca. The coverage rates of the DNA sequences from the single chromosomes onto all pseudo-molecular sequences varied from 3 to 29.9%. We demonstrated an efficient method for sequence analysis of allopolyploid plants using microdissected single chromosomes. On the basis of our results, we believe that whole-genome analysis of allopolyploid plants can be enhanced using methodology that employs microdissected single chromosomes.
Implications of random variation in the Stand Prognosis Model
David A. Hamilton
1991-01-01
Although the Stand Prognosis Model has several stochastic components, features have been included in the model in an attempt to minimize run-to-run variation attributable to these stochastic components. This has led many users to assume that comparisons of management alternatives could be made based on a single run of the model for each alternative. Recent analyses...
Dual Optical Comb LWIR Source and Sensor
2017-10-12
[Recovered from list-of-figures residue] Figure 39: the locking loop controls only one parameter, whereas there are two free-running parameters to control. Figure 65: measured optical frequency, with a 12-point running average (equivalent to a 4 cm⁻¹ resolution). The optical frequency combs (OFCs) are fabricated and processed on a single epitaxial substrate; each OFC is electrically driven and free-running, requiring no optical locking mechanisms.
NASA Astrophysics Data System (ADS)
Zhang, Miao; Tong, Xiaojun
2017-07-01
This paper proposes a joint image encryption and compression scheme based on a new hyperchaotic system and the curvelet transform. A new five-dimensional hyperchaotic system based on the Rabinovich system is presented. By means of the proposed hyperchaotic system, a new pseudorandom key stream generator is constructed. The algorithm adopts a diffusion and confusion structure to perform encryption, based on the key stream generator and the proposed hyperchaotic system. The key sequence used for image encryption is related to the plaintext. By means of the second-generation curvelet transform, run-length coding, and Huffman coding, the image data are compressed. Compression and encryption are performed jointly in a single process. The security test results indicate that the proposed methods have high security and a good compression effect.
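The compression stage described above chains the curvelet transform with run-length and Huffman coding. The paper's implementation is not given here; as an illustration only, a minimal run-length coding sketch (value/count pairs, with counts capped at 255 as a byte-stream-friendly assumption of this sketch, not of the paper) might look like:

```python
def rle_encode(data):
    """Run-length encode a byte sequence into (value, count) pairs."""
    out = []
    for b in data:
        if out and out[-1][0] == b and out[-1][1] < 255:
            out[-1][1] += 1          # extend the current run
        else:
            out.append([b, 1])       # start a new run
    return [(v, c) for v, c in out]

def rle_decode(pairs):
    """Invert rle_encode: expand (value, count) pairs back to bytes."""
    return bytes(v for v, c in pairs for _ in range(c))
```

Run-length coding pays off on data with long constant runs (e.g., thresholded transform coefficients); Huffman coding would then compress the pair stream further.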
Spreadsheet macros for coloring sequence alignments.
Haygood, M G
1993-12-01
This article describes a set of Microsoft Excel macros designed to color amino acid and nucleotide sequence alignments for review and preparation of visual aids. The colored alignments can then be modified to emphasize features of interest. Procedures for importing and coloring sequences are described. The macro file adds a new menu to the menu bar containing sequence-related commands to enable users unfamiliar with Excel to use the macros more readily. The macros were designed for use with Macintosh computers but will also run with the DOS version of Excel.
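The macros themselves are not reproduced in the article. As a language-neutral illustration of the same idea, classifying residues and assigning display colors, here is a sketch in Python rather than Excel's macro language; the palette and residue groupings below are hypothetical, not the article's scheme:

```python
# Hypothetical residue classes and colors (illustrative only).
PALETTE = {
    "hydrophobic": ("AVLIMFWY", "yellow"),
    "polar":       ("STNQCG",   "green"),
    "basic":       ("KRH",      "blue"),
    "acidic":      ("DE",       "red"),
}

def color_of(residue):
    """Return the display color for a one-letter amino acid code."""
    for residues, color in PALETTE.values():
        if residue.upper() in residues:
            return color
    return "gray"  # gaps and unknown characters

def color_alignment(rows):
    """Map each aligned sequence to a parallel list of per-residue colors."""
    return [[color_of(ch) for ch in row] for row in rows]
```

In the spreadsheet setting, the equivalent step writes each residue to a cell and sets that cell's fill color.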
Using single cell sequencing data to model the evolutionary history of a tumor.
Kim, Kyung In; Simon, Richard
2014-01-24
The introduction of next-generation sequencing (NGS) technology has made it possible to detect genomic alterations within tumor cells on a large scale. However, most applications of NGS show the genetic content of mixtures of cells. Recently developed single cell sequencing technology can identify variation within a single cell. Characterization of multiple samples from a tumor using single cell sequencing can potentially provide information on the evolutionary history of that tumor. This may facilitate understanding how key mutations accumulate and evolve in lineages to form a heterogeneous tumor. We provide a computational method to infer an evolutionary mutation tree based on single cell sequencing data. Our approach differs from traditional phylogenetic tree approaches in that our mutation tree directly describes temporal order relationships among mutation sites. Our method also accommodates sequencing errors. Furthermore, we provide a method for estimating the proportion of time from the earliest mutation event of the sample to the most recent common ancestor of the sample of cells. Finally, we discuss current limitations on modeling with single cell sequencing data and possible improvements under those limitations. Inferring the temporal ordering of mutational sites using current single cell sequencing data is a challenge. Our proposed method may help elucidate relationships among key mutations and their role in tumor progression.
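The paper's error-aware model is more involved, but the core intuition behind a temporal ordering of mutation sites can be sketched under a toy, error-free, infinite-sites assumption: a mutation carried by a superset of the sampled cells must have arisen earlier along the lineage than one carried by only a subset.

```python
def order_mutations(genotypes):
    """
    genotypes: dict mapping mutation name -> set of cell ids carrying it.
    Under a no-error, infinite-sites assumption, a mutation carried by
    more cells occurred earlier.  Returns mutation names sorted from
    earliest (most cells) to latest (fewest cells).
    """
    return sorted(genotypes, key=lambda m: len(genotypes[m]), reverse=True)
```

Real single-cell data violate these assumptions (dropout, false positives), which is why the paper models sequencing errors explicitly rather than ordering mutations by raw cell counts.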
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics
Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf
2015-01-01
Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n⁶). Subsequently, numerous faster ‘Sankoff-style’ approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time). Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm ‘sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)’, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff’s original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25838465
[The principle and application of the single-molecule real-time sequencing technology].
Yanhu, Liu; Lu, Wang; Li, Yu
2015-03-01
The last decade witnessed the explosive development of third-generation sequencing strategies, including single-molecule real-time sequencing (SMRT), true single-molecule sequencing (tSMS) and single-molecule nanopore DNA sequencing. In this review, we summarize the principle, performance and applications of the SMRT sequencing technology. Compared with the traditional Sanger method and the next-generation sequencing (NGS) technologies, the SMRT approach has several advantages, including long read length, high speed, PCR-free operation and the capability of directly detecting epigenetic modifications. However, its low accuracy, resulting mostly from insertions and deletions, is also a notable disadvantage, so the raw sequence data need to be corrected before assembly. Up to now, SMRT has been a good fit for applications in de novo genomic sequencing and high-quality assemblies of small genomes. In the future, it is expected to play an important role in epigenetics, transcriptomic sequencing, and assemblies of large genomes.
Reducing assembly complexity of microbial genomes with single-molecule sequencing.
Koren, Sergey; Harhay, Gregory P; Smith, Timothy P L; Bono, James L; Harhay, Dayna M; Mcvey, Scott D; Radune, Diana; Bergman, Nicholas H; Phillippy, Adam M
2013-01-01
The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Third-generation, single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem. To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads. Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.
Automated Flight Dynamics Product Generation for the EOS AM-1 Spacecraft
NASA Technical Reports Server (NTRS)
Matusow, Carla
1999-01-01
As part of NASA's Earth Science Enterprise, the Earth Observing System (EOS) AM-1 spacecraft is designed to monitor long-term, global, environmental changes. Because of the complexity of the AM-1 spacecraft, the mission operations center requires more than 80 distinct flight dynamics products (reports). To create these products, the AM-1 Flight Dynamics Team (FDT) will use a combination of modified commercial software packages (e.g., Analytical Graphics' Satellite Tool Kit) and NASA-developed software applications. While providing the most cost-effective solution to meeting the mission requirements, the integration of these software applications raises several operational concerns: (1) Routine product generation requires knowledge of multiple applications executing on a variety of hardware platforms. (2) Generating products is a highly interactive process requiring a user to interact with each application multiple times to generate each product. (3) Routine product generation requires several hours to complete. (4) User interaction with each application introduces the potential for errors, since users are required to manually enter filenames and input parameters as well as run applications in the correct sequence. Generating products requires some level of flight dynamics expertise to determine the appropriate inputs and sequencing. To address these issues, the FDT developed an automation software tool called AutoProducts, which runs on a single hardware platform and provides all necessary coordination and communication among the various flight dynamics software applications. AutoProducts autonomously retrieves necessary files, sequences and executes applications with correct input parameters, and delivers the final flight dynamics products to the appropriate customers. Although AutoProducts will normally generate pre-programmed sets of routine products, its graphical interface allows for easy configuration of customized and one-of-a-kind products.
Additionally, AutoProducts has been designed as a mission-independent tool and can be easily reconfigured to support other missions or incorporate new flight dynamics software packages. After the AM-1 launch, AutoProducts will run automatically at pre-determined time intervals. The AutoProducts tool reduces many of the concerns associated with flight dynamics product generation. Although AutoProducts required a significant development effort because of the complexity of the interfaces involved, its use will provide significant cost savings through reduced operator time and maximum product reliability. In addition, user satisfaction is significantly improved and flight dynamics experts have more time to perform valuable analysis work. This paper will describe the evolution of the AutoProducts tool, highlighting the cost savings and customer satisfaction resulting from its development. It will also provide details about the tool, including its graphical interface and operational capabilities.
Geography and Location Are the Primary Drivers of Office Microbiome Composition
Chase, John; Fouquier, Jennifer; Zare, Mahnaz; Sonderegger, Derek L.; Knight, Rob; Kelley, Scott T.; Siegel, Jeffrey
2016-01-01
In the United States, humans spend the majority of their time indoors, where they are exposed to the microbiome of the built environment (BE) they inhabit. Despite the ubiquity of microbes in BEs and their potential impacts on health and building materials, basic questions about the microbiology of these environments remain unanswered. We present a study on the impacts of geography, material type, human interaction, location in a room, seasonal variation, and indoor and microenvironmental parameters on bacterial communities in offices. Our data elucidate several important features of microbial communities in BEs. First, under normal office environmental conditions, bacterial communities do not differ on the basis of surface material (e.g., ceiling tile or carpet) but do differ on the basis of the location in a room (e.g., ceiling or floor), two features that are often conflated but that we are able to separate here. We suspect that previous work showing differences in bacterial composition with surface material was likely detecting differences based on different usage patterns. Next, we find that offices have city-specific bacterial communities, such that we can accurately predict which city an office microbiome sample is derived from, but office-specific bacterial communities are less apparent. This differs from previous work, which has suggested office-specific compositions of bacterial communities. We again suspect that the difference from prior work arises from different usage patterns. As has been previously shown, we observe that human skin contributes heavily to the composition of BE surfaces. IMPORTANCE: Our study highlights several points that should impact the design of future studies of the microbiology of BEs. First, projects tracking changes in BE bacterial communities should focus sampling efforts on surveying different locations in offices and in different cities but not necessarily different materials or different offices in the same city.
Next, disturbance due to repeated sampling, though detectable, is small compared to that due to other variables, opening up a range of longitudinal study designs in the BE. Next, studies requiring more samples than can be sequenced on a single sequencing run (which is increasingly common) must control for run effects by including some of the same samples in all of the sequencing runs as technical replicates. Finally, detailed tracking of indoor and material environment covariates is likely not essential for BE microbiome studies, as the normal range of indoor environmental conditions is likely not large enough to impact bacterial communities. PMID:27822521
Kitanaka, Nobue; Kitanaka, Junichi; Hall, F. Scott; Uhl, George R.; Watabe, Kaname; Kubo, Hitoshi; Takahashi, Hitoshi; Tatsuta, Tomohiro; Morita, Yoshio; Takemura, Motohiko
2014-01-01
Repeated intermittent administration of amphetamines acutely increases appetitive and consummatory aspects of motivated behaviors as well as general activity and exploratory behavior, including voluntary running wheel activity. Subsequently, if the drug is withdrawn, the frequency of these behaviors decrease, which is thought to be indicative of dysphoric symptoms associated with amphetamine withdrawal. Such decreases may be observed after chronic treatment or even after single drug administrations. In the present study, the effect of acute methamphetamine (METH) on running wheel activity, horizontal locomotion, appetitive behavior (food access), and consummatory behavior (food and water intake) was investigated in mice. A multi-configuration behavior apparatus designed to monitor the five behaviors was developed, where combined measures were recorded simultaneously. In the first experiment, naïve male ICR mice showed gradually increasing running wheel activity over three consecutive days after exposure to a running wheel, while mice without a running wheel showed gradually decreasing horizontal locomotion, consistent with running wheel activity being a positively motivated form of natural motor activity. In experiment 2, increased horizontal locomotion and food access, and decreased food intake, were observed for the initial 3 h after acute METH challenge. Subsequently, during the dark phase period decreased running wheel activity and horizontal locomotion were observed. The reductions in running wheel activity and horizontal locomotion may be indicative of reduced dopaminergic function, although it remains to be seen if these changes may be more pronounced after more prolonged METH treatments. PMID:22079320
Nadkarni, P M; Miller, P L
1991-01-01
A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.
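The Hypercube and Linda implementations are not shown in the abstract. As a hedged modern analog of the same master/worker pattern (worker threads standing in for Hypercube processors or Linda tuple-space workers, and a toy position-match score standing in for the real sequence comparison), a Python sketch might look like:

```python
from concurrent.futures import ThreadPoolExecutor

def score(pair):
    """Toy similarity: number of matching positions between two equal-length strings."""
    a, b = pair
    return sum(x == y for x, y in zip(a, b))

def compare_all(queries, db, workers=4):
    """Fan out every query-vs-database comparison across a pool of workers,
    a master/worker pattern analogous to the article's parallel models."""
    pairs = [(q, d) for q in queries for d in db]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score, pairs))
```

As in the article's benchmarks, the interesting question is how throughput scales as `workers` grows relative to the per-comparison cost.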
Breakpoint structure of the Anopheles gambiae 2Rb chromosomal inversion.
Lobo, Neil F; Sangaré, Djibril M; Regier, Allison A; Reidenbach, Kyanne R; Bretz, David A; Sharakhova, Maria V; Emrich, Scott J; Traore, Sekou F; Costantini, Carlo; Besansky, Nora J; Collins, Frank H
2010-10-25
Alternative arrangements of chromosome 2 inversions in Anopheles gambiae are important sources of population structure, and are associated with adaptation to environmental heterogeneity. The forces responsible for their origin and maintenance are incompletely understood. Molecular characterization of inversion breakpoints provides insight into how they arose, and provides the basis for development of molecular karyotyping methods useful in future studies. Sequence comparison of regions near the cytological breakpoints of 2Rb allowed the molecular delineation of breakpoint boundaries. Comparisons were made between the standard 2R+b arrangement in the An. gambiae PEST reference genome and the inverted 2Rb arrangements in the An. gambiae M and S genome assemblies. Sequence differences between alternative 2Rb arrangements were exploited in the design of a PCR diagnostic assay, which was evaluated against the known chromosomal banding pattern of laboratory colonies and field-collected samples from Mali and Cameroon. The breakpoints of the 7.55 Mb 2Rb inversion are flanked by extensive runs of the same short (72 bp) tandemly organized sequence, which was likely responsible for chromosomal breakage and rearrangement. Application of the molecular diagnostic assay suggested that 2Rb has a single common origin in An. gambiae and its sibling species, Anopheles arabiensis, and also that the standard arrangement (2R+b) may have arisen twice through breakpoint reuse. The molecular diagnostic was reliable when applied to laboratory colonies, but its accuracy was lower in natural populations. The complex repetitive sequence flanking the 2Rb breakpoint region may be prone to structural and sequence-level instability. The 2Rb molecular diagnostic has immediate application in studies based on laboratory colonies, but its usefulness in natural populations awaits development of complementary molecular tools.
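The breakpoint analysis above hinges on detecting extensive runs of a short (72 bp) tandemly organized sequence. As a simplified illustration (not the authors' pipeline), a function that measures the longest run of back-to-back copies of a given motif in a sequence might look like:

```python
def longest_tandem_run(seq, motif):
    """Length, in copies, of the longest run of consecutive motif copies in seq."""
    k, best, i = len(motif), 0, 0
    while i < len(seq):
        n = 0
        while seq[i + n * k : i + (n + 1) * k] == motif:
            n += 1                    # extend the tandem run
        best = max(best, n)
        i += n * k if n else 1        # skip past the run, or advance one base
    return best
```

A real analysis would additionally tolerate diverged copies (approximate matching), which the exact comparison here does not attempt.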
De Barba, M; Miquel, C; Lobréaux, S; Quenette, P Y; Swenson, J E; Taberlet, P
2017-05-01
Microsatellite markers have played a major role in ecological, evolutionary and conservation research during the past 20 years. However, technical constraints related to the use of capillary electrophoresis and a recent technological revolution that has impacted other marker types have brought into question the continued use of microsatellites for certain applications. We present a study for improving microsatellite genotyping in ecology using high-throughput sequencing (HTS). This approach entails selection of short markers suitable for HTS, sequencing PCR-amplified microsatellites on an Illumina platform and bioinformatic treatment of the sequence data to obtain multilocus genotypes. It takes advantage of the fact that HTS gives direct access to microsatellite sequences, allowing unambiguous allele identification and enabling automation of the genotyping process through bioinformatics. In addition, the massive parallel sequencing abilities expand the information content of single experimental runs far beyond capillary electrophoresis. We illustrated the method by genotyping brown bear samples amplified with a multiplex PCR of 13 new microsatellite markers and a sex marker. HTS of microsatellites provided accurate individual identification and parentage assignment and resulted in a significant improvement of genotyping success (84%) with faecal degraded DNA, as well as cost reduction compared to capillary electrophoresis. The HTS approach holds vast potential for improving the success, accuracy, efficiency and standardization of microsatellite genotyping in ecological and conservation applications, especially those that rely on profiling of low-quantity/quality DNA and on the construction of genetic databases. We discuss and give perspectives for the implementation of the method in the light of the challenges encountered in wildlife studies.
Péterfia, Bálint; Kalmár, Alexandra; Patai, Árpád V; Csabai, István; Bodor, András; Micsik, Tamás; Wichmann, Barnabás; Egedi, Krisztina; Hollósi, Péter; Kovalszky, Ilona; Tulassay, Zsolt; Molnár, Béla
2017-01-01
Background: To support cancer therapy, development of low-cost library preparation techniques for targeted next generation sequencing (NGS) is needed. In this study we designed and tested a PCR-based library preparation panel with a limited target area for sequencing the top 12 somatic mutation hot spots in colorectal cancer on the GS Junior instrument. Materials and Methods: A multiplex PCR panel was designed to amplify regions of mutation hot spots in 12 selected genes (APC, BRAF, CTNNB1, EGFR, FBXW7, KRAS, NRAS, MSH6, PIK3CA, SMAD2, SMAD4, TP53). Amplicons were sequenced on a GS Junior instrument using ligated and barcoded adaptors. Eight samples were sequenced in a single run. Colonic DNA samples (8 normal mucosa; 33 adenomas; 17 adenocarcinomas) as well as HT-29 and Caco-2 cell lines with known mutation profiles were analyzed. Variants found by the panel on the APC, BRAF, KRAS and NRAS genes were validated by conventional sequencing. Results: In total, 34 distinct mutations were detected, including two novel mutations (FBXW7 c.1740:C>G and SMAD4 c.413C>G) that have not been recorded in mutation databases, and one potential germline mutation (APC). The most frequently mutated genes were APC, TP53 and KRAS, with 30%, 15% and 21% frequencies in adenomas and 29%, 53% and 29% frequencies in carcinomas, respectively. In cell lines, all the expected mutations were detected except for one located in a homopolymer region. According to the re-sequencing results, sensitivity and specificity were 100% and 92%, respectively. Conclusions: Our NGS-based screening panel represents a promising step towards low-cost colorectal cancer genotyping on the GS Junior instrument. Despite the relatively low coverage, we discovered two novel mutations and obtained mutation frequencies comparable to literature data. Additionally, as an advantage, this panel requires less template DNA than the sequence capture colon cancer panels currently available for the GS Junior instrument.
NASA Technical Reports Server (NTRS)
Mathews, William S.; Liu, Ning; Francis, Laurie K.; OReilly, Taifun L.; Schrock, Mitchell; Page, Dennis N.; Morris, John R.; Joswig, Joseph C.; Crockett, Thomas M.; Shams, Khawaja S.
2011-01-01
Previously, it was time-consuming to hand-edit data and then set up simulation runs to find the effect and impact of the input data on a spacecraft. MPS Editor provides the user the capability to create/edit/update models and sequences, and immediately try them out using what appears to the user as one piece of software. MPS Editor provides an integrated sequencing environment for users. It provides them with software that can be utilized during development as well as actual operations. In addition, it provides them with a single, consistent, user-friendly interface. MPS Editor uses the Eclipse Rich Client Platform to provide an environment that can be tailored to specific missions. It provides the capability to create and edit, and includes an Activity Dictionary to build the simulation spacecraft models, build and edit sequences of commands, and model the effects of those commands on the spacecraft. MPS Editor is written in Java using the Eclipse Rich Client Platform. It is currently built with four perspectives: the Activity Dictionary Perspective, the Project Adaptation Perspective, the Sequence Building Perspective, and the Sequence Modeling Perspective. Each perspective performs a given task. If a mission doesn't require that task, the unneeded perspective is not added to that project's delivery. In the Activity Dictionary Perspective, the user builds the project-specific activities, observations, calibrations, etc. Typically, this is used during the development phases of the mission, although it can be used later to make changes and updates to the Project Activity Dictionary. In the Adaptation Perspective, the user creates the spacecraft models such as power, data store, etc. Again, this is typically used during development, but will be used to update or add models of the spacecraft. The Sequence Building Perspective allows the user to create a sequence of activities or commands that go to the spacecraft.
It provides a simulation of the activities and commands that have been created.
How behavioral economics can help to avoid 'The last mile problem' in whole genome sequencing.
Blumenthal-Barby, Jennifer S; McGuire, Amy L; Green, Robert C; Ubel, Peter A
2015-01-01
Failure to consider lessons from behavioral economics in the case of whole genome sequencing may cause us to run into the 'last mile problem' - the failure to integrate newly developed technology, on which billions of dollars have been invested, into society in a way that improves human behavior and decision-making.
1981-03-01
Again E(Xn …) … (2.11) and X0 = E0 gives a stationary sequence. Thus the correlations and regressions are the … sequence, although the sample paths will tend to have runs-up. A similar analysis given in Lawrance and Lewis [5] shows that … (3.7) E(XnX
A trace display and editing program for data from fluorescence based sequencing machines.
Gleeson, T; Hillier, L
1991-12-11
'Ted' (Trace editor) is a graphical editor for sequence and trace data from automated fluorescence sequencing machines. It provides facilities for viewing sequence and trace data (in top or bottom strand orientation), for editing the base sequence, for automated or manual trimming of the head (vector) and tail (uncertain data) from the sequence, for vertical and horizontal trace scaling, for keeping a history of sequence editing, and for output of the edited sequence. Ted has been used extensively in the C.elegans genome sequencing project, both as a stand-alone program and integrated into the Staden sequence assembly package, and has greatly aided in the efficiency and accuracy of sequence editing. It runs in the X windows environment on Sun workstations and is available from the authors. Ted currently supports sequence and trace data from the ABI 373A and Pharmacia A.L.F. sequencers.
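Ted's automated trimming of the head (vector) and tail (uncertain data) can be illustrated with a simple sliding-window quality rule: keep the span between the first and last windows whose mean base quality reaches a threshold. This is a generic sketch with made-up window size and threshold, not Ted's actual algorithm:

```python
def trim_sequence(bases, quals, window=10, min_mean_q=20):
    """Trim the low-quality head and tail of a read: keep the span between
    the first and last windows whose mean quality reaches the threshold."""
    n = len(bases)
    start = None
    for i in range(n - window + 1):
        if sum(quals[i:i + window]) / window >= min_mean_q:
            start = i
            break
    if start is None:
        return ""  # no usable window anywhere in the read
    for i in range(n - window, -1, -1):
        if sum(quals[i:i + window]) / window >= min_mean_q:
            return bases[start:i + window]

# Hypothetical trace: 4 uncertain bases, 12 good calls, 4 uncertain bases.
print(trim_sequence("NNNNACGTACGTACGTNNNN",
                    [2] * 4 + [30] * 12 + [2] * 4,
                    window=4, min_mean_q=25))  # -> ACGTACGTACGT
```

In an interactive editor like Ted, such an automatic trim is a starting point that the user can then adjust manually against the displayed trace.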
Performance of a supercharged direct-injection stratified-charge rotary combustion engine
NASA Technical Reports Server (NTRS)
Bartrand, Timothy A.; Willis, Edward A.
1990-01-01
A zero-dimensional thermodynamic performance computer model for direct-injection stratified-charge rotary combustion engines was modified and run for a single rotor supercharged engine. Operating conditions for the computer runs were a single boost pressure and a matrix of speeds, loads and engine materials. A representative engine map is presented showing the predicted range of efficient operation. After discussion of the engine map, a number of engine features are analyzed individually. These features are: heat transfer and the influence insulating materials have on engine performance and exhaust energy; intake manifold pressure oscillations and interactions with the combustion chamber; and performance losses and seal friction. Finally, code running times and convergence data are presented.
NASA Astrophysics Data System (ADS)
Yang, Yao-Joe; Kuo, Wen-Cheng; Fan, Kuang-Chao
2006-01-01
In this work, we present a single-run single-mask (SRM) process for fabricating suspended high-aspect-ratio structures on standard silicon wafers using an inductively coupled plasma-reactive ion etching (ICP-RIE) etcher. This process eliminates extra fabrication steps which are required for structure release after trench etching. Released microstructures with 120 μm thickness are obtained by this process. The corresponding maximum aspect ratio of the trench is 28. The SRM process is an extended version of the standard process proposed by BOSCH GmbH (BOSCH process). The first step of the SRM process is a standard BOSCH process for trench etching, then a polymer layer is deposited on trench sidewalls as a protective layer for the subsequent structure-releasing step. The structure is released by dry isotropic etching after the polymer layer on the trench floor is removed. All the steps can be integrated into a single-run ICP process. Also, only one mask is required. Therefore, the process complexity and fabrication cost can be effectively reduced. Discussions on each SRM step and considerations for avoiding undesired etching of the silicon structures during the release process are also presented.
Ardui, Simon; Ameur, Adam; Vermeesch, Joris R; Hestand, Matthew S
2018-01-01
Short read massive parallel sequencing has emerged as a standard diagnostic tool in the medical setting. However, short read technologies have inherent limitations such as GC bias, difficulties mapping to repetitive elements, trouble discriminating paralogous sequences, and difficulties in phasing alleles. Long read single molecule sequencers resolve these obstacles. Moreover, they offer higher consensus accuracies and can detect epigenetic modifications from native DNA. The first commercially available long read single molecule platform was the RS system based on PacBio's single molecule real-time (SMRT) sequencing technology, which has since evolved into their RSII and Sequel systems. Here we capsulize how SMRT sequencing is revolutionizing constitutional, reproductive, cancer, microbial and viral genetic testing. PMID:29401301
DeNovoGUI: an open source graphical user interface for de novo sequencing of tandem mass spectra.
Muth, Thilo; Weilnböck, Lisa; Rapp, Erdmann; Huber, Christian G; Martens, Lennart; Vaudel, Marc; Barsnes, Harald
2014-02-07
De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissible Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com .
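The core idea behind the de novo sequencing that PepNovo+ performs is matching observed spectrum peaks against theoretical fragment-ion ladders of candidate peptides. A toy sketch of the ladder computation (tiny residue-mass table, singly charged b- and y-ions only; this is not PepNovo+'s scoring model):

```python
# Monoisotopic residue masses in Da for a few amino acids (values rounded
# to 4 decimals; a real tool uses complete tables plus modifications).
RESIDUE = {"G": 57.0215, "A": 71.0371, "S": 87.0320, "P": 97.0528,
           "V": 99.0684, "L": 113.0841, "K": 128.0949, "R": 156.1011}
WATER, PROTON = 18.0106, 1.0073

def fragment_ladders(peptide):
    """Singly charged b- and y-ion m/z ladders for a peptide. De novo
    algorithms search for the peptide whose ladders best explain the
    peaks in a tandem mass spectrum."""
    masses = [RESIDUE[a] for a in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b, y
```

For the tripeptide "GAS", for example, the b-ladder starts near m/z 58.03 (G plus a proton), and mass differences between adjacent ladder peaks identify the residues without any sequence database.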
Chiral pathways in DNA dinucleotides using gradient optimized refinement along metastable borders
NASA Astrophysics Data System (ADS)
Romano, Pablo; Guenza, Marina
We present a study of DNA breathing fluctuations using Markov state models (MSMs) with our novel refinement procedure. MSMs have become a favored method for building kinetic models; however, their accuracy has always depended on using a significant number of microstates, making the method costly. We present a method that optimizes macrostates by refining borders with respect to the gradient along the free energy surface. Because the separation between macrostates contains the highest discretization errors, this method corrects for errors produced by limited microstate sampling. Using our refined MSM methods, we investigate DNA breathing fluctuations, thermally induced conformational changes in native B-form DNA. We ran several microsecond MD simulations of DNA dinucleotides of varying sequences, to include sequence and polarity effects, and analyzed them with our refined MSMs to investigate conformational pathways inherent in the unstacking of DNA bases. Our kinetic analysis has shown preferential chirality in unstacking pathways that may be critical in how proteins interact with single-stranded regions of DNA. These breathing dynamics can help elucidate the connection between conformational changes and key mechanisms within protein-DNA recognition. NSF Chemistry Division (Theoretical Chemistry), the Division of Physics (Condensed Matter: Material Theory), XSEDE.
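The basic MSM construction underlying such an analysis, estimating a transition matrix from a discretized trajectory and extracting its stationary distribution, can be sketched as follows (a minimal sketch; the gradient-based border refinement itself is not reproduced here):

```python
def estimate_msm(states, n_states, lag=1):
    """Row-stochastic transition matrix from a discretized trajectory,
    estimated by counting transitions at a fixed lag time."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[lag:]):
        counts[a][b] += 1
    T = []
    for row in counts:
        total = sum(row)
        # States never visited get a zero row in this toy version.
        T.append([c / total if total else 0.0 for c in row])
    return T

def stationary(T, iters=1000):
    """Stationary distribution of the chain by power iteration."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi
```

In practice the states come from clustering MD frames, and the quality of the model hinges on where the macrostate borders fall, which is exactly what the refinement procedure above targets.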
Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees.
Mahmud, Md Pavel; Wiedenhoeft, John; Schliep, Alexander
2012-09-15
Mapping billions of reads from next generation sequencing experiments to reference genomes is a crucial task, which can require hundreds of hours of running time on a single CPU even for the fastest known implementations. Traditional approaches have difficulties dealing with matches of large edit distance, particularly in the presence of frequent or large insertions and deletions (indels). This is a serious obstacle both in determining the spectrum and abundance of genetic variations and in personal genomics. For the first time, we adopt the approximate string matching paradigm of geometric embedding to read mapping, thus rephrasing it to nearest neighbor queries in a q-gram frequency vector space. Using the L1 distance between frequency vectors has the benefit of providing lower bounds for an edit distance with affine gap costs. Using a cache-oblivious kd-tree, we realize running times that match the state of the art. Additionally, running time and memory requirements are about constant for read lengths between 100 and 1000 bp. We provide a first proof-of-concept that geometric embedding is a promising paradigm for read mapping and that L1 distance might serve to detect structural variations. TreQ, our initial implementation of that concept, performs more accurately than many popular read mappers over a wide range of structural variants. TreQ will be released under the GNU Public License (GPL), and precomputed genome indices will be provided for download at http://treq.sf.net. pavelm@cs.rutgers.edu Supplementary data are available at Bioinformatics online.
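The geometric embedding can be made concrete: map each sequence to its trinucleotide (q-gram) frequency vector, where the L1 distance between vectors lower-bounds edit distance via the q-gram lemma (a single edit alters at most q overlapping q-grams, contributing at most 2q to the L1 distance). A sketch of the embedding without the kd-tree machinery:

```python
from collections import Counter
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=3)]  # all 64 trinucleotides

def qgram_vector(seq, q=3):
    """Frequency vector of overlapping q-grams (q=3 here, matching KMERS)."""
    counts = Counter(seq[i:i + q] for i in range(len(seq) - q + 1))
    return [counts[k] for k in KMERS]

def l1(u, v):
    """L1 distance between frequency vectors. One edit changes at most q
    q-grams, so l1/(2*q) lower-bounds the edit distance between sequences."""
    return sum(abs(a - b) for a, b in zip(u, v))
```

A single substitution between two 8-mers, for instance, yields an L1 distance of at most 6 for q = 3, so candidate hits can be pruned cheaply before any exact alignment is attempted.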
ATD-1 Operational Integration Assessment Final Report
NASA Technical Reports Server (NTRS)
Witzberger, Kevin E.; Sharma, Shivanjli; Martin, Lynn Hazel; Wynnyk, Mitch; McGarry, Katie
2015-01-01
The FAA and NASA conducted an Operational Integration Assessment (OIA) of a prototype Terminal Sequencing and Spacing (formerly TSS, now TSAS) system at the FAA's William J. Hughes Technical Center (WJHTC). The OIA took approximately one year to plan and execute, culminating in a formal data collection, referred to as the Run for Record, from May 12-21, 2015. This report presents quantitative and qualitative results from the Run for Record.
Single Cell Total RNA Sequencing through Isothermal Amplification in Picoliter-Droplet Emulsion.
Fu, Yusi; Chen, He; Liu, Lu; Huang, Yanyi
2016-11-15
Prevalent single cell RNA amplification and sequencing chemistries mainly focus on polyadenylated RNAs in eukaryotic cells by using oligo(dT) primers for reverse transcription. We develop a new RNA amplification method, "easier-seq", to reverse transcribe and amplify the total RNAs, both with and without polyadenylate tails, from a single cell for transcriptome sequencing with high efficiency, reproducibility, and accuracy. By distributing the reverse transcribed cDNA molecules into 1.5 × 10^5 aqueous droplets in oil, the cDNAs are isothermally amplified using random primers in each of these 65-pL reactors separately. This new method greatly improves the ease of single-cell RNA sequencing by reducing the experimental steps. Meanwhile, with less chance to induce errors, this method can easily maintain the quality of single-cell sequencing. In addition, this polyadenylate-tail-independent method can be seamlessly applied to prokaryotic cell RNA sequencing.
DRUMS: Disk Repository with Update Management and Select option for high throughput sequencing data
2014-01-01
Background New technologies for analyzing biological samples, like next generation sequencing, are producing a growing amount of data together with quality scores. Moreover, software tools (e.g., for mapping sequence reads), calculating transcription factor binding probabilities, estimating epigenetic modification enriched regions or determining single nucleotide polymorphisms increase this amount of position-specific DNA-related data even further. Hence, requesting data becomes challenging and expensive and is often implemented using specialised hardware. In addition, picking specific data as fast as possible becomes increasingly important in many fields of science. The general problem of handling big data sets was addressed by developing specialized databases like HBase, HyperTable or Cassandra. However, these database solutions also require specialized or distributed hardware, leading to expensive investments. To the best of our knowledge, there is no database capable of (i) storing billions of position-specific DNA-related records, (ii) performing fast and resource-saving requests, and (iii) running on single standard computer hardware. Results Here, we present DRUMS (Disk Repository with Update Management and Select option), satisfying demands (i)-(iii). It tackles the weaknesses of traditional databases while handling position-specific DNA-related data in an efficient manner. DRUMS is capable of storing up to billions of records. Moreover, it focuses on optimizing related single lookups via range requests, which are needed constantly for computations in bioinformatics. To validate the power of DRUMS, we compare it to the widely used MySQL database. The test setting considers two biological data sets. We use standard desktop hardware as the test environment. Conclusions DRUMS outperforms MySQL in writing and reading records by a factor of two up to a factor of 10000. Furthermore, it can work with significantly larger data sets.
Our work focuses on mid-sized data sets up to several billion records without requiring cluster technology. Storing position-specific data is a general problem and the concept we present here is a generalized approach. Hence, it can be easily applied to other fields of bioinformatics. PMID:24495746
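The kind of position-specific range select that DRUMS optimizes can be illustrated with records kept sorted by (chromosome, position) and queried by binary search (a toy in-memory sketch only; DRUMS itself is a disk-backed repository and this says nothing about its actual layout):

```python
import bisect

class PositionStore:
    """Records sorted by (chromosome, position); a range select is two
    binary searches plus a contiguous scan of the hit interval."""
    def __init__(self, records):
        # records: iterable of (chrom, pos, value) tuples
        self.data = sorted(records)
        self.keys = [(c, p) for c, p, _ in self.data]

    def select(self, chrom, start, end):
        """All records on `chrom` with start <= pos <= end."""
        lo = bisect.bisect_left(self.keys, (chrom, start))
        hi = bisect.bisect_right(self.keys, (chrom, end))
        return self.data[lo:hi]
```

Keeping position-sorted data contiguous is what makes range requests cheap: once the interval boundaries are located, the result is a sequential scan rather than billions of independent lookups.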
Deevy, Patricia; Leonard, Laurence B; Marchman, Virginia A
2017-03-01
This study tested the feasibility of a method designed to assess children's sensitivity to tense/agreement information in fronted auxiliaries during online comprehension of questions (e.g., Are the nice little dogs running?). We expected that a group of children who were proficient in auxiliary use would show this sensitivity, indicating an awareness of the relation between the subject-verb sequence (e.g., dogs running) and preceding information (e.g., are). Failure to grasp this relation is proposed to play a role in the protracted inconsistency in auxiliary use in children with specific language impairment (SLI). Fifteen 3-year-old typically developing children who demonstrated proficiency in auxiliary use viewed pairs of pictures showing a single agent and multiple agents while hearing questions with or without an agreeing fronted auxiliary. Proportion looking to the target was measured. Children showed anticipatory looking on the basis of the number information contained in the auxiliary (is or are). The children tested in this study represent a group that frequently serves as a comparison for older children with SLI. Because the method successfully demonstrated their sensitivity to tense/agreement information in questions, future research that involves direct comparisons of these 2 groups is warranted.
Close encounters of the third-body kind. [intruding bodies in binary star systems
NASA Technical Reports Server (NTRS)
Davies, M. B.; Benz, W.; Hills, J. G.
1994-01-01
We simulated encounters involving binaries of two eccentricities: e = 0 (i.e., circular binaries) and e = 0.5. In both cases the binary contained a point mass of 1.4 solar masses (i.e., a neutron star) and a 0.8 solar masses main-sequence star modeled as a polytrope. The semimajor axes of both binaries were set to 60 solar radii (0.28 AU). We considered intruders of three masses: 1.4 solar masses (a neutron star), 0.8 solar masses (a main-sequence star or a higher mass white dwarf), and 0.64 solar masses (a more typical mass white dwarf). Our strategy was to perform a large number (40,000) of encounters using a three-body code, then to rerun a small number of cases with a three-dimensional smoothed particle hydrodynamics (SPH) code to determine the importance of hydrodynamical effects. Using the results of the three-body runs, we computed the exchange cross sections, sigma(sub ex). From the results of the SPH runs, we computed the cross sections for clean exchange, denoted by sigma(sub cx); the formation of a triple system, denoted by sigma(sub trp); and the formation of a merged binary with an object formed from the merger of two of the stars left in orbit around the third star, denoted by sigma(sub mb). For encounters between either binary and a 1.4 solar masses neutron star, sigma(sub cx) approx. 0.7 sigma(sub ex) and sigma(sub mb) + sigma(sub trp) approx. 0.3 sigma(sub ex). For encounters between either binary and the 0.8 solar masses main-sequence star, sigma(sub cx) approx. 0.50 sigma(sub ex) and sigma(sub mb) + sigma(sub trp) approx. 1.0 sigma(sub ex). If the main-sequence star is replaced by a white dwarf of the same mass, we have sigma(sub cx) approx. 0.5 sigma(sub ex) and sigma(sub mb) + sigma(sub trp) approx. 1.6 sigma(sub ex). Although the exchange cross section is a sensitive function of intruder mass, we see that the cross section to produce merged binaries is roughly independent of intruder mass.
The merged binaries produced have semi-major axes much larger than either those of the original binaries or those of binaries produced in clean exchanges. Coupled with their lower kick velocities, received from the encounters, their larger size will enhance their cross section, shortening the waiting time to a subsequent encounter with another single star.
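Branching cross sections of this kind are estimated from large ensembles of scattering runs: sample impact parameters uniformly in area, classify each encounter's outcome, and scale each outcome fraction by the geometric cross section. A sketch with a stand-in classifier (the threshold and labels are invented; in the study the classification comes from the actual three-body or SPH integration):

```python
import math
import random

def branching_cross_sections(n_trials, b_max, classify, seed=1):
    """Monte Carlo branching cross sections: impact parameters sampled
    uniformly in area out to b_max, each encounter classified, and the
    outcome fractions scaled by the geometric cross section pi*b_max^2."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_trials):
        b = b_max * math.sqrt(rng.random())  # uniform in area, not in radius
        outcome = classify(b)
        counts[outcome] = counts.get(outcome, 0) + 1
    area = math.pi * b_max ** 2
    return {k: area * n / n_trials for k, n in counts.items()}

# Stand-in classifier: call any encounter closer than b = 0.5 an "exchange".
sigma = branching_cross_sections(20000, 1.0,
                                 lambda b: "exchange" if b < 0.5 else "flyby")
```

With enough trials the estimated sigma for the toy classifier converges to pi * 0.5^2, and the partial cross sections always sum to the sampled geometric cross section.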
The Effects of a Duathlon Simulation on Ventilatory Threshold and Running Economy
Berry, Nathaniel T.; Wideman, Laurie; Shields, Edgar W.; Battaglini, Claudio L.
2016-01-01
Multisport events continue to grow in popularity among recreational, amateur, and professional athletes around the world. This study aimed to determine the compounding effects of the initial run and cycling legs of an International Triathlon Union (ITU) duathlon simulation on maximal oxygen uptake (VO2max), ventilatory threshold (VT) and running economy (RE) within a thermoneutral, laboratory-controlled setting. Seven highly trained multisport athletes completed three trials: Trial-1 consisted of a speed-only VO2max treadmill protocol (SOVO2max) to determine VO2max, VT, and RE during a single-bout run; Trial-2 consisted of a 10 km run at 98% of VT followed by an incremental VO2max test on the cycle ergometer; Trial-3 consisted of a 10 km run and 30 km cycling bout at 98% of VT followed by a speed-only treadmill test to determine the compounding effects of the initial legs of a duathlon on VO2max, VT, and RE. A repeated-measures ANOVA was performed to determine differences between variables across trials. No difference in VO2max, VT (%VO2max), maximal HR, or maximal RPE was observed across trials. Oxygen consumption at VT was significantly lower during Trial-3 compared to Trial-1 (p = 0.01). This decrease was coupled with a significant reduction in running speed at VT (p = 0.015). A significant interaction between trial and running speed indicated that RE was significantly altered during Trial-3 compared to Trial-1 (p < 0.001). The first two legs of a laboratory-based duathlon simulation negatively impact VT and RE. Our findings may provide a useful method to evaluate multisport athletes, since a single-bout incremental treadmill test fails to reveal important alterations in physiological thresholds. Key points: (1) relative oxygen uptake at VT (ml·kg-1·min-1) decreased during the final leg of the duathlon simulation compared to a single-bout maximal run; (2) running speed at VT decreased during the final leg, resulting in an increase of more than 2 minutes to complete a 5 km run; (3) the highly trained athletes were unable to complete the final 5 km run at the same intensity at which they completed the initial 10 km run (in a laboratory setting); (4) a better understanding and determination of training loads during multisport training may help to better periodize training programs; additional research is required. PMID:27274661
SPIRE Data Evaluation and Nuclear IR Fluorescence Processes.
1982-11-30
so that all isotopes can be dealt with in a single run rather than a number of separate runs. At lower altitudes the radiance calculation needs to be...approximation can be inferred from the work of Neuendorffer (1982) on developing an analytic expression for the absorption of a single non-overlapping line...personnel by using prominent atmospheric infrared features such as the OH maximum, the HNO3 maximum, the CO2 4.3 um knee, etc. The azimuth however
Wan, LingLin; Han, Juan; Sang, Min; Li, AiFen; Wu, Hong; Yin, ShunJi; Zhang, ChengWu
2012-01-01
Background Eustigmatos cf. polyphem is a yellow-green unicellular soil microalga belonging to the eustigmatophytes, with high biomass and considerable production of triacylglycerols (TAGs) for biofuels; it is thus referred to as an oleaginous microalga. The paucity of microalgae genome sequences, however, limits development of gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for a non-model microalgal species, E. cf. polyphem, and identify pathways and genes of importance related to biofuel production. Results We performed the de novo assembly of the E. cf. polyphem transcriptome using Illumina paired-end sequencing technology. In a single run, we produced 29,199,432 sequencing reads corresponding to 2.33 Gb total nucleotides. These reads were assembled into 75,632 unigenes with a mean size of 503 bp and an N50 of 663 bp, ranging from 100 bp to >3,000 bp. Assembled unigenes were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology identifiers. These analyses identified the majority of the carbohydrate, fatty acid, TAG and carotenoid biosynthesis and catabolism pathways in E. cf. polyphem. Conclusions Our data provide the construction of metabolic pathways involved in the biosynthesis and catabolism of carbohydrates, fatty acids, TAGs and carotenoids in E. cf. polyphem and provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock. PMID:22536352
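The assembly statistic quoted above (an N50 of 663 bp) has a standard definition: the contig length such that contigs at least that long together contain at least half of all assembled bases. A generic sketch with illustrative contig lengths:

```python
def n50(lengths):
    """N50: the length L such that contigs of length >= L together
    contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Illustrative assembly of four contigs totalling 1000 bp.
print(n50([100, 200, 300, 400]))  # -> 300
```

Because N50 weights contigs by the bases they contain, it rewards assemblies with a few long contigs over many short ones, unlike the mean length also reported above.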
Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation.
Rognes, Torbjørn
2011-06-01
The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.
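The quantity SWIPE computes, for sixteen database sequences in parallel via SIMD, is the standard Smith-Waterman local alignment score. A plain scalar reference version with linear (not affine) gap costs, for illustration only and with made-up scoring parameters:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Scalar Smith-Waterman local alignment score: the best-scoring
    local alignment between sequences a and b, with cells clamped at 0."""
    rows = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = rows[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            rows[i][j] = max(0, diag, rows[i - 1][j] + gap, rows[i][j - 1] + gap)
            best = max(best, rows[i][j])
    return best
```

SWIPE's speedup comes from evaluating this same recurrence for one query residue against residues of sixteen database sequences at once in a single SIMD register, rather than vectorizing within one alignment as Farrar's striped approach does.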
OncoNEM: inferring tumor evolution from single-cell sequencing data.
Ross, Edith M; Markowetz, Florian
2016-04-15
Single-cell sequencing promises a high-resolution view of genetic heterogeneity and clonal evolution in cancer. However, methods to infer tumor evolution from single-cell sequencing data lag behind methods developed for bulk-sequencing data. Here, we present OncoNEM, a probabilistic method for inferring intra-tumor evolutionary lineage trees from somatic single nucleotide variants of single cells. OncoNEM identifies homogeneous cellular subpopulations and infers their genotypes as well as a tree describing their evolutionary relationships. In simulation studies, we assess OncoNEM's robustness and benchmark its performance against competing methods. Finally, we show its applicability in case studies of muscle-invasive bladder cancer and essential thrombocythemia.
Reality of Single Circulating Tumor Cell Sequencing for Molecular Diagnostics in Pancreatic Cancer.
Court, Colin M; Ankeny, Jacob S; Sho, Shonan; Hou, Shuang; Li, Qingyu; Hsieh, Carolyn; Song, Min; Liao, Xinfang; Rochefort, Matthew M; Wainberg, Zev A; Graeber, Thomas G; Tseng, Hsian-Rong; Tomlinson, James S
2016-09-01
To understand the potential and limitations of circulating tumor cell (CTC) sequencing for molecular diagnostics, we investigated the feasibility of identifying the ubiquitous KRAS mutation in single CTCs from pancreatic cancer (PC) patients. We used the NanoVelcro/laser capture microdissection CTC platform, combined with whole genome amplification and KRAS Sanger sequencing. We assessed both KRAS codon-12 coverage and the degree that allele dropout during whole genome amplification affected the detection of KRAS mutations from single CTCs. We isolated 385 single cells, 163 from PC cell lines and 222 from the blood of 12 PC patients, and obtained KRAS sequence coverage in 218 of 385 single cells (56.6%). For PC cell lines with known KRAS mutations, single mutations were detected in 67% of homozygous cells but only 37.4% of heterozygous single cells, demonstrating that both coverage and allele dropout are important causes of mutation detection failure from single cells. We could detect KRAS mutations in CTCs from 11 of 12 patients (92%) and 33 of 119 single CTCs sequenced, resulting in a KRAS mutation detection rate of 27.7%. Importantly, KRAS mutations were never found in the 103 white blood cells sequenced. Sequencing of groups of cells containing between 1 and 100 cells determined that at least 10 CTCs are likely required to reliably assess KRAS mutation status from CTCs. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
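The reported need for roughly 10 CTCs is consistent with a simple back-of-envelope calculation (my illustration, not the authors' analysis): if each sequenced CTC independently yields the mutation at the observed per-cell rate of 27.7%, the chance of at least one detection among n cells is 1 - (1 - 0.277)^n.

```python
# Probability of detecting the KRAS mutation in at least one of n CTCs,
# assuming independent per-cell detection at the reported 27.7% rate.
p_cell = 0.277
for n in (1, 5, 10):
    p_detect = 1 - (1 - p_cell) ** n
    print(n, round(p_detect, 3))   # 1 0.277 / 5 0.802 / 10 0.961
```

With 10 CTCs the detection probability exceeds 95%, matching the paper's practical threshold.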
Macas, Jiří; Neumann, Pavel; Navrátilová, Alice
2007-01-01
Background Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massive parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum). Results Analysis of 33.3 Mb sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. Conclusion We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35–48% of the genome. 
These data provide a starting point for further investigations of legume plant genomes based on their global comparative analysis and for the development of more sophisticated approaches for data mining. PMID:18031571
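The stated coverage figure can be checked with a one-line calculation: 33.3 Mb of reads covering 0.77% of the genome implies a 1C genome size of roughly 4.3 Gb, in line with published estimates for pea.

```python
reads_mb = 33.3            # sequence data analysed, in Mb
coverage = 0.0077          # 0.77 % of the genome
genome_mb = reads_mb / coverage
print(round(genome_mb))    # 4325 Mb, i.e. ~4.3 Gb
```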
Hinaut, Xavier; Dominey, Peter Ford
2011-01-01
Categorical encoding is crucial for mastering large bodies of related sensory-motor experiences, but what is its neural substrate? In an effort to respond to this question, recent single-unit recording studies in the macaque lateral prefrontal cortex (LPFC) have demonstrated two characteristic forms of neural encoding of the sequential structure of the animal's sensory-motor experience. One population of neurons encodes the specific behavioral sequences. A second population of neurons encodes the sequence category (e.g. ABAB, AABB or AAAA) and does not differentiate sequences within the category (Shima, K., Isoda, M., Mushiake, H., Tanji, J., 2007. Categorization of behavioural sequences in the prefrontal cortex. Nature 445, 315-318.). Interestingly these neurons are intermingled in the lateral prefrontal cortex, and not topographically segregated. Thus, LPFC may provide a neurophysiological basis for sensorimotor categorization. Here we report on a neural network simulation study that reproduces and explains these results. We model a cortical circuit composed of three layers (infragranular, granular, and supragranular) of 5*5 leaky integrator neurons with a sigmoidal output function, and we examine 1000 such circuits running in parallel. Crucially the three layers are interconnected with recurrent connections, thus producing a dynamical system that is inherently sensitive to the spatiotemporal structure of the sequential inputs. The model is presented with 11 four-element sequences following Shima et al. We isolated one subpopulation of neurons each of whose activity predicts individual sequences, and a second population that predicts category independent of the specific sequence. We argue that a richly interconnected cortical circuit is capable of internally generating a neural representation of category membership, thus significantly extending the scope of recurrent network computation. 
In order to demonstrate that these representations can be used to create an explicit categorization capability, we introduced an additional neural structure corresponding to the striatum. We showed that via cortico-striatal plasticity, neurons in the striatum could produce an explicit representation both of the identity of each sequence, and its category membership. Copyright © 2011 Elsevier Ltd. All rights reserved.
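The core dynamical unit of the model, a recurrently connected layer of leaky-integrator neurons with sigmoid outputs, can be sketched in a few lines of NumPy (layer size, time constant, and the random weight scale below are illustrative assumptions, not the paper's parameters).

```python
import numpy as np

def simulate(inputs, n=25, tau=10.0, dt=1.0, seed=0):
    """Drive n leaky-integrator neurons with sigmoid output and random
    recurrent connections by an input sequence of shape (T, n).
    Returns the firing rates at each step."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0, 1.0 / np.sqrt(n), size=(n, n))  # recurrent weights
    x = np.zeros(n)                                   # membrane potentials
    outputs = []
    for u in inputs:
        r = 1.0 / (1.0 + np.exp(-x))                  # sigmoid firing rate
        x = x + (dt / tau) * (-x + W @ r + u)         # leaky integration
        outputs.append(r.copy())
    return np.array(outputs)
```

Because the recurrent state decays slowly, the network's trajectory depends on the temporal order of the inputs, which is the property the authors exploit for sequence and category coding.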
Tumor Heterogeneity, Single-Cell Sequencing, and Drug Resistance.
Schmidt, Felix; Efferth, Thomas
2016-06-16
Tumor heterogeneity has been compared with Darwinian evolution and survival of the fittest. The evolutionary ecosystem of tumors consisting of heterogeneous tumor cell populations represents a considerable challenge to tumor therapy, since all genetically and phenotypically different subpopulations have to be efficiently killed by therapy. Otherwise, even small surviving subpopulations may cause repopulation and refractory tumors. Single-cell sequencing allows for a better understanding of the genomic principles of tumor heterogeneity and represents the basis for more successful tumor treatments. The isolation and sequencing of single tumor cells still represents a considerable technical challenge and consists of three major steps: (1) single-cell isolation (e.g., by laser-capture microdissection, fluorescence-activated cell sorting, or micromanipulation), (2) whole genome amplification (e.g., with the help of Phi29 DNA polymerase), and (3) transcriptome-wide next-generation sequencing technologies (e.g., 454 pyrosequencing, Illumina sequencing, and other systems). Data demonstrating the feasibility of single-cell sequencing for monitoring the emergence of drug-resistant cell clones in patient samples are discussed herein. It is envisioned that single-cell sequencing will be a valuable asset to assist the design of regimens for personalized tumor therapies based on tumor subpopulation-specific genetic alterations in individual patients.
The dance of the honeybee: how do honeybees dance to transfer food information effectively?
Okada, R; Ikeno, H; Sasayama, Noriko; Aonuma, H; Kurabayashi, D; Ito, E
2008-01-01
A honeybee informs her nestmates of the location of a flower she has visited by a unique behavior called a "waggle dance." On a vertical comb, the direction of the waggle run relative to gravity indicates the direction to the food source relative to the sun in the field, and the duration of the waggle run indicates the distance to the food source. To determine the detailed biological features of the waggle dance, we observed worker honeybee behavior in the field. Video analysis showed that the bee does not dance in a single or random place in the hive but waggles several times in one place and then several times in another. It also showed that the information in the waggle dance contains a substantial margin of error. The angle and duration of waggle runs varied from run to run, within ranges of ±15° and ±15%, respectively, even in a series of waggle dances by a single individual. We also found that most dance followers that attended the waggle dance left the dancer after one or two sessions.
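The angle-and-duration code described above can be written down directly (a sketch; the distance calibration constant below is an illustrative assumption, not a value from this paper).

```python
def decode_waggle(angle_from_vertical_deg, waggle_duration_s,
                  sun_azimuth_deg, metres_per_second=750):
    """Decode one waggle run. The run's angle relative to vertical on the
    comb equals the food bearing relative to the sun's azimuth; duration
    maps roughly linearly to distance (calibration constant is
    illustrative only)."""
    bearing = (sun_azimuth_deg + angle_from_vertical_deg) % 360
    distance_m = waggle_duration_s * metres_per_second
    return bearing, distance_m
```

Given the paper's observed run-to-run scatter of ±15° and ±15%, a follower integrating several runs would decode the target far more precisely than from any single run.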
Jamming and Localization of Interacting Run-and-Tumble Particles
NASA Astrophysics Data System (ADS)
Blythe, Richard; Evans, Martin; Slowman, Alexander
Certain species of bacteria, notably Escherichia coli, exhibit a characteristic run-and-tumble motion comprising a sequence of straight-line runs at constant velocity interspersed with tumble events that randomize the direction of motion. In a many-body setting, this nonequilibrium dynamics can generate the phenomenon of motility-induced phase separation, which is also seen for a wide variety of self-propelled particles more generally. Whilst the propensity of self-propelled particles to phase separate is understood at a mesoscopic level, the origin of this behaviour in the inelastic collisions between particles implied by the microscopic dynamics is not. Here we present exact results for run-and-tumble particles in one dimension that reveal a richly-structured stationary state that comprises a superposition of three distinct physical states whose relative weights vary with the run and tumble rates, namely a jammed state, a localized state and a delocalized state.
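The microscopic dynamics described, straight runs interrupted by Poisson-distributed tumbles, is easy to simulate for a single 1D particle (a minimal sketch with illustrative rate and speed, not the authors' exact interacting model).

```python
import random

def run_and_tumble_1d(steps, tumble_rate=0.1, speed=1.0, dt=0.1, seed=1):
    """Trajectory of one 1D run-and-tumble particle: constant-speed runs,
    with the direction re-randomized at Poisson tumble events."""
    rng = random.Random(seed)
    x, direction = 0.0, 1
    traj = [x]
    for _ in range(steps):
        if rng.random() < tumble_rate * dt:   # tumble event this step
            direction = rng.choice((-1, 1))
        x += direction * speed * dt
        traj.append(x)
    return traj
```

The jamming and localization phenomena in the abstract arise only once two or more such particles interact by exclusion, which this single-particle sketch omits.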
Anhøj, Jacob; Olesen, Anne Vingaard
2014-01-01
A run chart is a line graph of a measure plotted over time with the median as a horizontal line. The main purpose of the run chart is to identify process improvement or degradation, which may be detected by statistical tests for non-random patterns in the data sequence. We studied the sensitivity to shifts and linear drifts in simulated processes using the shift, crossings and trend rules for detecting non-random variation in run charts. The shift and crossings rules are effective in detecting shifts and drifts in process centre over time while keeping the false signal rate constant around 5% and independent of the number of data points in the chart. The trend rule is virtually useless for detection of linear drift over time, the purpose it was intended for.
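The shift and crossings rules can be sketched in a few lines (a simplified illustration: the shift threshold of round(log2(n)) + 3 follows the rule's published form, but the crossings cut-off below uses a normal approximation where the paper tabulates exact binomial critical values).

```python
import math

def run_chart_signals(data):
    """Simplified shift and crossings tests for a run chart.
    Points on the median are ignored, per run chart convention."""
    srt = sorted(data)
    median = srt[len(data) // 2] if len(data) % 2 else \
        sum(srt[len(data) // 2 - 1:len(data) // 2 + 1]) / 2
    sides = [1 if x > median else -1 for x in data if x != median]
    n = len(sides)
    longest = run = 1
    crossings = 0
    for a, b in zip(sides, sides[1:]):
        if a == b:
            run += 1
            longest = max(longest, run)
        else:
            crossings += 1
            run = 1
    # Shift: a run of round(log2(n)) + 3 or more points on one side.
    shift_signal = longest >= round(math.log2(n)) + 3
    # Crossings: fewer than the approximate lower 5% binomial bound.
    min_crossings = math.floor((n - 1) / 2 - 1.96 * math.sqrt((n - 1) / 4))
    crossings_signal = crossings < min_crossings
    return shift_signal, crossings_signal
```

An alternating sequence triggers neither rule, while a step change in the process centre triggers both.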
Real-Time DNA Sequencing in the Antarctic Dry Valleys Using the Oxford Nanopore Sequencer
Johnson, Sarah S.; Zaikova, Elena; Goerlitz, David S.; Bai, Yu; Tighe, Scott W.
2017-01-01
The ability to sequence DNA outside of the laboratory setting has enabled novel research questions to be addressed in the field in diverse areas, ranging from environmental microbiology to viral epidemics. Here, we demonstrate the application of offline DNA sequencing of environmental samples using a hand-held nanopore sequencer in a remote field location: the McMurdo Dry Valleys, Antarctica. Sequencing was performed using a MK1B MinION sequencer from Oxford Nanopore Technologies (ONT; Oxford, United Kingdom) that was equipped with software to operate without internet connectivity. One-direction (1D) genomic libraries were prepared using portable field techniques on DNA isolated from desiccated microbial mats. By adequately insulating the sequencer and laptop, it was possible to run the sequencing protocol for up to 2½ h under arduous conditions. PMID:28337073
Volume 2: Compendium of Abstracts
2017-06-01
simulation work using a standard running model for legged systems, the Spring Loaded Inverted Pendulum (SLIP) Model. In this model, the dynamics of a single...bar SLIP model is analyzed using a basin of attraction analyses to determine the optimal configuration for running at different velocities and...acquisition, and the automatic target acquisition were then compared to each other. After running trials with the current system, it will be
Singh, Chandra K; Ojha, Abhishek; Kachru, Devendra N
2007-01-01
To comply with international labeling regulations for genetically modified (GM) crops and food, and to enable proper identification of GM organisms (GMOs), effective methodologies and reliable approaches are needed. The spurious and unapproved GM planting has contributed to crop failures and commercial losses. To ensure effective and genuine GM cultivation, a methodology is needed to detect and identify the trait of interest and concurrently evaluate the structural and functional stability of the transgene insert. A multiple polymerase chain reaction (PCR) approach was developed for detection, identification, and gene stability confirmation of cry1Ac transgene construct in Bt cotton. As many as 9 samples of Bt cotton hybrid seeds comprising 3 approved Bt hybrids, MECH-12Bt, MECH-162Bt, MECH-184Bt, and a batch of 6 nonapproved Bt hybrids were tested. Initially, single standard PCR assays were run to amplify predominant GM DNA sequences (CaMV 35S promoter, nos terminator, and npt-II marker gene); a housekeeping gene, Gossypium hirsutum fiber-specific acyl carrier protein gene (acp1); a trait-specific transgene (cry1Ac); and a sequence of 7S 3' transcription terminator which specifically borders with 3' region of cry1Ac transgene cassette. The concurrent amplification of all sequences of the entire cassette was performed by 3 assays, duplex, triplex, and quadruplex multiplex PCR assays, under common assay conditions. The identity of amplicons was reconfirmed by restriction endonuclease digestion profile. The 2 distinct transgene cassettes, cry1Ac and npt-II, of the Bt cotton were amplified using the respective forward primer of promoter and reverse primer of terminator. The resultant amplicons were excised, eluted, and purified. The purified amplicons served as template for nested PCR assays. The nested PCR runs confirmed the transgene construct orientation and identity. The limit of detection as established by our assay for GM trait (cry1Ac) was 0.1%. 
This approach can be adopted as a standard procedure for complete molecular characterization of Bt cotton. These assays will be of interest and use to importers, breeders, research laboratories, safety regulators, and food processors for detection of cry1Ac bearing GMOs.
NASA Astrophysics Data System (ADS)
Liu, Junliang; Zhang, Tingfa; Li, Yongfu; Ding, Lei; Tao, Junchao; Wang, Ying; Wang, Qingpu; Fang, Jiaxiong
2017-07-01
A free-running single-photon detector for 1.06 μm wavelength based on an InGaAsP/InP single-photon avalanche diode is presented. The detector incorporates an ultra-fast active-quenching technique to greatly lessen the afterpulsing effects. An improved method for avalanche characterization using electroluminescence is proposed, and the performance of the detector is evaluated. The number of avalanche carriers is as low as 1.68 × 10^6, resulting in a low total afterpulse probability of 4% at 233 K with 10% detection efficiency and a 1 μs hold-off time.
Nadkarni, P. M.; Miller, P. L.
1991-01-01
A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632
Working memory encoding delays top-down attention to visual cortex.
Scalf, Paige E; Dux, Paul E; Marois, René
2011-09-01
The encoding of information from one event into working memory can delay high-level, central decision-making processes for subsequent events [e.g., Jolicoeur, P., & Dell'Acqua, R. The demonstration of short-term consolidation. Cognitive Psychology, 36, 138-202, 1998, doi:10.1006/cogp.1998.0684]. Working memory, however, is also believed to interfere with the deployment of top-down attention [de Fockert, J. W., Rees, G., Frith, C. D., & Lavie, N. The role of working memory in visual selective attention. Science, 291, 1803-1806, 2001, doi:10.1126/science.1056496]. It is, therefore, possible that, in addition to delaying central processes, the engagement of working memory encoding (WME) also postpones perceptual processing as well. Here, we tested this hypothesis with time-resolved fMRI by assessing whether WME serially postpones the action of top-down attention on low-level sensory signals. In three experiments, participants viewed a skeletal rapid serial visual presentation sequence that contained two target items (T1 and T2) separated by either a short (550 msec) or long (1450 msec) SOA. During single-target runs, participants attended and responded only to T1, whereas in dual-target runs, participants attended and responded to both targets. To determine whether T1 processing delayed top-down attentional enhancement of T2, we examined T2 BOLD response in visual cortex by subtracting the single-task waveforms from the dual-task waveforms for each SOA. When the WME demands of T1 were high (Experiments 1 and 3), T2 BOLD response was delayed at the short SOA relative to the long SOA. This was not the case when T1 encoding demands were low (Experiment 2). We conclude that encoding of a stimulus into working memory delays the deployment of attention to subsequent target representations in visual cortex.
DOT National Transportation Integrated Search
1994-10-01
The Run-Off-Road Collision Avoidance Using IVHS Countermeasures program addresses the single-vehicle crash problem through the application of technology to prevent and/or reduce the severity of these crashes.
Sanders, Ashley D; Falconer, Ester; Hills, Mark; Spierings, Diana C J; Lansdorp, Peter M
2017-06-01
The ability to distinguish between genome sequences of homologous chromosomes in single cells is important for studies of copy-neutral genomic rearrangements (such as inversions and translocations), building chromosome-length haplotypes, refining genome assemblies, mapping sister chromatid exchange events and exploring cellular heterogeneity. Strand-seq is a single-cell sequencing technology that resolves the individual homologs within a cell by restricting sequence analysis to the DNA template strands used during DNA replication. This protocol, which takes up to 4 d to complete, relies on the directionality of DNA, in which each single strand of a DNA molecule is distinguished based on its 5'-3' orientation. Culturing cells in a thymidine analog for one round of cell division labels nascent DNA strands, allowing for their selective removal during genomic library construction. To preserve directionality of template strands, genomic preamplification is bypassed and labeled nascent strands are nicked and not amplified during library preparation. Each single-cell library is multiplexed for pooling and sequencing, and the resulting sequence data are aligned, mapping to either the minus or plus strand of the reference genome, to assign template strand states for each chromosome in the cell. The major adaptations to conventional single-cell sequencing protocols include harvesting of daughter cells after a single round of BrdU incorporation, bypassing of whole-genome amplification, and removal of the BrdU + strand during Strand-seq library preparation. By sequencing just template strands, the structure and identity of each homolog are preserved.
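The final analysis step described, assigning a template-strand state per chromosome from the strand of aligned reads, can be sketched as a simple classifier (W/C denote Watson/Crick strands as in the Strand-seq literature; the fraction thresholds below are illustrative, not the protocol's values).

```python
def template_strand_state(plus_reads, minus_reads, frac_cut=0.2):
    """Classify a chromosome's template-strand state from directional
    read counts: mostly plus strand -> 'CC', mostly minus -> 'WW',
    roughly equal -> 'WC'. Thresholds are illustrative only."""
    total = plus_reads + minus_reads
    if total == 0:
        return None
    frac_plus = plus_reads / total
    if frac_plus >= 1 - frac_cut:
        return "CC"
    if frac_plus <= frac_cut:
        return "WW"
    return "WC"
```

A chromosome in the WC state is the informative case for haplotyping, since each homolog contributes reads of a distinct orientation.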
Simplified programming and control of automated radiosynthesizers through unit operations.
Claggett, Shane B; Quinn, Kevin M; Lazari, Mark; Moore, Melissa D; van Dam, R Michael
2013-07-15
Many automated radiosynthesizers for producing positron emission tomography (PET) probes provide a means for the operator to create custom synthesis programs. The programming interfaces are typically designed with the engineer rather than the radiochemist in mind, requiring lengthy programs to be created from sequences of low-level, non-intuitive hardware operations. In some cases, the user is even responsible for adding steps to update the graphical representation of the system. In light of these unnecessarily complex approaches, we have created software to perform radiochemistry on the ELIXYS radiosynthesizer with the goal of being intuitive and easy to use. Radiochemists were consulted, and a wide range of radiosyntheses were analyzed to determine a comprehensive set of basic chemistry unit operations. Based around these operations, we created a software control system with a client-server architecture. In an attempt to maximize flexibility, the client software was designed to run on a variety of portable multi-touch devices. The software was used to create programs for the synthesis of several 18F-labeled probes on the ELIXYS radiosynthesizer, with [18F]FDG detailed here. To gauge the user-friendliness of the software, program lengths were compared to those from other systems. A small sample group with no prior radiosynthesizer experience was tasked with creating and running a simple protocol. The software was successfully used to synthesize several 18F-labeled PET probes, including [18F]FDG, with synthesis times and yields comparable to literature reports. The resulting programs were significantly shorter and easier to debug than programs from other systems. The sample group of naive users created and ran a simple protocol within a couple of hours, revealing a very short learning curve. The client-server architecture provided reliability, enabling continuity of the synthesis run even if the computer running the client software failed. 
The architecture enabled a single user to control the hardware while others observed the run in progress or created programs for other probes. We developed a novel unit operation-based software interface to control automated radiosynthesizers that reduced the program length and complexity and also exhibited a short learning curve. The client-server architecture provided robustness and flexibility. PMID:23855995
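The unit-operation idea can be made concrete with a hypothetical sketch (operation names and parameters are invented for illustration and are not ELIXYS's actual API): the chemist writes a list of high-level steps, and a runner dispatches each to a hardware handler.

```python
# Hypothetical unit-operation program in the spirit described above.
# Operation names and parameters are invented for illustration.
program = [
    ("ADD",       {"reactor": 1, "reagent": "K222/K2CO3", "volume_ml": 0.5}),
    ("EVAPORATE", {"reactor": 1, "temp_c": 110, "duration_s": 240}),
    ("ADD",       {"reactor": 1, "reagent": "precursor", "volume_ml": 1.0}),
    ("REACT",     {"reactor": 1, "temp_c": 85, "duration_s": 300}),
    ("PURIFY",    {"reactor": 1, "cartridge": "C18"}),
]

def run_program(program, handlers):
    """Dispatch each unit operation to its registered hardware handler."""
    for op, params in program:
        handlers[op](**params)
```

Expressing a synthesis at this level is what shortens programs relative to sequences of low-level valve and actuator commands.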
Single-Cell RNA-Sequencing in Glioma.
Johnson, Eli; Dickerson, Katherine L; Connolly, Ian D; Hayden Gephart, Melanie
2018-04-10
In this review, we seek to summarize the literature concerning the use of single-cell RNA-sequencing for CNS gliomas. Single-cell analysis has revealed complex tumor heterogeneity, subpopulations of proliferating stem-like cells and expanded our view of tumor microenvironment influence in the disease process. Although bulk RNA-sequencing has guided our initial understanding of glioma genetics, this method does not accurately define the heterogeneous subpopulations found within these tumors. Single-cell techniques have appealing applications in cancer research, as diverse cell types and the tumor microenvironment have important implications in therapy. High cost and difficult protocols prevent widespread use of single-cell RNA-sequencing; however, continued innovation will improve accessibility and expand our knowledge of gliomas.
NASA Astrophysics Data System (ADS)
Gelderblom, Erik C.; Vos, Hendrik J.; Mastik, Frits; Faez, Telli; Luan, Ying; Kokhuis, Tom J. A.; van der Steen, Antonius F. W.; Lohse, Detlef; de Jong, Nico; Versluis, Michel
2012-10-01
The Brandaris 128 ultra-high-speed imaging facility has been updated over the last 10 years through modifications made to the camera's hardware and software. At its introduction the camera was able to record 6 sequences of 128 images (500 × 292 pixels) at a maximum frame rate of 25 Mfps. The segmented mode of the camera was revised to allow for subdivision of the 128 image sensors into arbitrary segments (1-128) with an inter-segment time of 17 μs. Furthermore, a region of interest can be selected to increase the number of recordings within a single run of the camera from 6 up to 125. By extending the imaging system with a laser-induced fluorescence setup, time-resolved ultra-high-speed fluorescence imaging of microscopic objects has been enabled. Minor updates to the system are also reported here.
NASA Astrophysics Data System (ADS)
Ma, Yuan-Zhuo; Li, Hong-Shuang; Yao, Wei-Xing
2018-05-01
The evaluation of the probabilistic constraints in reliability-based design optimization (RBDO) problems has always been significant and challenging work, which strongly affects the performance of RBDO methods. This article deals with RBDO problems using a recently developed generalized subset simulation (GSS) method and a posterior approximation approach. The posterior approximation approach is used to transform all the probabilistic constraints into ordinary constraints as in deterministic optimization. The assessment of multiple failure probabilities required by the posterior approximation approach is achieved by GSS in a single run at all supporting points, which are selected by a proper experimental design scheme combining Sobol' sequences and Bucher's design. Sequentially, the transformed deterministic design optimization problem can be solved by optimization algorithms, for example, the sequential quadratic programming method. Three optimization problems are used to demonstrate the efficiency and accuracy of the proposed method.
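The "multiple failure probabilities in a single run" idea can be illustrated with a toy Monte Carlo sketch (plain Monte Carlo with common random numbers stands in for the generalized subset simulation of the article; the limit state and design points are invented for illustration).

```python
import numpy as np

# One batch of random samples evaluates the failure probability at
# every supporting design point simultaneously.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=100_000)          # random load, standard normal
supporting_points = np.array([1.0, 2.0, 3.0])   # candidate design capacities
# Limit state g(d) = d - x; failure when g < 0, i.e. when x > d.
p_fail = (x[None, :] > supporting_points[:, None]).mean(axis=1)
print(p_fail)   # ~ [0.16, 0.023, 0.0013], the standard-normal tail values
```

Fitting a smooth approximation through such point estimates is what turns the probabilistic constraints into ordinary deterministic ones for the optimizer.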
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
Run-Reversal Equilibrium for Clinical Trial Randomization
Grant, William C.
2015-01-01
In this paper, we describe a new restricted randomization method called run-reversal equilibrium (RRE), which is a Nash equilibrium of a game where (1) the clinical trial statistician chooses a sequence of medical treatments, and (2) clinical investigators make treatment predictions. RRE randomization counteracts how each investigator could observe treatment histories in order to forecast upcoming treatments. Computation of a run-reversal equilibrium reflects how the treatment history at a particular site is imperfectly correlated with the treatment imbalance for the overall trial. An attractive feature of RRE randomization is that treatment imbalance follows a random walk at each site, while treatment balance is tightly constrained and regularly restored for the overall trial. Less predictable and therefore more scientifically valid experiments can be facilitated by run-reversal equilibrium for multi-site clinical trials. PMID:26079608
A DNA sequence analysis package for the IBM personal computer.
Lagrimini, L M; Brentano, S T; Donelson, J E
1984-01-01
We present here a collection of DNA sequence analysis programs, called "PC Sequence" (PCS), which are designed to run on the IBM Personal Computer (PC). These programs are written in IBM PC compiled BASIC and take full advantage of the IBM PC's speed, error handling, and graphics capabilities. For a modest initial expense in hardware any laboratory can use these programs to quickly perform computer analysis on DNA sequences. They are written with the novice user in mind and require very little training or previous experience with computers. Also provided are a text editing program for creating and modifying DNA sequence files and a communications program which enables the PC to communicate with and collect information from mainframe computers and DNA sequence databases. PMID:6546433
USDA-ARS's Scientific Manuscript database
Current technologies with next generation sequencing have revolutionized metagenomics analysis of clinical samples. To achieve the non-selective amplification and recovery of low abundance genetic sequences, a simplified Sequence-Independent, Single-Primer Amplification (SISPA) technique in combinat...
Separation and parallel sequencing of the genomes and transcriptomes of single cells using G&T-seq.
Macaulay, Iain C; Teng, Mabel J; Haerty, Wilfried; Kumar, Parveen; Ponting, Chris P; Voet, Thierry
2016-11-01
Parallel sequencing of a single cell's genome and transcriptome provides a powerful tool for dissecting genetic variation and its relationship with gene expression. Here we present a detailed protocol for G&T-seq, a method for separation and parallel sequencing of genomic DNA and full-length polyA(+) mRNA from single cells. We provide step-by-step instructions for the isolation and lysis of single cells; the physical separation of polyA(+) mRNA from genomic DNA using a modified oligo-dT bead capture and the respective whole-transcriptome and whole-genome amplifications; and library preparation and sequence analyses of these amplification products. The method allows the detection of thousands of transcripts in parallel with the genetic variants captured by the DNA-seq data from the same single cell. G&T-seq differs from other currently available methods for parallel DNA and RNA sequencing from single cells, as it involves physical separation of the DNA and RNA and does not require bespoke microfluidics platforms. The process can be implemented manually or through automation. When performed manually, paired genome and transcriptome sequencing libraries from eight single cells can be produced in ∼3 d by researchers experienced in molecular laboratory work. For users with experience in the programming and operation of liquid-handling robots, paired DNA and RNA libraries from 96 single cells can be produced in the same time frame. Sequence analysis and integration of single-cell G&T-seq DNA and RNA data requires a high level of bioinformatics expertise and familiarity with a wide range of informatics tools.
MIMS supports complex computational studies that use multiple interrelated models / programs, such as the modules within TRIM. MIMS is used by TRIM to run various models in sequence, while sharing input and output files.
Hybrid error correction and de novo assembly of single-molecule sequencing reads
Koren, Sergey; Schatz, Michael C.; Walenz, Brian P.; Martin, Jeffrey; Howard, Jason; Ganapathy, Ganeshkumar; Wang, Zhong; Rasko, David A.; McCombie, W. Richard; Jarvis, Erich D.; Phillippy, Adam M.
2012-01-01
Emerging single-molecule sequencing instruments can generate multi-kilobase sequences with the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of single-molecule reads is challenging, and has limited their use to resequencing bacteria. To address this limitation, we introduce a novel correction algorithm and assembly strategy that utilize shorter, high-identity sequences to correct errors in single-molecule sequences. We demonstrate the utility of this approach on PacBio RS reads of phage, prokaryotic, and eukaryotic whole genomes, including the novel genome of the parrot Melopsittacus undulatus, as well as for RNA-seq reads of the corn (Zea mays) transcriptome. Our approach achieves over 99.9% read correction accuracy and produces substantially better assemblies than current sequencing strategies: in the best example, quintupling the median contig size relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly. PMID:22750884
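The core voting idea behind such hybrid correction can be sketched as follows: pile the short reads up on the long read and take a majority vote per position. This toy assumes short reads arrive pre-aligned as (offset, sequence) pairs; the published algorithm additionally handles indels, alignment quality, and assembly.

```python
from collections import Counter

# Toy hybrid error correction: replace each long-read base by the majority
# vote of the short-read bases covering it, keeping the original base where
# coverage is too thin to trust a vote.
def correct(long_read, aligned_short_reads, min_cov=2):
    piles = [Counter() for _ in long_read]
    for offset, sr in aligned_short_reads:
        for k, base in enumerate(sr):
            if 0 <= offset + k < len(long_read):
                piles[offset + k][base] += 1
    out = []
    for base, pile in zip(long_read, piles):
        if sum(pile.values()) >= min_cov:
            out.append(pile.most_common(1)[0][0])  # majority base
        else:
            out.append(base)  # insufficient coverage: keep original
    return ''.join(out)
```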
High throughput profile-profile based fold recognition for the entire human proteome.
McGuffin, Liam J; Smith, Richard T; Bryson, Kevin; Sørensen, Søren-Aksel; Jones, David T
2006-06-07
In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power. In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.
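The meta-scheduling idea can be illustrated with a minimal least-loaded dispatcher. The cluster names and the dispatch rule here are invented for illustration; JYDE's actual policies, which sit above schedulers such as SGE or Condor, are more sophisticated.

```python
# Toy meta-scheduler sketch: each incoming job is dispatched to whichever
# cluster (Grid domain) currently has the fewest queued jobs, modelling
# load distribution across independent clusters.
def dispatch(jobs, clusters):
    queues = {c: [] for c in clusters}
    for job in jobs:
        target = min(queues, key=lambda c: len(queues[c]))
        queues[target].append(job)
    return queues
```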
Differential correlation for sequencing data.
Siska, Charlotte; Kechris, Katerina
2017-01-19
Several methods have been developed to identify differential correlation (DC) between pairs of molecular features from -omics studies. Most DC methods have only been tested with microarrays and other platforms producing continuous and Gaussian-like data. Sequencing data is in the form of counts, often modeled with a negative binomial distribution making it difficult to apply standard correlation metrics. We have developed an R package for identifying DC called Discordant which uses mixture models for correlations between features and the Expectation Maximization (EM) algorithm for fitting parameters of the mixture model. Several correlation metrics for sequencing data are provided and tested using simulations. Other extensions in the Discordant package include additional modeling for different types of differential correlation, and faster implementation, using a subsampling routine to reduce run-time and address the assumption of independence between molecular feature pairs. With simulations and breast cancer miRNA-Seq and RNA-Seq data, we find that Spearman's correlation has the best performance among the tested correlation methods for identifying differential correlation. Application of Spearman's correlation in the Discordant method demonstrated the most power in ROC curves and sensitivity/specificity plots, and improved ability to identify experimentally validated breast cancer miRNA. We also considered including additional types of differential correlation, which showed a slight reduction in power due to the additional parameters that need to be estimated, but more versatility in applications. Finally, subsampling within the EM algorithm considerably decreased run-time with negligible effect on performance. A new method and R package called Discordant is presented for identifying differential correlation with sequencing data. 
Based on comparisons with different correlation metrics, this study suggests Spearman's correlation is appropriate for sequencing data, but other correlation metrics are available to the user depending on the application and data type. The Discordant method can also be extended to investigate additional DC types and subsampling with the EM algorithm is now available for reduced run-time. These extensions to the R package make Discordant more robust and versatile for multiple -omics studies.
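For readers unfamiliar with the statistic, Spearman's correlation is simply Pearson's correlation applied to ranks, which is why it tolerates the skewed, count-valued distributions of sequencing data. A dependency-free sketch of the bare statistic (not the Discordant implementation):

```python
# Spearman correlation for count data: rank each vector (ties averaged),
# then compute Pearson correlation on the ranks.
def _ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```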
iDoComp: a compression scheme for assembled genomes
Ochoa, Idoia; Hernaez, Mikel; Weissman, Tsachy
2015-01-01
Motivation: With the release of the latest next-generation sequencing (NGS) machine, the HiSeq X by Illumina, the cost of sequencing a human has dropped to a mere $4000. Thus we are approaching a milestone in sequencing history, known as the $1000 genome era, where the sequencing of individuals is affordable, opening the doors to effective personalized medicine. Massive generation of genomic data, including assembled genomes, is expected in the following years. There is a crucial need for genome compression methods that are guaranteed to perform well simultaneously on different species, from simple bacteria to humans, which will ease their transmission, dissemination and analysis. Further, most of the new genomes to be compressed will correspond to individuals of a species for which a reference already exists in the database. Thus, it is natural to propose compression schemes that assume and exploit the availability of such references. Results: We propose iDoComp, a compressor of assembled genomes presented in FASTA format that compresses an individual genome using a reference genome for both the compression and the decompression. In terms of compression efficiency, iDoComp outperforms previously proposed algorithms in most of the studied cases, with comparable or better running time. For example, we observe compression gains of up to 60% in several cases, including H. sapiens data, when comparing with the best compression performance among the previously proposed algorithms. Availability: iDoComp is written in C and can be downloaded from: http://www.stanford.edu/~iochoa/iDoComp.html (We also provide a full explanation on how to run the program and an example with all the necessary files to run it.). Contact: iochoa@stanford.edu Supplementary information: Supplementary Data are available at Bioinformatics online. PMID:25344501
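The idea of reference-based compression can be illustrated with a toy encoder that represents a target genome as copy operations against a reference plus literal substrings. This sketch uses Python's difflib for the match-finding and is far simpler than iDoComp's actual parsing; it only shows why a close reference makes the encoding small.

```python
import difflib

# Toy reference-based encoder: the target is expressed as ('M', start, length)
# copies from the reference plus ('L', text) literals for the differences.
def encode(reference, target):
    ops = []
    sm = difflib.SequenceMatcher(a=reference, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == 'equal':
            ops.append(('M', i1, i2 - i1))     # copy from reference
        else:
            ops.append(('L', target[j1:j2]))   # literal (may be empty)
    return ops

def decode(reference, ops):
    out = []
    for op in ops:
        if op[0] == 'M':
            _, start, length = op
            out.append(reference[start:start + length])
        else:
            out.append(op[1])
    return ''.join(out)
```

For an individual genome close to its species reference, almost everything becomes a handful of long 'M' copies, which is what makes this representation (suitably entropy-coded) compact.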
Garinet, Simon; Néou, Mario; de La Villéon, Bruno; Faillot, Simon; Sakat, Julien; Da Fonseca, Juliana P; Jouinot, Anne; Le Tourneau, Christophe; Kamal, Maud; Luscap-Rondof, Windy; Boeva, Valentina; Gaujoux, Sebastien; Vidaud, Michel; Pasmant, Eric; Letourneur, Franck; Bertherat, Jérôme; Assié, Guillaume
2017-09-01
Pangenomic studies identified distinct molecular classes for many cancers, with major clinical applications. However, routine use requires cost-effective assays. We assessed whether targeted next-generation sequencing (NGS) could call chromosomal alterations and DNA methylation status. A training set of 77 tumors and a validation set of 449 (43 tumor types) were analyzed by targeted NGS and single-nucleotide polymorphism (SNP) arrays. Thirty-two tumors were analyzed by NGS after bisulfite conversion, and compared to methylation array or methylation-specific multiplex ligation-dependent probe amplification. Considering allelic ratios, correlation was strong between targeted NGS and SNP arrays (r = 0.88). In contrast, considering DNA copy number, for variations of one DNA copy, correlation was weaker between read counts and SNP array (r = 0.49). Thus, we generated TARGOMICs, optimized for detecting chromosome alterations by combining allelic ratios and read counts generated by targeted NGS. Sensitivity for calling normal, lost, and gained chromosomes was 89%, 72%, and 31%, respectively. Specificity was 81%, 93%, and 98%, respectively. These results were confirmed in the validation set. Finally, TARGOMICs could efficiently align and compute proportions of methylated cytosines from bisulfite-converted DNA from targeted NGS. In conclusion, beyond calling mutations, targeted NGS efficiently calls chromosome alterations and methylation status in tumors. A single run and minor design/protocol adaptations are sufficient. Optimizing targeted NGS should expand translation of genomics to clinical routine. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
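The two signals being combined can be caricatured in a few lines: a read-count (coverage) ratio against a diploid baseline, and the deviation of SNP allelic ratios from 0.5. The thresholds below are invented for illustration and are not the paper's calibrated TARGOMICs values.

```python
# Toy caricature of calling a per-chromosome state from the two signals
# targeted-NGS data provides. Thresholds are illustrative only.
def call_chromosome(coverage_ratio, mean_baf_shift):
    # coverage_ratio: chromosome coverage / genome-wide diploid coverage
    # mean_baf_shift: mean |B-allele fraction - 0.5| at heterozygous SNPs
    if coverage_ratio < 0.75 and mean_baf_shift > 0.15:
        return "loss"
    if coverage_ratio > 1.25 and mean_baf_shift > 0.08:
        return "gain"
    if mean_baf_shift < 0.08:
        return "normal"
    return "ambiguous"
```

Requiring both signals to agree is the point: read counts alone were weakly correlated with SNP-array copy number (r = 0.49), so allelic ratios supply the missing evidence.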
GRIL: genome rearrangement and inversion locator.
Darling, Aaron E; Mau, Bob; Blattner, Frederick R; Perna, Nicole T
2004-01-01
GRIL is a tool to automatically identify collinear regions in a set of bacterial-size genome sequences. GRIL uses three basic steps. First, regions of high sequence identity are located. Second, some of these regions are filtered based on user-specified criteria. Finally, the remaining regions of sequence identity are used to define significant collinear regions among the sequences. By locating collinear regions of sequence, GRIL provides a basis for multiple genome alignment using current alignment systems. GRIL also provides a basis for using current inversion distance tools to infer phylogeny. GRIL is implemented in C++ and runs on any x86-based Linux or Windows platform. It is available from http://asap.ahabs.wisc.edu/gril
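GRIL's three steps can be mimicked on toy strings: collect exact k-mer matches as anchors, filter to anchors unique in both sequences, and measure the longest collinear chain (increasing in both genomes). Real input would be DNA and the method itself is more elaborate; this only illustrates the anchor-and-chain logic.

```python
from bisect import bisect_left

# Toy collinearity finder: (1) exact k-mer matches as anchors, (2) filter
# to k-mers unique in both sequences, (3) longest chain of anchors with
# increasing coordinates in both sequences (via LIS on the b-coordinates).
def collinear_chain_length(a, b, k=8):
    def kmer_positions(s):
        pos = {}
        for i in range(len(s) - k + 1):
            pos.setdefault(s[i:i + k], []).append(i)
        return pos
    pa, pb = kmer_positions(a), kmer_positions(b)
    anchors = sorted((pa[m][0], pb[m][0]) for m in pa
                     if m in pb and len(pa[m]) == 1 and len(pb[m]) == 1)
    tails = []  # patience-style longest increasing subsequence
    for _, y in anchors:
        j = bisect_left(tails, y)
        if j == len(tails):
            tails.append(y)
        else:
            tails[j] = y
    return len(tails)
```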
NASA Astrophysics Data System (ADS)
Streets, Aaron M.; Cao, Chen; Zhang, Xiannian; Huang, Yanyi
2016-03-01
Phenotype classification of single cells reveals biological variation that is masked in ensemble measurement. This heterogeneity is found in gene and protein expression as well as in cell morphology. Many techniques are available to probe phenotypic heterogeneity at the single cell level, for example quantitative imaging and single-cell RNA sequencing, but it is difficult to perform multiple assays on the same single cell. In order to directly track correlation between morphology and gene expression at the single cell level, we developed a microfluidic platform for quantitative coherent Raman imaging and immediate RNA sequencing (RNA-Seq) of single cells. With this device we actively sort and trap cells for analysis with stimulated Raman scattering microscopy (SRS). The cells are then processed in parallel pipelines for lysis, and preparation of cDNA for high-throughput transcriptome sequencing. SRS microscopy offers three-dimensional imaging with chemical specificity for quantitative analysis of protein and lipid distribution in single cells. Meanwhile, the microfluidic platform facilitates single-cell manipulation, minimizes contamination, and furthermore, provides improved RNA-Seq detection sensitivity and measurement precision, which is necessary for differentiating biological variability from technical noise. By combining coherent Raman microscopy with RNA sequencing, we can better understand the relationship between cellular morphology and gene expression at the single-cell level.
The lepton+jets Selection and Determination of the Lepton Fake Rate with the Full RunIIb Data Set
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meister, Daniel
2013-01-01
This thesis presents the combined single top and $t\bar{t}$ lepton+jets selection for the full RunIIb dataset of the DØ detector. The selection uses the newest software versions, including all standard central object identifications and corrections, and has various additions and improvements compared to the previous 7.3 fb$^{-1}$ $t\bar{t}$ selection and the previous single top selection, in order to accommodate even more different analyses. The lepton fake rate $\epsilon_{\rm QCD}$ and the real lepton efficiency $\epsilon_{\rm sig}$ are estimated using the matrix method, and different variations are considered in order to determine the systematic errors. The calculation has to be done for each run period and every set of analysis cuts separately. In addition, the values for the exclusive jet bins and for the new single top analysis cuts have been derived, and the thesis shows numerous control plots to demonstrate the excellent agreement between data and Monte Carlo.
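The matrix method mentioned here reduces, in its simplest one-bin form, to solving two linear equations for the real and fake lepton yields: N_loose = N_real + N_fake and N_tight = ε_sig·N_real + ε_QCD·N_fake. The numbers in the check below are illustrative, not values from the thesis.

```python
# One-bin matrix method: invert the 2x2 system relating loose/tight event
# counts to the (unknown) real and fake lepton yields, given the measured
# real-lepton efficiency eps_sig and fake rate eps_qcd.
def matrix_method(n_loose, n_tight, eps_sig, eps_qcd):
    n_fake = (eps_sig * n_loose - n_tight) / (eps_sig - eps_qcd)
    n_real = n_loose - n_fake
    return n_real, n_fake  # estimated yields in the loose sample
```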
Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S
2018-01-01
Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank.
IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection.
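The semantic selection step can be caricatured as keyword filtering of sequence headers: keep entries that mention viral terms, drop bacteriophages and ordinary cellular entries. The actual SEM-I/SEM-R criteria are far more elaborate; the keyword lists below are illustrative only.

```python
# Toy caricature of semantic selection on GenBank-style FASTA headers.
# Keyword lists are illustrative, not the published SEM-R criteria.
VIRAL = ("virus", "viral", "retrotransposon", "endogenous retrovirus")
EXCLUDE = ("phage", "bacteriophage")

def is_viral(header):
    h = header.lower()
    if any(word in h for word in EXCLUDE):
        return False  # bacterial viruses are excluded from RVDB
    return any(word in h for word in VIRAL)
```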
QTL Mapping and CRISPR/Cas9 Editing to Identify a Drug Resistance Gene in Toxoplasma gondii
Shen, Bang; Powell, Robin H.; Behnke, Michael S.
2017-01-01
Scientific knowledge is intrinsically linked to available technologies and methods. This article will present two methods that allowed for the identification and verification of a drug resistance gene in the Apicomplexan parasite Toxoplasma gondii, the method of Quantitative Trait Locus (QTL) mapping using a Whole Genome Sequence (WGS)-based genetic map and the method of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas9-based gene editing. The approach of QTL mapping allows one to test if there is a correlation between a genomic region(s) and a phenotype. Two datasets are required to run a QTL scan, a genetic map based on the progeny of a recombinant cross and a quantifiable phenotype assessed in each of the progeny of that cross. These datasets are then formatted to be compatible with R/qtl software that generates a QTL scan to identify significant loci correlated with the phenotype. Although this can greatly narrow the search window of possible candidates, QTLs span regions containing a number of genes from which the causal gene needs to be identified. Having WGS of the progeny was critical to identify the causal drug resistance mutation at the gene level. Once identified, the candidate mutation can be verified by genetic manipulation of drug-sensitive parasites. The most facile and efficient method to genetically modify T. gondii is the CRISPR/Cas9 system. This system comprises just two components, both encoded on a single plasmid: a single guide RNA (gRNA) containing a 20 bp sequence complementary to the genomic target, and the Cas9 endonuclease that generates a double-strand DNA break (DSB) at the target, repair of which allows for insertion or deletion of sequences around the break site. This article provides detailed protocols to use CRISPR/Cas9-based genome editing tools to verify the gene responsible for sinefungin resistance and to construct transgenic parasites. PMID:28671645
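The gRNA design constraint described here (a 20 bp protospacer immediately 5' of an NGG PAM for SpCas9) can be expressed as a simple forward-strand scan. Real design tools also scan the reverse complement and score off-targets; this only shows the positional rule.

```python
# Minimal SpCas9 target scan: report every 20 bp protospacer that sits
# immediately 5' of an NGG PAM on the forward strand.
def find_gRNA_targets(seq, spacer_len=20):
    hits = []
    for i in range(len(seq) - spacer_len - 2):
        pam = seq[i + spacer_len:i + spacer_len + 3]
        if pam[1:] == "GG":  # N-G-G
            hits.append((seq[i:i + spacer_len], pam, i))
    return hits
```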
Rideout, Jai Ram; He, Yan; Navas-Molina, Jose A; Walters, William A; Ursell, Luke K; Gibbons, Sean M; Chase, John; McDonald, Daniel; Gonzalez, Antonio; Robbins-Pianka, Adam; Clemente, Jose C; Gilbert, Jack A; Huse, Susan M; Zhou, Hong-Wei; Knight, Rob; Caporaso, J Gregory
2014-01-01
We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to "classic" open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, "classic" open-reference OTU clustering is often faster). We illustrate that here by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than 1/5 the time than would be required of "classic" open reference OTU picking. We show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by "classic" open-reference OTU picking through comparisons on three well-studied datasets. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME's uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms. 
Finally, we present a comparison of parameter settings in QIIME's OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.
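The open-reference strategy can be sketched with a toy greedy clusterer: each read first tries to join a reference OTU; reads that fail are clustered de novo against seeds created from earlier failures. This uses exact fractional identity on equal-length toy reads; uclust's heuristics (and the subsampled variant's parallel closed-reference pre-pass) are far more sophisticated.

```python
# Toy open-reference OTU picking on equal-length reads.
def identity(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

def pick_otus(reads, refs, threshold=0.97):
    otus = {}          # OTU representative sequence -> member reads
    denovo_seeds = []  # seeds created from reads that missed the reference
    for read in reads:
        hit = next((r for r in refs if identity(read, r) >= threshold), None)
        if hit is None:  # closed-reference step failed: try de novo seeds
            hit = next((s for s in denovo_seeds
                        if identity(read, s) >= threshold), None)
            if hit is None:
                denovo_seeds.append(read)  # read becomes a new OTU seed
                hit = read
        otus.setdefault(hit, []).append(read)
    return otus
```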
Pongor, Lőrinc S; Vera, Roberto; Ligeti, Balázs
2014-01-01
Next generation sequencing (NGS) of metagenomic samples is becoming a standard approach to detect individual species or pathogenic strains of microorganisms. Computer programs used in the NGS community must balance speed against sensitivity, and as a result, species- or strain-level identification is often inaccurate and low-abundance pathogens can sometimes be missed. We have developed Taxoner, an open source taxon assignment pipeline that includes a fast aligner (e.g. Bowtie2) and a comprehensive DNA sequence database. We tested the program on simulated datasets as well as experimental data from Illumina, IonTorrent, and Roche 454 sequencing platforms. We found that Taxoner performs as well as, and often better than, BLAST, but requires two orders of magnitude less running time, meaning that it can be run on desktop or laptop computers. Taxoner is slower than the approaches that use small marker databases but is more sensitive due to the comprehensive reference database. In addition, it can be easily tuned to specific applications using small tailored databases. When applied to metagenomic datasets, Taxoner can provide a functional summary of the genes mapped and can provide strain-level identification. Taxoner is written in C for Linux operating systems. The code and documentation are available for research applications at http://code.google.com/p/taxoner.
2013-01-01
Background: Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Results: Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li’s D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li’s D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. 
FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low FST. Conclusions: This initial survey of genetic variation within honey bee RNA viruses suggests future directions for studies examining the underlying causes of population-genetic structure in these economically important pathogens. PMID:23497218
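The diversity statistics above can be made concrete. A hedged sketch of per-site expected heterozygosity and a simple Wright-style FST = (HT − HS)/HT computed from allele counts; the study's exact estimators may differ, and the counts below are invented:

```python
# Per-site diversity and a simple two-population FST from allele counts.
# Illustrative only; real estimators apply sample-size corrections.

def heterozygosity(counts):
    """Expected heterozygosity 1 - sum(p_i^2) from allele counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def fst(pop1_counts, pop2_counts):
    """Per-site Wright-style FST, weighting the two populations equally."""
    hs = 0.5 * (heterozygosity(pop1_counts) + heterozygosity(pop2_counts))
    total = [a + b for a, b in zip(pop1_counts, pop2_counts)]
    ht = heterozygosity(total)
    return 0.0 if ht == 0 else (ht - hs) / ht

# site fixed for different alleles in the two populations -> FST = 1
print(fst([10, 0], [0, 10]))           # 1.0
# identical allele frequencies in both populations -> FST = 0
print(fst([5, 5], [5, 5]))             # 0.0
```

Sites with "very low FST" in the IAPV data are ones where within-population diversity is as large as the pooled diversity, the opposite of the fixed-difference case above.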
Cornman, Robert Scott; Boncristiani, Humberto; Dainat, Benjamin; Chen, Yanping; vanEngelsdorp, Dennis; Weaver, Daniel; Evans, Jay D
2013-03-07
NASA Technical Reports Server (NTRS)
Olsson, W. J.; Martin, R. L.
1982-01-01
Flight loads on the 747 propulsion system and the resulting JT9D blade-to-outer-airseal running clearances were measured during representative acceptance-flight and revenue-flight sequences. The resulting rub-induced clearance changes and engine performance changes were then analyzed to validate and refine the JT9D-7A short-term performance deterioration model.
Observations on the Growth of Roughness Elements Into Icing Feathers
NASA Technical Reports Server (NTRS)
Vargas, Mario; Tsao, Jen-Ching
2007-01-01
This work presents the results of an experiment conducted in the Icing Research Tunnel at NASA Glenn Research Center to understand the process by which icing feathers are formed in the initial stages of ice accretion formation on swept wings. Close-up photographic data were taken on an aluminum NACA 0012 swept wing tip airfoil. Two types of photographic data were obtained: time-sequence close-up photographs during the run and close-up photographs of the ice accretion at the end of each run. Icing runs were conducted for short ice accretion times from 10 to 180 sec. The time-sequence close-up photographic data were used to study the process frame by frame and to create movies of how the process developed. The movies confirmed that, at glaze icing conditions, icing feathers in the attachment-line area develop from roughness elements. The close-up photographic data at the end of each run showed that roughness elements change into a pointed shape with an upstream facet and join laterally with other elements undergoing the same change, forming ridges with a pointed shape and an upstream facet. The ridges develop into feathers when the upstream facet grows away to form the stem of the feather. The ridges and their growth into feathers were observed to form the initial scallop tips present in complete scallops.
DOT National Transportation Integrated Search
1995-09-05
The Run-Off-Road Collision Avoidance Using IVHS Countermeasures program is to address the single vehicle crash problem through application of technology to prevent and/or reduce the severity of these crashes. : This report documents the RORSIM comput...
DOT National Transportation Integrated Search
1995-08-01
Intelligent Vehicle Initiative or IVI: The Run-Off-Road Collision Avoidance Using IVHS Countermeasures program is to address the single vehicle crash problem through application of technology to prevent and/or reduce the severity of these crashes. :...
Run-Off-Road Collision Avoidance Countermeasures Using IVHS Countermeasures: Task 3, Volume 1
DOT National Transportation Integrated Search
1995-08-23
The Run-Off-Road Collision Avoidance Using IVHS Countermeasures program is to address the single vehicle crash problem through application of technology to prevent and/or reduce the severity of these crashes. This report describes the findings of the...
Run-Off-Road Collision Avoidance Countermeasures Using IVHS Countermeasures Task 3 - Volume 2
DOT National Transportation Integrated Search
1995-08-23
The Run-Off-Road Collision Avoidance Using IVHS Countermeasures program is to address the single vehicle crash problem through application of technology to prevent and/or reduce the severity of these crashes. : This report describes the findings of t...
NASA Technical Reports Server (NTRS)
Becker, Jeffrey C.
1995-01-01
The Thinking Machines CM-5 platform was designed to run single program, multiple data (SPMD) applications, i.e., to run a single binary across all nodes of a partition, with each node possibly operating on different data. Certain classes of applications, such as multi-disciplinary computational fluid dynamics codes, are facilitated by the ability to have subsets of the partition nodes running different binaries. In order to extend the CM-5 system software to permit such applications, a multi-program loader was developed. This system is based on the dld loader which was originally developed for workstations. This paper provides a high level description of dld, and describes how it was ported to the CM-5 to provide support for multi-binary applications. Finally, it elaborates how the loader has been used to implement the CM-5 version of MPIRUN, a portable facility for running multi-disciplinary/multi-zonal MPI (Message-Passing Interface Standard) codes.
Development of a polyprobe to detect six viroids of pome and stone fruit trees.
Lin, Liming; Li, Ruhui; Mock, Ray; Kinard, Gary
2011-01-01
A simple and sensitive dot blot hybridization assay using a digoxigenin-labeled cRNA polyprobe was developed for the simultaneous detection of six viroids that infect pome and stone fruit trees. The polyprobe was constructed by sequentially cloning partial sequences of each viroid into a single vector, with run-off transcription driven by the T7 promoter. All six viroids were detectable within a dilution range of 5^-3 to 5^-4 in total nucleic acid extracts from infected trees. Individual trees were co-inoculated to create mixed infections, and all four pome fruit viroids and both stone fruit viroids could be detected in pear and peach trees, respectively, using the polyprobe. The results of the assays using the polyprobe were comparable to those using single probes. The methods were validated by testing geographically diverse isolates of viroids, as well as field samples from several collections in the US. The assay offers a rapid, reliable and cost-effective approach to the simultaneous detection of six fruit tree viroids and has the potential for routine use in quarantine, certification, and plant genebank programs where many samples are tested and distributed worldwide. Published by Elsevier B.V.
Single-Cell Sequencing for Precise Cancer Research: Progress and Prospects.
Zhang, Xiaoyan; Marjani, Sadie L; Hu, Zhaoyang; Weissman, Sherman M; Pan, Xinghua; Wu, Shixiu
2016-03-15
Advances in genomic technology have enabled the faithful detection and measurement of mutations and the gene expression profile of cancer cells at the single-cell level. Recently, several single-cell sequencing methods have been developed that permit the comprehensive and precise analysis of the cancer-cell genome, transcriptome, and epigenome. The use of these methods to analyze cancer cells has led to a series of unanticipated discoveries, such as the high heterogeneity and stochastic changes in cancer-cell populations, new driver mutations and complicated clonal evolution mechanisms, and novel biomarkers of tumor variants. These methods and the knowledge gained from their utilization could potentially improve the early detection and monitoring of rare cancer cells, such as circulating tumor cells and disseminated tumor cells, and promote the development of personalized and highly precise cancer therapy. Here, we discuss the current methods for single cancer-cell sequencing, with a strong focus on those practically used or potentially valuable in cancer research, including single-cell isolation, whole genome and transcriptome amplification, epigenome profiling, multi-dimensional sequencing, and next-generation sequencing and analysis. We also examine the current applications, challenges, and prospects of single cancer-cell sequencing. ©2016 American Association for Cancer Research.
Classifying proteins into functional groups based on all-versus-all BLAST of 10 million proteins.
Kolker, Natali; Higdon, Roger; Broomall, William; Stanberry, Larissa; Welch, Dean; Lu, Wei; Haynes, Winston; Barga, Roger; Kolker, Eugene
2011-01-01
To address the monumental challenge of assigning function to millions of sequenced proteins, we completed a first-of-its-kind all-versus-all sequence alignment using BLAST for 9.9 million proteins in the UniRef100 database. Microsoft Windows Azure produced over 3 billion filtered records in 6 days using 475 eight-core virtual machines. Protein classification into functional groups was then performed using Hive and custom jars implemented on top of Apache Hadoop utilizing the MapReduce paradigm. First, using the Clusters of Orthologous Genes (COG) database, a length-normalized bit score (LNBS) was determined to be the best similarity measure for classification of proteins. LNBS achieved sensitivity and specificity of 98% each. Second, out of 5.1 million bacterial proteins, about two-thirds were assigned to significantly extended COG groups, encompassing 30 times more assigned proteins. Third, the remaining proteins were classified into protein functional groups using an innovative implementation of a single-linkage algorithm on an in-house Hadoop compute cluster. This implementation significantly reduces the run time for nonindexed queries and optimizes efficient clustering on a large scale. The performance was also verified on Amazon Elastic MapReduce. This clustering assigned nearly 2 million proteins to approximately half a million different functional groups. A similar approach was applied to classify 2.8 million eukaryotic sequences, resulting in over 1 million proteins being assigned to existing KOG groups and the remainder clustered into 100,000 functional groups.
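The two core steps above can be sketched on toy data: a length-normalized bit score (here simply bit score divided by alignment length; the paper's exact normalization may differ) and single-linkage clustering with a union-find structure. The protein IDs and scores are invented:

```python
# Toy single-linkage clustering of BLAST-like hits via union-find.
# Proteins whose LNBS exceeds a threshold are linked into one group.

def lnbs(bit_score, aln_length):
    """Length-normalized bit score (one plausible normalization)."""
    return bit_score / aln_length

class UnionFind:
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# BLAST-like hits: (query, subject, bit_score, alignment_length)
hits = [
    ("p1", "p2", 250.0, 100),   # LNBS 2.5 -> linked
    ("p2", "p3", 180.0, 90),    # LNBS 2.0 -> linked
    ("p3", "p4", 30.0, 100),    # LNBS 0.3 -> too weak
]
THRESHOLD = 1.0

uf = UnionFind()
for q, s, score, length in hits:
    uf.find(q); uf.find(s)      # register both proteins
    if lnbs(score, length) >= THRESHOLD:
        uf.union(q, s)

groups = {}
for p in uf.parent:
    groups.setdefault(uf.find(p), set()).add(p)
print(sorted(map(sorted, groups.values())))  # [['p1', 'p2', 'p3'], ['p4']]
```

Single linkage is what makes this MapReduce-friendly: edges can be filtered independently in the map phase and merged into components afterwards.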
Evaluation of sequencing approaches for high-throughput ...
Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. We present the evaluation of three toxicogenomics platforms for potential application to high-throughput screening: 1. TempO-Seq, utilizing custom-designed paired probes per gene; 2. Targeted sequencing (TSQ), utilizing Illumina’s TruSeq RNA Access Library Prep Kit containing tiled exon-specific probe sets; 3. Low coverage whole transcriptome sequencing (LSQ), using Illumina’s TruSeq Stranded mRNA Kit. Each platform was required to cover the ~20,000 genes of the full transcriptome, operate directly with cell lysates, and be automatable with 384-well plates. Technical reproducibility was assessed using MAQC control RNA samples A and B, while functional utility for chemical screening was evaluated using six treatments at a single concentration after 6 hr in MCF7 breast cancer cells: 10 µM chlorpromazine, 10 µM ciclopirox, 10 µM genistein, 100 nM sirolimus, 1 µM tanespimycin, and 1 µM trichostatin A. All RNA samples and chemical treatments were run with 5 technical replicates. The three platforms achieved different read depths, with TempO-Seq having ~34M mapped reads per sample, while TSQ and LSQ averaged 20M and 11M aligned reads per sample, respectively. Inter-replicate correlation averaged ≥0.95 for raw log2 expression values i
Evolutionary distance from human homologs reflects allergenicity of animal food proteins.
Jenkins, John A; Breiteneder, Heimo; Mills, E N Clare
2007-12-01
In silico analysis of allergens can identify putative relationships among protein sequence, structure, and allergenic properties. Such systematic analysis reveals that most plant food allergens belong to a restricted number of protein superfamilies, with pollen allergens behaving similarly. We have investigated the structural relationships of animal food allergens and their evolutionary relatedness to human homologs to define how closely a protein must resemble a human counterpart to lose its allergenic potential. Profile-based sequence homology methods were used to classify animal food allergens into Pfam families, and in silico analyses of their evolutionary and structural relationships were performed. Animal food allergens could be classified into 3 main families--tropomyosins, EF-hand proteins, and caseins--along with 14 minor families each composed of 1 to 3 allergens. The evolutionary relationships of each of these allergen superfamilies showed that in general, proteins with a sequence identity to a human homolog above approximately 62% were rarely allergenic. Single substitutions in otherwise highly conserved regions containing IgE epitopes in EF-hand parvalbumins may modulate allergenicity. These data support the premise that certain protein structures are more allergenic than others. Contrasting with plant food allergens, animal allergens, such as the highly conserved tropomyosins, challenge the capability of the human immune system to discriminate between foreign and self-proteins. Such immune responses run close to becoming autoimmune responses. Exploiting the closeness between animal allergens and their human homologs in the development of recombinant allergens for immunotherapy will need to consider the potential for developing unanticipated autoimmune responses.
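The ~62% identity threshold above can be illustrated with a toy percent-identity calculation over a gapless pairwise alignment. The sequences and the helper names are invented; real comparisons would use proper alignments of full-length proteins:

```python
# Percent identity between aligned sequences, and a heuristic allergenicity
# flag based on the study's ~62% human-homolog threshold. Illustrative only;
# this is not a clinical predictor.

def percent_identity(a, b):
    """Percent identity between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    aligned = sum(1 for x, y in zip(a, b) if x != "-" and y != "-")
    return 100.0 * matches / aligned

def retains_allergenic_potential(animal_seq, human_seq, cutoff=62.0):
    """Proteins below the identity cutoff are flagged as more likely to
    retain allergenic potential (heuristic from the study)."""
    return percent_identity(animal_seq, human_seq) < cutoff

animal = "MKTAYIAKQR"
human  = "MKTAYIVKQR"   # 9 of 10 aligned residues identical
print(percent_identity(animal, human))           # 90.0
print(retains_allergenic_potential(animal, human))  # False
```

At 90% identity to the human homolog, the toy protein falls well above the ~62% line, consistent with the paper's observation that such proteins are rarely allergenic.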
NASA Astrophysics Data System (ADS)
Lecoeur, Jérémy; Ferré, Jean-Christophe; Collins, D. Louis; Morrisey, Sean P.; Barillot, Christian
2009-02-01
A new segmentation framework is presented that takes advantage of the multimodal image signature of the different brain tissues (healthy and/or pathological). This is achieved by merging three different modalities of gray-level MRI sequences into a single RGB-like MRI, hence creating a unique 3-dimensional signature for each tissue by utilising the complementary information of each MRI sequence. Using the scale-space spectral gradient operator, we can obtain a spatial gradient robust to intensity inhomogeneity. Even though it is based on psycho-visual color theory, it can be very efficiently applied to RGB colored images. Moreover, it is not influenced by the channel assignment of each MRI. Its optimisation by the graph cuts paradigm provides a powerful and accurate tool to segment either healthy or pathological tissues in a short time (average time about ninety seconds for a brain-tissue classification). As it is a semi-automatic method, we ran experiments to quantify the number of seeds needed to perform a correct segmentation (Dice similarity score above 0.85). Depending on the different sets of MRI sequences used, this number of seeds (expressed as a percentage of the number of voxels of the ground truth) is between 6 and 16%. We tested this algorithm on BrainWeb for validation purposes (healthy tissue classification and MS lesion segmentation) and also on clinical data for tumour and MS lesion detection and tissue classification.
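The Dice similarity score used above to judge segmentation quality has a direct implementation: twice the overlap of two binary masks divided by the sum of their sizes. A toy 1-D example stands in for the 3-D MRI masks:

```python
# Dice similarity coefficient between a ground-truth mask and a segmentation.
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two boolean masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: define as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

truth = [0, 1, 1, 1, 1, 0, 0, 0]
seg   = [0, 1, 1, 1, 0, 0, 1, 0]
print(dice(truth, seg))          # 2*3 / (4+4) = 0.75
print(dice(truth, seg) > 0.85)   # below the paper's quality bar -> False
```

A score of 1.0 means perfect overlap; the paper's 0.85 criterion is a common bar for "correct" tissue segmentation.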
Indexed variation graphs for efficient and accurate resistome profiling.
Rowe, Will P M; Winn, Martyn D
2018-05-14
Antimicrobial resistance remains a major threat to global health. Profiling the collective antimicrobial resistance genes within a metagenome (the "resistome") facilitates greater understanding of antimicrobial resistance gene diversity and dynamics. In turn, this can allow for gene surveillance, individualised treatment of bacterial infections and more sustainable use of antimicrobials. However, resistome profiling can be complicated by high similarity between reference genes, as well as the sheer volume of sequencing data and the complexity of analysis workflows. We have developed an efficient and accurate method for resistome profiling that addresses these complications and improves upon currently available tools. Our method combines a variation graph representation of gene sets with an LSH Forest indexing scheme to allow for fast classification of metagenomic sequence reads using similarity-search queries. Subsequent hierarchical local alignment of classified reads against graph traversals enables accurate reconstruction of full-length gene sequences using a scoring scheme. We provide our implementation, GROOT, and show it to be both faster and more accurate than a current reference-dependent tool for resistome profiling. GROOT runs on a laptop and can process a typical 2 gigabyte metagenome in 2 minutes using a single CPU. Our method is not restricted to resistome profiling and has the potential to improve current metagenomic workflows. GROOT is written in Go and is available at https://github.com/will-rowe/groot (MIT license). will.rowe@stfc.ac.uk. Supplementary data are available at Bioinformatics online.
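GROOT's fast classification rests on MinHash sketching of read k-mers queried against an index. A from-scratch MinHash sketch is shown below (salted md5 stands in for hash permutations, for determinism); GROOT's own implementation, parameters, and LSH Forest indexing differ:

```python
# MinHash estimation of k-mer Jaccard similarity between sequences.
# Illustrative sketch only; not GROOT's implementation.
import hashlib

def kmers(seq, k=8):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def minhash_signature(seq, num_perm=64, k=8):
    """One minimum hash value per 'permutation' (salted md5)."""
    sig = []
    for salt in range(num_perm):
        sig.append(min(
            int(hashlib.md5(f"{salt}:{km}".encode()).hexdigest(), 16)
            for km in kmers(seq, k)
        ))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching minima estimates the k-mer Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

read = "ACGTGACCTGATTACAGGCT" * 3
same = minhash_signature(read)
print(estimated_jaccard(same, minhash_signature(read)))    # 1.0
other = minhash_signature("GTGTGTGTGTGTGTGTGTGT" * 3)      # disjoint k-mers
print(estimated_jaccard(same, other) < 0.1)                # True
```

Because signatures are tiny and comparable in constant time, reads can be screened against thousands of indexed gene-graph sketches far faster than by alignment, which is what makes the 2-minute laptop runtime plausible.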
Effect of Footwear on Dynamic Stability during Single-leg Jump Landings.
Bowser, Bradley J; Rose, William C; McGrath, Robert; Salerno, Jilian; Wallace, Joshua; Davis, Irene S
2017-06-01
Barefoot and minimal-footwear running has led to greater interest in the biomechanical effects of different types of footwear. The effect of running footwear on dynamic stability is not well understood. The purpose of this study was to compare dynamic stability and impact loading across 3 footwear conditions: barefoot, minimal footwear, and standard running shoes. 25 injury-free runners (21 male, 4 female) completed 5 single-leg jump landings in each footwear condition. Dynamic stability was assessed using the dynamic postural stability index and its directional components (mediolateral, anteroposterior, vertical). Peak vertical ground reaction force and vertical loadrates were also compared across footwear conditions. Dynamic stability was dependent on footwear type for all stability indices (ANOVA, p<0.05). Post-hoc tests showed dynamic stability was greater when barefoot than in running shoes for each stability index (p<0.02) and greater than minimal footwear for the anteroposterior stability index (p<0.01). Peak vertical force and average loadrates were both dependent on footwear (p≤0.05). Dynamic stability, peak vertical force, and average loadrates during single-leg jump landings appear to be affected by footwear type. The results suggest greater dynamic stability and lower impact loading when landing barefoot or in minimal footwear. © Georg Thieme Verlag KG Stuttgart · New York.
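The dynamic postural stability index is commonly computed as the root-mean-square deviation of the three ground reaction force components from their stable reference (0, 0, body weight), normalized to body weight; this study's exact formulation may differ, and the force samples below are invented:

```python
# Hedged sketch of a dynamic postural stability index (DPSI) computation
# from force-plate samples after landing. Data are invented.
import math

def dpsi(fx, fy, fz, body_weight):
    """RMS deviation of GRF components from (0, 0, body_weight), / BW."""
    n = len(fz)
    ss = (sum(f ** 2 for f in fx)
          + sum(f ** 2 for f in fy)
          + sum((body_weight - f) ** 2 for f in fz))
    return math.sqrt(ss / n) / body_weight

bw = 700.0  # body weight in newtons
fx = [50.0, 20.0, 5.0, 0.0]       # mediolateral force samples
fy = [30.0, 10.0, 5.0, 0.0]       # anteroposterior force samples
fz = [1400.0, 900.0, 750.0, 700.0]  # vertical force samples
print(round(dpsi(fx, fy, fz, bw), 3))
```

A lower index means forces settle toward quiet standing faster, which is the sense in which the barefoot landings above were "more stable".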
Zhu, Yuan O; Aw, Pauline P K; de Sessions, Paola Florez; Hong, Shuzhen; See, Lee Xian; Hong, Lewis Z; Wilm, Andreas; Li, Chen Hao; Hue, Stephane; Lim, Seng Gee; Nagarajan, Niranjan; Burkholder, William F; Hibberd, Martin
2017-10-27
Viral populations are complex, dynamic, and fast evolving. The evolution of groups of closely related viruses in a competitive environment is termed quasispecies. To fully understand the role that quasispecies play in viral evolution, characterizing the trajectories of viral genotypes in an evolving population is the key. In particular, long-range haplotype information for thousands of individual viruses is critical; yet generating this information is non-trivial. Popular deep sequencing methods generate relatively short reads that do not preserve linkage information, while third generation sequencing methods have higher error rates that make detection of low frequency mutations a bioinformatics challenge. Here we applied BAsE-Seq, an Illumina-based single-virion sequencing technology, to eight samples from four chronic hepatitis B (CHB) patients - once before antiviral treatment and once after viral rebound due to resistance. With single-virion sequencing, we obtained 248-8796 single-virion sequences per sample, which allowed us to find evidence for both hard and soft selective sweeps. We were able to reconstruct population demographic history that was independently verified by clinically collected data. We further verified four of the samples independently through PacBio SMRT and Illumina Pooled deep sequencing. Overall, we showed that single-virion sequencing yields insight into viral evolution and population dynamics in an efficient and high throughput manner. We believe that single-virion sequencing is widely applicable to the study of viral evolution in the context of drug resistance and host adaptation, allows differentiation between soft or hard selective sweeps, and may be useful in the reconstruction of intra-host viral population demographic history.
Embedding strategies for effective use of information from multiple sequence alignments.
Henikoff, S.; Henikoff, J. G.
1997-01-01
We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain. PMID:9070452
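A PSSM of the kind embedded above can be built from a toy multiple alignment as per-column log-odds scores against a background model. Uniform background and a flat pseudocount are simplifications of what real tools use, and the DNA alphabet stands in for protein residues:

```python
# Minimal position-specific scoring matrix (PSSM) from a toy alignment,
# scored as log-odds against a uniform background. Illustrative only.
import math

ALPHABET = "ACGT"

def build_pssm(alignment, pseudocount=1.0):
    """Per-column log-odds scores from equal-length aligned sequences."""
    ncols = len(alignment[0])
    background = 1.0 / len(ALPHABET)
    pssm = []
    for col in range(ncols):
        counts = {a: pseudocount for a in ALPHABET}
        for seq in alignment:
            counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({a: math.log2((counts[a] / total) / background)
                     for a in ALPHABET})
    return pssm

def score(pssm, seq):
    """Sum of per-position log-odds scores for a candidate sequence."""
    return sum(col[base] for col, base in zip(pssm, seq))

family = ["ACGT", "ACGA", "ACGT", "ACCT"]
pssm = build_pssm(family)
# the family consensus scores higher than an unrelated sequence
print(score(pssm, "ACGT") > score(pssm, "TTTT"))   # True
```

Embedding such a matrix only over conserved motif columns, while keeping plain residues elsewhere, is exactly the trade-off the abstract describes: positional information where the alignment is trustworthy, single-sequence scoring where it is not.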
Knowledge Data Base for Amorphous Metals
2007-07-26
not programmatic, updates. Over 100 custom SQL statements that maintain the domain specific data are attached to the workflow entries in a generic...for the form by populating the SQL and run generation tables. Application data may be prepared in different ways for two steps that invoke the same form...run generation mode). There is a single table of SQL commands. Each record has a user-definable ID, the SQL code, and a comment. The run generation
Copy number variants calling for single cell sequencing data by multi-constrained optimization.
Xu, Bo; Cai, Hongmin; Zhang, Changsheng; Yang, Xi; Han, Guoqiang
2016-08-01
Variations in DNA copy number carry important information on genome evolution and on the regulation of DNA replication in cancer cells. The rapid development of single-cell sequencing technology makes it possible to explore gene expression heterogeneity among single cells, thus providing important information on cancer cell evolution. Single-cell DNA/RNA sequencing data usually have low genome coverage, which requires an extra amplification step to accumulate enough material. However, such amplification introduces large bias and makes bioinformatics analysis challenging. Accurately modeling the distribution of sequencing data and effectively suppressing the influence of this bias are the keys to successful variation analysis. Recent advances demonstrate that the technical noise introduced by amplification is more likely to follow a negative binomial distribution, an overdispersed generalization of the Poisson distribution. Thus, we tackle the problem of CNV detection by formulating it as a quadratic optimization problem involving two constraints, in which the underlying signals are corrupted by Poisson-distributed noise. By imposing the constraints of sparsity and smoothness, the read depth signals reconstructed from single-cell sequencing data are anticipated to fit the CNV patterns more accurately. An efficient numerical solution based on the classical alternating direction minimization method (ADMM) is tailored to solve the proposed model. We demonstrate the advantages of the proposed method using both synthetic and empirical single-cell sequencing data. Our experimental results demonstrate that the proposed method achieves excellent performance and high promise of success with single-cell sequencing data. Crown Copyright © 2016. Published by Elsevier Ltd. All rights reserved.
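The sparsity/smoothness-constrained model above can be illustrated with a small ADMM solver for 1-D total-variation denoising (the classic fused-lasso x-update / soft-threshold / dual-update loop). The paper's actual model adds Poisson noise handling and further constraints; the read-depth profile below is simulated:

```python
# ADMM for 1-D total-variation denoising of a read-depth profile:
#   minimize 0.5*||y - x||^2 + lam*||Dx||_1,  D = first differences.
# Illustrative sketch of the optimization machinery, not the paper's model.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def tv_denoise_admm(y, lam=2.0, rho=1.0, iters=300):
    n = len(y)
    D = np.diff(np.eye(n), axis=0)        # first-difference operator
    A = np.eye(n) + rho * D.T @ D         # x-update system matrix
    z = np.zeros(n - 1)
    u = np.zeros(n - 1)
    x = y.copy()
    for _ in range(iters):
        x = np.linalg.solve(A, y + rho * D.T @ (z - u))  # x-update
        Dx = D @ x
        z = soft_threshold(Dx + u, lam / rho)            # z-update
        u += Dx - z                                      # dual update
    return x

rng = np.random.default_rng(0)
truth = np.concatenate([np.full(40, 2.0), np.full(40, 4.0), np.full(40, 2.0)])
y = truth + rng.normal(0, 0.5, truth.size)   # noisy read-depth profile
x_hat = tv_denoise_admm(y)
# the denoised profile is closer to the underlying copy-number steps
print(np.abs(x_hat - truth).mean() < np.abs(y - truth).mean())  # True
```

The smoothness term pulls each segment toward a constant level while the L1 penalty keeps the number of breakpoints small, which is how a piecewise-constant copy-number profile emerges from noisy per-bin read depth.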
Watanabe, Manabu; Kusano, Junko; Ohtaki, Shinsaku; Ishikura, Takashi; Katayama, Jin; Koguchi, Akira; Paumen, Michael; Hayashi, Yoshiharu
2014-09-01
Combining single-cell methods and next-generation sequencing should provide a powerful means to understand single-cell biology and obviate the effects of sample heterogeneity. Here we report a single-cell identification method and seamless cancer gene profiling using semiconductor-based massively parallel sequencing. A549 cells (adenocarcinomic human alveolar basal epithelial cell line) were used as a model. Single-cell capture was performed using laser capture microdissection (LCM) with an Arcturus® XT system, and a captured single cell and a bulk population of A549 cells (≈10^6 cells) were subjected to whole genome amplification (WGA). For cell identification, a multiplex PCR method (AmpliSeq™ SNP HID panel) was used to enrich 136 highly discriminatory SNPs with a genotype concordance probability of 10^31–10^35. For cancer gene profiling, we used mutation profiling that was performed in parallel using a hotspot panel for 50 cancer-related genes. Sequencing was performed using a semiconductor-based benchtop sequencer. The distribution of sequence reads for both HID and Cancer panel amplicons was consistent across these samples. For the bulk population of cells, the percentages of sequence covered at coverage of more than 100× were 99.04% for the HID panel and 98.83% for the Cancer panel, while for the single cell the percentages of sequence covered at coverage of more than 100× were 55.93% for the HID panel and 65.96% for the Cancer panel. Partial amplification failure or randomly distributed non-amplified regions across samples from single cells during the WGA procedures or random allele drop out probably caused these differences. However, comparative analyses showed that this method successfully discriminated a single A549 cancer cell from a bulk population of A549 cells. Thus, our approach provides a powerful means to overcome tumor sample heterogeneity when searching for somatic mutations.
PeakRanger: A cloud-enabled peak caller for ChIP-seq data
2011-01-01
Background: Chromatin immunoprecipitation (ChIP) coupled with massively parallel short-read sequencing (seq) is used to probe chromatin dynamics. Although there are many algorithms to call peaks from ChIP-seq datasets, most are tuned either to handle punctate sites, such as transcription factor binding sites, or broad regions, such as histone modification marks; few can do both. Other algorithms are limited in their configurability, performance on large data sets, and ability to distinguish closely-spaced peaks. Results: In this paper, we introduce PeakRanger, a peak caller software package that works equally well on punctate and broad sites, can resolve closely-spaced peaks, has excellent performance, and is easily customized. In addition, PeakRanger can be run in a parallel cloud computing environment to obtain extremely high performance on very large data sets. We present a series of benchmarks to evaluate PeakRanger against 10 other peak callers, and demonstrate the performance of PeakRanger on both real and synthetic data sets. We also present real-world usages of PeakRanger, including peak-calling in the modENCODE project. Conclusions: Compared to other peak callers tested, PeakRanger offers improved resolution in distinguishing extremely closely-spaced peaks. PeakRanger has above-average spatial accuracy in terms of identifying the precise location of binding events. PeakRanger also has excellent sensitivity and specificity in all benchmarks evaluated. In addition, PeakRanger offers significant improvements in run time when running on a single processor system, and very marked improvements when allowed to take advantage of the MapReduce parallel environment offered by a cloud computing resource. PeakRanger can be downloaded at the official site of the modENCODE project: http://www.modencode.org/software/ranger/ PMID:21554709
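Peak calling at its simplest can be sketched as flagging bins whose read count exceeds a Poisson tail threshold under a background rate, then merging adjacent flagged bins. This toy illustrates the task PeakRanger performs, not its algorithm, and the coverage values are invented:

```python
# Toy Poisson-threshold peak caller over binned read counts.
import math

def poisson_threshold(lam, alpha=1e-5):
    """Smallest count c with P(X >= c) < alpha for X ~ Poisson(lam)."""
    c, cdf, pmf = 0, 0.0, math.exp(-lam)
    while cdf + pmf <= 1.0 - alpha:   # while CDF(c) <= 1 - alpha
        cdf += pmf
        c += 1
        pmf *= lam / c                # recurrence P(c) = P(c-1)*lam/c
    return c + 1

def call_peaks(counts, lam, alpha=1e-5):
    """Return (start, end) bin ranges exceeding the threshold, merged."""
    cut = poisson_threshold(lam, alpha)
    peaks, start = [], None
    for i, c in enumerate(counts):
        if c >= cut and start is None:
            start = i
        elif c < cut and start is not None:
            peaks.append((start, i))
            start = None
    if start is not None:
        peaks.append((start, len(counts)))
    return peaks

coverage = [5, 6, 4, 5, 48, 52, 60, 49, 5, 4, 6, 5]   # one enriched region
print(call_peaks(coverage, lam=5.0))   # [(4, 8)]
```

Distinguishing punctate from broad signal, resolving adjacent peaks, and controlling the false discovery rate are exactly where real callers like PeakRanger go far beyond this threshold-and-merge baseline.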
MuffinInfo: HTML5-Based Statistics Extractor from Next-Generation Sequencing Data.
Alic, Andy S; Blanquer, Ignacio
2016-09-01
Usually, the information known a priori about a newly sequenced organism is limited. Even resequencing the same organism can generate unpredictable output. We introduce MuffinInfo, a FastQ/Fasta/SAM information extractor implemented in HTML5 capable of offering insights into next-generation sequencing (NGS) data. Our new tool can run on any software or hardware environment, from the command line or graphically, and in the browser or standalone. It presents information such as average length, base distribution, quality scores distribution, k-mer histogram, and homopolymer analysis. MuffinInfo improves upon existing extractors by adding the ability to save and then reload the results obtained after a run as a navigable file (also supporting saving pictures of the charts), by supporting custom statistics implemented by the user, and by offering user-adjustable parameters involved in the processing, all in one software package. At the moment, the extractor works with all base-space technologies such as Illumina, Roche, Ion Torrent, Pacific Biosciences, and Oxford Nanopore. Owing to HTML5, our software demonstrates the readiness of web technologies for mildly intensive tasks encountered in bioinformatics.
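The core statistics such a tool reports can be sketched in a few lines: parse FastQ records, then compute average read length, base composition, and mean per-read quality (Phred+33 encoding assumed):

```python
# Minimal FastQ statistics extractor: average length, base counts,
# and mean quality (Phred+33). Toy sketch, not MuffinInfo's code.
from collections import Counter

def parse_fastq(text):
    """Yield (sequence, quality string) for each 4-line FastQ record."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        yield lines[i + 1], lines[i + 3]

def fastq_stats(text):
    lengths, bases, quals = [], Counter(), []
    for seq, qual in parse_fastq(text):
        lengths.append(len(seq))
        bases.update(seq)
        quals.append(sum(ord(q) - 33 for q in qual) / len(qual))
    return {
        "avg_length": sum(lengths) / len(lengths),
        "base_counts": dict(bases),
        "avg_quality": sum(quals) / len(quals),
    }

sample = """@read1
ACGTACGT
+
IIIIIIII
@read2
ACGT
+
!!!!"""
stats = fastq_stats(sample)
print(stats["avg_length"])    # 6.0
print(stats["avg_quality"])   # ('I' -> Q40, '!' -> Q0) -> 20.0
```

Per-position quality distributions and k-mer histograms follow the same pattern, just aggregated over positions or substrings instead of whole reads.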
ACON: a multipurpose production controller for plasma physics codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Snell, C.
1983-01-01
ACON is a BCON controller designed to run large production codes on the CTSS Cray-1 or the LTSS 7600 computers. ACON can also be operated interactively, with input from the user's terminal. The controller can run one code or a sequence of up to ten codes during the same job. Options are available to get and save mass-storage files, to perform Historian file updating operations, to compile and load source files, and to send out print and film files. Special features include the ability to retry after mass-storage failures, backup options for saving files, startup messages for the various codes, and the ability to reserve specified amounts of computer time after successive code runs. ACON's flexibility and power make it useful for running a number of different production codes.
DOT National Transportation Integrated Search
1994-10-28
The Run-Off-Road Collision Avoidance Using IVHS Countermeasures program addresses the single-vehicle crash problem through the application of technology to prevent and/or reduce the severity of these crashes. This report describes and documents the a...
DOT National Transportation Integrated Search
1994-10-01
The Run-Off-Road Collision Avoidance Using IVHS Countermeasures program addresses the single-vehicle crash problem through the application of technology to prevent and/or reduce the severity of these crashes. This report describes and documents ...
DOT National Transportation Integrated Search
1995-06-01
The Run-Off-Road Collision Avoidance Using IVHS Countermeasures program addresses the single-vehicle crash problem through the application of technology to prevent and/or reduce the severity of these crashes. This report describes and documents ...
DOT National Transportation Integrated Search
1995-09-01
The Run-Off-Road Collision Avoidance Using IVHS Countermeasures program addresses the single-vehicle crash problem through the application of technology to prevent and/or reduce the severity of these crashes. This report documents the RORSIM com...
DOT National Transportation Integrated Search
1994-10-28
The Run-Off-Road Collision Avoidance Using IVHS Countermeasures program addresses the single-vehicle crash problem through the application of technology to prevent and/or reduce the severity of these crashes. This report contains a summary of data us...
Advances in single-cell RNA sequencing and its applications in cancer research.
Zhu, Sibo; Qing, Tao; Zheng, Yuanting; Jin, Li; Shi, Leming
2017-08-08
Unlike population-level approaches, single-cell RNA sequencing enables transcriptomic analysis of an individual cell. Through the combination of high-throughput sequencing and bioinformatic tools, single-cell RNA-seq can detect more than 10,000 transcripts in one cell to distinguish cell subsets and dynamic cellular changes. After several years' development, single-cell RNA-seq can now achieve massively parallel, full-length mRNA sequencing as well as in situ sequencing, and even has potential for multi-omic detection. One appealing area of single-cell RNA-seq is cancer research, and it is regarded as a promising way to improve prognosis and provide more precise targeted therapy by identifying druggable subclones. Indeed, progress has been made in solid tumor analysis to reveal intratumoral heterogeneity, correlations between signaling pathways, stemness, drug resistance, and tumor architecture shaping the microenvironment. Furthermore, through investigation into circulating tumor cells, many genes have been shown to promote a propensity toward stemness and the epithelial-mesenchymal transition, to enhance anchoring and adhesion, and to be involved in mechanisms of anoikis resistance and drug resistance. This review focuses on advances in single-cell RNA-seq with regard to the following aspects: 1. Methodologies of single-cell RNA-seq 2. Single-cell isolation techniques 3. Single-cell RNA-seq in solid tumor research 4. Single-cell RNA-seq in circulating tumor cell research 5. Perspectives PMID:28881849
Gene and translation initiation site prediction in metagenomic sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John
2012-01-01
Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, which can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, the ability to identify sequences that use alternate genetic codes, and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.
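Prodigal's start scoring and codon-usage models are far more sophisticated, but the candidate-enumeration step that any prokaryotic gene finder begins with can be sketched as a forward-strand ORF scan (illustrative only; the strict ATG start and min_len cutoff are assumptions, not Prodigal's rules):

```python
def find_orfs(seq, min_len=60):
    """Scan the forward strand of a DNA sequence for open reading frames:
    ATG start to the first in-frame stop codon, at least min_len nt long.
    Real gene finders score starts and codon usage; this only shows the
    candidate-enumeration step."""
    stops = {"TAA", "TAG", "TGA"}
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in stops:
                    j += 3
                if j + 3 <= len(seq) and j + 3 - i >= min_len:
                    orfs.append((i, j + 3))  # half-open nt coordinates
                    i = j  # resume scanning after this ORF
            i += 3
    return orfs
```

A complete finder would also scan the reverse complement and, for metagenomic reads, handle genes truncated by the read boundary.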
[Osteoarthritis from long-distance running?].
Hohmann, E; Wörtler, K; Imhoff, A
2005-06-01
Long-distance running has become a fashionable recreational activity. This study investigated the effects of external impact loading on bone and cartilage introduced by performing a marathon race. Seven beginners were compared to six experienced recreational long-distance runners and two professional athletes. All participants underwent magnetic resonance imaging of the hip and knee before and after a marathon run. Coronal T1-weighted and STIR sequences were used. The pre-run MRI served as a baseline investigation and monitored the training effect. All athletes demonstrated normal findings in the pre-run scan. All but one athlete in the beginner group demonstrated joint effusions after the race. The experienced and professional runners failed to demonstrate pathology in the post-run scans. Recreational and professional long-distance runners tolerate high impact forces well. Beginners demonstrate significant changes on the post-run scans. Whether those findings result from inadequate training (mileage and duration) warrants further study. We conclude that adequate endurance training results in adaptation mechanisms that allow the athlete to compensate for the stresses introduced by long-distance running and does not predispose to the onset of osteoarthritis. Significant malalignment of the lower extremity may cause increased focal loading of joint and cartilage.
Rischewski, J; Schneppenheim, R
2001-01-30
Patients with Fanconi anemia (Fanc) are at risk of developing leukemia. Mutations of the group A gene (FancA) are most common. A multitude of polymorphisms and mutations within the 43 exons of the gene are described. To examine the role of heterozygosity as a risk factor for malignancies, a partially automated screening method to identify aberrations was needed. We report on our experience with DHPLC (WAVE, Transgenomic). PCR amplification of all 43 exons from one individual was performed on one microtiter plate on a gradient thermocycler. DHPLC analysis conditions were established via melting curves, prediction software, and test runs with aberrant samples. PCR products were analyzed twice: native, and after adding a WT PCR product. Retention patterns were compared with previously identified polymorphic PCR products or mutants. We have defined the mutation screening conditions for all 43 exons of FancA using DHPLC. So far, 40 different sequence variations have been detected in more than 100 individuals. The native analysis identifies heterozygous individuals, and the second run detects homozygous aberrations. Retention patterns are specific for the underlying sequence aberration, thus reducing sequencing demand and costs. DHPLC is a valuable tool for reproducible recognition of known sequence aberrations and screening for unknown mutations in the highly polymorphic FancA gene.
Dou, Wei; Shen, Guang-Mao; Niu, Jin-Zhi; Ding, Tian-Bo; Wei, Dan-Dan; Wang, Jin-Jun
2013-01-01
Recent studies indicate that infestations of psocids pose a new risk for global food security. Among the psocid species, Liposcelis bostrychophila Badonnel has gained recognition in importance because of its parthenogenic reproduction, rapid adaptation, and increased worldwide distribution. To date, the molecular data available for L. bostrychophila is largely limited to genes identified through homology, and no transcriptome data relevant to psocid infestation is available. In this study, we generated a de novo assembly of the L. bostrychophila transcriptome using short-read sequencing technology (Illumina). In a single run, we obtained more than 51 million sequencing reads that were assembled into 60,012 unigenes (mean size = 711 bp) by Trinity. The transcriptome sequences from different developmental stages of L. bostrychophila, including egg, nymph and adult, were annotated with the non-redundant (Nr) protein database, gene ontology (GO), clusters of orthologous groups of proteins (COG), and KEGG orthology (KO). The analysis revealed three major enzyme families involved in insecticide metabolism as differentially expressed in the L. bostrychophila transcriptome. A total of 49 P450-, 31 GST- and 21 CES-specific genes representing the three enzyme families were identified. In addition, 16 transcripts were identified to contain target site sequences of resistance genes. Furthermore, we profiled gene expression patterns upon insecticide (malathion and deltamethrin) exposure using the tag-based digital gene expression (DGE) method. The L. bostrychophila transcriptome and DGE data provide gene expression data that will further our understanding of molecular mechanisms in psocids. In particular, the findings of this investigation will facilitate identification of genes involved in insecticide resistance and the design of new compounds for control of psocids. PMID:24278202
Genome Features of “Dark-Fly”, a Drosophila Line Reared Long-Term in a Dark Environment
Zhou, Jun; Sugiyama, Yuzo; Nishimura, Osamu; Aizu, Tomoyuki; Toyoda, Atsushi; Fujiyama, Asao; Agata, Kiyokazu
2012-01-01
Organisms are remarkably adapted to diverse environments by specialized metabolisms, morphology, or behaviors. To address the molecular mechanisms underlying environmental adaptation, we have utilized a Drosophila melanogaster line, termed “Dark-fly”, which has been maintained in constant dark conditions for 57 years (1400 generations). We found that Dark-fly exhibited higher fecundity in dark than in light conditions, indicating that Dark-fly possesses some traits advantageous in darkness. Using next-generation sequencing technology, we determined the whole genome sequence of Dark-fly and identified approximately 220,000 single nucleotide polymorphisms (SNPs) and 4,700 insertions or deletions (InDels) in the Dark-fly genome compared to the genome of the Oregon-R-S strain, a control strain. 1.8% of SNPs were classified as non-synonymous SNPs (nsSNPs: i.e., they alter the amino acid sequence of gene products). Among them, we detected 28 nonsense mutations (i.e., they produce a stop codon in the protein sequence) in the Dark-fly genome. These included genes encoding an olfactory receptor and a light receptor. We also searched for runs of homozygosity (ROH) as putative regions selected during the population's history, and found 21 ROH regions in the Dark-fly genome. We identified 241 genes carrying nsSNPs or InDels in the ROH regions. These include a cluster of alpha-esterase genes that are involved in detoxification processes. Furthermore, analysis of structural variants in the Dark-fly genome showed the deletion of a gene related to fatty acid metabolism. Our results revealed unique features of the Dark-fly genome and provided a list of potential candidate genes involved in environmental adaptation. PMID:22432011
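The ROH search described can be illustrated with a strict toy detector over ordered genotype calls; real tools (e.g., PLINK) use sliding windows that tolerate occasional heterozygous or missing calls, and this paper's exact criteria are not given in the abstract:

```python
def find_roh(genotypes, positions, min_snps=25):
    """Identify runs of homozygosity as maximal stretches of consecutive
    homozygous calls ('hom') covering at least min_snps variants, and
    report them as (start, end) genomic coordinates. A deliberately
    strict illustrative version: a single 'het' call breaks the run."""
    runs, start = [], None
    for i, g in enumerate(genotypes):
        if g == "hom":
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1]))
            start = None
    # close a run that extends to the end of the chromosome
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((positions[start], positions[-1]))
    return runs
```

In practice a minimum physical length (e.g., the ~0.2 Mb segments discussed in the head matter above) is imposed alongside the SNP count.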
Chen, DaYang; Zhen, HeFu; Qiu, Yong; Liu, Ping; Zeng, Peng; Xia, Jun; Shi, QianYu; Xie, Lin; Zhu, Zhu; Gao, Ya; Huang, GuoDong; Wang, Jian; Yang, HuanMing; Chen, Fang
2018-03-21
Research based on a strategy of single-cell low-coverage whole genome sequencing (SLWGS) has enabled better reproducibility and accuracy for detection of copy number variations (CNVs). The whole genome amplification (WGA) method and sequencing platform are critical factors for successful SLWGS (<0.1× coverage). In this study, we compared single-cell and multiple-cell sequencing data produced by the HiSeq2000 and Ion Proton platforms using two WGA kits, and then comprehensively evaluated the GC-bias, reproducibility, uniformity and CNV detection among the different experimental combinations. Our analysis demonstrated that, on the HiSeq2000 platform, the PicoPLEX WGA Kit resulted in higher reproducibility and lower sequencing error frequency, but more GC-bias, than the GenomePlex Single Cell WGA Kit (WGA4 kit), independent of the cell number. On the Ion Proton platform, by contrast, the WGA4 kit (for both single cells and multiple cells) had higher uniformity and less GC-bias but lower reproducibility than the PicoPLEX WGA Kit. Moreover, on these two sequencing platforms, depending on cell number, the two WGA kits differed in both the sensitivity and the specificity of CNV detection. These results can help researchers who plan to use SLWGS on single or multiple cells to select appropriate experimental conditions for their applications.
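GC-bias assessments of the kind performed here usually start by computing GC content in fixed-size bins along the reference and then comparing per-bin read depth against it; a minimal sketch of the binning step (the bin size is an arbitrary choice, not this study's pipeline):

```python
def gc_per_bin(seq, bin_size):
    """GC fraction in consecutive fixed-size bins along a sequence.
    Comparing read depth per bin against these values is the usual
    first step when assessing WGA/sequencing GC-bias; a trailing
    partial bin is dropped for simplicity."""
    out = []
    for i in range(0, len(seq) - bin_size + 1, bin_size):
        window = seq[i:i + bin_size]
        out.append((window.count("G") + window.count("C")) / bin_size)
    return out
```

A full analysis would fit or LOESS-smooth depth as a function of per-bin GC and normalize depth by that curve before calling CNVs.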
Single molecule targeted sequencing for cancer gene mutation detection.
Gao, Yan; Deng, Liwei; Yan, Qin; Gao, Yongqian; Wu, Zengding; Cai, Jinsen; Ji, Daorui; Li, Gailing; Wu, Ping; Jin, Huan; Zhao, Luyang; Liu, Song; Ge, Liangjin; Deem, Michael W; He, Jiankui
2016-05-19
With the rapid decline in the cost of sequencing, it is now affordable to examine multiple genes in a single disease-targeted clinical test using next generation sequencing. Current targeted sequencing methods require a separate targeted capture enrichment step during sample preparation before sequencing. Although fast sample preparation methods are available on the market, the library preparation process is still relatively complicated for physicians to use routinely. Here, we introduce an amplification-free Single Molecule Targeted Sequencing (SMTS) technology, which combines targeted capture and sequencing in one step. We demonstrate that this technology can detect low-frequency mutations using artificially synthesized DNA samples. SMTS has several potential advantages, including simple sample preparation: because no PCR reaction is involved, no amplification biases or errors are introduced. SMTS has the potential to be an easy and quick sequencing technology for clinical diagnosis such as cancer gene mutation detection, infectious disease detection, inherited condition screening and noninvasive prenatal diagnosis.
Is Single-Port Laparoscopy More Precise and Faster with the Robot?
Fransen, Sofie A F; van den Bos, Jacqueline; Stassen, Laurents P S; Bouvy, Nicole D
2016-11-01
Single-port laparoscopy is a step forward toward nearly scarless surgery. Concern has been raised that single-incision laparoscopic surgery (SILS) is technically more challenging because of the lack of triangulation and the clashing of instruments. Robotic single-incision laparoscopic surgery (RSILS) in a chopstick setting might overcome these problems. This study evaluated the outcome, in time and errors, of two tasks of the Fundamentals of Laparoscopic Surgery on a dry platform, in two settings: SILS versus RSILS. Nine experienced laparoscopic surgeons performed two tasks, peg transfer and a suturing task, on a standard box trainer. All participants practiced each task three times in both the SILS and the RSILS setting. The assessment scores (time and errors) were recorded. For the first task, peg transfer, RSILS was significantly better in time (124 versus 230 seconds, P = .0004) and errors (0.80 errors versus 2.60 errors, P = .024) at the first run, compared to the SILS setting. At the third and final run, RSILS still proved to be significantly better in errors (0.10 errors versus 0.80 errors, P = .025) compared to the SILS group. RSILS was faster in the third run, but not significantly so (116 versus 157 seconds, P = .08). For the second task, the suturing task, only 3 participants of the SILS group were able to perform the task within the set time frame of 600 seconds. There was no significant difference in time over the three runs between SILS and RSILS for the 3 participants who fulfilled both tasks within the 600 seconds. This study shows that robotic single-port surgery seems easier, faster, and more precise for performing basic tasks of the Fundamentals of Laparoscopic Surgery. For the more complex task of suturing, only the single-port robotic setting enabled all participants to fulfill the task within the set time frame.
CHROMA: consensus-based colouring of multiple alignments for publication.
Goodstadt, L; Ponting, C P
2001-09-01
CHROMA annotates multiple protein sequence alignments by consensus to produce formatted and coloured text suitable for incorporation into other documents for publication. The package is designed to be flexible and reliable, and has a simple-to-use graphical user interface running under Microsoft Windows. Both the executables and source code for CHROMA running under Windows and Linux (portable command-line only) are freely available at http://www.lg.ndirect.co.uk/chroma. Software enquiries should be directed to CHROMA@lg.ndirect.co.uk.
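Consensus-based annotation of the kind CHROMA performs begins with a per-column consensus call over the alignment; a simplified Python sketch (the 60% threshold and the '.' placeholder are illustrative choices, not CHROMA's actual classification rules):

```python
from collections import Counter

def column_consensus(alignment, threshold=0.6):
    """Per-column consensus of a multiple alignment: the residue
    (gaps ignored) reaching the threshold fraction of sequences,
    else '.'. Colouring schemes then map each consensus class to a
    text style."""
    n = len(alignment)
    consensus = []
    for col in zip(*alignment):
        counts = Counter(c for c in col if c != "-")
        if counts:
            residue, count = counts.most_common(1)[0]
            consensus.append(residue if count / n >= threshold else ".")
        else:
            consensus.append(".")
    return "".join(consensus)
```

CHROMA additionally groups residues into physicochemical classes (e.g., hydrophobic, charged) before testing the threshold, so a column can be conserved as a class without a single dominant residue.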
Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) onboard calibration system
NASA Technical Reports Server (NTRS)
Chrien, Thomas G.; Eastwood, Mike; Green, Robert O.; Sarture, Charles; Johnson, Howell; Chovit, Chris; Hajek, Pavel
1995-01-01
The AVIRIS instrument uses an onboard calibration system to provide auxiliary calibration data. The system consists of a tungsten halogen-cycle lamp imaged onto a fiber bundle through an eight-position filter wheel. The fiber bundle illuminates the back side of the foreoptics shutter during a pre-run and post-run calibration sequence. The filter wheel contains two neutral density filters, five spectral filters, and one blocked position. This paper reviews the general workings of the onboard calibrator system and discusses recent modifications.
Hayashi, Tetsutaro; Ozaki, Haruka; Sasagawa, Yohei; Umeda, Mana; Danno, Hiroki; Nikaido, Itoshi
2018-02-12
Total RNA sequencing has been used to reveal poly(A) and non-poly(A) RNA expression, RNA processing and enhancer activity. To date, no method for full-length total RNA sequencing of single cells has been developed despite the potential of this technology for single-cell biology. Here we describe random displacement amplification sequencing (RamDA-seq), the first full-length total RNA-sequencing method for single cells. Compared with other methods, RamDA-seq shows high sensitivity to non-poly(A) RNA and near-complete full-length transcript coverage. Using RamDA-seq with differentiation time course samples of mouse embryonic stem cells, we reveal hundreds of dynamically regulated non-poly(A) transcripts, including histone transcripts and long noncoding RNA Neat1. Moreover, RamDA-seq profiles recursive splicing in >300-kb introns. RamDA-seq also detects enhancer RNAs and their cell type-specific activity in single cells. Taken together, we demonstrate that RamDA-seq could help investigate the dynamics of gene expression, RNA-processing events and transcriptional regulation in single cells.
Multi-region and single-cell sequencing reveal variable genomic heterogeneity in rectal cancer.
Liu, Mingshan; Liu, Yang; Di, Jiabo; Su, Zhe; Yang, Hong; Jiang, Beihai; Wang, Zaozao; Zhuang, Meng; Bai, Fan; Su, Xiangqian
2017-11-23
Colorectal cancer is a heterogeneous group of malignancies with complex molecular subtypes. While colon cancer has been widely investigated, studies on rectal cancer are very limited. Here, we performed multi-region whole-exome sequencing and single-cell whole-genome sequencing to examine the genomic intratumor heterogeneity (ITH) of rectal tumors. We sequenced nine tumor regions and 88 single cells from two rectal cancer patients with tumors of the same molecular classification and characterized their mutation profiles and somatic copy number alterations (SCNAs) at the multi-region and the single-cell levels. A variable extent of genomic heterogeneity was observed between the two patients, and the degree of ITH increased when analyzed at the single-cell level. We found that major SCNAs were early events in cancer development and were inherited steadily. Single-cell sequencing revealed mutations and SCNAs that were hidden in bulk sequencing. In summary, we studied the ITH of rectal cancer at regional and single-cell resolution and demonstrated that variable heterogeneity existed in the two patients. The mutational scenarios and SCNA profiles of the two treatment-naïve patients, despite the same molecular subtype, are quite different. Our results suggest each tumor possesses its own architecture, which may result in different diagnoses, prognoses, and drug responses. Remarkable ITH exists in the two patients we have studied, providing a preliminary impression of ITH in rectal cancer.
Optimizing a Laser Process for Making Carbon Nanotubes
NASA Technical Reports Server (NTRS)
Arepalli, Sivaram; Nikolaev, Pavel; Holmes, William
2010-01-01
A systematic experimental study has been performed to determine the effects of each of the operating conditions in a double-pulse laser ablation process that is used to produce single-wall carbon nanotubes (SWCNTs). The comprehensive data compiled in this study have been analyzed to recommend conditions for optimizing the process and scaling up the process for mass production. The double-pulse laser ablation process for making SWCNTs was developed by Rice University researchers. Of all currently known nanotube-synthesizing processes (arc and chemical vapor deposition), this process yields the greatest proportion of SWCNTs in the product material. The aforementioned process conditions are important for optimizing the production of SWCNTs and scaling up production. Reports of previous research (mostly at Rice University) toward optimization of process conditions mention effects of oven temperature and briefly mention effects of flow conditions, but no systematic, comprehensive study of the effects of process conditions was done prior to the study described here. This was a parametric study, in which several production runs were carried out, changing one operating condition for each run. The study involved variation of a total of nine parameters: the sequence of the laser pulses, pulse-separation time, laser pulse energy density, buffer gas (helium or nitrogen instead of argon), oven temperature, pressure, flow speed, inner diameter of the flow tube, and flow-tube material.
Telerobot local-remote control architecture for space flight program applications
NASA Technical Reports Server (NTRS)
Zimmerman, Wayne; Backes, Paul; Steele, Robert; Long, Mark; Bon, Bruce; Beahan, John
1993-01-01
The JPL Supervisory Telerobotics (STELER) Laboratory has developed and demonstrated a unique local-remote robot control architecture which enables management of intermittent communication bus latencies and delays such as those expected for ground-remote operation of Space Station robotic systems via the Tracking and Data Relay Satellite System (TDRSS) communication platform. The current work at JPL in this area has focused on enhancing the technologies and transferring the control architecture to hardware and software environments which are more compatible with projected ground and space operational environments. At the local site, the operator updates the remote worksite model using stereo video and a model overlay/fitting algorithm which outputs the location and orientation of the object in free space. That information is relayed to the robot User Macro Interface (UMI) to enable programming of the robot control macros. This capability runs on a single Silicon Graphics Inc. machine. The operator can employ either manual teleoperation, shared control, or supervised autonomous control to manipulate the intended object. The remote site controller, called the Modular Telerobot Task Execution System (MOTES), runs in a multi-processor VME environment and performs the task sequencing, task execution, trajectory generation, closed loop force/torque control, task parameter monitoring, and reflex action. This paper describes the new STELER architecture implementation, and also documents the results of the recent autonomous docking task execution using the local site and MOTES.
Chung, Wei-Chun; Chen, Chien-Chih; Ho, Jan-Ming; Lin, Chung-Yen; Hsu, Wen-Lian; Wang, Yu-Chun; Lee, D T; Lai, Feipei; Huang, Chih-Wei; Chang, Yu-Jung
2014-01-01
Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce. We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards, i.e., Deploy, Operate, and Extend wizards. Deploy wizard is designed to aid the system administrator to deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and initiates the service automatically. Operate wizard allows the user to run a MapReduce application on the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using Extend wizard. CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. 
Interested users may collaborate to improve the source code of CloudDOE to further incorporate more MapReduce bioinformatics tools into CloudDOE and support next-generation big data open source tools, e.g., Hadoop BigTop and Spark. CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/. PMID:24897343
Cheng, Wei; Cai, Shu; Sun, Jia-yu; Xia, Chun-chao; Li, Zhen-lin; Chen, Yu-cheng; Zhong, Yao-zu
2015-05-01
To compare two sequences, single-shot true-FISP PSIR (single-shot PSIR) and segmented turbo-FLASH PSIR (segmented PSIR), for quantification of myocardial infarct size at 3.0 Tesla MRI. Thirty-eight patients with clinically confirmed myocardial infarction underwent comprehensive gadolinium-enhanced cardiac MRI on a 3.0 Tesla system (Trio, Siemens). Myocardial delayed enhancement (MDE) imaging was performed with the single-shot PSIR and segmented PSIR sequences separately, 12-20 min after gadopentetate dimeglumine injection (0.15 mmol/kg). The quality of the MDE images was analysed by experienced physicians, and the signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) of the two techniques were compared. Myocardial infarct size was quantified automatically by dedicated software (Q-mass, Medis). All subjects were scanned successfully on the 3.0 T system. No significant difference was found between the two sequences in SNR or CNR of image quality, or in total myocardial volume (P>0.05). Furthermore, there was no difference in infarct size [single-shot PSIR (30.87 ± 15.72) mL vs. segmented PSIR (29.26 ± 14.07) mL] or infarct ratio [single-shot PSIR (22.94% ± 10.94%) vs. segmented PSIR (20.75% ± 8.78%)] between the two sequences (P>0.05). However, the average acquisition time of single-shot PSIR (21.4 s) was much shorter than that of segmented PSIR (380 s). Single-shot PSIR equals segmented PSIR in detecting myocardial infarct size while requiring less acquisition time, which is valuable for clinical application and further research.
Nanoliter reactors improve multiple displacement amplification of genomes from single cells.
Marcy, Yann; Ishoey, Thomas; Lasken, Roger S; Stockwell, Timothy B; Walenz, Brian P; Halpern, Aaron L; Beeson, Karen Y; Goldberg, Susanne M D; Quake, Stephen R
2007-09-01
Since only a small fraction of environmental bacteria are amenable to laboratory culture, there is great interest in genomic sequencing directly from single cells. Sufficient DNA for sequencing can be obtained from one cell by the Multiple Displacement Amplification (MDA) method, thereby eliminating the need to develop culture methods. Here we used a microfluidic device to isolate individual Escherichia coli and amplify genomic DNA by MDA in 60-nl reactions. Our results confirm a report that reduced MDA reaction volume lowers nonspecific synthesis that can result from contaminant DNA templates and unfavourable interaction between primers. The quality of the genome amplification was assessed by qPCR and compared favourably to single-cell amplifications performed in standard 50-μl volumes. Amplification bias was greatly reduced in nanoliter volumes, thereby providing a more even representation of all sequences. Single-cell amplicons from both microliter and nanoliter volumes provided high-quality sequence data by high-throughput pyrosequencing, thereby demonstrating a straightforward route to sequencing genomes from single cells.
Applying pollen DNA metabarcoding to the study of plant–pollinator interactions1
Bell, Karen L.; Fowler, Julie; Burgess, Kevin S.; Dobbs, Emily K.; Gruenewald, David; Lawley, Brice; Morozumi, Connor; Brosi, Berry J.
2017-01-01
Premise of the study: To study pollination networks in a changing environment, we need accurate, high-throughput methods. Previous studies have shown that more highly resolved networks can be constructed by studying pollen loads taken from bees, relative to field observations. DNA metabarcoding potentially allows for faster and finer-scale taxonomic resolution of pollen compared to traditional approaches (e.g., light microscopy), but has not been applied to pollination networks. Methods: We sampled pollen from 38 bee species collected in Florida from sites differing in forest management. We isolated DNA from pollen mixtures and sequenced rbcL and ITS2 gene regions from all mixtures in a single run on the Illumina MiSeq platform. We identified species from sequence data using comprehensive rbcL and ITS2 databases. Results: We successfully built a proof-of-concept quantitative pollination network using pollen metabarcoding. Discussion: Our work underscores that pollen metabarcoding is not quantitative but that quantitative networks can be constructed based on the number of interacting individuals. Due to the frequency of contamination and false positive reads, isolation and PCR negative controls should be used in every reaction. DNA metabarcoding has advantages in efficiency and resolution over microscopic identification of pollen, and we expect that it will have broad utility for future studies of plant–pollinator interactions. PMID:28690929
Kück, Patrick; Struck, Torsten H
2014-01-01
BaCoCa (BAse COmposition CAlculator) is a user-friendly software that combines multiple statistical approaches (like RCFV and C value calculations) to identify biases in aligned sequence data which potentially mislead phylogenetic reconstructions. As a result of its speed and flexibility, the program provides the possibility to analyze hundreds of pre-defined gene partitions and taxon subsets in one single process run. BaCoCa is command-line driven and can be easily integrated into automatic process pipelines of phylogenomic studies. Moreover, given the tab-delimited output style the results can be easily used for further analyses in programs like Excel or statistical packages like R. A built-in option of BaCoCa is the generation of heat maps with hierarchical clustering of certain results using R. As input files BaCoCa can handle FASTA and relaxed PHYLIP, which are commonly used in phylogenomic pipelines. BaCoCa is implemented in Perl and works on Windows PCs, Macs and Linux operating systems. The executable source code as well as example test files and a detailed documentation of BaCoCa are freely available at http://software.zfmk.de. Copyright © 2013 Elsevier Inc. All rights reserved.
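As a concrete illustration of the kind of statistic BaCoCa computes, one common formulation of RCFV sums, over all character states, the absolute deviations of each taxon's state frequency from the mean frequency across taxa, divided by the number of taxa. A minimal Python sketch under that assumed formulation (illustrative only; BaCoCa itself is written in Perl, and the exact formula should be checked against its documentation):

```python
def rcfv(freqs):
    # freqs: one dict of character-state frequencies per taxon,
    # e.g. [{'A': 0.3, 'C': 0.2, 'G': 0.25, 'T': 0.25}, ...]
    n = len(freqs)
    states = set().union(*freqs)
    total = 0.0
    for s in states:
        mean = sum(f.get(s, 0.0) for f in freqs) / n
        # absolute deviation of each taxon from the mean, scaled by taxon count
        total += sum(abs(f.get(s, 0.0) - mean) for f in freqs) / n
    return total
```

An RCFV of 0 indicates identical composition across taxa; larger values flag compositional heterogeneity that can mislead phylogenetic reconstruction.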
Wilkinson, Samuel L.; John, Shibu; Walsh, Roddy; Novotny, Tomas; Valaskova, Iveta; Gupta, Manu; Game, Laurence; Barton, Paul J R.; Cook, Stuart A.; Ware, James S.
2013-01-01
Background Molecular genetic testing is recommended for diagnosis of inherited cardiac disease, to guide prognosis and treatment, but access is often limited by cost and availability. Recently introduced high-throughput bench-top DNA sequencing platforms have the potential to overcome these limitations. Methodology/Principal Findings We evaluated two next-generation sequencing (NGS) platforms for molecular diagnostics. The protein-coding regions of six genes associated with inherited arrhythmia syndromes were amplified from 15 human samples using parallelised multiplex PCR (Access Array, Fluidigm), and sequenced on the MiSeq (Illumina) and Ion Torrent PGM (Life Technologies). Overall, 97.9% of the target was sequenced adequately for variant calling on the MiSeq, and 96.8% on the Ion Torrent PGM. Regions missed tended to be of high GC-content, and most were problematic for both platforms. Variant calling was assessed using 107 variants detected using Sanger sequencing: within adequately sequenced regions, variant calling on both platforms was highly accurate (Sensitivity: MiSeq 100%, PGM 99.1%. Positive predictive value: MiSeq 95.9%, PGM 95.5%). At the time of the study the Ion Torrent PGM had a lower capital cost and individual runs were cheaper and faster. The MiSeq had a higher capacity (requiring fewer runs), with reduced hands-on time and simpler laboratory workflows. Both provide significant cost and time savings over conventional methods, even allowing for adjunct Sanger sequencing to validate findings and sequence exons missed by NGS. Conclusions/Significance MiSeq and Ion Torrent PGM both provide accurate variant detection as part of a PCR-based molecular diagnostic workflow, and provide alternative platforms for molecular diagnosis of inherited cardiac conditions. Though there were performance differences at this throughput, platforms differed primarily in terms of cost, scalability, protocol stability and ease of use. 
Compared with current molecular genetic diagnostic tests for inherited cardiac arrhythmias, these NGS approaches are faster, less expensive, and yet more comprehensive. PMID:23861798
Noda, Yoshifumi; Goshima, Satoshi; Kojima, Toshihisa; Kawaguchi, Shimpei; Kawada, Hiroshi; Kawai, Nobuyuki; Koyasu, Hiromi; Matsuo, Masayuki; Bae, Kyongtae T
2017-04-01
To evaluate the value of adding a single-shot balanced turbo field-echo (b-TFE) sequence to conventional magnetic resonance cholangiopancreatography (MRCP) for the detection of common bile duct (CBD) stones. One hundred thirty-seven consecutive patients with suspected CBD stones underwent MRCP including a single-shot b-TFE sequence. Twenty-five patients were confirmed to have CBD stones by endoscopic retrograde cholangiopancreatography or ultrasonography. Two radiologists reviewed two image protocols: protocol A (conventional MRCP: unenhanced T1-, T2-, and respiratory-triggered three-dimensional fat-suppressed single-shot turbo spin-echo MRCP sequences) and protocol B (protocol A plus the single-shot b-TFE sequence). The sensitivity, specificity, positive (PPV) and negative predictive value (NPV), and area under the receiver-operating-characteristic (ROC) curve (AUC) for the detection of CBD stones were compared. The sensitivity (72%) and NPV (94%) were the same between the two protocols. However, protocol B showed greater specificity (99%) and PPV (94%) than protocol A (92% and 67%, respectively) (P = 0.0078 and 0.031, respectively). The AUC was significantly greater for protocol B (0.93) than for protocol A (0.86) (P = 0.026). Adding a single-shot b-TFE sequence to conventional MRCP significantly improved the specificity and PPV for the detection of CBD stones.
Single-cell isolation by a modular single-cell pipette for RNA-sequencing.
Zhang, Kai; Gao, Min; Chong, Zechen; Li, Ying; Han, Xin; Chen, Rui; Qin, Lidong
2016-11-29
Single-cell transcriptome sequencing requires a convenient and reliable method to rapidly isolate a live cell into a specific container such as a PCR tube. Here, we report a modular single-cell pipette (mSCP) consisting of three modular components, a SCP-Tip, an air-displacement pipette (ADP), and ADP-Tips, that can be easily assembled, disassembled, and reassembled. By assembling a SCP-Tip containing a hydrodynamic trap, the mSCP can isolate single cells from suspensions of 5-10 cells per μL. The mSCP is compatible with microscopic identification of captured single cells, achieving 100% single-cell isolation efficiency. The isolated live single cells are in submicroliter volumes and well suited for single-cell PCR analysis and RNA-sequencing. The mSCP is convenient, rapid, and highly efficient, making it a powerful tool for isolating single cells for transcriptome analysis.
Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals.
Taylor, Jeremy F; Whitacre, Lynsey K; Hoff, Jesse L; Tizioto, Polyana C; Kim, JaeWoo; Decker, Jared E; Schnabel, Robert D
2016-08-17
Decreasing sequencing costs and development of new protocols for characterizing global methylation, gene expression patterns and regulatory regions have stimulated the generation of large livestock datasets. Here, we discuss experiences in the analysis of whole-genome and transcriptome sequence data. We analyzed whole-genome sequence (WGS) data from 132 individuals from five canid species (Canis familiaris, C. latrans, C. dingo, C. aureus and C. lupus) and 61 breeds, three bison (Bison bison), 64 water buffalo (Bubalus bubalis) and 297 bovines from 17 breeds. Depth of coverage of the reference genome varied among individuals from 4.9X to 64.0X. We have also analyzed RNA-seq data for 580 samples representing 159 Bos taurus and Rattus norvegicus animals and 98 tissues. By aligning reads to a reference assembly and calling variants, we assessed effects of average depth of coverage on the actual coverage and on the number of called variants. We examined the identity of unmapped reads by assembling them and querying produced contigs against the non-redundant nucleic acids database. By imputing high-density single nucleotide polymorphism data on 4010 US registered Angus animals to WGS using Run4 of the 1000 Bull Genomes Project and assessing the accuracy of imputation, we identified misassembled reference sequence regions. We estimate that a 24X depth of coverage is required to achieve 99.5% coverage of the reference assembly and identify 95% of the variants within an individual's genome. Genomes sequenced to low average coverage (e.g., <10X) may fail to cover 10% of the reference genome and identify <75% of variants. About 10% of genomic DNA or transcriptome sequence reads fail to align to the reference assembly. These reads include loci missing from the reference assembly, misassembled genes, and sequences from symbionts and commensal and pathogenic organisms. 
Assembly errors and a lack of annotation of functional elements significantly limit the utility of the current draft livestock reference assemblies. The Functional Annotation of Animal Genomes initiative seeks to annotate functional elements, while a 70X Pac-Bio assembly for cow is underway and may result in a significantly improved reference assembly.
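The depth-versus-breadth figures quoted above can be compared against the idealized Lander-Waterman (Poisson) expectation, under which the fraction of the genome covered at average depth c is 1 − e^(−c). The gap between that ideal and the study's empirical 24X / 99.5% figure reflects non-uniform coverage and mappability effects in real genomes, which the model ignores. A minimal sketch of the Poisson expectation:

```python
import math

def expected_breadth(c, min_depth=1):
    # Poisson model: probability that a site is covered at least
    # min_depth times when the average genome-wide depth is c
    p_below = sum(c ** k * math.exp(-c) / math.factorial(k)
                  for k in range(min_depth))
    return 1.0 - p_below
```

At c = 1 the model predicts about 63% breadth, and at c = 24 it predicts essentially complete coverage; real sequencing data fall short of this, as the abstract's numbers show.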
NASA Astrophysics Data System (ADS)
Chevalier, Paul; Piccardo, Marco; Anand, Sajant; Mejia, Enrique A.; Wang, Yongrui; Mansuripur, Tobias S.; Xie, Feng; Lascola, Kevin; Belyanin, Alexey; Capasso, Federico
2018-02-01
Free-running Fabry-Perot lasers normally operate in a single-mode regime until the pumping current is increased beyond the single-mode instability threshold, above which they evolve into a multimode state. As a result of this instability, the single-mode operation of these lasers is typically constrained to a few percent of their output power range, an undesired limitation in spectroscopy applications. In order to expand the span of single-mode operation, we use an optical injection seed generated by an external-cavity single-mode laser source to force the Fabry-Perot quantum cascade laser into a single-mode state in the high current range, where it would otherwise operate in a multimode regime. Utilizing this approach, we achieve single-mode emission at room temperature with a tuning range of 36 cm-1 and stable continuous-wave output power exceeding 1 W at 4.5 μm. Far-field measurements show that a single transverse mode is emitted up to the highest optical power, indicating that the beam properties of the seeded Fabry-Perot laser remain unchanged as compared to free-running operation.
Clonal architecture of secondary acute myeloid leukemia defined by single-cell sequencing.
Hughes, Andrew E O; Magrini, Vincent; Demeter, Ryan; Miller, Christopher A; Fulton, Robert; Fulton, Lucinda L; Eades, William C; Elliott, Kevin; Heath, Sharon; Westervelt, Peter; Ding, Li; Conrad, Donald F; White, Brian S; Shao, Jin; Link, Daniel C; DiPersio, John F; Mardis, Elaine R; Wilson, Richard K; Ley, Timothy J; Walter, Matthew J; Graubert, Timothy A
2014-07-01
Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions-the population frequency of individual clones, their genetic composition, and their evolutionary relationships-which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity. Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.
Chemoresistance Evolution in Triple-Negative Breast Cancer Delineated by Single-Cell Sequencing.
Kim, Charissa; Gao, Ruli; Sei, Emi; Brandt, Rachel; Hartman, Johan; Hatschek, Thomas; Crosetto, Nicola; Foukakis, Theodoros; Navin, Nicholas E
2018-05-03
Triple-negative breast cancer (TNBC) is an aggressive subtype that frequently develops resistance to chemotherapy. An unresolved question is whether resistance is caused by the selection of rare pre-existing clones or alternatively through the acquisition of new genomic aberrations. To investigate this question, we applied single-cell DNA and RNA sequencing in addition to bulk exome sequencing to profile longitudinal samples from 20 TNBC patients during neoadjuvant chemotherapy (NAC). Deep-exome sequencing identified 10 patients in which NAC led to clonal extinction and 10 patients in which clones persisted after treatment. In 8 patients, we performed a more detailed study using single-cell DNA sequencing to analyze 900 cells and single-cell RNA sequencing to analyze 6,862 cells. Our data showed that resistant genotypes were pre-existing and adaptively selected by NAC, while transcriptional profiles were acquired by reprogramming in response to chemotherapy in TNBC patients. Copyright © 2018 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Garnache, Arnaud; Myara, Mikhaël.; Laurain, A.; Bouchier, Aude; Perez, J. P.; Signoret, P.; Sagnes, I.; Romanini, D.
2017-11-01
We present a highly coherent semiconductor laser device formed by a ½-VCSEL structure and an external concave mirror in a millimetre-scale, high-finesse stable cavity. The quantum-well structure is diode-pumped by a commercial single-mode GaAs laser diode system. This free-running, low-noise, tunable single-frequency laser exhibits >50 mW output power in a low-divergence circular TEM00 beam with a spectral linewidth below 1 kHz and relative intensity noise close to the quantum limit. This approach ensures, in a compact design, homogeneous gain behaviour and a photon lifetime long enough to reach the relaxation-oscillation-free class-A regime, with a cut-off frequency around 10 MHz.
Patel, Rajesh; Tsan, Alison; Sumiyoshi, Teiko; Fu, Ling; Desai, Rupal; Schoenbrunner, Nancy; Myers, Thomas W.; Bauer, Keith; Smith, Edward; Raja, Rajiv
2014-01-01
Molecular profiling of tumor tissue to detect alterations, such as oncogenic mutations, plays a vital role in determining treatment options in oncology. Hence, there is an increasing need for a robust and high-throughput technology to detect oncogenic hotspot mutations. Although commercial assays are available to detect genetic alterations in single genes, only a limited amount of tissue is often available from patients, requiring multiplexing to allow for simultaneous detection of mutations in many genes using low DNA input. Even though next-generation sequencing (NGS) platforms provide powerful tools for this purpose, they face challenges such as high cost, large DNA input requirement, complex data analysis, and long turnaround times, limiting their use in clinical settings. We report the development of the next-generation mutation multi-analyte panel (MUT-MAP), a high-throughput microfluidic panel for detecting 120 somatic mutations across eleven genes of therapeutic interest (AKT1, BRAF, EGFR, FGFR3, FLT3, HRAS, KIT, KRAS, MET, NRAS, and PIK3CA) using allele-specific PCR (AS-PCR) and TaqMan technology. This mutation panel requires as little as 2 ng of high-quality DNA from fresh frozen or 100 ng of DNA from formalin-fixed paraffin-embedded (FFPE) tissues. The workflow, including an automated data-analysis process, has been implemented to run 88 samples per day. Validation of this platform using plasmids showed robust signal and low cross-reactivity in all of the newly added assays, and mutation calls in cell line samples were found to be consistent with the Catalogue of Somatic Mutations in Cancer (COSMIC) database, allowing for direct comparison of our platform to Sanger sequencing. In an FFPE dilution experiment, results correlated highly with NGS (SuraSeq500 panel run on the Ion Torrent platform), demonstrating assay sensitivity down to 0.45%. 
This multiplexed mutation panel is a valuable tool for high-throughput biomarker discovery in personalized medicine and cancer drug development. PMID:24658394
LETTER TO THE EDITOR: Exhaustive search for low-autocorrelation binary sequences
NASA Astrophysics Data System (ADS)
Mertens, S.
1996-09-01
Binary sequences with low autocorrelations are important in communication engineering and in statistical mechanics as ground states of the Bernasconi model. Computer searches are the main tool in the construction of such sequences. Owing to the exponential size O(2^N) of the configuration space, exhaustive searches are limited to short sequences. We discuss an exhaustive search algorithm with run-time characteristic O(1.85^N) and apply it to compile a table of exact ground states of the Bernasconi model up to N = 48. The data suggest F > 9 for the optimal merit factor in the limit N → ∞.
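For reference, the quantities involved: the Bernasconi energy of a ±1 sequence is the sum of squared aperiodic autocorrelations, E(s) = Σ_{k=1}^{N−1} C_k², and the merit factor is F = N²/(2E). A brute-force Python sketch over all 2^N sequences (feasible only for small N; the letter's branch-and-bound algorithm, not reproduced here, is what makes N = 48 reachable):

```python
from itertools import product

def energy(s):
    # Bernasconi energy: sum of squared aperiodic autocorrelations
    # C_k = sum_i s_i * s_{i+k}, for lags k = 1 .. N-1
    n = len(s)
    return sum(sum(s[i] * s[i + k] for i in range(n - k)) ** 2
               for k in range(1, n))

def merit_factor(s):
    return len(s) ** 2 / (2.0 * energy(s))

def best_merit_factor(n):
    # exhaustive search over all 2^n binary +/-1 sequences
    return max(merit_factor(s) for s in product((-1, 1), repeat=n))
```

For N = 13 the exhaustive optimum is attained by the Barker sequence, whose sidelobes all have magnitude at most 1, giving F = 169/12 ≈ 14.08.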
Correcting for batch effects in case-control microbiome studies
Gibbons, Sean M.; Duvallet, Claire
2018-01-01
High-throughput data generation platforms, like mass-spectrometry, microarrays, and second-generation sequencing are susceptible to batch effects due to run-to-run variation in reagents, equipment, protocols, or personnel. Currently, batch correction methods are not commonly applied to microbiome sequencing datasets. In this paper, we compare different batch-correction methods applied to microbiome case-control studies. We introduce a model-free normalization procedure where features (i.e. bacterial taxa) in case samples are converted to percentiles of the equivalent features in control samples within a study prior to pooling data across studies. We look at how this percentile-normalization method compares to traditional meta-analysis methods for combining independent p-values and to limma and ComBat, widely used batch-correction models developed for RNA microarray data. Overall, we show that percentile-normalization is a simple, non-parametric approach for correcting batch effects and improving sensitivity in case-control meta-analyses. PMID:29684016
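The percentile-normalization idea described above can be sketched in a few lines: within a study, each case-sample feature value is replaced by its percentile within the control samples' distribution of that feature, before pooling across studies. A minimal, hypothetical Python sketch (not the authors' implementation; function and variable names are illustrative):

```python
import bisect

def percentile_normalize(case_values, control_values):
    # replace each case value by its percentile (0-100) within the
    # control distribution for the same feature, e.g. a bacterial taxon
    ctrl = sorted(control_values)
    return [100.0 * bisect.bisect_right(ctrl, x) / len(ctrl)
            for x in case_values]
```

Because the transform is rank-based within each study, it is non-parametric and removes study-specific (batch) shifts before case-control data are pooled.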
Randjbaran, Elias; Zahari, Rizal; Jalil, Nawal Aswan Abdul; Majid, Dayang Laila Abang Abdul
2014-01-01
This study reports a facile method to investigate the effect of the stacking sequence of hybrid composite material layers on ballistic energy absorption by running ballistic tests under high-velocity impact conditions. The velocity and absorbed energy were calculated accordingly. The specimens were fabricated from Kevlar, carbon, and glass woven fabrics and resin and were experimentally investigated under impact conditions. All specimens possessed equal mass, shape, and density; nevertheless, the layers were ordered in different stacking sequences. After running the ballistic test under the same conditions, the final velocities of the cylindrical AISI 4340 steel pellet showed how much energy was absorbed by each sample. The energy absorption of each sample during ballistic impact was calculated; accordingly, suitable ballistic-impact-resistant materials could be identified by conducting the test. This work can be extended to characterise the material properties of the different layers.
Multi-classification of cell deformation based on object alignment and run length statistic.
Li, Heng; Liu, Zhiwen; An, Xing; Shi, Yonggang
2014-01-01
Cellular morphology is widely applied in digital pathology and is essential for improving our understanding of the basic physiological processes of organisms. A key challenge in such applications is developing efficient methods for measuring cell deformation. We propose an innovative indirect approach to analyzing dynamic cell morphology in image sequences. The proposed approach considers both cellular shape change and cytoplasm variation, and takes every frame of the image sequence into account. Cell deformation is measured by the minimum energy function of object alignment, which is invariant to object pose. An indirect analysis strategy then overcomes the limitation of gradual deformation by means of run-length statistics. We demonstrate the power of the proposed approach with one application: multi-classification of cell deformation. Experimental results show that the proposed method is sensitive to morphology variation and performs better than standard shape representation methods.
A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages.
Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young
2017-03-01
Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase the knowledge on DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. 'dada2' performs trimming of the high-throughput sequencing data. 'QuasR' and 'mosaics' perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, 'ChIPseeker' performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git.
USDA-ARS?s Scientific Manuscript database
Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout, SNP discovery has been done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL), RNA sequencing, and whole...
Distributed run of a one-dimensional model in a regional application using SOAP-based web services
NASA Astrophysics Data System (ADS)
Smiatek, Gerhard
This article describes the setup of a distributed computing system in Perl. It facilitates the parallel run of a one-dimensional environmental model on a number of simple networked PC hosts. The system uses Simple Object Access Protocol (SOAP)-driven web services that offer the model run on remote hosts, together with a multi-threaded environment that distributes the work and accesses the web services. Its application is demonstrated in a regional run of a process-oriented biogenic emission model for the area of Germany. Within a network consisting of up to seven web services implemented on Linux and MS-Windows hosts, a performance increase of approximately 400% was reached compared to a model run on the fastest single host.
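The distribution pattern described, a pool of threads farming independent model runs out to remote service hosts, can be sketched generically; here the SOAP transport is replaced by a placeholder function, and `run_model`, `distribute`, and the host names are hypothetical, not part of the original Perl system:

```python
from concurrent.futures import ThreadPoolExecutor

def run_model(host, cell):
    # placeholder for the remote web-service call that would run the
    # one-dimensional model for a single grid cell on the given host
    return (cell, f"computed-on-{host}")

def distribute(hosts, cells):
    # one worker thread per host; grid cells are assigned round-robin,
    # and results come back in submission order
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        futures = [pool.submit(run_model, hosts[i % len(hosts)], c)
                   for i, c in enumerate(cells)]
        return [f.result() for f in futures]
```

Because each grid cell's model run is independent, the speedup scales with the number of hosts until the slowest host or network overhead dominates, consistent with the roughly 400% gain reported for seven services.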
Genomic Insights into Geothermal Spring Community Members using a 16S Agnostic Single-Cell Approach
NASA Astrophysics Data System (ADS)
Bowers, R. M.
2016-12-01
Institutions: DOE Joint Genome Institute, Walnut Creek, CA, USA; Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA; Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada. With recent advances in DNA sequencing, rapid and affordable screening of single-cell genomes has become a reality. Single-cell sequencing is a multi-step process that takes advantage of any number of single-cell sorting techniques, whole genome amplification (WGA), and 16S rRNA gene-based PCR screening to identify the microbes of interest prior to shotgun sequencing. However, the 16S PCR-based screening step is costly and may lead to unanticipated losses of microbial diversity, as cells that do not produce a clean 16S amplicon are typically omitted from downstream shotgun sequencing. While many of the sorted cells that fail the 16S PCR step likely originate from poor-quality amplified DNA, some of the cells with good WGA kinetics may instead represent bacteria or archaea whose 16S genes fail to amplify due to primer mismatches or the presence of intervening sequences. Using cell material from Dewar Creek, a hot spring in British Columbia, we sequenced all sorted cells with good WGA kinetics irrespective of their 16S amplification success. We show that this high-throughput approach to single-cell sequencing (i) can reduce the overall cost of single-cell genome production, and (ii) may lead to the discovery of previously unknown branches on the microbial tree of life.
Recent patents of nanopore DNA sequencing technology: progress and challenges.
Zhou, Jianfeng; Xu, Bingqian
2010-11-01
DNA sequencing techniques have developed rapidly in recent decades, driven primarily by the Human Genome Project. Among the proposed new techniques, the nanopore was considered a suitable candidate for single-molecule DNA sequencing at ultrahigh speed and very low cost. Several fabrication and modification techniques have been developed to produce robust and well-defined nanopore devices. Many efforts have also been made to apply nanopores to analyzing the properties of DNA molecules. Compared with traditional sequencing techniques, nanopores have demonstrated distinctive advantages in key practical respects, such as sample preparation, sequencing speed, cost-effectiveness, and read length. Although challenges remain, recent research into improving the capabilities of nanopores has shed light on the path to the ultimate goal: sequencing an individual DNA strand at the single-nucleotide level. This patent review briefly highlights recent developments and technological achievements for DNA analysis and sequencing at the single-molecule level, focusing on nanopore-based methods.
A short review of variants calling for single-cell-sequencing data with applications.
Wei, Zhuohui; Shu, Chang; Zhang, Changsheng; Huang, Jingying; Cai, Hongmin
2017-11-01
The field of single-cell sequencing is rapidly expanding, and many techniques have been developed in the past decade. With this technology, biologists can study not only the heterogeneity between two adjacent cells in the same tissue or organ, but also the evolutionary relationships and degenerative processes within a single cell. Calling variants is the main purpose of analyzing single-cell sequencing (SCS) data. Currently, some popular methods developed for bulk-cell-sequencing data analysis are applied directly to SCS data. However, SCS requires an extra genome-amplification step to accumulate enough DNA to satisfy sequencing needs. The amplification introduces large biases and thus challenges the use of bulk-cell-sequencing methods. This paper aims to bridge that gap, providing guidance both for the development of specialized analysis methods and for the use of currently available tools on SCS data. We first introduce two popular genome amplification methods and compare their capabilities. We then introduce several popular models for calling single-nucleotide polymorphisms and copy-number variations. Finally, breakthrough applications of SCS are summarized to demonstrate its potential for studying cell evolution. Copyright © 2017 Elsevier Ltd. All rights reserved.
Exploring viral infection using single-cell sequencing.
Rato, Sylvie; Golumbeanu, Monica; Telenti, Amalio; Ciuffi, Angela
2017-07-15
Single-cell sequencing (SCS) has emerged as a valuable tool to study cellular heterogeneity in diverse fields, including virology. By studying the viral and cellular genome and/or transcriptome, the dynamics of viral infection can be investigated at the single-cell level. Most studies have explored the impact of cell-to-cell variation on the viral life cycle from the point of view of the virus, by analyzing viral sequences, and from the point of view of the cell, mainly by analyzing the cellular host transcriptome. In this review, we will focus on recent studies that use single-cell sequencing to explore viral diversity and cell variability in response to viral replication. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Single-cell sequencing in stem cell biology.
Wen, Lu; Tang, Fuchou
2016-04-15
Cell-to-cell variation and heterogeneity are fundamental and intrinsic characteristics of stem cell populations, but these differences are masked when bulk cells are used for omic analysis. Single-cell sequencing technologies serve as powerful tools to dissect cellular heterogeneity comprehensively and to identify distinct phenotypic cell types, even within a 'homogeneous' stem cell population. These technologies, including single-cell genome, epigenome, and transcriptome sequencing technologies, have been developing rapidly in recent years. The application of these methods to different types of stem cells, including pluripotent stem cells and tissue-specific stem cells, has led to exciting new findings in the stem cell field. In this review, we discuss the recent progress as well as future perspectives in the methodologies and applications of single-cell omic sequencing technologies.
Method for compression of data using single pass LZSS and run-length encoding
Berlin, G.J.
1994-01-01
A method used preferably with LZSS-based compression methods for compressing a stream of digital data. The method uses a run-length encoding scheme especially suited for data strings of identical data bytes having large run-lengths, such as data representing scanned images. The method reads an input data stream to determine the length of the data strings. Longer data strings are then encoded in one of two ways depending on the length of the string. For data strings having run-lengths less than 18 bytes, a cleared offset and the actual run-length are written to an output buffer and then a run byte is written to the output buffer. For data strings of 18 bytes or longer, a set offset and an encoded run-length are written to the output buffer and then a run byte is written to the output buffer. The encoded run-length is written in two parts obtained by dividing the run length by a factor of 255. The first of two parts of the encoded run-length is the quotient; the second part is the remainder. Data bytes that are not part of data strings of sufficient length are written directly to the output buffer.
Method for compression of data using single pass LZSS and run-length encoding
Berlin, Gary J.
1997-01-01
A method used preferably with LZSS-based compression methods for compressing a stream of digital data. The method uses a run-length encoding scheme especially suited for data strings of identical data bytes having large run-lengths, such as data representing scanned images. The method reads an input data stream to determine the length of the data strings. Longer data strings are then encoded in one of two ways depending on the length of the string. For data strings having run-lengths less than 18 bytes, a cleared offset and the actual run-length are written to an output buffer and then a run byte is written to the output buffer. For data strings of 18 bytes or longer, a set offset and an encoded run-length are written to the output buffer and then a run byte is written to the output buffer. The encoded run-length is written in two parts obtained by dividing the run length by a factor of 255. The first of two parts of the encoded run-length is the quotient; the second part is the remainder. Data bytes that are not part of data strings of sufficient length are written directly to the output buffer.
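The two-tier run-length scheme described in the patent abstract can be sketched as follows. This is a loose illustration, not the patented implementation: the flag bytes (0x00/0x01), the literal handling of very short runs, and the byte layout are assumptions; only the 18-byte cutoff and the quotient/remainder split by 255 come from the abstract.

```python
def rle_encode(data: bytes) -> bytes:
    """Sketch of the two-tier run-length scheme described in the abstract."""
    LONG_RUN = 18                    # threshold stated in the abstract
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i]:
            run += 1
        if run < 3:                  # assumed: runs too short to encode are copied directly
            out += data[i:i + run]
        elif run < LONG_RUN:
            # "cleared offset" flag, the actual run length, then the run byte
            out += bytes([0x00, run, data[i]])
        else:
            # "set offset" flag, run length encoded as (quotient, remainder)
            # of division by 255, then the run byte
            q, r = divmod(run, 255)  # sketch only: q > 255 is not handled here
            out += bytes([0x01, q, r, data[i]])
        i += run
    return bytes(out)
```

For example, a run of 300 identical bytes is emitted as the set flag, quotient 1, remainder 45, and the run byte, while isolated bytes pass through unchanged.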
Leray, Matthieu; Knowlton, Nancy
2017-01-01
DNA metabarcoding, the PCR-based profiling of natural communities, is becoming the method of choice for biodiversity monitoring because it circumvents some of the limitations inherent to traditional ecological surveys. However, potential sources of bias that can affect the reproducibility of this method remain to be quantified. The interpretation of differences in patterns of sequence abundance and the ecological relevance of rare sequences remain particularly uncertain. Here we used one artificial mock community to explore the significance of abundance patterns and disentangle the effects of two potential biases on data reproducibility: indexed PCR primers and random sampling during Illumina MiSeq sequencing. We amplified a short fragment of the mitochondrial Cytochrome c Oxidase Subunit I (COI) for a single mock sample containing equimolar amounts of total genomic DNA from 34 marine invertebrates belonging to six phyla. We used seven indexed broad-range primers and sequenced the resulting library on two consecutive Illumina MiSeq runs. The total number of Operational Taxonomic Units (OTUs) was ∼4 times higher than expected based on the composition of the mock sample. Moreover, the total number of reads for the 34 components of the mock sample differed by up to three orders of magnitude. However, 79 out of 86 of the unexpected OTUs were represented by <10 sequences that did not appear consistently across replicates. Our data suggest that random sampling of rare OTUs (e.g., small associated fauna such as parasites) accounted for most of the variation in OTU presence-absence, whereas biases associated with indexed PCRs accounted for a larger amount of variation in relative abundance patterns. These results suggest that random sampling during sequencing leads to the low reproducibility of rare OTUs. We suggest that the strategy for handling rare OTUs should depend on the objectives of the study.
Systematic removal of rare OTUs may avoid inflating diversity based on common β descriptors but will exclude positive records of taxa that are functionally important. Our results further reinforce the need for technical replicates (parallel PCR and sequencing from the same sample) in metabarcoding experimental designs. Data reproducibility should be determined empirically as it will depend upon the sequencing depth, the type of sample, the sequence analysis pipeline, and the number of replicates. Moreover, estimating relative biomasses or abundances based on read counts remains elusive at the OTU level.
2011-01-01
Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing of cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that has significant hits in similarity searches with the Arabidopsis thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch.
Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a plant model system. The genes characterized will be useful for future research not only in the species included in the present study, but also in related species for which no genomic resources are yet available. Our results demonstrate the efficiency of massively parallel transcriptome sequencing in a comparative framework as an approach for developing genomic resources in diverse groups of non-model organisms. PMID:21791039
Asia Pacific Research Initiative for Sustainable Energy Systems 2011 (APRISES11)
2017-09-29
created during a single run, highlighting rapid prototyping capabilities. NRL's overall goal was to evaluate whether 3D printed metallic bipolar plates... varying the air flow to evaluate the effect on peak power. These runs are displayed in Figure 2.1.17. The reactants were connected in co-flow with the... way valve allows the operator to either run the gas through a humidifier (PermaPure Model FCl 25-240-7) or a bypass loop. On the humidifier side of
Running of the spectrum of cosmological perturbations in string gas cosmology
NASA Astrophysics Data System (ADS)
Brandenberger, Robert; Franzmann, Guilherme; Liang, Qiuyue
2017-12-01
We compute the running of the spectrum of cosmological perturbations in string gas cosmology, making use of a smooth parametrization of the transition between the early Hagedorn phase and the later radiation phase. We find that the running has the same sign as in simple models of single scalar field inflation. Its magnitude is proportional to (1 − n_s), n_s being the slope index of the spectrum, and it is thus parametrically larger than for inflationary cosmology, where it is proportional to (1 − n_s)^2.
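In conventional notation, taking the running of the spectral index to be α_s ≡ dn_s/d ln k (an assumption about the definition, which the abstract does not spell out), the stated comparison can be written as:

```latex
\alpha_s \equiv \frac{\mathrm{d}n_s}{\mathrm{d}\ln k}
\;\propto\; (1 - n_s) \quad \text{(string gas cosmology)},
\qquad
\alpha_s \;\propto\; (1 - n_s)^2 \quad \text{(single-field slow-roll inflation)},
```

so that for n_s close to 1 the string-gas running is parametrically larger than its inflationary counterpart.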
Jurka, Jerzy W.
1997-01-01
Enhanced homologous recombination is obtained by employing a consensus sequence which has been found to be associated with integration of repeat sequences, such as Alu and ID. The consensus sequence or sequence having a single transition mutation determines one site of a double break which allows for high efficiency of integration at the site. By introducing single or double stranded DNA having the consensus sequence flanking region joined to a sequence of interest, one can reproducibly direct integration of the sequence of interest at one or a limited number of sites. In this way, specific sites can be identified and homologous recombination achieved at the site by employing a second flanking sequence associated with a sequence proximal to the 3'-nick.
Iterated function systems for DNA replication
NASA Astrophysics Data System (ADS)
Gaspard, Pierre
2017-10-01
The kinetic equations of DNA replication are shown to be exactly solvable in terms of iterated function systems, running along the template sequence and giving the statistical properties of the copy sequences, as well as the kinetic and thermodynamic properties of the replication process. With this method, different effects due to sequence heterogeneity can be studied, in particular, a transition between linear and sublinear growths in time of the copies, and a transition between continuous and fractal distributions of the local velocities of the DNA polymerase along the template. The method is applied to the human mitochondrial DNA polymerase γ without and with exonuclease proofreading.
Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.
Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil
2015-07-17
In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.
Tanabe, Akifumi S
2011-09-01
Proportional and separate models able to apply different combination of substitution rate matrix (SRM) and among-site rate variation model (ASRVM) to each locus are frequently used in phylogenetic studies of multilocus data. A proportional model assumes that branch lengths are proportional among partitions and a separate model assumes that each partition has an independent set of branch lengths. However, the selection from among nonpartitioned (i.e., a common combination of models is applied to all-loci concatenated sequences), proportional and separate models is usually based on the researcher's preference rather than on any information criteria. This study describes two programs, 'Kakusan4' (for DNA sequences) and 'Aminosan' (for amino-acid sequences), which allow the selection of evolutionary models based on several types of information criteria. The programs can handle both multilocus and single-locus data, in addition to providing an easy-to-use wizard interface and a noninteractive command line interface. In the case of multilocus data, SRMs and ASRVMs are compared at each locus and at all-loci concatenated sequences, after which nonpartitioned, proportional and separate models are compared based on information criteria. The programs also provide model configuration files for MrBayes, PAUP*, PhyML, RAxML and Treefinder to support further phylogenetic analysis using a selected model. When likelihoods are optimized by Treefinder, the best-fit models were found to differ depending on the data set. Furthermore, differences in the information criteria among nonpartitioned, proportional and separate models were much larger than those among the nonpartitioned models. These findings suggest that selecting from nonpartitioned, proportional and separate models results in a better phylogenetic tree. Kakusan4 and Aminosan are available at http://www.fifthdimension.jp/. They are licensed under the GNU GPL v2, and are able to run on Windows, Mac OS X and Linux.
© 2011 Blackwell Publishing Ltd.
Transcriptomic Analysis of the Salivary Glands of an Invasive Whitefly
Su, Yun-Lin; Li, Jun-Min; Li, Meng; Luan, Jun-Bo; Ye, Xiao-Dong; Wang, Xiao-Wei; Liu, Shu-Sheng
2012-01-01
Background Some species of the whitefly Bemisia tabaci complex cause tremendous losses to crops worldwide through feeding directly and virus transmission indirectly. The primary salivary glands of whiteflies are critical for their feeding and virus transmission. However, partly due to their tiny size, research on whitefly salivary glands is limited and our knowledge on these glands is scarce. Methodology/Principal Findings We sequenced the transcriptome of the primary salivary glands of the Mediterranean species of the B. tabaci complex using an effective cDNA amplification method in combination with short read sequencing (Illumina). In a single run, we obtained 13,615 unigenes. The number of unigenes obtained from the salivary glands of the whitefly is at least fourfold that reported for the salivary glands of other plant-sucking insects. To reveal the functions of the primary glands, sequence similarity search and comparisons with the whole transcriptome of the whitefly were performed. The results demonstrated that the genes related to metabolism and transport were significantly enriched in the primary salivary glands. Furthermore, we found that a number of highly expressed genes in the salivary glands might be involved in secretory protein processing, secretion and virus transmission. To identify potential proteins of whitefly saliva, the translated unigenes were subjected to secretory-protein prediction. Finally, 295 genes were predicted to encode secretory proteins and some of them might play important roles in whitefly feeding. Conclusions/Significance: The combined method of cDNA amplification, Illumina sequencing and de novo assembly is suitable for transcriptomic analysis of tiny organs in insects. Through analysis of the transcriptome, genomic features of the primary salivary glands were dissected and biologically important proteins, especially secreted proteins, were predicted.
Our findings provide substantial sequence information for the primary salivary glands of whiteflies and will be the basis for future studies on whitefly-plant interactions and virus transmission. PMID:22745728
As-sadi, Falah; Carrere, Sébastien; Gascuel, Quentin; Hourlier, Thibaut; Rengel, David; Le Paslier, Marie-Christine; Bordat, Amandine; Boniface, Marie-Claude; Brunel, Dominique; Gouzy, Jérôme; Godiard, Laurence; Vincourt, Patrick
2011-10-11
Downy mildew in sunflowers (Helianthus annuus L.) is caused by the oomycete Plasmopara halstedii (Farl.) Berlese et de Toni. Despite efforts by the international community to breed mildew-resistant varieties, downy mildew remains a major threat to the sunflower crop. Very few genomic, genetic and molecular resources are currently available to study this pathogen. Using a 454 sequencing method, expressed sequence tags (EST) during the interaction between H. annuus and P. halstedii have been generated and a search was performed for sites in putative effectors to show polymorphisms between the different races of P. halstedii. A 454 pyrosequencing run of two infected sunflower samples (inbred lines XRQ and PSC8 infected with race 710 of P. halstedii, which exhibit incompatible and compatible interactions, respectively) generated 113,720 and 172,107 useable reads. From these reads, 44,948 contigs and singletons have been produced. A bioinformatic portal, HP, was specifically created for in-depth analysis of these clusters. Using in silico filtering, 405 clusters were defined as being specific to oomycetes, and 172 were defined as non-specific oomycete clusters. A subset of these two categories was checked using PCR amplification, and 86% of the tested clusters were validated. Twenty putative RXLR and CRN effectors were detected using PSI-BLAST. Using corresponding sequences from four races (100, 304, 703 and 710), 22 SNPs were detected, providing new information on pathogen polymorphisms. This study identified a large number of genes that are expressed during H. annuus/P. halstedii compatible or incompatible interactions. It also reveals, for the first time, that an infection mechanism exists in P. halstedii similar to that in other oomycetes associated with the presence of putative RXLR and CRN effectors. SNPs discovered in CRN effector sequences were used to determine the genetic distances between the four races of P. halstedii. 
This work therefore provides valuable tools for further discoveries regarding the H. annuus/P. halstedii pathosystem.
2011-01-01
Background Downy mildew in sunflowers (Helianthus annuus L.) is caused by the oomycete Plasmopara halstedii (Farl.) Berlese et de Toni. Despite efforts by the international community to breed mildew-resistant varieties, downy mildew remains a major threat to the sunflower crop. Very few genomic, genetic and molecular resources are currently available to study this pathogen. Using a 454 sequencing method, expressed sequence tags (EST) during the interaction between H. annuus and P. halstedii have been generated and a search was performed for sites in putative effectors to show polymorphisms between the different races of P. halstedii. Results A 454 pyrosequencing run of two infected sunflower samples (inbred lines XRQ and PSC8 infected with race 710 of P. halstedii, which exhibit incompatible and compatible interactions, respectively) generated 113,720 and 172,107 useable reads. From these reads, 44,948 contigs and singletons have been produced. A bioinformatic portal, HP, was specifically created for in-depth analysis of these clusters. Using in silico filtering, 405 clusters were defined as being specific to oomycetes, and 172 were defined as non-specific oomycete clusters. A subset of these two categories was checked using PCR amplification, and 86% of the tested clusters were validated. Twenty putative RXLR and CRN effectors were detected using PSI-BLAST. Using corresponding sequences from four races (100, 304, 703 and 710), 22 SNPs were detected, providing new information on pathogen polymorphisms. Conclusions This study identified a large number of genes that are expressed during H. annuus/P. halstedii compatible or incompatible interactions. It also reveals, for the first time, that an infection mechanism exists in P. halstedii similar to that in other oomycetes associated with the presence of putative RXLR and CRN effectors. SNPs discovered in CRN effector sequences were used to determine the genetic distances between the four races of P. halstedii. 
This work therefore provides valuable tools for further discoveries regarding the H. annuus/P. halstedii pathosystem. PMID:21988821
OVAS: an open-source variant analysis suite with inheritance modelling.
Mozere, Monika; Tekman, Mehmet; Kari, Jameela; Bockenhauer, Detlef; Kleta, Robert; Stanescu, Horia
2018-02-08
The advent of modern high-throughput genetics continually broadens the gap between the rising volume of sequencing data and the tools required to process them. The need to pinpoint a small subset of functionally important variants has now shifted towards identifying the critical differences between normal variants and disease-causing ones. The ever-increasing reliance on cloud-based services for sequence analysis and the non-transparent methods they utilize has prompted the need for more in-situ services that can provide a safer and more accessible environment to process patient data, especially in circumstances where continuous internet usage is limited. To address these issues, we herein propose our standalone Open-source Variant Analysis Sequencing (OVAS) pipeline, consisting of three key stages of processing that pertain to the separate modes of annotation, filtering, and interpretation. Core annotation maps variants to gene isoforms at the exon/intron level, appends functional data pertaining to the type of variant mutation, and determines hetero-/homozygosity. An extensive inheritance-modelling module in conjunction with 11 other filtering components can be applied in sequence, ranging from simple quality control to multi-file penetrance-model specifics such as X-linked recessive inheritance or mosaicism. Depending on the type of interpretation required, additional annotation is performed to identify organ specificity through gene expression and protein domains. In the course of this paper we analysed an autosomal recessive case study. OVAS made effective use of the filtering modules to recapitulate the results of the study by identifying the prescribed compound-heterozygous disease pattern from exome-capture sequence input samples.
OVAS is an offline open-source modular-driven analysis environment designed to annotate and extract useful variants from Variant Call Format (VCF) files, and process them under an inheritance context through a top-down filtering schema of swappable modules, run entirely off a live bootable medium and accessed locally through a web-browser.
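The inheritance-modelling step can be illustrated with a minimal sketch of one such filter: flagging the compound-heterozygous pattern recapitulated in the case study, i.e. two or more distinct heterozygous variants in the same gene. This is illustrative Python, not OVAS code; the tuple layout and genotype strings are assumptions for the example.

```python
from collections import defaultdict

def compound_het_genes(variants):
    """Return genes carrying a candidate compound-heterozygous pattern.

    variants: iterable of (gene, variant_id, genotype) tuples, with the
    genotype written VCF-style as '0/1', '1/1', '0|1', etc.
    """
    het_by_gene = defaultdict(set)
    for gene, vid, gt in variants:
        if gt in ("0/1", "1/0", "0|1", "1|0"):   # heterozygous calls only
            het_by_gene[gene].add(vid)
    # two or more distinct het variants in one gene -> candidate gene
    return {g for g, vids in het_by_gene.items() if len(vids) >= 2}

# Hypothetical calls for illustration:
calls = [("PKHD1", "var1", "0/1"),
         ("PKHD1", "var2", "0/1"),
         ("ABCA4", "var3", "1/1")]
print(compound_het_genes(calls))   # {'PKHD1'}
```

A real pipeline would additionally check phase (the two variants must lie on different parental haplotypes), which is why OVAS combines this with multi-file, family-aware filtering.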
A Dual-Loop Opto-Electronic Oscillator
NASA Astrophysics Data System (ADS)
Yao, X. S.; Maleki, L.; Ji, Y.; Lutes, G.; Tu, M.
1998-07-01
We describe and demonstrate a multiloop technique for single-mode selection in an opto-electronic oscillator (OEO). We present experimental results of a dual-loop OEO free running at 10 GHz that has the lowest phase noise (-140 dBc/Hz at 10 kHz from the carrier) of all free-running room-temperature oscillators to date.
Ruegsegger, Gregory N; Toedebusch, Ryan G; Childs, Thomas E; Grigsby, Kolter B; Booth, Frank W
2017-01-01
Physical inactivity, which drastically increases with advancing age, is associated with numerous chronic diseases. The nucleus accumbens (the pleasure and reward 'hub' in the brain) influences wheel running behaviour in rodents. RNA-sequencing and subsequent bioinformatics analysis led us to hypothesize a potential relationship between the regulation of dendritic spine density, the molecules involved in synaptic transmission, and age-related reductions in wheel running. Upon completion of follow-up studies, we developed the working model that synaptic plasticity in the nucleus accumbens is central to age-related changes in voluntary running. Testing this hypothesis, inhibition of Cdk5 (comprising a molecule central to the processes described above) in the nucleus accumbens reduced wheel running. The results of the present study show that reductions in synaptic transmission and Cdk5 function are related to decreases in voluntary running behaviour and provide guidance for understanding the neural mechanisms that underlie age-dependent reductions in the motivation to be physically active. Increases in age are often associated with reduced levels of physical activity, which, in turn, associates with the development of numerous chronic diseases. We aimed to assess molecular differences in the nucleus accumbens (NAc) (a specific brain nucleus postulated to influence rewarding behaviour) with respect to wheel running and sedentary female Wistar rats at 8 and 14 weeks of age. RNA-sequencing was used to interrogate transcriptomic changes between 8- and 14-week-old wheel running rats, and select transcripts were later analysed by quantitative RT-PCR in age-matched sedentary rats. Voluntary wheel running was greatest at 8 weeks and had significantly decreased by 12 weeks. 
From 619 differentially expressed mRNAs, bioinformatics suggested that cAMP-mediated signalling, dopamine- and cAMP-regulated neuronal phosphoprotein of 32 kDa feedback, and synaptic plasticity were greater in 8- vs. 14-week-old rats. In-depth analysis of these networks showed significant (∼20-30%; P < 0.05) decreases in cell adhesion molecule (Cadm)4 and p39 mRNAs, as well as their proteins from 8 to 14 weeks of age in running and sedentary rats. Furthermore, Cadm4, cyclin-dependent kinase 5 (Cdk5) and p39 mRNAs were significantly correlated with voluntary running distance. Analysis of dendritic spine density in the NAc showed that wheel access increased spine density (P < 0.001), whereas spine density was lower in 14- vs. 8-week-old sedentary rats (P = 0.03). Intriguingly, intra-NAc injection of the Cdk5 inhibitor roscovitine dose-dependently decreased wheel running. Collectively, these experiments suggest that an age-dependent loss in synaptic function and Cdk5/p39 activity in the NAc may be partially responsible for age-related declines in voluntary running behaviour. © 2016 The Authors. The Journal of Physiology © 2016 The Physiological Society.
Ruegsegger, Gregory N.; Toedebusch, Ryan G.; Childs, Thomas E.; Grigsby, Kolter B.
2016-01-01
Key points Physical inactivity, which drastically increases with advancing age, is associated with numerous chronic diseases.The nucleus accumbens (the pleasure and reward ‘hub’ in the brain) influences wheel running behaviour in rodents.RNA‐sequencing and subsequent bioinformatics analysis led us to hypothesize a potential relationship between the regulation of dendritic spine density, the molecules involved in synaptic transmission, and age‐related reductions in wheel running. Upon completion of follow‐up studies, we developed the working model that synaptic plasticity in the nucleus accumbens is central to age‐related changes in voluntary running.Testing this hypothesis, inhibition of Cdk5 (comprising a molecule central to the processes described above) in the nucleus accumbens reduced wheel running.The results of the present study show that reductions in synaptic transmission and Cdk5 function are related to decreases in voluntary running behaviour and provide guidance for understanding the neural mechanisms that underlie age‐dependent reductions in the motivation to be physically active. Abstract Increases in age are often associated with reduced levels of physical activity, which, in turn, associates with the development of numerous chronic diseases. We aimed to assess molecular differences in the nucleus accumbens (NAc) (a specific brain nucleus postulated to influence rewarding behaviour) with respect to wheel running and sedentary female Wistar rats at 8 and 14 weeks of age. RNA‐sequencing was used to interrogate transcriptomic changes between 8‐ and 14‐week‐old wheel running rats, and select transcripts were later analysed by quantitative RT‐PCR in age‐matched sedentary rats. Voluntary wheel running was greatest at 8 weeks and had significantly decreased by 12 weeks. 
From 619 differentially expressed mRNAs, bioinformatics suggested that cAMP-mediated signalling, dopamine- and cAMP-regulated neuronal phosphoprotein of 32 kDa feedback, and synaptic plasticity were greater in 8- vs. 14-week-old rats. In-depth analysis of these networks showed significant (∼20-30%; P < 0.05) decreases in cell adhesion molecule (Cadm)4 and p39 mRNAs, as well as their proteins from 8 to 14 weeks of age in running and sedentary rats. Furthermore, Cadm4, cyclin-dependent kinase 5 (Cdk5) and p39 mRNAs were significantly correlated with voluntary running distance. Analysis of dendritic spine density in the NAc showed that wheel access increased spine density (P < 0.001), whereas spine density was lower in 14- vs. 8-week-old sedentary rats (P = 0.03). Intriguingly, intra-NAc injection of the Cdk5 inhibitor roscovitine dose-dependently decreased wheel running. Collectively, these experiments suggest that an age-dependent loss in synaptic function and Cdk5/p39 activity in the NAc may be partially responsible for age-related declines in voluntary running behaviour. PMID:27461471
MetaMeta: integrating metagenome analysis tools to improve taxonomic profiling.
Piro, Vitor C; Matschkowski, Marcel; Renard, Bernhard Y
2017-08-14
Many metagenome analysis tools are presently available to classify sequences and profile environmental samples. In particular, taxonomic profiling and binning methods are commonly used for such tasks. Tools available among these two categories make use of several techniques, e.g., read mapping, k-mer alignment, and composition analysis. Variations on the construction of the corresponding reference sequence databases are also common. In addition, different tools provide good results in different datasets and configurations. All this variation makes it complicated for researchers to decide which methods to use. Installation, configuration and execution can also be difficult, especially when dealing with multiple datasets and tools. We propose MetaMeta: a pipeline to execute and integrate results from metagenome analysis tools. MetaMeta provides an easy workflow to run multiple tools with multiple samples, producing a single enhanced output profile for each sample. MetaMeta includes database generation, pre-processing, execution, and integration steps, allowing easy execution and parallelization. The integration relies on the co-occurrence of organisms from different methods as the main feature to improve community profiling while accounting for differences in their databases. In a controlled case with simulated and real data, we show that the integrated profiles of MetaMeta outperform the best single profile. Using the same input data, it provides more sensitive and reliable results, with the presence of each organism being supported by several methods. MetaMeta uses Snakemake and has six pre-configured tools, all available at the BioConda channel for easy installation (conda install -c bioconda metameta). The MetaMeta pipeline is open-source and can be downloaded at: https://gitlab.com/rki_bioinformatics .
Lim, Hassol; Park, Young-Mi; Lee, Jong-Keuk; Taek Lim, Hyun
2016-10-01
To present an efficient and successful application of a single-exome sequencing study in a family clinically diagnosed with X-linked retinitis pigmentosa. Exome sequencing study based on clinical examination data. An 8-year-old proband and his family. The proband and his family members underwent comprehensive ophthalmologic examinations. Exome sequencing was undertaken in the proband using the Agilent SureSelect Human All Exon Kit and the Illumina HiSeq 2000 platform. Bioinformatic analysis used the Illumina pipeline with Burrows-Wheeler Aligner-Genome Analysis Toolkit (BWA-GATK), followed by ANNOVAR for variant functional annotation. All variants passing filter criteria were validated by Sanger sequencing to confirm familial segregation. Analysis of exome sequence data identified a novel frameshift mutation in the RP2 gene resulting in a premature stop codon (c.665delC, p.Pro222fsTer237). Sanger sequencing revealed that this mutation co-segregated with the disease phenotype in the child's family. We identified a novel causative mutation in RP2 from a single proband's exome sequence data. This study highlights the effectiveness of whole-exome sequencing over conventional sequencing methods in the genetic diagnosis of X-linked retinitis pigmentosa. Even with a single exome, exome sequencing can pinpoint pathogenic variant(s) for X-linked retinitis pigmentosa when properly applied with the aid of an adequate variant-filtering strategy. Copyright © 2016 Canadian Ophthalmological Society. Published by Elsevier Inc. All rights reserved.
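The variant-filtering strategy the abstract emphasises can be sketched as a simple post-annotation filter. The dict keys, population-frequency cutoff, and consequence vocabulary below are hypothetical placeholders; a real pipeline would read these fields from ANNOVAR-annotated records:

```python
def filter_xlinked_candidates(variants, max_pop_af=0.001):
    """Keep rare, protein-truncating variants on the X chromosome.

    Each variant is a plain dict with hypothetical keys ("id", "chrom",
    "pop_af", "consequence"); the thresholds are illustrative only.
    """
    damaging = {"frameshift", "stop_gained", "splice_donor", "splice_acceptor"}
    return [v for v in variants
            if v["chrom"] == "X"
            and v["pop_af"] <= max_pop_af
            and v["consequence"] in damaging]

# The RP2-like frameshift survives; autosomal, common, or benign calls do not.
variants = [
    {"id": "RP2:c.665delC", "chrom": "X", "pop_af": 0.0, "consequence": "frameshift"},
    {"id": "autosomal", "chrom": "1", "pop_af": 0.0, "consequence": "frameshift"},
    {"id": "common", "chrom": "X", "pop_af": 0.05, "consequence": "frameshift"},
    {"id": "benign", "chrom": "X", "pop_af": 0.0, "consequence": "synonymous"},
]
candidates = filter_xlinked_candidates(variants)
```

In the study's setting, surviving candidates would then be confirmed by Sanger sequencing and familial segregation, which the code does not model.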
Using Modules with MPICH-G2 (and "Loose Ends")
NASA Technical Reports Server (NTRS)
Chang, Johnny; Thigpen, William W. (Technical Monitor)
2002-01-01
A new approach to running complex, distributed MPI jobs using the MPICH-G2 library is described. This approach allows the user to switch between different versions of compilers, system libraries, MPI libraries, etc. via the "module" command. The key idea is a departure from the prescribed "(jobtype=mpi)" approach to running distributed MPI jobs. The new method requires the user to provide a script that will be run as the "executable" with the "(jobtype=single)" RSL attribute. The major advantage of the proposed method is to enable users to decide in their own script what modules, environment, etc. they would like to have in running their job.
Spreadsheet-based program for alignment of overlapping DNA sequences.
Anbazhagan, R; Gabrielson, E
1999-06-01
Molecular biology laboratories frequently face the challenge of aligning small overlapping DNA sequences derived from a long DNA segment. Here, we present a short program that can be used to adapt Excel spreadsheets as a tool for aligning DNA sequences, regardless of their orientation. The program runs on any Windows or Macintosh operating system computer with Excel 97 or Excel 98. The program is available for use as an Excel file, which can be downloaded from the BioTechniques Web site. Upon execution, the program opens a specially designed customized workbook and is capable of identifying overlapping regions between two sequence fragments and displaying the sequence alignment. It also performs a number of specialized functions such as recognition of restriction enzyme cutting sites and CpG island mapping without costly specialized software.
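The core task the spreadsheet program performs, finding an overlap between two fragments regardless of orientation, can be sketched in a few lines. This is a toy exact-match version under assumed minimum-overlap settings, not the macro's actual logic:

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def best_overlap(a, b, min_len=8):
    """Find the longest exact suffix(a)/prefix(b) overlap, trying b in
    both orientations. Returns (merged sequence, overlap length,
    orientation) or (None, 0, None) if no overlap >= min_len exists."""
    best = (0, b, None)
    for candidate, orient in ((b, "forward"), (revcomp(b), "revcomp")):
        # Scan from the longest possible overlap down; first hit wins.
        for k in range(min(len(a), len(candidate)), min_len - 1, -1):
            if a.endswith(candidate[:k]):
                if k > best[0]:
                    best = (k, candidate, orient)
                break
    k, seq, orient = best
    if k == 0:
        return None, 0, None
    return a + seq[k:], k, orient

a = "ATGGCCATTGTAATGGGCCGCT"
b = "ATGGGCCGCTGAGGAC"
merged, k, orient = best_overlap(a, b)
```

The same call also aligns `b` supplied in the opposite orientation, since the reverse complement is tried automatically, which mirrors the program's "regardless of their orientation" behaviour.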
Program for Editing Spacecraft Command Sequences
NASA Technical Reports Server (NTRS)
Gladden, Roy; Waggoner, Bruce; Kordon, Mark; Hashemi, Mahnaz; Hanks, David; Salcedo, Jose
2006-01-01
Sequence Translator, Editor, and Expander Resource (STEER) is a computer program that facilitates construction of sequences and blocks of sequences (hereafter denoted generally as sequence products) for commanding a spacecraft. STEER also provides mechanisms for translating among various sequence product types and quickly expanding activities of a given sequence in chronological order for review and analysis of the sequence. To date, construction of sequence products has generally been done by use of such clumsy mechanisms as text-editor programs, translating among sequence product types has been challenging, and expanding sequences to time-ordered lists has involved arduous processes of converting sequence products to "real" sequences and running them through Class-A software (defined, loosely, as flight and ground software critical to a spacecraft mission). Also, heretofore, generating sequence products in standard formats has been troublesome because precise formatting and syntax are required. STEER alleviates these issues by providing a graphical user interface containing intuitive fields in which the user can enter the necessary information. The STEER expansion function provides a "quick and dirty" means of seeing how a sequence and sequence block would expand into a chronological list, without the need to use Class-A software.
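The expansion function described above, flattening nested sequence blocks into a chronological list, can be sketched as a small recursive routine. The block/command data structures and names here are hypothetical illustrations, not STEER's actual data model:

```python
from datetime import datetime, timedelta

def expand(block, start, library):
    """Recursively expand a sequence block into a flat, time-ordered list
    of (absolute time, command). `block` is a list of (offset seconds,
    item) pairs, where item is either a command string or the name of
    another block in `library`."""
    events = []
    for offset, item in block:
        t = start + timedelta(seconds=offset)
        if item in library:
            events.extend(expand(library[item], t, library))
        else:
            events.append((t, item))
    return sorted(events, key=lambda e: e[0])

# A nested block expands in place; out-of-order entries are sorted.
library = {"WARMUP": [(0, "HEATER_ON"), (30, "HEATER_CHECK")]}
main = [(0, "WARMUP"), (60, "CAMERA_ON"), (45, "ANTENNA_SLEW")]
timeline = expand(main, datetime(2006, 1, 1), library)
```

The chronological sort is what lets a reviewer see interleaved activities from different blocks in execution order without running the sequence through flight software.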
NASA Technical Reports Server (NTRS)
1972-01-01
The IDAPS (Image Data Processing System) is a user-oriented, computer-based, language and control system, which provides a framework or standard for implementing image data processing applications, simplifies set-up of image processing runs so that the system may be used without a working knowledge of computer programming or operation, streamlines operation of the image processing facility, and allows multiple applications to be run in sequence without operator interaction. The control system loads the operators, interprets the input, constructs the necessary parameters for each application, and calls the application. The overlay feature of the IBSYS loader (IBLDR) provides the means of running multiple operators which would otherwise overflow core storage.
Ding, Jiarui; Condon, Anne; Shah, Sohrab P
2018-05-21
Single-cell RNA-sequencing has great potential to discover cell types, identify cell states, trace development lineages, and reconstruct the spatial organization of cells. However, dimension reduction to interpret structure in single-cell sequencing data remains a challenge. Existing algorithms are either not able to uncover the clustering structures in the data or lose global information such as groups of clusters that are close to each other. We present a robust statistical model, scvis, to capture and visualize the low-dimensional structures in single-cell gene expression data. Simulation results demonstrate that low-dimensional representations learned by scvis preserve both the local and global neighbor structures in the data. In addition, scvis is robust to the number of data points and learns a probabilistic parametric mapping function to add new data points to an existing embedding. We then use scvis to analyze four single-cell RNA-sequencing datasets, exemplifying interpretable two-dimensional representations of the high-dimensional single-cell RNA-sequencing data.
McCallion, Ciara; Donne, Bernard; Fleming, Neil; Blanksby, Brian
2014-05-01
This study compared stride length, stride frequency, contact time, flight time and foot-strike patterns (FSP) when running barefoot, and in minimalist and conventional running shoes. Habitually shod male athletes (n = 14; age 25 ± 6 yr; competitive running experience 8 ± 3 yr) completed a randomised order of 6 by 4-min treadmill runs at velocities (V1 and V2) equivalent to 70 and 85% of best 5-km race time, in the three conditions. Synchronous recording of 3-D joint kinematics and ground reaction force data examined spatiotemporal variables and FSP. Most participants adopted a mid-foot strike pattern, regardless of condition. Heel-toe latency was less at V2 than V1 (-6 ± 20 vs. -1 ± 13 ms, p < 0.05), which indicated a velocity-related shift towards a more FFS pattern. Stride duration and flight time, when shod and in minimalist footwear, were greater than barefoot (713 ± 48 and 701 ± 49 vs. 679 ± 56 ms, p < 0.001; and 502 ± 45 and 503 ± 41 vs. 488 ± 49 ms, p < 0.05, respectively). Contact time was significantly longer when running shod than barefoot or in minimalist footwear (211 ± 30 vs. 191 ± 29 ms and 198 ± 33 ms, p < 0.001). When running barefoot, stride frequency was significantly higher (p < 0.001) than in conventional and minimalist footwear (89 ± 7 vs. 85 ± 6 and 86 ± 6 strides·min(-1)). In conclusion, differences in spatiotemporal variables occurred within a single running session, irrespective of barefoot running experience, and without a detectable change in FSP.
Key pointsDifferences in spatiotemporal variables occurred within a single running session, without a change in foot strike pattern.Stride duration and flight time were greater when shod and in minimalist footwear than when barefoot.Stride frequency when barefoot was higher than when shod or in minimalist footwear.Contact time when shod was longer than when barefoot or in minimalist footwear.Spatiotemporal variables when running in minimalist footwear more closely resemble shod than barefoot running.
Deep sequencing reveals cell-type-specific patterns of single-cell transcriptome variation.
Dueck, Hannah; Khaladkar, Mugdha; Kim, Tae Kyung; Spaethling, Jennifer M; Francis, Chantal; Suresh, Sangita; Fisher, Stephen A; Seale, Patrick; Beck, Sheryl G; Bartfai, Tamas; Kuhn, Bernhard; Eberwine, James; Kim, Junhyong
2015-06-09
Differentiation of metazoan cells requires execution of different gene expression programs, but recent single-cell transcriptome profiling has revealed considerable variation within cells of seemingly identical phenotype. This brings into question the relationship between transcriptome states and cell phenotypes. Additionally, single-cell transcriptomics presents unique analysis challenges that need to be addressed to answer this question. We present high-quality, deep read-depth single-cell RNA sequencing for 91 cells from five mouse tissues and 18 cells from two rat tissues, along with 30 control samples of bulk RNA diluted to single-cell levels. We find that transcriptomes differ globally across tissues with regard to the number of genes expressed, the average expression patterns, and within-cell-type variation patterns. We develop methods to filter genes for reliable quantification and to calibrate biological variation. All cell types include genes with high variability in expression, in a tissue-specific manner. We also find evidence that single-cell variability of neuronal genes in mice is correlated with that in rats, consistent with the hypothesis that levels of variation may be conserved. Single-cell RNA-sequencing data provide a unique view of transcriptome function; however, careful analysis is required in order to use single-cell RNA-sequencing measurements for this purpose. Technical variation must be considered in single-cell RNA-sequencing studies of expression variation. For a subset of genes, biological variability within each cell type appears to be regulated in order to perform dynamic functions, rather than being solely molecular noise.
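The idea of calibrating biological variation against dilution controls can be sketched as a comparison of squared coefficients of variation (CV²). The gene names, counts, and `fold` threshold below are toy assumptions, not the paper's actual calibration procedure:

```python
from statistics import mean, pvariance

def high_variability_genes(cell_counts, control_counts, fold=2.0):
    """Flag genes whose CV^2 across single cells exceeds `fold` times
    the CV^2 seen in bulk-dilution controls of the same gene, so that
    purely technical variation is not mistaken for biology."""
    def cv2(values):
        m = mean(values)
        return pvariance(values) / (m * m)
    return [g for g, vals in cell_counts.items()
            if cv2(vals) > fold * cv2(control_counts[g])]

# GeneA varies no more than the controls do; GeneB varies far more.
cells = {"GeneA": [10, 10, 10, 10], "GeneB": [2, 18, 2, 18]}
controls = {"GeneA": [9, 11, 10, 10], "GeneB": [9, 11, 9, 11]}
flagged = high_variability_genes(cells, controls)
```

Matching each gene against its own control captures the point that technical noise is gene- and expression-level-dependent, so a single global threshold would mislabel many genes.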
Meher, J K; Meher, P K; Dash, G N; Raval, M K
2012-01-01
The first step in the gene identification problem based on genomic signal processing is to convert character strings into numerical sequences. These numerical sequences are then analysed spectrally, or with digital filtering techniques, for the period-3 peaks that are present in exons (coding regions) and absent in introns (non-coding regions). In this paper, we have shown that single-indicator sequences can be generated by encoding schemes based on physico-chemical properties. Two new methods are proposed for generating single-indicator sequences based on hydration energy and dipole moments. The proposed methods produce a high peak at exon locations and effectively suppress false exons (intron regions having a greater peak than exon regions), resulting in a high discriminating factor, sensitivity, and specificity.
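The period-3 analysis described above can be sketched by mapping each base to a single property value and evaluating the discrete Fourier transform at the N/3 frequency bin. The property values below are made-up placeholders, not the paper's hydration-energy or dipole-moment encodings:

```python
import cmath

# Hypothetical per-base property values (illustrative numbers only).
PROPERTY = {"A": 0.5, "C": 1.3, "G": 1.1, "T": 0.8}

def period3_power(seq):
    """Power of the single numeric indicator sequence at frequency bin
    k = N/3, where exon-like regions show a characteristic period-3
    peak. `seq` length should be a multiple of 3 for a clean bin."""
    x = [PROPERTY[b] for b in seq]
    n = len(x)
    k = n // 3  # DFT bin corresponding to a period of 3 samples
    s = sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
    return abs(s) ** 2

strong = period3_power("ACG" * 20)  # perfectly period-3: large peak
weak = period3_power("A" * 60)      # constant signal: essentially zero
```

A codon-structured sequence concentrates energy in the N/3 bin, while a sequence with no period-3 structure leaves that bin near zero, which is the contrast exploited to separate exons from introns.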
Clonal evolution in breast cancer revealed by single nucleus genome sequencing.
Wang, Yong; Waters, Jill; Leung, Marco L; Unruh, Anna; Roh, Whijae; Shi, Xiuqing; Chen, Ken; Scheet, Paul; Vattathil, Selina; Liang, Han; Multani, Asha; Zhang, Hong; Zhao, Rui; Michor, Franziska; Meric-Bernstam, Funda; Navin, Nicholas E
2014-08-14
Sequencing studies of breast tumour cohorts have identified many prevalent mutations, but provide limited insight into the genomic diversity within tumours. Here we developed a whole-genome and exome single cell sequencing approach called nuc-seq that uses G2/M nuclei to achieve 91% mean coverage breadth. We applied this method to sequence single normal and tumour nuclei from an oestrogen-receptor-positive (ER(+)) breast cancer and a triple-negative ductal carcinoma. In parallel, we performed single nuclei copy number profiling. Our data show that aneuploid rearrangements occurred early in tumour evolution and remained highly stable as the tumour masses clonally expanded. In contrast, point mutations evolved gradually, generating extensive clonal diversity. Using targeted single-molecule sequencing, many of the diverse mutations were shown to occur at low frequencies (<10%) in the tumour mass. Using mathematical modelling we found that the triple-negative tumour cells had an increased mutation rate (13.3×), whereas the ER(+) tumour cells did not. These findings have important implications for the diagnosis, therapeutic treatment and evolution of chemoresistance in breast cancer.
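The low-frequency mutation analysis mentioned above can be sketched as a variant-allele-frequency (VAF) classification from read counts. The mutation identifiers, counts, and the 10% cutoff are illustrative stand-ins, not the study's actual targeted-sequencing pipeline:

```python
def classify_mutations(counts, subclonal_cutoff=0.10):
    """Classify mutations as clonal or subclonal from read counts.

    `counts` maps a mutation id to (alt reads, total reads); a mutation
    whose VAF falls below `subclonal_cutoff` is called subclonal,
    mirroring the paper's <10% low-frequency class (toy thresholds)."""
    calls = {}
    for mut, (alt, total) in counts.items():
        vaf = alt / total
        calls[mut] = ("subclonal" if vaf < subclonal_cutoff else "clonal", vaf)
    return calls

# A heterozygous-looking clonal mutation vs. a rare subclonal one.
counts = {"mutA": (45, 100), "mutB": (4, 80)}
calls = classify_mutations(counts)
```

Deep targeted coverage is what makes the subclonal class detectable at all: at typical whole-genome depths, a 5% VAF variant is indistinguishable from sequencing error.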