USDA-ARS?s Scientific Manuscript database
BACKGROUND: Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain u...
An efficient approach to BAC based assembly of complex genomes.
Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David
2016-01-01
There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, to variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.
Ciric, Milica; Moon, Christina D; Leahy, Sinead C; Creevey, Christopher J; Altermann, Eric; Attwood, Graeme T; Rakonjac, Jasna; Gagic, Dragana
2014-05-12
In silico, secretome proteins can be predicted from completely sequenced genomes using various available algorithms that identify membrane-targeting sequences. For metasecretome (collection of surface, secreted and transmembrane proteins from environmental microbial communities) this approach is impractical, considering that the metasecretome open reading frames (ORFs) comprise only 10% to 30% of total metagenome, and are poorly represented in the dataset due to overall low coverage of metagenomic gene pool, even in large-scale projects. By combining secretome-selective phage display and next-generation sequencing, we focused the sequence analysis of complex rumen microbial community on the metasecretome component of the metagenome. This approach achieved high enrichment (29 fold) of secreted fibrolytic enzymes from the plant-adherent microbial community of the bovine rumen. In particular, we identified hundreds of heretofore rare modules belonging to cellulosomes, cell-surface complexes specialised for recognition and degradation of the plant fibre. As a method, metasecretome phage display combined with next-generation sequencing has a power to sample the diversity of low-abundance surface and secreted proteins that would otherwise require exceptionally large metagenomic sequencing projects. As a resource, metasecretome display library backed by the dataset obtained by next-generation sequencing is ready for i) affinity selection by standard phage display methodology and ii) easy purification of displayed proteins as part of the virion for individual functional analysis.
DNA fingerprinting, DNA barcoding, and next generation sequencing technology in plants.
Sucher, Nikolaus J; Hennell, James R; Carles, Maria C
2012-01-01
DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.
Genomic Encyclopedia of Type Strains, Phase I: The one thousand microbial genomes (KMG-I) project
Kyrpides, Nikos C.; Woyke, Tanja; Eisen, Jonathan A.; ...
2014-06-15
The Genomic Encyclopedia of Bacteria and Archaea (GEBA) project was launched by the JGI in 2007 as a pilot project with the objective of sequencing 250 bacterial and archaeal genomes. The two major goals of that project were (a) to test the hypothesis that there are many benefits to the use the phylogenetic diversity of organisms in the tree of life as a primary criterion for generating their genome sequence and (b) to develop the necessary framework, technology and organization for large-scale sequencing of microbial isolate genomes. While the GEBA pilot project has not yet been entirely completed, both ofmore » the original goals have already been successfully accomplished, leading the way for the next phase of the project. Here we propose taking the GEBA project to the next level, by generating high quality draft genomes for 1,000 bacterial and archaeal strains. This represents a combined 16-fold increase in both scale and speed as compared to the GEBA pilot project (250 isolate genomes in 4+ years). We will follow a similar approach for organism selection and sequencing prioritization as was done for the GEBA pilot project (i.e. phylogenetic novelty, availability and growth of cultures of type strains and DNA extraction capability), focusing on type strains as this ensures reproducibility of our results and provides the strongest linkage between genome sequences and other knowledge about each strain. In turn, this project will constitute a pilot phase of a larger effort that will target the genome sequences of all available type strains of the Bacteria and Archaea.« less
Genomic Encyclopedia of Type Strains, Phase I: The one thousand microbial genomes (KMG-I) project
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kyrpides, Nikos C.; Woyke, Tanja; Eisen, Jonathan A.
The Genomic Encyclopedia of Bacteria and Archaea (GEBA) project was launched by the JGI in 2007 as a pilot project with the objective of sequencing 250 bacterial and archaeal genomes. The two major goals of that project were (a) to test the hypothesis that there are many benefits to the use the phylogenetic diversity of organisms in the tree of life as a primary criterion for generating their genome sequence and (b) to develop the necessary framework, technology and organization for large-scale sequencing of microbial isolate genomes. While the GEBA pilot project has not yet been entirely completed, both ofmore » the original goals have already been successfully accomplished, leading the way for the next phase of the project. Here we propose taking the GEBA project to the next level, by generating high quality draft genomes for 1,000 bacterial and archaeal strains. This represents a combined 16-fold increase in both scale and speed as compared to the GEBA pilot project (250 isolate genomes in 4+ years). We will follow a similar approach for organism selection and sequencing prioritization as was done for the GEBA pilot project (i.e. phylogenetic novelty, availability and growth of cultures of type strains and DNA extraction capability), focusing on type strains as this ensures reproducibility of our results and provides the strongest linkage between genome sequences and other knowledge about each strain. In turn, this project will constitute a pilot phase of a larger effort that will target the genome sequences of all available type strains of the Bacteria and Archaea.« less
A Next-Generation Sequencing Primer—How Does It Work and What Can It Do?
Alekseyev, Yuriy O.; Fazeli, Roghayeh; Yang, Shi; Basran, Raveen; Miller, Nancy S.
2018-01-01
Next-generation sequencing refers to a high-throughput technology that determines the nucleic acid sequences and identifies variants in a sample. The technology has been introduced into clinical laboratory testing and produces test results for precision medicine. Since next-generation sequencing is relatively new, graduate students, medical students, pathology residents, and other physicians may benefit from a primer to provide a foundation about basic next-generation sequencing methods and applications, as well as specific examples where it has had diagnostic and prognostic utility. Next-generation sequencing technology grew out of advances in multiple fields to produce a sophisticated laboratory test with tremendous potential. Next-generation sequencing may be used in the clinical setting to look for specific genetic alterations in patients with cancer, diagnose inherited conditions such as cystic fibrosis, and detect and profile microbial organisms. This primer will review DNA sequencing technology, the commercialization of next-generation sequencing, and clinical uses of next-generation sequencing. Specific applications where next-generation sequencing has demonstrated utility in oncology are provided. PMID:29761157
Lu, Emily; Elizondo-Riojas, Miguel-Angel; Chang, Jeffrey T; Volk, David E
2014-06-10
Next-generation sequencing results from bead-based aptamer libraries have demonstrated that traditional DNA/RNA alignment software is insufficient. This is particularly true for X-aptamers containing specialty bases (W, X, Y, Z, ...) that are identified by special encoding. Thus, we sought an automated program that uses the inherent design scheme of bead-based X-aptamers to create a hypothetical reference library and Markov modeling techniques to provide improved alignments. Aptaligner provides this feature as well as length error and noise level cutoff features, is parallelized to run on multiple central processing units (cores), and sorts sequences from a single chip into projects and subprojects.
USDA-ARS?s Scientific Manuscript database
Over the past decade, Next Generation Sequencing (NGS) technologies, also called deep sequencing, have continued to evolve, increasing capacity and lower the cost necessary for large genome sequencing projects. The one of the advantage of NGS platforms is the possibility to sequence the samples with...
Cassini Mission Sequence Subsystem (MSS)
NASA Technical Reports Server (NTRS)
Alland, Robert
2011-01-01
This paper describes my work with the Cassini Mission Sequence Subsystem (MSS) team during the summer of 2011. It gives some background on the motivation for this project and describes the expected benefit to the Cassini program. It then introduces the two tasks that I worked on - an automatic system auditing tool and a series of corrections to the Cassini Sequence Generator (SEQ_GEN) - and the specific objectives these tasks were to accomplish. Next, it details the approach I took to meet these objectives and the results of this approach, followed by a discussion of how the outcome of the project compares with my initial expectations. The paper concludes with a summary of my experience working on this project, lists what the next steps are, and acknowledges the help of my Cassini colleagues.
USDA-ARS?s Scientific Manuscript database
Modern day genomics holds the promise of solving the complexities of basic plant sciences, and of catalyzing practical advances in plant breeding. While contiguous, "base perfect" deep sequencing is a key module of any genome project, recent advances in parallel next generation sequencing technologi...
Tools to exploit sequence data to find new markers and disease loci in dairy cattle
USDA-ARS?s Scientific Manuscript database
The decrease in cost of Next-Generation Sequencing has brought the technology into the realm of practical applications in livestock genomics. Recently, the 1000 Bulls Project has heralded the possibility of using full sequence data to improve imputation and detect disease loci within select founder ...
New tool to assemble repetitive regions using next-generation sequencing data
NASA Astrophysics Data System (ADS)
Kuśmirek, Wiktor; Nowak, Robert M.; Neumann, Łukasz
2017-08-01
The next generation sequencing techniques produce a large amount of sequencing data. Some part of the genome are composed of repetitive DNA sequences, which are very problematic for the existing genome assemblers. We propose a modification of the algorithm for a DNA assembly, which uses the relative frequency of reads to properly reconstruct repetitive sequences. The new approach was implemented and tested, as a demonstration of the capability of our software we present some results for model organisms. The new implementation, using a three-layer software architecture was selected, where the presentation layer, data processing layer, and data storage layer were kept separate. Source code as well as demo application with web interface and the additional data are available at project web-page: http://dnaasm.sourceforge.net.
Gullapalli, Rama R; Desai, Ketaki V; Santana-Santos, Lucas; Kant, Jeffrey A; Becich, Michael J
2012-01-01
The Human Genome Project (HGP) provided the initial draft of mankind's DNA sequence in 2001. The HGP was produced by 23 collaborating laboratories using Sanger sequencing of mapped regions as well as shotgun sequencing techniques in a process that occupied 13 years at a cost of ~$3 billion. Today, Next Generation Sequencing (NGS) techniques represent the next phase in the evolution of DNA sequencing technology at dramatically reduced cost compared to traditional Sanger sequencing. A single laboratory today can sequence the entire human genome in a few days for a few thousand dollars in reagents and staff time. Routine whole exome or even whole genome sequencing of clinical patients is well within the realm of affordability for many academic institutions across the country. This paper reviews current sequencing technology methods and upcoming advancements in sequencing technology as well as challenges associated with data generation, data manipulation and data storage. Implementation of routine NGS data in cancer genomics is discussed along with potential pitfalls in the interpretation of the NGS data. The overarching importance of bioinformatics in the clinical implementation of NGS is emphasized.[7] We also review the issue of physician education which also is an important consideration for the successful implementation of NGS in the clinical workplace. NGS technologies represent a golden opportunity for the next generation of pathologists to be at the leading edge of the personalized medicine approaches coming our way. Often under-emphasized issues of data access and control as well as potential ethical implications of whole genome NGS sequencing are also discussed. Despite some challenges, it's hard not to be optimistic about the future of personalized genome sequencing and its potential impact on patient care and the advancement of knowledge of human biology and disease in the near future.
Gullapalli, Rama R.; Desai, Ketaki V.; Santana-Santos, Lucas; Kant, Jeffrey A.; Becich, Michael J.
2012-01-01
The Human Genome Project (HGP) provided the initial draft of mankind's DNA sequence in 2001. The HGP was produced by 23 collaborating laboratories using Sanger sequencing of mapped regions as well as shotgun sequencing techniques in a process that occupied 13 years at a cost of ~$3 billion. Today, Next Generation Sequencing (NGS) techniques represent the next phase in the evolution of DNA sequencing technology at dramatically reduced cost compared to traditional Sanger sequencing. A single laboratory today can sequence the entire human genome in a few days for a few thousand dollars in reagents and staff time. Routine whole exome or even whole genome sequencing of clinical patients is well within the realm of affordability for many academic institutions across the country. This paper reviews current sequencing technology methods and upcoming advancements in sequencing technology as well as challenges associated with data generation, data manipulation and data storage. Implementation of routine NGS data in cancer genomics is discussed along with potential pitfalls in the interpretation of the NGS data. The overarching importance of bioinformatics in the clinical implementation of NGS is emphasized.[7] We also review the issue of physician education which also is an important consideration for the successful implementation of NGS in the clinical workplace. NGS technologies represent a golden opportunity for the next generation of pathologists to be at the leading edge of the personalized medicine approaches coming our way. Often under-emphasized issues of data access and control as well as potential ethical implications of whole genome NGS sequencing are also discussed. Despite some challenges, it's hard not to be optimistic about the future of personalized genome sequencing and its potential impact on patient care and the advancement of knowledge of human biology and disease in the near future. PMID:23248761
Next Generation Sequencing Technologies: The Doorway to the Unexplored Genomics of Non-Model Plants
Unamba, Chibuikem I. N.; Nag, Akshay; Sharma, Ram K.
2015-01-01
Non-model plants i.e., the species which have one or all of the characters such as long life cycle, difficulty to grow in the laboratory or poor fecundity, have been schemed out of sequencing projects earlier, due to high running cost of Sanger sequencing. Consequently, the information about their genomics and key biological processes are inadequate. However, the advent of fast and cost effective next generation sequencing (NGS) platforms in the recent past has enabled the unearthing of certain characteristic gene structures unique to these species. It has also aided in gaining insight about mechanisms underlying processes of gene expression and secondary metabolism as well as facilitated development of genomic resources for diversity characterization, evolutionary analysis and marker assisted breeding even without prior availability of genomic sequence information. In this review we explore how different Next Gen Sequencing platforms, as well as recent advances in NGS based high throughput genotyping technologies are rewarding efforts on de-novo whole genome/transcriptome sequencing, development of genome wide sequence based markers resources for improvement of non-model crops that are less costly than phenotyping. PMID:26734016
GAMES identifies and annotates mutations in next-generation sequencing projects.
Sana, Maria Elena; Iascone, Maria; Marchetti, Daniela; Palatini, Jeff; Galasso, Marco; Volinia, Stefano
2011-01-01
Next-generation sequencing (NGS) methods have the potential for changing the landscape of biomedical science, but at the same time pose several problems in analysis and interpretation. Currently, there are many commercial and public software packages that analyze NGS data. However, the limitations of these applications include output which is insufficiently annotated and of difficult functional comprehension to end users. We developed GAMES (Genomic Analysis of Mutations Extracted by Sequencing), a pipeline aiming to serve as an efficient middleman between data deluge and investigators. GAMES attains multiple levels of filtering and annotation, such as aligning the reads to a reference genome, performing quality control and mutational analysis, integrating results with genome annotations and sorting each mismatch/deletion according to a range of parameters. Variations are matched to known polymorphisms. The prediction of functional mutations is achieved by using different approaches. Overall GAMES enables an effective complexity reduction in large-scale DNA-sequencing projects. GAMES is available free of charge to academic users and may be obtained from http://aqua.unife.it/GAMES.
Evaluation of ribosomal RNA removal protocols for Salmonella RNA-Seq projects
USDA-ARS?s Scientific Manuscript database
Next generation sequencing is a powerful technology and its application to sequencing entire RNA populations of food-borne pathogens will provide valuable insights. A problem unique to prokaryotic RNA-Seq is the massive abundance of ribosomal RNA. Unlike eukaryotic messenger RNA (mRNA), bacterial ...
Altmüller, Janine; Budde, Birgit S; Nürnberg, Peter
2014-02-01
Abstract Targeted re-sequencing such as gene panel sequencing (GPS) has become very popular in medical genetics, both for research projects and in diagnostic settings. The technical principles of the different enrichment methods have been reviewed several times before; however, new enrichment products are constantly entering the market, and researchers are often puzzled about the requirement to take decisions about long-term commitments, both for the enrichment product and the sequencing technology. This review summarizes important considerations for the experimental design and provides helpful recommendations in choosing the best sequencing strategy for various research projects and diagnostic applications.
WormBase 2014: new views of curated biology
Harris, Todd W.; Baran, Joachim; Bieri, Tamberlyn; Cabunoc, Abigail; Chan, Juancarlos; Chen, Wen J.; Davis, Paul; Done, James; Grove, Christian; Howe, Kevin; Kishore, Ranjana; Lee, Raymond; Li, Yuling; Muller, Hans-Michael; Nakamura, Cecilia; Ozersky, Philip; Paulini, Michael; Raciti, Daniela; Schindelman, Gary; Tuli, Mary Ann; Auken, Kimberly Van; Wang, Daniel; Wang, Xiaodong; Williams, Gary; Wong, J. D.; Yook, Karen; Schedl, Tim; Hodgkin, Jonathan; Berriman, Matthew; Kersey, Paul; Spieth, John; Stein, Lincoln; Sternberg, Paul W.
2014-01-01
WormBase (http://www.wormbase.org/) is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest. PMID:24194605
California mild CTV strains that break resistance in Trifoliate Orange
USDA-ARS?s Scientific Manuscript database
This is the final report of a project to characterize California isolates of Citrus tristeza virus (CTV) that replicate in Poncirus trifoliata (trifoliate orange). Next Generation Sequencing (NGS) of viral small interfering RNAs (siRNAs) and assembly of full-length sequences of mild California CTV i...
Standardization and quality management in next-generation sequencing.
Endrullat, Christoph; Glökler, Jörn; Franke, Philipp; Frohme, Marcus
2016-09-01
DNA sequencing continues to evolve quickly even after > 30 years. Many new platforms suddenly appeared and former established systems have vanished in almost the same manner. Since establishment of next-generation sequencing devices, this progress gains momentum due to the continually growing demand for higher throughput, lower costs and better quality of data. In consequence of this rapid development, standardized procedures and data formats as well as comprehensive quality management considerations are still scarce. Here, we listed and summarized current standardization efforts and quality management initiatives from companies, organizations and societies in form of published studies and ongoing projects. These comprise on the one hand quality documentation issues like technical notes, accreditation checklists and guidelines for validation of sequencing workflows. On the other hand, general standard proposals and quality metrics are developed and applied to the sequencing workflow steps with the main focus on upstream processes. Finally, certain standard developments for downstream pipeline data handling, processing and storage are discussed in brief. These standardization approaches represent a first basis for continuing work in order to prospectively implement next-generation sequencing in important areas such as clinical diagnostics, where reliable results and fast processing is crucial. Additionally, these efforts will exert a decisive influence on traceability and reproducibility of sequence data.
Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R; Kang, Hyun Min
2015-06-01
The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires less computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. © 2015 Jun et al.; Published by Cold Spring Harbor Laboratory Press.
Next Generation Sequencing of Actinobacteria for the Discovery of Novel Natural Products
Gomez-Escribano, Juan Pablo; Alt, Silke; Bibb, Mervyn J.
2016-01-01
Like many fields of the biosciences, actinomycete natural products research has been revolutionised by next-generation DNA sequencing (NGS). Hundreds of new genome sequences from actinobacteria are made public every year, many of them as a result of projects aimed at identifying new natural products and their biosynthetic pathways through genome mining. Advances in these technologies in the last five years have meant not only a reduction in the cost of whole genome sequencing, but also a substantial increase in the quality of the data, having moved from obtaining a draft genome sequence comprised of several hundred short contigs, sometimes of doubtful reliability, to the possibility of obtaining an almost complete and accurate chromosome sequence in a single contig, allowing a detailed study of gene clusters and the design of strategies for refactoring and full gene cluster synthesis. The impact that these technologies are having in the discovery and study of natural products from actinobacteria, including those from the marine environment, is only starting to be realised. In this review we provide a historical perspective of the field, analyse the strengths and limitations of the most relevant technologies, and share the insights acquired during our genome mining projects. PMID:27089350
Meeting Report: The Terabase Metagenomics Workshop and the Vision of an Earth Microbiome Project
Gilbert, Jack A.; Meyer, Folker; Antonopoulos, Dion; Balaji, Pavan; Brown, C. Titus; Brown, Christopher T.; Desai, Narayan; Eisen, Jonathan A; Evers, Dirk; Field, Dawn; Feng, Wu; Huson, Daniel; Jansson, Janet; Knight, Rob; Knight, James; Kolker, Eugene; Konstantindis, Kostas; Kostka, Joel; Kyrpides, Nikos; Mackelprang, Rachel; McHardy, Alice; Quince, Christopher; Raes, Jeroen; Sczyrba, Alexander; Shade, Ashley; Stevens, Rick
2010-01-01
Between July 18th and 24th 2010, 26 leading microbial ecology, computation, bioinformatics and statistics researchers came together in Snowbird, Utah (USA) to discuss the challenge of how to best characterize the microbial world using next-generation sequencing technologies. The meeting was entitled “Terabase Metagenomics” and was sponsored by the Institute for Computing in Science (ICiS) summer 2010 workshop program. The aim of the workshop was to explore the fundamental questions relating to microbial ecology that could be addressed using advances in sequencing potential. Technological advances in next-generation sequencing platforms such as the Illumina HiSeq 2000 can generate in excess of 250 billion base pairs of genetic information in 8 days. Thus, the generation of a trillion base pairs of genetic information is becoming a routine matter. The main outcome from this meeting was the birth of a concept and practical approach to exploring microbial life on earth, the Earth Microbiome Project (EMP). Here we briefly describe the highlights of this meeting and provide an overview of the EMP concept and how it can be applied to exploration of the microbiome of each ecosystem on this planet. PMID:21304727
Sequencing-based diagnostics for pediatric genetic diseases: progress and potential
Tayoun, Ahmad Abou; Krock, Bryan; Spinner, Nancy B.
2016-01-01
Introduction The last two decades have witnessed revolutionary changes in clinical diagnostics, fueled by the Human Genome Project and advances in high throughput, Next Generation Sequencing (NGS). We review the current state of sequencing-based pediatric diagnostics, associated challenges, and future prospects. Areas Covered We present an overview of genetic disease in children, review the technical aspects of Next Generation Sequencing and the strategies to make molecular diagnoses for children with genetic disease. We discuss the challenges of genomic sequencing including incomplete current knowledge of variants, lack of data about certain genomic regions, mosaicism, and the presence of regions with high homology. Expert Commentary NGS has been a transformative technology and the gap between the research and clinical communities has never been so narrow. Therapeutic interventions are emerging based on genomic findings and the applications of NGS are progressing to prenatal genetics, epigenomics and transcriptomics. PMID:27388938
Next-generation sequencing for diagnosis of rare diseases in the neonatal intensive care unit.
Daoud, Hussein; Luco, Stephanie M; Li, Rui; Bareke, Eric; Beaulieu, Chandree; Jarinova, Olga; Carson, Nancy; Nikkel, Sarah M; Graham, Gail E; Richer, Julie; Armour, Christine; Bulman, Dennis E; Chakraborty, Pranesh; Geraghty, Michael; Lines, Matthew A; Lacaze-Masmonteil, Thierry; Majewski, Jacek; Boycott, Kym M; Dyment, David A
2016-08-09
Rare diseases often present in the first days and weeks of life and may require complex management in the setting of a neonatal intensive care unit (NICU). Exhaustive consultations and traditional genetic or metabolic investigations are costly and often fail to arrive at a final diagnosis when no recognizable syndrome is suspected. For this pilot project, we assessed the feasibility of next-generation sequencing as a tool to improve the diagnosis of rare diseases in newborns in the NICU. We retrospectively identified and prospectively recruited newborns and infants admitted to the NICU of the Children's Hospital of Eastern Ontario and the Ottawa Hospital, General Campus, who had been referred to the medical genetics or metabolics inpatient consult service and had features suggesting an underlying genetic or metabolic condition. DNA from the newborns and parents was enriched for a panel of clinically relevant genes and sequenced on a MiSeq sequencing platform (Illumina Inc.). The data were interpreted with a standard informatics pipeline and reported to care providers, who assessed the importance of genotype-phenotype correlations. Of 20 newborns studied, 8 received a diagnosis on the basis of next-generation sequencing (diagnostic rate 40%). The diagnoses were renal tubular dysgenesis, SCN1A-related encephalopathy syndrome, myotubular myopathy, FTO deficiency syndrome, cranioectodermal dysplasia, congenital myasthenic syndrome, autosomal dominant intellectual disability syndrome type 7 and Denys-Drash syndrome. This pilot study highlighted the potential of next-generation sequencing to deliver molecular diagnoses rapidly with a high success rate. With broader use, this approach has the potential to alter health care delivery in the NICU. © 2016 Canadian Medical Association or its licensors.
Next-generation sequencing for diagnosis of rare diseases in the neonatal intensive care unit
Daoud, Hussein; Luco, Stephanie M.; Li, Rui; Bareke, Eric; Beaulieu, Chandree; Jarinova, Olga; Carson, Nancy; Nikkel, Sarah M.; Graham, Gail E.; Richer, Julie; Armour, Christine; Bulman, Dennis E.; Chakraborty, Pranesh; Geraghty, Michael; Lines, Matthew A.; Lacaze-Masmonteil, Thierry; Majewski, Jacek; Boycott, Kym M.; Dyment, David A.
2016-01-01
Background: Rare diseases often present in the first days and weeks of life and may require complex management in the setting of a neonatal intensive care unit (NICU). Exhaustive consultations and traditional genetic or metabolic investigations are costly and often fail to arrive at a final diagnosis when no recognizable syndrome is suspected. For this pilot project, we assessed the feasibility of next-generation sequencing as a tool to improve the diagnosis of rare diseases in newborns in the NICU. Methods: We retrospectively identified and prospectively recruited newborns and infants admitted to the NICU of the Children’s Hospital of Eastern Ontario and the Ottawa Hospital, General Campus, who had been referred to the medical genetics or metabolics inpatient consult service and had features suggesting an underlying genetic or metabolic condition. DNA from the newborns and parents was enriched for a panel of clinically relevant genes and sequenced on a MiSeq sequencing platform (Illumina Inc.). The data were interpreted with a standard informatics pipeline and reported to care providers, who assessed the importance of genotype–phenotype correlations. Results: Of 20 newborns studied, 8 received a diagnosis on the basis of next-generation sequencing (diagnostic rate 40%). The diagnoses were renal tubular dysgenesis, SCN1A-related encephalopathy syndrome, myotubular myopathy, FTO deficiency syndrome, cranioectodermal dysplasia, congenital myasthenic syndrome, autosomal dominant intellectual disability syndrome type 7 and Denys–Drash syndrome. Interpretation: This pilot study highlighted the potential of next-generation sequencing to deliver molecular diagnoses rapidly with a high success rate. With broader use, this approach has the potential to alter health care delivery in the NICU. PMID:27241786
He, W; Zhao, S; Liu, X; Dong, S; Lv, J; Liu, D; Wang, J; Meng, Z
2013-12-04
Large-scale next-generation sequencing (NGS)-based resequencing detects sequence variations, constructs evolutionary histories, and identifies phenotype-related genotypes. However, NGS-based resequencing studies generate extraordinarily large amounts of data, making computations difficult. Effective use and analysis of these data for NGS-based resequencing studies remains a difficult task for individual researchers. Here, we introduce ReSeqTools, a full-featured toolkit for NGS (Illumina sequencing)-based resequencing analysis, which processes raw data, interprets mapping results, and identifies and annotates sequence variations. ReSeqTools provides abundant scalable functions for routine resequencing analysis in different modules to facilitate customization of the analysis pipeline. ReSeqTools is designed to use compressed data files as input or output to save storage space and facilitates faster and more computationally efficient large-scale resequencing studies in a user-friendly manner. It offers abundant practical functions and generates useful statistics during the analysis pipeline, which significantly simplifies resequencing analysis. Its integrated algorithms and abundant sub-functions provide a solid foundation for special demands in resequencing projects. Users can combine these functions to construct their own pipelines for other purposes.
Identifying micro-inversions using high-throughput sequencing reads.
He, Feifei; Li, Yang; Tang, Yu-Hang; Ma, Jian; Zhu, Huaiqiu
2016-01-11
The identification of inversions of DNA segments shorter than read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging for next-generation sequencing reads. It is acknowledged that MIs are important genomic variation and may play roles in causing genetic disease. However, current alignment methods are generally insensitive to detect MIs. Here we develop a novel tool, MID (Micro-Inversion Detector), to identify MIs in human genomes using next-generation sequencing reads. The algorithm of MID is designed based on a dynamic programming path-finding approach. What makes MID different from other variant detection tools is that MID can handle small MIs and multiple breakpoints within an unmapped read. Moreover, MID improves reliability in low coverage data by integrating multiple samples. Our evaluation demonstrated that MID outperforms Gustaf, which can currently detect inversions from 30 bp to 500 bp. To our knowledge, MID is the first method that can efficiently and reliably identify MIs from unmapped short next-generation sequencing reads. MID is reliable on low coverage data, which is suitable for large-scale projects such as the 1000 Genomes Project (1KGP). MID identified previously unknown MIs from the 1KGP that overlap with genes and regulatory elements in the human genome. We also identified MIs in cancer cell lines from Cancer Cell Line Encyclopedia (CCLE). Therefore our tool is expected to be useful to improve the study of MIs as a type of genetic variant in the human genome. The source code can be downloaded from: http://cqb.pku.edu.cn/ZhuLab/MID .
Next generation tools for genomic data generation, distribution, and visualization
2010-01-01
Background With the rapidly falling cost and availability of high throughput sequencing and microarray technologies, the bottleneck for effectively using genomic analysis in the laboratory and clinic is shifting to one of effectively managing, analyzing, and sharing genomic data. Results Here we present three open-source, platform independent, software tools for generating, analyzing, distributing, and visualizing genomic data. These include a next generation sequencing/microarray LIMS and analysis project center (GNomEx); an application for annotating and programmatically distributing genomic data using the community vetted DAS/2 data exchange protocol (GenoPub); and a standalone Java Swing application (GWrap) that makes cutting edge command line analysis tools available to those who prefer graphical user interfaces. Both GNomEx and GenoPub use the rich client Flex/Flash web browser interface to interact with Java classes and a relational database on a remote server. Both employ a public-private user-group security model enabling controlled distribution of patient and unpublished data alongside public resources. As such, they function as genomic data repositories that can be accessed manually or programmatically through DAS/2-enabled client applications such as the Integrated Genome Browser. Conclusions These tools have gained wide use in our core facilities, research laboratories and clinics and are freely available for non-profit use. See http://sourceforge.net/projects/gnomex/, http://sourceforge.net/projects/genoviz/, and http://sourceforge.net/projects/useq. PMID:20828407
Carr, Ian M; Morgan, Joanne; Watson, Christopher; Melnik, Svitlana; Diggle, Christine P; Logan, Clare V; Harrison, Sally M; Taylor, Graham R; Pena, Sergio D J; Markham, Alexander F; Alkuraya, Fowzan S; Black, Graeme C M; Ali, Manir; Bonthron, David T
2013-07-01
Massively parallel ("next generation") DNA sequencing (NGS) has quickly become the method of choice for seeking pathogenic mutations in rare uncharacterized monogenic diseases. Typically, before DNA sequencing, protein-coding regions are enriched from patient genomic DNA, representing either the entire genome ("exome sequencing") or selected mapped candidate loci. Sequence variants, identified as differences between the patient's and the human genome reference sequences, are then filtered according to various quality parameters. Changes are screened against datasets of known polymorphisms, such as dbSNP and the 1000 Genomes Project, in the effort to narrow the list of candidate causative variants. An increasing number of commercial services now offer to both generate and align NGS data to a reference genome. This potentially allows small groups with limited computing infrastructure and informatics skills to utilize this technology. However, the capability to effectively filter and assess sequence variants is still an important bottleneck in the identification of deleterious sequence variants in both research and diagnostic settings. We have developed an approach to this problem comprising a user-friendly suite of programs that can interactively analyze, filter and screen data from enrichment-capture NGS data. These programs ("Agile Suite") are particularly suitable for small-scale gene discovery or for diagnostic analysis. © 2013 WILEY PERIODICALS, INC.
Gong, Jun; Pan, Kathy; Fakih, Marwan; Pal, Sumanta; Salgia, Ravi
2018-03-20
Advancements in next-generation sequencing have greatly enhanced the development of biomarker-driven cancer therapies. The affordability and availability of next-generation sequencers have allowed for the commercialization of next-generation sequencing platforms that have found widespread use for clinical-decision making and research purposes. Despite the greater availability of tumor molecular profiling by next-generation sequencing at our doorsteps, the achievement of value-based care, or improving patient outcomes while reducing overall costs or risks, in the era of precision oncology remains a looming challenge. In this review, we highlight available data through a pre-established and conceptualized framework for evaluating value-based medicine to assess the cost (efficiency), clinical benefit (effectiveness), and toxicity (safety) of genomic profiling in cancer care. We also provide perspectives on future directions of next-generation sequencing from targeted panels to whole-exome or whole-genome sequencing and describe potential strategies needed to attain value-based genomics.
Gong, Jun; Pan, Kathy; Fakih, Marwan; Pal, Sumanta; Salgia, Ravi
2018-01-01
Advancements in next-generation sequencing have greatly enhanced the development of biomarker-driven cancer therapies. The affordability and availability of next-generation sequencers have allowed for the commercialization of next-generation sequencing platforms that have found widespread use for clinical-decision making and research purposes. Despite the greater availability of tumor molecular profiling by next-generation sequencing at our doorsteps, the achievement of value-based care, or improving patient outcomes while reducing overall costs or risks, in the era of precision oncology remains a looming challenge. In this review, we highlight available data through a pre-established and conceptualized framework for evaluating value-based medicine to assess the cost (efficiency), clinical benefit (effectiveness), and toxicity (safety) of genomic profiling in cancer care. We also provide perspectives on future directions of next-generation sequencing from targeted panels to whole-exome or whole-genome sequencing and describe potential strategies needed to attain value-based genomics. PMID:29644010
NGSANE: a lightweight production informatics framework for high-throughput data analysis.
Buske, Fabian A; French, Hugh J; Smith, Martin A; Clark, Susan J; Bauer, Denis C
2014-05-15
The initial steps in the analysis of next-generation sequencing data can be automated by way of software 'pipelines'. However, individual components depreciate rapidly because of the evolving technology and analysis methods, often rendering entire versions of production informatics pipelines obsolete. Constructing pipelines from Linux bash commands enables the use of hot swappable modular components as opposed to the more rigid program call wrapping by higher level languages, as implemented in comparable published pipelining systems. Here we present Next Generation Sequencing ANalysis for Enterprises (NGSANE), a Linux-based, high-performance-computing-enabled framework that minimizes overhead for set up and processing of new projects, yet maintains full flexibility of custom scripting when processing raw sequence data. Ngsane is implemented in bash and publicly available under BSD (3-Clause) licence via GitHub at https://github.com/BauerLab/ngsane. Denis.Bauer@csiro.au Supplementary data are available at Bioinformatics online.
Germplasm Management in the Post-genomics Era-a case study with lettuce
USDA-ARS?s Scientific Manuscript database
High-throughput genotyping platforms and next-generation sequencing technologies revolutionized our ways in germplasm characterization. In collaboration with UC Davis Genome Center, we completed a project of genotyping the entire cultivated lettuce (Lactuca sativa L.) collection of 1,066 accessions ...
Applications of nanotechnology, next generation sequencing and microarrays in biomedical research.
Elingaramil, Sauli; Li, Xiaolong; He, Nongyue
2013-07-01
Next-generation sequencing technologies, microarrays and advances in bio nanotechnology have had an enormous impact on research within a short time frame. This impact appears certain to increase further as many biomedical institutions are now acquiring these prevailing new technologies. Beyond conventional sampling of genome content, wide-ranging applications are rapidly evolving for next-generation sequencing, microarrays and nanotechnology. To date, these technologies have been applied in a variety of contexts, including whole-genome sequencing, targeted re sequencing and discovery of transcription factor binding sites, noncoding RNA expression profiling and molecular diagnostics. This paper thus discusses current applications of nanotechnology, next-generation sequencing technologies and microarrays in biomedical research and highlights the transforming potential these technologies offer.
A Window Into Clinical Next-Generation Sequencing-Based Oncology Testing Practices.
Nagarajan, Rakesh; Bartley, Angela N; Bridge, Julia A; Jennings, Lawrence J; Kamel-Reid, Suzanne; Kim, Annette; Lazar, Alexander J; Lindeman, Neal I; Moncur, Joel; Rai, Alex J; Routbort, Mark J; Vasalos, Patricia; Merker, Jason D
2017-12-01
- Detection of acquired variants in cancer is a paradigm of precision medicine, yet little has been reported about clinical laboratory practices across a broad range of laboratories. - To use College of American Pathologists proficiency testing survey results to report on the results from surveys on next-generation sequencing-based oncology testing practices. - College of American Pathologists proficiency testing survey results from more than 250 laboratories currently performing molecular oncology testing were used to determine laboratory trends in next-generation sequencing-based oncology testing. - These presented data provide key information about the number of laboratories that currently offer or are planning to offer next-generation sequencing-based oncology testing. Furthermore, we present data from 60 laboratories performing next-generation sequencing-based oncology testing regarding specimen requirements and assay characteristics. The findings indicate that most laboratories are performing tumor-only targeted sequencing to detect single-nucleotide variants and small insertions and deletions, using desktop sequencers and predesigned commercial kits. Despite these trends, a diversity of approaches to testing exists. - This information should be useful to further inform a variety of topics, including national discussions involving clinical laboratory quality systems, regulation and oversight of next-generation sequencing-based oncology testing, and precision oncology efforts in a data-driven manner.
Genome Sequencing and Assembly by Long Reads in Plants
Li, Changsheng; Lin, Feng; An, Dong; Huang, Ruidong
2017-01-01
Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications due to high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of long-read sequencing by Third Generation Sequencing (TGS) methods such as PacBio and Nanopore have shown promise in producing high-quality assemblies for complex genomes. Here, we review the development of sequencing, introducing the application as well as considerations of experimental design in TGS of plant genomes. We also introduce recent revolutionary scaffolding technologies including BioNano, Hi-C, and 10× Genomics. We expect that the informative guidance for genome sequencing and assembly by long reads will benefit the initiation of scientists’ projects. PMID:29283420
Beigh, Mohammad Muzafar
2016-01-01
Humans have predicted the relationship between heredity and diseases for a long time. Only in the beginning of the last century, scientists begin to discover the connotations between different genes and disease phenotypes. Recent trends in next-generation sequencing (NGS) technologies have brought a great momentum in biomedical research that in turn has remarkably augmented our basic understanding of human biology and its associated diseases. State-of-the-art next generation biotechnologies have started making huge strides in our current understanding of mechanisms of various chronic illnesses like cancers, metabolic disorders, neurodegenerative anomalies, etc. We are experiencing a renaissance in biomedical research primarily driven by next generation biotechnologies like genomics, transcriptomics, proteomics, metabolomics, lipidomics etc. Although genomic discoveries are at the forefront of next generation omics technologies, however, their implementation into clinical arena had been painstakingly slow mainly because of high reaction costs and unavailability of requisite computational tools for large-scale data analysis. However rapid innovations and steadily lowering cost of sequence-based chemistries along with the development of advanced bioinformatics tools have lately prompted launching and implementation of large-scale massively parallel genome sequencing programs in different fields ranging from medical genetics, infectious biology, agriculture sciences etc. Recent advances in large-scale omics-technologies is bringing healthcare research beyond the traditional “bench to bedside” approach to more of a continuum that will include improvements, in public healthcare and will be primarily based on predictive, preventive, personalized, and participatory medicine approach (P4). Recent large-scale research projects in genetic and infectious disease biology have indicated that massively parallel whole-genome/whole-exome sequencing, transcriptome analysis, and other functional genomic tools can reveal large number of unique functional elements and/or markers that otherwise would be undetected by traditional sequencing methodologies. Therefore, latest trends in the biomedical research is giving birth to the new branch in medicine commonly referred to as personalized and/or precision medicine. Developments in the post-genomic era are believed to completely restructure the present clinical pattern of disease prevention and treatment as well as methods of diagnosis and prognosis. The next important step in the direction of the precision/personalized medicine approach should be its early adoption in clinics for future medical interventions. Consequently, in coming year’s next generation biotechnologies will reorient medical practice more towards disease prediction and prevention approaches rather than curing them at later stages of their development and progression, even at wider population level(s) for general public healthcare system. PMID:28930123
Stajdohar, Miha; Rosengarten, Rafael D; Kokosar, Janez; Jeran, Luka; Blenkus, Domen; Shaulsky, Gad; Zupan, Blaz
2017-06-02
Dictyostelium discoideum, a soil-dwelling social amoeba, is a model for the study of numerous biological processes. Research in the field has benefited mightily from the adoption of next-generation sequencing for genomics and transcriptomics. Dictyostelium biologists now face the widespread challenges of analyzing and exploring high dimensional data sets to generate hypotheses and discovering novel insights. We present dictyExpress (2.0), a web application designed for exploratory analysis of gene expression data, as well as data from related experiments such as Chromatin Immunoprecipitation sequencing (ChIP-Seq). The application features visualization modules that include time course expression profiles, clustering, gene ontology enrichment analysis, differential expression analysis and comparison of experiments. All visualizations are interactive and interconnected, such that the selection of genes in one module propagates instantly to visualizations in other modules. dictyExpress currently stores the data from over 800 Dictyostelium experiments and is embedded within a general-purpose software framework for management of next-generation sequencing data. dictyExpress allows users to explore their data in a broader context by reciprocal linking with dictyBase-a repository of Dictyostelium genomic data. In addition, we introduce a companion application called GenBoard, an intuitive graphic user interface for data management and bioinformatics analysis. dictyExpress and GenBoard enable broad adoption of next generation sequencing based inquiries by the Dictyostelium research community. Labs without the means to undertake deep sequencing projects can mine the data available to the public. The entire information flow, from raw sequence data to hypothesis testing, can be accomplished in an efficient workspace. The software framework is generalizable and represents a useful approach for any research community. To encourage more wide usage, the backend is open-source, available for extension and further development by bioinformaticians and data scientists.
Hoogestraat, Daniel R.; Abbott, April N.; SenGupta, Dhruba J.; Cummings, Lisa A.; Butler-Wu, Susan M.; Stephens, Karen; Cookson, Brad T.; Hoffman, Noah G.
2014-01-01
Some bacterial infections involve potentially complex mixtures of species that can now be distinguished using next-generation DNA sequencing. We present a case of mastoiditis where Gram stain, culture, and molecular diagnosis were nondiagnostic or discrepant. Next-generation sequencing implicated coinfection of Fusobacterium nucleatum and Actinomyces israelii, resolving these diagnostic discrepancies. PMID:24574281
Molecular Characterization of Transgenic Events Using Next Generation Sequencing Approach.
Guttikonda, Satish K; Marri, Pradeep; Mammadov, Jafar; Ye, Liang; Soe, Khaing; Richey, Kimberly; Cruse, James; Zhuang, Meibao; Gao, Zhifang; Evans, Clive; Rounsley, Steve; Kumpatla, Siva P
2016-01-01
Demand for the commercial use of genetically modified (GM) crops has been increasing in light of the projected growth of world population to nine billion by 2050. A prerequisite of paramount importance for regulatory submissions is the rigorous safety assessment of GM crops. One of the components of safety assessment is molecular characterization at DNA level which helps to determine the copy number, integrity and stability of a transgene; characterize the integration site within a host genome; and confirm the absence of vector DNA. Historically, molecular characterization has been carried out using Southern blot analysis coupled with Sanger sequencing. While this is a robust approach to characterize the transgenic crops, it is both time- and resource-consuming. The emergence of next-generation sequencing (NGS) technologies has provided highly sensitive and cost- and labor-effective alternative for molecular characterization compared to traditional Southern blot analysis. Herein, we have demonstrated the successful application of both whole genome sequencing and target capture sequencing approaches for the characterization of single and stacked transgenic events and compared the results and inferences with traditional method with respect to key criteria required for regulatory submissions.
USDA-ARS?s Scientific Manuscript database
Next Generation Sequencing is transforming the way scientists collect and measure an organism’s genetic background and gene dynamics, while bioinformatics and super-computing are merging to facilitate parallel sample computation and interpretation at unprecedented speeds. Analyzing the complete gene...
An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics
2010-01-01
Background Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. Description An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Conclusions Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms. PMID:21210976
An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics.
Taylor, Ronald C
2010-12-21
Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms.
2016-07-06
1 Targeted next-generation sequencing for the detection of ciprofloxacin resistance markers using molecular inversion probes Christopher P...development and evaluation of a panel of 44 single-stranded molecular inversion probes (MIPs) coupled to next-generation sequencing (NGS) for the...padlock and molecular inversion probes as upfront enrichment steps for use with NGS showed the specificity and multiplexability of these techniques
USDA-ARS?s Scientific Manuscript database
Next-generation sequencing technologies were used to rapidly and efficiently sequence the genome of the domestic turkey (Meleagris gallopavo). The current genome assembly (~1.1 Gb) includes 917 Mb of sequence assigned to chromosomes. Innate heterozygosity of the sequenced bird allowed discovery of...
Wymant, Chris; Colijn, Caroline; Danaviah, Siva; Essex, Max; Frost, Simon; Gall, Astrid; Gaseitsiwe, Simani; Grabowski, Mary K.; Gray, Ronald; Guindon, Stephane; von Haeseler, Arndt; Kaleebu, Pontiano; Kendall, Michelle; Kozlov, Alexey; Manasa, Justen; Minh, Bui Quang; Moyo, Sikhulile; Novitsky, Vlad; Nsubuga, Rebecca; Pillay, Sureshnee; Quinn, Thomas C.; Serwadda, David; Ssemwanga, Deogratius; Stamatakis, Alexandros; Trifinopoulos, Jana; Wawer, Maria; Brown, Andy Leigh; de Oliveira, Tulio; Kellam, Paul; Pillay, Deenan; Fraser, Christophe
2017-01-01
Abstract To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the “Phylogenetics and Networks for Generalised HIV Epidemics in Africa” consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n = 2,833; MRC/UVRI Uganda, n = 701; Mochudi Prevention Project, n = 359; Africa Health Research Institute Resistance Cohort, n = 92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3′ end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences (NGS) has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected. PMID:28540766
Ratmann, Oliver; Wymant, Chris; Colijn, Caroline; Danaviah, Siva; Essex, M; Frost, Simon D W; Gall, Astrid; Gaiseitsiwe, Simani; Grabowski, Mary; Gray, Ronald; Guindon, Stephane; von Haeseler, Arndt; Kaleebu, Pontiano; Kendall, Michelle; Kozlov, Alexey; Manasa, Justen; Minh, Bui Quang; Moyo, Sikhulile; Novitsky, Vladimir; Nsubuga, Rebecca; Pillay, Sureshnee; Quinn, Thomas C; Serwadda, David; Ssemwanga, Deogratius; Stamatakis, Alexandros; Trifinopoulos, Jana; Wawer, Maria; Leigh Brown, Andrew; de Oliveira, Tulio; Kellam, Paul; Pillay, Deenan; Fraser, Christophe
2017-05-25
To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the 'Phylogenetics and Networks for Generalised HIV Epidemics in Africa' consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n=2,833; MRC/UVRI Uganda, n=701; Mochudi Prevention Project, n=359; Africa Health Research Institute Resistance Cohort, n=92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3' end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.
Constructing DNA Barcode Sets Based on Particle Swarm Optimization.
Wang, Bin; Zheng, Xuedong; Zhou, Shihua; Zhou, Changjun; Wei, Xiaopeng; Zhang, Qiang; Wei, Ziqi
2018-01-01
Following the completion of the human genome project, a large amount of high-throughput bio-data was generated. To analyze these data, massively parallel sequencing, namely next-generation sequencing, was rapidly developed. DNA barcodes are used to identify the ownership between sequences and samples when they are attached at the beginning or end of sequencing reads. Constructing DNA barcode sets provides the candidate DNA barcodes for this application. To increase the accuracy of DNA barcode sets, a particle swarm optimization (PSO) algorithm has been modified and used to construct the DNA barcode sets in this paper. Compared with the extant results, some lower bounds of DNA barcode sets are improved. The results show that the proposed algorithm is effective in constructing DNA barcode sets.
DraGnET: Software for storing, managing and analyzing annotated draft genome sequence data
2010-01-01
Background New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioinformatic tools are available for the analysis and management of genome sequences, limitations still remain. For example, restrictions on the submission of data and use of these tools may be imposed, thereby making them unsuitable for sequencing projects that need to remain in-house or proprietary during their initial stages. Furthermore, the availability and use of next generation sequencing in industrial, governmental and academic environments requires biologist to have access to computational support for the curation and analysis of the data generated; however, this type of support is not always immediately available. Results To address these limitations, we have developed DraGnET (Draft Genome Evaluation Tool). DraGnET is an open source web application which allows researchers, with no experience in programming and database management, to setup their own in-house projects for storing, retrieving, organizing and managing annotated draft and complete genome sequence data. The software provides a web interface for the use of BLAST, allowing users to perform preliminary comparative analysis among multiple genomes. We demonstrate the utility of DraGnET for performing comparative genomics on closely related bacterial strains. Furthermore, DraGnET can be further developed to incorporate additional tools for more sophisticated analyses. Conclusions DraGnET is designed for use either by individual researchers or as a collaborative tool available through Internet (or Intranet) deployment. For genome projects that require genome sequencing data to initially remain proprietary, DraGnET provides the means for researchers to keep their data in-house for analysis using local programs or until it is made publicly available, at which point it may be uploaded to additional analysis software applications. The DraGnET home page is available at http://www.dragnet.cvm.iastate.edu and includes example files for examining the functionalities, a link for downloading the DraGnET setup package and a link to the DraGnET source code hosted with full documentation on SourceForge. PMID:20175920
USDA-ARS?s Scientific Manuscript database
Next-generation sequencing technology such as genotyping-by-sequencing (GBS) made low-cost, but often low-coverage, whole-genome sequencing widely available. Extensive inbreeding in crop plants provides an untapped, high quality source of phased haplotypes for imputing missing genotypes. We introduc...
USDA-ARS?s Scientific Manuscript database
Next-generation sequencing technologies are able to produce high-throughput short sequence reads in a cost-effective fashion. The emergence of these technologies has not only facilitated genome sequencing but also changed the landscape of life sciences. Here I survey their major applications ranging...
Next generation sequencers: methods and applications in food-borne pathogens
USDA-ARS?s Scientific Manuscript database
Next generation sequencers are able to produce millions of short sequence reads in a high-throughput, low-cost way. The emergence of these technologies has not only facilitated genome sequencing but also started to change the landscape of life sciences. This chapter will survey their methods and app...
CGAT: a model for immersive personalized training in computational genomics
Sims, David; Ponting, Chris P.
2016-01-01
How should the next generation of genomics scientists be trained while simultaneously pursuing high quality and diverse research? CGAT, the Computational Genomics Analysis and Training programme, was set up in 2010 by the UK Medical Research Council to complement its investment in next-generation sequencing capacity. CGAT was conceived around the twin goals of training future leaders in genome biology and medicine, and providing much needed capacity to UK science for analysing genome scale data sets. Here we outline the training programme employed by CGAT and describe how it dovetails with collaborative research projects to launch scientists on the road towards independent research careers in genomics. PMID:25981124
USDA-ARS?s Scientific Manuscript database
Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...
USDA-ARS?s Scientific Manuscript database
Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...
QuickNGS elevates Next-Generation Sequencing data analysis to a new level of automation.
Wagle, Prerana; Nikolić, Miloš; Frommolt, Peter
2015-07-01
Next-Generation Sequencing (NGS) has emerged as a widely used tool in molecular biology. While time and cost for the sequencing itself are decreasing, the analysis of the massive amounts of data remains challenging. Since multiple algorithmic approaches for the basic data analysis have been developed, there is now an increasing need to efficiently use these tools to obtain results in reasonable time. We have developed QuickNGS, a new workflow system for laboratories with the need to analyze data from multiple NGS projects at a time. QuickNGS takes advantage of parallel computing resources, a comprehensive back-end database, and a careful selection of previously published algorithmic approaches to build fully automated data analysis workflows. We demonstrate the efficiency of our new software by a comprehensive analysis of 10 RNA-Seq samples which we can finish in only a few minutes of hands-on time. The approach we have taken is suitable to process even much larger numbers of samples and multiple projects at a time. Our approach considerably reduces the barriers that still limit the usability of the powerful NGS technology and finally decreases the time to be spent before proceeding to further downstream analysis and interpretation of the data.
JVM: Java Visual Mapping tool for next generation sequencing read.
Yang, Ye; Liu, Juan
2015-01-01
We developed a program JVM (Java Visual Mapping) for mapping next generation sequencing read to reference sequence. The program is implemented in Java and is designed to deal with millions of short read generated by sequence alignment using the Illumina sequencing technology. It employs seed index strategy and octal encoding operations for sequence alignments. JVM is useful for DNA-Seq, RNA-Seq when dealing with single-end resequencing. JVM is a desktop application, which supports reads capacity from 1 MB to 10 GB.
Next Generation Sequencing at the University of Chicago Genomics Core
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faber, Pieter
2013-04-24
The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to State-of-the-Art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and micro-arrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.
Altimari, Annalisa; de Biase, Dario; De Maglio, Giovanna; Gruppioni, Elisa; Capizzi, Elisa; Degiovanni, Alessio; D’Errico, Antonia; Pession, Annalisa; Pizzolitto, Stefano; Fiorentino, Michelangelo; Tallini, Giovanni
2013-01-01
Detection of KRAS mutations in archival pathology samples is critical for therapeutic appropriateness of anti-EGFR monoclonal antibodies in colorectal cancer. We compared the sensitivity, specificity, and accuracy of Sanger sequencing, ARMS-Scorpion (TheraScreen®) real-time polymerase chain reaction (PCR), pyrosequencing, chip array hybridization, and 454 next-generation sequencing to assess KRAS codon 12 and 13 mutations in 60 nonconsecutive selected cases of colorectal cancer. Twenty of the 60 cases were detected as wild-type KRAS by all methods with 100% specificity. Among the 40 mutated cases, 13 were discrepant with at least one method. The sensitivity was 85%, 90%, 93%, and 92%, and the accuracy was 90%, 93%, 95%, and 95% for Sanger sequencing, TheraScreen real-time PCR, pyrosequencing, and chip array hybridization, respectively. The main limitation of Sanger sequencing was its low analytical sensitivity, whereas TheraScreen real-time PCR, pyrosequencing, and chip array hybridization showed higher sensitivity but suffered from the limitations of predesigned assays. Concordance between the methods was k = 0.79 for Sanger sequencing and k > 0.85 for the other techniques. Tumor cell enrichment correlated significantly with the abundance of KRAS-mutated deoxyribonucleic acid (DNA), evaluated as ΔCt for TheraScreen real-time PCR (P = 0.03), percentage of mutation for pyrosequencing (P = 0.001), ratio for chip array hybridization (P = 0.003), and percentage of mutation for 454 next-generation sequencing (P = 0.004). Also, 454 next-generation sequencing showed the best cross correlation for quantification of mutation abundance compared with all the other methods (P < 0.001). Our comparison showed the superiority of next-generation sequencing over the other techniques in terms of sensitivity and specificity. Next-generation sequencing will replace Sanger sequencing as the reference technique for diagnostic detection of KRAS mutation in archival tumor tissues. PMID:23950653
Comparison of next generation sequencing technologies for transcriptome characterization
2009-01-01
Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms. PMID:19646272
Next-generation sequencing for targeted discovery of rare mutations in rice
USDA-ARS?s Scientific Manuscript database
Advances in DNA sequencing (i.e., next-generation sequencing, NGS) have greatly increased the power and efficiency of detecting rare mutations in large mutant populations. Targeting Induced Local Lesions in Genomes (TILLING) is a reverse genetics approach for identifying gene mutations resulting fro...
Next-Generation Genomics Facility at C-CAMP: Accelerating Genomic Research in India
S, Chandana; Russiachand, Heikham; H, Pradeep; S, Shilpa; M, Ashwini; S, Sahana; B, Jayanth; Atla, Goutham; Jain, Smita; Arunkumar, Nandini; Gowda, Malali
2014-01-01
Next-Generation Sequencing (NGS; http://www.genome.gov/12513162) is a recent life-sciences technological revolution that allows scientists to decode genomes or transcriptomes at a much faster rate with a lower cost. Genomic-based studies are in a relatively slow pace in India due to the non-availability of genomics experts, trained personnel and dedicated service providers. Using NGS there is a lot of potential to study India's national diversity (of all kinds). We at the Centre for Cellular and Molecular Platforms (C-CAMP) have launched the Next Generation Genomics Facility (NGGF) to provide genomics service to scientists, to train researchers and also work on national and international genomic projects. We have HiSeq1000 from Illumina and GS-FLX Plus from Roche454. The long reads from GS FLX Plus, and high sequence depth from HiSeq1000, are the best and ideal hybrid approaches for de novo and re-sequencing of genomes and transcriptomes. At our facility, we have sequenced around 70 different organisms comprising of more than 388 genomes and 615 transcriptomes – prokaryotes and eukaryotes (fungi, plants and animals). In addition we have optimized other unique applications such as small RNA (miRNA, siRNA etc), long Mate-pair sequencing (2 to 20 Kb), Coding sequences (Exome), Methylome (ChIP-Seq), Restriction Mapping (RAD-Seq), Human Leukocyte Antigen (HLA) typing, mixed genomes (metagenomes) and target amplicons, etc. Translating DNA sequence data from NGS sequencer into meaningful information is an important exercise. Under NGGF, we have bioinformatics experts and high-end computing resources to dissect NGS data such as genome assembly and annotation, gene expression, target enrichment, variant calling (SSR or SNP), comparative analysis etc. Our services (sequencing and bioinformatics) have been utilized by more than 45 organizations (academia and industry) both within India and outside, resulting several publications in peer-reviewed journals and several genomic/transcriptomic data is available at NCBI.
The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes.
Adams, David J; Doran, Anthony G; Lilue, Jingtao; Keane, Thomas M
2015-10-01
The Mouse Genomes Project was initiated in 2009 with the goal of using next-generation sequencing technologies to catalogue molecular variation in the common laboratory mouse strains, and a selected set of wild-derived inbred strains. The initial sequencing and survey of sequence variation in 17 inbred strains was completed in 2011 and included comprehensive catalogue of single nucleotide polymorphisms, short insertion/deletions, larger structural variants including their fine scale architecture and landscape of transposable element variation, and genomic sites subject to post-transcriptional alteration of RNA. From this beginning, the resource has expanded significantly to include 36 fully sequenced inbred laboratory mouse strains, a refined and updated data processing pipeline, and new variation querying and data visualisation tools which are available on the project's website ( http://www.sanger.ac.uk/resources/mouse/genomes/ ). The focus of the project is now the completion of de novo assembled chromosome sequences and strain-specific gene structures for the core strains. We discuss how the assembled chromosomes will power comparative analysis, data access tools and future directions of mouse genetics.
Gürtler, Nicolas; Röthlisberger, Benno; Ludin, Katja; Schlegel, Christoph; Lalwani, Anil K
2017-07-01
Identification of the causative mutation using next-generation sequencing in autosomal-dominant hereditary hearing impairment, as mutation analysis in hereditary hearing impairment by classic genetic methods, is hindered by the high heterogeneity of the disease. Two Swiss families with autosomal-dominant hereditary hearing impairment. Amplified DNA libraries for next-generation sequencing were constructed from extracted genomic DNA, derived from peripheral blood, and enriched by a custom-made sequence capture library. Validated, pooled libraries were sequenced on an Illumina MiSeq instrument, 300 cycles and paired-end sequencing. Technical data analysis was performed with SeqMonk, variant analysis with GeneTalk or VariantStudio. The detection of mutations in genes related to hearing loss by next-generation sequencing was subsequently confirmed using specific polymerase-chain-reaction and Sanger sequencing. Mutation detection in hearing-loss-related genes. The first family harbored the mutation c.5383+5delGTGA in the TECTA-gene. In the second family, a novel mutation c.2614-2625delCATGGCGCCGTG in the WFS1-gene and a second mutation TCOF1-c.1028G>A were identified. Next-generation sequencing successfully identified the causative mutation in families with autosomal-dominant hereditary hearing impairment. The results helped to clarify the pathogenic role of a known mutation and led to the detection of a novel one. NGS represents a feasible approach with great potential future in the diagnostics of hereditary hearing impairment, even in smaller labs.
Campbell, Catherine
2018-01-22
Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Campbell, Catherine
Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Next generation sequencing provides rapid access to the genome of wheat stripe rust
USDA-ARS?s Scientific Manuscript database
Background: The wheat stripe rust fungus (Puccinia striiformis f. sp. tritici, PST) is responsible for significant yield losses in wheat production worldwide. In spite of its economic importance, the PST genomic sequence is not currently available. Fortunately Next Generation Sequencing (NGS) has ra...
The invention of new approaches to DNA sequencing commonly referred to as next generation sequencing technologies is revolutionizing the study of microbial diversity. In this chapter, we discuss the characterization of microbial population structures in recreational waters and p...
ERIC Educational Resources Information Center
Bowling, Bethany; Zimmer, Erin; Pyatt, Robert E.
2014-01-01
Although the development of next-generation (NextGen) sequencing technologies has revolutionized genomic research and medicine, the incorporation of these topics into the classroom is challenging, given an implied high degree of technical complexity. We developed an easy-to-implement, interactive classroom activity investigating the similarities…
CGAT: a model for immersive personalized training in computational genomics.
Sims, David; Ponting, Chris P; Heger, Andreas
2016-01-01
How should the next generation of genomics scientists be trained while simultaneously pursuing high quality and diverse research? CGAT, the Computational Genomics Analysis and Training programme, was set up in 2010 by the UK Medical Research Council to complement its investment in next-generation sequencing capacity. CGAT was conceived around the twin goals of training future leaders in genome biology and medicine, and providing much needed capacity to UK science for analysing genome scale data sets. Here we outline the training programme employed by CGAT and describe how it dovetails with collaborative research projects to launch scientists on the road towards independent research careers in genomics. © The Author 2015. Published by Oxford University Press.
Genome Sequencing of Steroid Producing Bacteria Using Ion Torrent Technology and a Reference Genome.
Sola-Landa, Alberto; Rodríguez-García, Antonio; Barreiro, Carlos; Pérez-Redondo, Rosario
2017-01-01
The Next-Generation Sequencing technology has enormously eased the bacterial genome sequencing and several tens of thousands of genomes have been sequenced during the last 10 years. Most of the genome projects are published as draft version, however, for certain applications the complete genome sequence is required.In this chapter, we describe the strategy that allowed the complete genome sequencing of Mycobacterium neoaurum NRRL B-3805, an industrial strain exploited for steroid production, using Ion Torrent sequencing reads and the genome of a close strain as the reference. This protocol can be applied to analyze the genetic variations between closely related strains; for example, to elucidate the point mutations between a parental strain and a random mutagenesis-derived mutant.
Genetically improved BarraCUDA.
Langdon, W B; Lam, Brian Yee Hong
2017-01-01
BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised using "Genetic Improvement". The genetically improved (GI) code is up to three times faster on short paired end reads from The 1000 Genomes Project and 60% more accurate on a short BioPlanet.com GCAT alignment benchmark. GPGPU BarraCUDA running on a single K80 Tesla GPU can align short paired end nextGen sequences up to ten times faster than bwa on a 12 core server. The speed up was such that the GI version was adopted and has been regularly downloaded from SourceForge for more than 12 months.
Caruccio, Nicholas
2011-01-01
DNA library preparation is a common entry point and bottleneck for next-generation sequencing. Current methods generally consist of distinct steps that often involve significant sample loss and hands-on time: DNA fragmentation, end-polishing, and adaptor-ligation. In vitro transposition with Nextera™ Transposomes simultaneously fragments and covalently tags the target DNA, thereby combining these three distinct steps into a single reaction. Platform-specific sequencing adaptors can be added, and the sample can be enriched and bar-coded using limited-cycle PCR to prepare di-tagged DNA fragment libraries. Nextera technology offers a streamlined, efficient, and high-throughput method for generating bar-coded libraries compatible with multiple next-generation sequencing platforms.
2013-10-01
proposal for the first phase of the project, gene expression and epigenetic alterations are to be analyzed by next generation sequencing. Laser captured... genes and also alternative splicing events which could provide valuable biomarkers for this project. We required RRBS libraries for the methylation...regulated compared to BP in both the benign tissue adjacent to cancer (BPC) and in prostate cancer. Interestingly, both of these genes can also be
Next-Generation Sequencing in the Mycology Lab.
Zoll, Jan; Snelders, Eveline; Verweij, Paul E; Melchers, Willem J G
New state-of-the-art techniques in sequencing offer valuable tools in both detection of mycobiota and in understanding of the molecular mechanisms of resistance against antifungal compounds and virulence. Introduction of new sequencing platform with enhanced capacity and a reduction in costs for sequence analysis provides a potential powerful tool in mycological diagnosis and research. In this review, we summarize the applications of next-generation sequencing techniques in mycology.
USDA-ARS?s Scientific Manuscript database
Advances in Next Generation Sequencing (NGS) allow for rapid development of genomics resources needed to generate molecular diagnostics assays for infectious agents. NGS approaches are particularly helpful for organisms that cannot be cultured, such as the downy mildew pathogens, a group of biotrop...
2013-01-01
Background Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly. Results Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies. Conclusions Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies. PMID:24564333
Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.
Doan, Ryan; Cohen, Noah D; Sawyer, Jason; Ghaffari, Noushin; Johnson, Charlie D; Dindot, Scott V
2012-02-17
The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.
Gao, M L; Zhong, X M; Ma, X; Ning, H J; Zhu, D; Zou, J Z
2016-06-02
To make genetic diagnosis of Alagille syndrome (ALGS) patients using target gene sequence capture and next generation sequencing technology. Target gene sequence capture and next generation sequencing were used to detect ALGS gene of 4 patients. They were hospitalized at the Affiliated Hospital, Capital Institute of Pediatrics between January 2014 and December 2015, referred to clinical diagnosis of ALGS typical and atypical respectively in 2 cases. Blood samples were collected from patients and their parents and genomic DNA was extracted from lymphocytes. Target gene sequence capture and next generation sequencing was detected. Sanger sequencing was used to confirm the results of the patients and their parents. Cholestasis, heart defects, inverted triangular face and butterfly vertebrae were presented as main clinical features in 4 male patients. The first hospital visiting ages ranged from 3 months and 14 days to 3 years and 1 month. The age of onset ranged from 3 days to 42 days (median 23 days). According to the clinical diagnostic criteria of ALGS, patient 1 and patient 2 were considered as typical ALGS. The other 2 patients were considered as atypical ALGS. Four Jagged 1(JAG1) pathogenic mutations were detected. Three different missense mutations were detected in patient 1 to patient 3 with ALGS(c.839C>T(p.W280X), c. 703G>A(p.R235X), c. 1720C>T(p.V574M)). The JAG1 mutation of patient 3 was first reported. Patient 4 had one novel insertion mutation (c.1779_1780insA(p.Ile594AsnfsTer23)). Parental analysis verified that the JAG1 missense mutation of 3 patients were de novo. The results of sanger sequencing was consistent with the results of the next generation sequencing. Target gene sequence capture combined with next generation sequencing can detect two pathogenic genes in ALGS and test genes of other related diseases in infantile cholestatic diseases simultaneously and presents a high throughput, high efficiency and low cost. It may provide molecular diagnosis and treatment for clinicians with good clinical application prospects.
Advanced Applications of Next-Generation Sequencing Technologies to Orchid Biology.
Yeh, Chuan-Ming; Liu, Zhong-Jian; Tsai, Wen-Chieh
2018-01-01
Next-generation sequencing technologies are revolutionizing biology by permitting, transcriptome sequencing, whole-genome sequencing and resequencing, and genome-wide single nucleotide polymorphism profiling. Orchid research has benefited from this breakthrough, and a few orchid genomes are now available; new biological questions can be approached and new breeding strategies can be designed. The first part of this review describes the unique features of orchid biology. The second part provides an overview of the current next-generation sequencing platforms, many of which are already used in plant laboratories. The third part summarizes the state of orchid transcriptome and genome sequencing and illustrates current achievements. The genetic sequences currently obtained will not only provide a broad scope for the study of orchid biology, but also serves as a starting point for uncovering the mystery of orchid evolution.
Human Y chromosome copy number variation in the next generation sequencing era and beyond.
Massaia, Andrea; Xue, Yali
2017-05-01
The human Y chromosome provides a fertile ground for structural rearrangements owing to its haploidy and high content of repeated sequences. The methodologies used for copy number variation (CNV) studies have developed over the years. Low-throughput techniques based on direct observation of rearrangements were developed early on, and are still used, often to complement array-based or sequencing approaches which have limited power in regions with high repeat content and specifically in the presence of long, identical repeats, such as those found in human sex chromosomes. Some specific rearrangements have been investigated for decades; because of their effects on fertility, or their outstanding evolutionary features, the interest in these has not diminished. However, following the flourishing of large-scale genomics, several studies have investigated CNVs across the whole chromosome. These studies sometimes employ data generated within large genomic projects such as the DDD study or the 1000 Genomes Project, and often survey large samples of healthy individuals without any prior selection. Novel technologies based on sequencing long molecules and combinations of technologies, promise to stimulate the study of Y-CNVs in the immediate future.
Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data
Kosugi, Shunichi; Natsume, Satoshi; Yoshida, Kentaro; MacLean, Daniel; Cano, Liliana; Kamoun, Sophien; Terauchi, Ryohei
2013-01-01
Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. PMID:24116042
Ai, Jing-Wen; Li, Yang; Cheng, Qi; Cui, Peng; Wu, Hong-Long; Xu, Bin; Zhang, Wen-Hong
2018-06-01
A 45-year-old man who complained of continuous fever and multiple hepatic masses was admitted to our hospital. Repeated MRI manifestations were similar while each radiological report suggested contradictory diagnosis pointing to infections or malignances respectively. Pathologic examination of the liver tissue showed no direct evidence of either infections or tumor. We performed next-generation sequencing on the liver tissue and peripheral blood to further investigate the possible etiology. High throughput sequencing was performed on the liver lesion tissues using BGISEQ-100 platform, and data was mapped to the Microbial Genome Databases after filtering low quality data and human reads. We identified a total of 299 sequencing reads of Mycobacterium tuberculosis (M. tuberculosis) complex sequences from the liver tissue, including 8, 229 of 4,424,435 of the M. tuberculosis nucleotide sequences, and Mycobacterium africanum, Mycobacterium bovis, and Mycobacterium canettii were also detected due to the 99.9% identical rate among these strains. No specific Mycobacterial tuberculosis nucleotide sequence was detected in the sample of peripheral blood. Patient's symptom quickly recovered after anti-tuberculosis treatment and repeated Ziehl-Neelsen staining of the liver tissue finally identified small numbers of positive bacillus. The diagnosis of this patient was difficult to establish before the next-generation sequencing because of contradictive radiological results and negative pathological findings. More sensitive diagnostic methods are urgently needed. This is the first case reporting hepatic tuberculosis confirmed by the next-generation sequencing, and marks the promising potential of the application of the next-generation sequencing in the diagnosis of hepatic lesions with unknown etiology. Copyright © 2018 Elsevier Masson SAS. All rights reserved.
USDA-ARS?s Scientific Manuscript database
Next generation sequencing (NGS) technology was used to analyze the occurrence of viruses in Sorghum almum plants in Florida exhibiting mosaic symptoms. Total RNA was extracted from symptomatic leaves and used as a template for cDNA library preparation. The resulting library was sequenced on an Illu...
Design of association studies with pooled or un-pooled next-generation sequencing data.
Kim, Su Yeon; Li, Yingrui; Guo, Yiran; Li, Ruiqiang; Holmkvist, Johan; Hansen, Torben; Pedersen, Oluf; Wang, Jun; Nielsen, Rasmus
2010-07-01
Most common hereditary diseases in humans are complex and multifactorial. Large-scale genome-wide association studies based on SNP genotyping have only identified a small fraction of the heritable variation of these diseases. One explanation may be that many rare variants (a minor allele frequency, MAF <5%), which are not included in the common genotyping platforms, may contribute substantially to the genetic variation of these diseases. Next-generation sequencing, which would allow the analysis of rare variants, is now becoming so cheap that it provides a viable alternative to SNP genotyping. In this paper, we present cost-effective protocols for using next-generation sequencing in association mapping studies based on pooled and un-pooled samples, and identify optimal designs with respect to total number of individuals, number of individuals per pool, and the sequencing coverage. We perform a small empirical study to evaluate the pooling variance in a realistic setting where pooling is combined with exon-capturing. To test for associations, we develop a likelihood ratio statistic that accounts for the high error rate of next-generation sequencing data. We also perform extensive simulations to determine the power and accuracy of this method. Overall, our findings suggest that with a fixed cost, sequencing many individuals at a more shallow depth with larger pool size achieves higher power than sequencing a small number of individuals in higher depth with smaller pool size, even in the presence of high error rates. Our results provide guidelines for researchers who are developing association mapping studies based on next-generation sequencing. (c) 2010 Wiley-Liss, Inc.
Wu, Wei; Lu, Chao-Xia; Wang, Yi-Ning; Liu, Fang; Chen, Wei; Liu, Yong-Tai; Han, Ye-Chen; Cao, Jian; Zhang, Shu-Yang; Zhang, Xue
2015-07-10
MYBPC3 dysfunctions have been proven to induce dilated cardiomyopathy, hypertrophic cardiomyopathy, and/or left ventricular noncompaction; however, the genotype-phenotype correlation between MYBPC3 and restrictive cardiomyopathy (RCM) has not been established. The newly developed next-generation sequencing method is capable of broad genomic DNA sequencing with high throughput and can help explore novel correlations between genetic variants and cardiomyopathies. A proband from a multigenerational family with 3 live patients and 1 unrelated patient with clinical diagnoses of RCM underwent a next-generation sequencing workflow based on a custom AmpliSeq panel, including 64 candidate pathogenic genes for cardiomyopathies, on the Ion Personal Genome Machine high-throughput sequencing benchtop instrument. The selected panel contained a total of 64 genes that were reportedly associated with inherited cardiomyopathies. All patients fulfilled strict criteria for RCM with clinical characteristics, echocardiography, and/or cardiac magnetic resonance findings. The multigenerational family with 3 adult RCM patients carried an identical nonsense MYBPC3 mutation, and the unrelated patient carried a missense mutation in the MYBPC3 gene. All of these results were confirmed by the Sanger sequencing method. This study demonstrated that MYBPC3 gene mutations, revealed by next-generation sequencing, were associated with familial and sporadic RCM patients. It is suggested that the next-generation sequencing platform with a selected panel provides a highly efficient approach for molecular diagnosis of hereditary and idiopathic RCM and helps build new genotype-phenotype correlations. © 2015 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley Blackwell.
Detection of Bacillus anthracis DNA in Complex Soil and Air Samples Using Next-Generation Sequencing
Be, Nicholas A.; Thissen, James B.; Gardner, Shea N.; McLoughlin, Kevin S.; Fofanov, Viacheslav Y.; Koshinsky, Heather; Ellingson, Sally R.; Brettin, Thomas S.; Jackson, Paul J.; Jaing, Crystal J.
2013-01-01
Bacillus anthracis is the potentially lethal etiologic agent of anthrax disease, and is a significant concern in the realm of biodefense. One of the cornerstones of an effective biodefense strategy is the ability to detect infectious agents with a high degree of sensitivity and specificity in the context of a complex sample background. The nature of the B. anthracis genome, however, renders specific detection difficult, due to close homology with B. cereus and B. thuringiensis. We therefore elected to determine the efficacy of next-generation sequencing analysis and microarrays for detection of B. anthracis in an environmental background. We applied next-generation sequencing to titrated genome copy numbers of B. anthracis in the presence of background nucleic acid extracted from aerosol and soil samples. We found next-generation sequencing to be capable of detecting as few as 10 genomic equivalents of B. anthracis DNA per nanogram of background nucleic acid. Detection was accomplished by mapping reads to either a defined subset of reference genomes or to the full GenBank database. Moreover, sequence data obtained from B. anthracis could be reliably distinguished from sequence data mapping to either B. cereus or B. thuringiensis. We also demonstrated the efficacy of a microbial census microarray in detecting B. anthracis in the same samples, representing a cost-effective and high-throughput approach, complementary to next-generation sequencing. Our results, in combination with the capacity of sequencing for providing insights into the genomic characteristics of complex and novel organisms, suggest that these platforms should be considered important components of a biosurveillance strategy. PMID:24039948
Software for pre-processing Illumina next-generation sequencing short read sequences
2014-01-01
Background When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157 H7. Results Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacteria genomic short read sequences with the focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference-based assembly as measured by assembly contiguity and correctness. Conclusions Trimming of short read sequences can improve the quality of de novo and reference-based assembly and assembler performance. The parallel processing capability of ngsShoRT reduces trimming time and improves the memory efficiency when dealing with large datasets. We recommend combining sequencing artifacts removal, and quality score based read filtering and base trimming as the most consistent method for improving sequence quality and downstream assemblies. ngsShoRT source code, user guide and tutorial are available at http://research.bioinformatics.udel.edu/genomics/ngsShoRT/. ngsShoRT can be incorporated as a pre-processing step in genome and transcriptome assembly projects. PMID:24955109
Fu, Xiaona; Liu, Aijie; Yang, Haipo; Wei, Cuijie; Ding, Juan; Wang, Shuang; Wang, Jingmin; Yuan, Yun; Jiang, Yuwu; Xiong, Hui
2015-10-01
To elucidate the usefulness of next generation sequencing for diagnosis of inherited myopathy, and to analyze the relevance between clinical phenotype and genotype in inherited myopathy. Related genes were selected for SureSelect target enrichment system kit (Panel Version 1 and Panel Version 2). A total of 134 patients who were diagnosed as inherited myopathy clinically underwent next generation sequencing in Department of Pediatrics, Peking University First Hospital from January 2013 to June 2014. Clinical information and gene detection result of the patients were collected and analyzed. Seventy-seven of 134 patients (89 males and 45 females, visiting ages from 6-month-old to 26-year-old, average visiting age was 6 years and 1 month) underwent next generation sequencing by Panel Version 1 in 2013, and 57 patients underwent next generation sequencing by Panel Version 2 in 2014. The gene detection revealed that 74 patients had pathogenic gene mutations, and the positive rate of genetic diagnosis was 55.22%. One patient was diagnosed as metabolic myopathy. Five patients were diagnosed as congenital myopathy; 68 were diagnosed as muscular dystrophy, including 22 with congenital muscular dystrophy 1A (MDC1A), 11 with Ullrich congenital muscular dystrophy (UCMD), 6 with Bethlem myopathy (BM), 12 with Duchenne muscular dystrophy (DMD) caused by point mutations in DMD gene, 5 with LMNA-related congenital muscular dystrophy (L-CMD), 1 with Emery-Dreifuss muscular dystrophy (EDMD), 7 with alpha-dystroglycanopathy (α-DG) patients, and 4 with limb-girdle muscular dystrophy (LGMD) patients. Next generation sequencing plays an important role in diagnosis of inherited myopathy. Clinical and biological information analysis was essential for screening pathogenic gene of inherited myopathy.
Y and W Chromosome Assemblies: Approaches and Discoveries.
Tomaszkiewicz, Marta; Medvedev, Paul; Makova, Kateryna D
2017-04-01
Hundreds of vertebrate genomes have been sequenced and assembled to date. However, most sequencing projects have ignored the sex chromosomes unique to the heterogametic sex - Y and W - that are known as sex-limited chromosomes (SLCs). Indeed, haploid and repetitive Y chromosomes in species with male heterogamety (XY), and W chromosomes in species with female heterogamety (ZW), are difficult to sequence and assemble. Nevertheless, obtaining their sequences is important for understanding the intricacies of vertebrate genome function and evolution. Recent progress has been made towards the adaptation of next-generation sequencing (NGS) techniques to deciphering SLC sequences. We review here currently available methodology and results with regard to SLC sequencing and assembly. We focus on vertebrates, but bring in some examples from other taxa. Copyright © 2017 Elsevier Ltd. All rights reserved.
A Concise Atlas of Thyroid Cancer Next-Generation Sequencing Panel ThyroSeq v.2
Alsina, Jorge; Alsina, Raul; Gulec, Seza
2017-01-01
The next-generation sequencing technology allows high out-put genomic analysis. An innovative assay in thyroid cancer, ThyroSeq® was developed for targeted mutation detection by next generation sequencing technology in fine needle aspiration and tissue samples. ThyroSeq v.2 next generation sequencing panel offers simultaneous sequencing and detection in >1000 hotspots of 14 thyroid cancer-related genes and for 42 types of gene fusions known to occur in thyroid cancer. ThyroSeq is being increasingly used to further narrow the indeterminate category defined by cytology for thyroid nodules. From a surgical perspective, genomic profiling also provides prognostic and predictive information and closely relates to determination of surgical strategy. Both the genomic analysis technology and the informatics for the cancer genome data base are rapidly developing. In this paper, we have gathered existing information on the thyroid cancer-related genes involved in the initiation and progression of thyroid cancer. Our goal is to assemble a glossary for the current ThyroSeq genomic panel that can help elucidate the role genomics play in thyroid cancer oncogenesis. PMID:28117295
Detection of a divergent variant of grapevine virus F by next-generation sequencing.
Molenaar, Nicholas; Burger, Johan T; Maree, Hans J
2015-08-01
The complete genome sequence of a South African isolate of grapevine virus F (GVF) is presented. It was first detected by metagenomic next-generation sequencing of field samples and validated through direct Sanger sequencing. The genome sequence of GVF isolate V5 consists of 7539 nucleotides and contains a poly(A) tail. It has a typical vitivirus genome arrangement that comprises five open reading frames (ORFs), which share only 88.96 % nucleotide sequence identity with the existing complete GVF genome sequence (JX105428).
Hadoop-BAM: directly manipulating next generation sequencing data in the cloud
Niemenmaa, Matti; Kallio, Aleksi; Schumacher, André; Klemelä, Petri; Korpelainen, Eija; Heljanko, Keijo
2012-01-01
Summary: Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps. Availability: Available under the open-source MIT license at http://sourceforge.net/projects/hadoop-bam/ Contact: matti.niemenmaa@aalto.fi Supplementary information: Supplementary material is available at Bioinformatics online. PMID:22302568
Uribe-Convers, Simon; Duke, Justin R.; Moore, Michael J.; Tank, David C.
2014-01-01
• Premise of the study: We present an alternative approach for molecular systematic studies that combines long PCR and next-generation sequencing. Our approach can be used to generate templates from any DNA source for next-generation sequencing. Here we test our approach by amplifying complete chloroplast genomes, and we present a set of 58 potentially universal primers for angiosperms to do so. Additionally, this approach is likely to be particularly useful for nuclear and mitochondrial regions. • Methods and Results: Chloroplast genomes of 30 species across angiosperms were amplified to test our approach. Amplification success varied depending on whether PCR conditions were optimized for a given taxon. To further test our approach, some amplicons were sequenced on an Illumina HiSeq 2000. • Conclusions: Although here we tested this approach by sequencing plastomes, long PCR amplicons could be generated using DNA from any genome, expanding the possibilities of this approach for molecular systematic studies. PMID:25202592
2013-01-01
Background The revolution in DNA sequencing technology continues unabated, and is affecting all aspects of the biological and medical sciences. The training and recruitment of the next generation of researchers who are able to use and exploit the new technology is severely lacking and potentially negatively influencing research and development efforts to advance genome biology. Here we present a cross-disciplinary course that provides undergraduate students with practical experience in running a next generation sequencing instrument through to the analysis and annotation of the generated DNA sequences. Results Many labs across world are installing next generation sequencing technology and we show that the undergraduate students produce quality sequence data and were excited to participate in cutting edge research. The students conducted the work flow from DNA extraction, library preparation, running the sequencing instrument, to the extraction and analysis of the data. They sequenced microbes, metagenomes, and a marine mammal, the Californian sea lion, Zalophus californianus. The students met sequencing quality controls, had no detectable contamination in the targeted DNA sequences, provided publication quality data, and became part of an international collaboration to investigate carcinomas in carnivores. Conclusions Students learned important skills for their future education and career opportunities, and a perceived increase in students’ ability to conduct independent scientific research was measured. DNA sequencing is rapidly expanding in the life sciences. Teaching undergraduates to use the latest technology to sequence genomic DNA ensures they are ready to meet the challenges of the genomic era and allows them to participate in annotating the tree of life. PMID:24007365
VAMPS: a website for visualization and analysis of microbial population structures.
Huse, Susan M; Mark Welch, David B; Voorhis, Andy; Shipunova, Anna; Morrison, Hilary G; Eren, A Murat; Sogin, Mitchell L
2014-02-05
The advent of next-generation DNA sequencing platforms has revolutionized molecular microbial ecology by making the detailed analysis of complex communities over time and space a tractable research pursuit for small research groups. However, the ability to generate 10⁵-10⁸ reads with relative ease brings with it many downstream complications. Beyond the computational resources and skills needed to process and analyze data, it is difficult to compare datasets in an intuitive and interactive manner that leads to hypothesis generation and testing. We developed the free web service VAMPS (Visualization and Analysis of Microbial Population Structures, http://vamps.mbl.edu) to address these challenges and to facilitate research by individuals or collaborating groups working on projects with large-scale sequencing data. Users can upload marker gene sequences and associated metadata; reads are quality filtered and assigned to both taxonomic structures and to taxonomy-independent clusters. A simple point-and-click interface allows users to select for analysis any combination of their own or their collaborators' private data and data from public projects, filter these by their choice of taxonomic and/or abundance criteria, and then explore these data using a wide range of analytic methods and visualizations. Each result is extensively hyperlinked to other analysis and visualization options, promoting data exploration and leading to a greater understanding of data relationships. VAMPS allows researchers using marker gene sequence data to analyze the diversity of microbial communities and the relationships between communities, to explore these analyses in an intuitive visual context, and to download data, results, and images for publication. VAMPS obviates the need for individual research groups to make the considerable investment in computational infrastructure and bioinformatic support otherwise necessary to process, analyze, and interpret massive amounts of next-generation sequence data. Any web-capable device can be used to upload, process, explore, and extract data and results from VAMPS. VAMPS encourages researchers to share sequence and metadata, and fosters collaboration between researchers of disparate biomes who recognize common patterns in shared data.
A vertebrate case study of the quality of assemblies derived from next-generation sequences
2011-01-01
The unparalleled efficiency of next-generation sequencing (NGS) has prompted widespread adoption, but significant problems remain in the use of NGS data for whole genome assembly. We explore the advantages and disadvantages of chicken genome assemblies generated using a variety of sequencing and assembly methodologies. NGS assemblies are equivalent in some ways to a Sanger-based assembly yet deficient in others. Nonetheless, these assemblies are sufficient for the identification of the majority of genes and can reveal novel sequences when compared to existing assembly references. PMID:21453517
2014-01-01
Background Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users. Results Here we present an ‘A to Z’ protocol for obtaining complete human mitochondrial (mtDNA) genomes – from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling). Conclusions All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual ‘modules’ can be swapped out to suit available resources. PMID:24460871
Historical Perspective, Development and Applications of Next-Generation Sequencing in Plant Virology
Barba, Marina; Czosnek, Henryk; Hadidi, Ahmed
2014-01-01
Next-generation high throughput sequencing technologies became available at the onset of the 21st century. They provide a highly efficient, rapid, and low cost DNA sequencing platform beyond the reach of the standard and traditional DNA sequencing technologies developed in the late 1970s. They are continually improved to become faster, more efficient and cheaper. They have been used in many fields of biology since 2004. In 2009, next-generation sequencing (NGS) technologies began to be applied to several areas of plant virology including virus/viroid genome sequencing, discovery and detection, ecology and epidemiology, replication and transcription. Identification and characterization of known and unknown viruses and/or viroids in infected plants are currently among the most successful applications of these technologies. It is expected that NGS will play very significant roles in many research and non-research areas of plant virology. PMID:24399207
Next generation sequencing applications for microRNA biomarker discovery in toxicological studies
Next Generation Sequencing (NGS) technology will be reviewed for its base pair resolution, wide dynamic range, and insights into the genome and transcriptome, with special focus upon the biomarker potential of microRNAs (miRNAs). The first part of this presentation reviews commo...
SIMBA: a web tool for managing bacterial genome assembly generated by Ion PGM sequencing technology.
Mariano, Diego C B; Pereira, Felipe L; Aguiar, Edgar L; Oliveira, Letícia C; Benevides, Leandro; Guimarães, Luís C; Folador, Edson L; Sousa, Thiago J; Ghosh, Preetam; Barh, Debmalya; Figueiredo, Henrique C P; Silva, Artur; Ramos, Rommel T J; Azevedo, Vasco A C
2016-12-15
The evolution of Next-Generation Sequencing (NGS) has considerably reduced the cost per sequenced-base, allowing a significant rise of sequencing projects, mainly in prokaryotes. However, the range of available NGS platforms requires different strategies and software to correctly assemble genomes. Different strategies are necessary to properly complete an assembly project, in addition to the installation or modification of various software. This requires users to have significant expertise in these software and command line scripting experience on Unix platforms, besides possessing the basic expertise on methodologies and techniques for genome assembly. These difficulties often delay the complete genome assembly projects. In order to overcome this, we developed SIMBA (SImple Manager for Bacterial Assemblies), a freely available web tool that integrates several component tools for assembling and finishing bacterial genomes. SIMBA provides a friendly and intuitive user interface so bioinformaticians, even with low computational expertise, can work under a centralized administrative control system of assemblies managed by the assembly center head. SIMBA guides the users to execute assembly process through simple and interactive pages. SIMBA workflow was divided in three modules: (i) projects: allows a general vision of genome sequencing projects, in addition to data quality analysis and data format conversions; (ii) assemblies: allows de novo assemblies with the software Mira, Minia, Newbler and SPAdes, also assembly quality validations using QUAST software; and (iii) curation: presents methods to finishing assemblies through tools for scaffolding contigs and close gaps. We also presented a case study that validated the efficacy of SIMBA to manage bacterial assemblies projects sequenced using Ion Torrent PGM. Besides to be a web tool for genome assembly, SIMBA is a complete genome assemblies project management system, which can be useful for managing of several projects in laboratories. SIMBA source code is available to download and install in local webservers at http://ufmg-simba.sourceforge.net .
Whole-genome sequencing for comparative genomics and de novo genome assembly.
Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C
2015-01-01
Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).
Copy number variation of individual cattle genomes using next-generation sequencing
USDA-ARS?s Scientific Manuscript database
Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one ...
Individualized cattle copy number and segmental duplication maps using next generation sequencing
USDA-ARS?s Scientific Manuscript database
Copy Number Variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one ...
Copy number variation of individual cattle genomes using next-generation sequencing
USDA-ARS?s Scientific Manuscript database
Copy Number Variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often difficult to track. Using a read depth approach based on next generation sequencing, we examined genome-wide copy number differences among five taurine (three Angu...
Practical applications of next-generation sequencing for food-safety research
USDA-ARS?s Scientific Manuscript database
Next-generation sequencing (NGS) is a transformative technology that is revolutionizing the biological sciences. However, many researchers remain uncertain as to the best ways to harness the power of NGS and apply it to their own research questions. Here we highlight three case studies of how NGS ...
Using next generation sequencing for multiplexed trait-linked markers in wheat
USDA-ARS?s Scientific Manuscript database
With the advent of next generation sequencing (NGS) technologies, single nucleotide polymorphisms (SNPs) have become the major type of marker for genotyping in many crops. However, the availability of SNP markers for important traits of bread wheat (Triticum aestivum L.) that can be effectively used...
Early detection of non-native fishes using next-generation DNA sequencing of fish larvae
Our objective was to evaluate the use of fish larvae for early detection of non-native fishes, comparing traditional and molecular taxonomy based on next-generation DNA sequencing to investigate potential efficiencies. Our approach was to intensively sample a Great Lakes non-nati...
The role of next generation sequencing for the development and testing of veterinary biologics
USDA-ARS?s Scientific Manuscript database
Next generation sequencing technology has become widely available and it offers many new opportunities in vaccine technology. Both human and veterinary medicine has numerous examples of adventitious agents being found in live vaccines. In veterinary medicine a continuing trend is the use of viral ...
Connecting the Human Variome Project to nutrigenomics.
Kaput, Jim; Evelo, Chris T; Perozzi, Giuditta; van Ommen, Ben; Cotton, Richard
2010-12-01
Nutrigenomics is the science of analyzing and understanding gene-nutrient interactions, which because of the genetic heterogeneity, varying degrees of interaction among gene products, and the environmental diversity is a complex science. Although much knowledge of human diversity has been accumulated, estimates suggest that ~90% of genetic variation has not yet been characterized. Identification of the DNA sequence variants that contribute to nutrition-related disease risk is essential for developing a better understanding of the complex causes of disease in humans, including nutrition-related disease. The Human Variome Project (HVP; http://www.humanvariomeproject.org/) is an international effort to systematically identify genes, their mutations, and their variants associated with phenotypic variability and indications of human disease or phenotype. Since nutrigenomic research uses genetic information in the design and analysis of experiments, the HVP is an essential collaborator for ongoing studies of gene-nutrient interactions. With the advent of next generation sequencing methodologies and the understanding of the undiscovered variation in human genomes, the nutrigenomic community will be generating novel sequence data and results. The guidelines and practices of the HVP can guide and harmonize these efforts.
Connecting the Human Variome Project to nutrigenomics
Evelo, Chris T.; Perozzi, Giuditta; van Ommen, Ben; Cotton, Richard
2010-01-01
Nutrigenomics is the science of analyzing and understanding gene–nutrient interactions, which because of the genetic heterogeneity, varying degrees of interaction among gene products, and the environmental diversity is a complex science. Although much knowledge of human diversity has been accumulated, estimates suggest that ~90% of genetic variation has not yet been characterized. Identification of the DNA sequence variants that contribute to nutrition-related disease risk is essential for developing a better understanding of the complex causes of disease in humans, including nutrition-related disease. The Human Variome Project (HVP; http://www.humanvariomeproject.org/) is an international effort to systematically identify genes, their mutations, and their variants associated with phenotypic variability and indications of human disease or phenotype. Since nutrigenomic research uses genetic information in the design and analysis of experiments, the HVP is an essential collaborator for ongoing studies of gene–nutrient interactions. With the advent of next generation sequencing methodologies and the understanding of the undiscovered variation in human genomes, the nutrigenomic community will be generating novel sequence data and results. The guidelines and practices of the HVP can guide and harmonize these efforts. PMID:28300226
Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay
2013-01-01
Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms, however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E.coli (4.6 MB) S.kudriavzevii (11.18 MB) and C.elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.
2013-01-01
Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts as well as a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole genome-, de novo- and exome sequencing, targeted resequencing, SNP discovery, RNASeq, and methylation analysis. There are over 300 projects that utilize UPPNEX and include large undertakings such as the sequencing of the flycatcher and Norwegian spruce. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, allocating resources, and illustrate major challenges such as managing data growth. We conclude with summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made. PMID:23800020
CANEapp: a user-friendly application for automated next generation transcriptomic data analysis.
Velmeshev, Dmitry; Lally, Patrick; Magistri, Marco; Faghihi, Mohammad Ali
2016-01-13
Next generation sequencing (NGS) technologies are indispensable for molecular biology research, but data analysis represents the bottleneck in their application. Users need to be familiar with computer terminal commands, the Linux environment, and various software tools and scripts. Analysis workflows have to be optimized and experimentally validated to extract biologically meaningful data. Moreover, as larger datasets are being generated, their analysis requires use of high-performance servers. To address these needs, we developed CANEapp (application for Comprehensive automated Analysis of Next-generation sequencing Experiments), a unique suite that combines a Graphical User Interface (GUI) and an automated server-side analysis pipeline that is platform-independent, making it suitable for any server architecture. The GUI runs on a PC or Mac and seamlessly connects to the server to provide full GUI control of RNA-sequencing (RNA-seq) project analysis. The server-side analysis pipeline contains a framework that is implemented on a Linux server through completely automated installation of software components and reference files. Analysis with CANEapp is also fully automated and performs differential gene expression analysis and novel noncoding RNA discovery through alternative workflows (Cuffdiff and R packages edgeR and DESeq2). We compared CANEapp to other similar tools, and it significantly improves on previous developments. We experimentally validated CANEapp's performance by applying it to data derived from different experimental paradigms and confirming the results with quantitative real-time PCR (qRT-PCR). CANEapp adapts to any server architecture by effectively using available resources and thus handles large amounts of data efficiently. CANEapp performance has been experimentally validated on various biological datasets. CANEapp is available free of charge at http://psychiatry.med.miami.edu/research/laboratory-of-translational-rna-genomics/CANE-app . We believe that CANEapp will serve both biologists with no computational experience and bioinformaticians as a simple, timesaving but accurate and powerful tool to analyze large RNA-seq datasets and will provide foundations for future development of integrated and automated high-throughput genomics data analysis tools. Due to its inherently standardized pipeline and combination of automated analysis and platform-independence, CANEapp is an ideal for large-scale collaborative RNA-seq projects between different institutions and research groups.
USDA-ARS?s Scientific Manuscript database
Early stage infections caused by fungal/oomycete spores can remain undetected until signs or symptoms develop. Serological and molecular techniques are currently used for detecting these pathogens. Next-generation sequencing (NGS) has potential as a diagnostic tool, due to the capacity to target mul...
USDA-ARS?s Scientific Manuscript database
Using next-generation-sequencing technology to assess entire transcriptomes requires high quality starting RNA. Currently, RNA quality is routinely judged using automated microfluidic gel electrophoresis platforms and associated algorithms. Here we report that such automated methods generate false-n...
Targeted enrichment strategies for next-generation plant biology
Richard Cronn; Brian J. Knaus; Aaron Liston; Peter J. Maughan; Matthew Parks; John V. Syring; Joshua Udall
2012-01-01
The dramatic advances offered by modem DNA sequencers continue to redefine the limits of what can be accomplished in comparative plant biology. Even with recent achievements, however, plant genomes present obstacles that can make it difficult to execute large-scale population and phylogenetic studies on next-generation sequencing platforms. Factors like large genome...
Shen, Li; Shao, Ningyi; Liu, Xiaochuan; Nestler, Eric
2014-04-15
Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. We have developed ngs.plot - a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.
2014-01-01
Background Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. Results We have developed ngs.plot – a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. Conclusions We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data. PMID:24735413
Fumagalli, Caterina; Vacirca, Davide; Rappa, Alessandra; Passaro, Antonio; Guarize, Juliana; Rafaniello Raviele, Paola; de Marinis, Filippo; Spaggiari, Lorenzo; Casadio, Chiara; Viale, Giuseppe; Barberis, Massimo; Guerini-Rocco, Elena
2018-03-13
Molecular profiling of advanced non-small cell lung cancers (NSCLC) is essential to identify patients who may benefit from targeted treatments. In the last years, the number of potentially actionable molecular alterations has rapidly increased. Next-generation sequencing allows for the analysis of multiple genes simultaneously. To evaluate the feasibility and the throughput of next-generation sequencing in clinical molecular diagnostics of advanced NSCLC. A single-institution cohort of 535 non-squamous NSCLC was profiled using a next-generation sequencing panel targeting 22 actionable and cancer-related genes. 441 non-squamous NSCLC (82.4%) harboured at least one gene alteration, including 340 cases (63.6%) with clinically relevant molecular aberrations. Mutations have been detected in all but one gene ( FGFR1 ) of the panel. Recurrent alterations were observed in KRAS , TP53 , EGFR , STK11 and MET genes, whereas the remaining genes were mutated in <5% of the cases. Concurrent mutations were detected in 183 tumours (34.2%), mostly impairing KRAS or EGFR in association with TP53 alterations. The study highlights the feasibility of targeted next-generation sequencing in clinical setting. The majority of NSCLC harboured mutations in clinically relevant genes, thus identifying patients who might benefit from different targeted therapies. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Next-generation sequencing for endocrine cancers: Recent advances and challenges.
Suresh, Padmanaban S; Venkatesh, Thejaswini; Tsutsumi, Rie; Shetty, Abhishek
2017-05-01
Contemporary molecular biology research tools have enriched numerous areas of biomedical research that address challenging diseases, including endocrine cancers (pituitary, thyroid, parathyroid, adrenal, testicular, ovarian, and neuroendocrine cancers). These tools have placed several intriguing clues before the scientific community. Endocrine cancers pose a major challenge in health care and research despite considerable attempts by researchers to understand their etiology. Microarray analyses have provided gene signatures from many cells, tissues, and organs that can differentiate healthy states from diseased ones, and even show patterns that correlate with stages of a disease. Microarray data can also elucidate the responses of endocrine tumors to therapeutic treatments. The rapid progress in next-generation sequencing methods has overcome many of the initial challenges of these technologies, and their advantages over microarray techniques have enabled them to emerge as valuable aids for clinical research applications (prognosis, identification of drug targets, etc.). A comprehensive review describing the recent advances in next-generation sequencing methods and their application in the evaluation of endocrine and endocrine-related cancers is lacking. The main purpose of this review is to illustrate the concepts that collectively constitute our current view of the possibilities offered by next-generation sequencing technological platforms, challenges to relevant applications, and perspectives on the future of clinical genetic testing of patients with endocrine tumors. We focus on recent discoveries in the use of next-generation sequencing methods for clinical diagnosis of endocrine tumors in patients and conclude with a discussion on persisting challenges and future objectives.
Droplet Digital™ PCR Next-Generation Sequencing Library QC Assay.
Heredia, Nicholas J
2018-01-01
Digital PCR is a valuable tool to quantify next-generation sequencing (NGS) libraries precisely and accurately. Accurately quantifying NGS libraries enable accurate loading of the libraries on to the sequencer and thus improve sequencing performance by reducing under and overloading error. Accurate quantification also benefits users by enabling uniform loading of indexed/barcoded libraries which in turn greatly improves sequencing uniformity of the indexed/barcoded samples. The advantages gained by employing the Droplet Digital PCR (ddPCR™) library QC assay includes the precise and accurate quantification in addition to size quality assessment, enabling users to QC their sequencing libraries with confidence.
Zhang, Ran; Yin, Yinliang; Zhang, Yujun; Li, Kexin; Zhu, Hongxia; Gong, Qin; Wang, Jianwu; Hu, Xiaoxiang; Li, Ning
2012-01-01
As the number of transgenic livestock increases, reliable detection and molecular characterization of transgene integration sites and copy number are crucial not only for interpreting the relationship between the integration site and the specific phenotype but also for commercial and economic demands. However, the ability of conventional PCR techniques to detect incomplete and multiple integration events is limited, making it technically challenging to characterize transgenes. Next-generation sequencing has enabled cost-effective, routine and widespread high-throughput genomic analysis. Here, we demonstrate the use of next-generation sequencing to extensively characterize cattle harboring a 150-kb human lactoferrin transgene that was initially analyzed by chromosome walking without success. Using this approach, the sites upstream and downstream of the target gene integration site in the host genome were identified at the single nucleotide level. The sequencing result was verified by event-specific PCR for the integration sites and FISH for the chromosomal location. Sequencing depth analysis revealed that multiple copies of the incomplete target gene and the vector backbone were present in the host genome. Upon integration, complex recombination was also observed between the target gene and the vector backbone. These findings indicate that next-generation sequencing is a reliable and accurate approach for the molecular characterization of the transgene sequence, integration sites and copy number in transgenic species. PMID:23185606
Ordulu, Zehra; Wong, Kristen E; Currall, Benjamin B; Ivanov, Andrew R; Pereira, Shahrin; Althari, Sara; Gusella, James F; Talkowski, Michael E; Morton, Cynthia C
2014-05-01
With recent rapid advances in genomic technologies, precise delineation of structural chromosome rearrangements at the nucleotide level is becoming increasingly feasible. In this era of "next-generation cytogenetics" (i.e., an integration of traditional cytogenetic techniques and next-generation sequencing), a consensus nomenclature is essential for accurate communication and data sharing. Currently, nomenclature for describing the sequencing data of these aberrations is lacking. Herein, we present a system called Next-Gen Cytogenetic Nomenclature, which is concordant with the International System for Human Cytogenetic Nomenclature (2013). This system starts with the alignment of rearrangement sequences by BLAT or BLAST (alignment tools) and arrives at a concise and detailed description of chromosomal changes. To facilitate usage and implementation of this nomenclature, we are developing a program designated BLA(S)T Output Sequence Tool of Nomenclature (BOSToN), a demonstrative version of which is accessible online. A standardized characterization of structural chromosomal rearrangements is essential both for research analyses and for application in the clinical setting. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Real-Time Optimization and Control of Next-Generation Distribution
Infrastructure | Grid Modernization | NREL Real-Time Optimization and Control of Next -Generation Distribution Infrastructure Real-Time Optimization and Control of Next-Generation Distribution Infrastructure This project develops innovative, real-time optimization and control methods for next-generation
ParticleCall: A particle filter for base calling in next-generation sequencing systems
2012-01-01
Background Next-generation sequencing systems are capable of rapid and cost-effective DNA sequencing, thus enabling routine sequencing tasks and taking us one step closer to personalized medicine. Accuracy and lengths of their reads, however, are yet to surpass those provided by the conventional Sanger sequencing method. This motivates the search for computationally efficient algorithms capable of reliable and accurate detection of the order of nucleotides in short DNA fragments from the acquired data. Results In this paper, we consider Illumina’s sequencing-by-synthesis platform which relies on reversible terminator chemistry and describe the acquired signal by reformulating its mathematical model as a Hidden Markov Model. Relying on this model and sequential Monte Carlo methods, we develop a parameter estimation and base calling scheme called ParticleCall. ParticleCall is tested on a data set obtained by sequencing phiX174 bacteriophage using Illumina’s Genome Analyzer II. The results show that the developed base calling scheme is significantly more computationally efficient than the best performing unsupervised method currently available, while achieving the same accuracy. Conclusions The proposed ParticleCall provides more accurate calls than the Illumina’s base calling algorithm, Bustard. At the same time, ParticleCall is significantly more computationally efficient than other recent schemes with similar performance, rendering it more feasible for high-throughput sequencing data analysis. Improvement of base calling accuracy will have immediate beneficial effects on the performance of downstream applications such as SNP and genotype calling. ParticleCall is freely available at https://sourceforge.net/projects/particlecall. PMID:22776067
Omics Metadata Management Software (OMMS).
Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo
2015-01-01
Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables, (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand alone-package for individual laboratories, or can be configured for webbased deployment supporting geographically-dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. The OMMS can be obtained at http://omms.sandia.gov. The OMMS can be obtained at http://omms.sandia.gov.
Omics Metadata Management Software (OMMS)
Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo
2015-01-01
Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables, (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand alone-package for individual laboratories, or can be configured for webbased deployment supporting geographically-dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. The OMMS can be obtained at http://omms.sandia.gov. Availability The OMMS can be obtained at http://omms.sandia.gov PMID:26124554
An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.
Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K
2014-01-01
Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly requires working knowledge of command line interface, massive computational resources and expertise which is a daunting task for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for plant genetics and breeding community with no computational expertise in order to discover SNPs and utilize in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone free software.
Szymanski, Maciej; Karlowski, Wojciech M
2016-01-01
In eukaryotes, ribosomal 5S rRNAs are products of multigene families organized within clusters of tandemly repeated units. Accumulation of genomic data obtained from a variety of organisms demonstrated that the potential 5S rRNA coding sequences show a large number of variants, often incompatible with folding into a correct secondary structure. Here, we present results of an analysis of a large set of short RNA sequences generated by the next generation sequencing techniques, to address the problem of heterogeneity of the 5S rRNA transcripts in Arabidopsis and identification of potentially functional rRNA-derived fragments.
[The ENCODE project and functional genomics studies].
Ding, Nan; Qu, Hongzhu; Fang, Xiangdong
2014-03-01
Upon the completion of the Human Genome Project, scientists have been trying to interpret the underlying genomic code for human biology. Since 2003, National Human Genome Research Institute (NHGRI) has invested nearly $0.3 billion and gathered over 440 scientists from more than 32 institutions in the United States, China, United Kingdom, Japan, Spain and Singapore to initiate the Encyclopedia of DNA Elements (ENCODE) project, aiming to identify and analyze all regulatory elements in the human genome. Taking advantage of the development of next-generation sequencing technologies and continuous improvement of experimental methods, ENCODE had made remarkable achievements: identified methylation and histone modification of DNA sequences and their regulatory effects on gene expression through altering chromatin structures, categorized binding sites of various transcription factors and constructed their regulatory networks, further revised and updated database for pseudogenes and non-coding RNA, and identified SNPs in regulatory sequences associated with diseases. These findings help to comprehensively understand information embedded in gene and genome sequences, the function of regulatory elements as well as the molecular mechanism underlying the transcriptional regulation by noncoding regions, and provide extensive data resource for life sciences, particularly for translational medicine. We re-viewed the contributions of high-throughput sequencing platform development and bioinformatical technology improve-ment to the ENCODE project, the association between epigenetics studies and the ENCODE project, and the major achievement of the ENCODE project. We also provided our prospective on the role of the ENCODE project in promoting the development of basic and clinical medicine.
HLA genotyping by next-generation sequencing of complementary DNA.
Segawa, Hidenobu; Kukita, Yoji; Kato, Kikuya
2017-11-28
Genotyping of the human leucocyte antigen (HLA) is indispensable for various medical treatments. However, unambiguous genotyping is technically challenging due to high polymorphism of the corresponding genomic region. Next-generation sequencing is changing the landscape of genotyping. In addition to high throughput of data, its additional advantage is that DNA templates are derived from single molecules, which is a strong merit for the phasing problem. Although most currently developed technologies use genomic DNA, use of cDNA could enable genotyping with reduced costs in data production and analysis. We thus developed an HLA genotyping system based on next-generation sequencing of cDNA. Each HLA gene was divided into 3 or 4 target regions subjected to PCR amplification and subsequent sequencing with Ion Torrent PGM. The sequence data were then subjected to an automated analysis. The principle of the analysis was to construct candidate sequences generated from all possible combinations of variable bases and arrange them in decreasing order of the number of reads. Upon collecting candidate sequences from all target regions, 2 haplotypes were usually assigned. Cases not assigned 2 haplotypes were forwarded to 4 additional processes: selection of candidate sequences applying more stringent criteria, removal of artificial haplotypes, selection of candidate sequences with a relaxed threshold for sequence matching, and countermeasure for incomplete sequences in the HLA database. The genotyping system was evaluated using 30 samples; the overall accuracy was 97.0% at the field 3 level and 98.3% at the G group level. With one sample, genotyping of DPB1 was not completed due to short read size. We then developed a method for complete sequencing of individual molecules of the DPB1 gene, using the molecular barcode technology. The performance of the automatic genotyping system was comparable to that of systems developed in previous studies. Thus, next-generation sequencing of cDNA is a viable option for HLA genotyping.
Aspergillus, Penicillium and Talaromyces isolated from house dust samples collected around the world
Visagie, C.M.; Hirooka, Y.; Tanney, J.B.; Whitfield, E.; Mwange, K.; Meijer, M.; Amend, A.S.; Seifert, K.A.; Samson, R.A.
2014-01-01
As part of a worldwide survey of the indoor mycobiota, dust was collected from nine countries. Analyses of dust samples included the culture-dependent dilution-to-extinction method and the culture-independent 454-pyrosequencing. Of the 7 904 isolates, 2 717 isolates were identified as belonging to Aspergillus, Penicillium and Talaromyces. The aim of this study was to identify isolates to species level and describe the new species found. Secondly, we wanted to create a reliable reference sequence database to be used for next-generation sequencing projects. Isolates represented 59 Aspergillus species, including eight undescribed species, 49 Penicillium species of which seven were undescribed and 18 Talaromyces species including three described here as new. In total, 568 ITS barcodes were generated, and 391 β-tubulin and 507 calmodulin sequences, which serve as alternative identification markers. PMID:25492981
Next-generation Sequencing-based genomic profiling: Fostering innovation in cancer care?
Fernandes, Gustavo S; Marques, Daniel F; Girardi, Daniel M; Braghiroli, Maria Ignez F; Coudry, Renata A; Meireles, Sibele I; Katz, Artur; Hoff, Paulo M
2017-10-01
With the development of next-generation sequencing (NGS) technologies, DNA sequencing has been increasingly utilized in clinical practice. Our goal was to investigate the impact of genomic evaluation on treatment decisions for heavily pretreated patients with metastatic cancer. We analyzed metastatic cancer patients from a single institution whose cancers had progressed after all available standard-of-care therapies and whose tumors underwent next-generation sequencing analysis. We determined the percentage of patients who received any therapy directed by the test, and its efficacy. From July 2013 to December 2015, 185 consecutive patients were tested using a commercially available next-generation sequencing-based test, and 157 patients were eligible. Sixty-six patients (42.0%) were female, and 91 (58.0%) were male. The mean age at diagnosis was 52.2 years, and the mean number of pre-test lines of systemic treatment was 2.7. One hundred and seventy-seven patients (95.6%) had at least one identified gene alteration. Twenty-four patients (15.2%) underwent systemic treatment directed by the test result. Of these, one patient had a complete response, four (16.7%) had partial responses, two (8.3%) had stable disease, and 17 (70.8%) had disease progression as the best result. The median progression-free survival time with matched therapy was 1.6 months, and the median overall survival was 10 months. We identified a high prevalence of gene alterations using an next-generation sequencing test. Although some benefit was associated with the matched therapy, most of the patients had disease progression as the best response, indicating the limited biological potential and unclear clinical relevance of this practice.
SCARF: maximizing next-generation EST assemblies for evolutionary and population genomic analyses.
Barker, Michael S; Dlugosch, Katrina M; Reddy, A Chaitanya C; Amyotte, Sarah N; Rieseberg, Loren H
2009-02-15
Scaffolded and Corrected Assembly of Roche 454 (SCARF) is a next-generation sequence assembly tool for evolutionary genomics that is designed especially for assembling 454 EST sequences against high-quality reference sequences from related species. The program was created to knit together 454 contigs that do not assemble during traditional de novo assembly, using a reference sequence library to orient the 454 sequences. SCARF is freely available at http://msbarker.com/software.htm, and is released under the open source GPLv3 license (http://www.opensource.org/licenses/gpl-3.0.html.
[Molecular and prenatal diagnosis of a family with Fanconi anemia by next generation sequencing].
Gong, Zhuwen; Yu, Yongguo; Zhang, Qigang; Gu, Xuefan
2015-04-01
To provide prenatal diagnosis for a pregnant woman who had given birth to a child with Fanconi anemia with combined next-generation sequencing (NGS) and Sanger sequencing. For the affected child, potential mutations of the FANCA gene were analyzed with NGS. Suspected mutation was verified with Sanger sequencing. For prenatal diagnosis, genomic DNA was extracted from cultured fetal amniotic fluid cells and subjected to analysis of the same mutations. A low-frequency frameshifting mutation c.989_995del7 (p.H330LfsX2, inherited from his father) and a truncating mutation c.3971C>T (p.P1324L, inherited from his mother) have been identified in the affected child and considered to be pathogenic. The two mutations were subsequently verified by Sanger sequencing. Upon prenatal diagnosis, the fetus was found to carry two mutations. The combined next-generation sequencing and Sanger sequencing can reduce the time for diagnosis and identify subtypes of Fanconi anemia and the mutational sites, which has enabled reliable prenatal diagnosis of this disease.
NASA Technical Reports Server (NTRS)
Kopardekar, Parimal H.
2010-01-01
This document describes the FY2010 plan for the management and execution of the Next Generation Air Transportation System (NextGen) Concepts and Technology Development (CTD) Project. The document was developed in response to guidance from the Airspace Systems Program (ASP), as approved by the Associate Administrator of the Aeronautics Research Mission Directorate (ARMD), and from guidelines in the Airspace Systems Program Plan. Congress established the multi-agency Joint Planning and Development Office (JPDO) in 2003 to develop a vision for the 2025 Next Generation Air Transportation System (NextGen) and to define the research required to enable it. NASA is one of seven agency partners contributing to the effort. Accordingly, NASA's ARMD realigned the Airspace Systems Program in 2007 to "directly address the fundamental research needs of the Next Generation Air Transportation System...in partnership with the member agencies of the JPDO." The Program subsequently established two new projects to meet this objective: the NextGen-Airspace Project and the NextGen-Airportal Project. Together, the projects will also focus NASA s technical expertise and world-class facilities to address the question of where, when, how and the extent to which automation can be applied to moving aircraft safely and efficiently through the NAS and technologies that address optimal allocation of ground and air technologies necessary for NextGen. Additionally, the roles and responsibilities of humans and automation influence in the NAS will be addressed by both projects. Foundational concept and technology research and development begun under the NextGen-Airspace and NextGen-Airportal projects will continue. There will be no change in NASA Research Announcement (NRA) strategy, nor will there be any change to NASA interfaces with the JPDO, Federal Aviation Administration (FAA), Research Transition Teams (RTTs), or other stakeholders
Sequencing technologies - the next generation.
Metzker, Michael L
2010-01-01
Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing (NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.
Yang, Lei; Naylor, Gavin J P
2016-01-01
We determined the complete mitochondrial genome sequence (16,760 bp) of the peacock skate Pavoraja nitida using a long-PCR based next generation sequencing method. It has 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 control region in the typical vertebrate arrangement. Primers, protocols, and procedures used to obtain this mitogenome are provided. We anticipate that this approach will facilitate rapid collection of mitogenome sequences for studies on phylogenetic relationships, population genetics, and conservation of cartilaginous fishes.
Fuentes-Pananá, Ezequiel M; Larios-Serrato, Violeta; Méndez-Tenorio, Alfonso; Morales-Sánchez, Abigail; Arias, Carlos F; Torres, Javier
2016-01-01
Gastric (GC) and breast (BrC) cancer are two of the most common and deadly tumours. Different lines of evidence suggest a possible causative role of viral infections for both GC and BrC. Wide genome sequencing (WGS) technologies allow searching for viral agents in tissues of patients with cancer. These technologies have already contributed to establish virus-cancer associations as well as to discovery new tumour viruses. The objective of this study was to document possible associations of viral infection with GC and BrC in Mexican patients. In order to gain idea about cost effective conditions of experimental sequencing, we first carried out an in silico simulation of WGS. The next-generation-platform IlluminaGallx was then used to sequence GC and BrC tumour samples. While we did not find viral sequences in tissues from BrC patients, multiple reads matching Epstein-Barr virus (EBV) sequences were found in GC tissues. An end-point polymerase chain reaction confirmed an enrichment of EBV sequences in one of the GC samples sequenced, validating the next-generation sequencing-bioinformatics pipeline. PMID:26910355
Fuentes-Pananá, Ezequiel M; Larios-Serrato, Violeta; Méndez-Tenorio, Alfonso; Morales-Sánchez, Abigail; Arias, Carlos F; Torres, Javier
2016-03-01
Gastric (GC) and breast (BrC) cancer are two of the most common and deadly tumours. Different lines of evidence suggest a possible causative role of viral infections for both GC and BrC. Wide genome sequencing (WGS) technologies allow searching for viral agents in tissues of patients with cancer. These technologies have already contributed to establish virus-cancer associations as well as to discovery new tumour viruses. The objective of this study was to document possible associations of viral infection with GC and BrC in Mexican patients. In order to gain idea about cost effective conditions of experimental sequencing, we first carried out an in silico simulation of WGS. The next-generation-platform IlluminaGallx was then used to sequence GC and BrC tumour samples. While we did not find viral sequences in tissues from BrC patients, multiple reads matching Epstein-Barr virus (EBV) sequences were found in GC tissues. An end-point polymerase chain reaction confirmed an enrichment of EBV sequences in one of the GC samples sequenced, validating the next-generation sequencing-bioinformatics pipeline.
Sequence information signal processor
Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.
1999-01-01
An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
Next-Generation Sequencing Platforms
NASA Astrophysics Data System (ADS)
Mardis, Elaine R.
2013-06-01
Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.
75 FR 74040 - Intent To Prepare an Environmental Impact Statement and To Conduct Scoping Meetings...
Federal Register 2010, 2011, 2012, 2013, 2014
2010-11-30
... Proposed Project NextEra's proposed Project would consist of up to 100 wind turbine generators with a... roads. NextEra has secured leases with willing landowners for its wind generation turbines and related... substantial natural resources conflicts. NextEra's siting process for the wind turbine strings and associated...
Fan, Yu; Xi, Liu; Hughes, Daniel S T; Zhang, Jianjun; Zhang, Jianhua; Futreal, P Andrew; Wheeler, David A; Wang, Wenyi
2016-08-24
Subclonal mutations reveal important features of the genetic architecture of tumors. However, accurate detection of mutations in genetically heterogeneous tumor cell populations using next-generation sequencing remains challenging. We develop MuSE ( http://bioinformatics.mdanderson.org/main/MuSE ), Mutation calling using a Markov Substitution model for Evolution, a novel approach for modeling the evolution of the allelic composition of the tumor and normal tissue at each reference base. MuSE adopts a sample-specific error model that reflects the underlying tumor heterogeneity to greatly improve the overall accuracy. We demonstrate the accuracy of MuSE in calling subclonal mutations in the context of large-scale tumor sequencing projects using whole exome and whole genome sequencing.
USDA-ARS?s Scientific Manuscript database
The dissection of complex traits of economic importance for the pig industry requires the availability of a significant number of genetic markers, such as SNPs. This study was conducted in order to discover thousands of porcine SNPs using next generation sequencing technologies and use those SNPs, a...
Review of General Algorithmic Features for Genome Assemblers for Next Generation Sequencers
Wajid, Bilal; Serpedin, Erchin
2012-01-01
In the realm of bioinformatics and computational biology, the most rudimentary data upon which all the analysis is built is the sequence data of genes, proteins and RNA. The sequence data of the entire genome is the solution to the genome assembly problem. The scope of this contribution is to provide an overview on the art of problem-solving applied within the domain of genome assembly in the next-generation sequencing (NGS) platforms. This article discusses the major genome assemblers that were proposed in the literature during the past decade by outlining their basic working principles. It is intended to act as a qualitative, not a quantitative, tutorial to all working on genome assemblers pertaining to the next generation of sequencers. We discuss the theoretical aspects of various genome assemblers, identifying their working schemes. We also discuss briefly the direction in which the area is headed towards along with discussing core issues on software simplicity. PMID:22768980
Suyama, Yoshihisa; Matsuki, Yu
2015-01-01
Restriction-enzyme (RE)-based next-generation sequencing methods have revolutionized marker-assisted genetic studies; however, the use of REs has limited their widespread adoption, especially in field samples with low-quality DNA and/or small quantities of DNA. Here, we developed a PCR-based procedure to construct reduced representation libraries without RE digestion steps, representing de novo single-nucleotide polymorphism discovery, and its genotyping using next-generation sequencing. Using multiplexed inter-simple sequence repeat (ISSR) primers, thousands of genome-wide regions were amplified effectively from a wide variety of genomes, without prior genetic information. We demonstrated: 1) Mendelian gametic segregation of the discovered variants; 2) reproducibility of genotyping by checking its applicability for individual identification; and 3) applicability in a wide variety of species by checking standard population genetic analysis. This approach, called multiplexed ISSR genotyping by sequencing, should be applicable to many marker-assisted genetic studies with a wide range of DNA qualities and quantities. PMID:26593239
Refinetti, Paulo; Morgenthaler, Stephan; Ekstrøm, Per O
2016-07-01
Cycling temperature capillary electrophoresis has been optimised for mutation detection in 76% of the mitochondrial genome. The method was tested on a mixed sample and compared to mutation detection by next generation sequencing. Out of 152 fragments 90 were concordant, 51 discordant and in 11 were semi-concordant. Dilution experiments show that cycling capillary electrophoresis has a detection limit of 1-3%. The detection limit of routine next generation sequencing was in the ranges of 15 to 30%. Cycling temperature capillary electrophoresis detect and accurate quantify mutations at a fraction of the cost and time required to perform a next generation sequencing analysis. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
[Hot topics of circulating tumor DNA testing in breast cancer].
Liu, Y H; Zhou, B; Xu, L; Xin, L
2017-02-01
The progress of gene detection technologies represented by next generation sequencing (NGS) and digital PCR laid a foundation for studies of circulating tumor DNA (ctDNA) in breast cancer. In 2014, the NGS workgroup organized by the College of American Pathologists (CAP) published the College of American Pathologists ' Laboratory Standards for Next - Generation Sequencing Clinical Tests, which provides a blueprint for the standardization of gene testing. In 2015, the Guidelines for Diagnostic Next - generation Sequencing published by the European Society of Human Genetics claimed that NGS is unacceptable in clinical practice before studies guided by guidelines are approved. Although existing studies show the benefits of ctDNA testing in disease monitoring and prognosis analyzing, we have a ways to go to normalize the procedure and build strict detection criteria.
Liu, Gary W; Livesay, Brynn R; Kacherovsky, Nataly A; Cieslewicz, Maryelise; Lutz, Emi; Waalkes, Adam; Jensen, Michael C; Salipante, Stephen J; Pun, Suzie H
2015-08-19
Peptide ligands are used to increase the specificity of drug carriers to their target cells and to facilitate intracellular delivery. One method to identify such peptide ligands, phage display, enables high-throughput screening of peptide libraries for ligands binding to therapeutic targets of interest. However, conventional methods for identifying target binders in a library by Sanger sequencing are low-throughput, labor-intensive, and provide a limited perspective (<0.01%) of the complete sequence space. Moreover, the small sample space can be dominated by nonspecific, preferentially amplifying "parasitic sequences" and plastic-binding sequences, which may lead to the identification of false positives or exclude the identification of target-binding sequences. To overcome these challenges, we employed next-generation Illumina sequencing to couple high-throughput screening and high-throughput sequencing, enabling more comprehensive access to the phage display library sequence space. In this work, we define the hallmarks of binding sequences in next-generation sequencing data, and develop a method that identifies several target-binding phage clones for murine, alternatively activated M2 macrophages with a high (100%) success rate: sequences and binding motifs were reproducibly present across biological replicates; binding motifs were identified across multiple unique sequences; and an unselected, amplified library accurately filtered out parasitic sequences. In addition, we validate the Multiple Em for Motif Elicitation tool as an efficient and principled means of discovering binding sequences.
Keller, A; Danner, N; Grimmer, G; Ankenbrand, M; von der Ohe, K; von der Ohe, W; Rost, S; Härtel, S; Steffan-Dewenter, I
2015-03-01
The identification of pollen plays an important role in ecology, palaeo-climatology, honey quality control and other areas. Currently, expert knowledge and reference collections are essential to identify pollen origin through light microscopy. Pollen identification through molecular sequencing and DNA barcoding has been proposed as an alternative approach, but the assessment of mixed pollen samples originating from multiple plant species is still a tedious and error-prone task. Next-generation sequencing has been proposed to avoid this hindrance. In this study we assessed mixed pollen probes through next-generation sequencing of amplicons from the highly variable, species-specific internal transcribed spacer 2 region of nuclear ribosomal DNA. Further, we developed a bioinformatic workflow to analyse these high-throughput data with a newly created reference database. To evaluate the feasibility, we compared results from classical identification based on light microscopy from the same samples with our sequencing results. We assessed in total 16 mixed pollen samples, 14 originated from honeybee colonies and two from solitary bee nests. The sequencing technique resulted in higher taxon richness (deeper assignments and more identified taxa) compared to light microscopy. Abundance estimations from sequencing data were significantly correlated with counted abundances through light microscopy. Simulation analyses of taxon specificity and sensitivity indicate that 96% of taxa present in the database are correctly identifiable at the genus level and 70% at the species level. Next-generation sequencing thus presents a useful and efficient workflow to identify pollen at the genus and species level without requiring specialised palynological expert knowledge. © 2014 German Botanical Society and The Royal Botanical Society of the Netherlands.
Carey, John C
2017-09-01
The designation, phenotype, was proposed as a term by Wilhelm Johannsen in 1909. The word is derived from the Greek, phano (showing) and typo (type), phanotypos. Phenotype has become a widely recognized term, even outside of the genetics community, in recent years with the ongoing identification of human disease genes. The term has been defined as the observable constitution of an organism, but sometimes refers to a condition when a person has a particular clinical presentation. Analysis of phenotype is a timely theme because advances in the understanding of the genetic basis of human disease and the emergence of next generation sequencing have spurred a renewed interest in phenotype and the proposal to establish a "Human Phenome Project." This article summarizes the principles of phenotype analysis that are important in medical genetics and describes approaches to comprehensive phenotype analysis in the investigation of patients with human disorders. I discuss the various elements related to disease phenotypes and highlight neurofibromatosis type 1 and the Elements of Morphology Project as illustrations of the principles. In recent years, the notion of "deep phenotyping" has emerged. Currently there are now a number of proposed strategies and resources to approach this concept. Not since the 1960s and 1970s has there been such an exciting time in the history of medicine surrounding the analysis of phenotype in genetic disorders. © 2017 Wiley Periodicals, Inc.
From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes.
Kwok, Hin; Chiang, Alan Kwok Shing
2016-02-24
Genomic sequences of Epstein-Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of the first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, increasing number of EBV genomes has been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explored ways through which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and discovery of potentially pathogenic variants in specific diseases.
Martin-Fernandez, Laura; Gavidia-Bovadilla, Giovana; Corrales, Irene; Brunel, Helena; Ramírez, Lorena; López, Sonia; Souto, Juan Carlos; Vidal, Francisco; Soria, José Manuel
2017-01-01
Venous thromboembolism is a complex disease with a high heritability. There are significant associations among Factor XI (FXI) levels and SNPs in the KNG1 and F11 loci. Our aim was to identify the genetic variation of KNG1 and F11 that might account for the variability of FXI levels. The KNG1 and F11 loci were sequenced completely in 110 unrelated individuals from the GAIT-2 (Genetic Analysis of Idiopathic Thrombophilia 2) Project using Next Generation Sequencing on an Illumina MiSeq. The GAIT-2 Project is a study of 935 individuals in 35 extended Spanish families selected through a proband with idiopathic thrombophilia. Among the 110 individuals, a subset of 40 individuals was chosen as a discovery sample for identifying variants. A total of 762 genetic variants were detected. Several significant associations were established among common variants and low-frequency variants sets in KNG1 and F11 with FXI levels using the PLINK and SKAT packages. Among these associations, those of rs710446 and five low-frequency variant sets in KNG1 with FXI level variation were significant after multiple testing correction and permutation. Also, two putative pathogenic mutations related to high and low FXI levels were identified by data filtering and in silico predictions. This study of KNG1 and F11 loci should help to understand the connection between genotypic variation and variation in FXI levels. The functional genetic variants should be useful as markers of thromboembolic risk.
NGS Catalog: A Database of Next Generation Sequencing Studies in Humans
Xia, Junfeng; Wang, Qingguo; Jia, Peilin; Wang, Bing; Pao, William; Zhao, Zhongming
2015-01-01
Next generation sequencing (NGS) technologies have been rapidly applied in biomedical and biological research since its advent only a few years ago, and they are expected to advance at an unprecedented pace in the following years. To provide the research community with a comprehensive NGS resource, we have developed the database Next Generation Sequencing Catalog (NGS Catalog, http://bioinfo.mc.vanderbilt.edu/NGS/index.html), a continually updated database that collects, curates and manages available human NGS data obtained from published literature. NGS Catalog deposits publication information of NGS studies and their mutation characteristics (SNVs, small insertions/deletions, copy number variations, and structural variants), as well as mutated genes and gene fusions detected by NGS. Other functions include user data upload, NGS general analysis pipelines, and NGS software. NGS Catalog is particularly useful for investigators who are new to NGS but would like to take advantage of these powerful technologies for their own research. Finally, based on the data deposited in NGS Catalog, we summarized features and findings from whole exome sequencing, whole genome sequencing, and transcriptome sequencing studies for human diseases or traits. PMID:22517761
Johnston, Christine; Magaret, Amalia; Roychoudhury, Pavitra; Greninger, Alexander L; Cheng, Anqi; Diem, Kurt; Fitzgibbon, Matthew P; Huang, Meei-Li; Selke, Stacy; Lingappa, Jairam R; Celum, Connie; Jerome, Keith R; Wald, Anna; Koelle, David M
2017-10-01
Understanding the variability in circulating herpes simplex virus type 2 (HSV-2) genomic sequences is critical to the development of HSV-2 vaccines. Genital lesion swabs containing ≥ 10 7 log 10 copies HSV DNA collected from Africa, the USA, and South America underwent next-generation sequencing, followed by K-mer based filtering and de novo genomic assembly. Sites of heterogeneity within coding regions in unique long and unique short (U L _U S ) regions were identified. Phylogenetic trees were created using maximum likelihood reconstruction. Among 46 samples from 38 persons, 1468 intragenic base-pair substitutions were identified. The maximum nucleotide distance between strains for concatenated U L_ U S segments was 0.4%. Phylogeny did not reveal geographic clustering. The most variable proteins had non-synonymous mutations in < 3% of amino acids. Unenriched HSV-2 DNA can undergo next-generation sequencing to identify intragenic variability. The use of clinical swabs for sequencing expands the information that can be gathered directly from these specimens. Copyright © 2017 Elsevier Inc. All rights reserved.
Library construction for next-generation sequencing: Overviews and challenges
Head, Steven R.; Komori, H. Kiyomi; LaMere, Sarah A.; Whisenant, Thomas; Van Nieuwerburgh, Filip; Salomon, Daniel R.; Ordoukhanian, Phillip
2014-01-01
High-throughput sequencing, also known as next-generation sequencing (NGS), has revolutionized genomic research. In recent years, NGS technology has steadily improved, with costs dropping and the number and range of sequencing applications increasing exponentially. Here, we examine the critical role of sequencing library quality and consider important challenges when preparing NGS libraries from DNA and RNA sources. Factors such as the quantity and physical characteristics of the RNA or DNA source material as well as the desired application (i.e., genome sequencing, targeted sequencing, RNA-seq, ChIP-seq, RIP-seq, and methylation) are addressed in the context of preparing high quality sequencing libraries. In addition, the current methods for preparing NGS libraries from single cells are also discussed. PMID:24502796
Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter
2015-01-01
Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438
Doyle, Stephen R; Griffith, Ian S; Murphy, Nick P; Strugnell, Jan M
2015-01-01
The complete mitochondrial genome of the Eastern Rock lobster, Sagmariasus verreauxi, is reported for the first time. Using low-coverage, long read MiSeq next generation sequencing, we constructed and determined the mtDNA genome organization of the 15,470 bp sequence from two isolates from Eastern Tasmania, Australia and Northern New Zealand, and identified 46 polymorphic nucleotides between the two sequences. This genome sequence and its genetic polymorphisms will likely be useful in understanding the distribution and population connectivity of the Eastern Rock Lobster, and in the fisheries management of this commercially important species.
Jimenez, Nelson Lopez; Flannick, Jason; Yahyavi, Mani; Li, Jiang; Bardakjian, Tanya; Tonkin, Leath; Schneider, Adele; Sherr, Elliott H; Slavotinek, Anne M
2011-12-28
Anophthalmia/microphthalmia (A/M) is caused by mutations in several different transcription factors, but mutations in each causative gene are relatively rare, emphasizing the need for a testing approach that screens multiple genes simultaneously. We used next-generation sequencing to screen 15 A/M patients for mutations in 9 pathogenic genes to evaluate this technology for screening in A/M. We used a pooled sequencing design, together with custom single nucleotide polymorphism (SNP) calling software. We verified predicted sequence alterations using Sanger sequencing. We verified three mutations - c.542delC in SOX2, resulting in p.Pro181Argfs*22, p.Glu105X in OTX2 and p.Cys240X in FOXE3. We found several novel sequence alterations and SNPs that were likely to be non-pathogenic - p.Glu42Lys in CRYBA4, p.Val201Met in FOXE3 and p.Asp291Asn in VSX2. Our analysis methodology gave one false positive result comprising a mutation in PAX6 (c.1268A > T, predicting p.X423LeuextX*15) that was not verified by Sanger sequencing. We also failed to detect one 20 base pair (bp) deletion and one 3 bp duplication in SOX2. Our results demonstrated the power of next-generation sequencing with pooled sample groups for the rapid screening of candidate genes for A/M as we were correctly able to identify disease-causing mutations. However, next-generation sequencing was less useful for small, intragenic deletions and duplications. We did not find mutations in 10/15 patients and conclude that there is a need for further gene discovery in A/M.
2011-01-01
Background Anophthalmia/microphthalmia (A/M) is caused by mutations in several different transcription factors, but mutations in each causative gene are relatively rare, emphasizing the need for a testing approach that screens multiple genes simultaneously. We used next-generation sequencing to screen 15 A/M patients for mutations in 9 pathogenic genes to evaluate this technology for screening in A/M. Methods We used a pooled sequencing design, together with custom single nucleotide polymorphism (SNP) calling software. We verified predicted sequence alterations using Sanger sequencing. Results We verified three mutations - c.542delC in SOX2, resulting in p.Pro181Argfs*22, p.Glu105X in OTX2 and p.Cys240X in FOXE3. We found several novel sequence alterations and SNPs that were likely to be non-pathogenic - p.Glu42Lys in CRYBA4, p.Val201Met in FOXE3 and p.Asp291Asn in VSX2. Our analysis methodology gave one false positive result comprising a mutation in PAX6 (c.1268A > T, predicting p.X423LeuextX*15) that was not verified by Sanger sequencing. We also failed to detect one 20 base pair (bp) deletion and one 3 bp duplication in SOX2. Conclusions Our results demonstrated the power of next-generation sequencing with pooled sample groups for the rapid screening of candidate genes for A/M as we were correctly able to identify disease-causing mutations. However, next-generation sequencing was less useful for small, intragenic deletions and duplications. We did not find mutations in 10/15 patients and conclude that there is a need for further gene discovery in A/M. PMID:22204637
Chin, Ephrem L H; da Silva, Cristina; Hegde, Madhuri
2013-02-19
Detecting mutations in disease genes by full gene sequence analysis is common in clinical diagnostic laboratories. Sanger dideoxy terminator sequencing allows for rapid development and implementation of sequencing assays in the clinical laboratory, but it has limited throughput, and due to cost constraints, only allows analysis of one or at most a few genes in a patient. Next-generation sequencing (NGS), on the other hand, has evolved rapidly, although to date it has mainly been used for large-scale genome sequencing projects and is beginning to be used in the clinical diagnostic testing. One advantage of NGS is that many genes can be analyzed easily at the same time, allowing for mutation detection when there are many possible causative genes for a specific phenotype. In addition, regions of a gene typically not tested for mutations, like deep intronic and promoter mutations, can also be detected. Here we use 20 previously characterized Sanger-sequenced positive controls in disease-causing genes to demonstrate the utility of NGS in a clinical setting using standard PCR based amplification to assess the analytical sensitivity and specificity of the technology for detecting all previously characterized changes (mutations and benign SNPs). The positive controls chosen for validation range from simple substitution mutations to complex deletion and insertion mutations occurring in autosomal dominant and recessive disorders. The NGS data was 100% concordant with the Sanger sequencing data identifying all 119 previously identified changes in the 20 samples. We have demonstrated that NGS technology is ready to be deployed in clinical laboratories. However, NGS and associated technologies are evolving, and clinical laboratories will need to invest significantly in staff and infrastructure to build the necessary foundation for success.
Model-based quality assessment and base-calling for second-generation sequencing data.
Bravo, Héctor Corrada; Irizarry, Rafael A
2010-09-01
Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads-strings of A,C,G, or T's, between 30 and 100 characters long-which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base-calling allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base-calling performance. © 2009, The International Biometric Society.
Wei, Xiaoming; Sun, Yan; Xie, Jiansheng; Shi, Quan; Qu, Ning; Yang, Guanghui; Cai, Jun; Yang, Yi; Liang, Yu; Wang, Wei; Yi, Xin
2012-11-20
Targeted enrichment and next-generation sequencing (NGS) have been employed for detection of genetic diseases. The purpose of this study was to validate the accuracy and sensitivity of our method for comprehensive mutation detection of hereditary hearing loss, and identify inherited mutations involved in human deafness accurately and economically. To make genetic diagnosis of hereditary hearing loss simple and timesaving, we designed a 0.60 MB array-based chip containing 69 nuclear genes and mitochondrial genome responsible for human deafness and conducted NGS toward ten patients with five known mutations and a Chinese family with hearing loss (never genetically investigated). Ten patients with five known mutations were sequenced using next-generation sequencing to validate the sensitivity of the method. We identified four known mutations in two nuclear deafness causing genes (GJB2 and SLC26A4), one in mitochondrial DNA. We then performed this method to analyze the variants in a Chinese family with hearing loss and identified compound heterozygosity for two novel mutations in gene MYO7A. The compound heterozygosity identified in gene MYO7A causes Usher Syndrome 1B with severe phenotypes. The results support that the combination of enrichment of targeted genes and next-generation sequencing is a valuable molecular diagnostic tool for hereditary deafness and suitable for clinical application. Copyright © 2012 Elsevier B.V. All rights reserved.
Effect of Next-Generation Exome Sequencing Depth for Discovery of Diagnostic Variants.
Kim, Kyung; Seong, Moon-Woo; Chung, Won-Hyong; Park, Sung Sup; Leem, Sangseob; Park, Won; Kim, Jihyun; Lee, KiYoung; Park, Rae Woong; Kim, Namshin
2015-06-01
Sequencing depth, which is directly related to the cost and time required for the generation, processing, and maintenance of next-generation sequencing data, is an important factor in the practical utilization of such data in clinical fields. Unfortunately, identifying an exome sequencing depth adequate for clinical use is a challenge that has not been addressed extensively. Here, we investigate the effect of exome sequencing depth on the discovery of sequence variants for clinical use. Toward this, we sequenced ten germ-line blood samples from breast cancer patients on the Illumina platform GAII(x) at a high depth of ~200×. We observed that most function-related diverse variants in the human exonic regions could be detected at a sequencing depth of 120×. Furthermore, investigation using a diagnostic gene set showed that the number of clinical variants identified using exome sequencing reached a plateau at an average sequencing depth of about 120×. Moreover, the phenomena were consistent across the breast cancer samples.
2012-01-01
Background Genetic mapping and QTL detection are powerful methodologies in plant improvement and breeding. Construction of a high-density and high-quality genetic map would be of great benefit in the production of superior grapes to meet human demand. High throughput and low cost of the recently developed next generation sequencing (NGS) technology have resulted in its wide application in genome research. Sequencing restriction-site associated DNA (RAD) might be an efficient strategy to simplify genotyping. Combining NGS with RAD has proven to be powerful for single nucleotide polymorphism (SNP) marker development. Results An F1 population of 100 individual plants was developed. In-silico digestion-site prediction was used to select an appropriate restriction enzyme for construction of a RAD sequencing library. Next generation RAD sequencing was applied to genotype the F1 population and its parents. Applying a cluster strategy for SNP modulation, a total of 1,814 high-quality SNP markers were developed: 1,121 of these were mapped to the female genetic map, 759 to the male map, and 1,646 to the integrated map. A comparison of the genetic maps to the published Vitis vinifera genome revealed both conservation and variations. Conclusions The applicability of next generation RAD sequencing for genotyping a grape F1 population was demonstrated, leading to the successful development of a genetic map with high density and quality using our designed SNP markers. Detailed analysis revealed that this newly developed genetic map can be used for a variety of genome investigations, such as QTL detection, sequence assembly and genome comparison. PMID:22908993
Droege, Marcus; Hill, Brendon
2008-08-31
The Genome Sequencer FLX System (GS FLX), powered by 454 Sequencing, is a next-generation DNA sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high throughput. It has been proven to be the most versatile of all currently available next-generation sequencing technologies, supporting many high-profile studies in over seven applications categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole genomes and target DNA regions, metagenomics, and RNA analysis. 454 Sequencing is a powerful tool for human genetics research, having recently re-sequenced the genome of an individual human, currently re-sequencing the complete human exome and targeted genomic regions using the NimbleGen sequence capture process, and detected low-frequency somatic mutations linked to cancer.
Method of Simulating Flow-Through Area of a Pressure Regulator
NASA Technical Reports Server (NTRS)
Hass, Neal E. (Inventor); Schallhorn, Paul A. (Inventor)
2011-01-01
The flow-through area of a pressure regulator positioned in a branch of a simulated fluid flow network is generated. A target pressure is defined downstream of the pressure regulator. A projected flow-through area is generated as a non-linear function of (i) target pressure, (ii) flow-through area of the pressure regulator for a current time step and a previous time step, and (iii) pressure at the downstream location for the current time step and previous time step. A simulated flow-through area for the next time step is generated as a sum of (i) flow-through area for the current time step, and (ii) a difference between the projected flow-through area and the flow-through area for the current time step multiplied by a user-defined rate control parameter. These steps are repeated for a sequence of time steps until the pressure at the downstream location is approximately equal to the target pressure.
An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data
Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M.; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A. V. S. K.; Varshney, Rajeev K.
2014-01-01
Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly requires working knowledge of command line interface, massive computational resources and expertise which is a daunting task for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for plant genetics and breeding community with no computational expertise in order to discover SNPs and utilize in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone free software. PMID:25003610
Genome-wide comparative analysis of four Indian Drosophila species.
Mohanty, Sujata; Khanna, Radhika
2017-12-01
Comparative analysis of multiple genomes of closely or distantly related Drosophila species undoubtedly creates excitement among evolutionary biologists in exploring the genomic changes with an ecology and evolutionary perspective. We present herewith the de novo assembled whole genome sequences of four Drosophila species, D. bipectinata, D. takahashii, D. biarmipes and D. nasuta of Indian origin using Next Generation Sequencing technology on an Illumina platform along with their detailed assembly statistics. The comparative genomics analysis, e.g. gene predictions and annotations, functional and orthogroup analysis of coding sequences and genome wide SNP distribution were performed. The whole genome of Zaprionus indianus of Indian origin published earlier by us and the genome sequences of previously sequenced 12 Drosophila species available in the NCBI database were included in the analysis. The present work is a part of our ongoing genomics project of Indian Drosophila species.
Giese, Sven H; Zickmann, Franziska; Renard, Bernhard Y
2014-01-01
Accurate estimation, comparison and evaluation of read mapping error rates is a crucial step in the processing of next-generation sequencing data, as further analysis steps and interpretation assume the correctness of the mapping results. Current approaches are either focused on sensitivity estimation and thereby disregard specificity or are based on read simulations. Although continuously improving, read simulations are still prone to introduce a bias into the mapping error quantitation and cannot capture all characteristics of an individual dataset. We introduce ARDEN (artificial reference driven estimation of false positives in next-generation sequencing data), a novel benchmark method that estimates error rates of read mappers based on real experimental reads, using an additionally generated artificial reference genome. It allows a dataset-specific computation of error rates and the construction of a receiver operating characteristic curve. Thereby, it can be used for optimization of parameters for read mappers, selection of read mappers for a specific problem or for filtering alignments based on quality estimation. The use of ARDEN is demonstrated in a general read mapper comparison, a parameter optimization for one read mapper and an application example in single-nucleotide polymorphism discovery with a significant reduction in the number of false positive identifications. The ARDEN source code is freely available at http://sourceforge.net/projects/arden/.
Liu, Yu; Koyutürk, Mehmet; Maxwell, Sean; Xiang, Min; Veigl, Martina; Cooper, Richard S; Tayo, Bamidele O; Li, Li; LaFramboise, Thomas; Wang, Zhenghe; Zhu, Xiaofeng; Chance, Mark R
2014-08-16
Sequences up to several megabases in length have been found to be present in individual genomes but absent in the human reference genome. These sequences may be common in populations, and their absence in the reference genome may indicate rare variants in the genomes of individuals who served as donors for the human genome project. As the reference genome is used in probe design for microarray technology and mapping short reads in next generation sequencing (NGS), this missing sequence could be a source of bias in functional genomic studies and variant analysis. One End Anchor (OEA) and/or orphan reads from paired-end sequencing have been used to identify novel sequences that are absent in reference genome. However, there is no study to investigate the distribution, evolution and functionality of those sequences in human populations. To systematically identify and study the missing common sequences (micSeqs), we extended the previous method by pooling OEA reads from large number of individuals and applying strict filtering methods to remove false sequences. The pipeline was applied to data from phase 1 of the 1000 Genomes Project. We identified 309 micSeqs that are present in at least 1% of the human population, but absent in the reference genome. We confirmed 76% of these 309 micSeqs by comparison to other primate genomes, individual human genomes, and gene expression data. Furthermore, we randomly selected fifteen micSeqs and confirmed their presence using PCR validation in 38 additional individuals. Functional analysis using published RNA-seq and ChIP-seq data showed that eleven micSeqs are highly expressed in human brain and three micSeqs contain transcription factor (TF) binding regions, suggesting they are functional elements. In addition, the identified micSeqs are absent in non-primates and show dynamic acquisition during primate evolution culminating with most micSeqs being present in Africans, suggesting some micSeqs may be important sources of human diversity. 76% of micSeqs were confirmed by a comparative genomics approach. Fourteen micSeqs are expressed in human brain or contain TF binding regions. Some micSeqs are primate-specific, conserved and may play a role in the evolution of primates.
Next-Generation Technologies for Multiomics Approaches Including Interactome Sequencing
Ohashi, Hiroyuki; Miyamoto-Sato, Etsuko
2015-01-01
The development of high-speed analytical techniques such as next-generation sequencing and microarrays allows high-throughput analysis of biological information at a low cost. These techniques contribute to medical and bioscience advancements and provide new avenues for scientific research. Here, we outline a variety of new innovative techniques and discuss their use in omics research (e.g., genomics, transcriptomics, metabolomics, proteomics, and interactomics). We also discuss the possible applications of these methods, including an interactome sequencing technology that we developed, in future medical and life science research. PMID:25649523
Ancient DNA studies: new perspectives on old samples
2012-01-01
In spite of past controversies, the field of ancient DNA is now a reliable research area due to recent methodological improvements. A series of recent large-scale studies have revealed the true potential of ancient DNA samples to study the processes of evolution and to test models and assumptions commonly used to reconstruct patterns of evolution and to analyze population genetics and palaeoecological changes. Recent advances in DNA technologies, such as next-generation sequencing make it possible to recover DNA information from archaeological and paleontological remains allowing us to go back in time and study the genetic relationships between extinct organisms and their contemporary relatives. With the next-generation sequencing methodologies, DNA sequences can be retrieved even from samples (for example human remains) for which the technical pitfalls of classical methodologies required stringent criteria to guaranty the reliability of the results. In this paper, we review the methodologies applied to ancient DNA analysis and the perspectives that next-generation sequencing applications provide in this field. PMID:22697611
A statistical method for the detection of variants from next-generation resequencing of DNA pools.
Bansal, Vikas
2010-06-15
Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.
Review of general algorithmic features for genome assemblers for next generation sequencers.
Wajid, Bilal; Serpedin, Erchin
2012-04-01
In the realm of bioinformatics and computational biology, the most rudimentary data upon which all the analysis is built is the sequence data of genes, proteins and RNA. The sequence data of the entire genome is the solution to the genome assembly problem. The scope of this contribution is to provide an overview on the art of problem-solving applied within the domain of genome assembly in the next-generation sequencing (NGS) platforms. This article discusses the major genome assemblers that were proposed in the literature during the past decade by outlining their basic working principles. It is intended to act as a qualitative, not a quantitative, tutorial to all working on genome assemblers pertaining to the next generation of sequencers. We discuss the theoretical aspects of various genome assemblers, identifying their working schemes. We also discuss briefly the direction in which the area is headed towards along with discussing core issues on software simplicity. Copyright © 2012 Beijing Institute of Genomics, Chinese Academy of Sciences. Published by Elsevier Ltd. All rights reserved.
Valtcheva, Nadejda; Lang, Franziska M; Noske, Aurelia; Samartzis, Eleftherios P; Schmidt, Anna-Maria; Bellini, Elisa; Fink, Daniel; Moch, Holger; Rechsteiner, Markus; Dedes, Konstantin J; Wild, Peter J
2017-01-19
Endometrioid adenocarcinoma of the uterus and ovarian endometrioid carcinoma share many morphological and molecular features. Differentiation between simultaneous primary carcinomas and ovarian metastases of an endometrial cancer may be very challenging but is essential for prognostic and therapeutic considerations. In the present case study of a 33 year-old patient we used targeted amplicon next-generation re-sequencing for clarifying the origin of synchronous endometrioid cancer of the corpus uteri and the left ovary. The patient developed a metachronous lung metastasis of an endometrioid adenocarcinoma four years after hyster- and adnexectomy, vaginal brachytherapy and treatment with the synthetic steroid tibolone. Removal of the metastasis and megestrol treatment for seven years led to a complete remission. A total of 409 genes from the Ampliseq Comprehensive Cancer Panel (Ion Torrent, Thermo Fisher) were analysed by next generation sequencing and mutations in 10 genes, including ARID1A, CTNNB1, PIK3CA and PTEN were identified and confirmed by Sanger sequencing. Primary endometrial as well as ovarian cancer showed an identical mutational profile, suggesting the presence of an ovarian metastasis of the endometrial cancer, rather than a simultaneous endometrial and ovarian cancer. The metachronous lung metastasis showed a different mutational profile compared to the primary cancer. Immunohistochemical staining of the corresponding proteins suggested that the tumour development was driven by alterations in the protein function rather than by changes of the protein abundance in the cell. Our results have demonstrated next generation sequencing as a valuable tool in the differentiation of synchronous primary tumours and metastases, which has an important impact on the clinical decision making process. Similar to breast cancer, targeted therapies based on mutational tumour profiling will become increasingly important in endometrial and ovarian cancer. In summary, our results support the usage of next generation sequencing as a supplementary diagnostic tool, assisting in personalized precision medicine.
Genome assembly reborn: recent computational challenges
2009-01-01
Research into genome assembly algorithms has experienced a resurgence due to new challenges created by the development of next generation sequencing technologies. Several genome assemblers have been published in recent years specifically targeted at the new sequence data; however, the ever-changing technological landscape leads to the need for continued research. In addition, the low cost of next generation sequencing data has led to an increased use of sequencing in new settings. For example, the new field of metagenomics relies on large-scale sequencing of entire microbial communities instead of isolate genomes, leading to new computational challenges. In this article, we outline the major algorithmic approaches for genome assembly and describe recent developments in this domain. PMID:19482960
Chen, Guiqian; Qiu, Yuan; Zhuang, Qingye; Wang, Suchun; Wang, Tong; Chen, Jiming; Wang, Kaicheng
2018-05-09
Next generation sequencing (NGS) is a powerful tool for the characterization, discovery, and molecular identification of RNA viruses. There were multiple NGS library preparation methods published for strand-specific RNA-seq, but some methods are not suitable for identifying and characterizing RNA viruses. In this study, we report a NGS library preparation method to identify RNA viruses using the Ion Torrent PGM platform. The NGS sequencing adapters were directly inserted into the sequencing library through reverse transcription and polymerase chain reaction, without fragmentation and ligation of nucleic acids. The results show that this method is simple to perform, able to identify multiple species of RNA viruses in clinical samples.
Next-Generation Sequencing in Oncology: Genetic Diagnosis, Risk Prediction and Cancer Classification
Kamps, Rick; Brandão, Rita D.; van den Bosch, Bianca J.; Paulussen, Aimee D. C.; Xanthoulea, Sofia; Blok, Marinus J.; Romano, Andrea
2017-01-01
Next-generation sequencing (NGS) technology has expanded in the last decades with significant improvements in the reliability, sequencing chemistry, pipeline analyses, data interpretation and costs. Such advances make the use of NGS feasible in clinical practice today. This review describes the recent technological developments in NGS applied to the field of oncology. A number of clinical applications are reviewed, i.e., mutation detection in inherited cancer syndromes based on DNA-sequencing, detection of spliceogenic variants based on RNA-sequencing, DNA-sequencing to identify risk modifiers and application for pre-implantation genetic diagnosis, cancer somatic mutation analysis, pharmacogenetics and liquid biopsy. Conclusive remarks, clinical limitations, implications and ethical considerations that relate to the different applications are provided. PMID:28146134
ERIC Educational Resources Information Center
Lennett, Benjamin; Morris, Sarah J.; Byrum, Greta
2012-01-01
Based on a request for information (RFI) submitted to The University Community Next Generation Innovation Project (Gig.U), the paper describes a model for universities to develop next generation broadband infrastructure in their communities. In the our view universities can play a critical role in spurring next generation networks into their…
“Shovel-ready” Sequences as a Stimulus for the Next Generation of Life Scientists
Boyle, Michael D.
2010-01-01
Genomics and bioinformatics are dynamic fields well-suited for capturing the imagination of undergraduates in both research laboratories and classrooms. Currently, raw nucleotide sequence is being provided, as part of several genomics research initiatives, for undergraduate research and teaching. These initiatives could be easily extended and much more effective if the source of the sequenced material and the subsequent focus of the data analysis were aligned with the research interests of individual faculty at undergraduate institutions. By judicious use of surplus capacity in existing nucleotide sequencing cores, raw sequence data could be generated to support ongoing research efforts involving undergraduates. This would allow these students to participate actively in discovery research, with a goal of making novel contributions to their field through original research while nurturing the next generation of talented research scientists. PMID:23653696
"Shovel-ready" Sequences as a Stimulus for the Next Generation of Life Scientists.
Boyle, Michael D
2010-01-01
Genomics and bioinformatics are dynamic fields well-suited for capturing the imagination of undergraduates in both research laboratories and classrooms. Currently, raw nucleotide sequence is being provided, as part of several genomics research initiatives, for undergraduate research and teaching. These initiatives could be easily extended and much more effective if the source of the sequenced material and the subsequent focus of the data analysis were aligned with the research interests of individual faculty at undergraduate institutions. By judicious use of surplus capacity in existing nucleotide sequencing cores, raw sequence data could be generated to support ongoing research efforts involving undergraduates. This would allow these students to participate actively in discovery research, with a goal of making novel contributions to their field through original research while nurturing the next generation of talented research scientists.
Next-generation digital information storage in DNA.
Church, George M; Gao, Yuan; Kosuri, Sriram
2012-09-28
Digital information is accumulating at an astounding rate, straining our ability to store and archive it. DNA is among the most dense and stable information media known. The development of new technologies in both DNA synthesis and sequencing make DNA an increasingly feasible digital storage medium. We developed a strategy to encode arbitrary digital information in DNA, wrote a 5.27-megabit book using DNA microchips, and read the book by using next-generation DNA sequencing.
2016-06-13
syndrome ; JCV 5 JC polyomavirus; NGS 5 next- generation sequencing; PML 5 progressive multifocal leukoencephalopathy. Ascertainment of the etiology of...Hunt-like syndrome and focal pachymeningitis. A 69-year-old man developed left-sided ptosis and Figure 1 Heatmap shows the top microbial species in each...The symptoms were followed by decreased vision, diplopia, ophthalmoplegia, and facial numbness. He was diagnosed with Tolosa-Hunt syndrome and treated
Brancaccio, Rosario N; Robitaille, Alexis; Dutta, Sankhadeep; Cuenin, Cyrille; Santare, Daiga; Skenders, Girts; Leja, Marcis; Fischer, Nicole; Giuliano, Anna R; Rollison, Dana E; Grundhoff, Adam; Tommasino, Massimo; Gheit, Tarik
2018-05-07
With the advent of new molecular tools, the discovery of new papillomaviruses (PVs) has accelerated during the past decade, enabling the expansion of knowledge about the viral populations that inhabit the human body. Human PVs (HPVs) are etiologically linked to benign or malignant lesions of the skin and mucosa. The detection of HPV types can vary widely, depending mainly on the methodology and the quality of the biological sample. Next-generation sequencing is one of the most powerful tools, enabling the discovery of novel viruses in a wide range of biological material. Here, we report a novel protocol for the detection of known and unknown HPV types in human skin and oral gargle samples using improved PCR protocols combined with next-generation sequencing. We identified 105 putative new PV types in addition to 296 known types, thus providing important information about the viral distribution in the oral cavity and skin. Copyright © 2018. Published by Elsevier Inc.
Kamboj, Atul; Hallwirth, Claus V; Alexander, Ian E; McCowage, Geoffrey B; Kramer, Belinda
2017-06-17
The analysis of viral vector genomic integration sites is an important component in assessing the safety and efficiency of patient treatment using gene therapy. Alongside this clinical application, integration site identification is a key step in the genetic mapping of viral elements in mutagenesis screens that aim to elucidate gene function. We have developed a UNIX-based vector integration site analysis pipeline (Ub-ISAP) that utilises a UNIX-based workflow for automated integration site identification and annotation of both single and paired-end sequencing reads. Reads that contain viral sequences of interest are selected and aligned to the host genome, and unique integration sites are then classified as transcription start site-proximal, intragenic or intergenic. Ub-ISAP provides a reliable and efficient pipeline to generate large datasets for assessing the safety and efficiency of integrating vectors in clinical settings, with broader applications in cancer research. Ub-ISAP is available as an open source software package at https://sourceforge.net/projects/ub-isap/ .
Metagenome assembly through clustering of next-generation sequencing data using protein sequences.
Sim, Mikang; Kim, Jaebum
2015-02-01
The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be solved by investigating metagenomes. However, the difficulty of metagenomes, such as the combination of multiple microbes and different species abundance, makes metagenome assembly tasks more challenging. In this paper, we developed a new metagenome assembly method by utilizing protein sequences, in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences, as a guide for the assembly, is promising. Copyright © 2015 Elsevier B.V. All rights reserved.
Genome-wide gene–gene interaction analysis for next-generation sequencing
Zhao, Jinying; Zhu, Yun; Xiong, Momiao
2016-01-01
The critical barrier in interaction analysis for next-generation sequencing (NGS) data is that the traditional pairwise interaction analysis that is suitable for common variants is difficult to apply to rare variants because of their prohibitive computational time, large number of tests and low power. The great challenges for successful detection of interactions with NGS data are (1) the demands in the paradigm of changes in interaction analysis; (2) severe multiple testing; and (3) heavy computations. To meet these challenges, we shift the paradigm of interaction analysis between two SNPs to interaction analysis between two genomic regions. In other words, we take a gene as a unit of analysis and use functional data analysis techniques as dimensional reduction tools to develop a novel statistic to collectively test interaction between all possible pairs of SNPs within two genome regions. By intensive simulations, we demonstrate that the functional logistic regression for interaction analysis has the correct type 1 error rates and higher power to detect interaction than the currently used methods. The proposed method was applied to a coronary artery disease dataset from the Wellcome Trust Case Control Consortium (WTCCC) study and the Framingham Heart Study (FHS) dataset, and the early-onset myocardial infarction (EOMI) exome sequence datasets with European origin from the NHLBI's Exome Sequencing Project. We discovered that 6 of 27 pairs of significantly interacted genes in the FHS were replicated in the independent WTCCC study and 24 pairs of significantly interacted genes after applying Bonferroni correction in the EOMI study. PMID:26173972
Robertson, Charles E; Harris, J Kirk; Wagner, Brandie D; Granger, David; Browne, Kathy; Tatem, Beth; Feazel, Leah M; Park, Kristin; Pace, Norman R; Frank, Daniel N
2013-12-01
Studies of the human microbiome, and microbial community ecology in general, have blossomed of late and are now a burgeoning source of exciting research findings. Along with the advent of next-generation sequencing platforms, which have dramatically increased the scope of microbiome-related projects, several high-performance sequence analysis pipelines (e.g. QIIME, MOTHUR, VAMPS) are now available to investigators for microbiome analysis. The subject of our manuscript, the graphical user interface-based Explicet software package, fills a previously unmet need for a robust, yet intuitive means of integrating the outputs of the software pipelines with user-specified metadata and then visualizing the combined data.
Next generation sequencing as a useful tool in the diagnostics of mosaicism in Alport syndrome.
Beicht, Sonja; Strobl-Wildemann, Gertrud; Rath, Sabine; Wachter, Oliver; Alberer, Martin; Kaminsky, Elke; Weber, Lutz T; Hinrichsen, Tanja; Klein, Hanns-Georg; Hoefele, Julia
2013-09-10
Alport syndrome (ATS) is a progressive hereditary nephropathy characterized by hematuria and/or proteinuria with structural defects of the glomerular basement membrane. It can be associated with extrarenal manifestations (high-tone sensorineural hearing loss and ocular abnormalities). Somatic mutations in COL4A5 (X-linked), COL4A3 and COL4A4 genes (both autosomal recessive and autosomal dominant) cause Alport syndrome. Somatic mosaicism in Alport patients is very rare. The reason for this may be due to the difficulty of detection. We report the case of a boy and his mother who presented with Alport syndrome. Mutational analysis showed the novel hemizygote pathogenic mutation c.2396-1G>A (IVS29-1G>A) at the splice acceptor site of the intron 29 exon 30 boundary of the COL4A5 gene in the boy. The mutation in the mother would not have been detected by Sanger sequencing without the knowledge of the mutational analysis result of her son. Further investigation of the mother using next generation sequencing showed somatic mosaicism and implied potential germ cell mosaicism. The mutation in the mother has most likely occurred during early embryogenesis. Analysis of tissue of different embryonic origin in the mother confirmed mosaicism in both mesoderm and ectoderm. Low grade mosaicism is very difficult to detect by Sanger sequencing. Next generation sequencing is increasingly used in the diagnostics and might improve the detection of mosaicism. In the case of definite clinical symptoms of ATS and missing detection of a mutation by Sanger sequencing, mutational analysis should be performed by next generation sequencing. Copyright © 2013 Elsevier B.V. All rights reserved.
Pilotte, Nils; Papaiakovou, Marina; Grant, Jessica R; Bierwert, Lou Ann; Llewellyn, Stacey; McCarthy, James S; Williams, Steven A
2016-03-01
The soil transmitted helminths are a group of parasitic worms responsible for extensive morbidity in many of the world's most economically depressed locations. With growing emphasis on disease mapping and eradication, the availability of accurate and cost-effective diagnostic measures is of paramount importance to global control and elimination efforts. While real-time PCR-based molecular detection assays have shown great promise, to date, these assays have utilized sub-optimal targets. By performing next-generation sequencing-based repeat analyses, we have identified high copy-number, non-coding DNA sequences from a series of soil transmitted pathogens. We have used these repetitive DNA elements as targets in the development of novel, multi-parallel, PCR-based diagnostic assays. Utilizing next-generation sequencing and the Galaxy-based RepeatExplorer web server, we performed repeat DNA analysis on five species of soil transmitted helminths (Necator americanus, Ancylostoma duodenale, Trichuris trichiura, Ascaris lumbricoides, and Strongyloides stercoralis). Employing high copy-number, non-coding repeat DNA sequences as targets, novel real-time PCR assays were designed, and assays were tested against established molecular detection methods. Each assay provided consistent detection of genomic DNA at quantities of 2 fg or less, demonstrated species-specificity, and showed an improved limit of detection over the existing, proven PCR-based assay. The utilization of next-generation sequencing-based repeat DNA analysis methodologies for the identification of molecular diagnostic targets has the ability to improve assay species-specificity and limits of detection. By exploiting such high copy-number repeat sequences, the assays described here will facilitate soil transmitted helminth diagnostic efforts. We recommend similar analyses when designing PCR-based diagnostic tests for the detection of other eukaryotic pathogens.
Reiman, Anne; Pandey, Sarojini; Lloyd, Kate L; Dyer, Nigel; Khan, Mike; Crockard, Martin; Latten, Mark J; Watson, Tracey L; Cree, Ian A; Grammatopoulos, Dimitris K
2016-11-01
Background Detection of disease-associated mutations in patients with familial hypercholesterolaemia is crucial for early interventions to reduce risk of cardiovascular disease. Screening for these mutations represents a methodological challenge since more than 1200 different causal mutations in the low-density lipoprotein receptor has been identified. A number of methodological approaches have been developed for screening by clinical diagnostic laboratories. Methods Using primers targeting, the low-density lipoprotein receptor, apolipoprotein B, and proprotein convertase subtilisin/kexin type 9, we developed a novel Ion Torrent-based targeted re-sequencing method. We validated this in a West Midlands-UK small cohort of 58 patients screened in parallel with other mutation-targeting methods, such as multiplex polymerase chain reaction (Elucigene FH20), oligonucleotide arrays (Randox familial hypercholesterolaemia array) or the Illumina next-generation sequencing platform. Results In this small cohort, the next-generation sequencing method achieved excellent analytical performance characteristics and showed 100% and 89% concordance with the Randox array and the Elucigene FH20 assay. Investigation of the discrepant results identified two cases of mutation misclassification of the Elucigene FH20 multiplex polymerase chain reaction assay. A number of novel mutations not previously reported were also identified by the next-generation sequencing method. Conclusions Ion Torrent-based next-generation sequencing can deliver a suitable alternative for the molecular investigation of familial hypercholesterolaemia patients, especially when comprehensive mutation screening for rare or unknown mutations is required.
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum.
VanBuren, Robert; Bryant, Doug; Edger, Patrick P; Tang, Haibao; Burgess, Diane; Challabathula, Dinakar; Spittle, Kristi; Hall, Richard; Gu, Jenny; Lyons, Eric; Freeling, Michael; Bartels, Dorothea; Ten Hallers, Boudewijn; Hastie, Alex; Michael, Todd P; Mockler, Todd C
2015-11-26
Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a 'near-complete' draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
Next generation sequencing (NGS): a golden tool in forensic toolkit.
Aly, S M; Sabri, D M
The DNA analysis is a cornerstone in contemporary forensic sciences. DNA sequencing technologies are powerful tools that enrich molecular sciences in the past based on Sanger sequencing and continue to glowing these sciences based on Next generation sequencing (NGS). Next generation sequencing has excellent potential to flourish and increase the molecular applications in forensic sciences by jumping over the pitfalls of the conventional method of sequencing. The main advantages of NGS compared to conventional method that it utilizes simultaneously a large number of genetic markers with high-resolution of genetic data. These advantages will help in solving several challenges such as mixture analysis and dealing with minute degraded samples. Based on these new technologies, many markers could be examined to get important biological data such as age, geographical origins, tissue type determination, external visible traits and monozygotic twins identification. It also could get data related to microbes, insects, plants and soil which are of great medico-legal importance. Despite the dozens of forensic research involving NGS, there are requirements before using this technology routinely in forensic cases. Thus, there is a great need to more studies that address robustness of these techniques. Therefore, this work highlights the applications of forensic sciences in the era of massively parallel sequencing.
Zhou, Bin; Lin, Xudong; Wang, Wei; Halpin, Rebecca A.; Bera, Jayati; Stockwell, Timothy B.; Barr, Ian G.
2014-01-01
Although human influenza B virus (IBV) is a significant human pathogen, its great genetic diversity has limited our ability to universally amplify the entire genome for subsequent sequencing or vaccine production. The generation of sequence data via next-generation approaches and the rapid cloning of viral genes are critical for basic research, diagnostics, antiviral drugs, and vaccines to combat IBV. To overcome the difficulty of amplifying the diverse and ever-changing IBV genome, we developed and optimized techniques that amplify the complete segmented negative-sense RNA genome from any IBV strain in a single tube/well (IBV genomic amplification [IBV-GA]). Amplicons for >1,000 diverse IBV genomes from different sample types (e.g., clinical specimens) were generated and sequenced using this robust technology. These approaches are sensitive, robust, and sequence independent (i.e., universally amplify past, present, and future IBVs), which facilitates next-generation sequencing and advanced genomic diagnostics. Importantly, special terminal sequences engineered into the optimized IBV-GA2 products also enable ligation-free cloning to rapidly generate reverse-genetics plasmids, which can be used for the rescue of recombinant viruses and/or the creation of vaccine seed stock. PMID:24501036
Deep whole-genome sequencing of 90 Han Chinese genomes.
Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen
2017-09-01
Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects. © The Authors 2017. Published by Oxford University Press.
MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping.
Lee, Wan-Ping; Stromberg, Michael P; Ward, Alistair; Stewart, Chip; Garrison, Erik P; Marth, Gabor T
2014-01-01
MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK assigned and actual mapping qualities greater than 0.98. In order to ensure that studies of any genome are supported, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping
Lee, Wan-Ping; Stromberg, Michael P.; Ward, Alistair; Stewart, Chip; Garrison, Erik P.; Marth, Gabor T.
2014-01-01
MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK assigned and actual mapping qualities greater than 0.98. In order to ensure that studies of any genome are supported, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me). PMID:24599324
Open-Source Sequence Clustering Methods Improve the State Of the Art.
Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob
2016-01-01
Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).
Zavodna, Monika; Grueber, Catherine E; Gemmell, Neil J
2013-01-01
Next-generation sequencing (NGS) on pooled samples has already been broadly applied in human medical diagnostics and plant and animal breeding. However, thus far it has been only sparingly employed in ecology and conservation, where it may serve as a useful diagnostic tool for rapid assessment of species genetic diversity and structure at the population level. Here we undertake a comprehensive evaluation of the accuracy, practicality and limitations of parallel tagged amplicon NGS on pooled population samples for estimating species population diversity and structure. We obtained 16S and Cyt b data from 20 populations of Leiopelma hochstetteri, a frog species of conservation concern in New Zealand, using two approaches - parallel tagged NGS on pooled population samples and individual Sanger sequenced samples. Data from each approach were then used to estimate two standard population genetic parameters, nucleotide diversity (π) and population differentiation (FST), that enable population genetic inference in a species conservation context. We found a positive correlation between our two approaches for population genetic estimates, showing that the pooled population NGS approach is a reliable, rapid and appropriate method for population genetic inference in an ecological and conservation context. Our experimental design also allowed us to identify both the strengths and weaknesses of the pooled population NGS approach and outline some guidelines and suggestions that might be considered when planning future projects.
Next-generation sequencing in schizophrenia and other neuropsychiatric disorders.
Schreiber, Matthew; Dorschner, Michael; Tsuang, Debby
2013-10-01
Schizophrenia is a debilitating lifelong illness that lacks a cure and poses a worldwide public health burden. The disease is characterized by a heterogeneous clinical and genetic presentation that complicates research efforts to identify causative genetic variations. This review examines the potential of current findings in schizophrenia and in other related neuropsychiatric disorders for application in next-generation technologies, particularly whole-exome sequencing (WES) and whole-genome sequencing (WGS). These approaches may lead to the discovery of underlying genetic factors for schizophrenia and may thereby identify and target novel therapeutic targets for this devastating disorder. © 2013 Wiley Periodicals, Inc.
Next-Generation Sequencing of Antibody Display Repertoires
Rouet, Romain; Jackson, Katherine J. L.; Langley, David B.; Christ, Daniel
2018-01-01
In vitro selection technology has transformed the development of therapeutic monoclonal antibodies. Using methods such as phage, ribosome, and yeast display, high affinity binders can be selected from diverse repertoires. Here, we review strategies for the next-generation sequencing (NGS) of phage- and other antibody-display libraries, as well as NGS platforms and analysis tools. Moreover, we discuss recent examples relating to the use of NGS to assess library diversity, clonal enrichment, and affinity maturation. PMID:29472918
Polygenic Versus Monogenic Causes of Hypercholesterolemia Ascertained Clinically.
Wang, Jian; Dron, Jacqueline S; Ban, Matthew R; Robinson, John F; McIntyre, Adam D; Alazzam, Maher; Zhao, Pei Jun; Dilliott, Allison A; Cao, Henian; Huff, Murray W; Rhainds, David; Low-Kam, Cécile; Dubé, Marie-Pierre; Lettre, Guillaume; Tardif, Jean-Claude; Hegele, Robert A
2016-12-01
Next-generation sequencing technology is transforming our understanding of heterozygous familial hypercholesterolemia, including revision of prevalence estimates and attribution of polygenic effects. Here, we examined the contributions of monogenic and polygenic factors in patients with severe hypercholesterolemia referred to a specialty clinic. We applied targeted next-generation sequencing with custom annotation, coupled with evaluation of large-scale copy number variation and polygenic scores for raised low-density lipoprotein cholesterol in a cohort of 313 individuals with severe hypercholesterolemia, defined as low-density lipoprotein cholesterol >5.0 mmol/L (>194 mg/dL). We found that (1) monogenic familial hypercholesterolemia-causing mutations detected by targeted next-generation sequencing were present in 47.3% of individuals; (2) the percentage of individuals with monogenic mutations increased to 53.7% when copy number variations were included; (3) the percentage further increased to 67.1% when individuals with extreme polygenic scores were included; and (4) the percentage of individuals with an identified genetic component increased from 57.0% to 92.0% as low-density lipoprotein cholesterol level increased from 5.0 to >8.0 mmol/L (194 to >310 mg/dL). In a clinically ascertained sample with severe hypercholesterolemia, we found that most patients had a discrete genetic basis detected using a comprehensive screening approach that includes targeted next-generation sequencing, an assay for copy number variations, and polygenic trait scores. © 2016 American Heart Association, Inc.
Signature of genetic associations in oral cancer.
Sharma, Vishwas; Nandan, Amrita; Sharma, Amitesh Kumar; Singh, Harpreet; Bharadwaj, Mausumi; Sinha, Dhirendra Narain; Mehrotra, Ravi
2017-10-01
Oral cancer etiology is complex and controlled by multi-factorial events including genetic events. Candidate gene studies, genome-wide association studies, and next-generation sequencing identified various chromosomal loci to be associated with oral cancer. There is no available review that could give us the comprehensive picture of genetic loci identified to be associated with oral cancer by candidate gene studies-based, genome-wide association studies-based, and next-generation sequencing-based approaches. A systematic literature search was performed in the PubMed database to identify the loci associated with oral cancer by exclusive candidate gene studies-based, genome-wide association studies-based, and next-generation sequencing-based study approaches. The information of loci associated with oral cancer is made online through the resource "ORNATE." Next, screening of the loci validated by candidate gene studies and next-generation sequencing approach or by two independent studies within candidate gene studies or next-generation sequencing approaches were performed. A total of 264 loci were identified to be associated with oral cancer by candidate gene studies, genome-wide association studies, and next-generation sequencing approaches. In total, 28 loci, that is, 14q32.33 (AKT1), 5q22.2 (APC), 11q22.3 (ATM), 2q33.1 (CASP8), 11q13.3 (CCND1), 16q22.1 (CDH1), 9p21.3 (CDKN2A), 1q31.1 (COX-2), 7p11.2 (EGFR), 22q13.2 (EP300), 4q35.2 (FAT1), 4q31.3 (FBXW7), 4p16.3 (FGFR3), 1p13.3 (GSTM1-GSTT1), 11q13.2 (GSTP1), 11p15.5 (H-RAS), 3p25.3 (hOGG1), 1q32.1 (IL-10), 4q13.3 (IL-8), 12p12.1 (KRAS), 12q15 (MDM2), 12q13.12 (MLL2), 9q34.3 (NOTCH1), 17p13.1 (p53), 3q26.32 (PIK3CA), 10q23.31 (PTEN), 13q14.2 (RB1), and 5q14.2 (XRCC4), were validated to be associated with oral cancer. "ORNATE" gives a snapshot of genetic loci associated with oral cancer. All 28 loci were validated to be linked to oral cancer for which further fine-mapping followed by gene-by-gene and gene-environment interaction studies is needed to confirm their involvement in modifying oral cancer.
ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data
2012-01-01
Background Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results. Results Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and do a better job of discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy. Conclusion ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration. PMID:22946927
Yu, Jia; Blom, Jochen; Sczyrba, Alexander; Goesmann, Alexander
2017-09-10
The introduction of next generation sequencing has caused a steady increase in the amounts of data that have to be processed in modern life science. Sequence alignment plays a key role in the analysis of sequencing data e.g. within whole genome sequencing or metagenome projects. BLAST is a commonly used alignment tool that was the standard approach for more than two decades, but in the last years faster alternatives have been proposed including RapSearch, GHOSTX, and DIAMOND. Here we introduce HAMOND, an application that uses Apache Hadoop to parallelize DIAMOND computation in order to scale-out the calculation of alignments. HAMOND is fault tolerant and scalable by utilizing large cloud computing infrastructures like Amazon Web Services. HAMOND has been tested in comparative genomics analyses and showed promising results both in efficiency and accuracy. Copyright © 2017 The Author(s). Published by Elsevier B.V. All rights reserved.
Mining and Development of Novel SSR Markers Using Next Generation Sequencing (NGS) Data in Plants.
Taheri, Sima; Lee Abdullah, Thohirah; Yusop, Mohd Rafii; Hanafi, Mohamed Musa; Sahebi, Mahbod; Azizi, Parisa; Shamshiri, Redmond Ramin
2018-02-13
Microsatellites, or simple sequence repeats (SSRs), are one of the most informative and multi-purpose genetic markers exploited in plant functional genomics. However, the discovery of SSRs and development using traditional methods are laborious, time-consuming, and costly. Recently, the availability of high-throughput sequencing technologies has enabled researchers to identify a substantial number of microsatellites at less cost and effort than traditional approaches. Illumina is a noteworthy transcriptome sequencing technology that is currently used in SSR marker development. Although 454 pyrosequencing datasets can be used for SSR development, this type of sequencing is no longer supported. This review aims to present an overview of the next generation sequencing, with a focus on the efficient use of de novo transcriptome sequencing (RNA-Seq) and related tools for mining and development of microsatellites in plants.
Transforming clinical microbiology with bacterial genome sequencing.
Didelot, Xavier; Bowden, Rory; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W
2012-09-01
Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.
Transforming clinical microbiology with bacterial genome sequencing
2016-01-01
Whole genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here we review the current status of clinical microbiology and how it has already begun to be transformed by the use of next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. The application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow. PMID:22868263
Genetic diagnosis of familial hypercholesterolaemia by targeted next-generation sequencing
Maglio, C; Mancina, R M; Motta, B M; Stef, M; Pirazzi, C; Palacios, L; Askaryar, N; Borén, J; Wiklund, O; Romeo, S
2014-01-01
Maglio C., Mancina R. M., Motta B. M., Stef M., Pirazzi C., Palacios L., Askaryar N., Borén J., Wiklund O., Romeo S. (University of Gothenburg, Gothenburg, Sweden; University Magna Graecia of Catanzaro, Italy; University of Milan, Italy; Progenika Biopharma SA, Derio, Spain). Genetic diagnosis of familial hypercholesterolaemia by targeted next-generation sequencing. Objectives The aim of this study was to combine clinical criteria and next-generation sequencing (pyrosequencing) to establish a diagnosis of familial hypercholesterolaemia (FH). Design, setting and subjects A total of 77 subjects with a Dutch Lipid Clinic Network score of ≥3 (possible, probable or definite FH clinical diagnosis) were recruited from the Lipid Clinic at Sahlgrenska Hospital, Gothenburg, Sweden. Next-generation sequencing was performed in all subjects using SEQPRO LIPO RS, a kit that detects mutations in the low-density lipoprotein receptor (LDLR), apolipoprotein B (APOB), proprotein convertase subtilisin/kexin type 9 (PCSK9) and LDLR adapter protein 1 (LDLRAP1) genes; copy-number variations in the LDLR gene were also examined. Results A total of 26 mutations were detected in 50 subjects (65% success rate). Amongst these, 23 mutations were in the LDLR gene, two in the APOB gene and one in the PCSK9 gene. Four mutations with unknown pathogenicity were detected in LDLR. Of these, three mutations (Gly505Asp, Ile585Thr and Gln660Arg) have been previously reported in subjects with FH, but their pathogenicity has not been proved. The fourth, a mutation in LDLR affecting a splicing site (exon 6–intron 6) has not previously been reported; it was found to segregate with high cholesterol levels in the family of the proband. Conclusions Using a combination of clinical criteria and targeted next-generation sequencing, we have achieved FH diagnosis with a high success rate. Furthermore, we identified a new splicing-site mutation in the LDLR gene. PMID:24785115
The advantages of SMRT sequencing.
Roberts, Richard J; Carneiro, Mauricio O; Schatz, Michael C
2013-07-03
Of the current next-generation sequencing technologies, SMRT sequencing is sometimes overlooked. However, attributes such as long reads, modified base detection and high accuracy make SMRT a useful technology and an ideal approach to the complete sequencing of small genomes.
2016-01-01
Comprehensive next generation sequencing virus detection was used to detect the whole spectrum of viruses and viroids in selected grapevines from the Czech Republic. The novel NGS approach was based on sequencing libraries of small RNA isolated from grapevine vascular tissues. Eight previously partially-characterized grapevines of diverse varieties were selected and subjected to analysis: Chardonnay, Laurot, Guzal Kara, and rootstock Kober 125AA from the Moravia wine-producing region; plus Müller-Thurgau and Pinot Noir from the Bohemia wine-producing region, both in the Czech Republic. Using next generation sequencing of small RNA, the presence of 8 viruses and 2 viroids were detected in a set of eight grapevines; therefore, confirming the high effectiveness of the technique in plant virology and producing results supporting previous data on multiple infected grapevines in Czech vineyards. Among the pathogens detected, the Grapevine rupestris vein feathering virus and Grapevine yellow speckle viroid 1 were recorded in the Czech Republic for the first time. PMID:27959951
Eichmeier, Aleš; Komínková, Marcela; Komínek, Petr; Baránek, Miroslav
2016-01-01
Comprehensive next generation sequencing virus detection was used to detect the whole spectrum of viruses and viroids in selected grapevines from the Czech Republic. The novel NGS approach was based on sequencing libraries of small RNA isolated from grapevine vascular tissues. Eight previously partially-characterized grapevines of diverse varieties were selected and subjected to analysis: Chardonnay, Laurot, Guzal Kara, and rootstock Kober 125AA from the Moravia wine-producing region; plus Müller-Thurgau and Pinot Noir from the Bohemia wine-producing region, both in the Czech Republic. Using next generation sequencing of small RNA, the presence of 8 viruses and 2 viroids were detected in a set of eight grapevines; therefore, confirming the high effectiveness of the technique in plant virology and producing results supporting previous data on multiple infected grapevines in Czech vineyards. Among the pathogens detected, the Grapevine rupestris vein feathering virus and Grapevine yellow speckle viroid 1 were recorded in the Czech Republic for the first time.
A 1000 Arab genome project to study the Emirati population.
Al-Ali, Mariam; Osman, Wael; Tay, Guan K; AlSafar, Habiba S
2018-04-01
Discoveries from the human genome, HapMap, and 1000 genome projects have collectively contributed toward the creation of a catalog of human genetic variations that has improved our understanding of human diversity. Despite the collegial nature of many of these genome study consortiums, which has led to the cataloging of genetic variations of different ethnic groups from around the world, genome data on the Arab population remains overwhelmingly underrepresented. The National Arab Genome project in the United Arab Emirates (UAE) aims to address this deficiency by using Next Generation Sequencing (NGS) technology to provide data to improve our understanding of the Arab genome and catalog variants that are unique to the Arab population of the UAE. The project was conceived to shed light on the similarities and differences between the Arab genome and those of the other ethnic groups.
An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Taylor, Ronald C.
Bioinformatics researchers are increasingly confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBasemore » project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date.« less
BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons
2011-01-01
Background Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. Results BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically. Conclusions There is a clear need for a user-friendly program that can produce genome comparisons for a large number of prokaryote genomes with an emphasis on rapidly utilising unfinished or unassembled genome data. Here we present BRIG, a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface. BRIG is freely available for all operating systems at http://sourceforge.net/projects/brig/. PMID:21824423
BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons.
Alikhan, Nabil-Fareed; Petty, Nicola K; Ben Zakour, Nouri L; Beatson, Scott A
2011-08-08
Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically. There is a clear need for a user-friendly program that can produce genome comparisons for a large number of prokaryote genomes with an emphasis on rapidly utilising unfinished or unassembled genome data. Here we present BRIG, a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface. BRIG is freely available for all operating systems at http://sourceforge.net/projects/brig/.
USDA-ARS?s Scientific Manuscript database
High-throughput next-generation sequencing was used to scan the genome and generate reliable sequence of high copy number regions. Using this method, we examined whole plastid genomes as well as nearly 6000 bases of nuclear ribosomal DNA sequences for nine genotypes of Theobroma cacao and an indivi...
Christensen, Paul A.; Ni, Yunyun; Bao, Feifei; Hendrickson, Heather L.; Greenwood, Michael; Thomas, Jessica S.; Long, S. Wesley; Olsen, Randall J.
2017-01-01
Introduction: Next-generation-sequencing (NGS) is increasingly used in clinical and research protocols for patients with cancer. NGS assays are routinely used in clinical laboratories to detect mutations bearing on cancer diagnosis, prognosis and personalized therapy. A typical assay may interrogate 50 or more gene targets that encompass many thousands of possible gene variants. Analysis of NGS data in cancer is a labor-intensive process that can become overwhelming to the molecular pathologist or research scientist. Although commercial tools for NGS data analysis and interpretation are available, they are often costly, lack key functionality or cannot be customized by the end user. Methods: To facilitate NGS data analysis in our clinical molecular diagnostics laboratory, we created a custom bioinformatics tool termed Houston Methodist Variant Viewer (HMVV). HMVV is a Java-based solution that integrates sequencing instrument output, bioinformatics analysis, storage resources and end user interface. Results: Compared to the predicate method used in our clinical laboratory, HMVV markedly simplifies the bioinformatics workflow for the molecular technologist and facilitates the variant review by the molecular pathologist. Importantly, HMVV reduces time spent researching the biological significance of the variants detected, standardizes the online resources used to perform the variant investigation and assists generation of the annotated report for the electronic medical record. HMVV also maintains a searchable variant database, including the variant annotations generated by the pathologist, which is useful for downstream quality improvement and research projects. Conclusions: HMVV is a clinical grade, low-cost, feature-rich, highly customizable platform that we have made available for continued development by the pathology informatics community. PMID:29226007
Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data
NASA Astrophysics Data System (ADS)
Sandmann, Sarah; de Graaf, Aniek O.; Karimi, Mohsen; van der Reijden, Bert A.; Hellström-Lindberg, Eva; Jansen, Joop H.; Dugas, Martin
2017-02-01
Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies, recommendations and thus, also in the output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, on a different platform and expert based review. In addition we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. Influence of varying coverages- and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate that there is a need to improve reproducibility of the results in the context of multithreading.
Toxicogenomics and Cancer Susceptibility: Advances with Next-Generation Sequencing
Ning, Baitang; Su, Zhenqiang; Mei, Nan; Hong, Huixiao; Deng, Helen; Shi, Leming; Fuscoe, James C.; Tolleson, William H.
2017-01-01
The aim of this review is to comprehensively summarize the recent achievements in the field of toxicogenomics and cancer research regarding genetic-environmental interactions in carcinogenesis and detection of genetic aberrations in cancer genomes by next-generation sequencing technology. Cancer is primarily a genetic disease in which genetic factors and environmental stimuli interact to cause genetic and epigenetic aberrations in human cells. Mutations in the germline act as either high-penetrance alleles that strongly increase the risk of cancer development, or as low-penetrance alleles that mildly change an individual’s susceptibility to cancer. Somatic mutations, resulting from either DNA damage induced by exposure to environmental mutagens or from spontaneous errors in DNA replication or repair are involved in the development or progression of the cancer. Induced or spontaneous changes in the epigenome may also drive carcinogenesis. Advances in next-generation sequencing technology provide us opportunities to accurately, economically, and rapidly identify genetic variants, somatic mutations, gene expression profiles, and epigenetic alterations with single-base resolution. Whole genome sequencing, whole exome sequencing, and RNA sequencing of paired cancer and adjacent normal tissue present a comprehensive picture of the cancer genome. These new findings should benefit public health by providing insights in understanding cancer biology, and in improving cancer diagnosis and therapy. PMID:24875441
Next Generation Sequence Assembly with AMOS
Treangen, Todd J; Sommer, Dan D; Angly, Florent E; Koren, Sergey; Pop, Mihai
2011-01-01
A Modular Open-Source Assembler (AMOS) was designed to offer a modular approach to genome assembly. AMOS includes a wide range of tools for assembly, including lightweight de novo assemblers Minimus and Minimo, and Bambus 2, a robust scaffolder able to handle metagenomic and polymorphic data. This protocol describes how to configure and use AMOS for the assembly of Next Generation sequence data. Additionally, we provide three tutorial examples that include bacterial, viral, and metagenomic datasets with specific tips for improving assembly quality. PMID:21400694
NEXT Ion Propulsion System Development Status and Capabilities
NASA Technical Reports Server (NTRS)
Patterson, Michael J.; Benson, Scott W.
2008-01-01
NASA s Evolutionary Xenon Thruster (NEXT) project is developing next generation ion propulsion technologies to provide future NASA science missions with enhanced mission performance benefit at a low total development cost. The objective of the NEXT project is to advance next generation ion propulsion technology by producing engineering model system components, validating these through qualification-level and integrated system testing, and ensuring preparedness for transitioning to flight system development. As NASA s Evolutionary Xenon Thruster technology program completes advanced development activities, it is advantageous to review the existing technology capabilities of the system under development. This paper describes the NEXT ion propulsion system development status, characteristics and performance. A review of mission analyses results conducted to date using the NEXT system is also provided.
Shum, Bennett O V; Henner, Ilya; Belluoccio, Daniele; Hinchcliffe, Marcus J
2017-07-01
The sensitivity and specificity of next-generation sequencing laboratory developed tests (LDTs) are typically determined by an analyte-specific approach. Analyte-specific validations use disease-specific controls to assess an LDT's ability to detect known pathogenic variants. Alternatively, a methods-based approach can be used for LDT technical validations. Methods-focused validations do not use disease-specific controls but use benchmark reference DNA that contains known variants (benign, variants of unknown significance, and pathogenic) to assess variant calling accuracy of a next-generation sequencing workflow. Recently, four whole-genome reference materials (RMs) from the National Institute of Standards and Technology (NIST) were released to standardize methods-based validations of next-generation sequencing panels across laboratories. We provide a practical method for using NIST RMs to validate multigene panels. We analyzed the utility of RMs in validating a novel newborn screening test that targets 70 genes, called NEO1. Despite the NIST RM variant truth set originating from multiple sequencing platforms, replicates, and library types, we discovered a 5.2% false-negative variant detection rate in the RM truth set genes that were assessed in our validation. We developed a strategy using complementary non-RM controls to demonstrate 99.6% sensitivity of the NEO1 test in detecting variants. Our findings have implications for laboratories or proficiency testing organizations using whole-genome NIST RMs for testing. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
ADOMA: A Command Line Tool to Modify ClustalW Multiple Alignment Output.
Zaal, Dionne; Nota, Benjamin
2016-01-01
We present ADOMA, a command line tool that produces alternative outputs from ClustalW multiple alignments of nucleotide or protein sequences. ADOMA can simplify the output of alignments by showing only the different residues between sequences, which is often desirable when only small differences such as single nucleotide polymorphisms are present (e.g., between different alleles). Another feature of ADOMA is that it can enhance the ClustalW output by coloring the residues in the alignment. This tool is easily integrated into automated Linux pipelines for next-generation sequencing data analysis, and may be useful for researchers in a broad range of scientific disciplines including evolutionary biology and biomedical sciences. The source code is freely available at https://sourceforge. net/projects/adoma/. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NexGen Production â Sequencing and Analysis
Muzny, Donna
2018-01-16
Donna Muzny of the Baylor College of Medicine Human Genome Sequencing Center discusses next generation sequencing platforms and evaluating pipeline performance on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
From genomics to functional markers in the era of next-generation sequencing.
Salgotra, R K; Gupta, B B; Stewart, C N
2014-03-01
The availability of complete genome sequences, along with other genomic resources for Arabidopsis, rice, pigeon pea, soybean and other crops, has revolutionized our understanding of the genetic make-up of plants. Next-generation DNA sequencing (NGS) has facilitated single nucleotide polymorphism discovery in plants. Functionally-characterized sequences can be identified and functional markers (FMs) for important traits can be developed at an ever-increasing ease. FMs are derived from sequence polymorphisms found in allelic variants of a functional gene. Linkage disequilibrium-based association mapping and homologous recombinants have been developed for identification of "perfect" markers for their use in crop improvement practices. Compared with many other molecular markers, FMs derived from the functionally characterized sequence genes using NGS techniques and their use provide opportunities to develop high-yielding plant genotypes resistant to various stresses at a fast pace.
Hocum, Jonah D; Battrell, Logan R; Maynard, Ryan; Adair, Jennifer E; Beard, Brian C; Rawlings, David J; Kiem, Hans-Peter; Miller, Daniel G; Trobridge, Grant D
2015-07-07
Analyzing the integration profile of retroviral vectors is a vital step in determining their potential genotoxic effects and developing safer vectors for therapeutic use. Identifying retroviral vector integration sites is also important for retroviral mutagenesis screens. We developed VISA, a vector integration site analysis server, to analyze next-generation sequencing data for retroviral vector integration sites. Sequence reads that contain a provirus are mapped to the human genome, sequence reads that cannot be localized to a unique location in the genome are filtered out, and then unique retroviral vector integration sites are determined based on the alignment scores of the remaining sequence reads. VISA offers a simple web interface to upload sequence files and results are returned in a concise tabular format to allow rapid analysis of retroviral vector integration sites.
First report of bacterial community from a Bat Guano using Illumina next-generation sequencing.
De Mandal, Surajit; Zothansanga; Panda, Amritha Kumari; Bisht, Satpal Singh; Senthil Kumar, Nachimuthu
2015-06-01
V4 hypervariable region of 16S rDNA was analyzed for identifying the bacterial communities present in Bat Guano from the unexplored cave - Pnahkyndeng, Meghalaya, Northeast India. Metagenome comprised of 585,434 raw Illumina sequences with a 59.59% G+C content. A total of 416,490 preprocessed reads were clustered into 1282 OTUs (operational taxonomical units) comprising of 18 bacterial phyla. The taxonomic profile showed that the guano bacterial community is dominated by Chloroflexi, Actinobacteria and Crenarchaeota which account for 70.73% of all sequence reads and 43.83% of all OTUs. Metagenome sequence data are available at NCBI under the accession no. SRP051094. This study is the first to characterize Bat Guano bacterial community using next-generation sequencing approach.
First report of bacterial community from a Bat Guano using Illumina next-generation sequencing
De Mandal, Surajit; Zothansanga; Panda, Amritha Kumari; Bisht, Satpal Singh; Senthil Kumar, Nachimuthu
2015-01-01
V4 hypervariable region of 16S rDNA was analyzed for identifying the bacterial communities present in Bat Guano from the unexplored cave — Pnahkyndeng, Meghalaya, Northeast India. Metagenome comprised of 585,434 raw Illumina sequences with a 59.59% G+C content. A total of 416,490 preprocessed reads were clustered into 1282 OTUs (operational taxonomical units) comprising of 18 bacterial phyla. The taxonomic profile showed that the guano bacterial community is dominated by Chloroflexi, Actinobacteria and Crenarchaeota which account for 70.73% of all sequence reads and 43.83% of all OTUs. Metagenome sequence data are available at NCBI under the accession no. SRP051094. This study is the first to characterize Bat Guano bacterial community using next-generation sequencing approach. PMID:26484190
Comparison of Next-Generation Sequencing Systems
Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie
2012-01-01
With fast development and wide applications of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid the achievement of goals to decode life mysteries, make better crops, detect pathogens, and improve life qualities. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, technologies of these systems are reviewed, and first-hand data from extensive experience is summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. At last, applications of NGS are summarized. PMID:22829749
Yleaf: Software for Human Y-Chromosomal Haplogroup Inference from Next-Generation Sequencing Data.
Ralf, Arwin; Montiel González, Diego; Zhong, Kaiyin; Kayser, Manfred
2018-05-01
Next-generation sequencing (NGS) technologies offer immense possibilities given the large genomic data they simultaneously deliver. The human Y-chromosome serves as good example how NGS benefits various applications in evolution, anthropology, genealogy, and forensics. Prior to NGS, the Y-chromosome phylogenetic tree consisted of a few hundred branches, based on NGS data, it now contains many thousands. The complexity of both, Y tree and NGS data provide challenges for haplogroup assignment. For effective analysis and interpretation of Y-chromosome NGS data, we present Yleaf, a publically available, automated, user-friendly software for high-resolution Y-chromosome haplogroup inference independently of library and sequencing methods.
Using GBrowse 2.0 to visualize and share next-generation sequence data
2013-01-01
GBrowse is a mature web-based genome browser that is suitable for deployment on both public and private web sites. It supports most of genome browser features, including qualitative and quantitative (wiggle) tracks, track uploading, track sharing, interactive track configuration, semantic zooming and limited smooth track panning. As of version 2.0, GBrowse supports next-generation sequencing (NGS) data by providing for the direct display of SAM and BAM sequence alignment files. SAM/BAM tracks provide semantic zooming and support both local and remote data sources. This article provides step-by-step instructions for configuring GBrowse to display NGS data. PMID:23376193
Lu, Chaoxia; Wu, Wei; Xiao, Jifang; Meng, Yan; Zhang, Shuyang; Zhang, Xue
2013-06-01
To detect pathogenic mutations in Marfan syndrome (MFS) using an Ion Torrent Personal Genome Machine (PGM) and to validate the result of targeted next-generation semiconductor sequencing for the diagnosis of genetic disorders. Peripheral blood samples were collected from three MFS patients and a normal control with informed consent. Genomic DNA was isolated by standard method and then subjected to targeted sequencing using an Ion Ampliseq(TM) Inherited Disease Panel. Three multiplex PCR reactions were carried out to amplify the coding exons of 328 genes including FBN1, TGFBR1 and TGFBR2. DNA fragments from different samples were ligated with barcoded sequencing adaptors. Template preparation and emulsion PCR, and Ion Sphere Particles enrichment were carried out using an Ion One Touch system. The ion sphere particles were sequenced on a 318 chip using the PGM platform. Data from the PGM runs were processed using an Ion Torrent Suite 3.2 software to generate sequence reads. After sequence alignment and extraction of SNPs and indels, all the variants were filtered against dbSNP137. DNA sequences were visualized with an Integrated Genomics Viewer. The most likely disease-causing variants were analyzed by Sanger sequencing. The PGM sequencing has yielded an output of 855.80 Mb, with a > 100 × median sequencing depth and a coverage of > 98% for the targeted regions in all the four samples. After data analysis and database filtering, one known missense mutation (p.E1811K) and two novel premature termination mutations (p.E2264X and p.L871FfsX23) in the FBN1 gene were identified in the three MFS patients. All mutations were verified by conventional Sanger sequencing. Pathogenic FBN1 mutations have been identified in all patients with MFS, indicating that the targeted next-generation sequencing on the PGM sequencers can be applied for accurate and high-throughput testing of genetic disorders.
Nematode.net update 2011: addition of data sets and tools featuring next-generation sequencing data
Martin, John; Abubucker, Sahar; Heizer, Esley; Taylor, Christina M.; Mitreva, Makedonka
2012-01-01
Nematode.net (http://nematode.net) has been a publicly available resource for studying nematodes for over a decade. In the past 3 years, we reorganized Nematode.net to provide more user-friendly navigation through the site, a necessity due to the explosion of data from next-generation sequencing platforms. Organism-centric portals containing dynamically generated data are available for over 56 different nematode species. Next-generation data has been added to the various data-mining portals hosted, including NemaBLAST and NemaBrowse. The NemaPath metabolic pathway viewer builds associations using KOs, rather than ECs to provide more accurate and fine-grained descriptions of proteins. Two new features for data analysis and comparative genomics have been added to the site. NemaSNP enables the user to perform population genetics studies in various nematode populations using next-generation sequencing data. HelmCoP (Helminth Control and Prevention) as an independent component of Nematode.net provides an integrated resource for storage, annotation and comparative genomics of helminth genomes to aid in learning more about nematode genomes, as well as drug, pesticide, vaccine and drug target discovery. With this update, Nematode.net will continue to realize its original goal to disseminate diverse bioinformatic data sets and provide analysis tools to the broad scientific community in a useful and user-friendly manner. PMID:22139919
Survey of MapReduce frame operation in bioinformatics.
Zou, Quan; Li, Xu-Bin; Jiang, Wen-Rui; Lin, Zi-Yu; Li, Gui-Lin; Chen, Ke
2014-07-01
Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be employed in the next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as the future works on parallel computing in bioinformatics. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.
Oyola, Samuel O; Otto, Thomas D; Gu, Yong; Maslen, Gareth; Manske, Magnus; Campino, Susana; Turner, Daniel J; Macinnis, Bronwyn; Kwiatkowski, Dominic P; Swerdlow, Harold P; Quail, Michael A
2012-01-03
Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) has rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge to the currently available NGS platforms. The genomes of some important pathogenic organisms like Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content) display extremes of base composition. The standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage particularly across AT and GC rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low quantity starting material and tolerant to extremely high AT content sequences. We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of sequence data generated, we show that our optimized conditions that involve a PCR additive (TMAC), produces amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC neutral templates. We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of either extremes of base composition. This development will greatly benefit sequencing clinical samples that often require amplification due to low mass of DNA starting material.
Friis-Nielsen, Jens; Vinner, Lasse; Hansen, Thomas Arn; Richter, Stine Raith; Fridholm, Helena; Herrera, Jose Alejandro Romero; Lund, Ole; Brunak, Søren; Izarzugaza, Jose M. G.; Mourier, Tobias; Nielsen, Lars Peter
2016-01-01
Propionibacterium acnes is the most abundant bacterium on human skin, particularly in sebaceous areas. P. acnes is suggested to be an opportunistic pathogen involved in the development of diverse medical conditions but is also a proven contaminant of human clinical samples and surgical wounds. Its significance as a pathogen is consequently a matter of debate. In the present study, we investigated the presence of P. acnes DNA in 250 next-generation sequencing data sets generated from 180 samples of 20 different sample types, mostly of cancerous origin. The samples were subjected to either microbial enrichment, involving nuclease treatment to reduce the amount of host nucleic acids, or shotgun sequencing. We detected high proportions of P. acnes DNA in enriched samples, particularly skin tissue-derived and other tissue samples, with the levels being higher in enriched samples than in shotgun-sequenced samples. P. acnes reads were detected in most samples analyzed, though the proportions in most shotgun-sequenced samples were low. Our results show that P. acnes can be detected in practically all sample types when molecular methods, such as next-generation sequencing, are employed. The possibility of contamination from the patient or other sources, including laboratory reagents or environment, should therefore always be considered carefully when P. acnes is detected in clinical samples. We advocate that detection of P. acnes always be accompanied by experiments validating the association between this bacterium and any clinical condition. PMID:26818667
Takahashi, Shunsuke; Tomita, Junko; Nishioka, Kaori; Hisada, Takayoshi; Nishijima, Miyuki
2014-01-01
For the analysis of microbial community structure based on 16S rDNA sequence diversity, sensitive and robust PCR amplification of 16S rDNA is a critical step. To obtain accurate microbial composition data, PCR amplification must be free of bias; however, amplifying all 16S rDNA species with equal efficiency from a sample containing a large variety of microorganisms remains challenging. Here, we designed a universal primer based on the V3-V4 hypervariable region of prokaryotic 16S rDNA for the simultaneous detection of Bacteria and Archaea in fecal samples from crossbred pigs (Landrace×Large white×Duroc) using an Illumina MiSeq next-generation sequencer. In-silico analysis showed that the newly designed universal prokaryotic primers matched approximately 98.0% of Bacteria and 94.6% of Archaea rRNA gene sequences in the Ribosomal Database Project database. For each sequencing reaction performed with the prokaryotic universal primer, an average of 69,330 (±20,482) reads were obtained, of which archaeal rRNA genes comprised approximately 1.2% to 3.2% of all prokaryotic reads. In addition, the detection frequency of Bacteria belonging to the phylum Verrucomicrobia, including members of the classes Verrucomicrobiae and Opitutae, was higher in the NGS analysis using the prokaryotic universal primer than that performed with the bacterial universal primer. Importantly, this new prokaryotic universal primer set had markedly lower bias than that of most previously designed universal primers. Our findings demonstrate that the prokaryotic universal primer set designed in the present study will permit the simultaneous detection of Bacteria and Archaea, and will therefore allow for a more comprehensive understanding of microbial community structures in environmental samples. PMID:25144201
Cheng, Ji-Hong; Liu, Wen-Chun; Chang, Ting-Tsung; Hsieh, Sun-Yuan; Tseng, Vincent S
2017-10-01
Many studies have suggested that deletions of Hepatitis B Viral (HBV) are associated with the development of progressive liver diseases, even ultimately resulting in hepatocellular carcinoma (HCC). Among the methods for detecting deletions from next-generation sequencing (NGS) data, few methods considered the characteristics of virus, such as high evolution rates and high divergence among the different HBV genomes. Sequencing high divergence HBV genome sequences using the NGS technology outputs millions of reads. Thus, detecting exact breakpoints of deletions from these big and complex data incurs very high computational cost. We proposed a novel analytical method named VirDelect (Virus Deletion Detect), which uses split read alignment base to detect exact breakpoint and diversity variable to consider high divergence in single-end reads data, such that the computational cost can be reduced without losing accuracy. We use four simulated reads datasets and two real pair-end reads datasets of HBV genome sequence to verify VirDelect accuracy by score functions. The experimental results show that VirDelect outperforms the state-of-the-art method Pindel in terms of accuracy score for all simulated datasets and VirDelect had only two base errors even in real datasets. VirDelect is also shown to deliver high accuracy in analyzing the single-end read data as well as pair-end data. VirDelect can serve as an effective and efficient bioinformatics tool for physiologists with high accuracy and efficient performance and applicable to further analysis with characteristics similar to HBV on genome length and high divergence. The software program of VirDelect can be downloaded at https://sourceforge.net/projects/virdelect/. Copyright © 2017. Published by Elsevier Inc.
Ethical and legal implications of whole genome and whole exome sequencing in African populations.
Wright, Galen E B; Koornhof, Pieter G J; Adeyemo, Adebowale A; Tiffin, Nicki
2013-05-28
Rapid advances in high throughput genomic technologies and next generation sequencing are making medical genomic research more readily accessible and affordable, including the sequencing of patient and control whole genomes and exomes in order to elucidate genetic factors underlying disease. Over the next five years, the Human Heredity and Health in Africa (H3Africa) Initiative, funded by the Wellcome Trust (United Kingdom) and the National Institutes of Health (United States of America), will contribute greatly towards sequencing of numerous African samples for biomedical research. Funding agencies and journals often require submission of genomic data from research participants to databases that allow open or controlled data access for all investigators. Access to such genotype-phenotype and pedigree data, however, needs careful control in order to prevent identification of individuals or families. This is particularly the case in Africa, where many researchers and their patients are inexperienced in the ethical issues accompanying whole genome and exome research; and where an historical unidirectional flow of samples and data out of Africa has created a sense of exploitation and distrust. In the current study, we analysed the implications of the anticipated surge of next generation sequencing data in Africa and the subsequent data sharing concepts on the protection of privacy of research subjects. We performed a retrospective analysis of the informed consent process for the continent and the rest-of-the-world and examined relevant legislation, both current and proposed. We investigated the following issues: (i) informed consent, including guidelines for performing culturally-sensitive next generation sequencing research in Africa and availability of suitable informed consent documents; (ii) data security and subject privacy whilst practicing data sharing; (iii) conveying the implications of such concepts to research participants in resource limited settings. We conclude that, in order to meet the unique requirements of performing next generation sequencing-related research in African populations, novel approaches to the informed consent process are required. This will help to avoid infringement of privacy of individual subjects as well as to ensure that informed consent adheres to acceptable data protection levels with regard to use and transfer of such information.
Ethical and legal implications of whole genome and whole exome sequencing in African populations
2013-01-01
Background Rapid advances in high throughput genomic technologies and next generation sequencing are making medical genomic research more readily accessible and affordable, including the sequencing of patient and control whole genomes and exomes in order to elucidate genetic factors underlying disease. Over the next five years, the Human Heredity and Health in Africa (H3Africa) Initiative, funded by the Wellcome Trust (United Kingdom) and the National Institutes of Health (United States of America), will contribute greatly towards sequencing of numerous African samples for biomedical research. Discussion Funding agencies and journals often require submission of genomic data from research participants to databases that allow open or controlled data access for all investigators. Access to such genotype-phenotype and pedigree data, however, needs careful control in order to prevent identification of individuals or families. This is particularly the case in Africa, where many researchers and their patients are inexperienced in the ethical issues accompanying whole genome and exome research; and where an historical unidirectional flow of samples and data out of Africa has created a sense of exploitation and distrust. In the current study, we analysed the implications of the anticipated surge of next generation sequencing data in Africa and the subsequent data sharing concepts on the protection of privacy of research subjects. We performed a retrospective analysis of the informed consent process for the continent and the rest-of-the-world and examined relevant legislation, both current and proposed. We investigated the following issues: (i) informed consent, including guidelines for performing culturally-sensitive next generation sequencing research in Africa and availability of suitable informed consent documents; (ii) data security and subject privacy whilst practicing data sharing; (iii) conveying the implications of such concepts to research participants in resource limited settings. Summary We conclude that, in order to meet the unique requirements of performing next generation sequencing-related research in African populations, novel approaches to the informed consent process are required. This will help to avoid infringement of privacy of individual subjects as well as to ensure that informed consent adheres to acceptable data protection levels with regard to use and transfer of such information. PMID:23714101
Khan, Waqasuddin; Saripella, Ganapathi Varma-; Ludwig, Thomas; Cuppens, Tania; Thibord, Florian; Génin, Emmanuelle; Deleuze, Jean-Francois; Trégouët, David-Alexandre
2018-05-03
Predicted deleteriousness of coding variants is a frequently used criterion to filter out variants detected in next-generation sequencing projects and to select candidates impacting on the risk of human diseases. Most available dedicated tools implement a base-to-base annotation approach that could be biased in presence of several variants in the same genetic codon. We here proposed the MACARON program that, from a standard VCF file, identifies, re-annotates and predicts the amino acid change resulting from multiple single nucleotide variants (SNVs) within the same genetic codon. Applied to the whole exome dataset of 573 individuals, MACARON identifies 114 situations where multiple SNVs within a genetic codon induce an amino acid change that is different from those predicted by standard single SNV annotation tool. Such events are not uncommon and deserve to be studied in sequencing projects with inconclusive findings. MACARON is written in python with codes available on the GENMED website (www.genmed.fr). david-alexandre.tregouet@inserm.fr. Supplementary data are available at Bioinformatics online.
DNApod: DNA polymorphism annotation database from next-generation sequence read archives.
Mochizuki, Takako; Tanizawa, Yasuhiro; Fujisawa, Takatomo; Ohta, Tazro; Nikoh, Naruo; Shimizu, Tokurou; Toyoda, Atsushi; Fujiyama, Asao; Kurata, Nori; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu
2017-01-01
With the rapid advances in next-generation sequencing (NGS), datasets for DNA polymorphisms among various species and strains have been produced, stored, and distributed. However, reliability varies among these datasets because the experimental and analytical conditions used differ among assays. Furthermore, such datasets have been frequently distributed from the websites of individual sequencing projects. It is desirable to integrate DNA polymorphism data into one database featuring uniform quality control that is distributed from a single platform at a single place. DNA polymorphism annotation database (DNApod; http://tga.nig.ac.jp/dnapod/) is an integrated database that stores genome-wide DNA polymorphism datasets acquired under uniform analytical conditions, and this includes uniformity in the quality of the raw data, the reference genome version, and evaluation algorithms. DNApod genotypic data are re-analyzed whole-genome shotgun datasets extracted from sequence read archives, and DNApod distributes genome-wide DNA polymorphism datasets and known-gene annotations for each DNA polymorphism. This new database was developed for storing genome-wide DNA polymorphism datasets of plants, with crops being the first priority. Here, we describe our analyzed data for 679, 404, and 66 strains of rice, maize, and sorghum, respectively. The analytical methods are available as a DNApod workflow in an NGS annotation system of the DNA Data Bank of Japan and a virtual machine image. Furthermore, DNApod provides tables of links of identifiers between DNApod genotypic data and public phenotypic data. To advance the sharing of organism knowledge, DNApod offers basic and ubiquitous functions for multiple alignment and phylogenetic tree construction by using orthologous gene information.
DNApod: DNA polymorphism annotation database from next-generation sequence read archives
Mochizuki, Takako; Tanizawa, Yasuhiro; Fujisawa, Takatomo; Ohta, Tazro; Nikoh, Naruo; Shimizu, Tokurou; Toyoda, Atsushi; Fujiyama, Asao; Kurata, Nori; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu
2017-01-01
With the rapid advances in next-generation sequencing (NGS), datasets for DNA polymorphisms among various species and strains have been produced, stored, and distributed. However, reliability varies among these datasets because the experimental and analytical conditions used differ among assays. Furthermore, such datasets have been frequently distributed from the websites of individual sequencing projects. It is desirable to integrate DNA polymorphism data into one database featuring uniform quality control that is distributed from a single platform at a single place. DNA polymorphism annotation database (DNApod; http://tga.nig.ac.jp/dnapod/) is an integrated database that stores genome-wide DNA polymorphism datasets acquired under uniform analytical conditions, and this includes uniformity in the quality of the raw data, the reference genome version, and evaluation algorithms. DNApod genotypic data are re-analyzed whole-genome shotgun datasets extracted from sequence read archives, and DNApod distributes genome-wide DNA polymorphism datasets and known-gene annotations for each DNA polymorphism. This new database was developed for storing genome-wide DNA polymorphism datasets of plants, with crops being the first priority. Here, we describe our analyzed data for 679, 404, and 66 strains of rice, maize, and sorghum, respectively. The analytical methods are available as a DNApod workflow in an NGS annotation system of the DNA Data Bank of Japan and a virtual machine image. Furthermore, DNApod provides tables of links of identifiers between DNApod genotypic data and public phenotypic data. To advance the sharing of organism knowledge, DNApod offers basic and ubiquitous functions for multiple alignment and phylogenetic tree construction by using orthologous gene information. PMID:28234924
The (in)famous GWAS P-value threshold revisited and updated for low-frequency variants.
Fadista, João; Manning, Alisa K; Florez, Jose C; Groop, Leif
2016-08-01
Genome-wide association studies (GWAS) have long relied on proposed statistical significance thresholds to be able to differentiate true positives from false positives. Although the genome-wide significance P-value threshold of 5 × 10(-8) has become a standard for common-variant GWAS, it has not been updated to cope with the lower allele frequency spectrum used in many recent array-based GWAS studies and sequencing studies. Using a whole-genome- and -exome-sequencing data set of 2875 individuals of European ancestry from the Genetics of Type 2 Diabetes (GoT2D) project and a whole-exome-sequencing data set of 13 000 individuals from five ancestries from the GoT2D and T2D-GENES (Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples) projects, we describe guidelines for genome- and exome-wide association P-value thresholds needed to correct for multiple testing, explaining the impact of linkage disequilibrium thresholds for distinguishing independent variants, minor allele frequency and ancestry characteristics. We emphasize the advantage of studying recent genetic isolate populations when performing rare and low-frequency genetic association analyses, as the multiple testing burden is diminished due to higher genetic homogeneity.
Zhang, Jianwei; Kudrna, Dave; Mu, Ting; Li, Weiming; Copetti, Dario; Yu, Yeisoo; Goicoechea, Jose Luis; Lei, Yang; Wing, Rod A
2016-10-15
Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool-Genome Puzzle Master (GPM)-that enables the integration of additional genomic signposts to edit and build 'new-gen-assemblies' that result in high-quality 'annotation-ready' pseudomolecules. With GPM, loaded datasets can be connected to each other via their logical relationships which accomplishes tasks to 'group,' 'merge,' 'order and orient' sequences in a draft assembly. Manual editing can also be performed with a user-friendly graphical interface. Final pseudomolecules reflect a user's total data package and are available for long-term project management. GPM is a web-based pipeline and an important part of a Laboratory Information Management System (LIMS) which can be easily deployed on local servers for any genome research laboratory. The GPM (with LIMS) package is available at https://github.com/Jianwei-Zhang/LIMS CONTACTS: jzhang@mail.hzau.edu.cn or rwing@mail.arizona.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Nakazato, Takeru; Ohta, Tazro; Bono, Hidemasa
2013-01-01
High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interests with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since, one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called “Gendoo”. We generated hyperlinks between diseases extracted from SRA and the feature profiles of it. The developed project, publication and disease lists resulting from this study are available at our web service, called “DBCLS SRA” (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA. PMID:24167589
The sequence and de novo assembly of the giant panda genome
Li, Ruiqiang; Fan, Wei; Tian, Geng; Zhu, Hongmei; He, Lin; Cai, Jing; Huang, Quanfei; Cai, Qingle; Li, Bo; Bai, Yinqi; Zhang, Zhihe; Zhang, Yaping; Wang, Wen; Li, Jun; Wei, Fuwen; Li, Heng; Jian, Min; Li, Jianwen; Zhang, Zhaolei; Nielsen, Rasmus; Li, Dawei; Gu, Wanjun; Yang, Zhentao; Xuan, Zhaoling; Ryder, Oliver A.; Leung, Frederick Chi-Ching; Zhou, Yan; Cao, Jianjun; Sun, Xiao; Fu, Yonggui; Fang, Xiaodong; Guo, Xiaosen; Wang, Bo; Hou, Rong; Shen, Fujun; Mu, Bo; Ni, Peixiang; Lin, Runmao; Qian, Wubin; Wang, Guodong; Yu, Chang; Nie, Wenhui; Wang, Jinhuan; Wu, Zhigang; Liang, Huiqing; Min, Jiumeng; Wu, Qi; Cheng, Shifeng; Ruan, Jue; Wang, Mingwei; Shi, Zhongbin; Wen, Ming; Liu, Binghang; Ren, Xiaoli; Zheng, Huisong; Dong, Dong; Cook, Kathleen; Shan, Gao; Zhang, Hao; Kosiol, Carolin; Xie, Xueying; Lu, Zuhong; Zheng, Hancheng; Li, Yingrui; Steiner, Cynthia C.; Lam, Tommy Tsan-Yuk; Lin, Siyuan; Zhang, Qinghui; Li, Guoqing; Tian, Jing; Gong, Timing; Liu, Hongde; Zhang, Dejin; Fang, Lin; Ye, Chen; Zhang, Juanbin; Hu, Wenbo; Xu, Anlong; Ren, Yuanyuan; Zhang, Guojie; Bruford, Michael W.; Li, Qibin; Ma, Lijia; Guo, Yiran; An, Na; Hu, Yujie; Zheng, Yang; Shi, Yongyong; Li, Zhiqiang; Liu, Qing; Chen, Yanling; Zhao, Jing; Qu, Ning; Zhao, Shancen; Tian, Feng; Wang, Xiaoling; Wang, Haiyin; Xu, Lizhi; Liu, Xiao; Vinar, Tomas; Wang, Yajun; Lam, Tak-Wah; Yiu, Siu-Ming; Liu, Shiping; Zhang, Hemin; Li, Desheng; Huang, Yan; Wang, Xia; Yang, Guohua; Jiang, Zhi; Wang, Junyi; Qin, Nan; Li, Li; Li, Jingxiang; Bolund, Lars; Kristiansen, Karsten; Wong, Gane Ka-Shu; Olson, Maynard; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun
2013-01-01
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes. PMID:20010809
Exome-wide DNA capture and next generation sequencing in domestic and wild species.
Cosart, Ted; Beja-Pereira, Albano; Chen, Shanyuan; Ng, Sarah B; Shendure, Jay; Luikart, Gordon
2011-07-05
Gene-targeted and genome-wide markers are crucial to advance evolutionary biology, agriculture, and biodiversity conservation by improving our understanding of genetic processes underlying adaptation and speciation. Unfortunately, for eukaryotic species with large genomes it remains costly to obtain genome sequences and to develop genome resources such as genome-wide SNPs. A method is needed to allow gene-targeted, next-generation sequencing that is flexible enough to include any gene or number of genes, unlike transcriptome sequencing. Such a method would allow sequencing of many individuals, avoiding ascertainment bias in subsequent population genetic analyses.We demonstrate the usefulness of a recent technology, exon capture, for genome-wide, gene-targeted marker discovery in species with no genome resources. We use coding gene sequences from the domestic cow genome sequence (Bos taurus) to capture (enrich for), and subsequently sequence, thousands of exons of B. taurus, B. indicus, and Bison bison (wild bison). Our capture array has probes for 16,131 exons in 2,570 genes, including 203 candidate genes with known function and of interest for their association with disease and other fitness traits. We successfully sequenced and mapped exon sequences from across the 29 autosomes and X chromosome in the B. taurus genome sequence. Exon capture and high-throughput sequencing identified thousands of putative SNPs spread evenly across all reference chromosomes, in all three individuals, including hundreds of SNPs in our targeted candidate genes. This study shows exon capture can be customized for SNP discovery in many individuals and for non-model species without genomic resources. Our captured exome subset was small enough for affordable next-generation sequencing, and successfully captured exons from a divergent wild species using the domestic cow genome as reference.
Xu, Jiajia; Li, Yuanyuan; Ma, Xiuling; Ding, Jianfeng; Wang, Kai; Wang, Sisi; Tian, Ye; Zhang, Hui; Zhu, Xin-Guang
2013-09-01
Setaria viridis is an emerging model species for genetic studies of C4 photosynthesis. Many basic molecular resources need to be developed to support for this species. In this paper, we performed a comprehensive transcriptome analysis from multiple developmental stages and tissues of S. viridis using next-generation sequencing technologies. Sequencing of the transcriptome from multiple tissues across three developmental stages (seed germination, vegetative growth, and reproduction) yielded a total of 71 million single end 100 bp long reads. Reference-based assembly using Setaria italica genome as a reference generated 42,754 transcripts. De novo assembly generated 60,751 transcripts. In addition, 9,576 and 7,056 potential simple sequence repeats (SSRs) covering S. viridis genome were identified when using the reference based assembled transcripts and the de novo assembled transcripts, respectively. This identified transcripts and SSR provided by this study can be used for both reverse and forward genetic studies based on S. viridis.
Benschop, Corina C G; Quaak, Frederike C A; Boon, Mathilde E; Sijen, Titia; Kuiper, Irene
2012-03-01
Forensic analysis of biological traces generally encompasses the investigation of both the person who contributed to the trace and the body site(s) from which the trace originates. For instance, for sexual assault cases, it can be beneficial to distinguish vaginal samples from skin or saliva samples. In this study, we explored the use of microbial flora to indicate vaginal origin. First, we explored the vaginal microbiome for a large set of clinical vaginal samples (n = 240) by next generation sequencing (n = 338,184 sequence reads) and found 1,619 different sequences. Next, we selected 389 candidate probes targeting genera or species and designed a microarray, with which we analysed a diverse set of samples; 43 DNA extracts from vaginal samples and 25 DNA extracts from samples from other body sites, including sites in close proximity of or in contact with the vagina. Finally, we used the microarray results and next generation sequencing dataset to assess the potential for a future approach that uses microbial markers to indicate vaginal origin. Since no candidate genera/species were found to positively identify all vaginal DNA extracts on their own, while excluding all non-vaginal DNA extracts, we deduce that a reliable statement about the cellular origin of a biological trace should be based on the detection of multiple species within various genera. Microarray analysis of a sample will then render a microbial flora pattern that is probably best analysed in a probabilistic approach.
Use of the Minion nanopore sequencer for rapid sequencing of avian influenza virus isolates
USDA-ARS?s Scientific Manuscript database
A relatively new sequencing technology, the MinION nanopore sequencer, provides a platform that is smaller, faster, and cheaper than existing Next Generation Sequence (NGS) technologies. The MinION sequences of individual strands of DNA and can produce millions of sequencing reads. The cost of the s...
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum
DOE Office of Scientific and Technical Information (OSTI.GOV)
VanBuren, Robert; Bryant, Doug; Edger, Patrick P.
Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly1. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetiummore » genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a ‘near-complete’ draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. As a result, the Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.« less
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum
VanBuren, Robert; Bryant, Doug; Edger, Patrick P.; ...
2015-11-11
Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly1. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetiummore » genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a ‘near-complete’ draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. As a result, the Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.« less
A robust and cost-effective approach to sequence and analyze complete genomes of small RNA viruses
USDA-ARS?s Scientific Manuscript database
Background: Next-generation sequencing (NGS) allows ultra-deep sequencing of nucleic acids. The use of sequence-independent amplification of viral nucleic acids without utilization of target-specific primers provides advantages over traditional sequencing methods and allows detection of unsuspected ...
Quality of Service for Real-Time Applications Over Next Generation Data Networks
NASA Technical Reports Server (NTRS)
Atiquzzaman, Mohammed; Jain, Raj
2001-01-01
This project, which started on January 1, 2000, was funded by the NASA Glenn Research Center for duration of one year. The deliverables of the project included the following tasks: (1) Study of QoS mapping between the edge and core networks envisioned in the Next Generation networks will provide us with the QoS guarantees that can be obtained from next generation networks; (2) Buffer management techniques to provide strict guarantees to real-time end-to-end applications through preferential treatment to packets belonging to real-time applications. In particular, use of ECN to help reduce the loss on high bandwidth-delay product satellite networks needs to be studied; (3) Effect of Prioritized Packet Discard to increase goodput of the network and reduce the buffering requirements in the ATM switches; (4) Provision of new IP circuit emulation services over Satellite IP backbones using MPLS will be studied; and (5) Determine the architecture and requirements for internetworking ATN and the Next Generation Internet for real-time applications. The project has been completed on time. All the objectives and deliverables of the project have been completed. Research results obtained from this project have been published in a number of papers in journals, conferences, and technical reports, included in this document.
Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.
Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T
2013-01-01
Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal. Furthermore, NGS of historical DNA enables recovering crucial genetic information from old type specimens that to date have remained mostly unutilized and, thus, opens up a new frontier for taxonomic research as well.
Alonso, Ana; Larraga, Vicente; Alcolea, Pedro J
2018-05-07
The first genome project of any living organism excluding viruses, the gammaproteobacteria Haemophilus influenzae, was completed in 1995. Until the last decade, genome sequencing was very tedious because genome survey sequences (GSS) and/or expressed sequence tags (ESTs) belonging to plasmid, cosmid and artificial chromosome genome libraries had to be sequenced and assembled in silico. Nowadays, no genome is completely assembled actually, because gaps and unassembled contigs are always remaining. However, most represent the whole genome of the organism of origin from a practical point of view. The first genome sequencing projects of trypanosomatid parasites were completed in 2005 following those strategies, and belong to Leishmania major, Trypanosoma cruzi and T. brucei. The functional genomics era rapidly developed on the basis of the microarray technology and has been evolving. In the case of the genus Leishmania, substantial biological information about differentiation in the digenetic life cycle of the parasite has been obtained. Later on, next generation sequencing has revolutionized genome sequencing and functional genomics, leading to more sensitive, accurate results by using much less resources. This new technology is more advantageous, but does not invalidate microarray results. In fact, promising vaccine candidates and drug targets have been found on the basis of microarray-based screening and preliminary proof-of-concept tests. Copyright © 2018. Published by Elsevier B.V.
Yum, Soo-Young; Lee, Song-Jeon; Kim, Hyun-Min; Choi, Woo-Jae; Park, Ji-Hyun; Lee, Won-Wu; Kim, Hee-Soo; Kim, Hyeong-Jong; Bae, Seong-Hun; Lee, Je-Hyeong; Moon, Joo-Yeong; Lee, Ji-Hyun; Lee, Choong-Il; Son, Bong-Jun; Song, Sang-Hoon; Ji, Su-Min; Kim, Seong-Jin; Jang, Goo
2016-01-01
Here, we efficiently generated transgenic cattle using two transposon systems (Sleeping Beauty and Piggybac) and their genomes were analyzed by next-generation sequencing (NGS). Blastocysts derived from microinjection of DNA transposons were selected and transferred into recipient cows. Nine transgenic cattle have been generated and grown-up to date without any health issues except two. Some of them expressed strong fluorescence and the transgene in the oocytes from a superovulating one were detected by PCR and sequencing. To investigate genomic variants by the transgene transposition, whole genomic DNA were analyzed by NGS. We found that preferred transposable integration (TA or TTAA) was identified in their genome. Even though multi-copies (i.e. fifteen) were confirmed, there was no significant difference in genome instabilities. In conclusion, we demonstrated that transgenic cattle using the DNA transposon system could be efficiently generated, and all those animals could be a valuable resource for agriculture and veterinary science. PMID:27324781
Baker, Mei W; Atkins, Anne E; Cordovado, Suzanne K; Hendrix, Miyono; Earley, Marie C; Farrell, Philip M
2016-03-01
Many regions have implemented newborn screening (NBS) for cystic fibrosis (CF) using a limited panel of cystic fibrosis transmembrane regulator (CFTR) mutations after immunoreactive trypsinogen (IRT) analysis. We sought to assess the feasibility of further improving the screening using next-generation sequencing (NGS) technology. An NGS assay was used to detect 162 CFTR mutations/variants characterized by the CFTR2 project. We used 67 dried blood spots (DBSs) containing 48 distinct CFTR mutations to validate the assay. NGS assay was retrospectively performed on 165 CF screen-positive samples with one CFTR mutation. The NGS assay was successfully performed using DNA isolated from DBSs, and it correctly detected all CFTR mutations in the validation. Among 165 screen-positive infants with one CFTR mutation, no additional disease-causing mutation was identified in 151 samples consistent with normal sweat tests. Five infants had a CF-causing mutation that was not included in this panel, and nine with two CF-causing mutations were identified. The NGS assay was 100% concordant with traditional methods. Retrospective analysis results indicate an IRT/NGS screening algorithm would enable high sensitivity, better specificity and positive predictive value (PPV). This study lays the foundation for prospective studies and for introducing NGS in NBS laboratories.
Next-generation sequencing in the clinic: promises and challenges.
Xuan, Jiekun; Yu, Ying; Qing, Tao; Guo, Lei; Shi, Leming
2013-11-01
The advent of next generation sequencing (NGS) technologies has revolutionized the field of genomics, enabling fast and cost-effective generation of genome-scale sequence data with exquisite resolution and accuracy. Over the past years, rapid technological advances led by academic institutions and companies have continued to broaden NGS applications from research to the clinic. A recent crop of discoveries have highlighted the medical impact of NGS technologies on Mendelian and complex diseases, particularly cancer. However, the ever-increasing pace of NGS adoption presents enormous challenges in terms of data processing, storage, management and interpretation as well as sequencing quality control, which hinder the translation from sequence data into clinical practice. In this review, we first summarize the technical characteristics and performance of current NGS platforms. We further highlight advances in the applications of NGS technologies towards the development of clinical diagnostics and therapeutics. Common issues in NGS workflows are also discussed to guide the selection of NGS platforms and pipelines for specific research purposes. Published by Elsevier Ireland Ltd.
FOUNTAIN: A JAVA open-source package to assist large sequencing projects
Buerstedde, Jean-Marie; Prill, Florian
2001-01-01
Background Better automation, lower cost per reaction and a heightened interest in comparative genomics has led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences. Results We describe here a new program, named FOUNTAIN, for the management of large sequencing projects . FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented. Conclusions A simple, but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. Open source and largely platform and database independent, we wish FOUNTAIN to be improved and extended in a community effort. PMID:11591214
Jiang, Jiming
2015-04-01
Sequencing of complete plant genomes has become increasingly more routine since the advent of the next-generation sequencing technology. Identification and annotation of large amounts of noncoding but functional DNA sequences, including cis-regulatory DNA elements (CREs), have become a new frontier in plant genome research. Genomic regions containing active CREs bound to regulatory proteins are hypersensitive to DNase I digestion and are called DNase I hypersensitive sites (DHSs). Several recent DHS studies in plants illustrate that DHS datasets produced by DNase I digestion followed by next-generation sequencing (DNase-seq) are highly valuable for the identification and characterization of CREs associated with plant development and responses to environmental cues. DHS-based genomic profiling has opened a door to identify and annotate the 'dark matter' in sequenced plant genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
van den Oever, Jessica M E; van Minderhout, Ivonne J H M; Harteveld, Cornelis L; den Hollander, Nicolette S; Bakker, Egbert; van der Stoep, Nienke; Boon, Elles M J
2015-09-01
The challenge in noninvasive prenatal diagnosis for monogenic disorders lies in the detection of low levels of fetal variants in the excess of maternal cell-free plasma DNA. Next-generation sequencing, which is the main method used for noninvasive prenatal testing and diagnosis, can overcome this challenge. However, this method may not be accessible to all genetic laboratories. Moreover, shotgun next-generation sequencing as, for instance, currently applied for noninvasive fetal trisomy screening may not be suitable for the detection of inherited mutations. We have developed a sensitive, mutation-specific, and fast alternative for next-generation sequencing-mediated noninvasive prenatal diagnosis using a PCR-based method. For this proof-of-principle study, noninvasive fetal paternally inherited mutation detection was performed using cell-free DNA from maternal plasma. Preferential amplification of the paternally inherited allele was accomplished through a personalized approach using a blocking probe against maternal sequences in a high-resolution melting curve analysis-based assay. Enhanced detection of the fetal paternally inherited mutation was obtained for both an autosomal dominant and a recessive monogenic disorder by blocking the amplification of maternal sequences in maternal plasma. Copyright © 2015 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Li, Haonan; Jin, Peng; Hao, Qian; Zhu, Wei; Chen, Xia; Wang, Ping
2017-11-01
Waardenburg syndrome (WS) is a rare autosomal dominant disorder associated with pigmentation abnormalities and sensorineural hearing loss. In this study, we investigated the genetic cause of WSII in a patient and evaluated the reliability of the targeted next-generation exome sequencing method for the genetic diagnosis of WS. Clinical evaluations were conducted on the patient and targeted next-generation sequencing (NGS) was used to identify the candidate genes responsible for WSII. Multiplex ligation-dependent probe amplification (MLPA) and real-time quantitative polymerase chain reaction (qPCR) were performed to confirm the targeted NGS results. Targeted NGS detected the entire deletion of the coding sequence (CDS) of the SOX10 gene in the WSII patient. MLPA results indicated that all exons of the SOX10 heterozygous deletion were detected; no aberrant copy number in the PAX3 and microphthalmia-associated transcription factor (MITF) genes was found. Real-time qPCR results identified the mutation as a de novo heterozygous deletion. This is the first report of using a targeted NGS method for WS candidate gene sequencing; its accuracy was verified by using the MLPA and qPCR methods. Our research provides a valuable method for the genetic diagnosis of WS.
Leaf LIMS: A Flexible Laboratory Information Management System with a Synthetic Biology Focus.
Craig, Thomas; Holland, Richard; D'Amore, Rosalinda; Johnson, James R; McCue, Hannah V; West, Anthony; Zulkower, Valentin; Tekotte, Hille; Cai, Yizhi; Swan, Daniel; Davey, Robert P; Hertz-Fowler, Christiane; Hall, Anthony; Caddick, Mark
2017-12-15
This paper presents Leaf LIMS, a flexible laboratory information management system (LIMS) designed to address the complexity of synthetic biology workflows. At the project's inception there was a lack of a LIMS designed specifically to address synthetic biology processes, with most systems focused on either next generation sequencing or biobanks and clinical sample handling. Leaf LIMS implements integrated project, item, and laboratory stock tracking, offering complete sample and construct genealogy, materials and lot tracking, and modular assay data capture. Hence, it enables highly configurable task-based workflows and supports data capture from project inception to completion. As such, in addition to it supporting synthetic biology it is ideal for many laboratory environments with multiple projects and users. The system is deployed as a web application through Docker and is provided under a permissive MIT license. It is freely available for download at https://leaflims.github.io .
Sma3s: A universal tool for easy functional annotation of proteomes and transcriptomes.
Casimiro-Soriguer, Carlos S; Muñoz-Mérida, Antonio; Pérez-Pulido, Antonio J
2017-06-01
The current cheapening of next-generation sequencing has led to an enormous growth in the number of sequenced genomes and transcriptomes, allowing wet labs to get the sequences from their organisms of study. To make the most of these data, one of the first things that should be done is the functional annotation of the protein-coding genes. But it used to be a slow and tedious step that can involve the characterization of thousands of sequences. Sma3s is an accurate computational tool for annotating proteins in an unattended way. Now, we have developed a completely new version, which includes functionalities that will be of utility for fundamental and applied science. Currently, the results provide functional categories such as biological processes, which become useful for both characterizing particular sequence datasets and comparing results from different projects. But one of the most important implemented innovations is that it has now low computational requirements, and the complete annotation of a simple proteome or transcriptome usually takes around 24 hours in a personal computer. Sma3s has been tested with a large amount of complete proteomes and transcriptomes, and it has demonstrated its potential in health science and other specific projects. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Bontems, Franck; Baerlocher, Loic; Mehenni, Sabrina; Bahechar, Ilham; Farinelli, Laurent; Dosch, Roland
2011-02-18
Fish models like medaka, stickleback or zebrafish provide a valuable resource to study vertebrate genes. However, finding genetic variants e.g. mutations in the genome is still arduous. Here we used a combination of microarray capturing and next generation sequencing to identify the affected gene in the mozartkugelp11cv (mzlp11cv) mutant zebrafish. We discovered a 31-bp deletion in macf1 demonstrating the potential of this technique to efficiently isolate mutations in a vertebrate genome. Copyright © 2011 Elsevier Inc. All rights reserved.
Next generation sequence assembly with AMOS.
Treangen, Todd J; Sommer, Dan D; Angly, Florent E; Koren, Sergey; Pop, Mihai
2011-03-01
A Modular Open-Source Assembler (AMOS) was designed to offer a modular approach to genome assembly. AMOS includes a wide range of tools for assembly, including the lightweight de novo assemblers Minimus and Minimo, and Bambus 2, a robust scaffolder able to handle metagenomic and polymorphic data. This protocol describes how to configure and use AMOS for the assembly of Next Generation sequence data. Additionally, we provide three tutorial examples that include bacterial, viral, and metagenomic datasets with specific tips for improving assembly quality. © 2011 by John Wiley & Sons, Inc.
USDA-ARS?s Scientific Manuscript database
Background: Vertebrate immune systems generate diverse repertoires of antibodies capable of mediating response to a variety of antigens. Next generation sequencing methods provide unique approaches to a number of immuno-based research areas including antibody discovery and engineering, disease surve...
Wang, Qinghua; Arighi, Cecilia N; King, Benjamin L; Polson, Shawn W; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F; Page, Shallee T; Rendino, Marc Farnum; Thomas, William Kelley; Udwary, Daniel W; Wu, Cathy H
2012-01-01
Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome.
Wang, Qinghua; Arighi, Cecilia N.; King, Benjamin L.; Polson, Shawn W.; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F.; Page, Shallee T.; Farnum Rendino, Marc; Thomas, William Kelley; Udwary, Daniel W.; Wu, Cathy H.
2012-01-01
Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome. PMID:22434832
ParDRe: faster parallel duplicated reads removal tool for sequencing studies.
González-Domínguez, Jorge; Schmidt, Bertil
2016-05-15
Current next generation sequencing technologies often generate duplicated or near-duplicated reads that (depending on the application scenario) do not provide any interesting biological information but can increase memory requirements and computational time of downstream analysis. In this work we present ParDRe, a de novo parallel tool to remove duplicated and near-duplicated reads through the clustering of Single-End or Paired-End sequences from fasta or fastq files. It uses a novel bitwise approach to compare the suffixes of DNA strings and employs hybrid MPI/multithreading to reduce runtime on multicore systems. We show that ParDRe is up to 27.29 times faster than Fulcrum (a representative state-of-the-art tool) on a platform with two 8-core Sandy-Bridge processors. Source code in C ++ and MPI running on Linux systems as well as a reference manual are available at https://sourceforge.net/projects/pardre/ jgonzalezd@udc.es. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
USDA-ARS?s Scientific Manuscript database
Marker assisted selection (MAS) has become widely used in perennial crop breeding programs to accelerate and enhance cultivar development via selection during the juvenile phase and parental selection prior to crossing. Next generation sequencing (NGS) has been widely used for whole genome molecular...
USDA-ARS?s Scientific Manuscript database
Marker assisted selection (MAS) is often employed in crop breeding programs to accelerate and enhance cultivar development, via selection during the juvenile phase and parental selection prior to crossing. Next generation sequencing (NGS) and its derivative technologies have been used for genome-wid...
USDA-ARS?s Scientific Manuscript database
Next generation sequencing offers new ways to identify the genetic mechanisms that underlie mutant phenotypes. The release of a reference diploid Gossypium raimondii (D5) genome and bioinformatics tools to sort tetraploid reads into subgenomes has brought cotton genetic mapping into the genomics er...
Veeranagouda, Yaligara; Debono-Lagneaux, Delphine; Fournet, Hamida; Thill, Gilbert; Didier, Michel
2018-01-16
The emergence of clustered regularly interspaced short palindromic repeats-Cas9 (CRISPR-Cas9) gene editing systems has enabled the creation of specific mutants at low cost, in a short time and with high efficiency, in eukaryotic cells. Since a CRISPR-Cas9 system typically creates an array of mutations in targeted sites, a successful gene editing project requires careful selection of edited clones. This process can be very challenging, especially when working with multiallelic genes and/or polyploid cells (such as cancer and plants cells). Here we described a next-generation sequencing method called CRISPR-Cas9 Edited Site Sequencing (CRES-Seq) for the efficient and high-throughput screening of CRISPR-Cas9-edited clones. CRES-Seq facilitates the precise genotyping up to 96 CRISPR-Cas9-edited sites (CRES) in a single MiniSeq (Illumina) run with an approximate sequencing cost of $6/clone. CRES-Seq is particularly useful when multiple genes are simultaneously targeted by CRISPR-Cas9, and also for screening of clones generated from multiallelic genes/polyploid cells. © 2018 by John Wiley & Sons, Inc. Copyright © 2018 John Wiley & Sons, Inc.
USDA-ARS?s Scientific Manuscript database
The advancement of next-generation sequencing technologies in conjunction with new bioinformatics tools enabled fine-tuning of sequence-based high resolution mapping strategies for complex genomes. Although genotyping-by-sequencing (GBS) provides a large number of markers, its application for assoc...
Short reads from honey bee (Apis sp.) sequencing projects reflect microbial associate diversity.
Gerth, Michael; Hurst, Gregory D D
2017-01-01
High throughput (or 'next generation') sequencing has transformed most areas of biological research and is now a standard method that underpins empirical study of organismal biology, and (through comparison of genomes), reveals patterns of evolution. For projects focused on animals, these sequencing methods do not discriminate between the primary target of sequencing (the animal genome) and 'contaminating' material, such as associated microbes. A common first step is to filter out these contaminants to allow better assembly of the animal genome or transcriptome. Here, we aimed to assess if these 'contaminations' provide information with regard to biologically important microorganisms associated with the individual. To achieve this, we examined whether the short read data from Apis retrieved elements of its well established microbiome. To this end, we screened almost 1,000 short read libraries of honey bee ( Apis sp.) DNA sequencing project for the presence of microbial sequences, and find sequences from known honey bee microbial associates in at least 11% of them. Further to this, we screened ∼500 Apis RNA sequencing libraries for evidence of viral infections, which were found to be present in about half of them. We then used the data to reconstruct draft genomes of three Apis associated bacteria, as well as several viral strains de novo . We conclude that 'contamination' in short read sequencing libraries can provide useful genomic information on microbial taxa known to be associated with the target organisms, and may even lead to the discovery of novel associations. Finally, we demonstrate that RNAseq samples from experiments commonly carry uneven viral loads across libraries. We note variation in viral presence and load may be a confounding feature of differential gene expression analyses, and as such it should be incorporated as a random factor in analyses.
Masking as an effective quality control method for next-generation sequencing data analysis.
Yun, Sajung; Yun, Sijung
2014-12-13
Next generation sequencing produces base calls with low quality scores that can affect the accuracy of identifying simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low quality base calls with 'N's (undetermined bases), whereas trimming removes low quality bases that results in a shorter read lengths. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, both of the preprocessing methods did not affect the false-negative rate in SNP calling with statistical significance compared to the data analysis without preprocessing. False-positive rate and false-negative rate for small insertions and deletions did not show differences between masking and trimming. We recommend masking over trimming as a more effective preprocessing method for next generation sequencing data analysis since masking reduces the false-positive rate in SNP calling without sacrificing the false-negative rate although trimming is more commonly used currently in the field. The perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).
Genetic basis of arrhythmogenic cardiomyopathy.
Karmouch, Jennifer; Protonotarios, Alexandros; Syrris, Petros
2018-05-01
To date 16 genes have been associated with arrhythmogenic cardiomyopathy (ACM). Mutations in these genes can lead to a broad spectrum of phenotypic expression ranging from disease affecting predominantly the right or left ventricle, to biventricular subtypes. Understanding the genetic causes of ACM is important in diagnosis and management of the disorder. This review summarizes recent advances in molecular genetics and discusses the application of next-generation sequencing technology in genetic testing in ACM. Use of next-generation sequencing methods has resulted in the identification of novel causative variants and genes for ACM. The involvement of filamin C in ACM demonstrates the genetic overlap between ACM and other types of cardiomyopathy. Putative pathogenic variants have been detected in cadherin 2 gene, a protein involved in cell adhesion. Large genomic rearrangements in desmosome genes have been systematically investigated in a cohort of ACM patients. Recent studies have identified novel causes of ACM providing new insights into the genetic spectrum of the disease and highlighting an overlapping phenotype between ACM and dilated cardiomyopathy. Next-generation sequencing is a useful tool for research and genetic diagnostic screening but interpretation of identified sequence variants requires caution and should be performed in specialized centres.
Next generation sequencing applications for breast cancer research
PETRIC, ROXANA COJOCNEANU; POP, LAURA-ANCUTA; JURJ, ANCUTA; RADULY, LAJOS; DUMITRASCU, DAN; DRAGOS, NICOLAE; NEAGOE, IOANA BERINDAN
2015-01-01
For some time, cancer has not been thought of as a disease, but as a multifaceted, heterogeneous complex of genotypic and phenotypic manifestations leading to tumorigenesis. Due to recent technological progress, the outcome of cancer patients can be greatly improved by introducing in clinical practice the advantages brought about by the development of next generation sequencing techniques. Biomedical suppliers have come up with various applications which medical researchers can use to characterize a patient’s disease from molecular and genetic point of view in order to provide caregivers with rapid and relevant information to guide them in choosing the most appropriate course of treatment, with maximum efficiency and minimal side effects. Breast cancer, whose incidence has risen dramatically, is a good candidate for these novel diagnosis and therapeutic approaches, particularly when referring to specific sequencing panels which are designed to detect germline or somatic mutations in genes that are involved in breast cancer tumorigenesis and progression. Benchtop next generation sequencing machines are becoming a more common presence in the clinical setting, empowering physicians to better treat their patients, by offering early diagnosis alternatives, targeted remedies, and bringing medicine a step closer to achieving its ultimate goal, personalized therapy. PMID:26609257
Chopra, Ratan; Burow, Gloria; Farmer, Andrew; Mudge, Joann; Simpson, Charles E; Wilkins, Thea A; Baring, Michael R; Puppala, Naveen; Chamberlin, Kelly D; Burow, Mark D
2015-06-01
Single-nucleotide polymorphisms, which can be identified in the thousands or millions from comparisons of transcriptome or genome sequences, are ideally suited for making high-resolution genetic maps, investigating population evolutionary history, and discovering marker-trait linkages. Despite significant results from their use in human genetics, progress in identification and use in plants, and particularly polyploid plants, has lagged. As part of a long-term project to identify and use SNPs suitable for these purposes in cultivated peanut, which is tetraploid, we generated transcriptome sequences of four peanut cultivars, namely OLin, New Mexico Valencia C, Tamrun OL07 and Jupiter, which represent the four major market classes of peanut grown in the world, and which are important economically to the US southwest peanut growing region. CopyDNA libraries of each genotype were used to generate 2 × 54 paired-end reads using an Illumina GAIIx sequencer. Raw reads were mapped to a custom reference consisting of Tifrunner 454 sequences plus peanut ESTs in GenBank, compromising 43,108 contigs; 263,840 SNP and indel variants were identified among four genotypes compared to the reference. A subset of 6 variants was assayed across 24 genotypes representing four market types using KASP chemistry to assess the criteria for SNP selection. Results demonstrated that transcriptome sequencing can identify SNPs usable as selectable DNA-based markers in complex polyploid species such as peanut. Criteria for effective use of SNPs as markers are discussed in this context.
Exome and genome sequencing in reproductive medicine.
Normand, Elizabeth A; Alaimo, Joseph T; Van den Veyver, Ignatia B
2018-02-01
The advent of next-generation sequencing has enabled clinicians to assess many genes simultaneously and at high resolution. This is advantageous for diagnosing patients in whom a genetic disorder is suspected but who have a nonspecific or atypical phenotype or when the disorder has significant genetic heterogeneity. Herein, we describe common clinical applications of next-generation sequencing technology, as well as their respective benefits and limitations. We then discuss key considerations of variant interpretation and reporting, clinical utility, pre- and posttest genetic counseling, and ethical challenges. We will present these topics with an emphasis on their applicability to the reproductive medicine setting. Copyright © 2017 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.
Next-Generation Sequencing: a Diagnostic One-Stop Shop for Hepatitis C?
Poljak, Mario
2016-10-01
Before starting chronic hepatitis C treatment, the viral genotype/subtype has to be accurately determined and potentially coupled with drug resistance testing. Due to the high genetic variability of the hepatitis C virus, this can be a demanding task that can potentially be streamlined by viral whole-genome sequencing using next-generation sequencing as demonstrated by an article in this issue of the Journal of Clinical Microbiology by E. Thomson, C. L. C. Ip, A. Badhan, M. T. Christiansen, W. Adamson, et al. (J Clin Microbiol. 54:2455-2469, 2016, http://dx.doi.org/10.1128/JCM.00330-16). Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Next-generation sequencing: hype and hope for development of personalized radiation therapy?
Tinhofer, Ingeborg; Niehr, Franziska; Konschak, Robert; Liebs, Sandra; Munz, Matthias; Stenzinger, Albrecht; Weichert, Wilko; Keilholz, Ulrich; Budach, Volker
2015-08-28
The introduction of next-generation sequencing (NGS) in the field of cancer research has boosted worldwide efforts of genome-wide personalized oncology aiming at identifying predictive biomarkers and novel actionable targets. Despite considerable progress in understanding the molecular biology of distinct cancer entities by the use of this revolutionary technology and despite contemporaneous innovations in drug development, translation of NGS findings into improved concepts for cancer treatment remains a challenge. The aim of this article is to describe shortly the NGS platforms for DNA sequencing and in more detail key achievements and unresolved hurdles. A special focus will be given on potential clinical applications of this innovative technique in the field of radiation oncology.
Calibration of a Direct Detection Doppler Wind Lidar System using a Wind Tunnel
NASA Astrophysics Data System (ADS)
Rees, David
2012-07-01
As a critical stage of a Project to develop an airborne Direct-Detection Doppler Wind Lidar System, it was possible to exploit a Wind Tunnel of the VZLU, Prague, Czech Republic for a comprehensive series of tests against calibrated Air Speed generated by the Wind Tunnel. The initial results from these test sequences will be presented. The rms wind speed errors were of order 0.25 m/sec - very satisfactory for this class of Doppler Wind Lidar measurements. The next stage of this Project will exploit a more highly-developed laser and detection system for measurements of wind shear, wake vortex and other potentially hazardous meteorological phenomena at Airports. Following the end of this Project, key parts of the instrumentation will be used for routine ground-based Doppler Wind Lidar measurements of the troposphere and stratosphere.
Shen, Kang-Ning; Chen, Ching-Hung; Hsiao, Chung-Der
2016-05-01
In this study, the complete mitogenome sequence of hornlip mullet Plicomugil labiosus (Teleostei: Mugilidae) has been sequenced by next-generation sequencing method. The assembled mitogenome, consisting of 16,829 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein coding genes, 22 transfer RNAs, 2 ribosomal RNAs genes and a non-coding control region of D-loop. D-loop contains 1057 bp length is located between tRNA-Pro and tRNA-Phe. The overall base composition of P. labiosus is 28.0% for A, 29.3% for C, 15.5% for G and 27.2% for T. The complete mitogenome may provide essential and important DNA molecular data for further population, phylogenetic and evolutionary analysis for Mugilidae.
Shen, Kang-Ning; Tsai, Shiou-Yi; Chen, Ching-Hung; Hsiao, Chung-Der; Durand, Jean-Dominique
2016-11-01
In this study, the complete mitogenome sequence of largescale mullet (Teleostei: Mugilidae) has been sequenced by the next-generation sequencing method. The assembled mitogenome, consisting of 16,832 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNAs, two ribosomal RNAs genes, and a non-coding control region of D-loop. D-loop which has a length of 1094 bp is located between tRNA-Pro and tRNA-Phe. The overall base composition of largescale mullet is 27.8% for A, 30.1% for C, 16.2% for G, and 25.9% for T. The complete mitogenome may provide essential and important DNA molecular data for further phylogenetic and evolutionary analysis for Mugilidae.
USDA-ARS?s Scientific Manuscript database
Next-generation sequencing (NGS) technologies are revolutionizing both medical and biological research through generation of massive SNP data sets for identifying heritable genome variation underlying key traits, from rare human diseases to important agronomic phenotypes in crop species. We evaluate...
System, method and apparatus for generating phrases from a database
NASA Technical Reports Server (NTRS)
McGreevy, Michael W. (Inventor)
2004-01-01
A phrase generation is a method of generating sequences of terms, such as phrases, that may occur within a database of subsets containing sequences of terms, such as text. A database is provided and a relational model of the database is created. A query is then input. The query includes a term or a sequence of terms or multiple individual terms or multiple sequences of terms or combinations thereof. Next, several sequences of terms that are contextually related to the query are assembled from contextual relations in the model of the database. The sequences of terms are then sorted and output. Phrase generation can also be an iterative process used to produce sequences of terms from a relational model of a database.
MinION Analysis and Reference Consortium: Phase 1 data release and analysis
Eccles, David A.; Zalunin, Vadim; Urban, John M.; Piazza, Paolo; Bowden, Rory J.; Paten, Benedict; Mwaigwisya, Solomon; Batty, Elizabeth M.; Simpson, Jared T.; Snutch, Terrance P.
2015-01-01
The advent of a miniaturized DNA sequencing device with a high-throughput contextual sequencing capability embodies the next generation of large scale sequencing tools. The MinION™ Access Programme (MAP) was initiated by Oxford Nanopore Technologies™ in April 2014, giving public access to their USB-attached miniature sequencing device. The MinION Analysis and Reference Consortium (MARC) was formed by a subset of MAP participants, with the aim of evaluating and providing standard protocols and reference data to the community. Envisaged as a multi-phased project, this study provides the global community with the Phase 1 data from MARC, where the reproducibility of the performance of the MinION was evaluated at multiple sites. Five laboratories on two continents generated data using a control strain of Escherichia coli K-12, preparing and sequencing samples according to a revised ONT protocol. Here, we provide the details of the protocol used, along with a preliminary analysis of the characteristics of typical runs including the consistency, rate, volume and quality of data produced. Further analysis of the Phase 1 data presented here, and additional experiments in Phase 2 of E. coli from MARC are already underway to identify ways to improve and enhance MinION performance. PMID:26834992
SNP discovery through de novo deep sequencing using the next generation of DNA sequencers
USDA-ARS?s Scientific Manuscript database
The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....
USDA-ARS?s Scientific Manuscript database
Current technologies with next generation sequencing have revolutionized metagenomics analysis of clinical samples. To achieve the non-selective amplification and recovery of low abundance genetic sequences, a simplified Sequence-Independent, Single-Primer Amplification (SISPA) technique in combinat...
Li, Zibo; Guo, Xinwu; Tang, Lili; Peng, Limin; Chen, Ming; Luo, Xipeng; Wang, Shouman; Xiao, Zhi; Deng, Zhongping; Dai, Lizhong; Xia, Kun; Wang, Jun
2016-10-01
Circulating cell-free DNA (cfDNA) has been considered as a potential biomarker for non-invasive cancer detection. To evaluate the methylation levels of six candidate genes (EGFR, GREM1, PDGFRB, PPM1E, SOX17, and WRN) in plasma cfDNA as biomarkers for breast cancer early detection, quantitative analysis of the promoter methylation of these genes from 86 breast cancer patients and 67 healthy controls was performed by using microfluidic-PCR-based target enrichment and next-generation bisulfite sequencing technology. The predictive performance of different logistic models based on methylation status of candidate genes was investigated by means of the area under the ROC curve (AUC) and odds ratio (OR) analysis. Results revealed that EGFR, PPM1E, and 8 gene-specific CpG sites showed significantly hypermethylation in cancer patients' plasma and significantly associated with breast cancer (OR ranging from 2.51 to 9.88). The AUC values for these biomarkers were ranging from 0.66 to 0.75. Combinations of multiple hypermethylated genes or CpG sites substantially improved the predictive performance for breast cancer detection. Our study demonstrated the feasibility of quantitative measurement of candidate gene methylation in cfDNA by using microfluidic-PCR-based target enrichment and bisulfite next-generation sequencing, which is worthy of further validation and potentially benefits a broad range of applications in clinical oncology practice. Quantitative analysis of methylation pattern of plasma cfDNA by next-generation sequencing might be a valuable non-invasive tool for early detection of breast cancer.
Sequencing, Assembly and Analysis of Human Microbial Communities
Petrosino, Joe
2018-02-02
Joe Petrosino of Baylor College of Medicine discusses using next generation sequencing technologies to study human microbial communities associated with health and disease on June 4, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
[Advances of Molecular Diagnostic Techniques Application in Clinical Diagnosis.
Ying, Bin-Wu
2016-11-01
Over the past 20 years,clinical molecular diagnostic technology has made rapid development,and became the most promising field in clinical laboratory medicine.In particular,with the development of genomics,clinical molecular diagnostic methods will reveal the nature of clinical diseases in a deeper level,thus guiding the clinical diagnosis and treatments.Many molecular diagnostic projects have been routinely applied in clinical works.This paper reviews the advances on application of clinical diagnostic techniques in infectious disease,tumor and genetic disorders,including nucleic acid amplification,biochip,next-generation sequencing,and automation molecular system,and so on.
USDA-ARS?s Scientific Manuscript database
Current technologies for next generation sequencing (NGS) have revolutionized metagenomics analysis of clinical samples. One advantage of the NGS platform is the possibility to sequence the genetic material in samples without any prior knowledge of the sequence contained within. Sequence-Independent...
Targeted sequencing of plant genomes
Mark D. Huynh
2014-01-01
Next-generation sequencing (NGS) has revolutionized the field of genetics by providing a means for fast and relatively affordable sequencing. With the advancement of NGS, wholegenome sequencing (WGS) has become more commonplace. However, sequencing an entire genome is still not cost effective or even beneficial in all cases. In studies that do not require a whole-...
International Standards for Genomes, Transcriptomes, and Metagenomes
Mason, Christopher E.; Afshinnekoo, Ebrahim; Tighe, Scott; Wu, Shixiu; Levy, Shawn
2017-01-01
Challenges and biases in preparing, characterizing, and sequencing DNA and RNA can have significant impacts on research in genomics across all kingdoms of life, including experiments in single-cells, RNA profiling, and metagenomics (across multiple genomes). Technical artifacts and contamination can arise at each point of sample manipulation, extraction, sequencing, and analysis. Thus, the measurement and benchmarking of these potential sources of error are of paramount importance as next-generation sequencing (NGS) projects become more global and ubiquitous. Fortunately, a variety of methods, standards, and technologies have recently emerged that improve measurements in genomics and sequencing, from the initial input material to the computational pipelines that process and annotate the data. Here we review current standards and their applications in genomics, including whole genomes, transcriptomes, mixed genomic samples (metagenomes), and the modified bases within each (epigenomes and epitranscriptomes). These standards, tools, and metrics are critical for quantifying the accuracy of NGS methods, which will be essential for robust approaches in clinical genomics and precision medicine. PMID:28337071
Multiplex Reverse Transcription-PCR for Simultaneous Surveillance of Influenza A and B Viruses
Zhou, Bin; Barnes, John R.; Sessions, October M.; Chou, Tsui-Wen; Wilson, Malania; Stark, Thomas J.; Volk, Michelle; Spirason, Natalie; Halpin, Rebecca A.; Kamaraj, Uma Sangumathi; Ding, Tao; Stockwell, Timothy B.; Ghedin, Elodie; Barr, Ian G.
2017-01-01
ABSTRACT Influenza A and B viruses are the causative agents of annual influenza epidemics that can be severe, and influenza A viruses intermittently cause pandemics. Sequence information from influenza virus genomes is instrumental in determining mechanisms underpinning antigenic evolution and antiviral resistance. However, due to sequence diversity and the dynamics of influenza virus evolution, rapid and high-throughput sequencing of influenza viruses remains a challenge. We developed a single-reaction influenza A/B virus (FluA/B) multiplex reverse transcription-PCR (RT-PCR) method that amplifies the most critical genomic segments (hemagglutinin [HA], neuraminidase [NA], and matrix [M]) of seasonal influenza A and B viruses for next-generation sequencing, regardless of viral type, subtype, or lineage. Herein, we demonstrate that the strategy is highly sensitive and robust. The strategy was validated on thousands of seasonal influenza A and B virus-positive specimens using multiple next-generation sequencing platforms. PMID:28978683
A map of human genome variation from population-scale sequencing.
Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A
2010-10-28
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
Transit safety retrofit package development : final report.
DOT National Transportation Integrated Search
2014-07-01
This report provides a summary of the Transit Safety Retrofit Package (TRP) Development project and its results. The report documents results of each project phase, and provides recommended next steps as well as a vision for a next generation TRP. Th...
Recent Applications of DNA Sequencing Technologies in Food, Nutrition and Agriculture
USDA-ARS?s Scientific Manuscript database
Next-generation DNA sequencing technologies are able to produce millions of short sequence reads in a high-throughput, cost-effective fashion. The emergence of these technologies has not only facilitated genome sequencing but also changed the landscape of life sciences. This review surveys their rec...
USDA-ARS?s Scientific Manuscript database
The advent of next-generation sequencing technologies has been a boon to the cost-effective development of molecular markers, particularly in non-model species. Here, we demonstrate the efficiency of microsatellite or simple sequence repeat (SSR) marker development from short-read sequences using th...
A remark on copy number variation detection methods.
Li, Shuo; Dou, Xialiang; Gao, Ruiqi; Ge, Xinzhou; Qian, Minping; Wan, Lin
2018-01-01
Copy number variations (CNVs) are gain and loss of DNA sequence of a genome. High throughput platforms such as microarrays and next generation sequencing technologies (NGS) have been applied for genome wide copy number losses. Although progress has been made in both approaches, the accuracy and consistency of CNV calling from the two platforms remain in dispute. In this study, we perform a deep analysis on copy number losses on 254 human DNA samples, which have both SNP microarray data and NGS data publicly available from Hapmap Project and 1000 Genomes Project respectively. We show that the copy number losses reported from Hapmap Project and 1000 Genome Project only have < 30% overlap, while these reports are required to have cross-platform (e.g. PCR, microarray and high-throughput sequencing) experimental supporting by their corresponding projects, even though state-of-art calling methods were employed. On the other hand, copy number losses are found directly from HapMap microarray data by an accurate algorithm, i.e. CNVhac, almost all of which have lower read mapping depth in NGS data; furthermore, 88% of which can be supported by the sequences with breakpoint in NGS data. Our results suggest the ability of microarray calling CNVs and the possible introduction of false negatives from the unessential requirement of the additional cross-platform supporting. The inconsistency of CNV reports from Hapmap Project and 1000 Genomes Project might result from the inadequate information containing in microarray data, the inconsistent detection criteria, or the filtration effect of cross-platform supporting. The statistical test on CNVs called from CNVhac show that the microarray data can offer reliable CNV reports, and majority of CNV candidates can be confirmed by raw sequences. Therefore, the CNV candidates given by a good caller could be highly reliable without cross-platform supporting, so additional experimental information should be applied in need instead of necessarily.
Yebra, Gonzalo; Frampton, Dan; Gallo Cassarino, Tiziano; Raffle, Jade; Hubb, Jonathan; Ferns, R Bridget; Waters, Laura; Tong, C Y William; Kozlakidis, Zisis; Hayward, Andrew; Kellam, Paul; Pillay, Deenan; Clark, Duncan; Nastouli, Eleni; Leigh Brown, Andrew J
2018-01-01
The ICONIC project has developed an automated high-throughput pipeline to generate HIV nearly full-length genomes (NFLG, i.e. from gag to nef) from next-generation sequencing (NGS) data. The pipeline was applied to 420 HIV samples collected at University College London Hospitals NHS Trust and Barts Health NHS Trust (London) and sequenced using an Illumina MiSeq at the Wellcome Trust Sanger Institute (Cambridge). Consensus genomes were generated and subtyped using COMET, and unique recombinants were studied with jpHMM and SimPlot. Maximum-likelihood phylogenetic trees were constructed using RAxML to identify transmission networks using the Cluster Picker. The pipeline generated sequences of at least 1Kb of length (median = 7.46Kb, IQR = 4.01Kb) for 375 out of the 420 samples (89%), with 174 (46.4%) being NFLG. A total of 365 sequences (169 of them NFLG) corresponded to unique subjects and were included in the down-stream analyses. The most frequent HIV subtypes were B (n = 149, 40.8%) and C (n = 77, 21.1%) and the circulating recombinant form CRF02_AG (n = 32, 8.8%). We found 14 different CRFs (n = 66, 18.1%) and multiple URFs (n = 32, 8.8%) that involved recombination between 12 different subtypes/CRFs. The most frequent URFs were B/CRF01_AE (4 cases) and A1/D, B/C, and B/CRF02_AG (3 cases each). Most URFs (19/26, 73%) lacked breakpoints in the PR+RT pol region, rendering them undetectable if only that was sequenced. Twelve (37.5%) of the URFs could have emerged within the UK, whereas the rest were probably imported from sub-Saharan Africa, South East Asia and South America. For 2 URFs we found highly similar pol sequences circulating in the UK. We detected 31 phylogenetic clusters using the full dataset: 25 pairs (mostly subtypes B and C), 4 triplets and 2 quadruplets. Some of these were not consistent across different genes due to inter- and intra-subtype recombination. Clusters involved 70 sequences, 19.2% of the dataset. The initial analysis of genome sequences detected substantial hidden variability in the London HIV epidemic. Analysing full genome sequences, as opposed to only PR+RT, identified previously undetected recombinants. It provided a more reliable description of CRFs (that would be otherwise misclassified) and transmission clusters.
NASA Technical Reports Server (NTRS)
Smith, David J.; Burton, Aaron; Castro-Wallace, Sarah; John, Kristen; Stahl, Sarah E.; Dworkin, Jason Peter; Lupisella, Mark L.
2016-01-01
On the International Space Station (ISS), technologies capable of rapid microbial identification and disease diagnostics are not currently available. NASA still relies upon sample return for comprehensive, molecular-based sample characterization. Next-generation DNA sequencing is a powerful approach for identifying microorganisms in air, water, and surfaces onboard spacecraft. The Biomolecule Sequencer payload, manifested to SpaceX-9 and scheduled on the Increment 4748 research plan (June 2016), will assess the functionality of a commercially-available next-generation DNA sequencer in the microgravity environment of ISS. The MinION device from Oxford Nanopore Technologies (Oxford, UK) measures picoamp changes in electrical current dependent on nucleotide sequences of the DNA strand migrating through nanopores in the system. The hardware is exceptionally small (9.5 x 3.2 x 1.6 cm), lightweight (120 grams), and powered only by a USB connection. For the ISS technology demonstration, the Biomolecule Sequencer will be powered by a Microsoft Surface Pro3. Ground-prepared samples containing lambda bacteriophage, Escherichia coli, and mouse genomic DNA, will be launched and stored frozen on the ISS until experiment initiation. Immediately prior to sequencing, a crew member will collect and thaw frozen DNA samples, connect the sequencer to the Surface Pro3, inject thawed samples into a MinION flow cell, and initiate sequencing. At the completion of the sequencing run, data will be downlinked for ground analysis. Identical, synchronous ground controls will be used for data comparisons to determine sequencer functionality, run-time sequence, current dynamics, and overall accuracy. We will present our latest results from the ISS flight experiment the first time DNA has ever been sequenced in space and discuss the many potential applications of the Biomolecule Sequencer for environmental monitoring, medical diagnostics, higher fidelity and more adaptable Space Biology Human Research Program investigations, and even life detection experiments for astrobiology missions.
New Technology Drafts: Production and Improvements
Lapidus, Alla
2018-01-22
Alla Lapidus, head of the DOE Joint Genome Institute's Finishing group, gives a talk on how the DOE JGI's microbial genome sequencing pipeline has been adapted to accommodate next generation sequencing platforms at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
Mind the gap; seven reasons to close fragmented genome assemblies
USDA-ARS?s Scientific Manuscript database
Like other domains of life, research into the biology of filamentous microbes has greatly benefited from the advent of whole-genome sequencing. Next-generation sequencing (NGS) technologies have revolutionized sequencing, making genomic sciences accessible to many academic laboratories including tho...
Introduction to the fathead minnow genome browser and ...
Ab initio gene prediction and evidence alignment were used to produce the first annotations for the fathead minnow SOAPdenovo genome assembly. Additionally, a genome browser hosted at genome.setac.org provides simplified access to the annotation data in context with fathead minnow genomic sequence. This work is meant to extend the utility of fathead minnow genome as a resource and enable the continued development of this species as a model organism. The fathead minnow (Pimephales promelas) is a laboratory model organism widely used in regulatory toxicity testing and ecotoxicology research. Despite, the wealth of toxicological data for this organism, until recently genome scale information was lacking for the species, which limited the utility of the species for pathway-based toxicity testing and research. As part of a EPA Pathfinder Innovation Project, next generation sequencing was applied to generate a draft genome assembly, which was published in 2016. However, application of those genome-scale sequencing resources was still limited by the lack of available gene annotations for fathead minnow. Here we report on development of a first generation genome annotation for fathead minnow and the dissemination of that information through a web-based browser that makes it easy to search for genes of interest, extract the corresponding sequence, identify intron and exon boundaries and regulatory regions, and align the computationally predicted genes with other supporti
Watson, Christopher M.; Crinnion, Laura A.; Gurgel‐Gianetti, Juliana; Harrison, Sally M.; Daly, Catherine; Antanavicuite, Agne; Lascelles, Carolina; Markham, Alexander F.; Pena, Sergio D. J.; Bonthron, David T.
2015-01-01
ABSTRACT Autozygosity mapping is a powerful technique for the identification of rare, autosomal recessive, disease‐causing genes. The ease with which this category of disease gene can be identified has greatly increased through the availability of genome‐wide SNP genotyping microarrays and subsequently of exome sequencing. Although these methods have simplified the generation of experimental data, its analysis, particularly when disparate data types must be integrated, remains time consuming. Moreover, the huge volume of sequence variant data generated from next generation sequencing experiments opens up the possibility of using these data instead of microarray genotype data to identify disease loci. To allow these two types of data to be used in an integrated fashion, we have developed AgileVCFMapper, a program that performs both the mapping of disease loci by SNP genotyping and the analysis of potentially deleterious variants using exome sequence variant data, in a single step. This method does not require microarray SNP genotype data, although analysis with a combination of microarray and exome genotype data enables more precise delineation of disease loci, due to superior marker density and distribution. PMID:26037133
Zseq: An Approach for Preprocessing Next-Generation Sequencing Data.
Alkhateeb, Abedalrhman; Rueda, Luis
2017-08-01
Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts. Preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq finds the complexity of the sequences by counting the number of unique k-mers in each sequence as its corresponding score and also takes into the account other factors such as ambiguous nucleotides or high GC-content percentage in k-mers. Based on a z-score threshold, Zseq sweeps through the sequences again and filters those with a z-score less than the user-defined threshold. Zseq algorithm is able to provide a better mapping rate; it reduces the number of ambiguous bases significantly in comparison with other methods. Evaluation of the filtered reads has been conducted by aligning the reads and assembling the transcripts using the reference genome as well as de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Moreover, de novo assembled transcripts from the reads filtered by Zseq have longer genomic sequences than other tested methods. Estimating the threshold of the cutoff point is introduced using labeling rules with optimistic results.
Cao, Yu; Fanning, Séamus; Proos, Sinéad; Jordan, Kieran; Srikumar, Shabarinath
2017-01-01
The development of next generation sequencing (NGS) techniques has enabled researchers to study and understand the world of microorganisms from broader and deeper perspectives. The contemporary advances in DNA sequencing technologies have not only enabled finer characterization of bacterial genomes but also provided deeper taxonomic identification of complex microbiomes which in its genomic essence is the combined genetic material of the microorganisms inhabiting an environment, whether the environment be a particular body econiche (e.g., human intestinal contents) or a food manufacturing facility econiche (e.g., floor drain). To date, 16S rDNA sequencing, metagenomics and metatranscriptomics are the three basic sequencing strategies used in the taxonomic identification and characterization of food-related microbiomes. These sequencing strategies have used different NGS platforms for DNA and RNA sequence identification. Traditionally, 16S rDNA sequencing has played a key role in understanding the taxonomic composition of a food-related microbiome. Recently, metagenomic approaches have resulted in improved understanding of a microbiome by providing a species-level/strain-level characterization. Further, metatranscriptomic approaches have contributed to the functional characterization of the complex interactions between different microbial communities within a single microbiome. Many studies have highlighted the use of NGS techniques in investigating the microbiome of fermented foods. However, the utilization of NGS techniques in studying the microbiome of non-fermented foods are limited. This review provides a brief overview of the advances in DNA sequencing chemistries as the technology progressed from first, next and third generations and highlights how NGS provided a deeper understanding of food-related microbiomes with special focus on non-fermented foods. PMID:29033905
BASiNET-BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification.
Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins
2018-06-05
With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data in particular de novo sequencing was rapidly produced at relatively low costs. In this context, computational tools are increasingly important to assist in the identification of relevant information to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on the feature extraction from complex network measurements. The method initially transform the sequences and represents them as complex networks. Then it extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated in the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET have classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and non-biased by the organism. The proposed methodology is implemented in open source in R language and freely available for download at https://cran.r-project.org/package=BASiNET.
Chaitankar, Vijender; Karakülah, Gökhan; Ratnapriya, Rinki; Giuste, Felipe O.; Brooks, Matthew J.; Swaroop, Anand
2016-01-01
The advent of high throughput next generation sequencing (NGS) has accelerated the pace of discovery of disease-associated genetic variants and genomewide profiling of expressed sequences and epigenetic marks, thereby permitting systems-based analyses of ocular development and disease. Rapid evolution of NGS and associated methodologies presents significant challenges in acquisition, management, and analysis of large data sets and for extracting biologically or clinically relevant information. Here we illustrate the basic design of commonly used NGS-based methods, specifically whole exome sequencing, transcriptome, and epigenome profiling, and provide recommendations for data analyses. We briefly discuss systems biology approaches for integrating multiple data sets to elucidate gene regulatory or disease networks. While we provide examples from the retina, the NGS guidelines reviewed here are applicable to other tissues/cell types as well. PMID:27297499
Compression of next-generation sequencing reads aided by highly efficient de novo assembly
Jones, Daniel C.; Ruzzo, Walter L.; Peng, Xinxia
2012-01-01
We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. In addition to implementing reference-based compression, we have developed, to our knowledge, the first assembly-based compressor, using a novel de novo assembly algorithm. A probabilistic data structure is used to dramatically reduce the memory required by traditional de Bruijn graph assemblers, allowing millions of reads to be assembled very efficiently. Read sequences are then stored as positions within the assembled contigs. This is combined with statistical compression of read identifiers, quality scores, alignment information and sequences, effectively collapsing very large data sets to <15% of their original size with no loss of information. Availability: Quip is freely available under the 3-clause BSD license from http://cs.washington.edu/homes/dcjones/quip. PMID:22904078
NABIC: A New Access Portal to Search, Visualize, and Share Agricultural Genomics Data.
Seol, Young-Joo; Lee, Tae-Ho; Park, Dong-Suk; Kim, Chang-Kug
2016-01-01
The National Agricultural Biotechnology Information Center developed an access portal to search, visualize, and share agricultural genomics data with a focus on South Korean information and resources. The portal features an agricultural biotechnology database containing a wide range of omics data from public and proprietary sources. We collected 28.4 TB of data from 162 agricultural organisms, with 10 types of omics data comprising next-generation sequencing sequence read archive, genome, gene, nucleotide, DNA chip, expressed sequence tag, interactome, protein structure, molecular marker, and single-nucleotide polymorphism datasets. Our genomic resources contain information on five animals, seven plants, and one fungus, which is accessed through a genome browser. We also developed a data submission and analysis system as a web service, with easy-to-use functions and cutting-edge algorithms, including those for handling next-generation sequencing data.
Besaratinia, Ahmad; Li, Haiqing; Yoon, Jae-In; Zheng, Albert; Gao, Hanlin; Tommasi, Stella
2012-01-01
Many carcinogens leave a unique mutational fingerprint in the human genome. These mutational fingerprints manifest as specific types of mutations often clustering at certain genomic loci in tumor genomes from carcinogen-exposed individuals. To develop a high-throughput method for detecting the mutational fingerprint of carcinogens, we have devised a cost-, time- and labor-effective strategy, in which the widely used transgenic Big Blue® mouse mutation detection assay is made compatible with the Roche/454 Genome Sequencer FLX Titanium next-generation sequencing technology. As proof of principle, we have used this novel method to establish the mutational fingerprints of three prominent carcinogens with varying mutagenic potencies, including sunlight ultraviolet radiation, 4-aminobiphenyl and secondhand smoke that are known to be strong, moderate and weak mutagens, respectively. For verification purposes, we have compared the mutational fingerprints of these carcinogens obtained by our newly developed method with those obtained by parallel analyses using the conventional low-throughput approach, that is, standard mutation detection assay followed by direct DNA sequencing using a capillary DNA sequencer. We demonstrate that this high-throughput next-generation sequencing-based method is highly specific and sensitive to detect the mutational fingerprints of the tested carcinogens. The method is reproducible, and its accuracy is comparable with that of the currently available low-throughput method. In conclusion, this novel method has the potential to move the field of carcinogenesis forward by allowing high-throughput analysis of mutations induced by endogenous and/or exogenous genotoxic agents. PMID:22735701
Besaratinia, Ahmad; Li, Haiqing; Yoon, Jae-In; Zheng, Albert; Gao, Hanlin; Tommasi, Stella
2012-08-01
Many carcinogens leave a unique mutational fingerprint in the human genome. These mutational fingerprints manifest as specific types of mutations often clustering at certain genomic loci in tumor genomes from carcinogen-exposed individuals. To develop a high-throughput method for detecting the mutational fingerprint of carcinogens, we have devised a cost-, time- and labor-effective strategy, in which the widely used transgenic Big Blue mouse mutation detection assay is made compatible with the Roche/454 Genome Sequencer FLX Titanium next-generation sequencing technology. As proof of principle, we have used this novel method to establish the mutational fingerprints of three prominent carcinogens with varying mutagenic potencies, including sunlight ultraviolet radiation, 4-aminobiphenyl and secondhand smoke that are known to be strong, moderate and weak mutagens, respectively. For verification purposes, we have compared the mutational fingerprints of these carcinogens obtained by our newly developed method with those obtained by parallel analyses using the conventional low-throughput approach, that is, standard mutation detection assay followed by direct DNA sequencing using a capillary DNA sequencer. We demonstrate that this high-throughput next-generation sequencing-based method is highly specific and sensitive to detect the mutational fingerprints of the tested carcinogens. The method is reproducible, and its accuracy is comparable with that of the currently available low-throughput method. In conclusion, this novel method has the potential to move the field of carcinogenesis forward by allowing high-throughput analysis of mutations induced by endogenous and/or exogenous genotoxic agents.
Zhang, Jianwei; Kudrna, Dave; Mu, Ting; Li, Weiming; Copetti, Dario; Yu, Yeisoo; Goicoechea, Jose Luis; Lei, Yang; Wing, Rod A.
2016-01-01
Abstract Motivation: Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool—Genome Puzzle Master (GPM)—that enables the integration of additional genomic signposts to edit and build ‘new-gen-assemblies’ that result in high-quality ‘annotation-ready’ pseudomolecules. Results: With GPM, loaded datasets can be connected to each other via their logical relationships which accomplishes tasks to ‘group,’ ‘merge,’ ‘order and orient’ sequences in a draft assembly. Manual editing can also be performed with a user-friendly graphical interface. Final pseudomolecules reflect a user’s total data package and are available for long-term project management. GPM is a web-based pipeline and an important part of a Laboratory Information Management System (LIMS) which can be easily deployed on local servers for any genome research laboratory. Availability and Implementation: The GPM (with LIMS) package is available at https://github.com/Jianwei-Zhang/LIMS Contacts: jzhang@mail.hzau.edu.cn or rwing@mail.arizona.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27318200
Next Generation * Natural Gas (NG)2 Information Requirements--Executive Summary
2000-01-01
The Energy Information Administration (EIA) has initiated the Next Generation * Natural Gas (NG)2 project to design and implement a new and comprehensive information program for natural gas to meet customer requirements in the post-2000 time frame.
Okamoto, Nobuhiko; Nakashima, Mitsuko; Tsurusaki, Yoshinori; Miyake, Noriko; Saitsu, Hirotomo; Matsumoto, Naomichi
2013-01-01
Next-generation sequencing (NGS) combined with enrichment of target genes enables highly efficient and low-cost sequencing of multiple genes for genetic diseases. The aim of this study was to validate the accuracy and sensitivity of our method for comprehensive mutation detection in autism spectrum disorder (ASD). We assessed the performance of the bench-top Ion Torrent PGM and Illumina MiSeq platforms as optimized solutions for mutation detection, using microdroplet PCR-based enrichment of 62 ASD associated genes. Ten patients with known mutations were sequenced using NGS to validate the sensitivity of our method. The overall read quality was better with MiSeq, largely because of the increased indel-related error associated with PGM. The sensitivity of SNV detection was similar between the two platforms, suggesting they are both suitable for SNV detection in the human genome. Next, we used these methods to analyze 28 patients with ASD, and identified 22 novel variants in genes associated with ASD, with one mutation detected by MiSeq only. Thus, our results support the combination of target gene enrichment and NGS as a valuable molecular method for investigating rare variants in ASD. PMID:24066114
Characterisation and Next-generation Sequencing Analysis of Unknown Arboviruses
2012-09-01
on the development of real- time PCR detection assays for Vibrio cholerae, a water-borne bacterium responsible for severe enteric disease. From...specific sequence [22]. The length of time from harvesting virus to generating samples that are ready for sequencing takes about two weeks, which is a...two viruses, and on day 4 post infection significant and widespread cytopathic effect was observed. The viruses were harvested by ultracentrifugation
Chen, Hui; Luthra, Rajyalakshmi; Goswami, Rashmi S; Singh, Rajesh R; Roy-Chowdhuri, Sinchita
2015-08-28
Application of next-generation sequencing (NGS) technology to routine clinical practice has enabled characterization of personalized cancer genomes to identify patients likely to have a response to targeted therapy. The proper selection of tumor sample for downstream NGS based mutational analysis is critical to generate accurate results and to guide therapeutic intervention. However, multiple pre-analytic factors come into play in determining the success of NGS testing. In this review, we discuss pre-analytic requirements for AmpliSeq PCR-based sequencing using Ion Torrent Personal Genome Machine (PGM) (Life Technologies), a NGS sequencing platform that is often used by clinical laboratories for sequencing solid tumors because of its low input DNA requirement from formalin fixed and paraffin embedded tissue. The success of NGS mutational analysis is affected not only by the input DNA quantity but also by several other factors, including the specimen type, the DNA quality, and the tumor cellularity. Here, we review tissue requirements for solid tumor NGS based mutational analysis, including procedure types, tissue types, tumor volume and fraction, decalcification, and treatment effects.
Lopez-Doriga, Adriana; Feliubadaló, Lídia; Menéndez, Mireia; Lopez-Doriga, Sergio; Morón-Duran, Francisco D; del Valle, Jesús; Tornero, Eva; Montes, Eva; Cuesta, Raquel; Campos, Olga; Gómez, Carolina; Pineda, Marta; González, Sara; Moreno, Victor; Capellá, Gabriel; Lázaro, Conxi
2014-03-01
Next-generation sequencing (NGS) has revolutionized genomic research and is set to have a major impact on genetic diagnostics thanks to the advent of benchtop sequencers and flexible kits for targeted libraries. Among the main hurdles in NGS are the difficulty of performing bioinformatic analysis of the huge volume of data generated and the high number of false positive calls that could be obtained, depending on the NGS technology and the analysis pipeline. Here, we present the development of a free and user-friendly Web data analysis tool that detects and filters sequence variants, provides coverage information, and allows the user to customize some basic parameters. The tool has been developed to provide accurate genetic analysis of targeted sequencing of common high-risk hereditary cancer genes using amplicon libraries run in a GS Junior System. The Web resource is linked to our own mutation database, to assist in the clinical classification of identified variants. We believe that this tool will greatly facilitate the use of the NGS approach in routine laboratories.
Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P
2012-03-15
Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. Especially database interfaces with transcriptome analysis modules going beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a php-script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword- or BLAST-search. Additionally, T-ACE provides within and between transcriptome analysis modules on the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments, which have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open-source code provides a framework that can be customized according to the different needs of the user and transcriptome project.
USDA-ARS?s Scientific Manuscript database
The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares high degree of sequence identity (99%) with several known STV isol...
USDA-ARS?s Scientific Manuscript database
Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...
ERIC Educational Resources Information Center
Taylor, D. Leland; Campbell, A. Malcolm; Heyer, Laurie J.
2013-01-01
Next-generation sequencing technologies have greatly reduced the cost of sequencing genomes. With the current sequencing technology, a genome is broken into fragments and sequenced, producing millions of "reads." A computer algorithm pieces these reads together in the genome assembly process. PHAST is a set of online modules…
Fassan, Matteo; Rachiglio, Anna Maria; Cappellesso, Rocco; Antonello, Davide; Amato, Eliana; Mafficini, Andrea; Lambiase, Matilde; Esposito, Claudia; Bria, Emilio; Simonato, Francesca; Scardoni, Maria; Turri, Giona; Chilosi, Marco; Tortora, Giampaolo; Fassina, Ambrogio; Normanno, Nicola
2013-01-01
Identification of driver mutations in lung adenocarcinoma has led to development of targeted agents that are already approved for clinical use or are in clinical trials. Therefore, the number of biomarkers that will be needed to assess is expected to rapidly increase. This calls for the implementation of methods probing the mutational status of multiple genes for inoperable cases, for which limited cytological or bioptic material is available. Cytology specimens from 38 lung adenocarcinomas were subjected to the simultaneous assessment of 504 mutational hotspots of 22 lung cancer-associated genes using 10 nanograms of DNA and Ion Torrent PGM next-generation sequencing. Thirty-six cases were successfully sequenced (95%). In 24/36 cases (67%) at least one mutated gene was observed, including EGFR, KRAS, PIK3CA, BRAF, TP53, PTEN, MET, SMAD4, FGFR3, STK11, MAP2K1. EGFR and KRAS mutations, respectively found in 6/36 (16%) and 10/36 (28%) cases, were mutually exclusive. Nine samples (25%) showed concurrent alterations in different genes. The next-generation sequencing test used is superior to current standard methodologies, as it interrogates multiple genes and requires limited amounts of DNA. Its applicability to routine cytology samples might allow a significant increase in the fraction of lung cancer patients eligible for personalized therapy. PMID:24236184
Lo, David; Weng, Jingning; Liu, xiaohong; Yang, Juhua; He, Fen; Wang, Yun; Liu, Xuyang
2016-01-01
PURPOSE To detect the disease-causing gene in a Chinese pedigree with autosomal-recessive retinitis pigmentosa (ARRP). METHODS All subjects in this family underwent a complete ophthalmic examination. Targeted-capture next generation sequencing (NGS) was performed on the proband to detect variants. All variants were verified in the remaining family members by PCR amplification and Sanger sequencing. RESULTS All the affected subjects in this pedigree were diagnosed with retinitis pigmentosa (RP). The compound heterozygous c.138delA (p.Asp47IlefsX24) and c.1841G>T (p.Gly614Val) mutations in the Crumbs homolog 1 (CRB1) gene were identified in all the affected patients but not in the unaffected individuals in this family. These mutations were inherited from their parents, respectively. CONCLUSION The novel compound heterozygous mutations in CRB1 were identified in a Chinese pedigree with ARRP using targeted-capture next generation sequencing. After evaluating the significant heredity and impaired protein function, the compound heterozygous c.138delA (p.Asp47IlefsX24) and c.1841G>T (p.Gly614Val) mutations are the causal genes of early onset ARRP in this pedigree. To the best of our knowledge, there is no previous report regarding the compound mutations. PMID:27806333
Farhat, Maha; Shaheed, Raja A; Al-Ali, Haider H; Al-Ghamdi, Abdullah S; Al-Hamaqi, Ghadeer M; Maan, Hawraa S; Al-Mahfoodh, Zainab A; Al-Seba, Hussain Z
2018-02-01
To investigate the presence of Legionella spp in cooling tower water. Legionella proliferation in cooling tower water has serious public health implications as it can be transmitted to humans via aerosols and cause Legionnaires' disease. Samples of cooling tower water were collected from King Fahd Hospital of the University (KFHU) (Imam Abdulrahman Bin Faisal University, 2015/2016). The water samples were analyzed by a standard Legionella culture method, real-time polymerase chain reaction (RT-PCR), and 16S rRNA next-generation sequencing. In addition, the bacterial community composition was evaluated. All samples were negative by conventional Legionella culture. In contrast, all water samples yielded positive results by real-time PCR (105 to 106 GU/L). The results of 16S rRNA next generation sequencing showed high similarity and reproducibility among the water samples. The majority of sequences were Alpha-, Beta-, and Gamma-proteobacteria, and Legionella was the predominant genus. The hydrogen-oxidizing gram-negative bacterium Hydrogenophaga was present at high abundance, indicating high metabolic activity. Sphingopyxis, which is known for its resistance to antimicrobials and as a pioneer in biofilm formation, was also detected. Our findings indicate that monitoring of Legionella in cooling tower water would be enhanced by use of both conventional culturing and molecular methods.
Goldberg, Brittany; Sichtig, Heike; Geyer, Chelsie; Ledeboer, Nathan
2015-01-01
ABSTRACT Next-generation DNA sequencing (NGS) has progressed enormously over the past decade, transforming genomic analysis and opening up many new opportunities for applications in clinical microbiology laboratories. The impact of NGS on microbiology has been revolutionary, with new microbial genomic sequences being generated daily, leading to the development of large databases of genomes and gene sequences. The ability to analyze microbial communities without culturing organisms has created the ever-growing field of metagenomics and microbiome analysis and has generated significant new insights into the relation between host and microbe. The medical literature contains many examples of how this new technology can be used for infectious disease diagnostics and pathogen analysis. The implementation of NGS in medical practice has been a slow process due to various challenges such as clinical trials, lack of applicable regulatory guidelines, and the adaptation of the technology to the clinical environment. In April 2015, the American Academy of Microbiology (AAM) convened a colloquium to begin to define these issues, and in this document, we present some of the concepts that were generated from these discussions. PMID:26646014
Sequencing Strategies for Population and Cancer Epidemiology Studies (SeqSPACE) Webinar Series
The Sequencing Strategies for Population and Cancer Epidemiology Studies (SeqSPACE) Webinar Series provides an opportunity for our grantees and other interested individuals to share lessons learned and practical information regarding the application of next generation sequencing to cancer epidemiology studies.
Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.
Warris, Sven; Yalcin, Feyruz; Jackson, Katherine J L; Nap, Jan Peter
2015-01-01
To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.
MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads.
Tyakht, Alexander V; Popenko, Anna S; Belenikin, Maxim S; Altukhov, Ilya A; Pavlenko, Alexander V; Kostryukova, Elena S; Selezneva, Oksana V; Larin, Andrei K; Karpova, Irina Y; Alexeev, Dmitry G
2012-12-07
MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads of various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation (including SOLiD and Illumina). It is the first metagenomic web service that is capable of processing SOLiD color-space reads, to authors' knowledge. The web service allows phylogenetic and functional profiling of metagenomic samples using coverage depth resulting from the alignment of the reads to the catalogue of reference sequences which are built into the pipeline and contain prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies allowing their comparative analysis together with user samples, namely datasets from Russian Metagenome project, MetaHIT and Human Microbiome Project (downloaded from http://hmpdacc.org). MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, Python, with all major browsers supported.
Bybee, Seth M; Bracken-Grissom, Heather; Haynes, Benjamin D; Hermansen, Russell A; Byers, Robert L; Clement, Mark J; Udall, Joshua A; Wilcox, Edward R; Crandall, Keith A
2011-01-01
Next-gen sequencing technologies have revolutionized data collection in genetic studies and advanced genome biology to novel frontiers. However, to date, next-gen technologies have been used principally for whole genome sequencing and transcriptome sequencing. Yet many questions in population genetics and systematics rely on sequencing specific genes of known function or diversity levels. Here, we describe a targeted amplicon sequencing (TAS) approach capitalizing on next-gen capacity to sequence large numbers of targeted gene regions from a large number of samples. Our TAS approach is easily scalable, simple in execution, neither time-nor labor-intensive, relatively inexpensive, and can be applied to a broad diversity of organisms and/or genes. Our TAS approach includes a bioinformatic application, BarcodeCrucher, to take raw next-gen sequence reads and perform quality control checks and convert the data into FASTA format organized by gene and sample, ready for phylogenetic analyses. We demonstrate our approach by sequencing targeted genes of known phylogenetic utility to estimate a phylogeny for the Pancrustacea. We generated data from 44 taxa using 68 different 10-bp multiplexing identifiers. The overall quality of data produced was robust and was informative for phylogeny estimation. The potential for this method to produce copious amounts of data from a single 454 plate (e.g., 325 taxa for 24 loci) significantly reduces sequencing expenses incurred from traditional Sanger sequencing. We further discuss the advantages and disadvantages of this method, while offering suggestions to enhance the approach.
Bybee, Seth M.; Bracken-Grissom, Heather; Haynes, Benjamin D.; Hermansen, Russell A.; Byers, Robert L.; Clement, Mark J.; Udall, Joshua A.; Wilcox, Edward R.; Crandall, Keith A.
2011-01-01
Next-gen sequencing technologies have revolutionized data collection in genetic studies and advanced genome biology to novel frontiers. However, to date, next-gen technologies have been used principally for whole genome sequencing and transcriptome sequencing. Yet many questions in population genetics and systematics rely on sequencing specific genes of known function or diversity levels. Here, we describe a targeted amplicon sequencing (TAS) approach capitalizing on next-gen capacity to sequence large numbers of targeted gene regions from a large number of samples. Our TAS approach is easily scalable, simple in execution, neither time-nor labor-intensive, relatively inexpensive, and can be applied to a broad diversity of organisms and/or genes. Our TAS approach includes a bioinformatic application, BarcodeCrucher, to take raw next-gen sequence reads and perform quality control checks and convert the data into FASTA format organized by gene and sample, ready for phylogenetic analyses. We demonstrate our approach by sequencing targeted genes of known phylogenetic utility to estimate a phylogeny for the Pancrustacea. We generated data from 44 taxa using 68 different 10-bp multiplexing identifiers. The overall quality of data produced was robust and was informative for phylogeny estimation. The potential for this method to produce copious amounts of data from a single 454 plate (e.g., 325 taxa for 24 loci) significantly reduces sequencing expenses incurred from traditional Sanger sequencing. We further discuss the advantages and disadvantages of this method, while offering suggestions to enhance the approach. PMID:22002916
A Foray into Fungal Ecology: Understanding Fungi and Their Functions Across Ecosystems
NASA Astrophysics Data System (ADS)
Francis, N.; Dunkirk, N. C.; Peay, K.
2015-12-01
Despite their incredible diversity and importance to terrestrial ecosystems, fungi are not included in a standard high school science curriculum. This past summer, however, my work for the Stanford EARTH High School Internship program introduced me to fungal ecology through experiments involving culturing, genomics and root dissections. The two fungal experiments I worked on had very different foci, both searching for answers to broad ecological questions of fungal function and physiology. The first, a symbiosis experiment, sought to determine if the partners of the nutrient exchange between pine trees and their fungal symbionts could choose one another. The second experiment, a dung fungal succession project, compared the genetic sequencing results of fungal extractions from dung versus fungal cultures from dung. My part in the symbiosis experiment involved dissection, weighing and encapsulation of root tissue samples characterized based on the root thickness and presence of ectomycorrhizal fungi. The dung fungi succession project required that I not only learn how to culture various genera of dung fungi but also learn how to extract DNA and RNA for sequencing from the fungal tissue. Although I primarily worked with dung fungi cultures and thereby learned about their unique physiologies, I also learned about the different types of genetic sequencing since the project compared sequences of cultured fungi versus Next Generation sequencing of all fungi present within a dung pellet. Through working on distinct fungal projects that reassess how information about fungi is known within the field of fungal ecology, I learned not only about the two experiments I worked on but also many past related experiments and inquiries through reading scientific papers. Thanks to my foray into fungal research, I now know not only the broader significance of fungi in ecological research but also how to design and conduct ecological experiments.
Roy, Somak; Durso, Mary Beth; Wald, Abigail; Nikiforov, Yuri E; Nikiforova, Marina N
2014-01-01
A wide repertoire of bioinformatics applications exist for next-generation sequencing data analysis; however, certain requirements of the clinical molecular laboratory limit their use: i) comprehensive report generation, ii) compatibility with existing laboratory information systems and computer operating system, iii) knowledgebase development, iv) quality management, and v) data security. SeqReporter is a web-based application developed using ASP.NET framework version 4.0. The client-side was designed using HTML5, CSS3, and Javascript. The server-side processing (VB.NET) relied on interaction with a customized SQL server 2008 R2 database. Overall, 104 cases (1062 variant calls) were analyzed by SeqReporter. Each variant call was classified into one of five report levels: i) known clinical significance, ii) uncertain clinical significance, iii) pending pathologists' review, iv) synonymous and deep intronic, and v) platform and panel-specific sequence errors. SeqReporter correctly annotated and classified 99.9% (859 of 860) of sequence variants, including 68.7% synonymous single-nucleotide variants, 28.3% nonsynonymous single-nucleotide variants, 1.7% insertions, and 1.3% deletions. One variant of potential clinical significance was re-classified after pathologist review. Laboratory information system-compatible clinical reports were generated automatically. SeqReporter also facilitated quality management activities. SeqReporter is an example of a customized and well-designed informatics solution to optimize and automate the downstream analysis of clinical next-generation sequencing data. We propose it as a model that may envisage the development of a comprehensive clinical informatics solution. Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
USDA-ARS?s Scientific Manuscript database
New and emerging next generation sequencing technologies have been promising in reducing sequencing costs, but not significantly for complex polyploid plant genomes such as cotton. Large and highly repetitive genome of G. hirsutum (~2.5GB) is less amenable and cost-intensive with traditional BAC-by...
Yin, Li; Yao, Jiqiang; Gardner, Brent P; Chang, Kaifen; Yu, Fahong; Goodenow, Maureen M
2012-01-01
Next Generation sequencing (NGS) applied to human papilloma viruses (HPV) can provide sensitive methods to investigate the molecular epidemiology of multiple type HPV infection. Currently a genotyping system with a comprehensive collection of updated HPV reference sequences and a capacity to handle NGS data sets is lacking. HPV-QUEST was developed as an automated and rapid HPV genotyping system. The web-based HPV-QUEST subtyping algorithm was developed using HTML, PHP, Perl scripting language, and MYSQL as the database backend. HPV-QUEST includes a database of annotated HPV reference sequences with updated nomenclature covering 5 genuses, 14 species and 150 mucosal and cutaneous types to genotype blasted query sequences. HPV-QUEST processes up to 10 megabases of sequences within 1 to 2 minutes. Results are reported in html, text and excel formats and display e-value, blast score, and local and coverage identities; provide genus, species, type, infection site and risk for the best matched reference HPV sequence; and produce results ready for additional analyses.
Zhang, Jimmy F; James, Francis; Shukla, Anju; Girisha, Katta M; Paciorkowski, Alex R
2017-06-27
We built India Allele Finder, an online searchable database and command line tool, that gives researchers access to variant frequencies of Indian Telugu individuals, using publicly available fastq data from the 1000 Genomes Project. Access to appropriate population-based genomic variant annotation can accelerate the interpretation of genomic sequencing data. In particular, exome analysis of individuals of Indian descent will identify population variants not reflected in European exomes, complicating genomic analysis for such individuals. India Allele Finder offers improved ease-of-use to investigators seeking to identify and annotate sequencing data from Indian populations. We describe the use of India Allele Finder to identify common population variants in a disease quartet whole exome dataset, reducing the number of candidate single nucleotide variants from 84 to 7. India Allele Finder is freely available to investigators to annotate genomic sequencing data from Indian populations. Use of India Allele Finder allows efficient identification of population variants in genomic sequencing data, and is an example of a population-specific annotation tool that simplifies analysis and encourages international collaboration in genomics research.
BeerDeCoded: the open beer metagenome project.
Sobel, Jonathan; Henry, Luc; Rotman, Nicolas; Rando, Gianpaolo
2017-01-01
Next generation sequencing has radically changed research in the life sciences, in both academic and corporate laboratories. The potential impact is tremendous, yet a majority of citizens have little or no understanding of the technological and ethical aspects of this widespread adoption. We designed BeerDeCoded as a pretext to discuss the societal issues related to genomic and metagenomic data with fellow citizens, while advancing scientific knowledge of the most popular beverage of all. In the spirit of citizen science, sample collection and DNA extraction were carried out with the participation of non-scientists in the community laboratory of Hackuarium, a not-for-profit organisation that supports unconventional research and promotes the public understanding of science. The dataset presented herein contains the targeted metagenomic profile of 39 bottled beers from 5 countries, based on internal transcribed spacer (ITS) sequencing of fungal species. A preliminary analysis reveals the presence of a large diversity of wild yeast species in commercial brews. With this project, we demonstrate that coupling simple laboratory procedures that can be carried out in a non-professional environment with state-of-the-art sequencing technologies and targeted metagenomic analyses, can lead to the detection and identification of the microbial content in bottled beer.
BeerDeCoded: the open beer metagenome project
Sobel, Jonathan; Henry, Luc; Rotman, Nicolas; Rando, Gianpaolo
2017-01-01
Next generation sequencing has radically changed research in the life sciences, in both academic and corporate laboratories. The potential impact is tremendous, yet a majority of citizens have little or no understanding of the technological and ethical aspects of this widespread adoption. We designed BeerDeCoded as a pretext to discuss the societal issues related to genomic and metagenomic data with fellow citizens, while advancing scientific knowledge of the most popular beverage of all. In the spirit of citizen science, sample collection and DNA extraction were carried out with the participation of non-scientists in the community laboratory of Hackuarium, a not-for-profit organisation that supports unconventional research and promotes the public understanding of science. The dataset presented herein contains the targeted metagenomic profile of 39 bottled beers from 5 countries, based on internal transcribed spacer (ITS) sequencing of fungal species. A preliminary analysis reveals the presence of a large diversity of wild yeast species in commercial brews. With this project, we demonstrate that coupling simple laboratory procedures that can be carried out in a non-professional environment with state-of-the-art sequencing technologies and targeted metagenomic analyses, can lead to the detection and identification of the microbial content in bottled beer. PMID:29123645
Tan, Swee Jin; Phan, Huan; Gerry, Benjamin Michael; Kuhn, Alexandre; Hong, Lewis Zuocheng; Min Ong, Yao; Poon, Polly Suk Yean; Unger, Marc Alexander; Jones, Robert C; Quake, Stephen R; Burkholder, William F
2013-01-01
Library preparation for next-generation DNA sequencing (NGS) remains a key bottleneck in the sequencing process which can be relieved through improved automation and miniaturization. We describe a microfluidic device for automating laboratory protocols that require one or more column chromatography steps and demonstrate its utility for preparing Next Generation sequencing libraries for the Illumina and Ion Torrent platforms. Sixteen different libraries can be generated simultaneously with significantly reduced reagent cost and hands-on time compared to manual library preparation. Using an appropriate column matrix and buffers, size selection can be performed on-chip following end-repair, dA tailing, and linker ligation, so that the libraries eluted from the chip are ready for sequencing. The core architecture of the device ensures uniform, reproducible column packing without user supervision and accommodates multiple routine protocol steps in any sequence, such as reagent mixing and incubation; column packing, loading, washing, elution, and regeneration; capture of eluted material for use as a substrate in a later step of the protocol; and removal of one column matrix so that two or more column matrices with different functional properties can be used in the same protocol. The microfluidic device is mounted on a plastic carrier so that reagents and products can be aliquoted and recovered using standard pipettors and liquid handling robots. The carrier-mounted device is operated using a benchtop controller that seals and operates the device with programmable temperature control, eliminating any requirement for the user to manually attach tubing or connectors. In addition to NGS library preparation, the device and controller are suitable for automating other time-consuming and error-prone laboratory protocols requiring column chromatography steps, such as chromatin immunoprecipitation.
Davey, Sue; Navarrete, Cristina; Brown, Colin
2017-06-01
Twenty-nine human platelet antigen systems have been described to date, but the majority of current genotyping methods are restricted to the identification of those most commonly associated with alloantibody production in a clinical context. This can result in a protracted investigation if causative human platelet antigens are rare or novel. A targeted next-generation sequencing approach was designed to detect all known human platelet antigens with the additional capability of identifying novel mutations in the encoding genes. A targeted enrichment, high-sensitivity HaloPlex assay was designed to sequence all exons and flanking regions of the six genes known to encode human platelet antigens. Indexed DNA libraries were prepared from 47 previously human platelet antigen-genotyped samples and subsequently combined into one of three pools for sequencing on an Illumina MiSeq platform. The generated FASTQ files were aligned and scrutinized for each human platelet antigen polymorphism using SureCall data analysis software. Forty-six samples were successfully genotyped for human platelet antigens 1 through 29bw, with an average per base coverage depth of 1144. Concordance with historical human platelet antigen genotypes was 100%. A putative novel mutation in Exon 10 of the integrin β-3 (ITGB3) gene from an unsolved case of fetal neonatal alloimmune thrombocytopenia was also detected. A next-generation sequencing-based method that can accurately define all known human platelet antigen polymorphisms was developed. With the ability to sequence up to 96 samples simultaneously, our HaloPlex design could be used for high-throughput human platelet antigen genotyping. This method is also applicable for investigating fetal neonatal alloimmune thrombocytopenia when rare or novel human platelet antigens are suspected. © 2017 AABB.
Tan, Swee Jin; Phan, Huan; Gerry, Benjamin Michael; Kuhn, Alexandre; Hong, Lewis Zuocheng; Min Ong, Yao; Poon, Polly Suk Yean; Unger, Marc Alexander; Jones, Robert C.; Quake, Stephen R.; Burkholder, William F.
2013-01-01
Library preparation for next-generation DNA sequencing (NGS) remains a key bottleneck in the sequencing process which can be relieved through improved automation and miniaturization. We describe a microfluidic device for automating laboratory protocols that require one or more column chromatography steps and demonstrate its utility for preparing Next Generation sequencing libraries for the Illumina and Ion Torrent platforms. Sixteen different libraries can be generated simultaneously with significantly reduced reagent cost and hands-on time compared to manual library preparation. Using an appropriate column matrix and buffers, size selection can be performed on-chip following end-repair, dA tailing, and linker ligation, so that the libraries eluted from the chip are ready for sequencing. The core architecture of the device ensures uniform, reproducible column packing without user supervision and accommodates multiple routine protocol steps in any sequence, such as reagent mixing and incubation; column packing, loading, washing, elution, and regeneration; capture of eluted material for use as a substrate in a later step of the protocol; and removal of one column matrix so that two or more column matrices with different functional properties can be used in the same protocol. The microfluidic device is mounted on a plastic carrier so that reagents and products can be aliquoted and recovered using standard pipettors and liquid handling robots. The carrier-mounted device is operated using a benchtop controller that seals and operates the device with programmable temperature control, eliminating any requirement for the user to manually attach tubing or connectors. In addition to NGS library preparation, the device and controller are suitable for automating other time-consuming and error-prone laboratory protocols requiring column chromatography steps, such as chromatin immunoprecipitation. PMID:23894273
BMPOS: a Flexible and User-Friendly Tool Sets for Microbiome Studies.
Pylro, Victor S; Morais, Daniel K; de Oliveira, Francislon S; Dos Santos, Fausto G; Lemos, Leandro N; Oliveira, Guilherme; Roesch, Luiz F W
2016-08-01
Recent advances in science and technology are leading to a revision and re-orientation of methodologies, addressing old and current issues under a new perspective. Advances in next generation sequencing (NGS) are allowing comparative analysis of the abundance and diversity of whole microbial communities, generating a large amount of data and findings at a systems level. The current limitation for biologists has been the increasing demand for computational power and training required for processing of NGS data. Here, we describe the deployment of the Brazilian Microbiome Project Operating System (BMPOS), a flexible and user-friendly Linux distribution dedicated to microbiome studies. The Brazilian Microbiome Project (BMP) has developed data analyses pipelines for metagenomic studies (phylogenetic marker genes), conducted using the two main high-throughput sequencing platforms (Ion Torrent and Illumina MiSeq). The BMPOS is freely available and possesses the entire requirement of bioinformatics packages and databases to perform all the pipelines suggested by the BMP team. The BMPOS may be used as a bootable live USB stick or installed in any computer with at least 1 GHz CPU and 512 MB RAM, independent of the operating system previously installed. The BMPOS has proved to be effective for sequences processing, sequences clustering, alignment, taxonomic annotation, statistical analysis, and plotting of metagenomic data. The BMPOS has been used during several metagenomic analyses courses, being valuable as a tool for training, and an excellent starting point to anyone interested in performing metagenomic studies. The BMPOS and its documentation are available at http://www.brmicrobiome.org .
NASA's Evolutionary Xenon Thruster: The NEXT Ion Propulsion System for Solar System Exploration
NASA Technical Reports Server (NTRS)
Pencil, Eric J.; Benson, Scott W.
2008-01-01
This viewgraph presentation reviews NASA s Evolutionary Xenon Thruster (NEXT) Ion Propulsion system. The NEXT project is developing a solar electric ion propulsion system. The NEXT project is advancing the capability of ion propulsion to meet NASA robotic science mission needs. The NEXT system is planned to significantly improve performance over the state of the art electric propulsion systems, such as NASA Solar Electric Propulsion Technology Application Readiness (NSTAR). The status of NEXT development is reviewed, including information on the NEXT Thruster, the power processing unit, the propellant management system (PMS), the digital control interface unit, and the gimbal. Block diagrams NEXT system are presented. Also a review of the lessons learned from the Dawn and NSTAR systems is provided. In summary the NEXT project activities through 2007 have brought next-generation ion propulsion technology to a sufficient maturity level.
2012-01-01
Background Brassica oleracea encompass a family of vegetables and cabbage that are among the most widely cultivated crops. In 2009, the B. oleracea Genome Sequencing Project was launched using next generation sequencing technology. None of the available maps were detailed enough to anchor the sequence scaffolds for the Genome Sequencing Project. This report describes the development of a large number of SSR and SNP markers from the whole genome shotgun sequence data of B. oleracea, and the construction of a high-density genetic linkage map using a double haploid mapping population. Results The B. oleracea high-density genetic linkage map that was constructed includes 1,227 markers in nine linkage groups spanning a total of 1197.9 cM with an average of 0.98 cM between adjacent loci. There were 602 SSR markers and 625 SNP markers on the map. The chromosome with the highest number of markers (186) was C03, and the chromosome with smallest number of markers (99) was C09. Conclusions This first high-density map allowed the assembled scaffolds to be anchored to pseudochromosomes. The map also provides useful information for positional cloning, molecular breeding, and integration of information of genes and traits in B. oleracea. All the markers on the map will be transferable and could be used for the construction of other genetic maps. PMID:23033896
Li, Cheng-Wei; Chen, Bor-Sen
2016-01-01
Epigenetic and microRNA (miRNA) regulation are associated with carcinogenesis and the development of cancer. By using the available omics data, including those from next-generation sequencing (NGS), genome-wide methylation profiling, candidate integrated genetic and epigenetic network (IGEN) analysis, and drug response genome-wide microarray analysis, we constructed an IGEN system based on three coupling regression models that characterize protein-protein interaction networks (PPINs), gene regulatory networks (GRNs), miRNA regulatory networks (MRNs), and epigenetic regulatory networks (ERNs). By applying system identification method and principal genome-wide network projection (PGNP) to IGEN analysis, we identified the core network biomarkers to investigate bladder carcinogenic mechanisms and design multiple drug combinations for treating bladder cancer with minimal side-effects. The progression of DNA repair and cell proliferation in stage 1 bladder cancer ultimately results not only in the derepression of miR-200a and miR-200b but also in the regulation of the TNF pathway to metastasis-related genes or proteins, cell proliferation, and DNA repair in stage 4 bladder cancer. We designed a multiple drug combination comprising gefitinib, estradiol, yohimbine, and fulvestrant for treating stage 1 bladder cancer with minimal side-effects, and another multiple drug combination comprising gefitinib, estradiol, chlorpromazine, and LY294002 for treating stage 4 bladder cancer with minimal side-effects.
Systems biology of cancer biomarker detection.
Mitra, Sanga; Das, Smarajit; Chakrabarti, Jayprokas
2013-01-01
Cancer systems-biology is an ever-growing area of research due to explosion of data; how to mine these data and extract useful information is the problem. To have an insight on carcinogenesis one need to systematically mine several resources, such as databases, microarray and next-generation sequences. This review encompasses management and analysis of cancer data, databases construction and data deposition, whole transcriptome and genome comparison, analysing results from high throughput experiments to uncover cellular pathways and molecular interactions, and the design of effective algorithms to identify potential biomarkers. Recent technical advances such as ChIP-on-chip, ChIP-seq and RNA-seq can be applied to get epigenetic information transformed into a high-throughput endeavour to which systems biology and bioinformatics are making significant inroads. The data from ENCODE and GENCODE projects available through UCSC genome browser can be considered as benchmark for comparison and meta-analysis. A pipeline for integrating next generation sequencing data, microarray data, and putting them together with the existing database is discussed. The understanding of cancer genomics is changing the way we approach cancer diagnosis and treatment. To give a better understanding of utilizing available resources' we have chosen oral cancer to show how and what kind of analysis can be done. This review is a computational genomic primer that provides a bird's eye view of computational and bioinformatics' tools currently available to perform integrated genomic and system biology analyses of several carcinoma.
Pathway analysis with next-generation sequencing data.
Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric; Xiong, Momiao
2015-04-01
Although pathway analysis methods have been developed and successfully applied to association studies of common variants, the statistical methods for pathway-based association analysis of rare variants have not been well developed. Many investigators observed highly inflated false-positive rates and low power in pathway-based tests of association of rare variants. The inflated false-positive rates and low true-positive rates of the current methods are mainly due to their lack of ability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic that is based on the smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The developed statistic has the ability to capture position-level variant information and account for gametic phase disequilibrium. By intensive simulations, we demonstrate that the SFPCA-based statistic for testing pathway association with either rare or common or both rare and common variants has the correct type 1 error rates. Also the power of the SFPCA-based statistic and 22 additional existing statistics are evaluated. We found that the SFPCA-based statistic has a much higher power than other existing statistics in all the scenarios considered. To further evaluate its performance, the SFPCA-based statistic is applied to pathway analysis of exome sequencing data in the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after the Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic has much smaller P-values to identify pathway association than other existing methods.
Genome sequences of nine vesicular stomatitis virus isolates from South America
USDA-ARS?s Scientific Manuscript database
We report nine full-genome sequences of vesicular stomatitis virus obtrained by Illumina next-generation sequencing of RNA, isolated from either cattle epithelial suspensions or cell culture supernatants. Seven of these viral genomes belonged to the New Jersey serotype/species, clade III, while two...
Next-generation sequencing provides unprecedented access to genomic information in archival FFPE tissue samples. However, costs and technical challenges related to RNA isolation and enrichment limit use of whole-genome RNA-sequencing for large-scale studies of FFPE specimens. Rec...
Sequencing the Genome of the Heirloom Watermelon Cultivar Charleston Gray
USDA-ARS?s Scientific Manuscript database
The genome of the watermelon cultivar Charleston Gray, a major heirloom which has been used in breeding programs of many watermelon cultivars, was sequenced. Our strategy involved a hybrid approach using the Illumina and 454/Titanium next-generation sequencing technologies. For Illumina, shotgun g...
USDA-ARS?s Scientific Manuscript database
The mitochondrial genome of the bollworm, Helicoverpa zea, was assembled using paired-end nucleotide sequence reads generated with a next-generation sequencing platform. Assembly resulted in a mitogenome of 15,348 bp with greater than 17,000-fold average coverage. Organization of the H. zea mitogen...
DOT National Transportation Integrated Search
2009-01-01
Can a self-calibrating signal control system lead to wider adoption of adaptive traffic control systems? The focus of Next Generation of Smart Traffic Signals, an Exploratory Advanced Research (EAR) Program project, is a system that-with lit...
Quality of Service for Real-Time Applications Over Next Generation Data Networks
NASA Technical Reports Server (NTRS)
Ivancic, William; Atiquzzaman, Mohammed; Bai, Haowei; Su, Hongjun; Jain, Raj; Duresi, Arjan; Goyal, Mukyl; Bharani, Venkata; Liu, Chunlei; Kota, Sastri
2001-01-01
This project, which started on January 1, 2000, was funded by NASA Glenn Research Center for duration of one year. The deliverables of the project included the following tasks: Study of QoS mapping between the edge and core networks envisioned in the Next Generation networks will provide us with the QoS guarantees that can be obtained from next generation networks. Buffer management techniques to provide strict guarantees to real-time end-to-end applications through preferential treatment to packets belonging to real-time applications. In particular, use of ECN to help reduce the loss on high bandwidth-delay product satellite networks needs to be studied. Effect of Prioritized Packet Discard to increase goodput of the network and reduce the buffering requirements in the ATM switches. Provision of new IP circuit emulation services over Satellite IP backbones using MPLS will be studied. Determine the architecture and requirements for internetworking ATN and the Next Generation Internet for real-time applications.
O'Brien, Heath E; Gong, Yunchen; Fung, Pauline; Wang, Pauline W; Guttman, David S
2011-01-01
Next-generation genomic technology has both greatly accelerated the pace of genome research as well as increased our reliance on draft genome sequences. While groups such as the Genomics Standards Consortium have made strong efforts to promote genome standards there is a still a general lack of uniformity among published draft genomes, leading to challenges for downstream comparative analyses. This lack of uniformity is a particular problem when using standard draft genomes that frequently have large numbers of low-quality sequencing tracts. Here we present a proposal for an "enhanced-quality draft" genome that identifies at least 95% of the coding sequences, thereby effectively providing a full accounting of the genic component of the genome. Enhanced-quality draft genomes are easily attainable through a combination of small- and large-insert next-generation, paired-end sequencing. We illustrate the generation of an enhanced-quality draft genome by re-sequencing the plant pathogenic bacterium Pseudomonas syringae pv. phaseolicola 1448A (Pph 1448A), which has a published, closed genome sequence of 5.93 Mbp. We use a combination of Illumina paired-end and mate-pair sequencing, and surprisingly find that de novo assemblies with 100x paired-end coverage and mate-pair sequencing with as low as low as 2-5x coverage are substantially better than assemblies based on higher coverage. The rapid and low-cost generation of large numbers of enhanced-quality draft genome sequences will be of particular value for microbial diagnostics and biosecurity, which rely on precise discrimination of potentially dangerous clones from closely related benign strains.
Seneca, Sara; Vancampenhout, Kim; Van Coster, Rudy; Smet, Joél; Lissens, Willy; Vanlander, Arnaud; De Paepe, Boel; Jonckheere, An; Stouffs, Katrien; De Meirleir, Linda
2015-01-01
Next-generation sequencing (NGS), an innovative sequencing technology that enables the successful analysis of numerous gene sequences in a massive parallel sequencing approach, has revolutionized the field of molecular biology. Although NGS was introduced in a rather recent past, the technology has already demonstrated its potential and effectiveness in many research projects, and is now on the verge of being introduced into the diagnostic setting of routine laboratories to delineate the molecular basis of genetic disease in undiagnosed patient samples. We tested a benchtop device on retrospective genomic DNA (gDNA) samples of controls and patients with a clinical suspicion of a mitochondrial DNA disorder. This Ion Torrent Personal Genome Machine platform is a high-throughput sequencer with a fast turnaround time and reasonable running costs. We challenged the chemistry and technology with the analysis and processing of a mutational spectrum composed of samples with single-nucleotide substitutions, indels (insertions and deletions) and large single or multiple deletions, occasionally in heteroplasmy. The output data were compared with previously obtained conventional dideoxy sequencing results and the mitochondrial revised Cambridge Reference Sequence (rCRS). We were able to identify the majority of all nucleotide alterations, but three false-negative results were also encountered in the data set. At the same time, the poor performance of the PGM instrument in regions associated with homopolymeric stretches generated many false-positive miscalls demanding additional manual curation of the data.
The FDA's Experience with Emerging Genomics Technologies-Past, Present, and Future.
Xu, Joshua; Thakkar, Shraddha; Gong, Binsheng; Tong, Weida
2016-07-01
The rapid advancement of emerging genomics technologies and their application for assessing safety and efficacy of FDA-regulated products require a high standard of reliability and robustness supporting regulatory decision-making in the FDA. To facilitate the regulatory application, the FDA implemented a novel data submission program, Voluntary Genomics Data Submission (VGDS), and also to engage the stakeholders. As part of the endeavor, for the past 10 years, the FDA has led an international consortium of regulatory agencies, academia, pharmaceutical companies, and genomics platform providers, which was named MicroArray Quality Control Consortium (MAQC), to address issues such as reproducibility, precision, specificity/sensitivity, and data interpretation. Three projects have been completed so far assessing these genomics technologies: gene expression microarrays, whole genome genotyping arrays, and whole transcriptome sequencing (i.e., RNA-seq). The resultant studies provide the basic parameters for fit-for-purpose application of these new data streams in regulatory environments, and the solutions have been made available to the public through peer-reviewed publications. The latest MAQC project is also called the SEquencing Quality Control (SEQC) project focused on next-generation sequencing. Using reference samples with built-in controls, SEQC studies have demonstrated that relative gene expression can be measured accurately and reliably across laboratories and RNA-seq platforms. Besides prediction performance comparable to microarrays in clinical settings and safety assessments, RNA-seq is shown to have better sensitivity for low expression and reveal novel transcriptomic features. Future effort of MAQC will be focused on quality control of whole genome sequencing and targeted sequencing.
The FDA’s Experience with Emerging Genomics Technologies—Past, Present, and Future
Xu, Joshua; Thakkar, Shraddha; Gong, Binsheng; Tong, Weida
2016-01-01
The rapid advancement of emerging genomics technologies and their application for assessing safety and efficacy of FDA-regulated products require a high standard of reliability and robustness supporting regulatory decision-making in the FDA. To facilitate the regulatory application, the FDA implemented a novel data submission program, Voluntary Genomics Data Submission (VGDS), and also to engage the stakeholders. As part of the endeavor, for the past 10 years, the FDA has led an international consortium of regulatory agencies, academia, pharmaceutical companies, and genomics platform providers, which was named MicroArray Quality Control Consortium (MAQC), to address issues such as reproducibility, precision, specificity/sensitivity, and data interpretation. Three projects have been completed so far assessing these genomics technologies: gene expression microarrays, whole genome genotyping arrays, and whole transcriptome sequencing (i.e., RNA-seq). The resultant studies provide the basic parameters for fit-for-purpose application of these new data streams in regulatory environments, and the solutions have been made available to the public through peer-reviewed publications. The latest MAQC project is also called the SEquencing Quality Control (SEQC) project focused on next-generation sequencing. Using reference samples with built-in controls, SEQC studies have demonstrated that relative gene expression can be measured accurately and reliably across laboratories and RNA-seq platforms. Besides prediction performance comparable to microarrays in clinical settings and safety assessments, RNA-seq is shown to have better sensitivity for low expression and reveal novel transcriptomic features. Future effort of MAQC will be focused on quality control of whole genome sequencing and targeted sequencing. PMID:27116022
USDA-ARS?s Scientific Manuscript database
Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...
Mitsui, Jun; Fukuda, Yoko; Azuma, Kyo; Tozaki, Hirokazu; Ishiura, Hiroyuki; Takahashi, Yuji; Goto, Jun; Tsuji, Shoji
2010-07-01
We have recently found that multiple rare variants of the glucocerebrosidase gene (GBA) confer a robust risk for Parkinson disease, supporting the 'common disease-multiple rare variants' hypothesis. To develop an efficient method of identifying rare variants in a large number of samples, we applied multiplexed resequencing using a next-generation sequencer to identification of rare variants of GBA. Sixteen sets of pooled DNAs from six pooled DNA samples were prepared. Each set of pooled DNAs was subjected to polymerase chain reaction to amplify the target gene (GBA) covering 6.5 kb, pooled into one tube with barcode indexing, and then subjected to extensive sequence analysis using the SOLiD System. Individual samples were also subjected to direct nucleotide sequence analysis. With the optimization of data processing, we were able to extract all the variants from 96 samples with acceptable rates of false-positive single-nucleotide variants.
Kim, Won-Keun; No, Jin Sun; Lee, Seung-Ho; Song, Dong Hyun; Lee, Daesang; Kim, Jeong-Ah; Gu, Se Hun; Park, Sunhye; Jeong, Seong Tae; Kim, Heung-Chul; Klein, Terry A; Wiley, Michael R; Palacios, Gustavo; Song, Jin-Won
2018-02-01
Seoul virus (SEOV) poses a worldwide public health threat. This virus, which is harbored by Rattus norvegicus and R. rattus rats, is the causative agent of hemorrhagic fever with renal syndrome (HFRS) in humans, which has been reported in Asia, Europe, the Americas, and Africa. Defining SEOV genome sequences plays a critical role in development of preventive and therapeutic strategies against the unique worldwide hantavirus. We applied multiplex PCR-based next-generation sequencing to obtain SEOV genome sequences from clinical and reservoir host specimens. Epidemiologic surveillance of R. norvegicus rats in South Korea during 2000-2016 demonstrated that the serologic prevalence of enzootic SEOV infections was not significant on the basis of sex, weight (age), and season. Viral loads of SEOV in rats showed wide dissemination in tissues and dynamic circulation among populations. Phylogenetic analyses showed the global diversity of SEOV and possible genomic configuration of genetic exchanges.
Bijwaard, Karen; Dickey, Jennifer S; Kelm, Kellie; Težak, Živana
2015-01-01
The rapid emergence and clinical translation of novel high-throughput sequencing technologies created a need to clarify the regulatory pathway for the evaluation and authorization of these unique technologies. Recently, the US FDA authorized for marketing four next generation sequencing (NGS)-based diagnostic devices which consisted of two heritable disease-specific assays, library preparation reagents and a NGS platform that are intended for human germline targeted sequencing from whole blood. These first authorizations can serve as a case study in how different types of NGS-based technology are reviewed by the FDA. In this manuscript we describe challenges associated with the evaluation of these novel technologies and provide an overview of what was reviewed. Besides making validated NGS-based devices available for in vitro diagnostic use, these first authorizations create a regulatory path for similar future instruments and assays.
Shen, Kang-Ning; Chen, Ching-Hung; Hsiao, Chung-Der; Durand, Jean-Dominique
2016-09-01
In this study, the complete mitogenome sequence of a cryptic species from East Australia (Mugil sp. H) belonging to the worldwide Mugil cephalus species complex (Teleostei: Mugilidae) has been sequenced by next-generation sequencing method. The assembled mitogenome, consisting of 16,845 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNAs, 2 ribosomal RNAs genes and a non-coding control region of D-loop. D-loop consists of 1067 bp length, and is located between tRNA-Pro and tRNA-Phe. The overall base composition of East Australia M. cephalus is 28.4% for A, 29.3% for C, 15.4% for G and 26.9% for T. The complete mitogenome may provide essential and important DNA molecular data for further phylogenetic and evolutionary analysis for flathead mullet species complex.
Shen, Kang-Ning; Yen, Ta-Chi; Chen, Ching-Hung; Li, Huei-Ying; Chen, Pei-Lung; Hsiao, Chung-Der
2016-05-01
In this study, the complete mitogenome sequence of Northwestern Pacific 2 (NWP2) cryptic species of flathead mullet, Mugil cephalus (Teleostei: Mugilidae) has been amplified by long-range PCR and sequenced by next-generation sequencing method. The assembled mitogenome, consisting of 16,686 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNAs, 2 ribosomal RNAs genes and a non-coding control region of D-loop. D-loop was 909 bp length and was located between tRNA-Pro and tRNA-Phe. The overall base composition of NWP2 M. cephalus was 28.4% for A, 29.8% for C, 26.5% for T and 15.3% for G. The complete mitogenome may provide essential and important DNA molecular data for further phylogenetic and evolutionary analysis for flathead mullet species complex.
NABIC: A New Access Portal to Search, Visualize, and Share Agricultural Genomics Data
Seol, Young-Joo; Lee, Tae-Ho; Park, Dong-Suk; Kim, Chang-Kug
2016-01-01
The National Agricultural Biotechnology Information Center developed an access portal to search, visualize, and share agricultural genomics data with a focus on South Korean information and resources. The portal features an agricultural biotechnology database containing a wide range of omics data from public and proprietary sources. We collected 28.4 TB of data from 162 agricultural organisms, with 10 types of omics data comprising next-generation sequencing sequence read archive, genome, gene, nucleotide, DNA chip, expressed sequence tag, interactome, protein structure, molecular marker, and single-nucleotide polymorphism datasets. Our genomic resources contain information on five animals, seven plants, and one fungus, which is accessed through a genome browser. We also developed a data submission and analysis system as a web service, with easy-to-use functions and cutting-edge algorithms, including those for handling next-generation sequencing data. PMID:26848255
Zhao, Min; Wang, Qingguo; Wang, Quan; Jia, Peilin; Zhao, Zhongming
2013-01-01
Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) or genotyping arrays have been standard technologies to detect large regions subject to copy number changes in genomes until most recently high-resolution sequence data can be analyzed by next-generation sequencing (NGS). During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development.
2013-01-01
Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) or genotyping arrays have been standard technologies to detect large regions subject to copy number changes in genomes until most recently high-resolution sequence data can be analyzed by next-generation sequencing (NGS). During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development. PMID:24564169
De Bellis, Fabien; Malapa, Roger; Kagy, Valérie; Lebegin, Stéphane; Billot, Claire; Labouisse, Jean-Pierre
2016-08-01
Using next-generation sequencing technology, new microsatellite loci were characterized in Artocarpus altilis (Moraceae) and two congeners to increase the number of available markers for genotyping breadfruit cultivars. A total of 47,607 simple sequence repeat loci were obtained by sequencing a library of breadfruit genomic DNA with an Illumina MiSeq system. Among them, 50 single-locus markers were selected and assessed using 41 samples (39 A. altilis, one A. camansi, and one A. heterophyllus). All loci were polymorphic in A. altilis, 44 in A. camansi, and 21 in A. heterophyllus. The number of alleles per locus ranged from two to 19. The new markers will be useful for assessing the identity and genetic diversity of breadfruit cultivars on a small geographical scale, gaining a better understanding of farmer management practices, and will help to optimize breadfruit genebank management.
Rapid and Easy Protocol for Quantification of Next-Generation Sequencing Libraries.
Hawkins, Steve F C; Guest, Paul C
2018-01-01
The emergence of next-generation sequencing (NGS) over the last 10 years has increased the efficiency of DNA sequencing in terms of speed, ease, and price. However, the exact quantification of a NGS library is crucial in order to obtain good data on sequencing platforms developed by the current market leader Illumina. Different approaches for DNA quantification are available currently and the most commonly used are based on analysis of the physical properties of the DNA through spectrophotometric or fluorometric methods. Although these methods are technically simple, they do not allow exact quantification as can be achieved using a real-time quantitative PCR (qPCR) approach. A qPCR protocol for DNA quantification with applications in NGS library preparation studies is presented here. This can be applied in various fields of study such as medical disorders resulting from nutritional programming disturbances.
Bodini, Margherita; Ronchini, Chiara; Giacò, Luciano; Russo, Anna; Melloni, Giorgio E. M.; Luzi, Lucilla; Sardella, Domenico; Volorio, Sara; Hasan, Syed K.; Ottone, Tiziana; Lavorgna, Serena; Lo-Coco, Francesco; Candoni, Anna; Fanin, Renato; Toffoletti, Eleonora; Iacobucci, Ilaria; Martinelli, Giovanni; Cignetti, Alessandro; Tarella, Corrado; Bernard, Loris; Pelicci, Pier Giuseppe
2015-01-01
The analyses carried out using 2 different bioinformatics pipelines (SomaticSniper and MuTect) on the same set of genomic data from 133 acute myeloid leukemia (AML) patients, sequenced inside the Cancer Genome Atlas project, gave discrepant results. We subsequently tested these 2 variant-calling pipelines on 20 leukemia samples from our series (19 primary AMLs and 1 secondary AML). By validating many of the predicted somatic variants (variant allele frequencies ranging from 100% to 5%), we observed significantly different calling efficiencies. In particular, despite relatively high specificity, sensitivity was poor in both pipelines resulting in a high rate of false negatives. Our findings raise the possibility that landscapes of AML genomes might be more complex than previously reported and characterized by the presence of hundreds of genes mutated at low variant allele frequency, suggesting that the application of genome sequencing to the clinic requires a careful and critical evaluation. We think that improvements in technology and workflow standardization, through the generation of clear experimental and bioinformatics guidelines, are fundamental to translate the use of next-generation sequencing from research to the clinic and to transform genomic information into better diagnosis and outcomes for the patient. PMID:25499761
Mack, Steven J.; Milius, Robert P.; Gifford, Benjamin D.; Sauter, Jürgen; Hofmann, Jan; Osoegawa, Kazutoyo; Robinson, James; Groeneweg, Mathijs; Turenchalk, Gregory S.; Adai, Alex; Holcomb, Cherie; Rozemuller, Erik H.; Penning, Maarten T.; Heuer, Michael L.; Wang, Chunlin; Salit, Marc L.; Schmidt, Alexander H.; Parham, Peter R.; Müller, Carlheinz; Hague, Tim; Fischer, Gottfried; Fernandez-Viňa, Marcelo; Hollenbach, Jill A; Norman, Paul J.; Maiers, Martin
2015-01-01
The development of next-generation sequencing (NGS) technologies for HLA and KIR genotyping is rapidly advancing knowledge of genetic variation of these highly polymorphic loci. NGS genotyping is poised to replace older methods for clinical use, but standard methods for reporting and exchanging these new, high quality genotype data are needed. The Immunogenomic NGS Consortium, a broad collaboration of histocompatibility and immunogenetics clinicians, researchers, instrument manufacturers and software developers, has developed the Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines. MIRING is a checklist that specifies the content of NGS genotyping results as well as a set of messaging guidelines for reporting the results. A MIRING message includes five categories of structured information – message annotation, reference context, full genotype, consensus sequence and novel polymorphism – and references to three categories of accessory information – NGS platform documentation, read processing documentation and primary data. These eight categories of information ensure the long-term portability and broad application of this NGS data for all current histocompatibility and immunogenetics use cases. In addition, MIRING can be extended to allow the reporting of genotype data generated using pre-NGS technologies. Because genotyping results reported using MIRING are easily updated in accordance with reference and nomenclature databases, MIRING represents a bold departure from previous methods of reporting HLA and KIR genotyping results, which have provided static and less-portable data. More information about MIRING can be found online at miring.immunogenomics.org. PMID:26407912
ViennaNGS: A toolbox for building efficient next- generation sequencing analysis pipelines
Wolfinger, Michael T.; Fallmann, Jörg; Eggenhofer, Florian; Amman, Fabian
2015-01-01
Recent achievements in next-generation sequencing (NGS) technologies lead to a high demand for reuseable software components to easily compile customized analysis workflows for big genomics data. We present ViennaNGS, an integrated collection of Perl modules focused on building efficient pipelines for NGS data processing. It comes with functionality for extracting and converting features from common NGS file formats, computation and evaluation of read mapping statistics, as well as normalization of RNA abundance. Moreover, ViennaNGS provides software components for identification and characterization of splice junctions from RNA-seq data, parsing and condensing sequence motif data, automated construction of Assembly and Track Hubs for the UCSC genome browser, as well as wrapper routines for a set of commonly used NGS command line tools. PMID:26236465
BioPig: Developing Cloud Computing Applications for Next-Generation Sequence Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhatia, Karan; Wang, Zhong
Next Generation sequencing is producing ever larger data sizes with a growth rate outpacing Moore's Law. The data deluge has made many of the current sequenceanalysis tools obsolete because they do not scale with data. Here we present BioPig, a collection of cloud computing tools to scale data analysis and management. Pig is aflexible data scripting language that uses Apache's Hadoop data structure and map reduce framework to process very large data files in parallel and combine the results.BioPig extends Pig with capability with sequence analysis. We will show the performance of BioPig on a variety of bioinformatics tasks, includingmore » screeningsequence contaminants, Illumina QA/QC, and gene discovery from metagenome data sets using the Rumen metagenome as an example.« less
Next-Generation Sequencing of Coccidioides immitis Isolated during Cluster Investigation
Engelthaler, David M.; Chiller, Tom; Schupp, James A.; Colvin, Joshua; Beckstrom-Sternberg, Stephen M.; Driebe, Elizabeth M.; Moses, Tracy; Tembe, Waibhav; Sinari, Shripad; Beckstrom-Sternberg, James S.; Christoforides, Alexis; Pearson, John V.; Carpten, John; Keim, Paul; Peterson, Ashley; Terashita, Dawn
2011-01-01
Next-generation sequencing enables use of whole-genome sequence typing (WGST) as a viable and discriminatory tool for genotyping and molecular epidemiologic analysis. We used WGST to confirm the linkage of a cluster of Coccidioides immitis isolates from 3 patients who received organ transplants from a single donor who later had positive test results for coccidioidomycosis. Isolates from the 3 patients were nearly genetically identical (a total of 3 single-nucleotide polymorphisms identified among them), thereby demonstrating direct descent of the 3 isolates from an original isolate. We used WGST to demonstrate the genotypic relatedness of C. immitis isolates that were also epidemiologically linked. Thus, WGST offers unique benefits to public health for investigation of clusters considered to be linked to a single source. PMID:21291593
Bowerman, Bruce
2011-10-01
Molecular genetic investigation of the early Caenorhabditis elegans embryo has contributed substantially to the discovery and general understanding of the genes, pathways, and mechanisms that regulate and execute developmental and cell biological processes. Initially, worm geneticists relied exclusively on a classical genetics approach, isolating mutants with interesting phenotypes after mutagenesis and then determining the identity of the affected genes. Subsequently, the discovery of RNA interference (RNAi) led to a much greater reliance on a reverse genetics approach: reducing the function of known genes with RNAi and then observing the phenotypic consequences. Now the advent of next-generation DNA sequencing technologies and the ensuing ease and affordability of whole-genome sequencing are reviving the use of classical genetics to investigate early C. elegans embryogenesis.
Identifying molecular drivers of gastric cancer through next-generation sequencing.
Liang, Han; Kim, Yon Hui
2013-11-01
Gastric cancer is the second most common cause of cancer-related death in the world, representing a major global health issue. The high mortality rate is largely due to the lack of effective medical treatment for advanced stages of this disease. Recently next-generation sequencing (NGS) technology has become a revolutionary tool for cancer research, and several NGS studies in gastric cancer have been published. Here we review the insights gained from these studies regarding how use NGS to elucidate the molecular basis of gastric cancer and identify potential therapeutic targets. We also discuss the challenges and future directions of such efforts. Published by Elsevier Ireland Ltd.
Targeted therapy according to next generation sequencing-based panel sequencing.
Saito, Motonobu; Momma, Tomoyuki; Kono, Koji
2018-04-17
Targeted therapy against actionable gene mutations shows a significantly higher response rate as well as longer survival compared to conventional chemotherapy, and has become a standard therapy for many cancers. Recent progress in next-generation sequencing (NGS) has enabled to identify huge number of genetic aberrations. Based on sequencing results, patients recommend to undergo targeted therapy or immunotherapy. In cases where there are no available approved drugs for the genetic mutations detected in the patients, it is recommended to be facilitate the registration for the clinical trials. For that purpose, a NGS-based sequencing panel that can simultaneously target multiple genes in a single investigation has been used in daily clinical practice. To date, various types of sequencing panels have been developed to investigate genetic aberrations with tumor somatic genome variants (gain-of-function or loss-of-function mutations, high-level copy number alterations, and gene fusions) through comprehensive bioinformatics. Because sequencing panels are efficient and cost-effective, they are quickly being adopted outside the lab, in hospitals and clinics, in order to identify personal targeted therapy for individual cancer patients.
de la Fuente, Gabriel; Belanche, Alejandro; Girwood, Susan E.; Pinloche, Eric; Wilkinson, Toby; Newbold, C. Jamie
2014-01-01
The development of next generation sequencing has challenged the use of other molecular fingerprinting methods used to study microbial diversity. We analysed the bacterial diversity in the rumen of defaunated sheep following the introduction of different protozoal populations, using both next generation sequencing (NGS: Ion Torrent PGM) and terminal restriction fragment length polymorphism (T-RFLP). Although absolute number differed, there was a high correlation between NGS and T-RFLP in terms of richness and diversity with R values of 0.836 and 0.781 for richness and Shannon-Wiener index, respectively. Dendrograms for both datasets were also highly correlated (Mantel test = 0.742). Eighteen OTUs and ten genera were significantly impacted by the addition of rumen protozoa, with an increase in the relative abundance of Prevotella, Bacteroides and Ruminobacter, related to an increase in free ammonia levels in the rumen. Our findings suggest that classic fingerprinting methods are still valuable tools to study microbial diversity and structure in complex environments but that NGS techniques now provide cost effect alternatives that provide a far greater level of information on the individual members of the microbial population. PMID:25051490
Disclosure of Incidental Findings From Next-Generation Sequencing in Pediatric Genomic Research
Abdul-Karim, Ruqayyah; Berkman, Benjamin E.; Wendler, David; Rid, Annette; Khan, Javed; Badgett, Tom
2013-01-01
Next-generation sequencing technologies will likely be used with increasing frequency in pediatric research. One consequence will be the increased identification of individual genomic research findings that are incidental to the aims of the research. Although researchers and ethicists have raised theoretical concerns about incidental findings in the context of genetic research, next-generation sequencing will make this once largely hypothetical concern an increasing reality. Most commentators have begun to accept the notion that there is some duty to disclose individual genetic research results to research subjects; however, the scope of that duty remains unclear. These issues are especially complicated in the pediatric setting, where subjects cannot currently but typically will eventually be able to make their own medical decisions at the age of adulthood. This article discusses the management of incidental findings in the context of pediatric genomic research. We provide an overview of the current literature and propose a framework to manage incidental findings in this unique context, based on what we believe is a limited responsibility to disclose. We hope this will be a useful source of guidance for investigators, institutional review boards, and bioethicists that anticipates the complicated ethical issues raised by advances in genomic technology. PMID:23400601
Jung, Yeonjoo; Kim, Pora; Jung, Yeonhwa; Keum, Juhee; Kim, Soon-Nam; Choi, Yong Soo; Do, In-Gu; Lee, Jinseon; Choi, So-Jung; Kim, Sujin; Lee, Jong-Eun; Kim, Jhingook; Lee, Sanghyuk; Kim, Jaesang
2012-06-01
An increasing number of chromosomal aberrations is being identified in solid tumors providing novel biomarkers for various types of cancer and new insights into the mechanisms of carcinogenesis. We applied next generation sequencing technique to analyze the transcriptome of the non-small cell lung carcinoma (NSCLC) cell line H2228 and discovered a fusion transcript composed of multiple exons of ALK (anaplastic lymphoma receptor tyrosine kinase) and PTPN3 (protein tyrosine phosphatase, nonreceptor Type 3). Detailed analysis of the genomic structure revealed that a portion of genomic region encompassing Exons 10 and 11 of ALK has been translocated into the intronic region between Exons 2 and 3 of PTPN3. The key net result appears to be the null mutation of one allele of PTPN3, a gene with tumor suppressor activity. Consistently, ectopic expression of PTPN3 in NSCLC cell lines led to inhibition of colony formation. Our study confirms the utility of next generation sequencing as a tool for the discovery of somatic mutations and has led to the identification of a novel mutation in NSCLC that may be of diagnostic, prognostic, and therapeutic importance. Copyright © 2012 Wiley Periodicals, Inc.
USDA-ARS?s Scientific Manuscript database
Genetic diversity is an essential resource for breeders to improve new cultivars with desirable characteristics. Recently genotyping-by-sequencing (GBS), a next generation sequencing (NGS) based technology that can simplify complex genomes, has been used as a high-throughput and cost-effective molec...
USDA-ARS?s Scientific Manuscript database
The Indianmeal moth, Plodia interpunctella (Lepidoptera: Pyralidae), is a common pest of stored goods with a worldwide distribution. The complete genome sequence for a larval pathogen of this moth, the baculovirus Plodia interpunctella granulovirus (PiGV), was determined by next-generation sequenci...
High-Throughput resequencing of maize landraces at genomic regions associated with flowering time
USDA-ARS?s Scientific Manuscript database
Despite the reduction in the price of sequencing, it remains expensive to sequence and assemble whole, complex genomes of multiple samples for population studies, particularly for large genomes like those of many crop species. Enrichment of target genome regions coupled with next generation sequenci...
Open source tools to exploit DNA sequence data from livestock species
USDA-ARS?s Scientific Manuscript database
Next-Generation Sequencing (NGS) is a recent technological development that allows researchers to rapidly determine the DNA sequence of an individual. The decrease in cost of NGS has brought the technology into the realm of practical applications in livestock genomics, where it can be used to genera...
A Web-Hosted R Workflow to Simplify and Automate the Analysis of 16S NGS Data
Next-Generation Sequencing (NGS) produces large data sets that include tens-of-thousands of sequence reads per sample. For analysis of bacterial diversity, 16S NGS sequences are typically analyzed in a workflow that containing best-of-breed bioinformatics packages that may levera...
USDA-ARS?s Scientific Manuscript database
Next generation sequencing technologies and improved bioinformatics methods have provided opportunities to study sequence variability in complex polyploid transcriptomes. In this study, we used a diverse panel of twenty-two Arachis accessions representing seven Arachis hypogaea market classes, A-, B...
A Pan-HIV Strategy for Complete Genome Sequencing
Yamaguchi, Julie; Alessandri-Gradt, Elodie; Tell, Robert W.; Brennan, Catherine A.
2015-01-01
Molecular surveillance is essential to monitor HIV diversity and track emerging strains. We have developed a universal library preparation method (HIV-SMART [i.e., switching mechanism at 5′ end of RNA transcript]) for next-generation sequencing that harnesses the specificity of HIV-directed priming to enable full genome characterization of all HIV-1 groups (M, N, O, and P) and HIV-2. Broad application of the HIV-SMART approach was demonstrated using a panel of diverse cell-cultured virus isolates. HIV-1 non-subtype B-infected clinical specimens from Cameroon were then used to optimize the protocol to sequence directly from plasma. When multiplexing 8 or more libraries per MiSeq run, full genome coverage at a median ∼2,000× depth was routinely obtained for either sample type. The method reproducibly generated the same consensus sequence, consistently identified viral sequence heterogeneity present in specimens, and at viral loads of ≤4.5 log copies/ml yielded sufficient coverage to permit strain classification. HIV-SMART provides an unparalleled opportunity to identify diverse HIV strains in patient specimens and to determine phylogenetic classification based on the entire viral genome. Easily adapted to sequence any RNA virus, this technology illustrates the utility of next-generation sequencing (NGS) for viral characterization and surveillance. PMID:26699702
Systems and Software Producibility Collaboration and Experimental Environment (SPRUCE)
2009-04-23
Research Manhattan Project Like Research – Transition timeframe needed • Current generation programs – DoD acquisitions over next 1-5 years • Next...Specific Computing Plant B a s i c Transformational Research Manhattan Project Like Research B a s i c 16 • Sponsored by Lockheed Martin
Complete Genome Sequences of Two Vesicular Stomatitis Virus Isolates Collected in Mexico.
Velazquez-Salinas, Lauro; Isa, Pavel; Pauszek, Steven J; Rodriguez, Luis L
2017-09-14
We report two full-genome sequences of vesicular stomatitis New Jersey virus (VSNJV) obtained by Illumina next-generation sequencing of RNA isolated from epithelial suspensions of cattle naturally infected in Mexico. These genomes represent the first full-genome sequences of vesicular stomatitis New Jersey viruses circulating in Mexico deposited in the GenBank database.
Advancing Next-Generation Energy in Indian Country (Fact Sheet)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
This fact provides information on the Strategic Technical Assistance Response Team (START) Program, a U.S. Department of Energy Office of Indian Energy Policy and Programs (DOE-IE) initiative to provide technical expertise to support the development of next-generation energy projects in Indian Country.
Next-Gen 3: Sequencing, Modeling, and Advanced Biofuels - Final Technical Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zengler, Karsten; Palsson, Bernhard; Lewis, Nathan
Successful, scalable implementation of biofuels is dependent on the efficient and near complete utilization of diverse biomass sources. One approach is to utilize the large recalcitrant biomass fraction (or any organic waste stream) through the thermochemical conversion of organic compounds to syngas, a mixture of carbon monoxide (CO), carbon dioxide (CO 2), and hydrogen (H 2), which can subsequently be metabolized by acetogenic microorganisms to produce next-gen biofuels. The goal of this proposal was to advance the development of the acetogen Clostridium ljungdahlii as a chassis organism for next-gen biofuel production from cheap, renewable sources and to detail the interconnectivitymore » of metabolism, energy conservation, and regulation of acetogens using next-gen sequencing and next-gen modeling. To achieve this goal we determined optimization of carbon and energy utilization through differential translational efficiency in C. ljungdahlii. Furthermore, we reconstructed a next-generation model of all major cellular processes, such as macromolecular synthesis and transcriptional regulation and deployed this model to predicting proteome allocation, overflow metabolism, and metal requirements in this model acetogen. In addition we explored the evolutionary significance of tRNA operon structure using the next-gen model and determined the optimal operon structure for bioproduction. Our study substantially enhanced the knowledgebaase for chemolithoautotrophs and their potential for advanced biofuel production. It provides next-generation modeling capability, offer innovative tools for genome-scale engineering, and provide novel methods to utilize next-generation models for the design of tunable systems that produce commodity chemicals from inexpensive sources.« less
Implementing genomic medicine in pathology.
Williams, Eli S; Hegde, Madhuri
2013-07-01
The finished sequence of the Human Genome Project, published 50 years after Watson and Crick's seminal paper on the structure of DNA, pushed human genetics into the public eye and ushered in the genomic era. A significant, if overlooked, aspect of the race to complete the genome was the technology that propelled scientists to the finish line. DNA sequencing technologies have become more standardized, automated, and capable of higher throughput. This technology has continued to grow at an astounding rate in the decade since the Human Genome Project was completed. Today, massively parallel sequencing, or next-generation sequencing (NGS), allows the detection of genetic variants across the entire genome. This ability has led to the identification of new causes of disease and is changing the way we categorize, treat, and manage disease. NGS approaches such as whole-exome sequencing and whole-genome sequencing are rapidly becoming an affordable genetic testing strategy for the clinical laboratory. One test can now provide vast amounts of health information pertaining not only to the disease of interest, but information that may also predict adult-onset disease, reveal carrier status for a rare disease and predict drug responsiveness. The issue of what to do with these incidental findings, along with questions pertaining to NGS testing strategies, data interpretation and storage, and applying genetic testing results into patient care, remains without a clear answer. This review will explore these issues and others relevant to the implementation of NGS in the clinical laboratory.
Karan, M; Evans, D S; Reilly, D; Schulte, K; Wright, C; Innes, D; Holton, T A; Nikles, D G; Dickinson, G R
2012-03-01
Khaya senegalensis (African mahogany or dry-zone mahogany) is a high-value hardwood timber species with great potential for forest plantations in northern Australia. The species is distributed across the sub-Saharan belt from Senegal to Sudan and Uganda. Because of heavy exploitation and constraints on natural regeneration and sustainable planting, it is now classified as a vulnerable species. Here, we describe the development of microsatellite markers for K. senegalensis using next-generation sequencing to assess its intra-specific diversity across its natural range, which is a key for successful breeding programs and effective conservation management of the species. Next-generation sequencing yielded 93,943 sequences with an average read length of 234 bp. The assembled sequences contained 1030 simple sequence repeats, with primers designed for 522 microsatellite loci. Twenty-one microsatellite loci were tested with 11 showing reliable amplification and polymorphism in K. senegalensis. The 11 novel microsatellites, together with one previously published, were used to assess 73 accessions belonging to the Australian K. senegalensis domestication program, sampled from across the natural range of the species. STRUCTURE analysis shows two major clusters, one comprising mainly accessions from west Africa (Senegal to Benin) and the second based in the far eastern limits of the range in Sudan and Uganda. Higher levels of genetic diversity were found in material from western Africa. This suggests that new seed collections from this region may yield more diverse genotypes than those originating from Sudan and Uganda in eastern Africa. © 2011 Blackwell Publishing Ltd.
The Proteome Folding Project: Proteome-scale prediction of structure and function
Drew, Kevin; Winters, Patrick; Butterfoss, Glenn L.; Berstis, Viktors; Uplinger, Keith; Armstrong, Jonathan; Riffle, Michael; Schweighofer, Erik; Bovermann, Bill; Goodlett, David R.; Davis, Trisha N.; Shasha, Dennis; Malmström, Lars; Bonneau, Richard
2011-01-01
The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions. PMID:21824995
Detection of a novel herpesvirus from bats in the Philippines.
Sano, Kaori; Okazaki, Sachiko; Taniguchi, Satoshi; Masangkay, Joseph S; Puentespina, Roberto; Eres, Eduardo; Cosico, Edison; Quibod, Niña; Kondo, Taisuke; Shimoda, Hiroshi; Hatta, Yuuki; Mitomo, Shumpei; Oba, Mami; Katayama, Yukie; Sassa, Yukiko; Furuya, Tetsuya; Nagai, Makoto; Une, Yumi; Maeda, Ken; Kyuwa, Shigeru; Yoshikawa, Yasuhiro; Akashi, Hiroomi; Omatsu, Tsutomu; Mizutani, Tetsuya
2015-08-01
Bats are natural hosts of many zoonotic viruses. Monitoring bat viruses is important to detect novel bat-borne infectious diseases. In this study, next generation sequencing techniques and conventional PCR were used to analyze intestine, lung, and blood clot samples collected from wild bats captured at three locations in Davao region, in the Philippines in 2012. Different viral genes belonging to the Retroviridae and Herpesviridae families were identified using next generation sequencing. The existence of herpesvirus in the samples was confirmed by PCR using herpesvirus consensus primers. The nucleotide sequences of the resulting PCR amplicons were 166-bp. Further phylogenetic analysis identified that the virus from which this nucleotide sequence was obtained belonged to the Gammaherpesvirinae subfamily. PCR using primers specific to the nucleotide sequence obtained revealed that the infection rate among the captured bats was 30 %. In this study, we present the partial genome of a novel gammaherpesvirus detected from wild bats. Our observations also indicate that this herpesvirus may be widely distributed in bat populations in Davao region.
Guan, Yan-Fang; Li, Gai-Rui; Wang, Rong-Jiao; Yi, Yu-Ting; Yang, Ling; Jiang, Dan; Zhang, Xiao-Ping; Peng, Yin
2012-01-01
With the development and improvement of new sequencing technology, next-generation sequencing (NGS) has been applied increasingly in cancer genomics research over the past decade. More recently, NGS has been adopted in clinical oncology to advance personalized treatment of cancer. NGS is used to identify novel and rare cancer mutations, detect familial cancer mutation carriers, and provide molecular rationale for appropriate targeted therapy. Compared to traditional sequencing, NGS holds many advantages, such as the ability to fully sequence all types of mutations for a large number of genes (hundreds to thousands) in a single test at a relatively low cost. However, significant challenges, particularly with respect to the requirement for simpler assays, more flexible throughput, shorter turnaround time, and most importantly, easier data analysis and interpretation, will have to be overcome to translate NGS to the bedside of cancer patients. Overall, continuous dedication to apply NGS in clinical oncology practice will enable us to be one step closer to personalized medicine. PMID:22980418
Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun
2012-01-01
Next Generation Sequencing is highly resource intensive. NGS Tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource intensive nature of NGS secondary analysis built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network attached storage device expandable up to 40TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform provides demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.
Dasa, Siva Sai Krishna; Kelly, Kimberly A.
2016-01-01
Next-generation sequencing has enhanced the phage display process, allowing for the quantification of millions of sequences resulting from the biopanning process. In response, many valuable analysis programs focused on specificity and finding targeted motifs or consensus sequences were developed. For targeted drug delivery and molecular imaging, it is also necessary to find peptides that are selective—targeting only the cell type or tissue of interest. We present a new analysis strategy and accompanying software, PHage Analysis for Selective Targeted PEPtides (PHASTpep), which identifies highly specific and selective peptides. Using this process, we discovered and validated, both in vitro and in vivo in mice, two sequences (HTTIPKV and APPIMSV) targeted to pancreatic cancer-associated fibroblasts that escaped identification using previously existing software. Our selectivity analysis makes it possible to discover peptides that target a specific cell type and avoid other cell types, enhancing clinical translatability by circumventing complications with systemic use. PMID:27186887
Groisman, Iris Jaitovich; Mathieu, Ghislaine; Godard, Beatrice
2012-12-20
Next Generation Sequencing (NGS) is expected to help find the elusive, causative genetic defects associated with Bipolar Disorder (BD). This article identifies the importance of NGS and further analyses the social and ethical implications of this approach when used in research projects studying BD, as well as other psychiatric ailments, with a view to ensuring the protection of research participants. We performed a systematic review of studies through PubMed, followed by a manual search through the titles and abstracts of original articles, including the reviews, commentaries and letters published in the last five years and dealing with the ethical and social issues raised by NGS technologies and genomics studies of mental disorders, especially BD. A total of 217 studies contributed to identify the themes discussed herein. The amount of information generated by NGS renders individuals suffering from BD particularly vulnerable, and increases the need for educational support throughout the consent process, and, subsequently, of genetic counselling, when communicating individual research results and incidental findings to them. Our results highlight the importance and difficulty of respecting participants' autonomy while avoiding any therapeutic misconception. We also analysed the need for specific regulations on the use and communication of incidental findings, as well as the increasing influence of NGS in health care. Shared efforts on the part of researchers and their institutions, Research Ethics Boards as well as participants' representatives are needed to delineate a tailored consent process so as to better protect research participants. However, health care professionals involved in BD care and treatment need to first determine the scientific validity and clinical utility of NGS-generated findings, and thereafter their prevention and treatment significance.
2012-01-01
Background Next Generation Sequencing (NGS) is expected to help find the elusive, causative genetic defects associated with Bipolar Disorder (BD). This article identifies the importance of NGS and further analyses the social and ethical implications of this approach when used in research projects studying BD, as well as other psychiatric ailments, with a view to ensuring the protection of research participants. Methods We performed a systematic review of studies through PubMed, followed by a manual search through the titles and abstracts of original articles, including the reviews, commentaries and letters published in the last five years and dealing with the ethical and social issues raised by NGS technologies and genomics studies of mental disorders, especially BD. A total of 217 studies contributed to identify the themes discussed herein. Results The amount of information generated by NGS renders individuals suffering from BD particularly vulnerable, and increases the need for educational support throughout the consent process, and, subsequently, of genetic counselling, when communicating individual research results and incidental findings to them. Our results highlight the importance and difficulty of respecting participants’ autonomy while avoiding any therapeutic misconception. We also analysed the need for specific regulations on the use and communication of incidental findings, as well as the increasing influence of NGS in health care. Conclusions Shared efforts on the part of researchers and their institutions, Research Ethics Boards as well as participants’ representatives are needed to delineate a tailored consent process so as to better protect research participants. However, health care professionals involved in BD care and treatment need to first determine the scientific validity and clinical utility of NGS-generated findings, and thereafter their prevention and treatment significance. PMID:23256847
AMPLISAS: a web server for multilocus genotyping using next-generation amplicon sequencing data.
Sebastian, Alvaro; Herdegen, Magdalena; Migalska, Magdalena; Radwan, Jacek
2016-03-01
Next-generation sequencing (NGS) technologies are revolutionizing the fields of biology and medicine as powerful tools for amplicon sequencing (AS). Using combinations of primers and barcodes, it is possible to sequence targeted genomic regions with deep coverage for hundreds, even thousands, of individuals in a single experiment. This is extremely valuable for the genotyping of gene families in which locus-specific primers are often difficult to design, such as the major histocompatibility complex (MHC). The utility of AS is, however, limited by the high intrinsic sequencing error rates of NGS technologies and other sources of error such as polymerase amplification or chimera formation. Correcting these errors requires extensive bioinformatic post-processing of NGS data. Amplicon Sequence Assignment (AMPLISAS) is a tool that performs analysis of AS results in a simple and efficient way, while offering customization options for advanced users. AMPLISAS is designed as a three-step pipeline consisting of (i) read demultiplexing, (ii) unique sequence clustering and (iii) erroneous sequence filtering. Allele sequences and frequencies are retrieved in excel spreadsheet format, making them easy to interpret. AMPLISAS performance has been successfully benchmarked against previously published genotyped MHC data sets obtained with various NGS technologies. © 2015 John Wiley & Sons Ltd.
Tanase, Koji; Nishitani, Chikako; Hirakawa, Hideki; Isobe, Sachiko; Tabata, Satoshi; Ohmiya, Akemi; Onozaki, Takashi
2012-07-02
Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. We constructed a normalized cDNA library and a 3'-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.
2012-01-01
Background Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. Results We constructed a normalized cDNA library and a 3’-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. Conclusions We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant. PMID:22747974
USDA-ARS?s Scientific Manuscript database
The tomato genome sequence was undertaken at a time when state-of-the-art sequencing methodologies were undergoing a transition to co-called next generation methodologies. The result was an international consortium undertaking a strategy merging both old and new approaches. Because biologists were...
Next-generation sequencing: the future of molecular genetics in poultry production and food safety.
Diaz-Sanchez, S; Hanning, I; Pendleton, Sean; D'Souza, Doris
2013-02-01
The era of molecular biology and automation of the Sanger chain-terminator sequencing method has led to discovery and advances in diagnostics and biotechnology. The Sanger methodology dominated research for over 2 decades, leading to significant accomplishments and technological improvements in DNA sequencing. Next-generation high-throughput sequencing (HT-NGS) technologies were developed subsequently to overcome the limitations of this first generation technology that include higher speed, less labor, and lowered cost. Various platforms developed include sequencing-by-synthesis 454 Life Sciences, Illumina (Solexa) sequencing, SOLiD sequencing (among others), and the Ion Torrent semiconductor sequencing technologies that use different detection principles. As technology advances, progress made toward third generation sequencing technologies are being reported, which include Nanopore Sequencing and real-time monitoring of PCR activity through fluorescent resonant energy transfer. The advantages of these technologies include scalability, simplicity, with increasing DNA polymerase performance and yields, being less error prone, and even more economically feasible with the eventual goal of obtaining real-time results. These technologies can be directly applied to improve poultry production and enhance food safety. For example, sequence-based (determination of the gut microbial community, genes for metabolic pathways, or presence of plasmids) and function-based (screening for function such as antibiotic resistance, or vitamin production) metagenomic analysis can be carried out. Gut microbialflora/communities of poultry can be sequenced to determine the changes that affect health and disease along with efficacy of methods to control pathogenic growth. Thus, the purpose of this review is to provide an overview of the principles of these current technologies and their potential application to improve poultry production and food safety as well as public health.
Martineau, Christine; Li, Xuejing; Lalancette, Cindy; Perreault, Thérèse; Fournier, Eric; Tremblay, Julien; Gonzales, Milagros; Yergeau, Étienne; Quach, Caroline
2018-06-13
Serratia marcescens is an environmental bacterium commonly associated with outbreaks in neonatal intensive care units (NICU). Investigation of S. marcescens outbreaks requires efficient recovery and typing of clinical and environmental isolates. In this study, we described how the use of next-generation sequencing applications, such as bacterial whole-genome sequencing (WGS) and bacterial community profiling, could improve S. marcescens outbreak investigation. Phylogenomic links and potential antibiotic resistance genes and plasmids in S. marcescens isolates were investigated using WGS, while bacterial communities and relative abundances of Serratia in environmental samples were assessed using sequencing of bacterial phylogenetic marker genes (16S rRNA and gyrB genes). Typing results obtained using WGS for the ten S. marcescens isolates recovered during a NICU outbreak investigation were highly consistent with those from pulse-field gel electrophoresis (PFGE), the current gold standard typing method for this bacterium. WGS also allowed for the identification of genes associated with antibiotic resistance in all isolates, while no plasmid was detected. Sequencing of the 16S rRNA and gyrB genes both showed higher relative abundances of Serratia in environmental sampling sites that were in close contact with infected babies. Much lower relative abundances of Serratia were observed following disinfection of a room, indicating that the protocol used was efficient. Variations in the bacterial community composition and structure following room disinfection and between sampling sites were also identified through 16S rRNA gene sequencing. Globally, results from this study highlight the potential for next-generation sequencing tools to improve and facilitate outbreak investigation. Copyright © 2018 American Society for Microbiology.
Shaukat, Shahzad; Angez, Mehar; Alam, Muhammad Masroor; Jebbink, Maarten F; Deijs, Martin; Canuti, Marta; Sharif, Salmaan; de Vries, Michel; Khurshid, Adnan; Mahmood, Tariq; van der Hoek, Lia; Zaidi, Syed Sohail Zahoor
2014-08-12
The use of sequence independent methods combined with next generation sequencing for identification purposes in clinical samples appears promising and exciting results have been achieved to understand unexplained infections. One sequence independent method, Virus Discovery based on cDNA Amplified Fragment Length Polymorphism (VIDISCA) is capable of identifying viruses that would have remained unidentified in standard diagnostics or cell cultures. VIDISCA is normally combined with next generation sequencing, however, we set up a simplified VIDISCA which can be used in case next generation sequencing is not possible. Stool samples of 10 patients with unexplained acute flaccid paralysis showing cytopathic effect in rhabdomyosarcoma cells and/or mouse cells were used to test the efficiency of this method. To further characterize the viruses, VIDISCA-positive samples were amplified and sequenced with gene specific primers. Simplified VIDISCA detected seven viruses (70%) and the proportion of eukaryotic viral sequences from each sample ranged from 8.3 to 45.8%. Human enterovirus EV-B97, EV-B100, echovirus-9 and echovirus-21, human parechovirus type-3, human astrovirus probably a type-3/5 recombinant, and tetnovirus-1 were identified. Phylogenetic analysis based on the VP1 region demonstrated that the human enteroviruses are more divergent isolates circulating in the community. Our data support that a simplified VIDISCA protocol can efficiently identify unrecognized viruses grown in cell culture with low cost, limited time without need of advanced technical expertise. Also complex data interpretation is avoided thus the method can be used as a powerful diagnostic tool in limited resources. Redesigning the routine diagnostics might lead to additional detection of previously undiagnosed viruses in clinical samples of patients.
Pratas, Diogo; Pinho, Armando J; Rodrigues, João M O S
2014-01-16
The emerging next-generation sequencing (NGS) is bringing, besides the natural huge amounts of data, an avalanche of new specialized tools (for analysis, compression, alignment, among others) and large public and private network infrastructures. Therefore, a direct necessity of specific simulation tools for testing and benchmarking is rising, such as a flexible and portable FASTQ read simulator, without the need of a reference sequence, yet correctly prepared for producing approximately the same characteristics as real data. We present XS, a skilled FASTQ read simulation tool, flexible, portable (does not need a reference sequence) and tunable in terms of sequence complexity. It has several running modes, depending on the time and memory available, and is aimed at testing computing infrastructures, namely cloud computing of large-scale projects, and testing FASTQ compression algorithms. Moreover, XS offers the possibility of simulating the three main FASTQ components individually (headers, DNA sequences and quality-scores). XS provides an efficient and convenient method for fast simulation of FASTQ files, such as those from Ion Torrent (currently uncovered by other simulators), Roche-454, Illumina and ABI-SOLiD sequencing machines. This tool is publicly available at http://bioinformatics.ua.pt/software/xs/.
DOT National Transportation Integrated Search
2006-11-01
This is the final report of an 18-month project to: (1) review Next Generation Air Transportation System (NGATS) Joint Planning and Development Office (JPDO) documents as they pertain to human-automation interaction; (2) review past system failures i...
Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca
2015-01-01
Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450
Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca
2015-01-01
Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.
Somatic mosaicism of a CDKL5 mutation identified by next-generation sequencing.
Kato, Takeshi; Morisada, Naoya; Nagase, Hiroaki; Nishiyama, Masahiro; Toyoshima, Daisaku; Nakagawa, Taku; Maruyama, Azusa; Fu, Xue Jun; Nozu, Kandai; Wada, Hiroko; Takada, Satoshi; Iijima, Kazumoto
2015-10-01
CDKL5-related encephalopathy is an X-linked dominantly inherited disorder that is characterized by early infantile epileptic encephalopathy or atypical Rett syndrome. We describe a 5-year-old Japanese boy with intractable epilepsy, severe developmental delay, and Rett syndrome-like features. Onset was at 2 months, when his electroencephalogram showed sporadic single poly spikes and diffuse irregular poly spikes. We conducted a genetic analysis using an Illumina® TruSight™ One sequencing panel on a next-generation sequencer. We identified two epilepsy-associated single nucleotide variants in our case: CDKL5 p.Ala40Val and KCNQ2 p.Glu515Asp. CDKL5 p.Ala40Val has been previously reported to be responsible for early infantile epileptic encephalopathy. In our case, the CDKL5 heterozygous mutation showed somatic mosaicism because the boy's karyotype was 46,XY. The KCNQ2 variant p.Glu515Asp is known to cause benign familial neonatal seizures-1, and this variant showed paternal inheritance. Although we believe that the somatic mosaic CDKL5 mutation is mainly responsible for the neurological phenotype in the patient, the KCNQ2 variant might have some neurological effect. Genetic analysis by next-generation sequencing is capable of identifying multiple variants in a patient. Copyright © 2015 The Japanese Society of Child Neurology. Published by Elsevier B.V. All rights reserved.
Metzger, Julia; Tonda, Raul; Beltran, Sergi; Agueda, Lídia; Gut, Marta; Distl, Ottmar
2014-07-04
Domestication has shaped the horse and lead to a group of many different types. Some have been under strong human selection while others developed in close relationship with nature. The aim of our study was to perform next generation sequencing of breed and non-breed horses to provide an insight into genetic influences on selective forces. Whole genome sequencing of five horses of four different populations revealed 10,193,421 single nucleotide polymorphisms (SNPs) and 1,361,948 insertion/deletion polymorphisms (indels). In comparison to horse variant databases and previous reports, we were able to identify 3,394,883 novel SNPs and 868,525 novel indels. We analyzed the distribution of individual variants and found significant enrichment of private mutations in coding regions of genes involved in primary metabolic processes, anatomical structures, morphogenesis and cellular components in non-breed horses and in contrast to that private mutations in genes affecting cell communication, lipid metabolic process, neurological system process, muscle contraction, ion transport, developmental processes of the nervous system and ectoderm in breed horses. Our next generation sequencing data constitute an important first step for the characterization of non-breed in comparison to breed horses and provide a large number of novel variants for future analyses. Functional annotations suggest specific variants that could play a role for the characterization of breed or non-breed horses.
Qiu, Biyuan; Ma, Tao; Peng, Chunyan; Zheng, Xiaoqin; Yang, Jiyun
2018-04-01
The diagnosis of oculocutaneous albinism (OCA) is established using clinical signs and symptoms. OCA is, however, a highly genetically heterogeneous disease with mutations identified in at least nineteen unique genes, many of which produce overlapping phenotypic traits. Thus, differentiating genetic OCA subtypes for diagnoses and genetic counseling is challenging, based on clinical presentation alone, and would benefit from a comprehensive molecular diagnostic. To develop and validate a more comprehensive, targeted, next-generation-sequencing-based diagnostic for the identification of OCA-causing variants. The genomic DNA samples from 28 OCA probands were analyzed by targeted next-generation sequencing (NGS), and the candidate variants were confirmed through Sanger sequencing. We observed mutations in the TYR, OCA2, and SLC45A2 genes in 25/28 (89%) patients with OCA. We identified 38 pathogenic variants among these three genes, including 5 novel variants: c.1970G>T (p.Gly657Val), c.1669A>C (p.Thr557Pro), c.2339-2A>C, and c.1349C>G (p.Thr450Arg) in OCA2; c.459_470delTTTTGCTGCCGA (p.Ala155_Phe158del) in SLC45A2. Our findings expand the mutational spectrum of OCA in the Chinese population, and the assay we developed should be broadly useful as a molecular diagnostic, and as an aid for genetic counseling for OCA patients.
Dridi, M; Rosseel, T; Orton, R; Johnson, P; Lecollinet, S; Muylkens, B; Lambrecht, B; Van Borm, S
2015-10-01
West Nile virus (WNV) occurs as a population of genetic variants (quasispecies) infecting a single animal. Previous low-resolution viral genetic diversity estimates in sampled wild birds and mosquitoes, and in multiple-passage adaptation studies in vivo or in cell culture, suggest that WNV genetic diversification is mostly limited to the mosquito vector. This study investigated genetic diversification of WNV in avian hosts during a single passage using next-generation sequencing. Wild-captured carrion crows were subcutaneously infected using a clonal Middle-East WNV. Blood samples were collected 2 and 4 days post-infection. A reverse-transcription (RT)-PCR approach was used to amplify the WNV genome directly from serum samples prior to next-generation sequencing resulting in an average depth of at least 700 × in each sample. Appropriate controls were sequenced to discriminate biologically relevant low-frequency variants from experimentally introduced errors. The WNV populations in the wild crows showed significant diversification away from the inoculum virus quasispecies structure. By contrast, WNV populations in intracerebrally infected day-old chickens did not diversify from that of the inoculum. Where previous studies concluded that WNV genetic diversification is only experimentally demonstrated in its permissive insect vector species, we have experimentally shown significant diversification of WNV populations in a wild bird reservoir species.
Kim, Hyun-Kyoung; Park, Won Cheol; Lee, Kwang Man; Hwang, Hai-Li; Park, Seong-Yeol; Sorn, Sungbin; Chandra, Vishal; Kim, Kwang Gi; Yoon, Woong-Bae; Bae, Joon Seol; Shin, Hyoung Doo; Shin, Jong-Yeon; Seoh, Ju-Young; Kim, Jong-Il; Hong, Kyeong-Man
2014-01-01
The concept of the utilization of rearranged ends for development of personalized biomarkers has attracted much attention owing to its clinical applicability. Although targeted next-generation sequencing (NGS) for recurrent rearrangements has been successful in hematologic malignancies, its application to solid tumors is problematic due to the paucity of recurrent translocations. However, copy-number breakpoints (CNBs), which are abundant in solid tumors, can be utilized for identification of rearranged ends. As a proof of concept, we performed targeted next-generation sequencing at copy-number breakpoints (TNGS-CNB) in nine colon cancer cases including seven primary cancers and two cell lines, COLO205 and SW620. For deduction of CNBs, we developed a novel competitive single-nucleotide polymorphism (cSNP) microarray method entailing CNB-region refinement by competitor DNA. Using TNGS-CNB, 19 specific rearrangements out of 91 CNBs (20.9%) were identified, and two polymerase chain reaction (PCR)-amplifiable rearrangements were obtained in six cases (66.7%). And significantly, TNGS-CNB, with its high positive identification rate (82.6%) of PCR-amplifiable rearrangements at candidate sites (19/23), just from filtering of aligned sequences, requires little effort for validation. Our results indicate that TNGS-CNB, with its utility for identification of rearrangements in solid tumors, can be successfully applied in the clinical laboratory for cancer-relapse and therapy-response monitoring.
Huang, Xiaoyan; Tian, Mao; Li, Jiankang; Cui, Ling; Li, Min; Zhang, Jianguo
2017-11-01
Norrie disease (ND) is a rare X-linked genetic disorder, the main symptoms of which are congenital blindness and white pupils. It has been reported that ND is caused by mutations in the NDP gene. Although many mutations in NDP have been reported, the genetic cause for many patients remains unknown. In this study, the aim is to investigate the genetic defect in a five-generation family with typical symptoms of ND. To identify the causative gene, next-generation sequencing based target capture sequencing was performed. Segregation analysis of the candidate variant was performed in additional family members using Sanger sequencing. We identified a novel missense variant (c.314C>A) located within the NDP gene. The mutation cosegregated within all affected individuals in the family and was not found in unaffected members. By happenstance, in this family, we also detected a known pathogenic variant of retinitis pigmentosa in a healthy individual. c.314C>A mutation of NDP gene is a novel mutation and broadens the genetic spectrum of ND.
Next-generation sequencing reveals a novel NDP gene mutation in a Chinese family with Norrie disease
Huang, Xiaoyan; Tian, Mao; Li, Jiankang; Cui, Ling; Li, Min; Zhang, Jianguo
2017-01-01
Purpose: Norrie disease (ND) is a rare X-linked genetic disorder, the main symptoms of which are congenital blindness and white pupils. It has been reported that ND is caused by mutations in the NDP gene. Although many mutations in NDP have been reported, the genetic cause for many patients remains unknown. In this study, the aim is to investigate the genetic defect in a five-generation family with typical symptoms of ND. Methods: To identify the causative gene, next-generation sequencing based target capture sequencing was performed. Segregation analysis of the candidate variant was performed in additional family members using Sanger sequencing. Results: We identified a novel missense variant (c.314C>A) located within the NDP gene. The mutation cosegregated within all affected individuals in the family and was not found in unaffected members. By happenstance, in this family, we also detected a known pathogenic variant of retinitis pigmentosa in a healthy individual. Conclusion: c.314C>A mutation of NDP gene is a novel mutation and broadens the genetic spectrum of ND. PMID:29133643
Implementation of Cloud based next generation sequencing data analysis in a clinical laboratory.
Onsongo, Getiria; Erdmann, Jesse; Spears, Michael D; Chilton, John; Beckman, Kenneth B; Hauge, Adam; Yohe, Sophia; Schomaker, Matthew; Bower, Matthew; Silverstein, Kevin A T; Thyagarajan, Bharat
2014-05-23
The introduction of next generation sequencing (NGS) has revolutionized molecular diagnostics, though several challenges remain limiting the widespread adoption of NGS testing into clinical practice. One such difficulty includes the development of a robust bioinformatics pipeline that can handle the volume of data generated by high-throughput sequencing in a cost-effective manner. Analysis of sequencing data typically requires a substantial level of computing power that is often cost-prohibitive to most clinical diagnostics laboratories. To address this challenge, our institution has developed a Galaxy-based data analysis pipeline which relies on a web-based, cloud-computing infrastructure to process NGS data and identify genetic variants. It provides additional flexibility, needed to control storage costs, resulting in a pipeline that is cost-effective on a per-sample basis. It does not require the usage of EBS disk to run a sample. We demonstrate the validation and feasibility of implementing this bioinformatics pipeline in a molecular diagnostics laboratory. Four samples were analyzed in duplicate pairs and showed 100% concordance in mutations identified. This pipeline is currently being used in the clinic and all identified pathogenic variants confirmed using Sanger sequencing further validating the software.
Integrating genome assemblies with MAIA
Nijkamp, Jurgen; Winterbach, Wynand; van den Broek, Marcel; Daran, Jean-Marc; Reinders, Marcel; de Ridder, Dick
2010-01-01
Motivation: De novo assembly of a eukaryotic genome with next-generation sequencing data is still a challenging task. Over the past few years several assemblers have been developed, often suitable for one specific type of sequencing data. The number of known genomes is expanding rapidly, therefore it becomes possible to use multiple reference genomes for assembly projects. We introduce an assembly integrator that makes use of all available data, i.e. multiple de novo assemblies and mappings against multiple related genomes, by optimizing a weighted combination of criteria. Results: The developed algorithm was applied on the de novo sequencing of the Saccharomyces cerevisiae CEN.PK 113-7D strain. Using Solexa and 454 read data, two de novo and three comparative assemblies were constructed and subsequently integrated, yielding 29 contigs, covering more than 12 Mbp; a drastic improvement compared with the single assemblies. Availability: MAIA is available as a Matlab package and can be downloaded from http://bioinformatics.tudelft.nl Contact: j.f.nijkamp@tudelft.nl PMID:20823304
Chen, Poyin; Jeannotte, Richard; Weimer, Bart C
2014-05-01
Epigenetics has an important role for the success of foodborne pathogen persistence in diverse host niches. Substantial challenges exist in determining DNA methylation to situation-specific phenotypic traits. DNA modification, mediated by restriction-modification systems, functions as an immune response against antagonistic external DNA, and bacteriophage-acquired methyltransferases (MTase) and orphan MTases - those lacking the cognate restriction endonuclease - facilitate evolution of new phenotypes via gene expression modulation via DNA and RNA modifications, including methylation and phosphorothioation. Recent establishment of large-scale genome sequencing projects will result in a significant increase in genome availability that will lead to new demands for data analysis including new predictive bioinformatics approaches that can be verified with traditional scientific rigor. Sequencing technologies that detect modification coupled with mass spectrometry to discover new adducts is a powerful tactic to study bacterial epigenetics, which is poised to make novel and far-reaching discoveries that link biological significance and the bacterial epigenome. Copyright © 2014 Elsevier Ltd. All rights reserved.
MetaPathways v2.5: quantitative functional, taxonomic and usability improvements.
Konwar, Kishori M; Hanson, Niels W; Bhatia, Maya P; Kim, Dongjae; Wu, Shang-Ju; Hahn, Aria S; Morgan-Lang, Connor; Cheung, Hiu Kan; Hallam, Steven J
2015-10-15
Next-generation sequencing is producing vast amounts of sequence information from natural and engineered ecosystems. Although this data deluge has an enormous potential to transform our lives, knowledge creation and translation need software applications that scale with increasing data processing and analysis requirements. Here, we present improvements to MetaPathways, an annotation and analysis pipeline for environmental sequence information that expedites this transformation. We specifically address pathway prediction hazards through integration of a weighted taxonomic distance and enable quantitative comparison of assembled annotations through a normalized read-mapping measure. Additionally, we improve LAST homology searches through BLAST-equivalent E-values and output formats that are natively compatible with prevailing software applications. Finally, an updated graphical user interface allows for keyword annotation query and projection onto user-defined functional gene hierarchies, including the Carbohydrate-Active Enzyme database. MetaPathways v2.5 is available on GitHub: http://github.com/hallamlab/metapathways2. shallam@mail.ubc.ca Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Extracellular RNA is transported from one generation to the next in Caenorhabditis elegans
Marré, Julia; Traver, Edward C.
2016-01-01
Experiences during the lifetime of an animal have been proposed to have consequences for subsequent generations. Although it is unclear how such intergenerational transfer of information occurs, RNAs found extracellularly in animals are candidate molecules that can transfer gene-specific regulatory information from one generation to the next because they can enter cells and regulate gene expression. In support of this idea, when double-stranded RNA (dsRNA) is introduced into some animals, the dsRNA can silence genes of matching sequence and the silencing can persist in progeny. Such persistent gene silencing is thought to result from sequence-specific interaction of the RNA within parents to generate chromatin modifications, DNA methylation, and/or secondary RNAs, which are then inherited by progeny. Here, we show that dsRNA can be directly transferred between generations in the worm Caenorhabditis elegans. Intergenerational transfer of dsRNA occurs even in animals that lack any DNA of matching sequence, and dsRNA that reaches progeny can spread between cells to cause gene silencing. Surprisingly, extracellular dsRNA can also reach progeny without entry into the cytosol, presumably within intracellular vesicles. Fluorescently labeled dsRNA is imported from extracellular space into oocytes along with yolk and accumulates in punctate structures within embryos. Subsequent entry into the cytosol of early embryos causes gene silencing in progeny. These results demonstrate the transport of extracellular RNA from one generation to the next to regulate gene expression in an animal and thus suggest a mechanism for the transmission of experience-dependent effects between generations. PMID:27791108
Lubin, Ira M; Aziz, Nazneen; Babb, Lawrence J; Ballinger, Dennis; Bisht, Himani; Church, Deanna M; Cordes, Shaun; Eilbeck, Karen; Hyland, Fiona; Kalman, Lisa; Landrum, Melissa; Lockhart, Edward R; Maglott, Donna; Marth, Gabor; Pfeifer, John D; Rehm, Heidi L; Roy, Somak; Tezak, Zivana; Truty, Rebecca; Ullman-Cullere, Mollie; Voelkerding, Karl V; Worthey, Elizabeth A; Zaranek, Alexander W; Zook, Justin M
2017-05-01
A national workgroup convened by the Centers for Disease Control and Prevention identified principles and made recommendations for standardizing the description of sequence data contained within the variant file generated during the course of clinical next-generation sequence analysis for diagnosing human heritable conditions. The specifications for variant files were initially developed to be flexible with regard to content representation to support a variety of research applications. This flexibility permits variation with regard to how sequence findings are described and this depends, in part, on the conventions used. For clinical laboratory testing, this poses a problem because these differences can compromise the capability to compare sequence findings among laboratories to confirm results and to query databases to identify clinically relevant variants. To provide for a more consistent representation of sequence findings described within variant files, the workgroup made several recommendations that considered alignment to a common reference sequence, variant caller settings, use of genomic coordinates, and gene and variant naming conventions. These recommendations were considered with regard to the existing variant file specifications presently used in the clinical setting. Adoption of these recommendations is anticipated to reduce the potential for ambiguity in describing sequence findings and facilitate the sharing of genomic data among clinical laboratories and other entities. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments.
Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M
2009-07-01
Next-generation sequencing allows now the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be a high demand of bioinformatics tools to cope with the several gigabytes of sequence data generated in each single deep-sequencing experiment. Given this scene, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and its copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is an especially important point as there are many species with very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, adding links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.
SMITH: a LIMS for handling next-generation sequencing workflows
2014-01-01
Background Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, to maintain a high quality standard, to reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). Methods SMITH is a web application with a MySQL server at the backend. Wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project developed SMITH. The data base schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description to each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. Results SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized making it easier for biologists and analysts to navigate the data. Automation also helps saving time. The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. Conclusions SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis. PMID:25471934
SMITH: a LIMS for handling next-generation sequencing workflows.
Venco, Francesco; Vaskin, Yuriy; Ceol, Arnaud; Muller, Heiko
2014-01-01
Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, to maintain a high quality standard, to reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). SMITH is a web application with a MySQL server at the backend. Wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project developed SMITH. The data base schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description to each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized making it easier for biologists and analysts to navigate the data. Automation also helps saving time. The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis.
The evolution of the search for novel genes in mammalian sex determination: from mice to men.
Arboleda, Valerie A; Vilain, Eric
2011-01-01
Disorders of sex determination are a genetically heterogeneous group of rare disorders, presenting with sex-specific phenotypes and variable expressivity. Prior to the advent of the Human Genome Project, the identification of novel mammalian sex determination genes was hindered by the rarity of disorders of sex determination and small family sizes that made traditional linkage approaches difficult, if not impossible. This article reviews the revolutionary role of the Human Genome Project in the history of sex determination research and highlights the important role of inbred mouse models in elucidating the role of identified sex determination genes in mammalian sex determination. Next generation sequencing technologies has made it possible to sequence complete human genomes or exomes for the purpose of providing a genetic diagnosis to more patients with unexplained disorders of sex determination and identifying novel sex determination genes. However, beyond novel gene discovery, these tools have the power to inform us on more intricate and complex regulation-taking place within the heterogeneous cells that make up the testis and ovary. Copyright © 2011 Elsevier Inc. All rights reserved.
Plasmodium vivax Biology: Insights Provided by Genomics, Transcriptomics and Proteomics
Bourgard, Catarina; Albrecht, Letusa; Kayano, Ana C. A. V.; Sunnerhagen, Per; Costa, Fabio T. M.
2018-01-01
During the last decade, the vast omics field has revolutionized biological research, especially the genomics, transcriptomics and proteomics branches, as technological tools become available to the field researcher and allow difficult question-driven studies to be addressed. Parasitology has greatly benefited from next generation sequencing (NGS) projects, which have resulted in a broadened comprehension of basic parasite molecular biology, ecology and epidemiology. Malariology is one example where application of this technology has greatly contributed to a better understanding of Plasmodium spp. biology and host-parasite interactions. Among the several parasite species that cause human malaria, the neglected Plasmodium vivax presents great research challenges, as in vitro culturing is not yet feasible and functional assays are heavily limited. Therefore, there are gaps in our P. vivax biology knowledge that affect decisions for control policies aiming to eradicate vivax malaria in the near future. In this review, we provide a snapshot of key discoveries already achieved in P. vivax sequencing projects, focusing on developments, hurdles, and limitations currently faced by the research community, as well as perspectives on future vivax malaria research. PMID:29473024
2013-01-01
Background Human leukocyte antigen matching at allelic resolution is proven clinically significant in hematopoietic stem cell transplantation, lowering the risk of graft-versus-host disease and mortality. However, due to the ever growing HLA allele database, tissue typing laboratories face substantial challenges. In light of the complexity and the high degree of allelic diversity, it has become increasingly difficult to define the classical transplantation antigens at high-resolution by using well-tried methods. Thus, next-generation sequencing is entering into diagnostic laboratories at the perfect time and serving as a promising tool to overcome intrinsic HLA typing problems. Therefore, we have developed and validated a scalable automated HLA class I and class II typing approach suitable for diagnostic use. Results A validation panel of 173 clinical and proficiency testing samples was analysed, demonstrating 100% concordance to the reference method. From a total of 1,273 loci we were able to generate 1,241 (97.3%) initial successful typings. The mean ambiguity reduction for the analysed loci was 93.5%. Allele assignment including intronic sequences showed an improved resolution (99.2%) of non-expressed HLA alleles. Conclusion We provide a powerful HLA typing protocol offering a short turnaround time of only two days, a fully integrated workflow and most importantly a high degree of typing reliability. The presented automated assay is flexible and can be scaled by specific primer compilations and the use of different 454 sequencing systems. The workflow was successfully validated according to the policies of the European Federation for Immunogenetics. Next-generation sequencing seems to become one of the new methods in the field of Histocompatibility. PMID:23557197
Analysis of Litopenaeus vannamei Transcriptome Using the Next-Generation DNA Sequencing Technique
Li, Chaozheng; Weng, Shaoping; Chen, Yonggui; Yu, Xiaoqiang; Lü, Ling; Zhang, Haiqing; He, Jianguo; Xu, Xiaopeng
2012-01-01
Background Pacific white shrimp (Litopenaeus vannamei), the major species of farmed shrimps in the world, has been attracting extensive studies, which require more and more genome background knowledge. The now available transcriptome data of L. vannamei are insufficient for research requirements, and have not been adequately assembled and annotated. Methodology/Principal Findings This is the first study that used a next-generation high-throughput DNA sequencing technique, the Solexa/Illumina GA II method, to analyze the transcriptome from whole bodies of L. vannamei larvae. More than 2.4 Gb of raw data were generated, and 109,169 unigenes with a mean length of 396 bp were assembled using the SOAP denovo software. 73,505 unigenes (>200 bp) with good quality sequences were selected and subjected to annotation analysis, among which 37.80% can be matched in NCBI Nr database, 37.3% matched in Swissprot, and 44.1% matched in TrEMBL. Using BLAST and BLAST2Go softwares, 11,153 unigenes were classified into 25 Clusters of Orthologous Groups of proteins (COG) categories, 8171 unigenes were assigned into 51 Gene ontology (GO) functional groups, and 18,154 unigenes were divided into 220 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. To primarily verify part of the results of assembly and annotations, 12 assembled unigenes that are homologous to many embryo development-related genes were chosen and subjected to RT-PCR for electrophoresis and Sanger sequencing analyses, and to real-time PCR for expression profile analyses during embryo development. Conclusions/Significance The L. vannamei transcriptome analyzed using the next-generation sequencing technique enriches the information of L. vannamei genes, which will facilitate our understanding of the genome background of crustaceans, and promote the studies on L. vannamei. PMID:23071809
Scholz, Christian F P; Jensen, Anders
2017-01-01
The protocol describes a computational method to develop a Single Locus Sequence Typing (SLST) scheme for typing bacterial species. The resulting scheme can be used to type bacterial isolates as well as bacterial species directly from complex communities using next-generation sequencing technologies.
USDA-ARS?s Scientific Manuscript database
The genomic sequences of low and high passages of the United States infectious laryngotracheitis (ILT) vaccine strains CEO and TCO were determined using hybrid next generation sequencing in order to define genomic changes associated with attenuation and reversion to virulence. Phylogenetic analysis ...
USDA-ARS?s Scientific Manuscript database
Genotyping by sequencing (GBS) has been developed as an affordable application of next-generation sequencing for the purposes of discovering and genotyping SNPs in a variety of crop species and populations. In this study we employed a double restriction enzyme digestion protocol (HindIII and NlaIII)...
Using Next-Generation Sequencing to Explore Genetics and Race in the High School Classroom
ERIC Educational Resources Information Center
Yang, Xinmiao; Hartman, Mark R.; Harrington, Kristin T.; Etson, Candice M.; Fierman, Matthew B.; Slonim, Donna K.; Walt, David R.
2017-01-01
With the development of new sequencing and bioinformatics technologies, concepts relating to personal genomics play an increasingly important role in our society. To promote interest and understanding of sequencing and bioinformatics in the high school classroom, we developed and implemented a laboratory-based teaching module called "The…
USDA-ARS?s Scientific Manuscript database
The low cost of next generation sequencing (NGS) technology and the availability of a large number of well annotated plant genomes has made sequencing technology useful to breeding programs. With the published high quality tomato reference genome of the processing cultivar Heinz 1706, we can now uti...
USDA-ARS?s Scientific Manuscript database
There is a growing need to combine DNA sequencing technologies to address complex problems in genome biology. These genomic studies routinely generate voluminous image, sequence, and mapping files that should be associated with quality control information (gels, spectra, etc.), and other important ...
Leong, Wai-Mun; Ripen, Adiratna Mat; Mirsafian, Hoda; Mohamad, Saharuddin Bin; Merican, Amir Feisal
2018-06-07
High-depth next generation sequencing data provide valuable insights into the number and distribution of RNA editing events. Here, we report the RNA editing events at cellular level of human primary monocyte using high-depth whole genomic and transcriptomic sequencing data. We identified over a ten thousand putative RNA editing sites and 69% of the sites were A-to-I editing sites. The sites enriched in repetitive sequences and intronic regions. High-depth sequencing datasets revealed that 90% of the canonical sites were edited at lower frequencies (<0.7). Single and multiple human monocytes and brain tissues samples were analyzed through genome sequence independent approach. The later approach was observed to identify more editing sites. Monocytes was observed to contain more C-to-U editing sites compared to brain tissues. Our results establish comparable pipeline that can address current limitations as well as demonstrate the potential for highly sensitive detection of RNA editing events in single cell type. Copyright © 2018 Elsevier Inc. All rights reserved.
Analysis of plant microbe interactions in the era of next generation sequencing technologies
Knief, Claudia
2014-01-01
Next generation sequencing (NGS) technologies have impressively accelerated research in biological science during the last years by enabling the production of large volumes of sequence data to a drastically lower price per base, compared to traditional sequencing methods. The recent and ongoing developments in the field allow addressing research questions in plant-microbe biology that were not conceivable just a few years ago. The present review provides an overview of NGS technologies and their usefulness for the analysis of microorganisms that live in association with plants. Possible limitations of the different sequencing systems, in particular sources of errors and bias, are critically discussed and methods are disclosed that help to overcome these shortcomings. A focus will be on the application of NGS methods in metagenomic studies, including the analysis of microbial communities by amplicon sequencing, which can be considered as a targeted metagenomic approach. Different applications of NGS technologies are exemplified by selected research articles that address the biology of the plant associated microbiota to demonstrate the worth of the new methods. PMID:24904612
Novel Primer Sets for Next Generation Sequencing-Based Analyses of Water Quality
Lee, Elvina; Khurana, Maninder S.; Whiteley, Andrew S.; Monis, Paul T.; Bath, Andrew; Gordon, Cameron; Ryan, Una M.; Paparini, Andrea
2017-01-01
Next generation sequencing (NGS) has rapidly become an invaluable tool for the detection, identification and relative quantification of environmental microorganisms. Here, we demonstrate two new 16S rDNA primer sets, which are compatible with NGS approaches and are primarily for use in water quality studies. Compared to 16S rRNA gene based universal primers, in silico and experimental analyses demonstrated that the new primers showed increased specificity for the Cyanobacteria and Proteobacteria phyla, allowing increased sensitivity for the detection, identification and relative quantification of toxic bloom-forming microalgae, microbial water quality bioindicators and common pathogens. Significantly, Cyanobacterial and Proteobacterial sequences accounted for ca. 95% of all sequences obtained within NGS runs (when compared to ca. 50% with standard universal NGS primers), providing higher sensitivity and greater phylogenetic resolution of key water quality microbial groups. The increased selectivity of the new primers allow the parallel sequencing of more samples through reduced sequence retrieval levels required to detect target groups, potentially reducing NGS costs by 50% but still guaranteeing optimal coverage and species discrimination. PMID:28118368
Using Next-Generation Sequencing to Explore Genetics and Race in the High School Classroom
Yang, Xinmiao; Hartman, Mark R.; Harrington, Kristin T.; Etson, Candice M.; Fierman, Matthew B.; Slonim, Donna K.; Walt, David R.
2017-01-01
With the development of new sequencing and bioinformatics technologies, concepts relating to personal genomics play an increasingly important role in our society. To promote interest and understanding of sequencing and bioinformatics in the high school classroom, we developed and implemented a laboratory-based teaching module called “The Genetics of Race.” This module uses the topic of race to engage students with sequencing and genetics. In the experimental portion of this module, students isolate their own mitochondrial DNA using standard biotechnology techniques and collect next-generation sequencing data to determine which of their classmates are most and least genetically similar to themselves. We evaluated the efficacy of this module by administering a pretest/posttest evaluation to measure student knowledge related to sequencing and bioinformatics, and we also conducted a survey at the conclusion of the module to assess student attitudes. Upon completion of our Genetics of Race module, students demonstrated significant learning gains, with lower-performing students obtaining the highest gains, and developed more positive attitudes toward scientific research. PMID:28408407
Complete Genome Sequences of Two Vesicular Stomatitis Virus Isolates Collected in Mexico
Isa, Pavel; Pauszek, Steven J.; Rodriguez, Luis L.
2017-01-01
ABSTRACT We report two full-genome sequences of vesicular stomatitis New Jersey virus (VSNJV) obtained by Illumina next-generation sequencing of RNA isolated from epithelial suspensions of cattle naturally infected in Mexico. These genomes represent the first full-genome sequences of vesicular stomatitis New Jersey viruses circulating in Mexico deposited in the GenBank database. PMID:28912331
Diagnostic Applications of Next Generation Sequencing in Immunogenetics and Molecular Oncology
Grumbt, Barbara; Eck, Sebastian H.; Hinrichsen, Tanja; Hirv, Kaimo
2013-01-01
Summary With the introduction of the next generation sequencing (NGS) technologies, remarkable new diagnostic applications have been established in daily routine. Implementation of NGS is challenging in clinical diagnostics, but definite advantages and new diagnostic possibilities make the switch to the technology inevitable. In addition to the higher sequencing capacity, clonal sequencing of single molecules, multiplexing of samples, higher diagnostic sensitivity, workflow miniaturization, and cost benefits are some of the valuable features of the technology. After the recent advances, NGS emerged as a proven alternative for classical Sanger sequencing in the typing of human leukocyte antigens (HLA). By virtue of the clonal amplification of single DNA molecules ambiguous typing results can be avoided. Simultaneously, a higher sample throughput can be achieved by tagging of DNA molecules with multiplex identifiers and pooling of PCR products before sequencing. In our experience, up to 380 samples can be typed for HLA-A, -B, and -DRB1 in high-resolution during every sequencing run. In molecular oncology, NGS shows a markedly increased sensitivity in comparison to the conventional Sanger sequencing and is developing to the standard diagnostic tool in detection of somatic mutations in cancer cells with great impact on personalized treatment of patients. PMID:23922545
Williams, Emma L; Bagg, Eleanor A L; Mueller, Michael; Vandrovcova, Jana; Aitman, Timothy J; Rumsby, Gill
2015-01-01
Definitive diagnosis of primary hyperoxaluria (PH) currently utilizes sequential Sanger sequencing of the AGXT, GRPHR, and HOGA1 genes but efficacy is unproven. This analysis is time-consuming, relatively expensive, and delays in diagnosis and inappropriate treatment can occur if not pursued early in the diagnostic work-up. We reviewed testing outcomes of Sanger sequencing in 200 consecutive patient samples referred for analysis. In addition, the Illumina Truseq custom amplicon system was evaluated for paralleled next-generation sequencing (NGS) of AGXT,GRHPR, and HOGA1 in 90 known PH patients. AGXT sequencing was requested in all patients, permitting a diagnosis of PH1 in 50%. All remaining patients underwent targeted exon sequencing of GRHPR and HOGA1 with 8% diagnosed with PH2 and 8% with PH3. Complete sequencing of both GRHPR and HOGA1 was not requested in 25% of patients referred leaving their diagnosis in doubt. NGS analysis showed 98% agreement with Sanger sequencing and both approaches had 100% diagnostic specificity. Diagnostic sensitivity of Sanger sequencing was 98% and for NGS it was 97%. NGS has comparable diagnostic performance to Sanger sequencing for the diagnosis of PH and, if implemented, would screen for all forms of PH simultaneously ensuring prompt diagnosis at decreased cost. PMID:25629080
Riman, Sarah; Kiesler, Kevin M; Borsuk, Lisa A; Vallone, Peter M
2017-07-01
Standard Reference Materials SRM 2392 and 2392-I are intended to provide quality control when amplifying and sequencing human mitochondrial genome sequences. The National Institute of Standards and Technology (NIST) offers these SRMs to laboratories performing DNA-based forensic human identification, molecular diagnosis of mitochondrial diseases, mutation detection, evolutionary anthropology, and genetic genealogy. The entire mtGenome (∼16569bp) of SRM 2392 and 2392-I have previously been characterized at NIST by Sanger sequencing. Herein, we used the sensitivity, specificity, and accuracy offered by next generation sequencing (NGS) to: (1) re-sequence the certified values of the SRM 2392 and 2392-I; (2) confirm Sanger data with a high coverage new sequencing technology; (3) detect lower level heteroplasmies (<20%); and thus (4) support mitochondrial sequencing communities in the adoption of NGS methods. To obtain a consensus sequence for the SRMs as well as identify and control any bias, sequencing was performed using two NGS platforms and data was analyzed using different bioinformatics pipelines. Our results confirm five low level heteroplasmy sites that were not previously observed with Sanger sequencing: three sites in the GM09947A template in SRM 2392 and two sites in the HL-60 template in SRM 2392-I. Copyright © 2017 Elsevier B.V. All rights reserved.
Effects of short read quality and quantity on a de novo vertebrate transcriptome assembly.
Garcia, T I; Shen, Y; Catchen, J; Amores, A; Schartl, M; Postlethwait, J; Walter, R B
2012-01-01
For many researchers, next generation sequencing data holds the key to answering a category of questions previously unassailable. One of the important and challenging steps in achieving these goals is accurately assembling the massive quantity of short sequencing reads into full nucleic acid sequences. For research groups working with non-model or wild systems, short read assembly can pose a significant challenge due to the lack of pre-existing EST or genome reference libraries. While many publications describe the overall process of sequencing and assembly, few address the topic of how many and what types of reads are best for assembly. The goal of this project was use real world data to explore the effects of read quantity and short read quality scores on the resulting de novo assemblies. Using several samples of short reads of various sizes and qualities we produced many assemblies in an automated manner. We observe how the properties of read length, read quality, and read quantity affect the resulting assemblies and provide some general recommendations based on our real-world data set. Published by Elsevier Inc.
Impact of Next Generation Sequencing Techniques in Food Microbiology
Mayo, Baltasar; Rachid, Caio T. C. C; Alegría, Ángel; Leite, Analy M. O; Peixoto, Raquel S; Delgado, Susana
2014-01-01
Understanding the Maxam-Gilbert and Sanger sequencing as the first generation, in recent years there has been an explosion of newly-developed sequencing strategies, which are usually referred to as next generation sequencing (NGS) techniques. NGS techniques have high-throughputs and produce thousands or even millions of sequences at the same time. These sequences allow for the accurate identification of microbial taxa, including uncultivable organisms and those present in small numbers. In specific applications, NGS provides a complete inventory of all microbial operons and genes present or being expressed under different study conditions. NGS techniques are revolutionizing the field of microbial ecology and have recently been used to examine several food ecosystems. After a short introduction to the most common NGS systems and platforms, this review addresses how NGS techniques have been employed in the study of food microbiota and food fermentations, and discusses their limits and perspectives. The most important findings are reviewed, including those made in the study of the microbiota of milk, fermented dairy products, and plant-, meat- and fish-derived fermented foods. The knowledge that can be gained on microbial diversity, population structure and population dynamics via the use of these technologies could be vital in improving the monitoring and manipulation of foods and fermented food products. They should also improve their safety. PMID:25132799
Frampton, Dan; Gallo Cassarino, Tiziano; Raffle, Jade; Hubb, Jonathan; Ferns, R. Bridget; Waters, Laura; Tong, C. Y. William; Kozlakidis, Zisis; Hayward, Andrew; Kellam, Paul; Pillay, Deenan; Clark, Duncan; Nastouli, Eleni; Leigh Brown, Andrew J.
2018-01-01
Background & methods The ICONIC project has developed an automated high-throughput pipeline to generate HIV nearly full-length genomes (NFLG, i.e. from gag to nef) from next-generation sequencing (NGS) data. The pipeline was applied to 420 HIV samples collected at University College London Hospitals NHS Trust and Barts Health NHS Trust (London) and sequenced using an Illumina MiSeq at the Wellcome Trust Sanger Institute (Cambridge). Consensus genomes were generated and subtyped using COMET, and unique recombinants were studied with jpHMM and SimPlot. Maximum-likelihood phylogenetic trees were constructed using RAxML to identify transmission networks using the Cluster Picker. Results The pipeline generated sequences of at least 1Kb of length (median = 7.46Kb, IQR = 4.01Kb) for 375 out of the 420 samples (89%), with 174 (46.4%) being NFLG. A total of 365 sequences (169 of them NFLG) corresponded to unique subjects and were included in the down-stream analyses. The most frequent HIV subtypes were B (n = 149, 40.8%) and C (n = 77, 21.1%) and the circulating recombinant form CRF02_AG (n = 32, 8.8%). We found 14 different CRFs (n = 66, 18.1%) and multiple URFs (n = 32, 8.8%) that involved recombination between 12 different subtypes/CRFs. The most frequent URFs were B/CRF01_AE (4 cases) and A1/D, B/C, and B/CRF02_AG (3 cases each). Most URFs (19/26, 73%) lacked breakpoints in the PR+RT pol region, rendering them undetectable if only that was sequenced. Twelve (37.5%) of the URFs could have emerged within the UK, whereas the rest were probably imported from sub-Saharan Africa, South East Asia and South America. For 2 URFs we found highly similar pol sequences circulating in the UK. We detected 31 phylogenetic clusters using the full dataset: 25 pairs (mostly subtypes B and C), 4 triplets and 2 quadruplets. Some of these were not consistent across different genes due to inter- and intra-subtype recombination. Clusters involved 70 sequences, 19.2% of the dataset. Conclusions The initial analysis of genome sequences detected substantial hidden variability in the London HIV epidemic. Analysing full genome sequences, as opposed to only PR+RT, identified previously undetected recombinants. It provided a more reliable description of CRFs (that would be otherwise misclassified) and transmission clusters. PMID:29389981
Next Generation Workload Management and Analysis System for Big Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
De, Kaushik
We report on the activities and accomplishments of a four-year project (a three-year grant followed by a one-year no cost extension) to develop a next generation workload management system for Big Data. The new system is based on the highly successful PanDA software developed for High Energy Physics (HEP) in 2005. PanDA is used by the ATLAS experiment at the Large Hadron Collider (LHC), and the AMS experiment at the space station. The program of work described here was carried out by two teams of developers working collaboratively at Brookhaven National Laboratory (BNL) and the University of Texas at Arlingtonmore » (UTA). These teams worked closely with the original PanDA team – for the sake of clarity the work of the next generation team will be referred to as the BigPanDA project. Their work has led to the adoption of BigPanDA by the COMPASS experiment at CERN, and many other experiments and science projects worldwide.« less
Sharma, Davinder; Golla, Naresh; Singh, Dheer; Onteru, Suneel K
2018-03-01
The next-generation sequencing (NGS) based RNA sequencing (RNA-Seq) and transcriptome profiling offers an opportunity to unveil complex biological processes. Successful RNA-Seq and transcriptome profiling requires a large amount of high-quality RNA. However, NGS-quality RNA isolation is extremely difficult from recalcitrant adipose tissue (AT) with high lipid content and low cell numbers. Further, the amount and biochemical composition of AT lipid varies depending upon the animal species which can pose different degree of resistance to RNA extraction. Currently available approaches may work effectively in one species but can be almost unproductive in another species. Herein, we report a two step protocol for the extraction of NGS quality RNA from AT across a broad range of animal species. © 2017 Wiley Periodicals, Inc.
A fast sequence assembly method based on compressed data structures.
Liang, Peifeng; Zhang, Yancong; Lin, Kui; Hu, Jinglu
2014-01-01
Assembling a large genome using next generation sequencing reads requires large computer memory and a long execution time. To reduce these requirements, a memory and time efficient assembler is presented from applying FM-index in JR-Assembler, called FMJ-Assembler, where FM stand for FMR-index derived from the FM-index and BWT and J for jumping extension. The FMJ-Assembler uses expanded FM-index and BWT to compress data of reads to save memory and jumping extension method make it faster in CPU time. An extensive comparison of the FMJ-Assembler with current assemblers shows that the FMJ-Assembler achieves a better or comparable overall assembly quality and requires lower memory use and less CPU time. All these advantages of the FMJ-Assembler indicate that the FMJ-Assembler will be an efficient assembly method in next generation sequencing technology.
ZOOM Lite: next-generation sequencing data mapping and visualization software
Zhang, Zefeng; Lin, Hao; Ma, Bin
2010-01-01
High-throughput next-generation sequencing technologies pose increasing demands on the efficiency, accuracy and usability of data analysis software. In this article, we present ZOOM Lite, a software for efficient reads mapping and result visualization. With a kernel capable of mapping tens of millions of Illumina or AB SOLiD sequencing reads efficiently and accurately, and an intuitive graphical user interface, ZOOM Lite integrates reads mapping and result visualization into a easy to use pipeline on desktop PC. The software handles both single-end and paired-end reads, and can output both the unique mapping result or the top N mapping results for each read. Additionally, the software takes a variety of input file formats and outputs to several commonly used result formats. The software is freely available at http://bioinfor.com/zoom/lite/. PMID:20530531
Screening for SNPs with Allele-Specific Methylation based on Next-Generation Sequencing Data.
Hu, Bo; Ji, Yuan; Xu, Yaomin; Ting, Angela H
2013-05-01
Allele-specific methylation (ASM) has long been studied but mainly documented in the context of genomic imprinting and X chromosome inactivation. Taking advantage of the next-generation sequencing technology, we conduct a high-throughput sequencing experiment with four prostate cell lines to survey the whole genome and identify single nucleotide polymorphisms (SNPs) with ASM. A Bayesian approach is proposed to model the counts of short reads for each SNP conditional on its genotypes of multiple subjects, leading to a posterior probability of ASM. We flag SNPs with high posterior probabilities of ASM by accounting for multiple comparisons based on posterior false discovery rates. Applying the Bayesian approach to the in-house prostate cell line data, we identify 269 SNPs as candidates of ASM. A simulation study is carried out to demonstrate the quantitative performance of the proposed approach.
Guo, Liang; Li, Mingming; Zhang, Heng; Yang, Sen; Chen, Xinghan; Meng, Zining; Lin, Haoran
2016-05-01
Recently, the next-generation sequencing (NGS) technology has become a powerful tool for sequencing the teleost mitochondrial genome (mitogenome). Here, we used this technology to determine the mitogenome of the yellowfin tuna (Thunnus albacares). A total of 41,378 reads were generated by Illumina platform with an average depth of 250×. The mitogenome (16,528 bp in length) contained 37 mitochondrial genes with the similar gene order to other typical teleosts. These mitochondrial genes were encoded on the heavy strand except for ND6 and eight tRNA genes. The result of phylogenetic analysis supported two distinct clades dividing the genus Thunnus, but the tuna species of these two genetic clades were different from that of two recognized subgenus based on anatomical characters and geographical distribution. Our results might help to understand the structure, function, and evolutionary history of the yellowfin tuna mitogenome and also provide valuable new insights for phylogenetic affinity of tuna species.
Mans, Ben J; Pienaar, Ronel; Ratabane, John; Pule, Boitumelo; Latif, Abdalla A
2016-07-01
Molecular classification and systematics of the Theileria is based on the analysis of the 18S rRNA gene. Reverse line blot or conventional sequencing approaches have disadvantages in the study of 18S rRNA diversity and a next-generation 454 sequencing approach was investigated. The 18S rRNA gene was amplified using RLB primers coupled to 96 unique sequence identifiers (MIDs). Theileria positive samples from African buffalo (672) and cattle (480) from southern Africa were combined in batches of 96 and sequenced using the GS Junior 454 sequencer to produce 825711 informative sequences. Sequences were extracted based on MIDs and analysed to identify Theileria genotypes. Genotypes observed in buffalo and cattle were confirmed in the current study, while no new genotypes were discovered. Genotypes showed specific geographic distributions, most probably linked with vector distributions. Host specificity of buffalo and cattle specific genotypes were confirmed and prevalence data as well as relative parasitemia trends indicate preference for different hosts. Mixed infections are common with African buffalo carrying more genotypes compared to cattle. Associative or exclusion co-infection profiles were observed between genotypes that may have implications for speciation and systematics: specifically that more Theileria species may exist in cattle and buffalo than currently recognized. Analysis of primers used for Theileria parva diagnostics indicate that no new genotypes will be amplified by the current primer sets confirming their specificity. T. parva SNP variants that occur in the 18S rRNA hypervariable region were confirmed. A next generation sequencing approach is useful in obtaining comprehensive knowledge regarding 18S rRNA diversity and prevalence for the Theileria, allowing for the assessment of systematics and diagnostic assays based on the 18S gene. Copyright © 2016 Elsevier GmbH. All rights reserved.
Webb, Kristen M; Rosenthal, Benjamin M
2011-01-01
The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution has promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been defined for 12 species and genotypes of parasites in the genus Trichinella, but its adequacy in representing the mitochondrial genome as a whole remains unclear, as the complete coding sequence has been characterized only for Trichinella spiralis. Here, we sought to comprehensively describe the extent and nature of divergence between the mitochondrial genomes of T. spiralis (which poses the most appreciable zoonotic risk owing to its capacity to establish persistent infections in domestic pigs) and Trichinella murrelli (which is the most prevalent species in North American wildlife hosts, but which poses relatively little risk to the safety of pork). Next generation sequencing methodologies and scaffold and de novo assembly strategies were employed. The entire protein-coding region was sequenced (13,917 bp), along with a portion of the highly repetitive non-coding region (1524 bp) of the mitochondrial genome of T. murrelli with a combined average read depth of 250 reads. The accuracy of base calling, estimated from coding region sequence was found to exceed 99.3%. Genome content and gene order was not found to be significantly different from that of T. spiralis. An overall inter-species sequence divergence of 9.5% was estimated. Significant variation was identified when the amount of variation between species at each gene is compared to the average amount of variation between species across the coding region. Next generation sequencing is a highly effective means to obtain previously unknown mitochondrial genome sequence. Particular to parasites, the extremely deep coverage achieved through this method allows for the detection of sequence heterogeneity between the multiple individuals that necessarily comprise such templates. Copyright © 2010 Elsevier B.V. All rights reserved.
2013-01-01
The formalin-fixed, paraffin-embedded (FFPE) biopsy is a challenging sample for molecular assays such as targeted next-generation sequencing (NGS). We compared three methods for FFPE DNA quantification, including a novel PCR assay (‘QFI-PCR’) that measures the absolute copy number of amplifiable DNA, across 165 residual clinical specimens. The results reveal the limitations of commonly used approaches, and demonstrate the value of an integrated workflow using QFI-PCR to improve the accuracy of NGS mutation detection and guide changes in input that can rescue low quality FFPE DNA. These findings address a growing need for improved quality measures in NGS-based patient testing. PMID:24001039
Tablet—next generation sequence assembly visualization
Milne, Iain; Bayer, Micha; Cardle, Linda; Shaw, Paul; Stephen, Gordon; Wright, Frank; Marshall, David
2010-01-01
Summary: Tablet is a lightweight, high-performance graphical viewer for next-generation sequence assemblies and alignments. Supporting a range of input assembly formats, Tablet provides high-quality visualizations showing data in packed or stacked views, allowing instant access and navigation to any region of interest, and whole contig overviews and data summaries. Tablet is both multi-core aware and memory efficient, allowing it to handle assemblies containing millions of reads, even on a 32-bit desktop machine. Availability: Tablet is freely available for Microsoft Windows, Apple Mac OS X, Linux and Solaris. Fully bundled installers can be downloaded from http://bioinf.scri.ac.uk/tablet in 32- and 64-bit versions. Contact: tablet@scri.ac.uk PMID:19965881
High-Throughput Next-Generation Sequencing of Polioviruses
Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.
2016-01-01
ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929
Quail, Michael A; Smith, Miriam; Coupland, Paul; Otto, Thomas D; Harris, Simon R; Connor, Thomas R; Bertoni, Anna; Swerdlow, Harold P; Gu, Yong
2012-07-24
Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent's PGM, Pacific Biosciences' RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. All three fast turnaround sequencers evaluated here were able to generate usable sequence. However there are key differences between the quality of that data and the applications it will support.
Efficient error correction for next-generation sequencing of viral amplicons
2012-01-01
Background Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm PMID:22759430
Efficient error correction for next-generation sequencing of viral amplicons.
Skums, Pavel; Dimitrova, Zoya; Campo, David S; Vaughan, Gilberto; Rossi, Livia; Forbi, Joseph C; Yokosawa, Jonny; Zelikovsky, Alex; Khudyakov, Yury
2012-06-25
Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses.The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm.
Simos, Nikolaos
2017-12-22
Nikolaos Simos of Brookhavenâs Energy Sciences and Technology Department and the National Synchrotron Light Source II Project presents, âExtreme Environments of Next-Generation Energy Systems and Materials: Can They Peacefully Co-Exist?â
Educating the next generation of nature entrepreneurs
Judith C. Jobse; Loes Witteveen; Judith Santegoets; Daan van der Linde
2015-01-01
With this paper, it is illustrated that a focus on entrepreneurship training in the nature and wilderness sector is relevant for diverse organisations and situations. The first curricula on nature entrepreneurship are currently being developed. In this paper the authors describe a project that focusses on educating the next generation of nature entrepreneurs, reflect...
Georgia Tech Vertical Lift Research Center of Excellence
2017-12-14
Technical Project Summaries Task 1.1 (GT-1): Next Generation VABS for More Realistic Modeling of Composite Blades ...Methodology for the Prediction of Rotor Blade Ice Formation and Shedding ..................................................................... 20...software disclosures and technology transfer efforts. Task 1.1 (GT-1): Next Generation VABS for More Realistic Modeling of Composite Blades PIs
Reducing Risk for the Next Generation Nuclear Plant
DOE Office of Scientific and Technical Information (OSTI.GOV)
John M. Beck II; Harold J. Heydt; Emmanuel O. Opare
2010-07-01
The Next Generation Nuclear Plant (NGNP) Project, managed by the Idaho National Laboratory (INL), is directed by the Energy Policy Act of 2005, to research, develop, design, construct, and operate a prototype forth generation nuclear reactor to meet the needs of the 21st Century. As with all large projects developing and deploying new technologies, the NGNP has numerous risks that need to be identified, tracked, mitigated, and reduced in order for successful project completion. A Risk Management Plan (RMP) was created to outline the process the INL is using to manage the risks and reduction strategies for the NGNP Project.more » Integral to the RMP is the development and use of a Risk Management System (RMS). The RMS is a tool that supports management and monitoring of the project risks. The RMS does not only contain a risk register, but other functionality that allows decision makers, engineering staff, and technology researchers to review and monitor the risks as the project matures.« less
Schramm, Chaim A; Sheng, Zizhang; Zhang, Zhenhai; Mascola, John R; Kwong, Peter D; Shapiro, Lawrence
2016-01-01
The rapid advance of massively parallel or next-generation sequencing technologies has made possible the characterization of B cell receptor repertoires in ever greater detail, and these developments have triggered a proliferation of software tools for processing and annotating these data. Of especial interest, however, is the capability to track the development of specific antibody lineages across time, which remains beyond the scope of most current programs. We have previously reported on the use of techniques such as inter- and intradonor analysis and CDR3 tracing to identify transcripts related to an antibody of interest. Here, we present Software for the Ontogenic aNalysis of Antibody Repertoires (SONAR), capable of automating both general repertoire analysis and specialized techniques for investigating specific lineages. SONAR annotates next-generation sequencing data, identifies transcripts in a lineage of interest, and tracks lineage development across multiple time points. SONAR also generates figures, such as identity-divergence plots and longitudinal phylogenetic "birthday" trees, and provides interfaces to other programs such as DNAML and BEAST. SONAR can be downloaded as a ready-to-run Docker image or manually installed on a local machine. In the latter case, it can also be configured to take advantage of a high-performance computing cluster for the most computationally intensive steps, if available. In summary, this software provides a useful new tool for the processing of large next-generation sequencing datasets and the ontogenic analysis of neutralizing antibody lineages. SONAR can be found at https://github.com/scharch/SONAR, and the Docker image can be obtained from https://hub.docker.com/r/scharch/sonar/.
Gangras, Pooja; Dayeh, Daniel M; Mabin, Justin W; Nakanishi, Kotaro; Singh, Guramrit
2018-01-01
Argonaute proteins (AGOs) are loaded with small RNAs as guides to recognize target mRNAs. Since the target specificity heavily depends on the base complementarity between two strands, it is important to identify small guide and long target RNAs bound to AGOs. For this purpose, next-generation sequencing (NGS) technologies have extended our appreciation truly to the nucleotide level. However, the identification of RNAs via NGS from scarce RNA samples remains a challenge. Further, most commercial and published methods are compatible with either small RNAs or long RNAs, but are not equally applicable to both. Therefore, a single method that yields quantitative, bias-free NGS libraries to identify small and long RNAs from low levels of input will be of wide interest. Here, we introduce such a procedure that is based on several modifications of two published protocols and allows robust, sensitive, and reproducible cloning and sequencing of small amounts of RNAs of variable lengths. The method was applied to the identification of small RNAs bound to a purified eukaryotic AGO. Following ligation of a DNA adapter to RNA 3'-end, the key feature of this method is to use the adapter for priming reverse transcription (RT) wherein biotinylated deoxyribonucleotides specifically incorporated into the extended complementary DNA. Such RT products are enriched on streptavidin beads, circularized while immobilized on beads and directly used for PCR amplification. We provide a stepwise guide to generate RNA-Seq libraries, their purification, quantification, validation, and preparation for next-generation sequencing. We also provide basic steps in post-NGS data analyses using Galaxy, an open-source, web-based platform.
gCUP: rapid GPU-based HIV-1 co-receptor usage prediction for next-generation sequencing.
Olejnik, Michael; Steuwer, Michel; Gorlatch, Sergei; Heider, Dominik
2014-11-15
Next-generation sequencing (NGS) has a large potential in HIV diagnostics, and genotypic prediction models have been developed and successfully tested in the recent years. However, albeit being highly accurate, these computational models lack computational efficiency to reach their full potential. In this study, we demonstrate the use of graphics processing units (GPUs) in combination with a computational prediction model for HIV tropism. Our new model named gCUP, parallelized and optimized for GPU, is highly accurate and can classify >175 000 sequences per second on an NVIDIA GeForce GTX 460. The computational efficiency of our new model is the next step to enable NGS technologies to reach clinical significance in HIV diagnostics. Moreover, our approach is not limited to HIV tropism prediction, but can also be easily adapted to other settings, e.g. drug resistance prediction. The source code can be downloaded at http://www.heiderlab.de d.heider@wz-straubing.de. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Ecological and evolutionary genomics of marine photosynthetic organisms.
Coelho, Susana M; Simon, Nathalie; Ahmed, Sophia; Cock, J Mark; Partensky, Frédéric
2013-02-01
Environmental (ecological) genomics aims to understand the genetic basis of relationships between organisms and their abiotic and biotic environments. It is a rapidly progressing field of research largely due to recent advances in the speed and volume of genomic data being produced by next generation sequencing (NGS) technologies. Building on information generated by NGS-based approaches, functional genomic methodologies are being applied to identify and characterize genes and gene systems of both environmental and evolutionary relevance. Marine photosynthetic organisms (MPOs) were poorly represented amongst the early genomic models, but this situation is changing rapidly. Here we provide an overview of the recent advances in the application of ecological genomic approaches to both prokaryotic and eukaryotic MPOs. We describe how these approaches are being used to explore the biology and ecology of marine cyanobacteria and algae, particularly with regard to their functions in a broad range of marine ecosystems. Specifically, we review the ecological and evolutionary insights gained from whole genome and transcriptome sequencing projects applied to MPOs and illustrate how their genomes are yielding information on the specific features of these organisms. © 2012 Blackwell Publishing Ltd.
Zhou, Shuntai; Jones, Corbin; Mieczkowski, Piotr
2015-01-01
ABSTRACT Validating the sampling depth and reducing sequencing errors are critical for studies of viral populations using next-generation sequencing (NGS). We previously described the use of Primer ID to tag each viral RNA template with a block of degenerate nucleotides in the cDNA primer. We now show that low-abundance Primer IDs (offspring Primer IDs) are generated due to PCR/sequencing errors. These artifactual Primer IDs can be removed using a cutoff model for the number of reads required to make a template consensus sequence. We have modeled the fraction of sequences lost due to Primer ID resampling. For a typical sequencing run, less than 10% of the raw reads are lost to offspring Primer ID filtering and resampling. The remaining raw reads are used to correct for PCR resampling and sequencing errors. We also demonstrate that Primer ID reveals bias intrinsic to PCR, especially at low template input or utilization. cDNA synthesis and PCR convert ca. 20% of RNA templates into recoverable sequences, and 30-fold sequence coverage recovers most of these template sequences. We have directly measured the residual error rate to be around 1 in 10,000 nucleotides. We use this error rate and the Poisson distribution to define the cutoff to identify preexisting drug resistance mutations at low abundance in an HIV-infected subject. Collectively, these studies show that >90% of the raw sequence reads can be used to validate template sampling depth and to dramatically reduce the error rate in assessing a genetically diverse viral population using NGS. IMPORTANCE Although next-generation sequencing (NGS) has revolutionized sequencing strategies, it suffers from serious limitations in defining sequence heterogeneity in a genetically diverse population, such as HIV-1 due to PCR resampling and PCR/sequencing errors. The Primer ID approach reveals the true sampling depth and greatly reduces errors. Knowing the sampling depth allows the construction of a model of how to maximize the recovery of sequences from input templates and to reduce resampling of the Primer ID so that appropriate multiplexing can be included in the experimental design. With the defined sampling depth and measured error rate, we are able to assign cutoffs for the accurate detection of minority variants in viral populations. This approach allows the power of NGS to be realized without having to guess about sampling depth or to ignore the problem of PCR resampling, while also being able to correct most of the errors in the data set. PMID:26041299
NASA Technical Reports Server (NTRS)
Patterson, Michael J.; Pencil, Eric J.
2014-01-01
NASAs Evolutionary Xenon Thruster (NEXT) project is developing next generation ion propulsion technologies to enhance the performance and lower the costs of future NASA space science missions. This is being accomplished by producing Engineering Model (EM) and Prototype Model (PM) components, validating these via qualification-level and integrated system testing, and preparing the transition of NEXT technologies to flight system development. This presentation is a follow-up to the NEXT project overviews presented in 2009-2010. It reviews the status of the NEXT project, presents the current system performance characteristics, and describes planned activities in continuing the transition of NEXT technology to a first flight. In 2013 a voluntary decision was made to terminate the long duration test of the NEXT thruster, given the thruster design has exceeded all expectations by accumulating over 50,000 hours of operation to demonstrate around 900 kg of xenon throughput. Besides its promise for upcoming NASA science missions, NEXT has excellent potential for future commercial and international spacecraft applications.
Luo, Li; Zhu, Yun
2012-01-01
Abstract The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T2, collapsing method, multivariate and collapsing (CMC) method, individual χ2 test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets. PMID:22651812
Luo, Li; Zhu, Yun; Xiong, Momiao
2012-06-01
The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T(2), collapsing method, multivariate and collapsing (CMC) method, individual χ(2) test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.
Analysis of selected genes associated with cardiomyopathy by next-generation sequencing.
Szabadosova, Viktoria; Boronova, Iveta; Ferenc, Peter; Tothova, Iveta; Bernasovska, Jarmila; Zigova, Michaela; Kmec, Jan; Bernasovsky, Ivan
2018-02-01
As the leading cause of congestive heart failure, cardiomyopathy represents a heterogenous group of heart muscle disorders. Despite considerable progress being made in the genetic diagnosis of cardiomyopathy by detection of the mutations in the most prevalent cardiomyopathy genes, the cause remains unsolved in many patients. High-throughput mutation screening in the disease genes for cardiomyopathy is now possible because of using target enrichment followed by next-generation sequencing. The aim of the study was to analyze a panel of genes associated with dilated or hypertrophic cardiomyopathy based on previously published results in order to identify the subjects at risk. The method of next-generation sequencing by IlluminaHiSeq 2500 platform was used to detect sequence variants in 16 individuals diagnosed with dilated or hypertrophic cardiomyopathy. Detected variants were filtered and the functional impact of amino acid changes was predicted by computational programs. DNA samples of the 16 patients were analyzed by whole exome sequencing. We identified six nonsynonymous variants that were shown to be pathogenic in all used prediction softwares: rs3744998 (EPG5), rs11551768 (MGME1), rs148374985 (MURC), rs78461695 (PLEC), rs17158558 (RET) and rs2295190 (SYNE1). Two of the analyzed sequence variants had minor allele frequency (MAF)<0.01: rs148374985 (MURC), rs34580776 (MYBPC3). Our data support the potential role of the detected variants in pathogenesis of dilated or hypertrophic cardiomyopathy; however, the possibility that these variants might not be true disease-causing variants but are susceptibility alleles that require additional mutations or injury to cause the clinical phenotype of disease must be considered. © 2017 Wiley Periodicals, Inc.
Trujillano, Daniel; Weiss, Maximilian E R; Schneider, Juliane; Köster, Julia; Papachristos, Efstathios B; Saviouk, Viatcheslav; Zakharkina, Tetyana; Nahavandi, Nahid; Kovacevic, Lejla; Rolfs, Arndt
2015-03-01
Genetic testing for hereditary breast and/or ovarian cancer mostly relies on laborious molecular tools that use Sanger sequencing to scan for mutations in the BRCA1 and BRCA2 genes. We explored a more efficient genetic screening strategy based on next-generation sequencing of the BRCA1 and BRCA2 genes in 210 hereditary breast and/or ovarian cancer patients. We first validated this approach in a cohort of 115 samples with previously known BRCA1 and BRCA2 mutations and polymorphisms. Genomic DNA was amplified using the Ion AmpliSeq BRCA1 and BRCA2 panel. The DNA Libraries were pooled, barcoded, and sequenced using an Ion Torrent Personal Genome Machine sequencer. The combination of different robust bioinformatics tools allowed detection of all previously known pathogenic mutations and polymorphisms in the 115 samples, without detecting spurious pathogenic calls. We then used the same assay in a discovery cohort of 95 uncharacterized hereditary breast and/or ovarian cancer patients for BRCA1 and BRCA2. In addition, we describe the allelic frequencies across 210 hereditary breast and/or ovarian cancer patients of 74 unique definitely and likely pathogenic and uncertain BRCA1 and BRCA2 variants, some of which have not been previously annotated in the public databases. Targeted next-generation sequencing is ready to substitute classic molecular methods to perform genetic testing on the BRCA1 and BRCA2 genes and provides a greater opportunity for more comprehensive testing of at-risk patients. Copyright © 2015 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
The genetic architecture of long QT syndrome: A critical reappraisal.
Giudicessi, John R; Wilde, Arthur A M; Ackerman, Michael J
2018-03-30
Collectively, the completion of the Human Genome Project and subsequent development of high-throughput next-generation sequencing methodologies have revolutionized genomic research. However, the rapid sequencing and analysis of thousands upon thousands of human exomes and genomes has taught us that most genes, including those known to cause heritable cardiovascular disorders such as long QT syndrome, harbor an unexpected background rate of rare, and presumably innocuous, non-synonymous genetic variation. In this Review, we aim to reappraise the genetic architecture underlying both the acquired and congenital forms of long QT syndrome by examining how the clinical phenotype associated with and background genetic variation in long QT syndrome-susceptibility genes impacts the clinical validity of existing gene-disease associations and the variant classification and reporting strategies that serve as the foundation for diagnostic long QT syndrome genetic testing. Copyright © 2018 Elsevier Inc. All rights reserved.
STINGRAY: system for integrated genomic resources and analysis.
Wagner, Glauber; Jardim, Rodrigo; Tschoeke, Diogo A; Loureiro, Daniel R; Ocaña, Kary A C S; Ribeiro, Antonio C B; Emmel, Vanessa E; Probst, Christian M; Pitaluga, André N; Grisard, Edmundo C; Cavalcanti, Maria C; Campos, Maria L M; Mattoso, Marta; Dávila, Alberto M R
2014-03-07
The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. STINGRAY showed to be an easy to use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system could be faster using Sanger data, since the large NGS datasets could potentially slow down the MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/.
STINGRAY: system for integrated genomic resources and analysis
2014-01-01
Background The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. Findings STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. Conclusion STINGRAY showed to be an easy to use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system could be faster using Sanger data, since the large NGS datasets could potentially slow down the MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/. PMID:24606808
Hafler, Brian P
2017-03-01
Inherited retinal dystrophies are a significant cause of vision loss and are characterized by the loss of photoreceptors and the retinal pigment epithelium (RPE). Mutations in approximately 250 genes cause inherited retinal degenerations with a high degree of genetic heterogeneity. New techniques in next-generation sequencing are allowing the comprehensive analysis of all retinal disease genes thus changing the approach to the molecular diagnosis of inherited retinal dystrophies. This review serves to analyze clinical progress in genetic diagnostic testing and implications for retinal gene therapy. A literature search of PubMed and OMIM was conducted to relevant articles in inherited retinal dystrophies. Next-generation genetic sequencing allows the simultaneous analysis of all the approximately 250 genes that cause inherited retinal dystrophies. Reported diagnostic rates range are high and range from 51% to 57%. These new sequencing tools are highly accurate with sensitivities of 97.9% and specificities of 100%. Retinal gene therapy clinical trials are underway for multiple genes including RPE65, ABCA4, CHM, RS1, MYO7A, CNGA3, CNGB3, ND4, and MERTK for which a molecular diagnosis may be beneficial for patients. Comprehensive next-generation genetic sequencing of all retinal dystrophy genes is changing the paradigm for how retinal specialists perform genetic testing for inherited retinal degenerations. Not only are high diagnostic yields obtained, but mutations in genes with novel clinical phenotypes are also identified. In the era of retinal gene therapy clinical trials, identifying specific genetic defects will increasingly be of use to identify patients who may enroll in clinical studies and benefit from novel therapies.
Next-generation sequencing in clinical virology: Discovery of new viruses.
Datta, Sibnarayan; Budhauliya, Raghvendra; Das, Bidisha; Chatterjee, Soumya; Vanlalhmuaka; Veer, Vijay
2015-08-12
Viruses are a cause of significant health problem worldwide, especially in the developing nations. Due to different anthropological activities, human populations are exposed to different viral pathogens, many of which emerge as outbreaks. In such situations, discovery of novel viruses is utmost important for deciding prevention and treatment strategies. Since last century, a number of different virus discovery methods, based on cell culture inoculation, sequence-independent PCR have been used for identification of a variety of viruses. However, the recent emergence and commercial availability of next-generation sequencers (NGS) has entirely changed the field of virus discovery. These massively parallel sequencing platforms can sequence a mixture of genetic materials from a very heterogeneous mix, with high sensitivity. Moreover, these platforms work in a sequence-independent manner, making them ideal tools for virus discovery. However, for their application in clinics, sample preparation or enrichment is necessary to detect low abundance virus populations. A number of techniques have also been developed for enrichment or viral nucleic acids. In this manuscript, we review the evolution of sequencing; NGS technologies available today as well as widely used virus enrichment technologies. We also discuss the challenges associated with their applications in the clinical virus discovery.
Liang, Chanjuan; van Dijk, Jeroen P; Scholtens, Ingrid M J; Staats, Martijn; Prins, Theo W; Voorhuijzen, Marleen M; da Silva, Andrea M; Arisi, Ana Carolina Maisonnave; den Dunnen, Johan T; Kok, Esther J
2014-04-01
The growing number of biotech crops with novel genetic elements increasingly complicates the detection of genetically modified organisms (GMOs) in food and feed samples using conventional screening methods. Unauthorized GMOs (UGMOs) in food and feed are currently identified through combining GMO element screening with sequencing the DNA flanking these elements. In this study, a specific and sensitive qPCR assay was developed for vip3A element detection based on the vip3Aa20 coding sequences of the recently marketed MIR162 maize and COT102 cotton. Furthermore, SiteFinding-PCR in combination with Sanger, Illumina or Pacific BioSciences (PacBio) sequencing was performed targeting the flanking DNA of the vip3Aa20 element in MIR162. De novo assembly and Basic Local Alignment Search Tool searches were used to mimic UGMO identification. PacBio data resulted in relatively long contigs in the upstream (1,326 nucleotides (nt); 95 % identity) and downstream (1,135 nt; 92 % identity) regions, whereas Illumina data resulted in two smaller contigs of 858 and 1,038 nt with higher sequence identity (>99 % identity). Both approaches outperformed Sanger sequencing, underlining the potential for next-generation sequencing in UGMO identification.
Length-independent structural similarities enrich the antibody CDR canonical class model.
Nowak, Jaroslaw; Baker, Terry; Georges, Guy; Kelm, Sebastian; Klostermann, Stefan; Shi, Jiye; Sridharan, Sudharsan; Deane, Charlotte M
2016-01-01
Complementarity-determining regions (CDRs) are antibody loops that make up the antigen binding site. Here, we show that all CDR types have structurally similar loops of different lengths. Based on these findings, we created length-independent canonical classes for the non-H3 CDRs. Our length variable structural clusters show strong sequence patterns suggesting either that they evolved from the same original structure or result from some form of convergence. We find that our length-independent method not only clusters a larger number of CDRs, but also predicts canonical class from sequence better than the standard length-dependent approach. To demonstrate the usefulness of our findings, we predicted cluster membership of CDR-L3 sequences from 3 next-generation sequencing datasets of the antibody repertoire (over 1,000,000 sequences). Using the length-independent clusters, we can structurally classify an additional 135,000 sequences, which represents a ∼20% improvement over the standard approach. This suggests that our length-independent canonical classes might be a highly prevalent feature of antibody space, and could substantially improve our ability to accurately predict the structure of novel CDRs identified by next-generation sequencing.
Alena K. Oliver; Shawn P. Brown; Mac A. Callaham; Ari Jumpponen
2015-01-01
Rare taxa overwhelm metabarcoding data generated using next-generation sequencing (NGS). Low frequency Operational Taxonomic Units (OTUs) may be artifacts generated by PCR-amplification errors resulting from polymerase mispairing. We analyzed two Internal Transcribed Spacer 2 (ITS2) MiSeq libraries generated with proofreading (ThermoScientific Phusion
Clinical sequencing in leukemia with the assistance of artificial intelligence.
Tojo, Arinobu
2017-01-01
Next generation sequencing (NGS) of cancer genomes is now becoming a prerequisite for accurate diagnosis and proper treatment in clinical oncology. Because the genomic regions for NGS expand from a certain set of genes to the whole exome or whole genome, the resulting sequence data becomes incredibly enormous and makes it quite laborious to translate the genomic data into medicine, so-called annotation and curation. We organized a clinical sequencing team and established a bidirectional (bed-to-bench and bench-to-bed) system to integrate clinical and genomic data for hematological malignancies. We also started a collaborative research project with IBM Japan to adopt the artificial intelligence Watson for Genomics (WfG) to the pipeline of medical informatics. Genomic DNA was prepared from malignant as well as normal tissues in each patient and subjected to NGS. Sequence data was analyzed using an in-house semi-automated pipeline in combination with WfG, which was used to identify candidate driver mutations and relevant pathways from which applicable drug information was deduced. Currently, we have analyzed more than 150 patients with hematological disorders, including AML and ALL, and obtained many informative findings. In this presentation, I will introduce some of the achievements we have made so far.
Draft genome sequences of Streptococcus bovis strains ATCC 33317 and JB1
USDA-ARS?s Scientific Manuscript database
We report the draft genome sequences of Streptococcus bovis type strain ATTC 33317 (CVM42251) isolated from cow dung and strain JB1 (CVM42252) isolated from a cow rumen in 1977. Strains were subjected to Next Generation sequencing and the genome sizes are approximately 2 MB and 2.2 MB, respectively....
USDA-ARS?s Scientific Manuscript database
The genomic sequences of low and high passages of U.S. infectious laryngotracheitis (ILT) vaccine strains chicken embryo origin (CEO) and tissue culture origin (TCO) these strains were determined using hybrid next generation sequencing in order to define relevant genomic changes associated with att...