Science.gov

Sample records for generation sequencing platforms

  1. Next-Generation Sequencing Platforms

    NASA Astrophysics Data System (ADS)

    Mardis, Elaine R.

    2013-06-01

    Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

  2. Base-calling for next-generation sequencing platforms.

    PubMed

    Ledergerber, Christian; Dessimoz, Christophe

    2011-09-01

    Next-generation sequencing platforms are dramatically reducing the cost of DNA sequencing. With these technologies, bases are inferred from light intensity signals, a process commonly referred to as base-calling. Thus, understanding and improving the quality of sequence data generated using these approaches are of high interest. Recently, a number of papers have characterized the biases associated with base-calling and proposed methodological improvements. In this review, we summarize recent developments in base-calling approaches for the Illumina and Roche 454 sequencing platforms.
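
    To make the idea of inferring bases from intensity signals concrete, here is a minimal sketch under strong simplifying assumptions: one intensity per channel per cycle, a fixed A, C, G, T channel order, and a naive error estimate. It is not the method of any platform or of the reviewed papers (real base-callers also model effects such as crosstalk and phasing); the function name `call_bases` and the quality heuristic are hypothetical.

```python
import math

BASES = "ACGT"  # assumed channel order; real instruments differ


def call_bases(intensities, max_quality=40):
    """Call one base per cycle from four-channel intensities.

    intensities: list of 4-tuples of non-negative signals ordered A, C, G, T.
    Returns (sequence, qualities). Each quality is a Phred-style score,
    Q = -10 * log10(p_error), computed from a naive (uncalibrated) error estimate.
    """
    min_error = 10 ** (-max_quality / 10)  # floor so Q never exceeds max_quality
    sequence, qualities = [], []
    for cycle in intensities:
        total = sum(cycle)
        if total <= 0:
            # No usable signal in this cycle: emit an ambiguous base.
            sequence.append("N")
            qualities.append(0)
            continue
        best = max(range(4), key=lambda i: cycle[i])
        # Naive estimate: share of the signal NOT in the brightest channel.
        p_error = max(1.0 - cycle[best] / total, min_error)
        sequence.append(BASES[best])
        qualities.append(int(round(-10 * math.log10(p_error))))
    return "".join(sequence), qualities


if __name__ == "__main__":
    print(call_bases([(90, 5, 3, 2), (1, 1, 97, 1), (0, 0, 0, 0)]))
```

    The quality values here only rank calls by signal purity; they are not calibrated error probabilities.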

  3. Toward Complete Bacterial Genome Sequencing Through the Combined Use of Multiple Next-Generation Sequencing Platforms.

    PubMed

    Jeong, Haeyoung; Lee, Dae-Hee; Ryu, Choong-Min; Park, Seung-Hwan

    2016-01-01

    PacBio's long-read sequencing technologies can be successfully used for a complete bacterial genome assembly using recently developed non-hybrid assemblers in the absence of second-generation, high-quality short reads. However, standardized procedures that take into account multiple pre-existing second-generation sequencing platforms are scarce. In addition to Illumina HiSeq and Ion Torrent PGM-based genome sequencing results derived from previous studies, we generated further sequencing data, including from the PacBio RS II platform, and applied various bioinformatics tools to obtain complete genome assemblies for five bacterial strains. Our approach revealed that the hierarchical genome assembly process (HGAP) non-hybrid assembler resulted in nearly complete assemblies at a moderate coverage of ~75x, but that different versions produced non-compatible results requiring post-processing. The other two platforms further improved the PacBio assembly through scaffolding and a final error correction.
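
    The ~75x figure is a mean depth of coverage, which is conventionally the total number of sequenced bases divided by the genome size. A minimal sketch of that arithmetic follows; the genome size and read lengths in the example are illustrative assumptions, not values from the study, and both function names are hypothetical.

```python
import math


def mean_coverage(read_lengths, genome_size):
    """Mean depth of coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def reads_needed(target_coverage, mean_read_length, genome_size):
    """Number of reads of a given mean length needed to reach a target coverage."""
    if mean_read_length <= 0:
        raise ValueError("mean_read_length must be positive")
    return math.ceil(target_coverage * genome_size / mean_read_length)


if __name__ == "__main__":
    # Assumed example: a 5 Mb bacterial genome and 10 kb long reads.
    print(mean_coverage([10_000] * 37_500, 5_000_000))
    print(reads_needed(75, 10_000, 5_000_000))
```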

  4. Use of four next-generation sequencing platforms to determine HIV-1 coreceptor tropism.

    PubMed

    Archer, John; Weber, Jan; Henry, Kenneth; Winner, Dane; Gibson, Richard; Lee, Lawrence; Paxinos, Ellen; Arts, Eric J; Robertson, David L; Mimms, Larry; Quiñones-Mateu, Miguel E

    2012-01-01

    HIV-1 coreceptor tropism assays are required to rule out the presence of CXCR4-tropic (non-R5) viruses prior to treatment with CCR5 antagonists. Phenotypic (e.g., Trofile™, Monogram Biosciences) and genotypic (e.g., population sequencing linked to bioinformatic algorithms) assays are the most widely used. Although several next-generation sequencing (NGS) platforms are available, to date all published deep sequencing HIV-1 tropism studies have used the 454™ Life Sciences/Roche platform. In this study, HIV-1 co-receptor usage was predicted for twelve patients scheduled to start a maraviroc-based antiretroviral regimen. The V3 region of the HIV-1 env gene was sequenced using four NGS platforms: 454™, PacBio® RS (Pacific Biosciences), Illumina®, and Ion Torrent™ (Life Technologies). Cross-platform variation was evaluated, including number of reads, read length and error rates. HIV-1 tropism was inferred using Geno2Pheno, Web PSSM, and the 11/24/25 rule and compared with Trofile™ and virologic response to antiretroviral therapy. Error rates related to insertions/deletions (indels) and nucleotide substitutions introduced by the four NGS platforms were low compared to the actual HIV-1 sequence variation. Each platform detected all major virus variants within the HIV-1 population with similar frequencies. Identification of non-R5 viruses was comparable among the four platforms, with minor differences attributable to the algorithms used to infer HIV-1 tropism. All NGS platforms showed similar concordance with virologic response to the maraviroc-based regimen (75% to 80% range depending on the algorithm used), compared to Trofile (80%) and population sequencing (70%). In conclusion, all four NGS platforms were able to detect minority non-R5 variants at comparable levels, suggesting that any NGS-based method can be used to predict HIV-1 coreceptor usage.
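
    Of the three inference methods named above, the 11/24/25 rule is simple enough to sketch. As commonly stated, it flags a virus as non-R5 when a positively charged residue (lysine or arginine) occupies position 11, 24, or 25 of the V3 loop. The sketch below assumes an ungapped, already aligned 35-residue V3 amino acid sequence with 1-based positions; the function name and the example sequences are illustrative assumptions, and the study's actual pipeline is not described in the abstract.

```python
BASIC_RESIDUES = {"K", "R"}  # lysine and arginine


def predict_tropism_11_24_25(v3, positions=(11, 24, 25)):
    """Apply the 11/24/25 rule to an aligned V3 amino acid sequence.

    Returns (call, hits): call is "non-R5" if any checked position carries
    a basic residue, else "R5"; hits lists the 1-based positions that did.
    """
    if len(v3) < max(positions):
        raise ValueError("V3 sequence is shorter than the positions checked")
    hits = [p for p in positions if v3[p - 1].upper() in BASIC_RESIDUES]
    return ("non-R5" if hits else "R5"), hits


if __name__ == "__main__":
    # Illustrative 35-residue V3-like sequences (assumed, not study data).
    r5_like = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
    x4_like = "CTRPNNNTRKRIHIGPGRAFYTTGEIIGDIRQAHC"  # arginine at position 11
    print(predict_tropism_11_24_25(r5_like))
    print(predict_tropism_11_24_25(x4_like))
```

    Real V3 reads contain insertions and deletions, so positions must be assigned by alignment to a reference before a rule like this can be applied.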

  5. Different next generation sequencing platforms produce different microbial profiles and diversity in cystic fibrosis sputum.

    PubMed

    Hahn, Andrea; Sanyal, Amit; Perez, Geovanny F; Colberg-Poley, Anamaris M; Campos, Joseph; Rose, Mary C; Pérez-Losada, Marcos

    2016-11-01

    Cystic fibrosis (CF) is an autosomal recessive disease characterized by recurrent lung infections. Studies of the lung microbiome have shown an association between decreasing diversity and progressive disease. 454 pyrosequencing has frequently been used to study the lung microbiome in CF, but will no longer be supported. We sought to identify the benefits and drawbacks of using two state-of-the-art next generation sequencing (NGS) platforms, MiSeq and PacBio RSII, to characterize the CF lung microbiome. Each has its advantages and limitations. Twelve samples of extracted bacterial DNA were sequenced on both MiSeq and PacBio NGS platforms. DNA was amplified for the V4 region of the 16S rRNA gene and libraries were sequenced on the MiSeq sequencing platform, while the full 16S rRNA gene was sequenced on the PacBio RSII sequencing platform. Raw FASTQ files generated by the MiSeq and PacBio platforms were processed in mothur v1.35.1. There was extreme discordance in alpha-diversity of the CF lung microbiome when using the two platforms. Because of its depth of coverage, sequencing of the 16S rRNA V4 gene region using MiSeq allowed for the observation of many more operational taxonomic units (OTUs) and higher Chao1 and Shannon indices than the PacBio RSII. Interestingly, several patients in our cohort had Escherichia, an unusual pathogen in CF. Also, likely because of its coverage of the complete 16S rRNA gene, only PacBio RSII was able to identify Burkholderia, an important CF pathogen. When comparing microbiome diversity in clinical samples from CF patients using 16S sequences, MiSeq and PacBio NGS platforms may generate different results in microbial community composition and structure. It may be necessary to use different platforms when trying to correctly identify dominant pathogens versus measuring alpha-diversity estimates, and it would be important to use the same platform for comparisons to minimize errors in interpretation. Copyright © 2016 Elsevier B.V. 
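
    The alpha-diversity measures compared in this record, the Shannon and Chao1 indices, can both be computed from a vector of per-OTU read counts. The following is a minimal sketch using the textbook formulas (natural-log Shannon entropy and the classic Chao1 estimator); the study used mothur v1.35.1, whose exact variants and settings may differ, and the example counts are made up.

```python
import math
from collections import Counter


def shannon_index(otu_counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    counts = [c for c in otu_counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts)


def chao1(otu_counts):
    """Classic Chao1 richness estimate from singleton and doubleton counts."""
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)              # observed number of OTUs
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]        # OTUs seen exactly once / exactly twice
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Fallback commonly used when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2


if __name__ == "__main__":
    sample = [10, 5, 2, 2, 1, 1, 1]  # made-up OTU counts
    print(shannon_index(sample), chao1(sample))
```

    Both indices depend on sequencing depth, which is one reason the abstract recommends using the same platform when comparing samples.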

  6. Sequencing of BAC pools by different next generation sequencing platforms and strategies.

    PubMed

    Taudien, Stefan; Steuernagel, Burkhard; Ariyadasa, Ruvini; Schulte, Daniela; Schmutzer, Thomas; Groth, Marco; Felder, Marius; Petzold, Andreas; Scholz, Uwe; Mayer, Klaus Fx; Stein, Nils; Platzer, Matthias

    2011-10-14

    Next generation sequencing of BACs is a viable option for deciphering the sequence of even large and highly repetitive genomes. In order to optimize this strategy, we examined the influence of read length on the quality of Roche/454 sequence assemblies, to what extent Illumina/Solexa mate pairs (MPs) improve the assemblies by scaffolding and whether barcoding of BACs is dispensable. Sequencing four BACs with both FLX and Titanium technologies revealed similar sequencing accuracy, but showed that the longer Titanium reads produce considerably fewer misassemblies and gaps. The 454 assemblies of 96 barcoded BACs were improved by scaffolding 79% of the total contig length with MPs from a non-barcoded library. Assembly of the unmasked 454 sequences without separation by barcodes revealed chimeric contig formation to be a major problem, encompassing 47% of the total contig length. Masking the sequences reduced this fraction to 24%. Optimal BAC pool sequencing should be based on the longest available reads, with barcoding essential for a comprehensive assessment of both repetitive and non-repetitive sequence information. When interest is restricted to non-repetitive regions and repeats are masked prior to assembly, barcoding is non-essential. In any case, the assemblies can be improved considerably by scaffolding with non-barcoded BAC pool MPs.

  7. FLEXBAR—Flexible Barcode and Adapter Processing for Next-Generation Sequencing Platforms

    PubMed Central

    Dodt, Matthias; Roehr, Johannes T.; Ahmed, Rina; Dieterich, Christoph

    2012-01-01

    Quantitative and systems biology approaches benefit from the unprecedented depth of next-generation sequencing. A typical experiment yields millions of short reads, which oftentimes carry particular sequence tags. These tags may (a) be specific to the sequencing platform and library construction method (e.g., adapter sequences); (b) have been introduced by experimental design (e.g., sample barcodes); or (c) constitute some biological signal (e.g., splice leader sequences in nematodes). Our software FLEXBAR enables accurate recognition, sorting and trimming of sequence tags with maximal flexibility, based on exact overlap sequence alignment. The software supports data formats from all current sequencing platforms, including color-space reads. FLEXBAR maintains read pairings and processes separate barcode reads on demand. Our software facilitates the fine-grained adjustment of sequence tag detection parameters and search regions. FLEXBAR is multi-threaded and combines speed with precision. Even complex read processing scenarios might be executed with a single command line call. We demonstrate the utility of the software in terms of read mapping applications, library demultiplexing and splice leader detection. FLEXBAR and additional information are available for academic use from the website: http://sourceforge.net/projects/flexbar/. PMID:24832523
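
    As a rough illustration of what trimming a sequence tag by overlap involves, the sketch below removes a 3' adapter by scanning for the leftmost read position at which the rest of the read matches the start of the adapter within a mismatch tolerance. This is a simplified ungapped comparison written for this listing, not FLEXBAR's alignment algorithm or its command-line interface; the function name, parameters, and default thresholds are assumptions.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3, max_mismatch_rate=0.1):
    """Trim a 3' adapter (or a prefix of it) from the end of a read.

    For each start position, the read tail is compared with the adapter
    prefix of the same length (no gaps). The first position whose mismatch
    count is within the allowed rate is treated as the adapter start.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(read) - start, len(adapter))
        if overlap < min_overlap:
            break  # adapter itself is shorter than the required overlap
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= max_mismatch_rate * overlap:
            return read[:start]
    return read  # no adapter found: leave the read untouched


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example adapter sequence
    print(trim_3prime_adapter("ACGTACGTTTAGATCGGAAG", adapter))
    print(trim_3prime_adapter("ACGTACGT", adapter))
```

    A short `min_overlap` trims more aggressively but raises the chance of removing genuine sequence that happens to resemble the adapter start, which is the kind of trade-off the tag detection parameters mentioned above are meant to control.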

  8. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses

    PubMed Central

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-01-01

    The upcoming deluge of genome data presents significant challenges: storing and processing large-scale genome data, providing easy access to biomedical analysis tools, and sharing and retrieving data efficiently. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach. PMID:24462600

  9. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

    PubMed

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-06-01

    The upcoming deluge of genome data presents significant challenges: storing and processing large-scale genome data, providing easy access to biomedical analysis tools, and sharing and retrieving data efficiently. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach.

  10. A Microfluidic DNA Library Preparation Platform for Next-Generation Sequencing

    PubMed Central

    Sinha, Anupama; Bent, Zachary W.; Solberg, Owen D.; Williams, Kelly P.; Langevin, Stanley A.; Renzi, Ronald F.; Van De Vreugde, James L.; Meagher, Robert J.; Schoeniger, Joseph S.; Lane, Todd W.; Branda, Steven S.; Bartsch, Michael S.; Patel, Kamlesh D.

    2013-01-01

    Next-generation sequencing (NGS) is emerging as a powerful tool for elucidating genetic information for a wide range of applications. Unfortunately, the surging popularity of NGS has not yet been accompanied by an improvement in automated techniques for preparing formatted sequencing libraries. To address this challenge, we have developed a prototype microfluidic system for preparing sequencer-ready DNA libraries for analysis by Illumina sequencing. Our system combines droplet-based digital microfluidic (DMF) sample handling with peripheral modules to create a fully-integrated, sample-in library-out platform. In this report, we use our automated system to prepare NGS libraries from samples of human and bacterial genomic DNA. E. coli libraries prepared on-device from 5 ng of total DNA yielded excellent sequence coverage over the entire bacterial genome, with >99% alignment to the reference genome, even genome coverage, and good quality scores. Furthermore, we produced a de novo assembly on a previously unsequenced multi-drug resistant Klebsiella pneumoniae strain BAA-2146 (KpnNDM). The new method described here is fast, robust, scalable, and automated. Our device for library preparation will assist in the integration of NGS technology into a wide variety of laboratories, including small research laboratories and clinical laboratories. PMID:23894387

  11. A microfluidic DNA library preparation platform for next-generation sequencing.

    PubMed

    Kim, Hanyoup; Jebrail, Mais J; Sinha, Anupama; Bent, Zachary W; Solberg, Owen D; Williams, Kelly P; Langevin, Stanley A; Renzi, Ronald F; Van De Vreugde, James L; Meagher, Robert J; Schoeniger, Joseph S; Lane, Todd W; Branda, Steven S; Bartsch, Michael S; Patel, Kamlesh D

    2013-01-01

    Next-generation sequencing (NGS) is emerging as a powerful tool for elucidating genetic information for a wide range of applications. Unfortunately, the surging popularity of NGS has not yet been accompanied by an improvement in automated techniques for preparing formatted sequencing libraries. To address this challenge, we have developed a prototype microfluidic system for preparing sequencer-ready DNA libraries for analysis by Illumina sequencing. Our system combines droplet-based digital microfluidic (DMF) sample handling with peripheral modules to create a fully-integrated, sample-in library-out platform. In this report, we use our automated system to prepare NGS libraries from samples of human and bacterial genomic DNA. E. coli libraries prepared on-device from 5 ng of total DNA yielded excellent sequence coverage over the entire bacterial genome, with >99% alignment to the reference genome, even genome coverage, and good quality scores. Furthermore, we produced a de novo assembly on a previously unsequenced multi-drug resistant Klebsiella pneumoniae strain BAA-2146 (KpnNDM). The new method described here is fast, robust, scalable, and automated. Our device for library preparation will assist in the integration of NGS technology into a wide variety of laboratories, including small research laboratories and clinical laboratories.

  12. Clinical analysis of genome next-generation sequencing data using the Omicia platform

    PubMed Central

    Coonrod, Emily M; Margraf, Rebecca L; Russell, Archie; Voelkerding, Karl V; Reese, Martin G

    2013-01-01

    Aims: Next-generation sequencing is being implemented in the clinical laboratory environment for the purposes of candidate causal variant discovery in patients affected with a variety of genetic disorders. The successful implementation of this technology for diagnosing genetic disorders requires a rapid, user-friendly method to annotate variants and generate short lists of clinically relevant variants of interest. This report describes Omicia’s Opal platform, a new software tool designed for variant discovery and interpretation in a clinical laboratory environment. The software allows clinical scientists to process, analyze, interpret and report on personal genome files. Materials & Methods: To demonstrate the software, the authors describe the interactive use of the system for the rapid discovery of disease-causing variants using three cases. Results & Conclusion: Here, the authors show the features of the Opal system and their use in uncovering variants of clinical significance. PMID:23895124

  13. StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics

    PubMed Central

    Ramirez-Gonzalez, Ricardo H.; Leggett, Richard M.; Waite, Darren; Thanki, Anil; Drou, Nizar; Caccamo, Mario; Davey, Robert

    2014-01-01

    Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow one run to contain hundreds of different samples. With such data comes a significant challenge to understand its quality and to understand how the quality and yield are changing across instruments and over time. As well as the desire to understand historical data, sequencing centres often have a duty to provide clear summaries of individual run performance to collaborators or customers. We present StatsDB, an open-source software package for storage and analysis of next generation sequencing run metrics. The system has been designed for incorporation into a primary analysis pipeline, either at the programmatic level or via integration into existing user interfaces. Statistics are stored in an SQL database and APIs provide the ability to store and access the data while abstracting the underlying database design. This abstraction allows simpler, wider querying across multiple fields than is possible by the manual steps and calculation required to dissect individual reports, e.g. “provide metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month”. The software is supplied with modules for storage of statistics from FastQC, a commonly used tool for analysis of sequence reads, but the open nature of the database schema means it can be easily adapted to other tools. Currently at The Genome Analysis Centre (TGAC), reports are accessed through our LIMS system or through a standalone GUI tool, but the API and supplied examples make it easy to develop custom reports and to interface with other packages. PMID:24627795
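
    The example query quoted in this abstract (a metric for barcode X, on sequencer A, within the last month) maps naturally onto a filtered SQL aggregate. The sketch below shows the general shape of such a query using an in-memory SQLite database. The table `run_metric`, its columns, the metric name `gc_percent`, and all rows are hypothetical; this is not the StatsDB schema or API.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical run-metrics table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE run_metric (
               run_id TEXT, instrument TEXT, barcode TEXT,
               run_date TEXT,   -- ISO 8601 date, so string comparison sorts by time
               metric TEXT, value REAL)"""
    )
    rows = [
        ("run1", "A", "X", "2014-01-05", "gc_percent", 40.0),
        ("run2", "A", "X", "2014-01-20", "gc_percent", 44.0),
        ("run3", "A", "X", "2013-11-01", "gc_percent", 60.0),  # too old
        ("run4", "B", "X", "2014-01-10", "gc_percent", 50.0),  # other sequencer
        ("run5", "A", "Y", "2014-01-12", "gc_percent", 30.0),  # other barcode
    ]
    conn.executemany("INSERT INTO run_metric VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def mean_metric(conn, metric, instrument, barcode, since):
    """Average a metric for one barcode on one instrument since a given date."""
    cur = conn.execute(
        """SELECT AVG(value) FROM run_metric
           WHERE metric = ? AND instrument = ? AND barcode = ? AND run_date >= ?""",
        (metric, instrument, barcode, since),
    )
    return cur.fetchone()[0]  # None when no rows match


if __name__ == "__main__":
    conn = build_demo_db()
    print(mean_metric(conn, "gc_percent", "A", "X", "2014-01-01"))
```

    Wrapping the query in a function mirrors, in a very small way, the abstraction the abstract describes: callers ask for a metric by field values without needing to know the table layout.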

  14. Performance comparison of next-generation sequencing platforms for determining HIV-1 coreceptor use

    PubMed Central

    Raymond, Stéphanie; Nicot, Florence; Jeanne, Nicolas; Delfour, Olivier; Carcenac, Romain; Lefebvre, Caroline; Cazabat, Michelle; Sauné, Karine; Delobel, Pierre; Izopet, Jacques

    2017-01-01

    The coreceptor used by HIV-1 must be determined before a CCR5 antagonist, part of the arsenal of antiretroviral drugs, is prescribed because viruses that enter cells using the CXCR4 coreceptor are responsible for treatment failure. HIV-1 tropism is also correlated with disease progression and so must be determined for virological studies. Tropism can be determined by next-generation sequencing (NGS), but not all of these new technologies have been fully validated for use in clinical practice. The Illumina NGS technology is used in many laboratories but its ability to predict HIV-1 tropism has not been evaluated, while the 454 GS-Junior (Roche) is used for routine diagnosis. The genotypic prediction of HIV-1 tropism is based on sequencing the V3 region and interpreting the results with an appropriate algorithm. We compared the performances of the MiSeq (Illumina) and 454 GS-Junior (Roche) systems with a reference phenotypic assay. We used clinical samples for the NGS tropism predictions and assessed their ability to quantify CXCR4-using variants. The data show that the Illumina platform can be used to detect minor CXCR4-using variants in clinical practice but technical optimizations are needed to improve quantification. PMID:28186189

  15. Next-Generation Sequencing Workflow for NSCLC Critical Samples Using a Targeted Sequencing Approach by Ion Torrent PGM™ Platform

    PubMed Central

    Vanni, Irene; Coco, Simona; Truini, Anna; Rusmini, Marta; Dal Bello, Maria Giovanna; Alama, Angela; Banelli, Barbara; Mora, Marco; Rijavec, Erika; Barletta, Giulia; Genova, Carlo; Biello, Federica; Maggioni, Claudia; Grossi, Francesco

    2015-01-01

    Next-generation sequencing (NGS) is a cost-effective technology capable of screening several genes simultaneously; however, its application in a clinical context requires an established workflow to acquire reliable sequencing results. Here, we report an optimized NGS workflow analyzing 22 lung cancer-related genes to sequence critical samples such as DNA from formalin-fixed paraffin-embedded (FFPE) blocks and circulating free DNA (cfDNA). Snap frozen and matched FFPE gDNA from 12 non-small cell lung cancer (NSCLC) patients, whose gDNA fragmentation status was previously evaluated using a multiplex PCR-based quality control, were successfully sequenced with Ion Torrent PGM™. The robust bioinformatic pipeline allowed us to correctly call both Single Nucleotide Variants (SNVs) and indels with a detection limit of 5%, achieving 100% specificity and 96% sensitivity. This workflow was also validated in 13 FFPE NSCLC biopsies. Furthermore, a specific protocol for low input gDNA capable of producing good sequencing data with high coverage, high uniformity, and a low error rate was also optimized. In conclusion, we demonstrate the feasibility of obtaining gDNA from FFPE samples suitable for NGS by performing appropriate quality controls. The optimized workflow, capable of screening low input gDNA, highlights NGS as a potential tool in the detection, disease monitoring, and treatment of NSCLC. PMID:26633390
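
    The performance figures in this abstract (a 5% detection limit, 100% specificity, 96% sensitivity) rest on standard definitions that are easy to state in code. The sketch below uses the usual formulas for sensitivity, specificity, and variant allele frequency; the read and call counts in the example are invented for illustration and are not the study's data, and the function names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real variants that were called: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-variant sites correctly left uncalled: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


def variant_allele_frequency(alt_reads, total_reads):
    """Share of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def above_detection_limit(alt_reads, total_reads, limit=0.05):
    """True if the variant allele frequency reaches the detection limit."""
    return variant_allele_frequency(alt_reads, total_reads) >= limit


if __name__ == "__main__":
    # Invented counts chosen only to reproduce the reported percentages.
    print(sensitivity(24, 1), specificity(50, 0))
    print(above_detection_limit(6, 100), above_detection_limit(4, 100))
```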

  16. A platform for leveraging next generation sequencing for routine microbiology and public health use.

    PubMed

    Rusu, Laura I; Wyres, Kelly L; Reumann, Matthias; Queiroz, Carlos; Bojovschi, Alexe; Conway, Tom; Garg, Saurabh; Edwards, David J; Hogg, Geoff; Holt, Kathryn E

    2015-01-01

    Even with the advent of next-generation sequencing (NGS) technologies, which have revolutionised the field of bacterial genomics in recent years, a major barrier still exists to the implementation of NGS for routine microbiological use (in public health and clinical microbiology laboratories). Such routine use would make a big difference to investigations of pathogen transmission and prevention/control of (sometimes lethal) infections. The inherent complexity and high frequency of data analyses on very large sets of bacterial DNA sequence data, the ability to ensure data provenance and automatically track and log all analyses for audit purposes, the need for quick and accurate results, together with an essential user-friendly interface for regular non-technical laboratory staff, are all critical requirements for routine use in a public health setting. No current system meets all these requirements in an integrated manner. In this paper, we describe a system for sequence analysis and interpretation that is highly automated and tackles the issues raised earlier, and that is designed for use in diagnostic laboratories by healthcare workers with no specialist bioinformatics knowledge.

  17. Full-length novel MHC class I allele discovery by next-generation sequencing: two platforms are better than one

    PubMed Central

    Dudley, Dawn M.; Karl, Julie A.; Creager, Hannah M.; Bohn, Patrick S.; Wiseman, Roger W.; O'Connor, David H.

    2013-01-01

    Deep sequencing has revolutionized major histocompatibility complex (MHC) class I analysis of nonhuman primates by enabling high-throughput, economical, and comprehensive genotyping. Full-length MHC class I cDNA sequences, which are required to generate reagents such as MHC:peptide tetramers, cannot be directly obtained by short read deep sequencing. We combined data from two next-generation sequencing platforms to discover novel full-length MHC class I mRNA/cDNA transcripts in Chinese rhesus macaques. We first genotyped macaques by Roche/454 pyrosequencing using a 530 bp amplicon spanning the densely polymorphic exons 2 through 4 of the MHC class I loci that encode the peptide-binding region. We then mapped short paired-end 250 bp Illumina sequence reads spanning the full-length transcript to each 530 bp amplicon at high stringency and used paired-end information to reconstruct full-length allele sequences. We characterized 65 full-length sequences from 6 Chinese rhesus macaques. Overall, approximately 70% of the alleles distinguished in these 6 animals contained new sequence information, including 29 novel transcripts. The flexibility of this approach should make full-length MHC class I allele genotyping accessible for any nonhuman primate population of interest. We are currently optimizing this method for full-length characterization of other highly polymorphic, duplicated loci such as the MHC class II DRB and killer immunoglobulin-like receptors. We anticipate that this method will facilitate rapid expansion and near completion of sequence libraries of polymorphic loci, such as MHC class I, within a few years. PMID:24241691

  18. Full-length novel MHC class I allele discovery by next-generation sequencing: two platforms are better than one.

    PubMed

    Dudley, Dawn M; Karl, Julie A; Creager, Hannah M; Bohn, Patrick S; Wiseman, Roger W; O'Connor, David H

    2014-01-01

    Deep sequencing has revolutionized major histocompatibility complex (MHC) class I analysis of nonhuman primates by enabling high-throughput, economical, and comprehensive genotyping. Full-length MHC class I cDNA sequences, which are required to generate reagents such as MHC-peptide tetramers, cannot be directly obtained by short read deep sequencing. We combined data from two next-generation sequencing platforms to discover novel full-length MHC class I mRNA/cDNA transcripts in Chinese rhesus macaques. We first genotyped macaques by Roche/454 pyrosequencing using a 530-bp amplicon spanning the densely polymorphic exons 2 through 4 of the MHC class I loci that encode the peptide-binding region. We then mapped short paired-end 250-bp Illumina sequence reads spanning the full-length transcript to each 530-bp amplicon at high stringency and used paired-end information to reconstruct full-length allele sequences. We characterized 65 full-length sequences from six Chinese rhesus macaques. Overall, approximately 70% of the alleles distinguished in these six animals contained new sequence information, including 29 novel transcripts. The flexibility of this approach should make full-length MHC class I allele genotyping accessible for any nonhuman primate population of interest. We are currently optimizing this method for full-length characterization of other highly polymorphic, duplicated loci such as the MHC class II DRB and killer immunoglobulin-like receptors. We anticipate that this method will facilitate rapid expansion and near completion of sequence libraries of polymorphic loci, such as MHC class I, within a few years.

  19. Next-generation sequencing: Application of a novel platform to analyze atypical iron disorders.

    PubMed

    McDonald, Cameron J; Ostini, Lesa; Wallace, Daniel F; Lyons, Alison; Crawford, Darrell H G; Subramaniam, V Nathan

    2015-11-01

    The development of targeted next-generation sequencing (NGS) applications now promises to be a clinically viable option for the diagnosis of rare disorders. This approach is proving to have significant utility where standardized testing has failed to identify the underlying molecular basis of disease. We have developed a unique targeted NGS panel for the systematic sequence-based analysis of atypical iron disorders. We report the analysis of 39 genes associated with iron regulation in eight cases of atypical iron dysregulation; in five of these cases we identified the definitive causative mutation, and in a sixth a possible causative mutation. We further provide a molecular and cellular characterization study of one of these mutations (TFR2, p.I529N) in a familial case as proof of principle. Cellular analysis of the mutant protein indicates that this amino acid substitution affects the localization of the protein, which results in its retention in the endoplasmic reticulum and thus failure to function at the cell surface. Our unique NGS panel presents a rapid and cost-efficient approach to identify the underlying genetic cause in cases of atypical iron homeostasis disorders. Copyright © 2015 European Association for the Study of the Liver. Published by Elsevier B.V. All rights reserved.

  20. Comparison of Next-Generation Sequencing Panels and Platforms for Detection and Verification of Somatic Tumor Variants for Clinical Diagnostics.

    PubMed

    Misyura, Maksym; Zhang, Tong; Sukhai, Mahadeo A; Thomas, Mariam; Garg, Swati; Kamel-Reid, Suzanne; Stockley, Tracy L

    2016-11-01

    Use of next-generation sequencing to detect somatic variants in DNA extracted from formalin-fixed, paraffin-embedded tumor tissues poses a challenge for clinical molecular diagnostic laboratories because of variable DNA quality and quantity, and the potential to detect low allele frequency somatic variants difficult to verify by non-next-generation sequencing methods. We evaluated somatic variant detection performance of the MiSeq and Ion Proton benchtop sequencers using two commercially available panels, the TruSeq Amplicon Cancer Panel and the AmpliSeq Cancer Hotspot Panel Version 2. Both the MiSeq-TruSeq Amplicon Cancer Panel and Ion Proton-AmpliSeq Cancer Hotspot Panel Version 2 were comparable in terms of detection of somatic variants and allele frequency determination using DNA extracted from tumor tissue. Concordance was 100% between the panels for detection of somatic variants in genomic regions tested by both panels, including 27 variants present at low somatic allele frequency (<15%). Use of both the MiSeq and Ion Proton platforms in a combined workflow enabled detection of potentially actionable variants with importance for patient diagnosis, prognosis, or treatment in 49% (305/621) of cases. Overall, a combined workflow using both platforms enabled successful molecular profiling of 96% (621/644) of tumor samples, and provided an approach for verification of somatic variants not amenable to verification by Sanger sequencing (<15% variant allele frequency). Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  1. Multi-Platform Next-Generation Sequencing of the Domestic Turkey (Meleagris gallopavo): Genome Assembly and Analysis

    PubMed Central

    Aslam, Luqman; Beal, Kathryn; Ann Blomberg, Le; Bouffard, Pascal; Burt, David W.; Crasta, Oswald; Crooijmans, Richard P. M. A.; Cooper, Kristal; Coulombe, Roger A.; De, Supriyo; Delany, Mary E.; Dodgson, Jerry B.; Dong, Jennifer J.; Evans, Clive; Frederickson, Karin M.; Flicek, Paul; Florea, Liliana; Folkerts, Otto; Groenen, Martien A. M.; Harkins, Tim T.; Herrero, Javier; Hoffmann, Steve; Megens, Hendrik-Jan; Jiang, Andrew; de Jong, Pieter; Kaiser, Pete; Kim, Heebal; Kim, Kyu-Won; Kim, Sungwon; Langenberger, David; Lee, Mi-Kyung; Lee, Taeheon; Mane, Shrinivasrao; Marcais, Guillaume; Marz, Manja; McElroy, Audrey P.; Modise, Thero; Nefedov, Mikhail; Notredame, Cédric; Paton, Ian R.; Payne, William S.; Pertea, Geo; Prickett, Dennis; Puiu, Daniela; Qioa, Dan; Raineri, Emanuele; Ruffier, Magali; Salzberg, Steven L.; Schatz, Michael C.; Scheuring, Chantel; Schmidt, Carl J.; Schroeder, Steven; Searle, Stephen M. J.; Smith, Edward J.; Smith, Jacqueline; Sonstegard, Tad S.; Stadler, Peter F.; Tafer, Hakim; Tu, Zhijian (Jake); Van Tassell, Curtis P.; Vilella, Albert J.; Williams, Kelly P.; Yorke, James A.; Zhang, Liqing; Zhang, Hong-Bin; Zhang, Xiaojun; Zhang, Yang; Reed, Kent M.

    2010-01-01

    A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest. PMID:20838655

  2. CIPHER: a flexible and extensive workflow platform for integrative next-generation sequencing data analysis and genomic regulatory element prediction.

    PubMed

    Guzman, Carlos; D'Orso, Iván

    2017-08-08

    Next-generation sequencing (NGS) approaches are commonly used to identify key regulatory networks that drive transcriptional programs. Although these technologies are frequently used in biological studies, NGS data analysis remains a challenging, time-consuming, and often irreproducible process. Therefore, there is a need for a comprehensive and flexible workflow platform that can accelerate data processing and analysis so more time can be spent on functional studies. We have developed an integrative, stand-alone workflow platform, named CIPHER, for the systematic analysis of several commonly used NGS datasets including ChIP-seq, RNA-seq, MNase-seq, DNase-seq, GRO-seq, and ATAC-seq data. CIPHER implements various open source software packages, in-house scripts, and Docker containers to analyze and process single-end and paired-end datasets. CIPHER's pipelines conduct extensive quality and contamination control checks, as well as comprehensive downstream analysis. A typical CIPHER workflow includes: (1) raw sequence evaluation, (2) read trimming and adapter removal, (3) read mapping and quality filtering, (4) visualization track generation, and (5) extensive quality control assessment. Furthermore, CIPHER conducts downstream analysis such as: narrow and broad peak calling, peak annotation, and motif identification for ChIP-seq; differential gene expression analysis for RNA-seq; nucleosome positioning for MNase-seq; DNase hypersensitive site mapping, site annotation, and motif identification for DNase-seq; analysis of nascent transcription from Global Run-On (GRO-seq) data; and characterization of chromatin accessibility from ATAC-seq datasets. In addition, CIPHER contains an "analysis" mode that completes complex bioinformatics tasks such as enhancer discovery and provides functions to integrate various datasets together. Using public and simulated data, we demonstrate that CIPHER is an efficient and comprehensive workflow platform that can analyze several NGS

  3. A comprehensive transcriptome assembly of pigeonpea (Cajanus cajan L.) using Sanger and second-generation sequencing platforms

    USDA-ARS's Scientific Manuscript database

    A comprehensive transcriptome assembly for pigeonpea has been developed by analyzing 128.9 million short Illumina GA IIx single-end reads, 2.19 million single-end FLX/454 reads, and 18,353 Sanger expressed sequence tags (ESTs) from more than 16 genotypes. The resultant transcriptome assembly, refer...

  4. A two-dimensional pooling strategy for rare variant detection on next-generation sequencing platforms.

    PubMed

    Zuzarte, Philip C; Denroche, Robert E; Fehringer, Gordon; Katzov-Eckert, Hagit; Hung, Rayjean J; McPherson, John D

    2014-01-01

    We describe a method for pooling and sequencing DNA from a large number of individual samples while preserving information regarding sample identity. DNA from 576 individuals was arranged into four 12 row by 12 column matrices and then pooled by row and by column resulting in 96 total pools with 12 individuals in each pool. Pooling of DNA was carried out in a two-dimensional fashion, such that DNA from each individual is present in exactly one row pool and exactly one column pool. By considering the variants observed in the rows and columns of a matrix we are able to trace rare variants back to the specific individuals that carry them. The pooled DNA samples were enriched over a 250 kb region previously identified by GWAS to significantly predispose individuals to lung cancer. All 96 pools (12 row and 12 column pools from 4 matrices) were barcoded and sequenced on an Illumina HiSeq 2000 instrument with an average depth of coverage greater than 4,000×. Verification based on Ion PGM sequencing confirmed the presence of 91.4% of confidently classified SNVs assayed. In this way, each individual sample is sequenced in multiple pools providing more accurate variant calling than a single pool or a multiplexed approach. This provides a powerful method for rare variant detection in regions of interest at a reduced cost to the researcher.
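
    The row-and-column decoding described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the row-major assignment of individuals to matrix cells and the function names `pools_for` and `decode` are assumptions made for the example. Only the counts (576 individuals, four 12 × 12 matrices, 96 pools) come from the abstract.

```python
from itertools import product

ROWS, COLS, MATRICES = 12, 12, 4   # layout taken from the abstract
PER_MATRIX = ROWS * COLS           # 144 individuals per matrix

def pools_for(individual: int) -> tuple[int, int, int]:
    """Map an individual index (0..575) to (matrix, row pool, column pool).

    Row-major placement is an assumption for illustration only.
    """
    matrix, offset = divmod(individual, PER_MATRIX)
    row, col = divmod(offset, COLS)
    return matrix, row, col

def decode(matrix: int, row_hits: set[int], col_hits: set[int]) -> list[int]:
    """Return candidate carriers of a variant seen in the given row and column pools.

    Every intersection of a positive row pool with a positive column pool
    is a candidate; a single carrier yields exactly one intersection.
    """
    return sorted(
        matrix * PER_MATRIX + r * COLS + c
        for r, c in product(row_hits, col_hits)
    )

# Total number of pools: 12 row pools plus 12 column pools per matrix.
total_pools = MATRICES * (ROWS + COLS)

# A variant seen only in row pool 1 and column pool 0 of matrix 2
# points to a single individual.
carrier = decode(2, {1}, {0})
```

    When two carriers of the same variant share a matrix but sit in different rows and columns, `decode` returns four candidates rather than two, which is why the approach suits rare variants best.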

  5. Evaluation and comparison of two commercially available targeted next-generation sequencing platforms to assist oncology decision making

    PubMed Central

    Weiss, Glen J; Hoff, Brandi R; Whitehead, Robert P; Sangal, Ashish; Gingrich, Susan A; Penny, Robert J; Mallery, David W; Morris, Scott M; Thompson, Eric J; Loesch, David M; Khemka, Vivek

    2015-01-01

    Background It is widely acknowledged that there is value in examining cancers for genomic aberrations via next-generation sequencing (NGS). How commercially available NGS platforms compare with each other, and the clinical utility of the reported actionable results, are not well known. During the course of the current study, the Foundation One (F1) test generated data on a combination of somatic mutations, insertion and deletion polymorphisms, chromosomal abnormalities, and deoxyribonucleic acid (DNA) copy number changes at ~250× coverage, while the Paradigm Cancer Diagnostic (PCDx) test generated the same type of data at >5,000× coverage, plus provided messenger RNA (mRNA) expression levels. We sought to compare and evaluate paired formalin-fixed paraffin-embedded tumor tissue using these two platforms. Methods Samples from patients with advanced solid tumors were submitted to both the F1 and PCDx vendors for NGS analysis. Turnaround time (TAT) was calculated. Biomarkers were considered clinically actionable if they had a published association with treatment response in humans and were assigned to the following categories: commercially available drug (CA), clinical trial drug (CT), or neither option (hereafter referred to as “None”). Results The demographics of the 21 unique patient tumor samples included ten men and eleven women, with a median age of 56 years. Due to insufficient archival tissue from the same collection period, in one case, we used samples from different collections. PCDx reported first results faster than F1 in 20 cases. When received at both vendors on the same day, PCDx reported first results for 14 of 15 cases, with a median TAT of 9 days earlier than F1 (P<0.0001). Categorization of CA compared to CT and none significantly favored PCDx (P=0.012). Conclusion In the current analysis, commercially available NGS platforms provided clinically relevant actionable targets (CA or CT) in 47%–67% of diverse cancer types. In the samples

  6. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo) genome assembly and analysis

    USDA-ARS's Scientific Manuscript database

    Next-generation sequencing technologies were used to rapidly and efficiently sequence the genome of the domestic turkey (Meleagris gallopavo). The current genome assembly (~1.1 Gb) includes 917 Mb of sequence assigned to chromosomes. Innate heterozygosity of the sequenced bird allowed discovery of...

  7. Parallel tagged amplicon sequencing of transcriptome-based genetic markers for Triturus newts with the Ion Torrent next-generation sequencing platform.

    PubMed

    Wielstra, B; Duijm, E; Lagler, P; Lammers, Y; Meilink, W R M; Ziermann, J M; Arntzen, J W

    2014-09-01

    Next-generation sequencing is a fast and cost-effective way to obtain sequence data for nonmodel organisms for many markers and for many individuals. We describe a protocol through which we obtain orthologous markers for the crested newts (Amphibia: Salamandridae: Triturus), suitable for analysis of interspecific hybridization. We use transcriptome data of a single Triturus species and design 96 primer pairs that amplify c. 180 bp fragments positioned in 3-prime untranslated regions. Next, these markers are tested with uniplex PCR for a set of species spanning the taxonomical width of the genus Triturus. The 52 markers that consistently show a single band of expected length on gel electrophoresis for all tested crested newt species are then amplified in five multiplex PCRs (with a plexity of ten or eleven) for 132 individual newts: a set of 84 representing the seven (candidate) species and a set of 48 from a presumed hybrid population. After pooling multiplexes per individual, unique tags are ligated to link amplicons to individuals. Subsequently, individuals are pooled in equimolar amounts and sequenced on the Ion Torrent next-generation sequencing platform. A bioinformatics pipeline identifies the alleles and recodes these to a genotypic format. Next, we test the utility of our markers. baps allocates the 84 crested newt individuals representing (candidate) species to their expected (candidate) species, confirming the markers are suitable for species delineation. newhybrids, a hybrid index and hiest confirm the 48 individuals from the presumed hybrid population to be genetically admixed, illustrating the potential of the markers to identify interspecific hybridization. We expect the set of markers we designed to provide a high resolving power for analysis of hybridization in Triturus.

  8. Parallel tagged amplicon sequencing of transcriptome-based genetic markers for Triturus newts with the Ion Torrent next-generation sequencing platform

    PubMed Central

    Wielstra, B; Duijm, E; Lagler, P; Lammers, Y; Meilink, W R M; Ziermann, J M; Arntzen, J W

    2014-01-01

    Next-generation sequencing is a fast and cost-effective way to obtain sequence data for nonmodel organisms for many markers and for many individuals. We describe a protocol through which we obtain orthologous markers for the crested newts (Amphibia: Salamandridae: Triturus), suitable for analysis of interspecific hybridization. We use transcriptome data of a single Triturus species and design 96 primer pairs that amplify c. 180 bp fragments positioned in 3-prime untranslated regions. Next, these markers are tested with uniplex PCR for a set of species spanning the taxonomical width of the genus Triturus. The 52 markers that consistently show a single band of expected length on gel electrophoresis for all tested crested newt species are then amplified in five multiplex PCRs (with a plexity of ten or eleven) for 132 individual newts: a set of 84 representing the seven (candidate) species and a set of 48 from a presumed hybrid population. After pooling multiplexes per individual, unique tags are ligated to link amplicons to individuals. Subsequently, individuals are pooled in equimolar amounts and sequenced on the Ion Torrent next-generation sequencing platform. A bioinformatics pipeline identifies the alleles and recodes these to a genotypic format. Next, we test the utility of our markers. baps allocates the 84 crested newt individuals representing (candidate) species to their expected (candidate) species, confirming the markers are suitable for species delineation. newhybrids, a hybrid index and hiest confirm the 48 individuals from the presumed hybrid population to be genetically admixed, illustrating the potential of the markers to identify interspecific hybridization. We expect the set of markers we designed to provide a high resolving power for analysis of hybridization in Triturus. PMID:24571307

  9. Multi-platform and cross-methodological reproducibility of transcriptome profiling by RNA-seq in the ABRF Next-Generation Sequencing Study

    PubMed Central

    Nicolet, Charles M.; Grove, Deborah; Levy, Shawn; Farmerie, William; Viale, Agnes; Wright, Chris; Schweitzer, Peter A.; Gao, Yuan; Kim, Dewey; Boland, Joe; Hicks, Belynda; Kim, Ryan; Chhangawala, Sagar; Jafari, Nadereh; Raghavachari, Nalini; Gandara, Jorge; Garcia-Reyero, Natàlia; Hendrickson, Cynthia; Roberson, David; Rosenfeld, Jeffrey; Smith, Todd; Underwood, Jason G.; Wang, May; Zumbo, Paul; Baldwin, Don A.; Grills, George S.; Mason, Christopher E.

    2014-01-01

    High-throughput RNA sequencing (RNA-seq) dramatically expands the potential for novel genomics discoveries, but the wide variety of platforms, protocols and performance has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We tested replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (polyA-selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies’ PGM and Proton, Pacific Biosciences RS and Roche’s 454). The results show high intra-platform and inter-platform concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms. These data also demonstrate that ribosomal RNA depletion can both enable effective analysis of degraded RNA samples and be readily compared to polyA-enriched fractions. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq. PMID:25150835

  10. Towards allele-level human leucocyte antigens genotyping - assessing two next-generation sequencing platforms: Ion Torrent Personal Genome Machine and Illumina MiSeq.

    PubMed

    Duke, J L; Lind, C; Mackiewicz, K; Ferriola, D; Papazoglou, A; Derbeneva, O; Wallace, D; Monos, D S

    2015-10-01

    Human leucocyte antigens (HLA) typing has been a challenge due to extreme polymorphism of the HLA genes and limitations of the current technologies and protocols used for their characterization. Recently, next-generation sequencing techniques have been shown to be a well-suited technology for the complete characterization of the HLA genes. However, a comprehensive assessment of the different platforms for HLA typing, describing the limitations and advantages of each of them, has not been presented. We have compared the Ion Torrent Personal Genome Machine (PGM) and Illumina MiSeq, currently the two most frequently used platforms for diagnostic applications, for a number of metrics including total output, quality score per position across the reads and error rates after alignment which can all affect the accuracy of HLA genotyping. For this purpose, we have used one homozygous and three heterozygous well-characterized samples, at HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1. The total output of bases produced by the MiSeq was higher, and they have higher quality scores and a lower overall error rate than the PGM. The MiSeq also has a higher fidelity when sequencing through homopolymer regions up to 9 bp in length. The need to set phase between distant polymorphic sites was more readily achieved with MiSeq using paired-end sequencing of fragments that are longer than those obtained with PGM. Additionally, we have assessed the workflows of the different platforms for complexity of sample preparation, sequencer operation and turnaround time. The effects of data quality and quantity can impact the genotyping results; having an adequate amount of good quality data to analyse will be imperative for confident HLA genotyping. The overall turnaround time can be very comparable between the two platforms; however, the complexity of sample preparation is higher with PGM, while the actual sequencing time is longer with MiSeq.

  11. Graphical contig analyzer for all sequencing platforms (G4ALL): a new stand-alone tool for finishing and draft generation of bacterial genomes.

    PubMed

    Ramos, Rommel Thiago Jucá; Carneiro, Adriana R; Caracciolo, Pablo H; Azevedo, Vasco; Schneider, Maria Paula C; Barh, Debmalya; Silva, Artur

    2013-01-01

    Genome assembly has always been complicated due to the inherent difficulties of sequencing technologies, as well as the computational methods used to process sequences. Although many of the problems for the generation of contigs from reads are well known, especially those involving short reads, the orientation and ordination of contigs in the finishing stages is still very challenging and time consuming, as it requires the manual curation of the contigs to guarantee their correct identification and prevent misassembly. Due to the large numbers of sequences that are produced, especially from the reads produced by next generation sequencers, this process demands considerable manual effort, and there are few software options available to facilitate the process. To address this problem, we have developed the Graphic Contig Analyzer for All Sequencing Platforms (G4ALL): a stand-alone multi-user tool that facilitates the editing of the contigs produced in the assembly process. Besides providing information on the gene products contained in each contig, obtained through a search of the available biological databases, G4ALL produces a scaffold of the genome, based on the overlap of the contigs after curation. The software is available at: http://www.genoma.ufpa.br/rramos/softwares/g4all.xhtml.

  12. Graphical contig analyzer for all sequencing platforms (G4ALL): a new stand-alone tool for finishing and draft generation of bacterial genomes

    PubMed Central

    Ramos, Rommel Thiago Jucá; Carneiro, Adriana R; Caracciolo, Pablo H; Azevedo, Vasco; Schneider, Maria Paula C; Barh, Debmalya; Silva, Artur

    2013-01-01

    Genome assembly has always been complicated due to the inherent difficulties of sequencing technologies, as well as the computational methods used to process sequences. Although many of the problems for the generation of contigs from reads are well known, especially those involving short reads, the orientation and ordination of contigs in the finishing stages is still very challenging and time consuming, as it requires the manual curation of the contigs to guarantee their correct identification and prevent misassembly. Due to the large numbers of sequences that are produced, especially from the reads produced by next generation sequencers, this process demands considerable manual effort, and there are few software options available to facilitate the process. To address this problem, we have developed the Graphic Contig Analyzer for All Sequencing Platforms (G4ALL): a stand-alone multi-user tool that facilitates the editing of the contigs produced in the assembly process. Besides providing information on the gene products contained in each contig, obtained through a search of the available biological databases, G4ALL produces a scaffold of the genome, based on the overlap of the contigs after curation. Availability: The software is available at: http://www.genoma.ufpa.br/rramos/softwares/g4all.xhtml PMID:23888102

  13. AG-NGS: a powerful and user-friendly computing application for the semi-automated preparation of next-generation sequencing libraries using open liquid handling platforms.

    PubMed

    Callejas, Sergio; Álvarez, Rebeca; Benguria, Alberto; Dopazo, Ana

    2014-01-01

    Next-generation sequencing (NGS) is becoming one of the most widely used technologies in the field of genomics. Library preparation is one of the most critical, hands-on, and time-consuming steps in the NGS workflow. Each library must be prepared in an independent well, increasing the number of hours required for a sequencing run and the risk of human-introduced error. Automation of library preparation is the best option to avoid these problems. With this in mind, we have developed automatic genomics NGS (AG-NGS), a computing application that allows an open liquid handling platform to be transformed into a library preparation station without losing the potential of an open platform. Implementation of AG-NGS does not require programming experience, and the application has also been designed to minimize implementation costs. Automated library preparation with AG-NGS generated high-quality libraries from different samples, demonstrating its efficiency, and all quality control parameters fell within the range of optimal values.

  14. Efficacy of a 3rd generation high-throughput sequencing platform for analyses of 16S rRNA genes from environmental samples.

    PubMed

    Mosher, Jennifer J; Bernberg, Erin L; Shevchenko, Olga; Kan, Jinjun; Kaplan, Louis A

    2013-11-01

    Longer sequences of the bacterial 16S rRNA gene could provide greater phylogenetic and taxonomic resolutions and advance knowledge of population dynamics within complex natural communities. We assessed the accuracy of a Pacific Biosciences (PacBio) single molecule, real time (SMRT) sequencing based on DNA polymerization, a promising 3rd generation high-throughput technique, and compared this to the 2nd generation Roche 454 pyrosequencing platform. Amplicons of the 16S rRNA gene from a known isolate, Shewanella oneidensis MR1, and environmental samples from two streambed habitats, rocks and sediments, and a riparian zone soil, were analyzed. On the PacBio we analyzed ~500 bp amplicons that covered the V1-V3 regions and the full 1500 bp amplicons of the V1-V9 regions. On the Roche 454 we analyzed the ~500 bp amplicons. Error rates associated with the isolate were lowest with the Roche 454 method (2%), increased by more than 2-fold for the 500 bp amplicons with the PacBio SMRT chip (4-5%), and by more than 8-fold for the full gene with the PacBio SMRT chip (17-18%). Higher error rates with the PacBio SMRT chip artificially inflated estimates of richness and lowered estimates of coverage for environmental samples. The 3rd generation sequencing technology we evaluated does not provide greater phylogenetic and taxonomic resolutions for studies of microbial ecology. © 2013.

  15. Robustness of Massively Parallel Sequencing Platforms

    PubMed Central

    Kavak, Pınar; Yüksel, Bayram; Aksu, Soner; Kulekci, M. Oguzhan; Güngör, Tunga; Hach, Faraz; Şahinalp, S. Cenk; Alkan, Can; Sağıroğlu, Mahmut Şamil

    2015-01-01

    The improvements in high throughput sequencing technologies (HTS) made clinical sequencing projects such as ClinSeq and Genomics England feasible. Although there are significant improvements in accuracy and reproducibility of HTS based analyses, the usability of these types of data for diagnostic and prognostic applications necessitates near-perfect data generation. To assess the usability of a widely used HTS platform for accurate and reproducible clinical applications in terms of robustness, we generated whole genome shotgun (WGS) sequence data from the genomes of two human individuals in two different genome sequencing centers. After analyzing the data to characterize SNPs and indels using the same tools (BWA, SAMtools, and GATK), we observed a significant number of discrepancies in the call sets. As expected, most of the disagreements between the call sets were found within genomic regions containing common repeats and segmental duplications, although only a small fraction of the discordant variants were within exons and other functionally relevant regions such as promoters. We conclude that although HTS platforms are sufficiently powerful for providing data for first-pass clinical tests, the variant predictions still need to be confirmed using orthogonal methods before use in clinical applications. PMID:26382624
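
    A comparison of two call sets of the kind described above can be sketched with plain set operations. The abstract does not state how discrepancies were quantified, so the Jaccard-style concordance used here, the `Variant` tuple layout, and the function name `compare_call_sets` are assumptions for illustration only.

```python
# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]

def compare_call_sets(a: set[Variant], b: set[Variant]) -> dict[str, float]:
    """Summarize agreement between two variant call sets.

    Concordance is computed as |A ∩ B| / |A ∪ B| (an assumed metric,
    not necessarily the one used in the study).
    """
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "concordance": len(shared) / len(union) if union else 1.0,
    }

# Hypothetical calls for the same individual from two sequencing centers.
center_1 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
center_2 = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")}
summary = compare_call_sets(center_1, center_2)
```

    The discordant sets (`a - b` and `b - a`) are the natural input for a follow-up step, such as intersecting them with repeat or exon annotations.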

  16. ORIO (Online Resource for Integrative Omics): a web-based platform for rapid integration of next generation sequencing data.

    PubMed

    Lavender, Christopher A; Shapiro, Andrew J; Burkholder, Adam B; Bennett, Brian D; Adelman, Karen; Fargo, David C

    2017-04-11

    Established and emerging next generation sequencing (NGS)-based technologies allow for genome-wide interrogation of diverse biological processes. However, accessibility of NGS data remains a problem, and few user-friendly resources exist for integrative analysis of NGS data from different sources and experimental techniques. Here, we present Online Resource for Integrative Omics (ORIO; https://orio.niehs.nih.gov/), a web-based resource with an intuitive user interface for rapid analysis and integration of NGS data. To use ORIO, the user specifies NGS data of interest along with a list of genomic coordinates. Genomic coordinates may be biologically relevant features from a variety of sources, such as ChIP-seq peaks for a given protein or transcription start sites from known gene models. ORIO first iteratively finds read coverage values at each genomic feature for each NGS dataset. Data are then integrated using clustering-based approaches, giving hierarchical relationships across NGS datasets and separating individual genomic features into groups. In focusing its analysis on read coverage, ORIO makes limited assumptions about the analyzed data; this allows the tool to be applied across data from a variety of experiments and techniques. Results from analysis are presented in dynamic displays alongside user-controlled statistical tests, supporting rapid statistical validation of observed results. We emphasize the versatility of ORIO through diverse examples, ranging from NGS data quality control to characterization of enhancer regions and integration of gene expression information. Easily accessible on a public web server, we anticipate wide use of ORIO in genome-wide investigations by life scientists.
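
    The first step described above, finding read coverage at each genomic feature for each dataset, can be illustrated with a naive sketch. This is not ORIO's implementation: the half-open interval convention, the function names `coverage` and `pearson`, and the use of Pearson correlation as a similarity between datasets (a possible input to a clustering step) are all assumptions for the example.

```python
from math import sqrt

# (chromosome, start, end) with half-open coordinates [start, end).
Interval = tuple[str, int, int]

def coverage(features: list[Interval], reads: list[Interval]) -> list[int]:
    """Count the reads overlapping each feature (naive O(features * reads) scan)."""
    counts = []
    for chrom, start, end in features:
        counts.append(
            sum(1 for c, s, e in reads if c == chrom and s < end and e > start)
        )
    return counts

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

# Hypothetical features and reads for one dataset.
features = [("chr1", 0, 100), ("chr1", 200, 300)]
reads = [("chr1", 50, 80), ("chr1", 90, 120), ("chr1", 250, 260), ("chr2", 0, 100)]
per_feature = coverage(features, reads)
```

    Running `coverage` once per dataset gives one vector per dataset over the same feature list; pairwise similarities between those vectors are what a hierarchical clustering would then consume.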

  17. Archiving next generation sequencing data.

    PubMed

    Shumway, Martin; Cochrane, Guy; Sugawara, Hideaki

    2010-01-01

    Next generation sequencing platforms are producing biological sequencing data in unprecedented amounts. The partners of the International Nucleotide Sequencing Database Collaboration, which includes the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ), have established the Sequence Read Archive (SRA) to provide the scientific community with an archival destination for next generation data sets. The SRA is now accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://www.ddbj.nig.ac.jp/sub/trace_sra-e.html from DDBJ. Users of these resources can obtain data sets deposited in any of the three SRA instances. Links and submission instructions are provided.

  18. An effective screening strategy for deafness in combination with a next-generation sequencing platform: a consecutive analysis

    PubMed Central

    Sakuma, Naoko; Moteki, Hideaki; Takahashi, Masahiro; Nishio, Shin-ya; Arai, Yasuhiro; Yamashita, Yukiko; Oridate, Nobuhiko; Usami, Shin-ichi

    2016-01-01

    The diagnosis of the genetic etiology of deafness contributes to the clinical management of patients. We performed the following four genetic tests in three stages for 52 consecutive deafness subjects in one facility. We used the Invader assay for 46 mutations in 13 genes and Sanger sequencing for the GJB2 gene or SLC26A4 gene in the first-stage test, the TaqMan genotyping assay in the second-stage test and targeted exon sequencing using massively parallel DNA sequencing in the third-stage test. Overall, we identified the genetic cause in 40% (21/52) of patients. The diagnostic rates of autosomal dominant, autosomal recessive and sporadic cases were 50%, 60% and 34%, respectively. When the sporadic cases with congenital and severe hearing loss were selected, the diagnostic rate rose to 48%. The combination approach using these genetic tests appears to be useful as a diagnostic tool for deafness patients. We recommend that genetic testing for the screening of common mutations in deafness genes using the Invader assay or TaqMan genotyping assay be performed as the initial evaluation. For the remaining undiagnosed cases, targeted exon sequencing using massively parallel DNA sequencing is clinically and economically beneficial. PMID:26763877

  19. An effective screening strategy for deafness in combination with a next-generation sequencing platform: a consecutive analysis.

    PubMed

    Sakuma, Naoko; Moteki, Hideaki; Takahashi, Masahiro; Nishio, Shin-ya; Arai, Yasuhiro; Yamashita, Yukiko; Oridate, Nobuhiko; Usami, Shin-ichi

    2016-03-01

    The diagnosis of the genetic etiology of deafness contributes to the clinical management of patients. We performed the following four genetic tests in three stages for 52 consecutive deafness subjects in one facility. We used the Invader assay for 46 mutations in 13 genes and Sanger sequencing for the GJB2 gene or SLC26A4 gene in the first-stage test, the TaqMan genotyping assay in the second-stage test and targeted exon sequencing using massively parallel DNA sequencing in the third-stage test. Overall, we identified the genetic cause in 40% (21/52) of patients. The diagnostic rates of autosomal dominant, autosomal recessive and sporadic cases were 50%, 60% and 34%, respectively. When the sporadic cases with congenital and severe hearing loss were selected, the diagnostic rate rose to 48%. The combination approach using these genetic tests appears to be useful as a diagnostic tool for deafness patients. We recommend that genetic testing for the screening of common mutations in deafness genes using the Invader assay or TaqMan genotyping assay be performed as the initial evaluation. For the remaining undiagnosed cases, targeted exon sequencing using massively parallel DNA sequencing is clinically and economically beneficial.

  20. Comprehensive transcriptome assembly of chickpea (Cicer arietinum L.) using Sanger and next generation sequencing platforms: development and applications

    USDA-ARS's Scientific Manuscript database

    A high-quality transcriptome assembly for chickpea has been developed using ~135 million Illumina single-end reads, 7.12 million single-end FLX/454 reads, and 139 thousand Sanger expressed sequence tags (ESTs). This hybrid transcriptome assembly, which we refer to as the "Cicer arietinum Transcripto...

  1. Profile of bacterial communities in South African mine-water samples using Illumina next-generation sequencing platform.

    PubMed

    Keshri, Jitendra; Mankazana, Boitumelo B J; Momba, Maggy N B

    2015-04-01

    Mine water is an example of an extreme environment that contains a large number of diverse and specific bacteria. It is imperative to gain an understanding of these bacterial communities in order to develop effective strategies for the bioremediation of polluted aquatic systems. In this study, the high-throughput sequencing approach was used to characterize the bacterial communities in two different mine waters of South Africa: vanadium and gold mine water. Over 2629 operational taxonomic units (OTUs) were recovered from 15,802 reads of the 16S ribosomal RNA (rRNA) gene. They represented 8 phyla, 43 orders, 84 families and 105 genera. Proteobacteria and unclassified bacterial sequences were the most dominant. Apart from these, Firmicutes, Bacteroidetes, Actinobacteria, Candidate phylum OD1, Cyanobacteria, Verrucomicrobia and Deinococcus-Thermus were the recovered phyla, although their relative abundance differed between both the mine-water samples. Yet, diversity indices suggested that the bacterial communities inhabiting the vanadium mine water were more diverse than those in gold mine water. Interestingly, substantial percentages of the reads from either sample (58 % in vanadium and 17 % in gold mine water) could not be assigned to any phylum and remained unclassified, suggesting hitherto unidentified populations, and vast untapped microbial diversity. Overall, the results of this study exhibited bacterial community structures with high diversity in mine water, which can be explored further for their role in bioremediation and environmental management.

  2. Comprehensive transcriptome assembly of Chickpea (Cicer arietinum L.) using Sanger and next generation sequencing platforms: development and applications.

    PubMed

    Kudapa, Himabindu; Azam, Sarwar; Sharpe, Andrew G; Taran, Bunyamin; Li, Rong; Deonovic, Benjamin; Cameron, Connor; Farmer, Andrew D; Cannon, Steven B; Varshney, Rajeev K

    2014-01-01

    A comprehensive transcriptome assembly of chickpea has been developed using 134.95 million Illumina single-end reads, 7.12 million single-end FLX/454 reads and 139,214 Sanger expressed sequence tags (ESTs) from >17 genotypes. This hybrid transcriptome assembly, referred to as Cicer arietinum Transcriptome Assembly version 2 (CaTA v2, available at http://data.comparative-legumes.org/transcriptomes/cicar/lista_cicar-201201), comprising 46,369 transcript assembly contigs (TACs) has an N50 length of 1,726 bp and a maximum contig size of 15,644 bp. Putative functions were determined for 32,869 (70.8%) of the TACs and gene ontology assignments were determined for 21,471 (46.3%). The new transcriptome assembly was compared with the previously available chickpea transcriptome assemblies as well as to the chickpea genome. Comparative analysis of CaTA v2 against transcriptomes of three legumes - Medicago, soybean and common bean, resulted in 27,771 TACs common to all three legumes indicating strong conservation of genes across legumes. CaTA v2 was also used for identification of simple sequence repeats (SSRs) and intron spanning regions (ISRs) for developing molecular markers. ISRs were identified by aligning TACs to the Medicago genome, and their putative mapping positions at chromosomal level were identified using transcript map of chickpea. Primer pairs were designed for 4,990 ISRs, each representing a single contig for which predicted positions are inferred and distributed across eight linkage groups. A subset of randomly selected ISRs representing all eight chickpea linkage groups were validated on five chickpea genotypes and showed 20% polymorphism with average polymorphic information content (PIC) of 0.27. In summary, the hybrid transcriptome assembly developed and novel markers identified can be used for a variety of applications such as gene discovery, marker-trait association, diversity analysis etc., to advance genetics research and breeding applications in

  3. Automatic Command Sequence Generation

    NASA Technical Reports Server (NTRS)

    Fisher, Forest; Gladden, Roy; Khanampompan, Teerapat

    2007-01-01

    Automatic Sequence Generator (Autogen) Version 3.0 software automatically generates command sequences for the Mars Reconnaissance Orbiter (MRO) and several other JPL spacecraft operated by the multi-mission support team. Autogen uses standard JPL sequencing tools like APGEN, ASP, SEQGEN, and the DOM database to automate the generation of uplink command products, Spacecraft Command Message Format (SCMF) files, and the corresponding ground command products, DSN Keywords Files (DKF). Autogen supports all the major mission phases, including the cruise, aerobraking, mapping/science, and relay phases. Autogen is a Perl script, which functions within the mission operations UNIX environment. It consists of two parts: a set of model files and the autogen Perl script. Autogen encodes the behaviors of the system into a model and encodes algorithms for context-sensitive customizations of the modeled behaviors. The model includes knowledge of different mission phases and how the resultant command products must differ for these phases. The executable software portion of Autogen automates the setup and use of APGEN for constructing a spacecraft activity sequence file (SASF). The setup includes file retrieval through the DOM (Distributed Object Manager), an object database used to store project files. This step retrieves all the needed input files for generating the command products. Depending on the mission phase, Autogen also uses the ASP (Automated Sequence Processor) and SEQGEN to generate the command product sent to the spacecraft. Autogen also provides the means for customizing sequences through the use of configuration files. By automating the majority of the sequence generation process, Autogen eliminates many sequence generation errors commonly introduced by manually constructing spacecraft command sequences. Through the layering of commands into the sequence by a series of scheduling algorithms, users are able to rapidly and reliably construct the

  4. Comprehensive Transcriptome Assembly of Chickpea (Cicer arietinum L.) Using Sanger and Next Generation Sequencing Platforms: Development and Applications

    PubMed Central

    Sharpe, Andrew G.; Taran, Bunyamin; Li, Rong; Deonovic, Benjamin; Cameron, Connor; Farmer, Andrew D.; Cannon, Steven B.; Varshney, Rajeev K.

    2014-01-01

    A comprehensive transcriptome assembly of chickpea has been developed using 134.95 million Illumina single-end reads, 7.12 million single-end FLX/454 reads and 139,214 Sanger expressed sequence tags (ESTs) from >17 genotypes. This hybrid transcriptome assembly, referred to as Cicer arietinum Transcriptome Assembly version 2 (CaTA v2, available at http://data.comparative-legumes.org/transcriptomes/cicar/lista_cicar-201201), comprising 46,369 transcript assembly contigs (TACs) has an N50 length of 1,726 bp and a maximum contig size of 15,644 bp. Putative functions were determined for 32,869 (70.8%) of the TACs and gene ontology assignments were determined for 21,471 (46.3%). The new transcriptome assembly was compared with the previously available chickpea transcriptome assemblies as well as to the chickpea genome. Comparative analysis of CaTA v2 against transcriptomes of three legumes - Medicago, soybean and common bean, resulted in 27,771 TACs common to all three legumes indicating strong conservation of genes across legumes. CaTA v2 was also used for identification of simple sequence repeats (SSRs) and intron spanning regions (ISRs) for developing molecular markers. ISRs were identified by aligning TACs to the Medicago genome, and their putative mapping positions at chromosomal level were identified using transcript map of chickpea. Primer pairs were designed for 4,990 ISRs, each representing a single contig for which predicted positions are inferred and distributed across eight linkage groups. A subset of randomly selected ISRs representing all eight chickpea linkage groups were validated on five chickpea genotypes and showed 20% polymorphism with average polymorphic information content (PIC) of 0.27. In summary, the hybrid transcriptome assembly developed and novel markers identified can be used for a variety of applications such as gene discovery, marker-trait association, diversity analysis etc., to advance genetics research and breeding applications in

  5. A Comprehensive Transcriptome Assembly of Pigeonpea (Cajanus cajan L.) using Sanger and Second-Generation Sequencing Platforms

    PubMed Central

    Kudapa, Himabindu; Bharti, Arvind K.; Cannon, Steven B.; Farmer, Andrew D.; Mulaosmanovic, Benjamin; Kramer, Robin; Bohra, Abhishek; Weeks, Nathan T.; Crow, John A.; Tuteja, Reetu; Shah, Trushar; Dutta, Sutapa; Gupta, Deepak K.; Singh, Archana; Gaikwad, Kishor; Sharma, Tilak R.; May, Gregory D.; Singh, Nagendra K.; Varshney, Rajeev K.

    2012-01-01

    A comprehensive transcriptome assembly for pigeonpea has been developed by analyzing 128.9 million short Illumina GA IIx single end reads, 2.19 million single end FLX/454 reads, and 18 353 Sanger expressed sequence tags from more than 16 genotypes. The resultant transcriptome assembly, referred to as CcTA v2, comprised 21 434 transcript assembly contigs (TACs) with an N50 of 1510 bp, the largest one being ∼8 kb. Of the 21 434 TACs, 16 622 (77.5%) could be mapped on to the soybean genome build 1.0.9 under fairly stringent alignment parameters. Based on knowledge of intron junctions, 10 009 primer pairs were designed from 5033 TACs for amplifying intron spanning regions (ISRs). By using in silico mapping of BAC-end-derived SSR loci of pigeonpea on the soybean genome as a reference, putative mapping positions at the chromosome level were predicted for 6284 ISR markers, covering all 11 pigeonpea chromosomes. A subset of 128 ISR markers were analyzed on a set of eight genotypes. While 116 markers were validated, 70 markers showed one to three alleles, with an average of 0.16 polymorphism information content (PIC) value. In summary, the CcTA v2 transcript assembly and ISR markers will serve as a useful resource to accelerate genetic research and breeding applications in pigeonpea. PMID:22241453
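
    The N50 quoted in the record above is a standard assembly statistic: the contig length L such that contigs of length at least L together contain at least half of all assembled bases. The following is a minimal Python sketch of that definition. The function name `n50` and the contig lengths are illustrative assumptions, not data from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp (not taken from the abstract).
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

    Sorting from longest to shortest keeps the sketch close to the verbal definition; for very large assemblies the same result could be obtained from a length histogram.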

  6. Next generation sequencing based approaches to epigenomics

    PubMed Central

    Marra, Marco A.

    2010-01-01

    Next generation sequencing has brought epigenomic studies to the forefront of current research. The power of massively parallel sequencing coupled to innovative molecular and computational techniques has allowed researchers to profile the epigenome at resolutions that were unimaginable only a few years ago. With early proof of concept studies published, the field is now moving into the next phase where the importance of method standardization and rigorous quality control are becoming paramount. In this review we will describe methodologies that have been developed to profile the epigenome using next generation sequencing platforms. We will discuss these in terms of library preparation, sequence platforms and analysis techniques. PMID:21266347

  7. Next-Generation Sequencing.

    PubMed

    Le Gallo, Matthieu; Lozy, Fred; Bell, Daphne W

    2017-01-01

    Endometrial cancers are the most frequently diagnosed gynecological malignancy and were expected to be the seventh leading cause of cancer death among American women in 2015. The majority of endometrial cancers are of serous or endometrioid histology. Most human tumors, including endometrial tumors, are driven by the acquisition of pathogenic mutations in cancer genes. Thus, the identification of somatic mutations within tumor genomes is an entry point toward cancer gene discovery. However, efforts to pinpoint somatic mutations in human cancers have, until recently, relied on high-throughput sequencing of single genes or gene families using Sanger sequencing. Although this approach has been fruitful, the cost and throughput of Sanger sequencing generally prohibits systematic sequencing of the ~22,000 genes that make up the exome. The recent development of next-generation sequencing technologies changed this paradigm by providing the capability to rapidly sequence exomes, transcriptomes, and genomes at relatively low cost. Remarkably, the application of this technology to catalog the mutational landscapes of endometrial tumor exomes, transcriptomes, and genomes has revealed, for the first time, that serous and endometrioid endometrial cancers can be classified into four distinct molecular subgroups. In this chapter, we overview the characteristic genomic features of each subgroup and discuss the known and putative cancer genes that have emerged from next-generation sequencing of endometrial carcinomas.

  8. Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding

    PubMed Central

    Brozynska, Marta; Furtado, Agnelo; Henry, Robert James

    2014-01-01

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis. PMID:25329378

  9. Next-generation sequencing-based user-friendly platforms for drug-resistant tuberculosis diagnosis: A promise for the near future.

    PubMed

    Dolinger, David L; Colman, Rebecca E; Engelthaler, David M; Rodwell, Timothy C

    2016-12-01

    Since 2002, there has been a gradual worldwide 1.3% annual decrease in the incidence of tuberculosis (TB). This is an encouraging statistic; however, it will not achieve the World Health Organization's goal of eliminating TB by 2050, and it is being compounded by the persistent global incidence of drug-resistant tuberculosis (DR-TB) acquired by transmission and by treatment pressure. One key to effectively control tuberculosis and the spread of multiresistant strains is accurate information pertaining to drug resistance and susceptibility. Next-generation sequencing (NGS) has the potential to effectively change global health and the management of TB. Industry has focused primarily on using NGS for oncology diagnostics and human genomics, but the area in which NGS can rapidly impact health care is infectious disease diagnostics in low- and middle-income countries. To date, there has been a failure as a community to capitalize on the potential of NGS, especially at the reference laboratory level where it can provide actionable information pertaining to treatment options for patients. The rapid evolution of knowledge about the genetic foundations of tuberculosis drug resistance makes sequencing a versatile technology platform for providing rapid, accurate, and actionable results for treating this disease. No "plug-and-play" and "end-to-end" NGS solutions exist that provide clinically relevant sequence data from the Mycobacterium tuberculosis complex genome from primary clinical samples (e.g., sputum) in high-burden country reference laboratories, which is where they are most needed. However, such a system-based solution is under development by the Foundation for Innovative Diagnostics (FIND), in collaboration with partners from academia, nongovernmental organizations, and industry. The solution is modular and is designed and developed to perform targeted amplicon sequencing directly from a patient's primary sputum sample. This solution will initially allow

  10. Profiling of human epigenetic regulators using a semi-automated real-time qPCR platform validated by next generation sequencing.

    PubMed

    Dudakovic, Amel; Gluscevic, Martina; Paradise, Christopher R; Dudakovic, Halil; Khani, Farzaneh; Thaler, Roman; Ahmed, Farah S; Li, Xiaodong; Dietz, Allan B; Stein, Gary S; Montecino, Martin A; Deyle, David R; Westendorf, Jennifer J; van Wijnen, Andre J

    2017-04-20

    Epigenetic mechanisms control phenotypic commitment of mesenchymal stromal/stem cells (MSCs) into osteogenic, chondrogenic or adipogenic lineages. To investigate enzymes and chromatin binding proteins controlling the epigenome, we developed a hybrid expression screening strategy that combines semi-automated real-time qPCR (RT-qPCR), next generation RNA sequencing (RNA-seq), and a novel data management application (FileMerge). This strategy was used to interrogate expression of a large cohort (n>300) of human epigenetic regulators (EpiRegs) that generate, interpret and/or edit the histone code. We find that EpiRegs with similar enzymatic functions are variably expressed and specific isoforms dominate over others in human MSCs. This principle is exemplified by analysis of key histone acetyl transferases (HATs) and deacetylases (HDACs), H3 lysine methyltransferases (e.g., EHMTs) and demethylases (KDMs), as well as bromodomain (BRDs) and chromobox (CBX) proteins. Our results show gender-specific expression of H3 lysine 9 [H3K9] demethylases (e.g., KDM5D and UTY) as expected and upregulation of distinct EpiRegs (n>30) during osteogenic differentiation of MSCs (e.g., HDAC5 and HDAC7). The functional significance of HDACs in osteogenic lineage commitment of MSCs was functionally validated using panobinostat (LBH-589). This pan-deacetylase inhibitor suppresses osteoblastic differentiation as evidenced by reductions in bone-specific mRNA markers (e.g., ALPL), alkaline phosphatase activity and calcium deposition (i.e., Alizarin Red staining). Thus, our RT-qPCR platform identifies candidate EpiRegs by expression screening, predicts biological outcomes of their corresponding inhibitors, and enables manipulation of the human epigenome using molecular or pharmacological approaches to control stem cell differentiation. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Relay Sequence Generation Software

    NASA Technical Reports Server (NTRS)

    Gladden, Roy E.; Khanampompan, Teerapat

    2009-01-01

    Due to thermal and electromagnetic interactivity between the UHF (ultrahigh frequency) radio onboard the Mars Reconnaissance Orbiter (MRO), which performs relay sessions with the Martian landers, and the remainder of the MRO payloads, it is required to integrate and de-conflict relay sessions with the MRO science plan. The MRO relay SASF/PTF (spacecraft activity sequence file/ payload target file) generation software facilitates this process by generating a PTF that is needed to integrate the periods of time during which MRO supports relay activities with the rest of the MRO science plans. The software also generates the needed command products that initiate the relay sessions, some features of which are provided by the lander team, some are managed by MRO internally, and some being derived.

  12. Clinical detection of human probiotics and human pathogenic bacteria by using a novel high-throughput platform based on next generation sequencing

    PubMed Central

    2014-01-01

    Background: The human body plays host to a vast array of bacteria, found in oral cavities, skin, gastrointestinal tract and the vagina. Some bacteria are harmful while others are beneficial to the host. Despite the availability of many methods to identify bacteria, most of them are only applicable to specific and cultivable bacteria and are also tedious. Based on high throughput sequencing technology, this work derives 16S rRNA sequences of bacteria and analyzes probiotic and pathogen species. Results: We constructed a database that recorded the species of probiotics and pathogens from literature, along with a modified Smith-Waterman algorithm for assigning the taxonomy of the sequenced 16S rRNA sequences. We also constructed a bacteria disease risk model for seven diseases based on 98 samples. Applicability of the proposed platform is demonstrated by collecting the microbiome in human gut of 13 samples. Conclusions: The proposed platform provides a relatively easy means of identifying a certain amount of bacteria and their species (including uncultivable pathogens) for clinical microbiology applications. That is, detecting how probiotics and pathogens inhabit humans and how they affect health can significantly contribute to the development of diagnosis and treatment methods. PMID:24418497
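
    The record above mentions a modified Smith-Waterman algorithm but does not describe the modification. For orientation only, the sketch below shows the textbook Smith-Waterman local alignment score with a linear gap penalty. It is not the authors' variant, and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Textbook Smith-Waterman with a linear gap penalty. Only the
    previous row of the dynamic-programming matrix is kept, since
    only the score (not the alignment itself) is returned.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical short sequences, not real 16S rRNA data.
print(smith_waterman_score("XXACGTXX", "ACGT"))
```

    In a taxonomy-assignment setting, a read would typically be scored against each reference sequence and assigned to the best-scoring one; how the cited work handles ties and thresholds is not stated in the abstract.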

  13. Targeted Exome Sequencing Outcome Variations of Colorectal Tumors within and across Two Sequencing Platforms

    PubMed Central

    Ashktorab, Hassan; Azimi, Hamed; Nickerson, Michael L.; Bass, Sara; Varma, Sudhir; Brim, Hassan

    2016-01-01

    Background and Aim: Next generation sequencing (NGS) has quickly become the tool of choice for genome and exome data generation. The multitude of sequencing platforms, as well as the variability within each platform, needs to be assessed. In this paper we used two platforms (Ion Torrent and Illumina) to assess single nucleotide variants (SNVs) in colorectal cancer (CRC) specimens. Methods: CRC specimens (n = 13) collected from 6 CRC patients (cancer and matched normal) were used to establish the mutational profile using the Ion Torrent and Illumina sequencing platforms. We analyzed a set of formalin-fixed paraffin-embedded (FFPE) and fresh frozen (FF) samples on both platforms to assess the effect of sample nature (FFPE vs. FF) on sequencing outcome and to evaluate the similarities and differences of SNVs across the two platforms. In addition, duplicates of FF samples were sequenced on each platform to assess variability within platform. Results: The comparison of FF replicates to each other gave a concordance of 77% (± 15.3%) in Ion Torrent and 70% (± 3.7%) in Illumina. FFPE vs. FF replicates gave a concordance of 40% (± 32%) in Ion Torrent and 49% (± 19%) in Illumina. The cross-platform concordance averaged 75% (± 9.8%) for FFPE samples and 67% (± 32%) for FF samples, with an overall average of 70% (± 26.8%). Conclusion: Our data show significant variability within and across platforms. Also, the number of detected variants depends on the nature of the specimen (FF vs. FFPE). Validation of NGS-discovered mutations is a must to rule out false-positive mutations. This validation might be performed either through a second NGS platform or through Sanger sequencing. PMID:27547838
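
    The abstract above reports concordance percentages without defining the measure. One common convention, assumed here and not confirmed by the source, is the share of variant calls found in both call sets relative to all calls found in either. The Python sketch below illustrates that convention with made-up variants; the function name and the variant keys are hypothetical.

```python
def concordance(calls_a, calls_b):
    """Percent of variants shared by two call sets, relative to their union.

    Assumed definition: 100 * |A & B| / |A | B|. The cited study may
    have used a different denominator.
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return 100.0  # two empty call sets agree trivially
    return 100.0 * len(a & b) / len(union)


# Hypothetical variants keyed as (chromosome, position, ref, alt).
ion_torrent = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "C", "T"),
    ("chr3", 3000, "G", "A"),
}
illumina = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "C", "T"),
    ("chr4", 4000, "T", "C"),
}
print(f"{concordance(ion_torrent, illumina):.1f}%")
```

    Keying each variant on chromosome, position, and both alleles means that two platforms calling different alternate alleles at the same site count as discordant.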

  14. Next generation sequencing of SNPs using the HID-Ion AmpliSeq™ Identity Panel on the Ion Torrent PGM™ platform.

    PubMed

    Guo, Fei; Zhou, Yishu; Song, He; Zhao, Jinling; Shen, Hongying; Zhao, Bin; Liu, Feng; Jiang, Xianhua

    2016-11-01

    The HID-Ion AmpliSeq™ Identity Panel (the HID Identity Panel) is designed to detect 124-plex single nucleotide polymorphisms (SNPs) with next generation sequencing (NGS) technology on the Ion Torrent PGM™ platform, including 90 individual identification SNPs (IISNPs) on autosomal chromosomes and 34 lineage informative SNPs (LISNPs) on the Y chromosome. In this study, we evaluated the performance of the HID Identity Panel to provide a reference for NGS-SNP application, focusing on locus strand balance, locus coverage balance, heterozygote balance, and background signals. Several experiments were also carried out to identify the improvements and limitations of this panel, including studies of species specificity, repeatability and concordance, sensitivity, mixtures, case-type samples and degraded samples, population genetics and pedigrees following the Scientific Working Group on DNA Analysis Methods (SWGDAM) guidelines. In addition, Southern and Northern Chinese Han were investigated to assess applicability of this panel. Results showed this panel led to cross-reactivity with primates to some extent but rarely with non-primate animals. Repeatable and concordant genotypes could be obtained in triplicate with one exception at rs7520386. Full profiles could be obtained from 100 pg input DNA, but the optimal input DNA would be 1 ng-200 pg with 21 initial PCR cycles. A sample with ≥20% minor contributor could be considered as a mixture by the number of homozygotes, and full profiles belonging to minor contributors could be detected between 9:1 and 1:9 mixtures with known reference profiles. Also, this assay could be used for case-type samples and degraded samples. For autosomal SNPs (A-SNPs), FST across all 90 loci was not significantly different between Southern and Northern Chinese Han or between male and female samples. All A-SNP loci were independent in Chinese Han population. Except for 18 loci with He <0.4, most of the A-SNPs in the HID Identity Panel presented high

  15. MIG-seq: an effective PCR-based method for genome-wide single-nucleotide polymorphism genotyping using the next-generation sequencing platform

    PubMed Central

    Suyama, Yoshihisa; Matsuki, Yu

    2015-01-01

    Restriction-enzyme (RE)-based next-generation sequencing methods have revolutionized marker-assisted genetic studies; however, the use of REs has limited their widespread adoption, especially in field samples with low-quality DNA and/or small quantities of DNA. Here, we developed a PCR-based procedure to construct reduced representation libraries without RE digestion steps, representing de novo single-nucleotide polymorphism discovery, and its genotyping using next-generation sequencing. Using multiplexed inter-simple sequence repeat (ISSR) primers, thousands of genome-wide regions were amplified effectively from a wide variety of genomes, without prior genetic information. We demonstrated: 1) Mendelian gametic segregation of the discovered variants; 2) reproducibility of genotyping by checking its applicability for individual identification; and 3) applicability in a wide variety of species by checking standard population genetic analysis. This approach, called multiplexed ISSR genotyping by sequencing, should be applicable to many marker-assisted genetic studies with a wide range of DNA qualities and quantities. PMID:26593239

  16. MIG-seq: an effective PCR-based method for genome-wide single-nucleotide polymorphism genotyping using the next-generation sequencing platform.

    PubMed

    Suyama, Yoshihisa; Matsuki, Yu

    2015-11-23

    Restriction-enzyme (RE)-based next-generation sequencing methods have revolutionized marker-assisted genetic studies; however, the use of REs has limited their widespread adoption, especially in field samples with low-quality DNA and/or small quantities of DNA. Here, we developed a PCR-based procedure to construct reduced representation libraries without RE digestion steps, representing de novo single-nucleotide polymorphism discovery, and its genotyping using next-generation sequencing. Using multiplexed inter-simple sequence repeat (ISSR) primers, thousands of genome-wide regions were amplified effectively from a wide variety of genomes, without prior genetic information. We demonstrated: 1) Mendelian gametic segregation of the discovered variants; 2) reproducibility of genotyping by checking its applicability for individual identification; and 3) applicability in a wide variety of species by checking standard population genetic analysis. This approach, called multiplexed ISSR genotyping by sequencing, should be applicable to many marker-assisted genetic studies with a wide range of DNA qualities and quantities.

  17. Explanatory chapter: next generation sequencing.

    PubMed

    Yegnasubramanian, Srinivasan

    2013-01-01

    Technological breakthroughs in sequencing technologies have driven the advancement of molecular biology and molecular genetics research. The advent of high-throughput Sanger sequencing (for information on the method, see Sanger Dideoxy Sequencing of DNA) in the mid- to late-1990s made possible the accelerated completion of the human genome project, which has since revolutionized the pace of discovery in biomedical research. Similarly, the advent of next generation sequencing is poised to revolutionize biomedical research and usher in a new era of individualized, rational medicine. The term next generation sequencing refers to technologies that have enabled the massively parallel analysis of DNA sequence facilitated through the convergence of advancements in molecular biology, nucleic acid chemistry and biochemistry, computational biology, and electrical and mechanical engineering. The current next generation sequencing technologies are capable of sequencing tens to hundreds of millions of DNA templates simultaneously and generate >4 gigabases of sequence in a single day. These technologies have largely started to replace high-throughput Sanger sequencing for large-scale genomic projects, and have created significant enthusiasm for the advent of a new era of individualized medicine. Copyright © 2013 Elsevier Inc. All rights reserved.

  18. Generating barcoded libraries for multiplex high-throughput sequencing.

    PubMed

    Knapp, Michael; Stiller, Mathias; Meyer, Matthias

    2012-01-01

    Molecular barcoding is an essential tool to use the high throughput of next generation sequencing platforms optimally in studies involving more than one sample. Various barcoding strategies allow for the incorporation of short recognition sequences (barcodes) into sequencing libraries, either by ligation or polymerase chain reaction (PCR). Here, we present two approaches optimized for generating barcoded sequencing libraries from low copy number extracts and amplification products typical of ancient DNA studies.
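
    Once barcoded libraries from several samples are pooled and sequenced, reads have to be sorted back to their samples by barcode. The Python sketch below illustrates the idea under simplifying assumptions that are not from the abstract: each barcode sits at the very start of the read and must match exactly. The barcode sequences, sample names, and function name are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by sample using an exact-match barcode at the read start.

    reads: iterable of sequence strings.
    barcodes: dict mapping barcode sequence -> sample name.
    Returns a dict mapping sample name -> list of reads with the
    barcode trimmed off. Reads matching no barcode are collected
    under the key "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return dict(bins)


# Hypothetical barcodes and reads for illustration.
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = ["ACGTGGGTTT", "TGCAAACCC", "GGGGAAAA"]
result = demultiplex(reads, barcodes)
print(result)
```

    Production demultiplexers usually tolerate a small number of barcode mismatches, which in turn requires barcodes that differ from each other at several positions; exact matching is used here only to keep the sketch short.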

  19. Choice of next-generation sequencing pipelines.

    PubMed

    Del Chierico, F; Ancora, M; Marcacci, M; Cammà, C; Putignani, L; Conti, Salvatore

    2015-01-01

    Next-generation sequencing (NGS) technologies are revolutionary tools that have enabled remarkable advances in genetics since the beginning of the twenty-first century. Because they can produce large amounts of sequence data, these tools are expected to completely replace other high-throughput technologies. Moreover, the wide application of NGS protocols is advancing the genetic decoding of biological systems through studies of genome anatomy and gene mapping, coupled with transcriptome profiles. NGS pipelines such as (1) de novo genomic sequencing by mate-paired and whole-genome shotgun strategies; (2) specific gene sequencing on large bacterial communities; and (3) RNA-seq methods including whole transcriptome sequencing and Serial Analysis of Gene Expression (SAGE analysis) are fundamental in genome-wide fields such as metagenomics. Recently, the availability of these advanced protocols has made it possible to overcome common technical sequencing issues related to mapping specificity in standard shotgun library sequencing, the detection of large structural genome variations, and the bridging of sequencing gaps, and has enabled more precise gene annotation. In this chapter we will discuss how to manage a successful NGS pipeline, from the planning of sequencing projects through the choice of platforms to the management of data analysis.

  20. Next-Generation Sequencing in the Mycology Lab.

    PubMed

    Zoll, Jan; Snelders, Eveline; Verweij, Paul E; Melchers, Willem J G

    New state-of-the-art sequencing techniques offer valuable tools both for detecting mycobiota and for understanding the molecular mechanisms of virulence and of resistance against antifungal compounds. The introduction of new sequencing platforms with enhanced capacity and reduced costs for sequence analysis provides a potentially powerful tool for mycological diagnosis and research. In this review, we summarize the applications of next-generation sequencing techniques in mycology.

  1. Application of genotyping-by-sequencing on semiconductor sequencing platforms: A comparison of genetic and reference-based marker ordering in barley

    USDA-ARS's Scientific Manuscript database

    The rapid development of next generation sequencing platforms has enabled the use of sequencing for routine genotyping across a range of genetics studies and breeding applications. Genotyping-by-sequencing (GBS), a low-cost, reduced representation sequencing method, is becoming a common approach fo...

  2. Wolfcampian sequence stratigraphy of eastern Central Basin platform, Texas

    SciTech Connect

    Candelaria, M.P.; Entzminger, D.J.; Behnken, F.H.; Sarg, J.F.; Wilde, G.L.

    1992-04-01

    Integrated study of well logs, cores, high-resolution seismic data, and biostratigraphy has established the sequence framework of the Atokan (Early Pennsylvanian)-Wolfcampian (Early Permian) stratigraphic section along the eastern margin of the Central Basin platform in the Permian basin. Sequence interpretation of high-resolution, high-fold seismic data through this stratigraphic interval has revealed a complex progradational/retrogradational evolution of the platform margin that has demonstrated overall progradation of at least 12 km during early-middle Wolfcampian. Sequence stratigraphic study of the Wolfcamp interval has revealed details of the internal architecture and morphologic evolution of the contemporaneous platform margin. Two generalized seismic facies assemblages are recognized in the Wolfcampian. Platform interior facies are characterized by high-amplitude, laterally continuous parallel reflections; platform margin facies consist of progradational sigmoidal to oblique clinoforms and are characterized by discontinuous, low-amplitude reflections. Sequence interpretation of carbonate platform-to-basin strata geometries helps in predicting subtle stratigraphic trapping relationships and potential reservoir facies distribution. Moreover, this interpretive method assists in describing complex reservoir heterogeneities that can contribute to significant reserve additions from within existing fields.

  3. Generating Exome Enriched Sequencing Libraries from Formalin-Fixed, Paraffin-Embedded Tissue DNA for Next-Generation Sequencing.

    PubMed

    Marosy, Beth A; Craig, Brian D; Hetrick, Kurt N; Witmer, P Dane; Ling, Hua; Griffith, Sean M; Myers, Benjamin; Ostrander, Elaine A; Stanford, Janet L; Brody, Lawrence C; Doheny, Kimberly F

    2017-01-11

    This unit describes a technique for generating exome-enriched sequencing libraries using DNA extracted from formalin-fixed paraffin-embedded (FFPE) samples. Utilizing commercially available kits, we present a low-input FFPE workflow starting with 50 ng of DNA. This procedure includes a repair step to address damage caused by FFPE preservation that improves sequence quality. Subsequently, libraries undergo an in-solution-targeted selection for exons, followed by sequencing using the Illumina next-generation short-read sequencing platform. © 2017 by John Wiley & Sons, Inc.

  4. Replacement Sequence of Events Generator

    NASA Technical Reports Server (NTRS)

    Fisher, Forest; Gladden, Daniel Wenkert Roy; Khanampompan, Teerpat

    2008-01-01

    The soeWINDOW program automates the generation of an ITAR (International Traffic in Arms Regulations)-compliant sub-RSOE (Replacement Sequence of Events) by extracting a specified temporal window from an RSOE while maintaining page header information. RSOEs contain a significant amount of information that is not ITAR-compliant, yet that foreign partners need to see for command details to their instrument, as well as the surrounding commands that provide context for validation. soeWINDOW can serve as an example of how command support products can be made ITAR-compliant for future missions. This software is a Perl script intended for use in the mission operations UNIX environment. It is designed for use to support the MRO (Mars Reconnaissance Orbiter) instrument team. The tool also provides automated DOM (Distributed Object Manager) storage into the special ITAR-okay DOM collection, and can be used for creating focused RSOEs for product review by any of the MRO teams.

  5. Automated Sequence Generation Process and Software

    NASA Technical Reports Server (NTRS)

    Gladden, Roy

    2007-01-01

    "Automated sequence generation" (autogen) signifies both a process and software used to automatically generate sequences of commands to operate various spacecraft. The autogen software comprises the autogen script plus the Activity Plan Generator (APGEN) program. APGEN can be used for planning missions and command sequences.

  6. Sedimentology and sequence stratigraphy of reefs and carbonate platforms

    SciTech Connect

    Schlager, W.

    1992-01-01

    Classical sequence stratigraphy has been developed primarily from siliciclastic systems. Application of the concept to carbonates has not been as straightforward as was originally expected, even though the basic tenets of sequence stratigraphy are supposed to be applicable to all depositional systems. Rather than force carbonate platforms into the straightjacket of a concept derived from another sediment family, this course takes a different tack. It starts out from the premise that sequence stratigraphy is a modern and sophisticated version of lithostratigraphy and as such is a sedimentologic concept. "More sedimentology into sequence stratigraphy" is the motto of the course and the red line that runs through the chapters of this book. The book sets out with a review of the sedimentologic principles governing the large-scale anatomy of reefs and platforms, with reference to petroleum deposits. It then looks at sequences and systems tracts from a sedimentologic point of view, assesses the differences between siliciclastics and carbonates in their response to sea level, evaluates processes that compete with sea level for control on carbonate sequences, and finally presents a set of guidelines for application of sequence stratigraphy to reefs and carbonate platforms.

  7. Assembly algorithms for next-generation sequencing data.

    PubMed

    Miller, Jason R; Koren, Sergey; Sutton, Granger

    2010-06-01

    The emergence of next-generation sequencing platforms led to resurgence of research in whole-genome shotgun assembly algorithms and software. DNA sequencing data from the Roche 454, Illumina/Solexa, and ABI SOLiD platforms typically present shorter read lengths, higher coverage, and different error profiles compared with Sanger sequencing data. Since 2005, several assembly software packages have been created or revised specifically for de novo assembly of next-generation sequencing data. This review summarizes and compares the published descriptions of packages named SSAKE, SHARCGS, VCAKE, Newbler, Celera Assembler, Euler, Velvet, ABySS, AllPaths, and SOAPdenovo. More generally, it compares the two standard methods known as the de Bruijn graph approach and the overlap/layout/consensus approach to assembly.
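
    The de Bruijn graph approach named above can be illustrated with a toy example. This is a minimal sketch under simplifying assumptions (error-free reads, a single strand, no repeats), not the implementation of any package listed in the record; the function names and the reads are hypothetical. Each k-mer in a read contributes one edge from its (k-1)-base prefix to its (k-1)-base suffix, and a contig is spelled out by following unbranched edges.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Three overlapping toy reads drawn from the sequence ATGGCGTGCA.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = de_bruijn_graph(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
```

    The overlap/layout/consensus approach would instead compare reads pairwise to find overlaps. The de Bruijn construction avoids that all-against-all comparison, which is one reason it suits large volumes of short reads.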

  8. Assembly Algorithms for Next-Generation Sequencing Data

    PubMed Central

    Miller, Jason R.; Koren, Sergey; Sutton, Granger

    2010-01-01

    The emergence of next-generation sequencing platforms led to resurgence of research in whole-genome shotgun assembly algorithms and software. DNA sequencing data from the Roche 454, Illumina/Solexa, and ABI SOLiD platforms typically present shorter read lengths, higher coverage, and different error profiles compared with Sanger sequencing data. Since 2005, several assembly software packages have been created or revised specifically for de novo assembly of next-generation sequencing data. This review summarizes and compares the published descriptions of packages named SSAKE, SHARCGS, VCAKE, Newbler, Celera Assembler, Euler, Velvet, ABySS, AllPaths, and SOAPdenovo. More generally, it compares the two standard methods known as the de Bruijn graph approach and the overlap/layout/consensus approach to assembly. PMID:20211242

  9. Sequence Data for Clostridium autoethanogenum using Three Generations of Sequencing Technologies

    SciTech Connect

    Utturkar, Sagar M.; Klingeman, Dawn Marie; Bruno-Barcena, José M.; Chinn, Mari S.; Grunden, Amy; Köpke, Michael; Brown, Steven D.

    2015-04-14

    During the past decade, DNA sequencing output has been mostly dominated by the second generation sequencing platforms which are characterized by low cost, high throughput and shorter read lengths for example, Illumina. The emergence and development of so called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20 kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high quality sequence datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, enabling comparison of existing and novel bioinformatics approaches and will encourage interest in the development of innovative experimental and computational methods for NGS data.

  10. Sequence data for Clostridium autoethanogenum using three generations of sequencing technologies

    PubMed Central

    Utturkar, Sagar M; Klingeman, Dawn M; Bruno-Barcena, José M; Chinn, Mari S; Grunden, Amy M; Köpke, Michael; Brown, Steven D

    2015-01-01

    During the past decade, DNA sequencing output has been mostly dominated by the second generation sequencing platforms which are characterized by low cost, high throughput and shorter read lengths for example, Illumina. The emergence and development of so called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20 kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high quality sequence datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, enabling comparison of existing and novel bioinformatics approaches and will encourage interest in the development of innovative experimental and computational methods for NGS data. PMID:25977818

  11. Sequence data for Clostridium autoethanogenum using three generations of sequencing technologies.

    PubMed

    Utturkar, Sagar M; Klingeman, Dawn M; Bruno-Barcena, José M; Chinn, Mari S; Grunden, Amy M; Köpke, Michael; Brown, Steven D

    2015-01-01

    During the past decade, DNA sequencing output has been mostly dominated by the second generation sequencing platforms which are characterized by low cost, high throughput and shorter read lengths for example, Illumina. The emergence and development of so called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20 kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high quality sequence datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, enabling comparison of existing and novel bioinformatics approaches and will encourage interest in the development of innovative experimental and computational methods for NGS data.

  12. Underlying Data for Sequencing the Mitochondrial Genome with the Massively Parallel Sequencing Platform Ion Torrent™ PGM™

    PubMed Central

    2015-01-01

    Background: Massively parallel sequencing (MPS) technologies have the capacity to sequence targeted regions or whole genomes of multiple nucleic acid samples with high coverage by sequencing millions of DNA fragments simultaneously. Compared with Sanger sequencing, MPS also can reduce labor and cost on a per nucleotide basis and indeed on a per sample basis. In this study, whole genomes of human mitochondria (mtGenome) were sequenced on the Personal Genome Machine (PGM™) (Life Technologies, San Francisco, CA), the output data were assessed, and the results were compared with data previously generated on the MiSeq™ (Illumina, San Diego, CA). The objectives of this paper were to determine the feasibility, accuracy, and reliability of sequence data obtained from the PGM. Results: 24 samples were multiplexed (in groups of six) and sequenced on the 314 chip, which has a throughput of at least 10 megabases. The depth of coverage pattern was similar among all 24 samples; however, the coverage across the genome varied. For strand bias, the average ratio of coverage between the forward and reverse strands at each nucleotide position indicated that two-thirds of the positions of the genome had ratios that were greater than 0.5. A few sites had more extreme strand bias. Another observation was that 156 positions had a false deletion rate greater than 0.15 in one or more individuals. There were 31-98 (SNP) mtGenome variants observed per sample for the 24 samples analyzed. The total of 1237 (SNP) variants were concordant between the results from the PGM and MiSeq. The quality scores for haplogroup assignment for all 24 samples ranged between 88.8% and 100%. Conclusions: In this study, mtDNA sequence data generated from the PGM were analyzed and the output evaluated. Depth of coverage variation and strand bias were identified but generally were infrequent and did not impact reliability of variant calls. Multiplexing of samples was demonstrated, which can improve throughput and reduce cost per sample analyzed.

  13. Sequence Data for Clostridium autoethanogenum using Three Generations of Sequencing Technologies

    DOE PAGES

    Utturkar, Sagar M.; Klingeman, Dawn Marie; Bruno-Barcena, José M.; ...

    2015-04-14

    During the past decade, DNA sequencing output has been mostly dominated by the second generation sequencing platforms which are characterized by low cost, high throughput and shorter read lengths for example, Illumina. The emergence and development of so called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20 kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high quality sequence datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, enabling comparison of existing and novel bioinformatics approaches and will encourage interest in the development of innovative experimental and computational methods for NGS data.

  14. Advances in clinical next-generation sequencing: target enrichment and sequencing technologies.

    PubMed

    Ballester, Leomar Y; Luthra, Rajyalakshmi; Kanagal-Shamanna, Rashmi; Singh, Rajesh R

    2016-01-01

    The huge parallel sequencing capabilities of next generation sequencing technologies have made them the tools of choice to characterize genomic aberrations for research and diagnostic purposes. For clinical applications, screening the whole genome or exome is challenging owing to the large genomic area to be sequenced, associated costs, complexity of data, and lack of known clinical significance of all genes. Consequently, routine screening involves limited markers with established clinical relevance. This process, referred to as targeted genome sequencing, requires selective enrichment of the genomic areas comprising these markers via one of several primer or probe-based enrichment strategies, followed by sequencing of the enriched genomic areas. Here, the authors review current target enrichment approaches and next generation sequencing platforms, focusing on the underlying principles, capabilities, and limitations of each technology along with validation and implementation for clinical testing.

  15. Improved pipeline for reducing erroneous identification by 16S rRNA sequences using the Illumina MiSeq platform.

    PubMed

    Jeon, Yoon-Seong; Park, Sang-Cheol; Lim, Jeongmin; Chun, Jongsik; Kim, Bong-Soo

    2015-01-01

    The cost of DNA sequencing has decreased due to advancements in Next Generation Sequencing. Because the number of sequences obtained from the Illumina platform is large, use of this platform can reduce costs more than the 454 pyrosequencer. However, the Illumina platform has other challenges, including bioinformatics analysis of large numbers of sequences and the need to reduce erroneous nucleotides generated at the 3'-ends of the sequences. These erroneous sequences can lead to errors in analysis of microbial communities. Therefore, correction of these erroneous sequences is necessary for accurate taxonomic identification. Several studies that have used the Illumina platform to perform metagenomic analyses proposed curating pipelines to increase accuracy. In this study, we evaluated the likelihood of obtaining an erroneous microbial composition using the MiSeq 250 bp paired sequence platform and improved the pipeline to reduce erroneous identifications. We compared different sequencing conditions by varying the percentage of control phiX added, the concentration of the sequencing library, and the 16S rRNA gene target region using a mock community sample composed of known sequences. Our recommended method corrected erroneous nucleotides and improved identification accuracy. Overall, 99.5% of the total reads shared 95% similarity with the corresponding template sequences and 93.6% of the total reads shared over 97% similarity. This indicated that the MiSeq platform can be used to analyze microbial communities at the genus level with high accuracy. The improved analysis method recommended in this study can be applied to amplicon studies in various environments using high-throughput reads generated on the MiSeq platform.

  16. ACMG clinical laboratory standards for next-generation sequencing.

    PubMed

    Rehm, Heidi L; Bale, Sherri J; Bayrak-Toydemir, Pinar; Berg, Jonathan S; Brown, Kerry K; Deignan, Joshua L; Friez, Michael J; Funke, Birgit H; Hegde, Madhuri R; Lyon, Elaine

    2013-09-01

    Next-generation sequencing technologies have been and continue to be deployed in clinical laboratories, enabling rapid transformations in genomic medicine. These technologies have reduced the cost of large-scale sequencing by several orders of magnitude, and continuous advances are being made. It is now feasible to analyze an individual's near-complete exome or genome to assist in the diagnosis of a wide array of clinical scenarios. Next-generation sequencing technologies are also facilitating further advances in therapeutic decision making and disease prediction for at-risk patients. However, with rapid advances come additional challenges involving the clinical validation and use of these constantly evolving technologies and platforms in clinical laboratories. To assist clinical laboratories with the validation of next-generation sequencing methods and platforms, the ongoing monitoring of next-generation sequencing testing to ensure quality results, and the interpretation and reporting of variants found using these technologies, the American College of Medical Genetics and Genomics has developed the following professional standards and guidelines.

  17. ACMG clinical laboratory standards for next-generation sequencing

    PubMed Central

    Rehm, Heidi L.; Bale, Sherri J; Bayrak-Toydemir, Pinar; Berg, Jonathan S; Brown, Kerry K; Deignan, Joshua L; Friez, Michael J; Funke, Birgit H; Hegde, Madhuri R; Lyon, Elaine

    2014-01-01

    Next-generation sequencing technologies have been and continue to be deployed in clinical laboratories, enabling rapid transformations in genomic medicine. These technologies have reduced the cost of large-scale sequencing by several orders of magnitude, and continuous advances are being made. It is now feasible to analyze an individual's near-complete exome or genome to assist in the diagnosis of a wide array of clinical scenarios. Next-generation sequencing technologies are also facilitating further advances in therapeutic decision making and disease prediction for at-risk patients. However, with rapid advances come additional challenges involving the clinical validation and use of these constantly evolving technologies and platforms in clinical laboratories. To assist clinical laboratories with the validation of next-generation sequencing methods and platforms, the ongoing monitoring of next-generation sequencing testing to ensure quality results, and the interpretation and reporting of variants found using these technologies, the American College of Medical Genetics and Genomics has developed the following professional standards and guidelines. PMID:23887774

  18. Concept For Generation Of Long Pseudorandom Sequences

    NASA Technical Reports Server (NTRS)

    Wang, C. C.

    1990-01-01

    Conceptual very-large-scale integrated (VLSI) digital circuit performs exponentiation in finite field. Algorithm that generates unusually long sequences of pseudorandom numbers executed by digital processor that includes such circuits. Concepts particularly advantageous for such applications as spread-spectrum communications, cryptography, and generation of ranging codes, synthetic noise, and test data, where usually desirable to make pseudorandom sequences as long as possible.
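
    The record does not specify the field or the algorithm, so the following is only a toy illustration of how exponentiation in a finite field can yield a pseudorandom sequence. It assumes the small field GF(2^4) with the primitive polynomial x^4 + x + 1 and uses ordinary square-and-multiply; both choices, and the names `gf_mul` and `gf_pow`, are assumptions made for the example. Successive powers of the element x visit every nonzero field element before repeating.

```python
def gf_mul(a, b, modulus=0b10011, degree=4):
    """Multiply two elements of GF(2^degree), reducing by `modulus`.

    Elements are integers whose bits are polynomial coefficients.
    The default modulus encodes x^4 + x + 1 (an assumed toy choice).
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a >> degree:          # degree overflow: reduce by the modulus
            a ^= modulus
    return result


def gf_pow(base, exponent, modulus=0b10011, degree=4):
    """Raise `base` to `exponent` in GF(2^degree) by square-and-multiply."""
    result = 1
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base, modulus, degree)
        base = gf_mul(base, base, modulus, degree)
        exponent >>= 1
    return result


# Powers of the element x (encoded as 2) form a sequence of period 15.
sequence = [gf_pow(2, n) for n in range(15)]
```

    With a field of degree m and a primitive element, the same construction has period 2^m - 1, which is why larger fields give the very long sequences the record describes.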

  19. Next-generation sequencing strategies for characterizing the turkey genome.

    PubMed

    Dalloul, Rami A; Zimin, Aleksey V; Settlage, Robert E; Kim, Sungwon; Reed, Kent M

    2014-02-01

    The turkey genome sequencing project was initiated in 2008 and has relied primarily on next-generation sequencing (NGS) technologies. Our first efforts used a synergistic combination of 2 NGS platforms (Roche/454 and Illumina GAII), detailed bacterial artificial chromosome (BAC) maps, and unique assembly tools to sequence and assemble the genome of the domesticated turkey, Meleagris gallopavo. Since the first release in 2010, efforts to improve the genome assembly, gene annotation, and genomic analyses continue. The initial assembly build (2.01) represented about 89% of the genome sequence with 17X coverage depth (931 Mb). Sequence contigs were assigned to 30 of the 40 chromosomes with approximately 10% of the assembled sequence corresponding to unassigned chromosomes (ChrUn). The sequence has been refined through both genome-wide and area-focused sequencing, including shotgun and paired-end sequencing, and targeted sequencing of chromosomal regions with low or incomplete coverage. These additional efforts have improved the sequence assembly resulting in 2 subsequent genome builds of higher genome coverage (25X/Build3.0 and 30X/Build4.0) with a current sequence totaling 1,010 Mb. Further, BAC with end sequences assigned to the Z/W and MG18 (MHC) chromosomes, ChrUn, or not placed in the previous build were isolated, deeply sequenced (Hi-Seq), and incorporated into the latest build (5.0). To aid in the annotation and to generate a gene expression atlas of major tissues, a comprehensive set of RNA samples was collected at various developmental stages of female and male turkeys. Transcriptome sequencing data (using Illumina Hi-Seq) will provide information to enhance the final assembly and ultimately improve sequence annotation. The most current sequence covers more than 95% of the turkey genome and should yield a much improved gene level of annotation, making it a valuable resource for studying genetic variations underlying economically important traits in poultry.

  20. 16S rRNA gene sequencing of mock microbial populations- impact of DNA extraction method, primer choice and sequencing platform.

    PubMed

    Fouhy, Fiona; Clooney, Adam G; Stanton, Catherine; Claesson, Marcus J; Cotter, Paul D

    2016-06-24

    Next-generation sequencing platforms have revolutionised our ability to investigate the microbiota composition of complex environments, frequently through 16S rRNA gene sequencing of the bacterial component of the community. Numerous factors, including DNA extraction method, primer sequences and sequencing platform employed, can affect the accuracy of the results achieved. The aim of this study was to determine the impact of these three factors on 16S rRNA gene sequencing results, using mock communities and mock community DNA. The use of different primer sequences (V4-V5, V1-V2 and V1-V2 degenerate primers) resulted in differences in the genera and species detected. The V4-V5 primers gave the most comparable results across platforms. The three Ion PGM primer sets detected more of the 20 mock community species than the equivalent MiSeq primer sets. Data generated from DNA extracted using the 2 extraction methods were very similar. Microbiota compositional data differed depending on the primers and sequencing platform that were used. The results demonstrate the risks in comparing data generated using different sequencing approaches and highlight the merits of choosing a standardised approach for sequencing in situations where a comparison across multiple sequencing runs is required.

  1. WebLogo: a sequence logo generator.

    PubMed

    Crooks, Gavin E; Hon, Gary; Chandonia, John-Marc; Brenner, Steven E

    2004-06-01

    WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Each logo consists of stacks of letters, one stack for each position in the sequence. The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. WebLogo has been enhanced recently with additional features and options, to provide a convenient and highly configurable sequence logo generator. A command line interface and the complete, open WebLogo source code are available for local installation and customization. Copyright 2004 Cold Spring Harbor Laboratory Press
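
    The stack and symbol heights described above can be sketched numerically. This is a simplified illustration, not WebLogo's own code: it assumes a DNA alphabet, ignores gaps, and omits any small-sample correction. The function name `logo_heights` and the toy alignment are hypothetical. Conservation at a position is taken as log2(4) minus the Shannon entropy of the observed base frequencies, and each symbol's height is its frequency times that conservation.

```python
import math
from collections import Counter


def logo_heights(alignment, alphabet="ACGT"):
    """Return, per alignment column, a dict of symbol -> height in bits."""
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    columns = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment if seq[i] in alphabet)
        total = sum(counts.values())
        if total == 0:
            columns.append({})
            continue
        freqs = {s: counts[s] / total for s in alphabet if counts[s]}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        conservation = max_bits - entropy  # overall stack height
        columns.append({s: f * conservation for s, f in freqs.items()})
    return columns


# Toy alignment of four equal-length DNA sequences (illustrative only).
alignment = ["ACGT", "ACGA", "ACTT", "AGGT"]
heights = logo_heights(alignment)
```

    In this example the first column is perfectly conserved and reaches the full 2 bits, while the second column, with three C and one G, has a shorter stack in which C occupies three quarters of the height.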

  2. Application of genotyping-by-sequencing on semiconductor sequencing platforms: a comparison of genetic and reference-based marker ordering in barley.

    PubMed

    Mascher, Martin; Wu, Shuangye; Amand, Paul St; Stein, Nils; Poland, Jesse

    2013-01-01

    The rapid development of next-generation sequencing platforms has enabled the use of sequencing for routine genotyping across a range of genetics studies and breeding applications. Genotyping-by-sequencing (GBS), a low-cost, reduced representation sequencing method, is becoming a common approach for whole-genome marker profiling in many species. With quickly developing sequencing technologies, adapting current GBS methodologies to new platforms will leverage these advancements for future studies. To test new semiconductor sequencing platforms for GBS, we genotyped a barley recombinant inbred line (RIL) population. Based on a previous GBS approach, we designed bar code and adapter sets for the Ion Torrent platforms. Four sets of 24-plex libraries were constructed consisting of 94 RILs and the two parents and sequenced on two Ion platforms. In parallel, a 96-plex library of the same RILs was sequenced on the Illumina HiSeq 2000. We applied two different computational pipelines to analyze sequencing data; the reference-independent TASSEL pipeline and a reference-based pipeline using SAMtools. Sequence contigs positioned on the integrated physical and genetic map were used for read mapping and variant calling. We found high agreement in genotype calls between the different platforms and high concordance between genetic and reference-based marker order. There was, however, paucity in the number of SNP that were jointly discovered by the different pipelines indicating a strong effect of alignment and filtering parameters on SNP discovery. We show the utility of the current barley genome assembly as a framework for developing very low-cost genetic maps, facilitating high resolution genetic mapping and negating the need for developing de novo genetic maps for future studies in barley. Through demonstration of GBS on semiconductor sequencing platforms, we conclude that the GBS approach is amenable to a range of platforms and can easily be modified as new sequencing

  3. Gradient generation platforms: new directions for an established microfluidic technology

    PubMed Central

    Berthier, E.; Beebe, D.J.

    2014-01-01

    Microscale platforms are enabling for cell-based studies as they allow the recapitulation of physiological conditions such as extracellular matrix (ECM) configurations and soluble factors interactions. Gradient generation platforms have been one of the few applications of microfluidics that have begun to be translated to biological laboratories and may become a new “gold standard”. Though gradient generation platforms are now established, their full potential has not yet been realized. Here, we will provide our perspective on milestones achieved in the development of gradient generation and cell migration platforms, as well as emerging directions such as using cell migration as a diagnostic readout and attaining mechanistic information from cell migration models. PMID:25008971

  4. Metagenomics using next-generation sequencing.

    PubMed

    Bragg, Lauren; Tyson, Gene W

    2014-01-01

    Traditionally, microbial genome sequencing has been restricted to the small number of species that can be grown in pure culture. The progressive development of culture-independent methods over the last 15 years now allows researchers to sequence microbial communities directly from environmental samples. This approach is commonly referred to as "metagenomics" or "community genomics". However, the term metagenomics is applied liberally in the literature to describe any culture-independent analysis of microbial communities. Here, we define metagenomics as shotgun ("random") sequencing of the genomic DNA of a sample taken directly from the environment. The metagenome can be thought of as a sampling of the collective genome of the microbial community. We outline the considerations and analyses that should be undertaken to ensure the success of a metagenomic sequencing project, including the choice of sequencing platform and methods for assembly, binning, annotation, and comparative analysis.

  5. Next-generation sequencing - feasibility and practicality in haematology.

    PubMed

    Kohlmann, Alexander; Grossmann, Vera; Nadarajah, Niroshan; Haferlach, Torsten

    2013-03-01

    Next-generation sequencing platforms have evolved to provide an accurate and comprehensive means for the detection of molecular mutations in heterogeneous tumour specimens. Here, we review the feasibility and practicality of this novel laboratory technology. In particular, we focus on the utility of next-generation sequencing technology in characterizing haematological neoplasms and the landmark findings in key haematological malignancies. We also discuss deep-sequencing strategies to analyse the constantly increasing number of molecular markers applied for disease classification, patient stratification and individualized monitoring of minimal residual disease. Although many facets of this assay need to be taken into account, amplicon deep-sequencing has already demonstrated a promising technical performance and is being continuously developed towards routine application in diagnostic laboratories so that an impact on clinical practice can be achieved.

  6. ADS: The Next Generation Search Platform

    NASA Astrophysics Data System (ADS)

    Accomazzi, A.; Kurtz, M. J.; Henneken, E. A.; Chyla, R.; Luker, J.; Grant, C. S.; Thompson, D. M.; Holachek, A.; Dave, R.; Murray, S. S.

    2015-04-01

    Four years after the last LISA meeting, the NASA Astrophysics Data System (ADS) finds itself in the middle of major changes to the infrastructure and contents of its database. In this paper we highlight a number of features of great importance to librarians and discuss the additional functionality that we are currently developing. Our citation coverage has doubled since 2010 and now consists of over 10 million citations. We are normalizing the affiliation information in our records and we have started collecting and linking funding sources with papers in our system. At the same time, we are undergoing major technology changes in the ADS platform. We have rolled out and are now enhancing a new high-performance search engine capable of performing full-text as well as metadata searches using an intuitive query language. We are currently able to index acknowledgments, affiliations, citations, and funding sources. While this effort is still ongoing, some of its benefits are already available through the ADS Labs user interface and API at http://adslabs.org/adsabs/.

  7. Next-Generation Sequencing for Cancer Diagnostics: a Practical Perspective

    PubMed Central

    Meldrum, Cliff; Doyle, Maria A; Tothill, Richard W

    2011-01-01

    Next-generation sequencing (NGS) is arguably one of the most significant technological advances in the biological sciences of the last 30 years. The second generation sequencing platforms have advanced rapidly to the point that several genomes can now be sequenced simultaneously in a single instrument run in under two weeks. Targeted DNA enrichment methods allow even higher genome throughput at a reduced cost per sample. Medical research has embraced the technology and the cancer field is at the forefront of these efforts given the genetic aspects of the disease. World-wide efforts to catalogue mutations in multiple cancer types are underway and this is likely to lead to new discoveries that will be translated to new diagnostic, prognostic and therapeutic targets. NGS is now maturing to the point where it is being considered by many laboratories for routine diagnostic use. The sensitivity, speed and reduced cost per sample make it a highly attractive platform compared to other sequencing modalities. Moreover, as we identify more genetic determinants of cancer there is a greater need to adopt multi-gene assays that can quickly and reliably sequence complete genes from individual patient samples. Whilst widespread and routine use of whole genome sequencing is likely to be a few years away, there are immediate opportunities to implement NGS for clinical use. Here we review the technology, methods and applications that can be immediately considered and some of the challenges that lie ahead. PMID:22147957

  8. A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome.

    PubMed

    Allali, Imane; Arnold, Jason W; Roach, Jeffrey; Cadenas, Maria Belen; Butz, Natasha; Hassan, Hosni M; Koci, Matthew; Ballou, Anne; Mendoza, Mary; Ali, Rizwana; Azcarate-Peril, M Andrea

    2017-09-13

    Advancements in Next Generation Sequencing (NGS) technologies regarding throughput, read length and accuracy had a major impact on microbiome research by significantly improving 16S rRNA amplicon sequencing. As rapid improvements in sequencing platforms and new data analysis pipelines are introduced, it is essential to evaluate their capabilities in specific applications. The aim of this study was to assess whether the same project-specific biological conclusions regarding microbiome composition could be reached using different sequencing platforms and bioinformatics pipelines. Chicken cecum microbiome was analyzed by 16S rRNA amplicon sequencing using Illumina MiSeq, Ion Torrent PGM, and Roche 454 GS FLX Titanium platforms, with standard and modified protocols for library preparation. We labeled the bioinformatics pipelines included in our analysis QIIME1 and QIIME2 (de novo OTU picking [not to be confused with QIIME version 2 commonly referred to as QIIME2]), QIIME3 and QIIME4 (open reference OTU picking), UPARSE1 and UPARSE2 (each pair differs only in the use of chimera depletion methods), and DADA2 (for Illumina data only). GS FLX+ yielded the longest reads and highest quality scores, while MiSeq generated the largest number of reads after quality filtering. Declines in quality scores were observed starting at bases 150-199 for GS FLX+ and bases 90-99 for MiSeq. Scores were stable for PGM-generated data. Overall microbiome compositional profiles were comparable between platforms; however, average relative abundance of specific taxa varied depending on sequencing platform, library preparation method, and bioinformatics analysis. Specifically, QIIME with de novo OTU picking yielded the highest number of unique species and alpha diversity was reduced with UPARSE and DADA2 compared to QIIME. The three platforms compared in this study were capable of discriminating samples by treatment, despite differences in diversity and abundance, leading to similar biological

  9. Next-generation sequencing discoveries in lymphoma.

    PubMed

    Slack, Graham W; Gascoyne, Randy D

    2013-03-01

    Since the mapping of the human genome and the advent of next-generation sequencing technology, thorough examination of the cancer genome has become a reality. Over the last few years, several studies have used next-generation sequencing technology to investigate the genetic landscape of Hodgkin and non-Hodgkin lymphomas, identifying novel genetic mutations and gene rearrangements that have shed new light on the underlying tumor biology in these diseases as well as identifying possible targets for directed therapy. This review covers the major discoveries in lymphoma using next-generation sequencing technology.

  10. Iterative method for generating correlated binary sequences

    NASA Astrophysics Data System (ADS)

    Usatenko, O. V.; Melnik, S. S.; Apostolov, S. S.; Makarov, N. M.; Krokhin, A. A.

    2014-11-01

    We propose an efficient iterative method for generating random correlated binary sequences with a prescribed correlation function. The method is based on consecutive linear modulations of an initially uncorrelated sequence into a correlated one. Each step of modulation increases the correlations until the desired level has been reached. The robustness and efficiency of the proposed algorithm are tested by generating sequences with inverse power-law correlations. The substantial increase in the strength of correlation in the iterative method with respect to single-step filtering generation is shown for all studied correlation functions. Our results can be used for design of disordered superlattices, waveguides, and surfaces with selective transport properties.

  11. A platform for biological sequence comparison on parallel computers.

    PubMed

    Deshpande, A S; Richards, D S; Pearson, W R

    1991-04-01

    We have written two programs for searching biological sequence databases that run on Intel hypercube computers. PSCANLIB compares a single sequence against a sequence library, and PCOMPLIB compares all the entries in one sequence library against a second library. The programs provide a general framework for similarity searching; they include functions for reading in query sequences, search parameters and library entries, and reporting the results of a search. We have isolated the code for the specific function that calculates the similarity score between the query and library sequence; alternative searching algorithms can be implemented by editing two files. We have implemented the rapid FASTA sequence comparison algorithm and the more rigorous Smith-Waterman algorithm within this framework. The PSCANLIB program on a 16 node iPSC/2 80386-based hypercube can compare a 229 amino acid protein sequence with a 3.4 million residue sequence library in approximately 16 s with the FASTA algorithm. Using the Smith-Waterman algorithm, the same search takes 35 min. The PCOMPLIB program can compare a 0.8 million amino acid protein sequence library with itself in 5.3 min with FASTA on a third-generation 32 node Intel iPSC/860 hypercube.
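
    The abstract above describes scoring a query against every library entry with an interchangeable similarity function. As a minimal sketch of that idea, assuming a simple linear gap penalty and illustrative scoring values (the names smith_waterman_score and scan_library and all parameters are hypothetical, not taken from PSCANLIB), a serial Smith-Waterman library scan might look like this:

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score (linear gap penalty).

    The scoring values are illustrative assumptions, not those of the
    programs described in the abstract.
    """
    prev = [0] * (len(target) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def scan_library(query, library):
    """Score one query against every library entry, best score first."""
    hits = [(smith_waterman_score(query, seq), name)
            for name, seq in library.items()]
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    library = {"a": "ACGT", "b": "TTTT"}
    print(scan_library("ACGT", library))
```

    Because each library entry is scored independently, the loop in scan_library is the part that a parallel machine could distribute across nodes; this sketch runs it serially.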

  12. FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets.

    PubMed

    Shcherbina, Anna

    2014-08-15

    High-throughput next generation sequencing technologies have enabled rapid characterization of clinical and environmental samples. Consequently, the largest bottleneck to actionable data has become sample processing and bioinformatics analysis, creating a need for accurate and rapid algorithms to process genetic data. Perfectly characterized in silico datasets are a useful tool for evaluating the performance of such algorithms. Background contaminating organisms are observed in sequenced mixtures of organisms. In silico samples provide exact truth. To create the best value for evaluating algorithms, in silico data should mimic actual sequencer data as closely as possible. FASTQSim is a tool that provides the dual functionality of NGS dataset characterization and metagenomic data generation. FASTQSim is sequencing platform-independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim has the ability to convert target sequences into in silico reads with specific error profiles obtained in the characterization step. FASTQSim enables users to assess the quality of NGS datasets. The tool provides information about read length, read quality, repetitive and non-repetitive indel profiles, and single base pair substitutions. FASTQSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software. In this regard, in silico datasets generated with the FASTQsim tool hold several advantages over natural datasets: they are sequencing platform independent, extremely well characterized, and less expensive to generate. Such datasets are valuable in a number of applications, including the training of assemblers for multiple platforms, benchmarking bioinformatics algorithm performance, and creating challenge
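
    The characterization step described above computes distributions such as read length and quality scores. The following is a minimal sketch of that kind of summary, not FASTQSim's own code: it assumes well-formed four-line FASTQ records and Phred+33 quality encoding, and the function names parse_fastq and characterize are hypothetical.

```python
from collections import Counter


def parse_fastq(lines):
    """Yield (name, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        yield header[1:], seq, qual


def characterize(lines, phred_offset=33):
    """Return (read-length histogram, mean quality per read position).

    phred_offset=33 is an assumption (Sanger / Illumina 1.8+ encoding).
    """
    lengths = Counter()
    qual_sum = Counter()
    qual_n = Counter()
    for _, seq, qual in parse_fastq(lines):
        lengths[len(seq)] += 1
        for pos, ch in enumerate(qual):
            qual_sum[pos] += ord(ch) - phred_offset
            qual_n[pos] += 1
    mean_q = [qual_sum[p] / qual_n[p] for p in sorted(qual_n)]
    return dict(lengths), mean_q


if __name__ == "__main__":
    records = ["@r1", "ACGT", "+", "IIII", "@r2", "AC", "+", "I5"]
    print(characterize(records))
```

    Indel and substitution rates, which the abstract also lists, would additionally require alignment against a reference and are outside this sketch.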

  13. Detection and characterization of novel sequence insertions using paired-end next-generation sequencing

    PubMed Central

    Hajirasouliha, Iman; Hormozdiari, Fereydoun; Alkan, Can; Kidd, Jeffrey M.; Birol, Inanc; Eichler, Evan E.; Sahinalp, S. Cenk

    2010-01-01

    Motivation: In the past few years, human genome structural variation discovery has enjoyed increased attention from the genomics research community. Many studies were published to characterize short insertions, deletions, duplications and inversions, and associate copy number variants (CNVs) with disease. Detection of new sequence insertions requires sequence data; however, the ‘detectable’ sequence length with read-pair analysis is limited by the insert size. Thus, longer sequence insertions that contribute to our genetic makeup are not extensively researched. Results: We present NovelSeq: a computational framework to discover the content and location of long novel sequence insertions using paired-end sequencing data generated by the next-generation sequencing platforms. Our framework can be built as part of a general sequence analysis pipeline to discover multiple types of genetic variation (SNPs, structural variation, etc.); thus, it requires significantly fewer computational resources than de novo sequence assembly. We apply our methods to detect novel sequence insertions in the genome of an anonymous donor and validate our results by comparing with the insertions discovered in the same genome using various sources of sequence data. Availability: The implementation of the NovelSeq pipeline is available at http://compbio.cs.sfu.ca/strvar.htm Contact: eee@gs.washington.edu; cenk@cs.sfu.ca PMID:20385726

  14. NG6: Integrated next generation sequencing storage and processing environment

    PubMed Central

    2012-01-01

    Background Next generation sequencing platforms are now well established in sequencing centres and some laboratories. Upcoming smaller scale machines such as the 454 junior from Roche or the MiSeq from Illumina will increase the number of laboratories hosting a sequencer. In such a context, it is important to provide these teams with an easily manageable environment to store and process the produced reads. Results We describe a user-friendly information system able to manage large sets of sequencing data. It includes, on the one hand, a workflow environment already containing pipelines adapted to different input formats (sff, fasta, fastq and qseq), different sequencers (Roche 454, Illumina HiSeq) and various analyses (quality control, assembly, alignment, diversity studies,…) and, on the other hand, a secured web site giving access to the results. The connected user will be able to download raw and processed data and browse through the analysis result statistics. The provided workflows can easily be modified or extended and new ones can be added. Ergatis is used as a workflow building, running and monitoring system. The analyses can be run locally or in a cluster environment using Sun Grid Engine. Conclusions NG6 is a complete information system designed to answer the needs of a sequencing platform. It provides a user-friendly interface to process, store and download high-throughput sequencing data. PMID:22958229

  15. Toward a new paradigm of DNA writing using a massively parallel sequencing platform and degenerate oligonucleotide

    PubMed Central

    Hwang, Byungjin; Bang, Duhee

    2016-01-01

    All synthetic DNA materials require prior programming of the building blocks of the oligonucleotide sequences. The development of a programmable microarray platform provides cost-effective and time-efficient solutions in the field of data storage using DNA. However, the scalability of the synthesis is not on par with the accelerating sequencing capacity. Here, we report on a new paradigm of generating genetic material (writing) using a degenerate oligonucleotide and optomechanical retrieval method that leverages sequencing (reading) throughput to generate the desired number of oligonucleotides. As a proof of concept, we demonstrate the feasibility of our concept in digital information storage in DNA. In simulation, the ability to store data is expected to exponentially increase with increase in degenerate space. The present study highlights the major framework change in conventional DNA writing paradigm as a sequencer itself can become a potential source of making genetic materials. PMID:27876825

  16. Comparison of next generation sequencing technologies for transcriptome characterization

    PubMed Central

    2009-01-01

    Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary

  17. Next-generation sequencing and large genome assemblies.

    PubMed

    Henson, Joseph; Tischler, German; Ning, Zemin

    2012-06-01

    The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether it must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed.

  18. Next-generation sequencing and large genome assemblies

    PubMed Central

    Henson, Joseph; Tischler, German; Ning, Zemin

    2012-01-01

    The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether it must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed. PMID:22676195

  19. Variant Calling From Next Generation Sequence Data.

    PubMed

    Hansen, Nancy F

    2016-01-01

    The use of next generation nucleotide sequencing to discover and genotype small sequence variants has led to numerous insights into the molecular causes of various diseases. This chapter describes the use of freely available software to align next generation sequencing reads to a reference and then to use the resulting alignments to call, annotate, view, and filter small sequence variants. The suggested variant calling workflow includes read alignment with novoalign, the removal of polymerase chain reaction duplicate sequences with samtools or bamUtils, and the detection of variants with Freebayes or bam2mpg software. ANNOVAR is then used to annotate the predicted variants using gene models, population frequencies, and predicted mutation severity, producing variant files which can be viewed and filtered with the variant display tool VarSifter.

  20. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform

    PubMed Central

    Schirmer, Melanie; Ijaz, Umer Z.; D'Amore, Rosalinda; Hall, Neil; Sloan, William T.; Quince, Christopher

    2015-01-01

    With read lengths of currently up to 2 × 300 bp, high throughput and low sequencing costs, Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore, we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%. PMID:25586220
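
    Quality trimming is the first step of the approach reported above. The sketch below illustrates a generic sliding-window trimmer; it is not Sickle's implementation, and the function name trim_3prime, the window size, the threshold and the Phred+33 offset are all assumptions chosen for illustration.

```python
def trim_3prime(seq, qual, threshold=20, window=5, phred_offset=33):
    """Cut a read at the first window whose mean quality drops below threshold.

    Generic sliding-window sketch; all parameter defaults are illustrative.
    Returns the trimmed (sequence, quality) pair.
    """
    scores = [ord(c) - phred_offset for c in qual]
    if not scores:
        return seq, qual  # nothing to trim on an empty read
    # Slide from the 5' end; short reads are treated as a single window.
    last_start = max(len(scores) - window, 0)
    for start in range(last_start + 1):
        win = scores[start:start + window]
        if sum(win) / len(win) < threshold:
            return seq[:start], qual[:start]
    return seq, qual


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(trim_3prime("ACGTACGTAC", "IIIIII####"))
```

    A simple cut at the window start, as here, can discard a few good bases just before the quality drop; production trimmers typically refine the cut position.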

  1. Theory of Periodic-Binary-Sequence Generators

    NASA Technical Reports Server (NTRS)

    Perlman, M.

    1987-01-01

    Algorithms yield feedback shift registers with maximum regularity. Report provides extensive mathematical treatment of new and previous results related to generation of pseudo-noise binary sequences by feedback shift registers. Generator architectures amenable to efficient implementation in very-large-scale integrated (VLSI) circuits. Report includes literature references to applications of such sequences in random-number generation, radar, VLSI testing, data encryption and decryption, algebraic error-detection and error-correction encoding and decoding, and feedback-shift-register synthesis of sequential machines.
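
    As a minimal illustration of generating a pseudo-noise binary sequence with a feedback shift register, the sketch below implements a generic Fibonacci-style linear feedback shift register. It is not one of the report's algorithms; the function names and the 4-stage tap choice (stages 3 and 4, which gives the maximal period of 15) are assumptions made for the example.

```python
def lfsr_step(taps, state):
    """Advance the register one clock: XOR the tapped stages, shift right."""
    feedback = 0
    for t in taps:  # taps are 1-based stage indices
        feedback ^= state[t - 1]
    return [feedback] + state[:-1]


def lfsr_sequence(taps, state, n_bits):
    """Return n_bits output bits, read from the last stage before each clock."""
    state = list(state)
    out = []
    for _ in range(n_bits):
        out.append(state[-1])
        state = lfsr_step(taps, state)
    return out


def lfsr_period(taps, state):
    """Count clocks until the register returns to its starting state.

    Assumes a nonzero start state with the last stage tapped, so the
    state update is invertible and the start state always recurs.
    """
    start = list(state)
    current = lfsr_step(taps, start)
    steps = 1
    while current != start:
        current = lfsr_step(taps, current)
        steps += 1
    return steps


if __name__ == "__main__":
    taps, seed = (4, 3), [1, 0, 0, 0]
    print(lfsr_period(taps, seed))
    print(lfsr_sequence(taps, seed, 15))
```

    For this tap choice the register visits all 15 nonzero states, and one period of output contains eight ones and seven zeros, the balance property expected of a maximal-length sequence.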

  3. Software for pre-processing Illumina next-generation sequencing short read sequences

    PubMed Central

    2014-01-01

    Background When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157 H7. Results Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacteria genomic short read sequences with the focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference

  4. Comparison of Next-Generation Sequencing Systems

    PubMed Central

    Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie

    2012-01-01

    With the fast development and wide application of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid the achievement of goals to decode life mysteries, make better crops, detect pathogens, and improve quality of life. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, the technologies of these systems are reviewed, and first-hand data from extensive experience are summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. Finally, applications of NGS are summarized. PMID:22829749

  5. Next Generation DNA Sequencing and the Future of Genomic Medicine

    PubMed Central

    Anderson, Matthew W.; Schrijver, Iris

    2010-01-01

    In the years since the first complete human genome sequence was reported, there has been a rapid development of technologies to facilitate high-throughput sequence analysis of DNA (termed “next-generation” sequencing). These novel approaches to DNA sequencing offer the promise of complete genomic analysis at a cost feasible for routine clinical diagnostics. However, the ability to more thoroughly interrogate genomic sequence raises a number of important issues with regard to result interpretation, laboratory workflow, data storage, and ethical considerations. This review describes the current high-throughput sequencing platforms commercially available, and compares the inherent advantages and disadvantages of each. The potential applications for clinical diagnostics are considered, as well as the need for software and analysis tools to interpret the vast amount of data generated. Finally, we discuss the clinical and ethical implications of the wealth of genetic information generated by these methods. Despite the challenges, we anticipate that the evolution and refinement of high-throughput DNA sequencing technologies will catalyze a new era of personalized medicine based on individualized genomic analysis. PMID:24710010

  6. Development of a Dual-Index Sequencing Strategy and Curation Pipeline for Analyzing Amplicon Sequence Data on the MiSeq Illumina Sequencing Platform

    PubMed Central

    Kozich, James J.; Westcott, Sarah L.; Baxter, Nielson T.; Highlander, Sarah K.

    2013-01-01

    Rapid advances in sequencing technology have changed the experimental landscape of microbial ecology. In the last 10 years, the field has moved from sequencing hundreds of 16S rRNA gene fragments per study using clone libraries to the sequencing of millions of fragments per study using next-generation sequencing technologies from 454 and Illumina. As these technologies advance, it is critical to assess the strengths, weaknesses, and overall suitability of these platforms for the interrogation of microbial communities. Here, we present an improved method for sequencing variable regions within the 16S rRNA gene using Illumina's MiSeq platform, which is currently capable of producing paired 250-nucleotide reads. We evaluated three overlapping regions of the 16S rRNA gene that vary in length (i.e., V34, V4, and V45) by resequencing a mock community and natural samples from human feces, mouse feces, and soil. By titrating the concentration of 16S rRNA gene amplicons applied to the flow cell and using a quality score-based approach to correct discrepancies between reads used to construct contigs, we were able to reduce error rates by as much as two orders of magnitude. Finally, we reprocessed samples from a previous study to demonstrate that large numbers of samples could be multiplexed and sequenced in parallel with shotgun metagenomes. These analyses demonstrate that our approach can provide data that are at least as good as that generated by the 454 platform while providing considerably higher sequencing coverage for a fraction of the cost. PMID:23793624

  7. Use of Sequence-independent, single-primer amplification (SISPA) with NGS platform for detection of RNA viruses in clinical samples

    USDA-ARS's Scientific Manuscript database

    Current technologies for next generation sequencing (NGS) have revolutionized metagenomics analysis of clinical samples. One advantage of the NGS platform is the possibility to sequence the genetic material in samples without any prior knowledge of the sequence contained within. Sequence-Independent...

  8. Extended blood group molecular typing and next-generation sequencing.

    PubMed

    Liu, Zhugong; Liu, Meihong; Mercado, Teresita; Illoh, Orieji; Davey, Richard

    2014-10-01

    Several high-throughput multiplex blood group molecular typing platforms have been developed to predict blood group antigen phenotypes. These molecular systems support extended donor/patient matching by detecting commonly encountered blood group polymorphisms as well as rare alleles that determine the expression of blood group antigens. Extended molecular typing of a large number of blood donors by high-throughput platforms can increase the likelihood of identifying donor red blood cells that match those of recipients. This is especially important in the management of multiply-transfused patients who may have developed several alloantibodies. Nevertheless, current molecular techniques have limitations. For example, they detect only predefined genetic variants. In contrast, target enrichment next-generation sequencing (NGS) is an emerging technology that provides comprehensive sequence information, focusing on specified genomic regions. Target enrichment NGS is able to assess genetic variations that cannot be achieved by traditional Sanger sequencing or other genotyping platforms. Target enrichment NGS has been used to detect both known and de novo genetic polymorphisms, including single-nucleotide polymorphisms, indels (insertions/deletions), and structural variations. This review discusses the methodology, advantages, and limitations of the current blood group genotyping techniques and describes various target enrichment NGS approaches that can be used to develop an extended blood group genotyping assay system.

  9. Sequence Depth, Not PCR Replication, Improves Ecological Inference from Next Generation DNA Sequencing

    PubMed Central

    Smith, Dylan P.; Peay, Kabir G.

    2014-01-01

    Recent advances in molecular approaches and DNA sequencing have greatly progressed the field of ecology and allowed for the study of complex communities in unprecedented detail. Next generation sequencing (NGS) can reveal powerful insights into the diversity, composition, and dynamics of cryptic organisms, but results may be sensitive to a number of technical factors, including molecular practices used to generate amplicons, sequencing technology, and data processing. Despite the popularity of some techniques over others, explicit tests of the relative benefits they convey in molecular ecology studies remain scarce. Here we tested the effects of PCR replication, sequencing depth, and sequencing platform on ecological inference drawn from environmental samples of soil fungi. We sequenced replicates of three soil samples taken from pine biomes in North America represented by pools of either one, two, four, eight, or sixteen PCR replicates with both 454 pyrosequencing and Illumina MiSeq. Increasing the number of pooled PCR replicates had no detectable effect on measures of α- and β-diversity. Pseudo-β-diversity – which we define as dissimilarity between re-sequenced replicates of the same sample – decreased markedly with increasing sampling depth. The total richness recovered with Illumina was significantly higher than with 454, but measures of α- and β-diversity between a larger set of fungal samples sequenced on both platforms were highly correlated. Our results suggest that molecular ecology studies will benefit more from investing in robust sequencing technologies than from replicating PCRs. This study also demonstrates the potential for continuous integration of older datasets with newer technology. PMID:24587293
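    The record above defines pseudo-β-diversity as the dissimilarity between re-sequenced replicates of the same sample, but it does not name the dissimilarity index. The following is a minimal sketch that assumes Bray-Curtis dissimilarity on OTU read counts (an assumption, not taken from the abstract); the OTU names and counts are hypothetical.

```python
from typing import Dict


def bray_curtis(counts_a: Dict[str, int], counts_b: Dict[str, int]) -> float:
    """Bray-Curtis dissimilarity between two OTU count tables.

    Returns 0.0 for identical tables and 1.0 when no reads are shared.
    The choice of index is an assumption made for illustration.
    """
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    otus = set(counts_a) | set(counts_b)
    # Reads shared by both runs: per-OTU minimum of the two counts.
    shared = sum(min(counts_a.get(o, 0), counts_b.get(o, 0)) for o in otus)
    return 1.0 - 2.0 * shared / total


# Hypothetical counts from two sequencing runs of the same soil sample.
run_1 = {"OTU_1": 10, "OTU_3": 5}
run_2 = {"OTU_1": 8, "OTU_2": 2, "OTU_3": 5}

pseudo_beta = bray_curtis(run_1, run_2)
print(f"pseudo-beta-diversity (Bray-Curtis): {pseudo_beta:.3f}")
```

    Under this assumption, two runs that recover the same OTUs in the same proportions score close to 0, so rare OTUs that appear in only one run at shallow depth are what drive the value upward.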

  10. Therapeutic assessment of SEED: a new engineered antibody platform designed to generate mono- and bispecific antibodies.

    PubMed

    Muda, Marco; Gross, Alec W; Dawson, Jessica P; He, Chaomei; Kurosawa, Emmi; Schweickhardt, Rene; Dugas, Melanie; Soloviev, Maria; Bernhardt, Anna; Fischer, David; Wesolowski, John S; Kelton, Christie; Neuteboom, Berend; Hock, Bjoern

    2011-05-01

    The strand-exchange engineered domain (SEED) platform was designed to generate asymmetric and bispecific antibody-like molecules, a capability that expands therapeutic applications of natural antibodies. This new protein engineered platform is based on exchanging structurally related sequences of immunoglobulin within the conserved CH3 domains. Alternating sequences from human IgA and IgG in the SEED CH3 domains generate two asymmetric but complementary domains, designated AG and GA. The SEED design allows efficient generation of AG/GA heterodimers, while disfavoring homodimerization of AG and GA SEED CH3 domains. Using a clinically validated antibody (C225), we tested whether Fab derivatives constructed on the SEED platform retain desirable therapeutic antibody features such as in vitro and in vivo stability, favorable pharmacokinetics, ligand binding and effector functions including antibody-dependent cell-mediated cytotoxicity and complement-dependent cytotoxicity. In addition, we tested SEED with combinations of binder domains (scFv, VHH, Fab). Mono- and bivalent Fab-SEED fusions retain full binding affinity, have excellent biochemical and biophysical stability, and retain desirable antibody-like characteristics conferred by Fc domains. Furthermore, SEED is compatible with different combinations of Fab, scFv and VHH domains. Our assessment shows that the new SEED platform expands therapeutic applications of natural antibodies by generating heterodimeric Fc-analog proteins.

  11. Reproducibility of Variant Calls in Replicate Next Generation Sequencing Experiments

    PubMed Central

    Qi, Yuan; Liu, Xiuping; Liu, Chang-gong; Wang, Bailing; Hess, Kenneth R.; Symmans, W. Fraser; Shi, Weiwei; Pusztai, Lajos

    2015-01-01

    Nucleotide alterations detected by next generation sequencing are not always true biological changes but could represent sequencing errors. Even highly accurate methods can yield substantial error rates when applied to millions of nucleotides. In this study, we examined the reproducibility of nucleotide variant calls in replicate sequencing experiments of the same genomic DNA. We performed targeted sequencing of all known human protein kinase genes (kinome) (~3.2 Mb) using the SOLiD v4 platform. Seventeen breast cancer samples were sequenced in duplicate (n=14) or triplicate (n=3) to assess concordance of all calls and single nucleotide variant (SNV) calls. The concordance rates over the entire sequenced region were >99.99%, while the concordance rates for SNVs were 54.3-75.5%. There was substantial variation in basic sequencing metrics from experiment to experiment. The type of nucleotide substitution and genomic location of the variant had little impact on concordance but concordance increased with coverage level, variant allele count (VAC), variant allele frequency (VAF), variant allele quality and p-value of SNV-call. The most important determinants of concordance were VAC and VAF. Even using the highest stringency of QC metrics the reproducibility of SNV calls was around 80% suggesting that erroneous variant calling can be as high as 20-40% in a single experiment. The sequence data have been deposited into the European Genome-phenome Archive (EGA) with accession number EGAS00001000826. PMID:26136146
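    The abstract above reports concordance rates between replicate experiments without giving the formula. One plausible reading, shown here as a hedged sketch, is the fraction of SNV calls shared by two replicates out of all calls made in either one (a Jaccard-style ratio, which is an assumption). The variant tuples are hypothetical.

```python
from typing import Iterable, Tuple

# A call is identified by (chromosome, position, reference base, alternate base).
Variant = Tuple[str, int, str, str]


def snv_concordance(rep_a: Iterable[Variant], rep_b: Iterable[Variant]) -> float:
    """Share of SNV calls found in both replicates among calls found in either.

    This Jaccard-style definition is an illustrative assumption; the study
    may have computed concordance differently.
    """
    a, b = set(rep_a), set(rep_b)
    union = a | b
    if not union:
        return 1.0  # no calls in either replicate: trivially concordant
    return len(a & b) / len(union)


# Hypothetical SNV calls from two sequencing runs of the same DNA.
replicate_1 = [("chr1", 100, "A", "G"), ("chr2", 250, "C", "T"),
               ("chr7", 900, "G", "A"), ("chr9", 42, "T", "C")]
replicate_2 = [("chr1", 100, "A", "G"), ("chr2", 250, "C", "T"),
               ("chr7", 900, "G", "A"), ("chr11", 77, "A", "C")]

print(f"SNV concordance: {snv_concordance(replicate_1, replicate_2):.2f}")
```

    The gap between the two reported figures follows from scale: 0.01% of a ~3.2 Mb target is about 320 positions, so even >99.99% concordance over the whole region leaves room for many discordant SNV calls.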

  12. What can next generation sequencing do for you? Next generation sequencing as a valuable tool in plant research.

    PubMed

    Bräutigam, A; Gowik, U

    2010-11-01

    Next generation sequencing (NGS) technologies have opened fascinating opportunities for the analysis of plants with and without a sequenced genome on a genomic scale. During the last few years, NGS methods have become widely available and cost effective. They can be applied to a wide variety of biological questions, from the sequencing of complete eukaryotic genomes and transcriptomes, to the genome-scale analysis of DNA-protein interactions. In this review, we focus on the use of NGS for plant transcriptomics, including gene discovery, transcript quantification and marker discovery for non-model plants, as well as transcript annotation and quantification, small RNA discovery and antisense transcription analysis for model plants. We discuss the experimental design for analysis of plants with and without a sequenced genome, including considerations on sampling, RNA preparation, sequencing platforms and bioinformatics tools for data analysis. NGS technologies offer exciting new opportunities for the plant sciences, especially for work on plants without a sequenced genome, since large sequence resources can be generated at moderate cost.

  13. Standardization and quality management in next-generation sequencing.

    PubMed

    Endrullat, Christoph; Glökler, Jörn; Franke, Philipp; Frohme, Marcus

    2016-09-01

    DNA sequencing continues to evolve quickly even after > 30 years. Many new platforms have appeared suddenly, and formerly established systems have vanished in almost the same manner. Since the establishment of next-generation sequencing devices, this progress has gained momentum due to the continually growing demand for higher throughput, lower costs and better quality of data. As a consequence of this rapid development, standardized procedures and data formats as well as comprehensive quality management considerations are still scarce. Here, we list and summarize current standardization efforts and quality management initiatives from companies, organizations and societies in the form of published studies and ongoing projects. These comprise, on the one hand, quality documentation issues like technical notes, accreditation checklists and guidelines for validation of sequencing workflows. On the other hand, general standard proposals and quality metrics are developed and applied to the sequencing workflow steps with the main focus on upstream processes. Finally, certain standard developments for downstream pipeline data handling, processing and storage are discussed in brief. These standardization approaches represent a first basis for continuing work in order to prospectively implement next-generation sequencing in important areas such as clinical diagnostics, where reliable results and fast processing are crucial. Additionally, these efforts will exert a decisive influence on traceability and reproducibility of sequence data.

  14. PCR Techniques in Next-Generation Sequencing.

    PubMed

    Goswami, Rashmi S

    2016-01-01

    With the advent of next-generation sequencing and its prolific use in the clinical realm, it would appear that techniques such as PCR would not be in high demand. This is not the case however, as PCR techniques play an important role in the success of NGS technology. Although NGS has rapidly become an important part of clinical molecular diagnostics, whole genome sequencing is still difficult to implement in a clinical laboratory due to high costs of sequencing, as well as issues surrounding data processing, analysis, and data storage, which can reduce efficiency and increase turnaround times. As a result, targeted sequencing is often used in clinical diagnostics, due to its increased efficiency. PCR techniques play an integral role in targeted NGS sequencing, allowing for the generation of multiple NGS libraries and the sequencing of multiple targeted regions simultaneously. We will outline the methods we employ in PCR amplification of targeted genomic regions for cancer mutation hotspots using the Ampliseq Cancer Hotspot v2 panel (Life Technologies, Carlsbad, CA).

  15. Analysis of next-generation sequencing data using Galaxy.

    PubMed

    Blankenberg, Daniel; Hillman-Jackson, Jennifer

    2014-01-01

    The extraordinary throughput of next-generation sequencing (NGS) technology is outpacing our ability to analyze and interpret the data. This chapter will focus on practical informatics methods, strategies, and software tools for transforming NGS data into usable information through the use of a web-based platform, Galaxy. The Galaxy interface is explored through several different types of example analyses. Instructions for running one's own Galaxy server on local hardware or on cloud computing resources are provided. Installing new tools into a personal Galaxy instance is also demonstrated.

  16. Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges

    PubMed Central

    El-Metwally, Sara; Hamza, Taher; Zakaria, Magdi; Helmy, Mohamed

    2013-01-01

    Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers still remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. Here we discuss them as a framework of four stages for data analysis and processing and survey variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that face current assemblers in the next-generation environment to determine the current state-of-the-art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms. PMID:24348224

  17. Next-generation sequence assembly: four stages of data processing and computational challenges.

    PubMed

    El-Metwally, Sara; Hamza, Taher; Zakaria, Magdi; Helmy, Mohamed

    2013-01-01

    Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers still remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. Here we discuss them as a framework of four stages for data analysis and processing and survey variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that face current assemblers in the next-generation environment to determine the current state-of-the-art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.

  18. Virtually sequenced: The next genomic generation

    SciTech Connect

    Bains, W.

    1996-06-01

    The announcement of "virtual genomics" requires evaluation of the efficiency and accuracy of computer-generated sequencing efforts. "Digital Northerns", or Northern blot electrophoresis done in the realm of computer data, have been developed by Incyte Pharmaceuticals (Palo Alto, CA) and Human Genome Sciences (Rockville, MD). 12 refs., 2 figs.

  19. Genetic mutation analysis of human gastric adenocarcinomas using ion torrent sequencing platform.

    PubMed

    Xu, Zhi; Huo, Xinying; Ye, Hua; Tang, Chuanning; Nandakumar, Vijayalakshmi; Lou, Feng; Zhang, Dandan; Dong, Haichao; Sun, Hong; Jiang, Shouwen; Zhang, Guangchun; Liu, Zhiyuan; Dong, Zhishou; Guo, Baishuai; He, Yan; Yan, Chaowei; Wang, Lu; Su, Ziyi; Li, Yangyang; Gu, Dongying; Zhang, Xiaojing; Wu, Xiaomin; Wei, Xiaowei; Hong, Lingzhi; Zhang, Yangmei; Yang, Jinsong; Gong, Yonglin; Tang, Cuiju; Jones, Lindsey; Huang, Xue F; Chen, Si-Yi; Chen, Jinfei

    2014-01-01

    Gastric cancer is one of the major causes of cancer-related death, especially in Asia. Gastric adenocarcinoma, the most common type of gastric cancer, is heterogeneous, and its incidence and cause vary widely with geographical region, gender, ethnicity, and diet. Since unique mutations have been observed in individual human cancer samples, identification and characterization of the molecular alterations underlying individual gastric adenocarcinomas is a critical step for developing more effective, personalized therapies. Until recently, identifying genetic mutations on an individual basis by DNA sequencing remained a daunting task. Recent advances in next-generation DNA sequencing technologies, such as the semiconductor-based Ion Torrent sequencing platform, make DNA sequencing cheaper, faster, and more reliable. In this study, we aim to identify genetic mutations in genes that are targeted by drugs in clinical use or under development in individual human gastric adenocarcinoma samples using Ion Torrent sequencing. We sequenced 737 loci from 45 cancer-related genes in 238 human gastric adenocarcinoma samples using the Ion Torrent Ampliseq Cancer Panel. The sequencing analysis revealed a high occurrence of mutations along the TP53 locus (9.7%) in our sample set. Thus, this study indicates the utility of a cost- and time-efficient tool such as Ion Torrent sequencing to screen cancer mutations for the development of personalized cancer therapy.

  20. Impact of Next Generation Sequencing Techniques in Food Microbiology

    PubMed Central

    Mayo, Baltasar; Rachid, Caio T. C. C; Alegría, Ángel; Leite, Analy M. O; Peixoto, Raquel S; Delgado, Susana

    2014-01-01

    Understanding the Maxam-Gilbert and Sanger sequencing as the first generation, in recent years there has been an explosion of newly-developed sequencing strategies, which are usually referred to as next generation sequencing (NGS) techniques. NGS techniques have high-throughputs and produce thousands or even millions of sequences at the same time. These sequences allow for the accurate identification of microbial taxa, including uncultivable organisms and those present in small numbers. In specific applications, NGS provides a complete inventory of all microbial operons and genes present or being expressed under different study conditions. NGS techniques are revolutionizing the field of microbial ecology and have recently been used to examine several food ecosystems. After a short introduction to the most common NGS systems and platforms, this review addresses how NGS techniques have been employed in the study of food microbiota and food fermentations, and discusses their limits and perspectives. The most important findings are reviewed, including those made in the study of the microbiota of milk, fermented dairy products, and plant-, meat- and fish-derived fermented foods. The knowledge that can be gained on microbial diversity, population structure and population dynamics via the use of these technologies could be vital in improving the monitoring and manipulation of foods and fermented food products. They should also improve their safety. PMID:25132799

  1. Impact of next generation sequencing techniques in food microbiology.

    PubMed

    Mayo, Baltasar; Rachid, Caio T C C; Alegría, Angel; Leite, Analy M O; Peixoto, Raquel S; Delgado, Susana

    2014-08-01

    Understanding the Maxam-Gilbert and Sanger sequencing as the first generation, in recent years there has been an explosion of newly-developed sequencing strategies, which are usually referred to as next generation sequencing (NGS) techniques. NGS techniques have high-throughputs and produce thousands or even millions of sequences at the same time. These sequences allow for the accurate identification of microbial taxa, including uncultivable organisms and those present in small numbers. In specific applications, NGS provides a complete inventory of all microbial operons and genes present or being expressed under different study conditions. NGS techniques are revolutionizing the field of microbial ecology and have recently been used to examine several food ecosystems. After a short introduction to the most common NGS systems and platforms, this review addresses how NGS techniques have been employed in the study of food microbiota and food fermentations, and discusses their limits and perspectives. The most important findings are reviewed, including those made in the study of the microbiota of milk, fermented dairy products, and plant-, meat- and fish-derived fermented foods. The knowledge that can be gained on microbial diversity, population structure and population dynamics via the use of these technologies could be vital in improving the monitoring and manipulation of foods and fermented food products. They should also improve their safety.

  2. Double-digest RAD sequencing using Ion Proton semiconductor platform (ddRADseq-ion) with nonmodel organisms.

    PubMed

    Recknagel, Hans; Jacobs, Arne; Herzyk, Pawel; Elmer, Kathryn R

    2015-11-01

    Research in evolutionary biology involving nonmodel organisms is rapidly shifting from using traditional molecular markers such as mtDNA and microsatellites to higher throughput SNP genotyping methodologies to address questions in population genetics, phylogenetics and genetic mapping. Restriction site associated DNA sequencing (RAD sequencing or RADseq) has become an established method for SNP genotyping on Illumina sequencing platforms. Here, we developed a protocol and adapters for double-digest RAD sequencing for Ion Torrent (Life Technologies; Ion Proton, Ion PGM) semiconductor sequencing. We sequenced thirteen genomic libraries of three different nonmodel vertebrate species on Ion Proton with PI chips: Arctic charr Salvelinus alpinus, European whitefish Coregonus lavaretus and common lizard Zootoca vivipara. This resulted in ~962 million single-end reads overall and a mean of ~74 million reads per library. We filtered the genomic data using Stacks, a bioinformatic tool to process RAD sequencing data. On average, we obtained ~11,000 polymorphic loci per library of 6-30 individuals. We validate our new method by technical and biological replication, by reconstructing phylogenetic relationships, and using a hybrid genetic cross to track genomic variants. Finally, we discuss the differences between using the different sequencing platforms in the context of RAD sequencing, assessing possible advantages and disadvantages. We show that our protocol can be used for Ion semiconductor sequencing platforms for the rapid and cost-effective generation of variable and reproducible genetic markers. © 2015 John Wiley & Sons Ltd.

  3. Neural mechanisms of sequence generation in songbirds

    NASA Astrophysics Data System (ADS)

    Langford, Bruce

    Animal models in research are useful for studying more complex behavior. For example, motor sequence generation of actions requiring good muscle coordination, such as writing with a pen, playing an instrument, or speaking, may involve the interaction of many areas in the brain, each a complex system in itself; thus it can be difficult to determine causal relationships between neural behavior and the behavior being studied. Birdsong, however, provides an excellent model behavior for motor sequence learning, memory, and generation. The song consists of learned sequences of notes that are spectrographically stereotyped over multiple renditions of the song, similar to syllables in human speech. The main areas of the songbird brain involved in singing are known; however, the mechanisms by which these systems store and produce song are not well understood. We used a custom-built, head-mounted, miniature motorized microdrive to chronically record the neural firing patterns of identified neurons in HVC, a pre-motor cortical nucleus which has been shown to be important in song timing. These recordings were made in Bengalese finches, which generate a song made up of stereotyped notes but variable note sequences. We observed song-related bursting in neurons projecting to Area X, a homologue to basal ganglia, and tonic firing in HVC interneurons. Interneurons had firing rate patterns that were consistent over multiple renditions of the same note sequence. We also designed and built a light-weight, low-powered wireless programmable neural stimulator using the Bluetooth Low Energy protocol. It was able to generate perturbations in the song when current pulses were administered to RA, which projects to the brainstem nucleus responsible for syringeal muscle control.

  4. Next generation sequencing reveals the hidden diversity of zooplankton assemblages.

    PubMed

    Lindeque, Penelope K; Parry, Helen E; Harmer, Rachel A; Somerfield, Paul J; Atkinson, Angus

    2013-01-01

    Zooplankton play an important role in our oceans, in biogeochemical cycling and providing a food source for commercially important fish larvae. However, difficulties in correctly identifying zooplankton hinder our understanding of their roles in marine ecosystem functioning, and can prevent detection of long term changes in their community structure. The advent of massively parallel next generation sequencing technology allows DNA sequence data to be recovered directly from whole community samples. Here we assess the ability of such sequencing to quantify richness and diversity of a mixed zooplankton assemblage from a productive time series site in the Western English Channel. Plankton net hauls (200 µm) were taken at the Western Channel Observatory station L4 in September 2010 and January 2011. These samples were analysed by microscopy and metagenetic analysis of the 18S nuclear small subunit ribosomal RNA gene using the 454 pyrosequencing platform. Following quality control, a total of 419,041 sequences were obtained for all samples. The sequences clustered into 205 operational taxonomic units (OTUs) using a 97% similarity cut-off. Allocation of taxonomy by comparison with the National Centre for Biotechnology Information database identified 135 OTUs to species level, 11 to genus level and 1 to order; <2.5% of sequences were classified as unknowns. By comparison, a skilled microscopic analyst was able to routinely enumerate only 58 taxonomic groups. Metagenetics reveals a previously hidden taxonomic richness, especially for Copepoda and hard-to-identify meroplankton such as Bivalvia, Gastropoda and Polychaeta. It also reveals rare species and parasites. We conclude that Next Generation Sequencing of 18S amplicons is a powerful tool for elucidating the true diversity and species richness of zooplankton communities. While this approach allows for broad diversity assessments of plankton it may become increasingly attractive in future if sequence reference libraries of

  5. Next Generation Sequencing Reveals the Hidden Diversity of Zooplankton Assemblages

    PubMed Central

    Harmer, Rachel A.; Somerfield, Paul J.; Atkinson, Angus

    2013-01-01

    Background: Zooplankton play an important role in our oceans, in biogeochemical cycling and providing a food source for commercially important fish larvae. However, difficulties in correctly identifying zooplankton hinder our understanding of their roles in marine ecosystem functioning, and can prevent detection of long term changes in their community structure. The advent of massively parallel next generation sequencing technology allows DNA sequence data to be recovered directly from whole community samples. Here we assess the ability of such sequencing to quantify richness and diversity of a mixed zooplankton assemblage from a productive time series site in the Western English Channel. Methodology/Principal Findings: Plankton net hauls (200 µm) were taken at the Western Channel Observatory station L4 in September 2010 and January 2011. These samples were analysed by microscopy and metagenetic analysis of the 18S nuclear small subunit ribosomal RNA gene using the 454 pyrosequencing platform. Following quality control, a total of 419,041 sequences were obtained for all samples. The sequences clustered into 205 operational taxonomic units (OTUs) using a 97% similarity cut-off. Allocation of taxonomy by comparison with the National Centre for Biotechnology Information database identified 135 OTUs to species level, 11 to genus level and 1 to order; <2.5% of sequences were classified as unknowns. By comparison, a skilled microscopic analyst was able to routinely enumerate only 58 taxonomic groups. Conclusions: Metagenetics reveals a previously hidden taxonomic richness, especially for Copepoda and hard-to-identify meroplankton such as Bivalvia, Gastropoda and Polychaeta. It also reveals rare species and parasites. We conclude that Next Generation Sequencing of 18S amplicons is a powerful tool for elucidating the true diversity and species richness of zooplankton communities. While this approach allows for broad diversity assessments of plankton it may become increasingly

  6. Open-Phylo: a customizable crowd-computing platform for multiple sequence alignment

    PubMed Central

    2013-01-01

    Citizen science games such as Galaxy Zoo, Foldit, and Phylo aim to harness the intelligence and processing power generated by crowds of online gamers to solve scientific problems. However, the selection of the data to be analyzed through these games is under the exclusive control of the game designers, and so are the results produced by gamers. Here, we introduce Open-Phylo, a freely accessible crowd-computing platform that enables any scientist to enter our system and use crowds of gamers to assist computer programs in solving one of the most fundamental problems in genomics: the multiple sequence alignment problem. PMID:24148814

  7. Open-Phylo: a customizable crowd-computing platform for multiple sequence alignment.

    PubMed

    Kwak, Daniel; Kam, Alfred; Becerra, David; Zhou, Qikuan; Hops, Adam; Zarour, Eleyine; Kam, Arthur; Sarmenta, Luis; Blanchette, Mathieu; Waldispühl, Jérôme

    2013-01-01

    Citizen science games such as Galaxy Zoo, Foldit, and Phylo aim to harness the intelligence and processing power generated by crowds of online gamers to solve scientific problems. However, the selection of the data to be analyzed through these games is under the exclusive control of the game designers, and so are the results produced by gamers. Here, we introduce Open-Phylo, a freely accessible crowd-computing platform that enables any scientist to enter our system and use crowds of gamers to assist computer programs in solving one of the most fundamental problems in genomics: the multiple sequence alignment problem.

  8. Short Barcodes for Next Generation Sequencing

    PubMed Central

    Mir, Katharina; Neuhaus, Klaus; Bossert, Martin; Schober, Steffen

    2013-01-01

    We consider the design and evaluation of short barcodes, with a length between six and eight nucleotides, used for parallel sequencing on platforms where substitution errors dominate. Such codes should not only have good error correction properties; their code words should also fulfil certain biological constraints (experimental parameters). We compare published barcodes with codes obtained by two new construction methods, one based on the currently best known linear codes and the other a simple randomized construction method. The evaluation is done with respect to the error correction capabilities, barcode size and experimental parameters, as well as fundamental bounds on the code size and distance properties. We provide a list of codes for lengths between six and eight nucleotides, where for length eight, two substitution errors can be corrected. In fact, no code with larger minimum distance can exist. PMID:24386128
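    The error-correction property described above rests on a standard coding-theory fact: a barcode set with minimum Hamming distance d can correct up to floor((d - 1) / 2) substitution errors. The sketch below illustrates this with a hypothetical four-barcode set of length six; it is not one of the published codes, and the function names are illustrative only.

```python
from itertools import combinations
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(codes: List[str]) -> int:
    """Smallest pairwise Hamming distance in the barcode set."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))


def correctable_substitutions(d_min: int) -> int:
    """Substitution errors guaranteed correctable for minimum distance d_min."""
    return (d_min - 1) // 2


def decode(read: str, codes: List[str], t: int) -> Optional[str]:
    """Nearest-codeword decoding; returns None if no barcode is within t errors."""
    best = min(codes, key=lambda c: hamming(read, c))
    return best if hamming(read, best) <= t else None


# Hypothetical barcode set (length 6), chosen only for illustration.
BARCODES = ["ACGTAC", "CATGCA", "GTACGT", "TGCATG"]

d = min_distance(BARCODES)
t = correctable_substitutions(d)
print(f"minimum distance {d}, corrects up to {t} substitution(s)")
print(decode("ACGTAA", BARCODES, t))  # one substitution away from ACGTAC
```

    By the same rule, correcting two substitutions at length eight, as the abstract reports, requires a minimum distance of at least five.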

  9. Colorimetric biosensing of targeted gene sequence using dual nanoparticle platforms

    PubMed Central

    Thavanathan, Jeevan; Huang, Nay Ming; Thong, Kwai Lin

    2015-01-01

    We have developed a colorimetric biosensor using a dual platform of gold nanoparticles and graphene oxide sheets for the detection of Salmonella enterica. The presence of the invA gene in S. enterica causes a change in color of the biosensor from its original pinkish-red to a light purplish solution. This occurs through the aggregation of the primary gold nanoparticles–conjugated DNA probe onto the surface of the secondary graphene oxide–conjugated DNA probe through DNA hybridization with the targeted DNA sequence. Spectrophotometry analysis showed a shift in wavelength from 525 nm to 600 nm with 1 μM of DNA target. Specificity testing revealed that the biosensor was able to detect various serovars of S. enterica, while no color change was observed with other bacterial species. Sensitivity testing revealed that the limit of detection was 1 nM of DNA target. This demonstrates the effectiveness of the biosensor in the detection of S. enterica through DNA hybridization. PMID:25897217

  10. Microfluidic Platform Generates Oxygen Landscapes for Localized Hypoxic Activation

    PubMed Central

    Rexius, Megan L.; Mauleon, Gerardo; Malik, Asrar B.; Rehman, Jalees; Eddington, David T.

    2014-01-01

    An open-well microfluidic platform generates an oxygen landscape using gas-perfused networks which diffuse across a membrane. The device enables real-time analysis of cellular and tissue responses to oxygen tension to define how cells adapt to heterogeneous oxygen conditions found in the physiological setting. We demonstrate that localized hypoxic activation of cells elicited specific metabolic and gene responses in human microvascular endothelial cells and bone marrow-derived mesenchymal stem cells. A robust demonstration of the compatibility of the device with standard laboratory techniques demonstrates the wide utility of the method. This platform is ideally suited to study real-time cell responses and cell-cell interactions within physiologically relevant oxygen landscapes. PMID:25315003

  11. Microfluidic platform generates oxygen landscapes for localized hypoxic activation.

    PubMed

    Rexius-Hall, Megan L; Mauleon, Gerardo; Malik, Asrar B; Rehman, Jalees; Eddington, David T

    2014-12-21

    An open-well microfluidic platform generates an oxygen landscape using gas-perfused networks which diffuse across a membrane. The device enables real-time analysis of cellular and tissue responses to oxygen tension to define how cells adapt to heterogeneous oxygen conditions found in the physiological setting. We demonstrate that localized hypoxic activation of cells elicited specific metabolic and gene responses in human microvascular endothelial cells and bone marrow-derived mesenchymal stem cells. A robust demonstration of the compatibility of the device with standard laboratory techniques demonstrates the wide utility of the method. This platform is ideally suited to study real-time cell responses and cell-cell interactions within physiologically relevant oxygen landscapes.

  12. Platform evolution and sequence stratigraphy of Natuna L-Structure, South China Sea, Indonesia

    SciTech Connect

    Rudolph, K.W.; Lehmann, P.J.

    1987-05-01

    By integrating seismic, well-log, and core data into a sequence framework, they are able to recognize seven complete depositional sequences in the Miocene age Terumbu Formation carbonates of the Natuna Platform (L-Structure). Each sequence consists of a lowstand systems tract, a transgressive systems tract and condensed section (seismic downlap surface), and a highstand systems tract. Terumbu carbonates display a downward shift of reservoir facies in the lowstand systems tract, deepen upward (retrograde) in the transgressive systems tract, and shoal upward (prograde) in the highstand systems tract. At each sequence boundary, there is erosional truncation of the platform margin and upper slope and exposure of the platform crest. The highest porosity occurs in grain-prone shoal water carbonates of the late highstand systems tract on the platform crest. Porosity also occurs down dip from the platform crest in the onlapping lowstand systems tract. Sequence stratigraphy, seismic facies, and seismic modeling analysis are used to map and predict reservoir distribution on the Natuna Platform. Increased subsidence from the Miocene onward caused the retreat of the Natuna Platform. Retreat occurred in an asymmetric fashion with more retreat on the west, or low-productivity, side of the platform. Platform retreat occurred incrementally, during deposition of transgressive systems tracts and the condensed sections. The large eustatic sea level rise in the early Pliocene, combined with continued rapid subsidence, drowned the platform and ended carbonate production.

  13. CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes

    PubMed Central

    2012-01-01

    Background: It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies presents an unparalleled opportunity for pathogen detection and discovery in human tissues but requires development of new genome-wide bioinformatics tools. Results: Here we present CaPSID (Computational Pathogen Sequence IDentification), a comprehensive bioinformatics platform for identifying, querying and visualizing both exogenous and endogenous pathogen nucleotide sequences in tumor genomes and transcriptomes. CaPSID includes a scalable, high performance database for data storage and a web application that integrates the genome browser JBrowse. CaPSID also provides useful metrics for sequence analysis of pre-aligned BAM files, such as gene and genome coverage, and is optimized to run efficiently on multiprocessor computers with low memory usage. Conclusions: To demonstrate the usefulness and efficiency of CaPSID, we carried out a comprehensive analysis of both a simulated dataset and transcriptome samples from ovarian cancer. CaPSID correctly identified all of the human and pathogen sequences in the simulated dataset, while in the ovarian dataset CaPSID’s predictions were successfully validated in vitro. PMID:22901030

  14. Next-generation sequencing: advances and applications in cancer diagnosis

    PubMed Central

    Serratì, Simona; De Summa, Simona; Pilato, Brunella; Petriella, Daniela; Lacalamita, Rosanna; Tommasi, Stefania; Pinto, Rosamaria

    2016-01-01

    Technological advances have led to the introduction of next-generation sequencing (NGS) platforms in cancer investigation. NGS allows massive parallel sequencing that affords maximal tumor genomic assessment. NGS approaches are different, and concern DNA and RNA analysis. DNA sequencing includes whole-genome, whole-exome, and targeted sequencing, which focuses on a selection of genes of interest for a specific disease. RNA sequencing facilitates the detection of alternative gene-spliced transcripts, posttranscriptional modifications, gene fusion, mutations/single-nucleotide polymorphisms, small and long noncoding RNAs, and changes in gene expression. Most applications are in the cancer research field, but lately NGS technology has been revolutionizing cancer molecular diagnostics, due to the many advantages it offers compared to traditional methods. There is greater knowledge on solid cancer diagnostics, and recent interest has been shown also in the field of hematologic cancer. In this review, we report the latest data on NGS diagnostic/predictive clinical applications in solid and hematologic cancers. Moreover, since the amount of NGS data produced is very large and their interpretation is very complex, we briefly discuss two bioinformatic aspects, variant-calling accuracy and copy-number variation detection, which are gaining a lot of importance in cancer-diagnostic assessment. PMID:27980425

  15. Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform

    PubMed Central

    Van Nostrand, Joy D.; Ning, Daliang; Sun, Bo; Xue, Kai; Liu, Feifei; Deng, Ye; Liang, Yuting; Zhou, Jizhong

    2017-01-01

    Illumina’s MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1–3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed, the overlap between two technical replicates (not considering sequence abundance) plateaued at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined. These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately.
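
    The two overlap figures reported above (with and without sequence abundance) can be illustrated with a small sketch. The definitions used here are assumptions, since the abstract does not spell them out: unweighted overlap is taken as shared OTUs divided by all OTUs seen in either replicate, and abundance-weighted overlap as the fraction of all reads that fall in shared OTUs. The function name and the OTU counts are hypothetical.

```python
def otu_overlap(rep_a, rep_b):
    """Return (unweighted, abundance_weighted) OTU overlap for two replicates.

    rep_a, rep_b: dicts mapping an OTU identifier to its read count.
    Assumed definitions (not taken from the paper):
      unweighted = |shared OTUs| / |OTUs present in either replicate|
      weighted   = reads in shared OTUs / all reads in both replicates
    """
    present_a = {otu for otu, count in rep_a.items() if count > 0}
    present_b = {otu for otu, count in rep_b.items() if count > 0}
    shared = present_a & present_b
    union = present_a | present_b
    total_reads = sum(rep_a.values()) + sum(rep_b.values())
    if not union or total_reads == 0:
        raise ValueError("both replicates are empty")
    unweighted = len(shared) / len(union)
    shared_reads = sum(rep_a[otu] + rep_b[otu] for otu in shared)
    weighted = shared_reads / total_reads
    return unweighted, weighted


# Hypothetical technical replicates: same dominant OTUs, different rare ones.
replicate_1 = {"otu1": 90, "otu2": 8, "otu3": 2}
replicate_2 = {"otu1": 85, "otu2": 10, "otu4": 5}
```

    In this toy case the rare OTUs that appear in only one replicate halve the unweighted overlap while barely moving the weighted one, which is the same qualitative pattern the abstract reports.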

  16. Mining Contiguous Sequential Generators in Biological Sequences.

    PubMed

    Zhang, Jingsong; Wang, Yinglin; Zhang, Chao; Shi, Yongyong

    2016-01-01

    The discovery of conserved sequential patterns in biological sequences is essential to unveiling common shared functions. Mining sequential generators as well as mining closed sequential patterns can contribute to a more concise result set than mining all sequential patterns, especially in the analysis of big data in bioinformatics. Previous studies have also presented convincing arguments that the generator is preferable to the closed pattern in inductive inference and classification. However, classic sequential generator mining algorithms, because they do not consider the contiguous constraint along with the lower-closed one, still tend to produce a large number of inefficient and redundant patterns, too many for effective use. Driven by the extensive applications of patterns with the contiguous feature, we propose ConSgen, an efficient algorithm for discovering contiguous sequential generators. It adopts the n-gram model, called shingles, to generate potential frequent subsequences and leverages several pruning techniques to prune the unpromising parts of the search space. The contiguous sequential generators are then identified using the equivalence class-based lower-closure checking scheme. Our experiments on both DNA and protein data sets demonstrate the compactness, efficiency, and scalability of ConSgen.

  17. MPD: multiplex primer design for next-generation targeted sequencing.

    PubMed

    Wingo, Thomas S; Kotlar, Alex; Cutler, David J

    2017-01-05

    Targeted resequencing offers a cost-effective alternative to whole-genome and whole-exome sequencing when investigating regions known to be associated with a trait or disease. There are a number of approaches to targeted resequencing, including microfluidic PCR amplification, which may be enhanced by multiplex PCR. Currently, there is no open-source software that can design next-generation multiplex PCR experiments that ensures primers are unique at a genome-level and efficiently pools compatible primers. We present MPD, a software package that automates the design of multiplex PCR primers for next-generation sequencing. The core of MPD is implemented in C for speed and uses a hashed genome to ensure primer uniqueness, avoids placing primers over sites of known variation, and efficiently pools compatible primers. A JavaScript web application ( http://multiplexprimer.io ) utilizing the MPD Perl package provides a convenient platform for users to make designs. Using a realistic set of genes identified by genome-wide association studies (GWAS), we achieve 90% coverage of all exonic regions using stringent design criteria. Using the first 47 primer pools for wet-lab validation, we sequenced ~25Kb at 99.7% completeness with a mean coverage of 300X among 313 samples simultaneously and identified 224 variants. The number and nature of variants we observe are consistent with high quality sequencing. MPD can successfully design multiplex PCR experiments suitable for next-generation sequencing, and simplifies retooling targeted resequencing pipelines to focus on new targets as new genetic evidence emerges.
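
    The idea of using a hashed genome to check primer uniqueness can be sketched as follows. This is not MPD's implementation (which the abstract says is written in C); it is a minimal illustration that assumes fixed-length primers, exact matching, and the forward strand only. The function names and the toy genome are hypothetical.

```python
from collections import Counter


def build_kmer_index(genome, k):
    """Count every k-mer on the forward strand of `genome` in a hash table."""
    if k <= 0 or k > len(genome):
        raise ValueError("k must be between 1 and len(genome)")
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def is_unique_primer(primer, index):
    """True if the primer occurs exactly once among the indexed k-mers.

    The primer must have the same length k that was used to build the index;
    a primer that never occurs (count 0) is reported as not unique.
    """
    return index.get(primer, 0) == 1


# Toy sequence with an internal repeat ("ACGT" occurs twice).
toy_genome = "ACGTACGTACGGATTC"
kmer_index = build_kmer_index(toy_genome, 4)
```

    A realistic design would also index the reverse complement and tolerate near matches; both are left out here to keep the sketch short.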

  18. Next Generation Sequencing in Alzheimer's Disease.

    PubMed

    Bertram, Lars

    2016-01-01

    For the first time in the history of human genetics research, it is now both technically feasible and economically affordable to screen individual genomes for novel disease-causing mutations at base-pair resolution using "next-generation sequencing" (NGS). One popular aim in many of today's NGS studies is genome resequencing (in part or whole) to identify DNA variants potentially accounting for the "missing heritability" problem observed in many genetically complex traits. Thus far, only relatively few projects have applied these powerful new technologies to search for novel Alzheimer's disease (AD) related sequence variants. In this review, I summarize the findings from the first NGS-based resequencing studies in AD and discuss their potential implications and limitations. Notable recent discoveries using NGS include the identification of rare susceptibility modifying alleles in APP, TREM2, and PLD3. Several other large-scale NGS projects are currently underway so that additional discoveries can be expected over the coming years.

  19. Next Generation Sequencing in Endocrine Practice

    PubMed Central

    Forlenza, Gregory P.; Calhoun, Amy; Beckman, Kenneth B.; Halvorsen, Tanya; Hamdoun, Elwaseila; Zierhut, Heather; Sarafoglou, Kyriakie; Polgreen, Lynda E.; Miller, Bradley S.; Nathan, Brandon; Petryk, Anna

    2016-01-01

    With the completion of the Human Genome Project and advances in genomic sequencing technologies, the use of clinical molecular diagnostics has grown tremendously over the last decade. Next-generation sequencing (NGS) has overcome many of the practical roadblocks that had slowed the adoption of molecular testing for routine clinical diagnosis. In endocrinology, targeted NGS now complements biochemical testing and imaging studies. The goal of this review is to provide clinicians with a guide to the application of NGS to genetic testing for endocrine conditions, by compiling a list of established gene mutations detectable by NGS, and highlighting key phenotypic features of these disorders. As we outline in this review, the clinical utility of NGS-based molecular testing for endocrine disorders is very high. Identifying an exact genetic etiology improves understanding of the disease, provides clear explanation to families about the cause, and guides decisions about screening, prevention and/or treatment. PMID:25958132

  20. Generating matrix and sums of Fibonacci and Pell sequences

    NASA Astrophysics Data System (ADS)

    Ho, C. K.; Woon, H. S.; Chong, Chin-Yoon

    2014-07-01

    In this paper, we study the Fibonacci and Pell sequences and develop generating matrices for them. First, we prove two results on the even sums of the Fibonacci and Pell sequences using the generating matrix approach. We then deduce the odd sums, some identities, and recursive formulas for these two sequences.
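
    A generating matrix can be made concrete with a short sketch. It uses the standard matrices [[1, 1], [1, 0]] for Fibonacci and [[2, 1], [1, 0]] for Pell, whose n-th power carries the n-th term in its off-diagonal entry; whether the paper uses exactly these matrices is an assumption. The even-sum identities mentioned in the note below are well-known results and are not necessarily the ones proved in the paper.

```python
def mat_mul(a, b):
    """Multiply two 2x2 integer matrices given as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0],
         a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0],
         a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def mat_pow(m, n):
    """Raise a 2x2 matrix to the n-th power (n >= 0) by repeated squaring."""
    if n < 0:
        raise ValueError("n must be non-negative")
    result = ((1, 0), (0, 1))  # identity matrix
    while n > 0:
        if n & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        n >>= 1
    return result


FIB_MATRIX = ((1, 1), (1, 0))   # powers hold F(n+1), F(n), F(n-1)
PELL_MATRIX = ((2, 1), (1, 0))  # powers hold P(n+1), P(n), P(n-1)


def fibonacci(n):
    """n-th Fibonacci number, with F(0) = 0 and F(1) = 1."""
    return mat_pow(FIB_MATRIX, n)[0][1]


def pell(n):
    """n-th Pell number, with P(0) = 0 and P(1) = 1."""
    return mat_pow(PELL_MATRIX, n)[0][1]
```

    With these helpers, the classical even-sum identities F(2) + F(4) + ... + F(2n) = F(2n+1) - 1 and P(2) + P(4) + ... + P(2n) = (P(2n+1) - 1) / 2 can be checked numerically for small n.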

  1. Next Generation Sequence Assembly with AMOS

    PubMed Central

    Treangen, Todd J; Sommer, Dan D; Angly, Florent E; Koren, Sergey; Pop, Mihai

    2011-01-01

    A Modular Open-Source Assembler (AMOS) was designed to offer a modular approach to genome assembly. AMOS includes a wide range of tools for assembly, including lightweight de novo assemblers Minimus and Minimo, and Bambus 2, a robust scaffolder able to handle metagenomic and polymorphic data. This protocol describes how to configure and use AMOS for the assembly of Next Generation sequence data. Additionally, we provide three tutorial examples that include bacterial, viral, and metagenomic datasets with specific tips for improving assembly quality. PMID:21400694

  2. CPSS: a computational platform for the analysis of small RNA deep sequencing data.

    PubMed

    Zhang, Yuanwei; Xu, Bo; Yang, Yifan; Ban, Rongjun; Zhang, Huan; Jiang, Xiaohua; Cooke, Howard J; Xue, Yu; Shi, Qinghua

    2012-07-15

    Next generation sequencing (NGS) techniques have been widely used to document the small ribonucleic acids (RNAs) implicated in a variety of biological, physiological and pathological processes. An integrated computational tool is needed for handling and analysing the enormous datasets from small RNA deep sequencing approach. Herein, we present a novel web server, CPSS (a computational platform for the analysis of small RNA deep sequencing data), designed to completely annotate and functionally analyse microRNAs (miRNAs) from NGS data on one platform with a single data submission. Small RNA NGS data can be submitted to this server with analysis results being returned in two parts: (i) annotation analysis, which provides the most comprehensive analysis for small RNA transcriptome, including length distribution and genome mapping of sequencing reads, small RNA quantification, prediction of novel miRNAs, identification of differentially expressed miRNAs, piwi-interacting RNAs and other non-coding small RNAs between paired samples and detection of miRNA editing and modifications and (ii) functional analysis, including prediction of miRNA targeted genes by multiple tools, enrichment of gene ontology terms, signalling pathway involvement and protein-protein interaction analysis for the predicted genes. CPSS, a ready-to-use web server that integrates most functions of currently available bioinformatics tools, provides all the information wanted by the majority of users from small RNA deep sequencing datasets. CPSS is implemented in PHP/PERL+MySQL+R and can be freely accessed at http://mcg.ustc.edu.cn/db/cpss/index.html or http://mcg.ustc.edu.cn/sdap1/cpss/index.html.

  3. Next generation sequencing and its applications in forensic genetics.

    PubMed

    Børsting, Claus; Morling, Niels

    2015-09-01

    It has been almost a decade since the first next generation sequencing (NGS) technologies emerged and quickly changed the way genetic research is conducted. Today, full genomes are mapped and published almost weekly and with ever increasing speed and decreasing costs. NGS methods and platforms have matured during the last 10 years, and the quality of the sequences has reached a level where NGS is used in clinical diagnostics of humans. Forensic genetic laboratories have also explored NGS technologies and especially in the last year, there has been a small explosion in the number of scientific articles and presentations at conferences with forensic aspects of NGS. These contributions have demonstrated that NGS offers new possibilities for forensic genetic case work. More information may be obtained from unique samples in a single experiment by analyzing combinations of markers (STRs, SNPs, insertion/deletions, mRNA) that cannot be analyzed simultaneously with the standard PCR-CE methods used today. The true variation in core forensic STR loci has been uncovered, and previously unknown STR alleles have been discovered. The detailed sequence information may aid mixture interpretation and will increase the statistical weight of the evidence. In this review, we will give an introduction to NGS and single-molecule sequencing, and we will discuss the possible applications of NGS in forensic genetics. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  4. Revealing the Complexity of Breast Cancer by Next Generation Sequencing

    PubMed Central

    Verigos, John; Magklara, Angeliki

    2015-01-01

    Over the last few years the increasing usage of “-omic” platforms, supported by next-generation sequencing, in the analysis of breast cancer samples has tremendously advanced our understanding of the disease. New driver and passenger mutations, rare chromosomal rearrangements and other genomic aberrations identified by whole genome and exome sequencing are providing missing pieces of the genomic architecture of breast cancer. High resolution maps of breast cancer methylomes and sequencing of the miRNA microworld are beginning to paint the epigenomic landscape of the disease. Transcriptomic profiling is giving us a glimpse into the gene regulatory networks that govern the fate of the breast cancer cell. At the same time, integrative analysis of sequencing data confirms an extensive intertumor and intratumor heterogeneity and plasticity in breast cancer arguing for a new approach to the problem. In this review, we report on the latest findings on the molecular characterization of breast cancer using NGS technologies, and we discuss their potential implications for the improvement of existing therapies. PMID:26561834

  5. Generative technique for dynamic infrared image sequences

    NASA Astrophysics Data System (ADS)

    Zhang, Qian; Cao, Zhiguo; Zhang, Tianxu

    2001-09-01

    This paper discusses a technique for generating dynamic infrared image sequences. An infrared sensor differs from a CCD camera in its imaging mechanism: it forms an image by receiving the infrared radiation of the scene (target and background). The infrared imaging sensor is strongly affected by atmospheric radiation, environmental radiation, and attenuation during atmospheric radiative transfer. The paper therefore first analyzes the influence of each kind of radiation on imaging and provides the formula for calculating the radiation, treating the passive scene and the active scene separately. It then gives the calculation methods for the passive scene and explains the functions of the scene model, the atmospheric transmission model, and the material physical attribute databases. Second, based on the infrared imaging model, it introduces the design idea, the implementation approach, and the software framework of the infrared image sequence simulation software on an SGI workstation. Third, following this design, it presents an example of simulated infrared image sequences that uses sea and sky as the background, a warship as the target, and an aircraft as the eye point. Finally, the simulation is evaluated as a whole and an improvement scheme is presented.

  6. Next-generation sequencing for mitochondrial disorders

    PubMed Central

    Carroll, C J; Brilhante, V; Suomalainen, A

    2014-01-01

    A great deal of our understanding of mitochondrial function has come from studies of inherited mitochondrial diseases, but the majority of patients still lack a molecular diagnosis. Furthermore, effective treatments for mitochondrial disorders do not exist. Development of therapies has been complicated by the fact that the diseases are extremely heterogeneous, and collecting large enough cohorts of similarly affected individuals to assess new therapies properly has been difficult. Next-generation sequencing technologies have in the last few years been shown to be an effective method for the genetic diagnosis of inherited mitochondrial diseases. Here we review the strategies and findings from studies applying next-generation sequencing methods for the genetic diagnosis of mitochondrial disorders. Detailed knowledge of molecular causes also enables collection of homogeneous cohorts of patients for therapy trials, and therefore boosts the development of interventions. Linked Articles: This article is part of a themed issue on Mitochondrial Pharmacology: Energy, Injury & Beyond. To view the other articles in this issue, visit http://dx.doi.org/10.1111/bph.2014.171.issue-8 PMID:24138576

  7. deepTools: a flexible platform for exploring deep-sequencing data.

    PubMed

    Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A; Manke, Thomas

    2014-07-01

    We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Microfluidics for genome-wide studies involving next generation sequencing

    PubMed Central

    Murphy, Travis W.; Lu, Chang

    2017-01-01

    Next-generation sequencing (NGS) has revolutionized how molecular biology studies are conducted. Its decreasing cost and increasing throughput permit profiling of genomic, transcriptomic, and epigenomic features for a wide range of applications. Microfluidics has been proven to be highly complementary to NGS technology with its unique capabilities for handling small volumes of samples and providing platforms for automation, integration, and multiplexing. In this article, we review recent progress on applying microfluidics to facilitate genome-wide studies. We emphasize several technical aspects of NGS and how they benefit from coupling with microfluidic technology. We also summarize recent efforts to develop microfluidic technology for genomic, transcriptomic, and epigenomic studies, with emphasis on single cell analysis. We envision rapid growth in these directions, driven by the need to test scarce primary cell samples from patients in the context of precision medicine. PMID:28396707

  9. Initial steps towards a production platform for DNA sequence analysis on the grid

    PubMed Central

    2010-01-01

    Background: Bioinformatics is confronted with a new data explosion due to the availability of high-throughput DNA sequencers. Data storage and analysis become a problem on local servers, so a switch to other IT infrastructures is needed. Grid and workflow technology can help to handle the data more efficiently, as well as facilitate collaborations. However, interfaces to grids are often unfriendly to novice users. Results: In this study we reused a platform that was developed in the VL-e project for the analysis of medical images. Data transfer, workflow execution and job monitoring are operated from one graphical interface. We developed workflows for two sequence alignment tools (BLAST and BLAT) as a proof of concept. The analysis time was significantly reduced. All workflows and executables are available for the members of the Dutch Life Science Grid and the VL-e Medical virtual organizations. All components are open source and can be transported to other grid infrastructures. Conclusions: The availability of in-house expertise and tools facilitates the usage of grid resources by new users. Our first results indicate that this is a practical, powerful and scalable solution to address the capacity and collaboration issues raised by the deployment of next generation sequencers. We currently use this methodology on a daily basis for DNA sequencing and other applications. More information and source code are available via http://www.bioinformaticslaboratory.nl/ PMID:21156038

  10. Capturing genomic signatures of DNA sequence variation using a standard anonymous microarray platform

    PubMed Central

    Cannon, C. H.; Kua, C. S.; Lobenhofer, E. K.; Hurban, P.

    2006-01-01

    Comparative genomics, using the model organism approach, has provided powerful insights into the structure and evolution of whole genomes. Unfortunately, only a small fraction of Earth's biodiversity will have its genome sequenced in the foreseeable future. Most wild organisms have radically different life histories and evolutionary genomics than current model systems. A novel technique is needed to expand comparative genomics to a wider range of organisms. Here, we describe a novel approach using an anonymous DNA microarray platform that gathers genomic samples of sequence variation from any organism. Oligonucleotide probe sequences placed on a custom 44 K array were 25 bp long and designed using a simple set of criteria to maximize their complexity and dispersion in sequence probability space. Using whole genomic samples from three known genomes (mouse, rat and human) and one unknown (Gonystylus bancanus), we demonstrate and validate its power, reliability, transitivity and sensitivity. Using two separate statistical analyses, a large number of genomic ‘indicator’ probes were discovered. The construction of a genomic signature database based upon this technique would allow virtual comparisons, and simple queries could generate optimal subsets of markers to be used in large-scale assays, using simple downstream techniques. Biologists from a wide range of fields, studying almost any organism, could efficiently perform genomic comparisons, at potentially any phylogenetic level, after performing a small number of standardized DNA microarray hybridizations. Possibilities for refining and expanding the approach are discussed. PMID:17000641

  11. Meningococcus genome informatics platform: a system for analyzing multilocus sequence typing data

    PubMed Central

    Katz, Lee S.; Bolen, Chris R.; Harcourt, Brian H.; Schmink, Susanna; Wang, Xin; Kislyuk, Andrey; Taylor, Robert T.; Mayer, Leonard W.; Jordan, I. King

    2009-01-01

    The Meningococcus Genome Informatics Platform (MGIP) is a suite of computational tools for the analysis of multilocus sequence typing (MLST) data, at http://mgip.biology.gatech.edu. MLST is used to generate allelic profiles to characterize strains of Neisseria meningitidis, a major cause of bacterial meningitis worldwide. Neisseria meningitidis strains are characterized with MLST as specific sequence types (ST) and clonal complexes (CC) based on the DNA sequences at defined loci. These data are vital to molecular epidemiology studies of N. meningitidis, including outbreak investigations and population biology. MGIP analyzes DNA sequence trace files, returns individual allele calls and characterizes the STs and CCs. MGIP represents a substantial advance over existing software in several respects: (i) ease of use—MGIP is user friendly, intuitive and thoroughly documented; (ii) flexibility—because MGIP is a website, it is compatible with any computer with an internet connection, can be used from any geographic location, and there is no installation; (iii) speed—MGIP takes just over one minute to process a set of 96 trace files; and (iv) expandability—MGIP has the potential to expand to more loci than those used in MLST and even to other bacterial species. PMID:19468047

  12. AIV Platform for the Galileo Message Generation Facility

    NASA Astrophysics Data System (ADS)

    Oving, B. A.; Zwartbol, T.; Denham, S.; Rennie, M.

    2007-08-01

    The Message Generation Facility (MGF) is an element of the Galileo Mission Segment (GMS) and is responsible for real-time distribution of the navigation, integrity and SAR messages from the processing facilities (OSPF, IPF, ERIS, RLSP) to the Up-Link Stations (ULS). The main objective is to route a message to the correct ULS in time for on-board update of navigation data and integrity data for dissemination to users. The MGF element is being developed by Deimos Space S.L. (Spain). To perform the Assembly, Integration and Verification (AIV) activities of the MGF, a dedicated test platform, MGF-AIVP, is developed by the National Aerospace Laboratory, NLR (the Netherlands). The MGF-AIVP simulates other Elements in the GMS that are connected to the MGF, in real-time. Its focus is to verify the main objective of the MGF.

  13. Periodic binary sequence generators: VLSI circuits considerations

    NASA Technical Reports Server (NTRS)

    Perlman, M.

    1984-01-01

    Feedback shift registers are efficient periodic binary sequence generators. Polynomials of degree r over a Galois field of characteristic 2 (GF(2)) characterize the behavior of shift registers with linear logic feedback. The algorithmic determination of the trinomial of lowest degree, when it exists, that contains a given irreducible polynomial over GF(2) as a factor is presented. This corresponds to embedding the behavior of an r-stage shift register with linear logic feedback into that of an n-stage shift register with a single two-input modulo 2 summer (i.e., Exclusive-OR gate) in its feedback. This leads to Very Large Scale Integrated (VLSI) circuit architecture of maximal regularity (i.e., identical cells) with intercell communications serialized to a maximal degree.
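
    A shift register whose feedback is a single two-input Exclusive-OR gate corresponds to a trinomial x^n + x^k + 1 over GF(2). The sketch below simulates such a register in software through the recurrence a[t+n] = a[t+k] XOR a[t] and measures its period. It does not implement the report's algorithm for finding the lowest-degree trinomial multiple of a given polynomial; the function names are hypothetical.

```python
def trinomial_lfsr_step(state, k):
    """Advance an n-stage register one clock for the trinomial x^n + x^k + 1.

    `state` holds the bits a[t], ..., a[t+n-1]; the single XOR gate computes
    a[t+n] = a[t] XOR a[t+k], which is shifted in while a[t] is shifted out.
    """
    feedback = state[0] ^ state[k]
    return state[1:] + (feedback,)


def trinomial_lfsr_bits(n, k, seed, length):
    """Return the first `length` output bits for the given n-bit seed."""
    if len(seed) != n or not 0 < k < n:
        raise ValueError("need an n-bit seed and 0 < k < n")
    state = tuple(seed)
    bits = []
    for _ in range(length):
        bits.append(state[0])
        state = trinomial_lfsr_step(state, k)
    return bits


def trinomial_lfsr_period(n, k, seed):
    """Number of clocks until the register first returns to its seed state."""
    if len(seed) != n or not 0 < k < n:
        raise ValueError("need an n-bit seed and 0 < k < n")
    start = tuple(seed)
    state = trinomial_lfsr_step(start, k)
    steps = 1
    while state != start:  # the update is invertible, so this terminates
        state = trinomial_lfsr_step(state, k)
        steps += 1
    return steps
```

    For the primitive trinomial x^4 + x + 1 (n = 4, k = 1), any nonzero seed cycles through all 2^4 - 1 = 15 nonzero states before repeating.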

  14. Next generation sequencing in endocrine practice.

    PubMed

    Forlenza, Gregory P; Calhoun, Amy; Beckman, Kenneth B; Halvorsen, Tanya; Hamdoun, Elwaseila; Zierhut, Heather; Sarafoglou, Kyriakie; Polgreen, Lynda E; Miller, Bradley S; Nathan, Brandon; Petryk, Anna

    2015-01-01

    With the completion of the Human Genome Project and advances in genomic sequencing technologies, the use of clinical molecular diagnostics has grown tremendously over the last decade. Next-generation sequencing (NGS) has overcome many of the practical roadblocks that had slowed the adoption of molecular testing for routine clinical diagnosis. In endocrinology, targeted NGS now complements biochemical testing and imaging studies. The goal of this review is to provide clinicians with a guide to the application of NGS to genetic testing for endocrine conditions, by compiling a list of established gene mutations detectable by NGS, and highlighting key phenotypic features of these disorders. As we outline in this review, the clinical utility of NGS-based molecular testing for endocrine disorders is very high. Identifying an exact genetic etiology improves understanding of the disease, provides clear explanation to families about the cause, and guides decisions about screening, prevention and/or treatment. To illustrate this approach, a case of hypophosphatasia with a pathogenic mutation in the ALPL gene detected by NGS is presented. Copyright © 2015. Published by Elsevier Inc.

  15. Efficient Study Design for Next Generation Sequencing

    PubMed Central

    Sampson, Joshua; Jacobs, Kevin; Yeager, Meredith; Chanock, Stephen; Chatterjee, Nilanjan

    2011-01-01

    Next Generation Sequencing represents a powerful tool for detecting genetic variation associated with human disease. Because of the high cost of this technology, it is critical that we develop efficient study designs that consider the trade-off between the number of subjects (n) and the coverage depth (μ). How we divide our resources between the two can greatly impact study success, particularly in pilot studies. We propose a strategy for selecting the optimal combination of n and μ for studies aimed at detecting rare variants and for studies aimed at detecting associations between rare or uncommon variants and disease. For detecting rare variants, we find the optimal coverage depth to be between 2 and 8 reads when using the likelihood ratio test. For association studies, we find the strategy of sequencing all available subjects to be preferable. In deriving these combinations, we provide a detailed analysis describing the distribution of depth across a genome and the depth needed to identify a minor allele in an individual. The optimal coverage depth depends on the aims of the study, and the chosen depth can have a large impact on study success. PMID:21370254
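
    The trade-off between coverage depth and detection can be made concrete with a small calculation. The sketch below is not the authors' model; it assumes, for illustration only, that the depth at a site is Poisson distributed with mean μ, that each read at a heterozygous site carries the minor allele with probability 1/2, and that a variant is called once at least `min_reads` minor-allele reads are seen. The function name and the threshold are hypothetical.

```python
import math


def prob_detect_het(mu, min_reads=2):
    """Probability of seeing at least `min_reads` minor-allele reads.

    Assumption: depth ~ Poisson(mu) and each read shows the minor allele
    with probability 1/2, so the minor-allele read count is Poisson(mu / 2).
    """
    lam = mu / 2.0
    # P(X < min_reads) for X ~ Poisson(lam), then take the complement.
    below = sum(
        math.exp(-lam) * lam ** j / math.factorial(j) for j in range(min_reads)
    )
    return 1.0 - below


if __name__ == "__main__":
    for mu in (2, 4, 8, 16, 30):
        print(mu, round(prob_detect_het(mu), 4))
```

    Under these assumptions the detection probability rises quickly at low depth and then flattens, which is the kind of diminishing return that makes sequencing more subjects at modest depth attractive.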

  16. Long period pseudo random number sequence generator

    NASA Technical Reports Server (NTRS)

    Wang, Charles C. (Inventor)

    1989-01-01

    A circuit for generating a sequence of pseudo random numbers, (A sub K). There is an exponentiator in GF(2 sup m) for the normal basis representation of elements in a finite field GF(2 sup m), each represented by m binary digits, having two inputs and an output from which the sequence (A sub K) of pseudo random numbers is taken. One of the two inputs is connected to receive the outputs (E sub K) of a maximal length shift register of n stages. There is a switch having a pair of inputs and an output. The switch output is connected to the other of the two inputs of the exponentiator. One of the switch inputs is connected for initially receiving a primitive element (A sub O) in GF(2 sup m). Finally, there is a delay circuit having an input and an output. The delay circuit output is connected to the other of the switch inputs and the delay circuit input is connected to the output of the exponentiator. Thus, after the exponentiator initially receives the primitive element (A sub O) in GF(2 sup m) through the switch, the switch can be switched to cause the exponentiator to receive as its input a delayed output A(K-1) from the exponentiator, thereby generating (A sub K) continuously at the output of the exponentiator. The exponentiator in GF(2 sup m) is novel and comprises a cyclic-shift circuit, a Massey-Omura multiplier, and a control logic circuit, all operably connected together to perform the function U(sub i) = 92(sup i) (for n(sub i) = 1) or 1 (for n(sub i) = 0).

  17. Applicability of Next Generation Sequencing Technology in Microsatellite Instability Testing

    PubMed Central

    Gan, Chun; Love, Clare; Beshay, Victoria; Macrae, Finlay; Fox, Stephen; Waring, Paul; Taylor, Graham

    2015-01-01

    Microsatellite instability (MSI) is a useful marker for risk assessment, prediction of chemotherapy responsiveness and prognosis in patients with colorectal cancer. Here, we describe a next generation sequencing approach for MSI testing using the MiSeq platform. Different from other MSI capturing strategies that are based on targeted gene capture, we utilize “deep resequencing”, where we focus the sequencing on only the microsatellite regions of interest. We sequenced a series of 44 colorectal tumours with normal controls for five MSI loci (BAT25, BAT26, BAT34c4, D18S55, D5S346) and a second series of six colorectal tumours (no control) with two mononucleotide loci (BAT25, BAT26). In the first series, we were able to determine 17 MSI-High, 1 MSI-Low and 26 microsatellite stable (MSS) tumours. In the second series, there were three MSI-High and three MSS tumours. Although there was some variation within individual markers, this NGS method produced the same overall MSI status for each tumour, as obtained with the traditional multiplex PCR-based method. PMID:25685876

  18. Next-Generation Phylogeography: A Targeted Approach for Multilocus Sequencing of Non-Model Organisms

    PubMed Central

    Puritz, Jonathan B.; Addison, Jason A.; Toonen, Robert J.

    2012-01-01

    The field of phylogeography has long since realized the need and utility of incorporating nuclear DNA (nDNA) sequences into analyses. However, the use of nDNA sequence data, at the population level, has been hindered by technical laboratory difficulty, sequencing costs, and problematic analytical methods dealing with genotypic sequence data, especially in non-model organisms. Here, we present a method utilizing the 454 GS-FLX Titanium pyrosequencing platform with the capacity to simultaneously sequence two species of sea star (Meridiastra calcar and Parvulastra exigua) at five different nDNA loci across 16 different populations of 20 individuals each per species. We compare results from 3 populations with traditional Sanger sequencing based methods, and demonstrate that this next-generation sequencing platform is more time and cost effective and more sensitive to rare variants than Sanger based sequencing. A crucial advantage is that the high coverage of clonally amplified sequences simplifies haplotype determination, even in highly polymorphic species. This targeted next-generation approach can greatly increase the use of nDNA sequence loci in phylogeographic and population genetic studies by mitigating many of the time, cost, and analytical issues associated with highly polymorphic, diploid sequence markers. PMID:22470543

  19. Minimum Information for Reporting Next Generation Sequence Genotyping (MIRING): Guidelines for Reporting HLA and KIR Genotyping via Next Generation Sequencing

    PubMed Central

    Mack, Steven J.; Milius, Robert P.; Gifford, Benjamin D.; Sauter, Jürgen; Hofmann, Jan; Osoegawa, Kazutoyo; Robinson, James; Groeneweg, Mathijs; Turenchalk, Gregory S.; Adai, Alex; Holcomb, Cherie; Rozemuller, Erik H.; Penning, Maarten T.; Heuer, Michael L.; Wang, Chunlin; Salit, Marc L.; Schmidt, Alexander H.; Parham, Peter R.; Müller, Carlheinz; Hague, Tim; Fischer, Gottfried; Fernandez-Viña, Marcelo; Hollenbach, Jill A; Norman, Paul J.; Maiers, Martin

    2015-01-01

    The development of next-generation sequencing (NGS) technologies for HLA and KIR genotyping is rapidly advancing knowledge of genetic variation of these highly polymorphic loci. NGS genotyping is poised to replace older methods for clinical use, but standard methods for reporting and exchanging these new, high quality genotype data are needed. The Immunogenomic NGS Consortium, a broad collaboration of histocompatibility and immunogenetics clinicians, researchers, instrument manufacturers and software developers, has developed the Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines. MIRING is a checklist that specifies the content of NGS genotyping results as well as a set of messaging guidelines for reporting the results. A MIRING message includes five categories of structured information – message annotation, reference context, full genotype, consensus sequence and novel polymorphism – and references to three categories of accessory information – NGS platform documentation, read processing documentation and primary data. These eight categories of information ensure the long-term portability and broad application of this NGS data for all current histocompatibility and immunogenetics use cases. In addition, MIRING can be extended to allow the reporting of genotype data generated using pre-NGS technologies. Because genotyping results reported using MIRING are easily updated in accordance with reference and nomenclature databases, MIRING represents a bold departure from previous methods of reporting HLA and KIR genotyping results, which have provided static and less-portable data. More information about MIRING can be found online at miring.immunogenomics.org. PMID:26407912

  20. Minimum information for reporting next generation sequence genotyping (MIRING): Guidelines for reporting HLA and KIR genotyping via next generation sequencing.

    PubMed

    Mack, Steven J; Milius, Robert P; Gifford, Benjamin D; Sauter, Jürgen; Hofmann, Jan; Osoegawa, Kazutoyo; Robinson, James; Groeneweg, Mathijs; Turenchalk, Gregory S; Adai, Alex; Holcomb, Cherie; Rozemuller, Erik H; Penning, Maarten T; Heuer, Michael L; Wang, Chunlin; Salit, Marc L; Schmidt, Alexander H; Parham, Peter R; Müller, Carlheinz; Hague, Tim; Fischer, Gottfried; Fernandez-Viña, Marcelo; Hollenbach, Jill A; Norman, Paul J; Maiers, Martin

    2015-12-01

    The development of next-generation sequencing (NGS) technologies for HLA and KIR genotyping is rapidly advancing knowledge of genetic variation of these highly polymorphic loci. NGS genotyping is poised to replace older methods for clinical use, but standard methods for reporting and exchanging these new, high quality genotype data are needed. The Immunogenomic NGS Consortium, a broad collaboration of histocompatibility and immunogenetics clinicians, researchers, instrument manufacturers and software developers, has developed the Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines. MIRING is a checklist that specifies the content of NGS genotyping results as well as a set of messaging guidelines for reporting the results. A MIRING message includes five categories of structured information - message annotation, reference context, full genotype, consensus sequence and novel polymorphism - and references to three categories of accessory information - NGS platform documentation, read processing documentation and primary data. These eight categories of information ensure the long-term portability and broad application of this NGS data for all current histocompatibility and immunogenetics use cases. In addition, MIRING can be extended to allow the reporting of genotype data generated using pre-NGS technologies. Because genotyping results reported using MIRING are easily updated in accordance with reference and nomenclature databases, MIRING represents a bold departure from previous methods of reporting HLA and KIR genotyping results, which have provided static and less-portable data. More information about MIRING can be found online at miring.immunogenomics.org.

  1. Deep sequencing analysis of phage libraries using Illumina platform.

    PubMed

    Matochko, Wadim L; Chu, Kiki; Jin, Bingjie; Lee, Sam W; Whitesides, George M; Derda, Ratmir

    2012-09-01

    This paper presents an analysis of phage-displayed libraries of peptides using Illumina. We describe steps for the preparation of short DNA fragments for deep sequencing and MATLAB software for the analysis of the results. Screening of peptide libraries displayed on the surface of bacteriophage (phage display) can be used to discover peptides that bind to any target. The key step in this discovery is the analysis of peptide sequences present in the library. This analysis is usually performed by Sanger sequencing, which is labor intensive and limited to examination of a few hundred phage clones. On the other hand, Illumina deep-sequencing technology can characterize over 10^7 reads in a single run. We applied Illumina sequencing to analyze phage libraries. Using PCR, we isolated the variable regions from M13KE phage vectors from a phage display library. The PCR primers contained (i) sequences flanking the variable region, (ii) barcodes, and (iii) variable 5'-terminal region. We used this approach to examine how diversity of peptides in phage display libraries changes as a result of amplification of libraries in bacteria. Using HiSeq single-end Illumina sequencing of these fragments, we acquired over 2×10^7 reads, 57 base pairs (bp) in length. Each read contained information about the barcode (6 bp), one complementary region (12 bp) and a variable region (36 bp). We applied this sequencing to a model library of 10^6 unique clones and observed that amplification enriches ∼150 clones, which dominate ∼20% of the library. Deep sequencing, for the first time, characterized the collapse of diversity in phage libraries. The results suggest that screens based on repeated amplification and small-scale sequencing identify a few binding clones and miss thousands of useful clones. The deep sequencing approach described here could identify under-represented clones in phage screens. It could also be instrumental in developing new screening strategies, which can preserve
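
    The read layout described in the abstract (a 6 bp barcode, a 12 bp constant region and a 36 bp variable region) lends itself to a simple tally of clones per barcode. The sketch below is a Python illustration, not the authors' MATLAB software. The order of the three segments, the `offset` parameter for any leading variable bases, and all function names are assumptions.

```python
from collections import Counter, defaultdict

BARCODE_LEN = 6    # lengths taken from the abstract
CONSTANT_LEN = 12
VARIABLE_LEN = 36


def split_read(read, offset=0):
    """Split one read into (barcode, constant, variable) segments.

    Assumes the segments appear in that order, starting `offset` bases
    into the read. Returns None when the read is too short.
    """
    end = offset + BARCODE_LEN + CONSTANT_LEN + VARIABLE_LEN
    if len(read) < end:
        return None
    b = offset + BARCODE_LEN
    c = b + CONSTANT_LEN
    return read[offset:b], read[b:c], read[c:end]


def count_clones(reads, offset=0):
    """Tally variable-region sequences separately for each barcode."""
    counts = defaultdict(Counter)
    for read in reads:
        parts = split_read(read, offset)
        if parts is None:
            continue  # skip truncated reads
        barcode, _constant, variable = parts
        counts[barcode][variable] += 1
    return counts


def top_clone_fraction(clone_counts, k):
    """Fraction of reads that belong to the k most abundant clones."""
    total = sum(clone_counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _seq, n in clone_counts.most_common(k))
    return top / total
```

    A call such as `top_clone_fraction(counts[barcode], 150)` would give the share of reads held by the 150 most abundant clones, the kind of figure the abstract reports for the amplified library.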

  2. The Molecular Blueprint of a Fungus by Next-Generation Sequencing (NGS).

    PubMed

    Grumaz, Christian; Kirstahler, Philipp; Sohn, Kai

    2017-01-01

    Sequencing the whole genome of an organism is invaluable for its comprehensive molecular characterization and has been drastically facilitated by the advent of high-throughput sequencing techniques. Especially in clinical microbiology the impact of sequenced strains increases as resistance and virulence markers can easily be detected. Here, we describe a combined approach for sequencing a fungal genome and transcriptome from initial nucleic acid isolation through the generation of ready-to-load DNA libraries for the Illumina platform and the final step of genome assembly with subsequent gene annotation.

  3. Calling amplified haplotypes in next generation tumor sequence data

    PubMed Central

    Dewal, Ninad; Hu, Yang; Freedman, Matthew L.; LaFramboise, Thomas; Pe'er, Itsik

    2012-01-01

    During tumor initiation and progression, cancer cells acquire a selective advantage, allowing them to outcompete their normal counterparts. Identification of the genetic changes that underlie these tumor acquired traits can provide deeper insights into the biology of tumorigenesis. Regions of copy number alterations and germline DNA variants are some of the elements subject to selection during tumor evolution. Integrated examination of inherited variation and somatic alterations holds the potential to reveal specific nucleotide alleles that a tumor “prefers” to have amplified. Next-generation sequencing of tumor and matched normal tissues provides a high-resolution platform to identify and analyze such somatic amplicons. Within an amplicon, examination of informative (e.g., heterozygous) sites deviating from a 1:1 ratio may suggest selection of that allele. A naive approach examines the reads for each heterozygous site in isolation; however, this ignores available valuable linkage information across sites. We, therefore, present a novel hidden Markov model-based method—Haplotype Amplification in Tumor Sequences (HATS)—that analyzes tumor and normal sequence data, along with training data for phasing purposes, to infer amplified alleles and haplotypes in regions of copy number gain. Our method is designed to handle rare variants and biases in read data. We assess the performance of HATS using simulated amplified regions generated from varying copy number and coverage levels, followed by amplicons in real data. We demonstrate that HATS infers the amplified alleles more accurately than does the naive approach, especially at low to intermediate coverage levels and in cases (including high coverage) possessing stromal contamination or allelic bias. PMID:22090379
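
    The "naive approach" mentioned above, testing each heterozygous site in isolation for deviation from a 1:1 allelic ratio, can be sketched with an exact binomial test. This is only an illustration of that baseline, not the HATS hidden Markov model. The function names and the input format (pairs of reference and alternate read counts) are assumptions.

```python
import math


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one.
    """
    def pmf(m):
        return math.comb(n, m) * p ** m * (1 - p) ** (n - m)

    observed = pmf(k)
    # A small relative tolerance guards against floating-point ties.
    return sum(pmf(m) for m in range(n + 1) if pmf(m) <= observed * (1 + 1e-9))


def site_pvalues(sites):
    """Test each heterozygous site on its own.

    sites: list of (ref_reads, alt_reads) tuples, one per site.
    Returns one p-value per site for deviation from a 1:1 ratio.
    """
    return [binom_two_sided_p(ref, ref + alt) for ref, alt in sites]


if __name__ == "__main__":
    print(site_pvalues([(5, 5), (18, 2), (0, 10)]))
```

    Because every site is scored independently, this baseline discards the linkage information across neighbouring sites that a haplotype-aware method can exploit.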

  4. Multiple platform assessment of the EGF dependent transcriptome by microarray and deep tag sequencing analysis

    PubMed Central

    2011-01-01

    Background: Epidermal Growth Factor (EGF) is a key regulatory growth factor activating many processes relevant to normal development and disease, affecting cell proliferation and survival. Here we use a combined approach to study the EGF dependent transcriptome of HeLa cells by using multiple long oligonucleotide based microarray platforms (from Agilent, Operon, and Illumina) in combination with digital gene expression profiling (DGE) with the Illumina Genome Analyzer. Results: By applying a procedure for cross-platform data meta-analysis based on RankProd and GlobalAncova tests, we establish a well validated gene set with transcript levels altered after EGF treatment. We use this robust gene list to build higher order networks of gene interaction by interconnecting associated networks, supporting and extending the important role of the EGF signaling pathway in cancer. In addition, we find an entirely new set of genes previously unrelated to the currently accepted EGF associated cellular functions. Conclusions: We propose that the use of global genomic cross-validation derived from high content technologies (microarrays or deep sequencing) can be used to generate more reliable datasets. This approach should help to improve the confidence of downstream in silico functional inference analyses based on high content data. PMID:21699700

  5. Strategies for complete mitochondrial genome sequencing on Ion Torrent PGM™ platform in forensic sciences.

    PubMed

    Zhou, Yishu; Guo, Fei; Yu, Jiao; Liu, Feng; Zhao, Jinling; Shen, Hongying; Zhao, Bin; Jia, Fei; Sun, Zhu; Song, He; Jiang, Xianhua

    2016-05-01

    Next generation sequencing (NGS) is a time-saving and cost-efficient method for detecting the complete mitochondrial genome (mtGenome) compared to Sanger sequencing. In this study we focused on developing strategies for mtGenome sequencing on the Ion Torrent PGM™ platform and for NGS data analysis. In our experience, 4, 15 and 30 samples could be loaded onto Ion 314™, Ion 316™ and Ion 318™ chips, respectively, at a pooling concentration of 26 pM, achieving sufficient average coverage of ≥1500× and a good strand balance of 1.05. Data processing software is essential to the analysis of large NGS data sets. In-house Perl scripts were developed for primary data analysis, to screen out uncertain positions and samples from variant call format (VCF) reports, and for pedigree studies, to perform pairwise comparisons. The Integrative Genomics Viewer (IGV) and the NextGENe software were used for secondary data analysis. mthap and EMMA were employed for haplogroup assignment. The dataset was reviewed and approved by EMPOP as the final version, which showed a 2.66% error rate generated by the Torrent Variant Caller (TVC). Across the mtGenome, 4022 variants were found at 725 nucleotide positions; the ratio of transitions to transversions was estimated at 20.89:1, and 22.18% of variants were concentrated in hypervariable segments I and II (HVS-I and HVS-II). In total, 107 complete mtGenome haplotypes were observed in 107 Northern Chinese Han individuals and assigned to 88 haplogroups. The random match probability (RMP) of the complete mtGenome was calculated as 0.009345794, a decrease of 26.19% compared with that of HVS-I only, and the haplotype diversity (HD) was evaluated as 1, an increase of 0.33% compared with that of HVS-I only. Principal component analysis (PCA) showed that our population clustered with East and Southeast Asians. The strategies in this study are suitable for complete mtGenome sequencing on Ion Torrent PGM™ platform and Northern Chinese Han (EMP00670) is the first
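
    The reported RMP and HD values can be reproduced from haplotype counts alone. The sketch below uses the standard definitions, RMP as the sum of squared haplotype frequencies and HD as n/(n-1) times (1 - RMP). The abstract does not state its formulas, so these definitions and the function names are assumptions. With 107 distinct haplotypes in 107 individuals they give 1/107 ≈ 0.009345794 and 1, matching the figures above.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in the sample."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n / (n - 1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two samples are required")
    return n / (n - 1) * (1.0 - random_match_probability(haplotypes))


if __name__ == "__main__":
    # 107 individuals, each with a distinct (placeholder) haplotype label.
    sample = [f"hap{i}" for i in range(107)]
    print(random_match_probability(sample))
    print(haplotype_diversity(sample))
```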

  6. Development of Computer Algorithm for Editing of Next Generation Sequencing Metagenome Data.

    PubMed

    Khanna, Radhika; Mittal, Sangeeta; Mohanty, Sujata

    2017-09-01

    The successful implementation of next generation sequencing (NGS) has motivated scientists from diverse fields of biological research, especially genomics and transcriptomics, to generate large genomic data sets in order to make their analyses more robust and to draw stronger inferences. However, exploiting these huge genomic data sets is a challenge for molecular biologists. To address this problem, computational software and hardware are being developed in parallel and have become an integral part of life science. While executing the "Genomics project of Indian Drosophila species," we found strings of Ns in the whole genome sequences generated on the Illumina platform. The present article aims at developing a computer algorithm (MATLAB and Python based) for editing raw sequences, mainly by eliminating bad residues, before submission to a publicly accessible sequence repository. These algorithms will help life scientists analyze large amounts of biological data in a short span of time.
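
    The editing step described above, locating and removing strings of Ns, might look like the following in Python. This is a minimal sketch and not the authors' published algorithm. The function names and the `min_run` threshold are hypothetical. Deleting Ns joins the flanking bases, which may or may not be appropriate for a given submission.

```python
import re


def find_n_runs(seq, min_run=1):
    """Return (start, end) positions of runs of N at least min_run long.

    Positions are 0-based and the end is exclusive.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


def strip_n_runs(seq, min_run=1):
    """Delete every run of N that is at least min_run bases long."""
    return re.sub(r"[Nn]{%d,}" % min_run, "", seq)


def split_on_n_runs(seq, min_run=1):
    """Alternative: cut the sequence into pieces at each run of N."""
    pieces = re.split(r"[Nn]{%d,}" % min_run, seq)
    return [p for p in pieces if p]


if __name__ == "__main__":
    raw = "ACGTNNNNACGTNAC"
    print(find_n_runs(raw))
    print(strip_n_runs(raw))
    print(split_on_n_runs(raw, min_run=2))
```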

  7. Preparation of SELEX Samples for Next-Generation Sequencing.

    PubMed

    Tolle, Fabian; Mayer, Günter

    2016-01-01

    Massive whole genome sequencing projects such as the human genome project have driven enormous technological advances and, with them, tremendous price drops, making next-generation sequencing very attractive for deep sequencing of SELEX libraries. Herein we describe the preparation of SELEX samples for Illumina sequencing, based on the established whole genome sequencing workflow. We describe the addition of barcode sequences for multiplexing and the adapter ligation, and how to avoid the associated pitfalls.

  8. The impact of next-generation sequencing on genomics

    PubMed Central

    Zhang, Jun; Chiodini, Rod; Badr, Ahmed; Zhang, Genfa

    2011-01-01

    This article reviews basic concepts, general applications, and the potential impact of next-generation sequencing (NGS) technologies on genomics, with particular reference to currently available and possible future platforms and bioinformatics. NGS technologies have demonstrated the capacity to sequence DNA at unprecedented speed, thereby enabling previously unimaginable scientific achievements and novel biological applications. However, the massive data produced by NGS also present a significant challenge for data storage, analysis, and management. Advanced bioinformatic tools are essential for the successful application of NGS technology. As evidenced throughout this review, NGS technologies will have a striking impact on genomic research and the entire biological field. With its ability to tackle challenges left unsolved by previous genomic technologies, NGS is likely to unravel the complexity of the human genome in terms of genetic variations, some of which may be confined to susceptible loci for some common human conditions. The impact of NGS technologies on genomics will be far reaching and likely change the field for years to come. PMID:21477781

  9. Next-generation sequencing diagnostics of bacteremia in septic patients.

    PubMed

    Grumaz, Silke; Stevens, Philip; Grumaz, Christian; Decker, Sebastian O; Weigand, Markus A; Hofer, Stefan; Brenner, Thorsten; von Haeseler, Arndt; Sohn, Kai

    2016-07-01

    Bloodstream infections remain one of the major challenges in intensive care units, leading to sepsis or even septic shock in many cases. Due to the lack of timely diagnostic approaches with sufficient sensitivity, mortality rates of sepsis are still unacceptably high. However, a prompt diagnosis of the causative microorganism is critical to significantly improve the outcome of bloodstream infections. Although various targeted molecular tests for blood samples are available, time-consuming blood culture-based approaches still represent the standard of care for the identification of bacteria. Here we describe the establishment of a complete diagnostic workflow for the identification of infectious microorganisms from seven septic patients based on unbiased sequence analyses of free circulating DNA from plasma by next-generation sequencing. We found significant levels of DNA fragments derived from pathogenic bacteria in samples from septic patients. Quantitative evaluation of normalized read counts and introduction of a sepsis indicating quantifier (SIQ) score allowed for an unambiguous identification of Gram-positive as well as Gram-negative bacteria that exactly matched with blood cultures from corresponding patient samples. In addition, we also identified species from samples where blood cultures were negative. Reads of non-human origin also comprised fragments derived from antimicrobial resistance genes, showing that, in principle, prediction of specific types of resistance might be possible. The complete workflow from sample preparation to species identification report could be accomplished in roughly 30 h, thus making this approach a promising diagnostic platform for critically ill patients suffering from bloodstream infections.

  10. Advanced Applications of Next-Generation Sequencing Technologies to Orchid Biology.

    PubMed

    Yeh, Chuan-Ming; Liu, Zhong-Jian; Tsai, Wen-Chieh

    2017-09-08

    Next-generation sequencing technologies are revolutionizing biology by permitting transcriptome sequencing, whole-genome sequencing and resequencing, and genome-wide single nucleotide polymorphism profiling. Orchid research has benefited from this breakthrough, and a few orchid genomes are now available; new biological questions can be approached and new breeding strategies can be designed. The first part of this review describes the unique features of orchid biology. The second part provides an overview of the current next-generation sequencing platforms, many of which are already used in plant laboratories. The third part summarizes the state of orchid transcriptome and genome sequencing and illustrates current achievements. The genetic sequences currently obtained will not only provide a broad scope for the study of orchid biology, but also serve as a starting point for uncovering the mystery of orchid evolution.

  11. Applications of next-generation sequencing techniques in plant biology

    USDA-ARS's Scientific Manuscript database

    The last several years have seen revolutionary advances in DNA sequencing technologies with the advent of next generation sequencing (NGS) techniques. NGS methods now allow millions of bases to be sequenced in one round, at a fraction of the cost relative to traditional Sanger sequencing, allowing u...

  12. Economic regulation of next-generation sequencing.

    PubMed

    Evans, Barbara J

    2014-01-01

    Next-generation sequencing broadens the debate about appropriate regulatory oversight of genetic testing and may force scholars to move beyond familiar privacy and health and safety regulatory issues to address new problems with industry structure and economic regulation. The genetic testing industry is passing through a period of profound structural change in response to shifts in technology and in the legal environment. Making genetic testing safe and effective for consumers increasingly requires access to comprehensive genomic data infrastructures that can support accurate, state-of-the-art interpretation of genetic test results. At present, there are significant barriers to access and there is no sector-specific regulator with power to ensure appropriate data access. Without it, genetic testing will not be safe for consumers even when it is performed at CLIA-certified laboratories using tests that have been FDA-cleared or approved. This article explores the emerging structure of the genetic testing industry and describes its present economic regulatory vacuum. In view of this gap in regulation, the article explores whether generally applicable law, particularly antitrust law, may offer solutions to the industry's data access problems. It concludes that courts may have a useful role to play, particularly in Europe and other jurisdictions where the essential facilities doctrine enjoys continued vitality. After Verizon Communications v. Law Offices of Curtis V. Trinko, the role of U.S. federal courts is less certain. Congress has demonstrated willingness to address access issues as they emerged in other infrastructure industries in recent decades. This article expresses no preference between legislative and judicial solutions. Its aim is simply to highlight an emerging economic regulatory issue which, if left unresolved, presents real health and safety concerns for consumers who receive genetic tests. © 2014 American Society of Law, Medicine & Ethics, Inc.

  13. Next-Generation Sequencing: A Review of Technologies and Tools for Wound Microbiome Research

    PubMed Central

    Hodkinson, Brendan P.; Grice, Elizabeth A.

    2015-01-01

    Significance: The colonization of wounds by specific microbes or communities of microbes may delay healing and/or lead to infection-related complications. Studies of wound-associated microbial communities (microbiomes) to date have primarily relied upon culture-based methods, which are known to have extreme biases and are not reliable for the characterization of microbiomes. Biofilms are very resistant to culture and are therefore especially difficult to study with techniques that remain standard in clinical settings. Recent Advances: Culture-independent approaches employing next-generation DNA sequencing have provided researchers and clinicians a window into wound-associated microbiomes that could not be achieved before and have begun to transform our view of wound-associated biodiversity. Within the past decade, many platforms have arisen for performing this type of sequencing, with various types of applications for microbiome research being possible on each. Critical Issues: Wound care incorporating knowledge of microbiomes gained from next-generation sequencing could guide clinical management and treatments. The purpose of this review is to outline the current platforms, their applications, and the steps necessary to undertake microbiome studies using next-generation sequencing. Future Directions: As DNA sequencing technology progresses, platforms will continue to produce longer reads and more reads per run at lower costs. A major future challenge is to implement these technologies in clinical settings for more precise and rapid identification of wound bioburden. PMID:25566414

  14. Generating Functions for the Powers of Fibonacci Sequences

    ERIC Educational Resources Information Center

    Terrana, D.; Chen, H.

    2007-01-01

    In this note, based on the Binet formulas and the power-reducing techniques, closed forms of generating functions for the powers of Fibonacci sequences are presented. The corresponding results are extended to some other famous sequences as well.
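
    As a concrete instance of such a closed form, the squares of the Fibonacci numbers have the well-known generating function x(1 - x) / (1 - 2x - 2x^2 + x^3). This identity is quoted from general knowledge, not from the note itself. The sketch below expands a rational function as a power series with integer arithmetic so the coefficients can be compared with F_n^2. The helper names are assumptions.

```python
def series_coeffs(num, den, n_terms):
    """First n_terms power-series coefficients of num(x) / den(x).

    num, den: coefficient lists in increasing powers of x, with den[0] == 1.
    """
    if den[0] != 1:
        raise ValueError("den[0] must be 1")
    coeffs = []
    for n in range(n_terms):
        a = num[n] if n < len(num) else 0
        # From den(x) * series(x) = num(x): move known terms to the right.
        for j in range(1, min(n, len(den) - 1) + 1):
            a -= den[j] * coeffs[n - j]
        coeffs.append(a)
    return coeffs


def fibonacci(n_terms):
    """First n_terms Fibonacci numbers, starting with F_0 = 0."""
    fib = [0, 1]
    while len(fib) < n_terms:
        fib.append(fib[-1] + fib[-2])
    return fib[:n_terms]


if __name__ == "__main__":
    # x / (1 - x - x^2) generates F_n.
    print(series_coeffs([0, 1], [1, -1, -1], 10))
    # x(1 - x) / (1 - 2x - 2x^2 + x^3) generates F_n^2.
    print(series_coeffs([0, 1, -1], [1, -2, -2, 1], 10))
```

    The denominator encodes the recurrence F_n^2 = 2F_{n-1}^2 + 2F_{n-2}^2 - F_{n-3}^2, which is what the expansion applies term by term.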

  16. OnlineCall: fast online parameter estimation and base calling for Illumina's next-generation sequencing.

    PubMed

    Das, Shreepriya; Vikalo, Haris

    2012-07-01

    Next-generation DNA sequencing platforms are becoming increasingly cost-effective and capable of providing an enormous number of reads in a relatively short time. However, their accuracy and read lengths still lag behind those of the conventional Sanger sequencing method. Performance of next-generation sequencing platforms is fundamentally limited by various imperfections in the sequencing-by-synthesis and signal acquisition processes. This drives the search for accurate, scalable and computationally tractable base calling algorithms capable of accounting for such imperfections. Relying on a statistical model of the sequencing-by-synthesis process and signal acquisition procedure, we develop a computationally efficient base calling method for Illumina's sequencing technology (specifically, Genome Analyzer II platform). Parameters of the model are estimated via a fast unsupervised online learning scheme, which uses the generalized expectation-maximization algorithm and requires only 3 s of running time per tile (on an Intel i7 machine @3.07GHz, single core), a three-orders-of-magnitude speed-up over existing parametric model-based methods. To minimize the latency between the end of the sequencing run and the generation of the base calling reports, we develop a fast online scalable decoding algorithm, which requires only 9 s/tile and achieves significantly lower error rates than Illumina's base calling software. Moreover, it is demonstrated that the proposed online parameter estimation scheme efficiently computes tile-dependent parameters, which can thereafter be provided to the base calling algorithm, resulting in significant improvements over previously developed base calling methods for the considered platform in terms of performance, time/complexity and latency. A C code implementation of our algorithm can be downloaded from http://www.cerc.utexas.edu/OnlineCall/.

  17. OnlineCall: fast online parameter estimation and base calling for Illumina's next-generation sequencing

    PubMed Central

    Das, Shreepriya; Vikalo, Haris

    2012-01-01

    Motivation: Next-generation DNA sequencing platforms are becoming increasingly cost-effective and capable of providing an enormous number of reads in a relatively short time. However, their accuracy and read lengths still lag behind those of the conventional Sanger sequencing method. Performance of next-generation sequencing platforms is fundamentally limited by various imperfections in the sequencing-by-synthesis and signal acquisition processes. This drives the search for accurate, scalable and computationally tractable base calling algorithms capable of accounting for such imperfections. Results: Relying on a statistical model of the sequencing-by-synthesis process and signal acquisition procedure, we develop a computationally efficient base calling method for Illumina's sequencing technology (specifically, Genome Analyzer II platform). Parameters of the model are estimated via a fast unsupervised online learning scheme, which uses the generalized expectation–maximization algorithm and requires only 3 s of running time per tile (on an Intel i7 machine @3.07GHz, single core), a three-orders-of-magnitude speed-up over existing parametric model-based methods. To minimize the latency between the end of the sequencing run and the generation of the base calling reports, we develop a fast online scalable decoding algorithm, which requires only 9 s/tile and achieves significantly lower error rates than Illumina's base calling software. Moreover, it is demonstrated that the proposed online parameter estimation scheme efficiently computes tile-dependent parameters, which can thereafter be provided to the base calling algorithm, resulting in significant improvements over previously developed base calling methods for the considered platform in terms of performance, time/complexity and latency. Availability: A C code implementation of our algorithm can be downloaded from http://www.cerc.utexas.edu/OnlineCall/ Contact: hvikalo@ece.utexas.edu Supplementary information

  18. Historical perspective, development and applications of next-generation sequencing in plant virology.

    PubMed

    Barba, Marina; Czosnek, Henryk; Hadidi, Ahmed

    2014-01-06

    Next-generation high throughput sequencing technologies became available at the onset of the 21st century. They provide a highly efficient, rapid, and low cost DNA sequencing platform beyond the reach of the standard and traditional DNA sequencing technologies developed in the late 1970s. They are continually improved to become faster, more efficient and cheaper. They have been used in many fields of biology since 2004. In 2009, next-generation sequencing (NGS) technologies began to be applied to several areas of plant virology including virus/viroid genome sequencing, discovery and detection, ecology and epidemiology, replication and transcription. Identification and characterization of known and unknown viruses and/or viroids in infected plants are currently among the most successful applications of these technologies. It is expected that NGS will play very significant roles in many research and non-research areas of plant virology.

  19. Detection of Genomic Structural Variants from Next-Generation Sequencing Data

    PubMed Central

    Tattini, Lorenzo; D’Aurizio, Romina; Magi, Alberto

    2015-01-01

    Structural variants are genomic rearrangements larger than 50 bp accounting for around 1% of the variation among human genomes. They impact on phenotypic diversity and play a role in various diseases including neurological/neurocognitive disorders and cancer development and progression. Dissecting structural variants from next-generation sequencing data presents several challenges and a number of approaches have been proposed in the literature. In this mini review, we describe and summarize the latest tools – and their underlying algorithms – designed for the analysis of whole-genome sequencing, whole-exome sequencing, custom captures, and amplicon sequencing data, pointing out the major advantages/drawbacks. We also report a summary of the most recent applications of third-generation sequencing platforms. This assessment provides a guided indication – with particular emphasis on human genetics and copy number variants – for researchers involved in the investigation of these genomic events. PMID:26161383

  20. Historical Perspective, Development and Applications of Next-Generation Sequencing in Plant Virology

    PubMed Central

    Barba, Marina; Czosnek, Henryk; Hadidi, Ahmed

    2014-01-01

    Next-generation high throughput sequencing technologies became available at the onset of the 21st century. They provide a highly efficient, rapid, and low cost DNA sequencing platform beyond the reach of the standard and traditional DNA sequencing technologies developed in the late 1970s. They are continually improved to become faster, more efficient and cheaper. They have been used in many fields of biology since 2004. In 2009, next-generation sequencing (NGS) technologies began to be applied to several areas of plant virology including virus/viroid genome sequencing, discovery and detection, ecology and epidemiology, replication and transcription. Identification and characterization of known and unknown viruses and/or viroids in infected plants are currently among the most successful applications of these technologies. It is expected that NGS will play very significant roles in many research and non-research areas of plant virology. PMID:24399207

  1. Polynomials Generated by the Fibonacci Sequence

    NASA Astrophysics Data System (ADS)

    Garth, David; Mills, Donald; Mitchell, Patrick

    2007-06-01

    The Fibonacci sequence's initial terms are F_0=0 and F_1=1, with F_n=F_{n-1}+F_{n-2} for n>=2. We define the polynomial sequence p by setting p_0(x)=1 and p_{n}(x)=x*p_{n-1}(x)+F_{n+1} for n>=1, so that p_{n}(x)= sum_{k=0}^{n} F_{k+1}x^{n-k}. We call p_n(x) the Fibonacci-coefficient polynomial (FCP) of order n. The FCP sequence is distinct from the well-known Fibonacci polynomial sequence. We answer several questions regarding these polynomials. Specifically, we show that each even-degree FCP has no real zeros, while each odd-degree FCP has a unique, and (for degree at least 3) irrational, real zero. Further, we show that this sequence of unique real zeros converges monotonically to the negative of the golden ratio. Using Rouché's theorem, we prove that the zeros of the FCPs approach the golden ratio in modulus. We also prove a general result that gives the Mahler measures of an infinite subsequence of the FCP sequence whose coefficients are reduced modulo an integer m>=2. We then apply this to the case that m=L_n, the nth Lucas number, showing that the Mahler measure of the subsequence is phi^{n-1}, where phi=(1+sqrt 5)/2.
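
    The recurrence and the claim about odd-degree zeros can be explored numerically. The following is a minimal sketch, not the authors' method: it builds the coefficients F_1, ..., F_{n+1} of p_n, evaluates the polynomial by Horner's rule, and locates the real zero of each odd-degree FCP by bisection. The function names and the bracketing interval [-2, 0] are assumptions made for illustration; the interval relies on p_n(0) = F_{n+1} > 0 and p_n(-2) < 0 for odd n.

```python
import math


def fcp_coefficients(n):
    """Coefficients of p_n(x), highest degree first: F_1, F_2, ..., F_{n+1}."""
    coeffs, a, b = [], 1, 1
    for _ in range(n + 1):
        coeffs.append(a)
        a, b = b, a + b
    return coeffs


def evaluate(coeffs, x):
    """Horner evaluation, mirroring p_n(x) = x * p_{n-1}(x) + F_{n+1}."""
    value = 0.0
    for c in coeffs:
        value = value * x + c
    return value


def real_zero(n, iterations=60):
    """Bisection on [-2, 0] for odd n (assumed bracket, see the note above)."""
    coeffs = fcp_coefficients(n)
    lo, hi = -2.0, 0.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if evaluate(coeffs, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


PHI = (1 + math.sqrt(5)) / 2
zeros = [real_zero(n) for n in range(1, 42, 2)]  # degrees 1, 3, ..., 41
for n, z in zip(range(1, 42, 2), zeros):
    print(f"n={n:2d}  zero={z:.6f}  gap to -phi={z + PHI:.6f}")
```

    Printing the gap to -phi for successive odd degrees gives a quick view of the monotone convergence stated in the abstract, which appears to be slow.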

  2. A research roadmap for next-generation sequencing informatics.

    PubMed

    Altman, Russ B; Prabhu, Snehit; Sidow, Arend; Zook, Justin M; Goldfeder, Rachel; Litwack, David; Ashley, Euan; Asimenos, George; Bustamante, Carlos D; Donigan, Katherine; Giacomini, Kathleen M; Johansen, Elaine; Khuri, Natalia; Lee, Eunice; Liang, Xueying Sharon; Salit, Marc; Serang, Omar; Tezak, Zivana; Wall, Dennis P; Mansfield, Elizabeth; Kass-Hout, Taha

    2016-04-20

    Next-generation sequencing technologies are fueling a wave of new diagnostic tests. Progress on a key set of nine research challenge areas will help generate the knowledge required to advance effectively these diagnostics to the clinic.

  3. Visual programming for next-generation sequencing data analytics.

    PubMed

    Milicchio, Franco; Rose, Rebecca; Bian, Jiang; Min, Jae; Prosperi, Mattia

    2016-01-01

    High-throughput or next-generation sequencing (NGS) technologies have become an established and affordable experimental framework in biological and medical sciences for all basic and translational research. Processing and analyzing NGS data is challenging. NGS data are big, heterogeneous, sparse, and error prone. Although a plethora of tools for NGS data analysis has emerged in the past decade, (i) software development is still lagging behind data generation capabilities, and (ii) there is a 'cultural' gap between the end user and the developer. Generic software template libraries specifically developed for NGS can help in dealing with the former problem, whilst coupling template libraries with visual programming may help with the latter. Here we scrutinize the state-of-the-art low-level software libraries implemented specifically for NGS and graphical tools for NGS analytics. An ideal developing environment for NGS should be modular (with a native library interface), scalable in computational methods (i.e. serial, multithread, distributed), transparent (platform-independent), interoperable (with external software interface), and usable (via an intuitive graphical user interface). These characteristics should facilitate both the run of standardized NGS pipelines and the development of new workflows based on technological advancements or users' needs. We discuss in detail the potential of a computational framework blending generic template programming and visual programming that addresses all of the current limitations. In the long term, a proper, well-developed (although not necessarily unique) software framework will bridge the current gap between data generation and hypothesis testing. This will eventually facilitate the development of novel diagnostic tools embedded in routine healthcare.

  4. MEGGASENSE - The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses.

    PubMed

    Gacesa, Ranko; Zucko, Jurica; Petursdottir, Solveig K; Gudmundsdottir, Elisabet Eik; Fridjonsson, Olafur H; Diminic, Janko; Long, Paul F; Cullum, John; Hranueli, Daslav; Hreggvidsson, Gudmundur O; Starcevic, Antonio

    2017-06-01

    The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14 106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya. The implementation of a specialised metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel 'functional' assembly was developed, in which screening of reads with the HMM profiles occurs before the assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes and it is illustrated how the combination of HMM and BLAST analyses helps identify interesting genes. A variety of different proteome and metagenome databases have been generated by MEGGASENSE.

  5. [Molecular pathology of the lungs. New perspectives by next generation sequencing].

    PubMed

    Vollbrecht, C; König, K; Heukamp, L; Büttner, R; Odenthal, M

    2013-02-01

    Lung cancer is one of the most frequent malignancies in the western world. Its frequent association with a wide spectrum of mutations in genes encoding various signal transducers that are often linked to therapy response, emphasizes the obvious need for improved, fast and highly efficient approaches in molecular pathology. Comprehensive analyses of the mutation status of progression and therapy relevant genes can be performed by the novel sequencing forms named next generation sequencing (NGS) providing extremely high capacities for ultra-deep sequence analyses. The 454 pyrosequencing method, the sequencing by synthesis and the semiconductor sequencing platform are now available for parallel sequencing approaches of multitudinous target genes linked to multiple tumor DNA applications. The "one molecule, one clone, one read" principle by the NGS approaches supplies not only information on allele frequencies and mutation rates but also has the advantage of a very sensitive detection of low frequency variants.

  6. Simulations Using Random-Generated DNA and RNA Sequences

    ERIC Educational Resources Information Center

    Bryce, C. F. A.

    1977-01-01

    Using a very simple computer program written in BASIC, a very large number of random-generated DNA or RNA sequences are obtained. Students use these sequences to predict complementary sequences and translational products, evaluate base compositions, determine frequencies of particular triplet codons, and suggest possible secondary structures.…
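
    The original program was written in BASIC and is not reproduced in this record. The exercises it supports can be sketched in Python as follows; all function names are hypothetical, and uniform base frequencies are an assumption.

```python
import random
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def random_dna(length, seed=None):
    """Random DNA string with each base drawn uniformly (an assumption)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def reverse_complement(seq):
    """Complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def base_composition(seq):
    """Fraction of each base in the sequence."""
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGT"}


def codon_counts(seq):
    """Counts of non-overlapping triplets, reading from the first base."""
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


seq = random_dna(300, seed=1)
print(seq[:30], reverse_complement(seq)[-30:])
print(base_composition(seq))
print(codon_counts(seq).most_common(3))
```

    Fixing the seed makes a classroom run reproducible, so every student works from the same sequence.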

  7. Next Generation Sequencing at the University of Chicago Genomics Core

    SciTech Connect

    Faber, Pieter

    2013-04-24

    The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to State-of-the-Art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and micro-arrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.

  8. Next generation sequencing for neurological diseases: New hope or new hype?

    PubMed Central

    Keogh, M.J.; Chinnery, P.F.

    2013-01-01

    Over the past year huge advances have been made in our ability to determine the genetic aetiology of many neurological diseases through the utilisation of next generation sequencing platforms. This technology is, on a daily basis, providing new breakthroughs in neurological disease. The aim of this article is to clearly describe the technological platforms, methods of data analysis, established breakthroughs, and potential future clinical and research applications of this innovative and exciting technique which has relevance to all those working within clinical neuroscience. PMID:23200550

  9. Learning gene regulatory networks from next generation sequencing data.

    PubMed

    Jia, Bochao; Xu, Suwa; Xiao, Guanghua; Lamba, Vishal; Liang, Faming

    2017-03-10

    In recent years, next generation sequencing (NGS) has gradually replaced microarray as the major platform in measuring gene expressions. Compared to microarray, NGS has many advantages, such as less noise and higher throughput. However, the discreteness of NGS data also challenges the existing statistical methodology. In particular, there still lacks an appropriate statistical method for reconstructing gene regulatory networks using NGS data in the literature. The existing local Poisson graphical model method is not consistent and can only infer certain local structures of the network. In this article, we propose a random effect model-based transformation to continuize NGS data and then we transform the continuized data to Gaussian via a semiparametric transformation and apply an equivalent partial correlation selection method to reconstruct gene regulatory networks. The proposed method is consistent. The numerical results indicate that the proposed method can lead to much more accurate inference of gene regulatory networks than the local Poisson graphical model and other existing methods. The proposed data-continuized transformation fills the theoretical gap for how to transform discrete data to continuous data and facilitates NGS data analysis. The proposed data-continuized transformation also makes it feasible to integrate different types of data, such as microarray and RNA-seq data, in reconstruction of gene regulatory networks.

  10. JVM: Java Visual Mapping tool for next generation sequencing read.

    PubMed

    Yang, Ye; Liu, Juan

    2015-01-01

    We developed a program, JVM (Java Visual Mapping), for mapping next-generation sequencing reads to a reference sequence. The program is implemented in Java and is designed to deal with millions of short reads generated by Illumina sequencing technology. It employs a seed index strategy and octal encoding operations for sequence alignments. JVM is useful for DNA-Seq and RNA-Seq when dealing with single-end resequencing. JVM is a desktop application that supports read capacities from 1 MB to 10 GB.

  11. Image encryption using random sequence generated from generalized information domain

    NASA Astrophysics Data System (ADS)

    Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu

    2016-05-01

    A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.

  12. Variable speed wind turbine generator with zero-sequence filter

    DOEpatents

    Muljadi, Eduard

    1998-01-01

    A variable speed wind turbine generator system to convert mechanical power into electrical power or energy and to recover the electrical power or energy in the form of three phase alternating current and return the power or energy to a utility or other load with single phase sinusoidal waveform at sixty (60) hertz and unity power factor includes an excitation controller for generating three phase commanded current, a generator, and a zero sequence filter. Each commanded current signal includes two components: a positive sequence variable frequency current signal to provide the balanced three phase excitation currents required in the stator windings of the generator to generate the rotating magnetic field needed to recover an optimum level of real power from the generator; and a zero frequency sixty (60) hertz current signal to allow the real power generated by the generator to be supplied to the utility. The positive sequence current signals are balanced three phase signals and are prevented from entering the utility by the zero sequence filter. The zero sequence current signals have zero phase displacement from each other and are prevented from entering the generator by the star connected stator windings. The zero sequence filter allows the zero sequence current signals to pass through to deliver power to the utility.

  13. Variable speed wind turbine generator with zero-sequence filter

    DOEpatents

    Muljadi, E.

    1998-08-25

    A variable speed wind turbine generator system to convert mechanical power into electrical power or energy and to recover the electrical power or energy in the form of three phase alternating current and return the power or energy to a utility or other load with single phase sinusoidal waveform at sixty (60) hertz and unity power factor includes an excitation controller for generating three phase commanded current, a generator, and a zero sequence filter. Each commanded current signal includes two components: a positive sequence variable frequency current signal to provide the balanced three phase excitation currents required in the stator windings of the generator to generate the rotating magnetic field needed to recover an optimum level of real power from the generator; and a zero frequency sixty (60) hertz current signal to allow the real power generated by the generator to be supplied to the utility. The positive sequence current signals are balanced three phase signals and are prevented from entering the utility by the zero sequence filter. The zero sequence current signals have zero phase displacement from each other and are prevented from entering the generator by the star connected stator windings. The zero sequence filter allows the zero sequence current signals to pass through to deliver power to the utility. 14 figs.

  14. Variable Speed Wind Turbine Generator with Zero-sequence Filter

    DOEpatents

    Muljadi, Eduard

    1998-08-25

    A variable speed wind turbine generator system to convert mechanical power into electrical power or energy and to recover the electrical power or energy in the form of three phase alternating current and return the power or energy to a utility or other load with single phase sinusoidal waveform at sixty (60) hertz and unity power factor includes an excitation controller for generating three phase commanded current, a generator, and a zero sequence filter. Each commanded current signal includes two components: a positive sequence variable frequency current signal to provide the balanced three phase excitation currents required in the stator windings of the generator to generate the rotating magnetic field needed to recover an optimum level of real power from the generator; and a zero frequency sixty (60) hertz current signal to allow the real power generated by the generator to be supplied to the utility. The positive sequence current signals are balanced three phase signals and are prevented from entering the utility by the zero sequence filter. The zero sequence current signals have zero phase displacement from each other and are prevented from entering the generator by the star connected stator windings. The zero sequence filter allows the zero sequence current signals to pass through to deliver power to the utility.

  15. Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments

    PubMed Central

    Wei, Jyh-Da; Cheng, Hui-Jun; Lin, Chun-Yuan; Ye, Jin; Yeh, Kuan-Yu

    2017-01-01

    High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied in high-performance computing over the past decade. These desktop GPU cards need to be installed in personal computers/servers with desktop CPUs, and the cost and power consumption of constructing a GPU cluster platform are very high. In recent years, NVIDIA released an embedded board, called Jetson Tegra K1 (TK1), which contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture cores (belonging to Kepler GPUs). Jetson Tegra K1 has several advantages, such as low cost, low power consumption, and high applicability, and it has been used in several specific applications. In our previous work, a bioinformatics platform with a single TK1 (STK platform) was constructed, and that work also showed, by comparing an STK platform with a desktop CPU and GPU, that Web and mobile services can be implemented on the STK platform with a good cost-performance ratio. In this work, an embedded-based GPU cluster platform is constructed with multiple TK1s (MTK platform). Complex system installation and setup are necessary first steps. Then, 2 job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk are ported to the MTK platform. The experimental results showed speedup ratios of 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, when comparing 6 TK1s with a single TK1. The MTK platform is proven to be useful for multiple sequence alignments. PMID:28835734
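
    To put the reported speedups in context, they can be converted to parallel efficiency (speedup divided by the number of boards). This is a small worked calculation on the figures quoted above; the efficiency metric is a standard one and is not reported in the abstract itself.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup achieved with the given worker count."""
    return speedup / workers


BOARDS = 6  # number of TK1 boards compared against a single TK1
for name, speedup in [("ClustalW v2.0.11", 5.5), ("ClustalWtk", 4.8)]:
    efficiency = parallel_efficiency(speedup, BOARDS)
    print(f"{name}: {efficiency:.1%} of linear speedup on {BOARDS} TK1s")
```

    On these numbers, ClustalW v2.0.11 reaches about 92% of linear scaling and ClustalWtk about 80%.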

  16. Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments.

    PubMed

    Wei, Jyh-Da; Cheng, Hui-Jun; Lin, Chun-Yuan; Ye, Jin; Yeh, Kuan-Yu

    2017-01-01

    High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied in high-performance computing over the past decade. These desktop GPU cards need to be installed in personal computers/servers with desktop CPUs, and the cost and power consumption of constructing a GPU cluster platform are very high. In recent years, NVIDIA released an embedded board, called Jetson Tegra K1 (TK1), which contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture cores (belonging to Kepler GPUs). Jetson Tegra K1 has several advantages, such as low cost, low power consumption, and high applicability, and it has been used in several specific applications. In our previous work, a bioinformatics platform with a single TK1 (STK platform) was constructed, and that work also showed, by comparing an STK platform with a desktop CPU and GPU, that Web and mobile services can be implemented on the STK platform with a good cost-performance ratio. In this work, an embedded-based GPU cluster platform is constructed with multiple TK1s (MTK platform). Complex system installation and setup are necessary first steps. Then, 2 job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk are ported to the MTK platform. The experimental results showed speedup ratios of 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, when comparing 6 TK1s with a single TK1. The MTK platform is proven to be useful for multiple sequence alignments.

  17. Characterisation and Next-generation Sequencing Analysis of Unknown Arboviruses

    DTIC Science & Technology

    2012-09-01

    using techniques such as PCR-select subtraction and next-generation sequencing. Preliminary analysis of the four sequenced viruses has shown that they...HOJV) and Harrison Dam virus (HARDV), and two unknown bunyaviruses, Buffalo Creek Virus (BCV) and Maprik virus (MPKV). It describes the techniques such...unknown viruses with greater speed and at lower cost. The rapid advancement of new generation sequencing techniques allows for highly specific acquisition

  18. Review of General Algorithmic Features for Genome Assemblers for Next Generation Sequencers

    PubMed Central

    Wajid, Bilal; Serpedin, Erchin

    2012-01-01

    In the realm of bioinformatics and computational biology, the most rudimentary data upon which all analysis is built is the sequence data of genes, proteins and RNA. The sequence data of the entire genome is the solution to the genome assembly problem. The scope of this contribution is to provide an overview of the art of problem-solving applied within the domain of genome assembly for next-generation sequencing (NGS) platforms. This article discusses the major genome assemblers that were proposed in the literature during the past decade by outlining their basic working principles. It is intended to act as a qualitative, not a quantitative, tutorial for all those working on genome assemblers for the next generation of sequencers. We discuss the theoretical aspects of various genome assemblers, identifying their working schemes. We also briefly discuss the direction in which the area is headed, along with core issues of software simplicity. PMID:22768980

  19. Depositional sequence evolution, Paleozoic and early Mesozoic of the central Saharan platform, North Africa

    SciTech Connect

    Sprague, A.R.G.

    1991-08-01

    Over 30 depositional sequences have been identified in the Paleozoic and lower Mesozoic of the Ghadames basin of eastern Algeria, southern Tunisia, and western Libya. Well logs and lithologic information from more than 500 wells were used to correlate the 30 sequences throughout the basin (total area more than 1 million km²). Based on systematic change in the log response of strata in successively younger sequences, five groups of sequences with distinctive characteristics have been identified: Cambro-Ordovician, Upper Silurian-Middle Devonian, Upper Devonian, Carboniferous, and Middle Triassic-Middle Jurassic. Each sequence group is terminated by a major, tectonically enhanced sequence boundary that is immediately overlain (except for the Carboniferous) by a shale-prone interval deposited in response to basin-wide flooding. The four Paleozoic sequence groups were deposited on the Saharan platform, a north-facing, clastic-dominated shelf that covered most of North Africa during the Paleozoic. The sequence boundary at the top of the Carboniferous sequence group is one of several Permian-Carboniferous angular unconformities in North Africa related to the Hercynian orogeny. The youngest sequence group (Middle Triassic to Middle Jurassic) is a clastic-evaporite package that onlaps southward onto the top-of-Paleozoic sequence boundary. The progressive change, from the Cambrian to the Jurassic, in the nature of the Ghadames basin sequences reflects the interplay between basin morphology and tectonics, vegetation, eustasy, climate, and sediment supply.

  20. Transcriptome Sequencing and Development of an Expression Microarray Platform for Liver Infection in Adenovirus Type 5-Infected Syrian Golden Hamsters

    PubMed Central

    Ying, Baoling; Toth, Karoly; Spencer, Jacqueline F.; Aurora, Rajeev; Wold, William S.M.

    2015-01-01

    The Syrian golden hamster is an attractive animal for research on infectious diseases and other diseases. We report here the sequencing, assembly, and annotation of the Syrian hamster transcriptome. We include transcripts from ten pooled tissues from a naïve hamster and one stimulated with lipopolysaccharide. Our data set identified 42,707 non-redundant transcripts, representing 34,191 unique genes. Based on the transcriptome data, we generated a custom microarray and used this new platform to investigate the transcriptional response in the Syrian hamster liver following intravenous adenovirus type 5 (Ad5) infection. We found that Ad5 infection caused a massive change in regulation of liver transcripts, with robust up-regulation of genes involved in the antiviral response, indicating that the innate immune response functions in the host defense against Ad5 infection of the liver. The data and novel platforms developed in this study will facilitate further development of this important animal model. PMID:26319212

  1. [Automatic analysis pipeline of next-generation sequencing data].

    PubMed

    Wenke, Li; Fengyu, Li; Siyao, Zhang; Bin, Cai; Na, Zheng; Yu, Nie; Dao, Zhou; Qian, Zhao

    2014-06-01

    The development of next-generation sequencing has generated high demand for data processing and analysis. Although many software tools exist for analyzing next-generation sequencing data, most are designed for one specific function (e.g., alignment, variant calling or annotation). It is therefore necessary to combine them for data analysis and to generate results that biologists can interpret. This study designed a pipeline for processing Illumina sequencing data, based on the Perl programming language and the SGE system. The pipeline takes the original sequence data (fastq format) as input, calls standard data-processing software (e.g., BWA, Samtools, GATK, and Annovar), and finally outputs a list of annotated variants that researchers can analyze further. The pipeline reduces manual operation and improves efficiency through automation and parallel computation. Users can run the pipeline by editing the configuration file or by using the graphical interface. Our work will facilitate research projects that use sequencing technology.

  2. SISEQ: manipulation of multiple sequence and large database files for common platforms.

    PubMed

    Sato, N

    2000-02-01

    A multiple sequence file converter for common platforms, SISEQ, is described, which performs extraction of DNA sequences that correspond to the CDS or RNA field of a large database file as well as subsequent multi-sequence conversions for phylogenetic or molecular biological analysis. A command-line interface as well as a GUI and a script-driven operation mode are provided. The program is freely available to academic users in the form of a Macintosh FAT binary, DOS executable, or UNIX source code at http://www.molbiol.saitama-u.ac.jp/~naoki/Software.html. Contact: naokisat@molbiol.saitama-u.ac.jp

  3. A high-throughput optomechanical retrieval method for sequence-verified clonal DNA from the NGS platform.

    PubMed

    Lee, Howon; Kim, Hyoki; Kim, Sungsik; Ryu, Taehoon; Kim, Hwangbeom; Bang, Duhee; Kwon, Sunghoon

    2015-02-02

    Writing DNA plays a significant role in the fields of synthetic biology, functional genomics and bioengineering. DNA clones on next-generation sequencing (NGS) platforms have the potential to be a rich and cost-effective source of sequence-verified DNAs as a precursor for DNA writing. However, it is still very challenging to retrieve target clonal DNA from high-density NGS platforms. Here we propose an enabling technology called 'Sniper Cloning' that enables the precise mapping of target clone features on NGS platforms and non-contact rapid retrieval of targets for the full utilization of DNA clones. By merging the three cutting-edge technologies of NGS, DNA microarray and our pulse laser retrieval system, Sniper Cloning is a week-long process that produces 5,188 error-free synthetic DNAs in a single run of NGS with a single microarray DNA pool. We believe that this technology has potential as a universal tool for DNA writing in biological sciences.

  4. [Detection of pathogenic mutations in Marfan syndrome by targeted next-generation semiconductor sequencing].

    PubMed

    Lu, Chaoxia; Wu, Wei; Xiao, Jifang; Meng, Yan; Zhang, Shuyang; Zhang, Xue

    2013-06-01

    To detect pathogenic mutations in Marfan syndrome (MFS) using an Ion Torrent Personal Genome Machine (PGM) and to validate the result of targeted next-generation semiconductor sequencing for the diagnosis of genetic disorders. Peripheral blood samples were collected from three MFS patients and a normal control with informed consent. Genomic DNA was isolated by a standard method and then subjected to targeted sequencing using an Ion Ampliseq(TM) Inherited Disease Panel. Three multiplex PCR reactions were carried out to amplify the coding exons of 328 genes including FBN1, TGFBR1 and TGFBR2. DNA fragments from different samples were ligated with barcoded sequencing adaptors. Template preparation, emulsion PCR, and Ion Sphere Particle enrichment were carried out using an Ion One Touch system. The Ion Sphere Particles were sequenced on a 318 chip using the PGM platform. Data from the PGM runs were processed using Ion Torrent Suite 3.2 software to generate sequence reads. After sequence alignment and extraction of SNPs and indels, all the variants were filtered against dbSNP137. DNA sequences were visualized with the Integrative Genomics Viewer. The most likely disease-causing variants were analyzed by Sanger sequencing. The PGM sequencing yielded an output of 855.80 Mb, with a > 100 × median sequencing depth and a coverage of > 98% for the targeted regions in all four samples. After data analysis and database filtering, one known missense mutation (p.E1811K) and two novel premature termination mutations (p.E2264X and p.L871FfsX23) in the FBN1 gene were identified in the three MFS patients. All mutations were verified by conventional Sanger sequencing. Pathogenic FBN1 mutations were identified in all patients with MFS, indicating that targeted next-generation sequencing on the PGM sequencer can be applied for accurate and high-throughput testing of genetic disorders.

  5. Primer and platform effects on 16S rRNA tag sequencing

    SciTech Connect

    Tremblay, Julien; Singh, Kanwar; Fern, Alison; Kirton, Edward S.; He, Shaomei; Woyke, Tanja; Lee, Janey; Chen, Feng; Dangl, Jeffery L.; Tringe, Susannah G.

    2015-08-04

    Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, sequencing technologies; as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6–V8, and V7–V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enabled the use of more aggressive quality control parameters over 454, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. In conclusion, beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.

  6. Primer and platform effects on 16S rRNA tag sequencing

    DOE PAGES

    Tremblay, Julien; Singh, Kanwar; Fern, Alison; ...

    2015-08-04

    Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, sequencing technologies; as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6–V8, and V7–V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enabled the use of more aggressive quality control parameters over 454, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. In conclusion, beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.

  7. PWHATSHAP: efficient haplotyping for future generation sequencing.

    PubMed

    Bracciali, Andrea; Aldinucci, Marco; Patterson, Murray; Marschall, Tobias; Pisanti, Nadia; Merelli, Ivan; Torquati, Massimo

    2016-09-22

    Haplotype phasing is an important problem in the analysis of genomics information. Given a set of DNA fragments of an individual, it consists of determining which one of the possible alleles (alternative forms of a gene) each fragment comes from. Haplotype information is relevant to gene regulation, epigenetics, genome-wide association studies, evolutionary and population studies, and the study of mutations. Haplotyping is currently addressed as an optimisation problem aiming at solutions that minimise, for instance, error correction costs, where costs are a measure of the confidence in the accuracy of the information acquired from DNA sequencing. Solutions have typically an exponential computational complexity. WHATSHAP is a recent optimal approach which moves computational complexity from DNA fragment length to fragment overlap, i.e., coverage, and is hence of particular interest when considering sequencing technology's current trends that are producing longer fragments. Given the potential relevance of efficient haplotyping in several analysis pipelines, we have designed and engineered PWHATSHAP, a parallel, high-performance version of WHATSHAP. PWHATSHAP is embedded in a toolkit developed in Python and supports genomics datasets in standard file formats. Building on WHATSHAP, PWHATSHAP exhibits the same complexity exploring a number of possible solutions which is exponential in the coverage of the dataset. The parallel implementation on multi-core architectures allows for a relevant reduction of the execution time for haplotyping, while the provided results enjoy the same high accuracy as that provided by WHATSHAP, which increases with coverage. Due to its structure and management of the large datasets, the parallelisation of WHATSHAP posed demanding technical challenges, which have been addressed exploiting a high-level parallel programming framework. The result, PWHATSHAP, is a freely available toolkit that improves the efficiency of the analysis of genomics

  8. A comparison of Illumina and Ion Torrent sequencing platforms in the context of differential gene expression.

    PubMed

    Lahens, Nicholas F; Ricciotti, Emanuela; Smirnova, Olga; Toorens, Erik; Kim, Eun Ji; Baruzzo, Giacomo; Hayer, Katharina E; Ganguly, Tapan; Schug, Jonathan; Grant, Gregory R

    2017-08-10

    Though Illumina has largely dominated the RNA-Seq field, the simultaneous availability of Ion Torrent has left scientists wondering which platform is most effective for differential gene expression (DGE) analysis. Previous investigations of this question have typically used reference samples derived from cell lines and brain tissue, and do not involve biological variability. While these comparisons might inform studies of tissue-specific expression, marked by large-scale transcriptional differences, this is not the common use case. Here we employ a standard treatment/control experimental design, which enables us to evaluate these platforms in the context of the expression differences common in differential gene expression experiments. Specifically, we assessed the hepatic inflammatory response of mice by assaying liver RNA from control and IL-1β treated animals with both the Illumina HiSeq and the Ion Torrent Proton sequencing platforms. We found the greatest difference between the platforms at the level of read alignment, a moderate level of concordance at the level of DGE analysis, and nearly identical results at the level of differentially affected pathways. Interestingly, we also observed a strong interaction between sequencing platform and choice of aligner. By aligning both real and simulated Illumina and Ion Torrent data with the twelve most commonly-cited aligners in the literature, we observed that different aligner and platform combinations were better suited to probing different genomic features; for example, disentangling the source of expression in gene-pseudogene pairs. Taken together, our results indicate that while Illumina and Ion Torrent have similar capacities to detect changes in biology from a treatment/control experiment, these platforms may be tailored to interrogate different transcriptional phenomena through careful selection of alignment software.

  9. Effect of Next-Generation Exome Sequencing Depth for Discovery of Diagnostic Variants

    PubMed Central

    Kim, Kyung; Seong, Moon-Woo; Chung, Won-Hyong; Park, Sung Sup; Leem, Sangseob; Park, Won; Kim, Jihyun; Lee, KiYoung; Park, Rae Woong; Kim, Namshin

    2015-01-01

    Sequencing depth, which is directly related to the cost and time required for the generation, processing, and maintenance of next-generation sequencing data, is an important factor in the practical utilization of such data in clinical fields. Unfortunately, identifying an exome sequencing depth adequate for clinical use is a challenge that has not been addressed extensively. Here, we investigate the effect of exome sequencing depth on the discovery of sequence variants for clinical use. Toward this, we sequenced ten germ-line blood samples from breast cancer patients on the Illumina platform GAII(x) at a high depth of ~200×. We observed that most function-related diverse variants in the human exonic regions could be detected at a sequencing depth of 120×. Furthermore, investigation using a diagnostic gene set showed that the number of clinical variants identified using exome sequencing reached a plateau at an average sequencing depth of about 120×. Moreover, the phenomena were consistent across the breast cancer samples. PMID:26175660

  10. Non-random DNA fragmentation in next-generation sequencing

    NASA Astrophysics Data System (ADS)

    Poptsova, Maria S.; Il'Icheva, Irina A.; Nechipurenko, Dmitry Yu.; Panchenko, Larisa A.; Khodikov, Mingian V.; Oparina, Nina Y.; Polozov, Robert V.; Nechipurenko, Yury D.; Grokhovsky, Sergei L.

    2014-03-01

    Next Generation Sequencing (NGS) technology is based on cutting DNA into small fragments and sequencing them in a massively parallel fashion. The multiple overlapping segments, termed "reads", are assembled into a contiguous sequence. To reduce sequencing errors, every genome region should be sequenced several dozen times. This sequencing approach is based on the assumption that genomic DNA breaks are random and sequence-independent. However, we previously showed that for sonicated restriction DNA fragments the rates of double-stranded breaks depend on the nucleotide sequence. In this work we analyzed genomic reads from NGS data and discovered that fragmentation methods based on the action of hydrodynamic forces on DNA produce a similar bias. Taking this non-random DNA fragmentation into account may help to unravel which factors influence the non-uniform coverage of various genomic regions, and to what extent.

  11. Next Generation Sequencing Technologies: The Doorway to the Unexplored Genomics of Non-Model Plants

    PubMed Central

    Unamba, Chibuikem I. N.; Nag, Akshay; Sharma, Ram K.

    2015-01-01

    Non-model plants, i.e., species with one or more characteristics such as a long life cycle, difficulty of growth in the laboratory, or poor fecundity, were previously excluded from sequencing projects because of the high running cost of Sanger sequencing. Consequently, information about their genomics and key biological processes is inadequate. However, the recent advent of fast and cost-effective next generation sequencing (NGS) platforms has enabled the unearthing of certain characteristic gene structures unique to these species. It has also provided insight into the mechanisms underlying gene expression and secondary metabolism, and has facilitated the development of genomic resources for diversity characterization, evolutionary analysis and marker-assisted breeding, even without prior availability of genomic sequence information. In this review we explore how different NGS platforms, together with recent advances in NGS-based high-throughput genotyping technologies, are benefiting de novo whole-genome and transcriptome sequencing and the development of genome-wide, sequence-based marker resources, which are less costly than phenotyping, for the improvement of non-model crops. PMID:26734016

  12. Exploring the potential of next-generation sequencing in detection of respiratory viruses.

    PubMed

    Prachayangprecha, Slinporn; Schapendonk, Claudia M E; Koopmans, Marion P; Osterhaus, Albert D M E; Schürch, Anita C; Pas, Suzan D; van der Eijk, Annemiek A; Poovorawan, Yong; Haagmans, Bart L; Smits, Saskia L

    2014-10-01

    Efficient detection of human respiratory viral pathogens is crucial in the management of patients with acute respiratory tract infection. Sequence-independent amplification of nucleic acids combined with next-generation sequencing technology and bioinformatics analyses is a promising strategy for identifying pathogens in clinical and public health settings. It allows the characterization of hundreds of different known pathogens simultaneously and of novel pathogens that elude conventional testing. However, major hurdles for its routine use exist, including cost, turnaround time, and especially sensitivity of the assay, as the detection limit is dependent on viral load, host genetic material, and sequencing depth. To obtain insights into these aspects, we analyzed nasopharyngeal aspirates from a cohort of 81 Thai children with respiratory disease for the presence of respiratory viruses using a sequence-independent next-generation sequencing approach and routinely used diagnostic real-time reverse transcriptase PCR (real-time RT-PCR) assays. With respect to the detection of rhinovirus and human metapneumovirus, the next-generation sequencing approach was at least as sensitive as diagnostic real-time RT-PCR in this small cohort, whereas for bocavirus and enterovirus, next-generation sequencing was less sensitive than real-time RT-PCR. The advantage of the sequencing approach over real-time RT-PCR was the immediate availability of virus-typing information. Considering the development of platforms capable of generating more output data at declining costs, next-generation sequencing remains of interest for future virus diagnosis in clinical and public health settings and certainly as an additional tool when screening results from real-time RT-PCR are negative.

  13. Exploring the Potential of Next-Generation Sequencing in Detection of Respiratory Viruses

    PubMed Central

    Prachayangprecha, Slinporn; Schapendonk, Claudia M. E.; Koopmans, Marion P.; Osterhaus, Albert D. M. E.; Schürch, Anita C.; Pas, Suzan D.; van der Eijk, Annemiek A.; Poovorawan, Yong; Haagmans, Bart L.

    2014-01-01

    Efficient detection of human respiratory viral pathogens is crucial in the management of patients with acute respiratory tract infection. Sequence-independent amplification of nucleic acids combined with next-generation sequencing technology and bioinformatics analyses is a promising strategy for identifying pathogens in clinical and public health settings. It allows the characterization of hundreds of different known pathogens simultaneously and of novel pathogens that elude conventional testing. However, major hurdles for its routine use exist, including cost, turnaround time, and especially sensitivity of the assay, as the detection limit is dependent on viral load, host genetic material, and sequencing depth. To obtain insights into these aspects, we analyzed nasopharyngeal aspirates from a cohort of 81 Thai children with respiratory disease for the presence of respiratory viruses using a sequence-independent next-generation sequencing approach and routinely used diagnostic real-time reverse transcriptase PCR (real-time RT-PCR) assays. With respect to the detection of rhinovirus and human metapneumovirus, the next-generation sequencing approach was at least as sensitive as diagnostic real-time RT-PCR in this small cohort, whereas for bocavirus and enterovirus, next-generation sequencing was less sensitive than real-time RT-PCR. The advantage of the sequencing approach over real-time RT-PCR was the immediate availability of virus-typing information. Considering the development of platforms capable of generating more output data at declining costs, next-generation sequencing remains of interest for future virus diagnosis in clinical and public health settings and certainly as an additional tool when screening results from real-time RT-PCR are negative. PMID:25100822

  14. Computer program to generate attitude error equations for a gimballed platform

    NASA Technical Reports Server (NTRS)

    Hall, W. A., Jr.; Morris, T. D.; Rone, K. Y.

    1972-01-01

    Computer program for solving attitude error equations related to gimballed platform is described. Program generates matrix elements of attitude error equations when initial matrices and trigonometric identities have been defined. Program is written for IBM 360 computer.

  15. Zseq: An Approach for Preprocessing Next-Generation Sequencing Data.

    PubMed

    Alkhateeb, Abedalrhman; Rueda, Luis

    2017-08-01

    Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts. Preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq measures the complexity of each sequence by counting its unique k-mers and using that count as the sequence's score; it also takes into account other factors such as ambiguous nucleotides or a high GC-content percentage in k-mers. Based on a z-score threshold, Zseq sweeps through the sequences again and filters out those with a z-score less than the user-defined threshold. The Zseq algorithm provides a better mapping rate and reduces the number of ambiguous bases significantly in comparison with other methods. The filtered reads were evaluated by aligning the reads and assembling the transcripts using the reference genome as well as de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Moreover, de novo assembled transcripts from the reads filtered by Zseq have longer genomic sequences than those from other tested methods. A way of estimating the cutoff threshold using labeling rules is introduced, with optimistic results.
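    The scoring-and-filtering idea described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the published Zseq implementation: it scores each read by its number of distinct k-mers, skips k-mers containing the ambiguous base N, ignores the GC-content adjustment, and uses the population standard deviation for the z-score. The function names and the example threshold are assumptions.

```python
from statistics import mean, pstdev

def unique_kmer_count(seq, k):
    """Number of distinct k-mers in `seq`, ignoring k-mers that contain 'N'."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sum(1 for kmer in kmers if "N" not in kmer)

def filter_by_zscore(reads, k=3, threshold=-1.0):
    """Keep reads whose complexity z-score is at least `threshold`."""
    # First pass: score every read.
    scores = [unique_kmer_count(read, k) for read in reads]
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return list(reads)  # all reads equally complex: nothing to filter
    # Second pass: drop reads whose z-score falls below the threshold.
    return [r for r, s in zip(reads, scores) if (s - mu) / sigma >= threshold]

reads = [
    "ACGTTGCAAGCT",  # 10 distinct 3-mers
    "AAAAAAAAAAAA",  # low complexity: 1 distinct 3-mer
    "ACGGTCATGCAT",  # 9 distinct 3-mers
    "ACGTNNNNNNNN",  # mostly ambiguous: 2 usable 3-mers
]
kept = filter_by_zscore(reads, k=3, threshold=-0.5)
print(kept)
```

    Both passes are linear in the total read length, which is consistent with the abstract's description of Zseq as a linear method.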

  16. Bioelectrochemical system platform for sustainable environmental remediation and energy generation.

    PubMed

    Wang, Heming; Luo, Haiping; Fallgren, Paul H; Jin, Song; Ren, Zhiyong Jason

    2015-01-01

    The increasing awareness of the energy-environment nexus is compelling the development of technologies that reduce environmental impacts during energy production as well as energy consumption during environmental remediation. Countries spend billions in pollution cleanup projects, and new technologies with low energy and chemical consumption are needed for sustainable remediation practice. This perspective review provides a comprehensive summary on the mechanisms of the new bioelectrochemical system (BES) platform technology for efficient and low cost remediation, including petroleum hydrocarbons, chlorinated solvents, perchlorate, azo dyes, and metals, and it also discusses the potential new uses of BES approach for some emerging contaminants remediation, such as CO2 in air and nutrients and micropollutants in water. The unique feature of BES for environmental remediation is the use of electrodes as non-exhaustible electron acceptors, or even donors, for contaminant degradation, which requires minimum energy or chemicals but instead produces sustainable energy for monitoring and other onsite uses. BES provides both oxidation (anode) and reduction (cathode) reactions that integrate microbial-electro-chemical removal mechanisms, so complex contaminants with different characteristics can be removed. We believe the BES platform carries great potential for sustainable remediation and hope this perspective provides background and insights for future research and development.

  17. Implication of next-generation sequencing on association studies

    PubMed Central

    2011-01-01

    Background: Next-generation sequencing technologies can effectively detect the entire spectrum of genomic variation and provide a powerful tool for systematic exploration of the universe of common, low frequency and rare variants in the entire genome. However, the current paradigm for genome-wide association studies (GWAS) is to catalogue and genotype common variants (MAF > 5%). The methods and study design for testing the association of low frequency (0.5% < MAF ≤ 5%) and rare variation (MAF ≤ 0.5%) have not been thoroughly investigated. The 1000 Genomes Project represents one such endeavour to characterize the human genetic variation pattern at the MAF = 1% level as a foundation for association studies. In this report, we explore different strategies and study designs for the near future GWAS in the post-era, based on both low coverage pilot data and exon pilot data in the 1000 Genomes Project. Results: We investigated the linkage disequilibrium (LD) pattern among common and low frequency SNPs and its implication for association studies. We found that the LD between pairs of low frequency alleles, and between low frequency alleles and common alleles, is much weaker than the LD between pairs of common alleles. We examined various tagging designs with and without statistical imputation approaches and compared their power against de novo resequencing in mapping causal variants under various disease models. We used the low coverage pilot data, which contain ~14 M SNPs, as a hypothetical genotype-array platform (Pilot 14 M) to interrogate its impact on the selection of tag SNPs, mapping coverage and power of association tests. We found that even after imputation 45.4% of low frequency SNPs were still untaggable and only 67.7% of the low frequency variation was covered by the Pilot 14 M array. Conclusions: This suggests that GWAS based on SNP arrays would be ill-suited for association studies of low frequency variation. PMID:21682891
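    The three minor-allele-frequency (MAF) bands used in this abstract can be written as a small classifier. The thresholds are taken from the abstract; the function name `maf_class` and the handling of out-of-range input are assumptions made for illustration.

```python
def maf_class(maf):
    """Classify a variant by minor allele frequency (as a fraction, not a percentage).

    Bands follow the abstract: common (MAF > 5%),
    low frequency (0.5% < MAF <= 5%), rare (MAF <= 0.5%)."""
    if not 0.0 <= maf <= 0.5:
        # A minor allele cannot be more frequent than the major allele.
        raise ValueError("MAF must lie between 0 and 0.5")
    if maf > 0.05:
        return "common"
    if maf > 0.005:
        return "low frequency"
    return "rare"

for value in (0.20, 0.05, 0.01, 0.005):
    print(value, maf_class(value))
```

    Note that a variant at exactly 5% falls in the low frequency band and one at exactly 0.5% in the rare band, matching the inequalities given in the abstract.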

  18. Multiple nuclear ortholog next generation sequencing phylogeny of Daucus

    USDA-ARS's Scientific Manuscript database

    Next generation sequencing is helping to solve the data insufficiency problem hindering well-resolved dominant gene phylogenies. We used Roche 454 technology to obtain DNA sequences from 93 nuclear orthologs, dispersed throughout all linkage groups of Daucus. Of these 93 orthologs, ten were designed...

  19. Analyzing the safety of removal sequences for piles of an offshore jacket platform

    NASA Astrophysics Data System (ADS)

    Pan, Xin-Ying; Zhang, Zhao-De

    2009-12-01

    An inevitable consequence of the development of the offshore petroleum industry is the eventual obsolescence of large offshore structures. Proper methods for removal of decommissioned offshore platforms are becoming an important topic that the oil and gas industry must pay increasing attention to. While removing sections from a decommissioned jacket platform, the stability of the remaining parts is critical. The jacket danger indices D_σ and D_s defined in this paper are very useful for analyzing the safety of any procedure planned for disassembling a jacket platform. The safest pile cutting sequence can be determined easily by comparing every column of D_σ and D_s or simply analyzing the figures of every row of D_σ and D_s.

  20. Building a next generation platform for association studies in cacao

    USDA-ARS's Scientific Manuscript database

    The drastic reductions in cost and time associated with the collection of DNA sequence and genotype data have revolutionized genetic mapping in model systems (e.g. humans, Arabidopsis) and also promise to significantly enhance the power and resolution of genetic mapping in agricultural systems. Prog...

  1. Qualimap: evaluating next-generation sequencing alignment data.

    PubMed

    García-Alcalde, Fernando; Okonechnikov, Konstantin; Carbonell, José; Cruz, Luis M; Götz, Stefan; Tarazona, Sonia; Dopazo, Joaquín; Meyer, Thomas F; Conesa, Ana

    2012-10-15

    The sequence alignment/map (SAM) and the binary alignment/map (BAM) formats have become the standard method of representation of nucleotide sequence alignments for next-generation sequencing data. SAM/BAM files usually contain information from tens to hundreds of millions of reads. Often, the sequencing technology, protocol and/or the selected mapping algorithm introduce some unwanted biases in these data. The systematic detection of such biases is a non-trivial task that is crucial to drive appropriate downstream analyses. We have developed Qualimap, a Java application that supports user-friendly quality control of mapping data, by considering sequence features and their genomic properties. Qualimap takes sequence alignment data and provides graphical and statistical analyses for the evaluation of data. Such quality-control data are vital for highlighting problems in the sequencing and/or mapping processes, which must be addressed prior to further analyses. Qualimap is freely available from http://www.qualimap.org.

  2. Aerodynamic platform comparison for jet-stream electricity generation

    NASA Astrophysics Data System (ADS)

    Fletcher, C. A. J.; Honan, A. J.; Sapuppo, J. S.

    1983-02-01

    Various aerodynamic platforms are considered for suitability for deriving electricity through wind turbines placed in the jet stream. Wind tunnel, economic, and performance analyses were performed for the integrated diffuser augmented wind turbine (IDAWT), the separated DAWT (SDAWT), a separated unshrouded wind turbine (SUWT), and a rotary wing concept (RWC). The wind tunnel trials were run with models and half models of the concepts to test the lift, static stability, and power extraction capability in a 25 m/sec flow. Variations in lift at varying angles of attack were also studied. The results indicated that the SDAWT and the IDAWT could be built at $650/kW and produce power at an operating cost of $.05/kWh. Improvements are projected to reduce the costs to $550/kW installed with operating costs less than $.04/kWh. The rotary wing concept was ruled out as a candidate.

  3. Comparative depositional geometries and facies within windward rimmed platform and carbonate ramp sequences

    SciTech Connect

    Boss, S.K.; Rasmussen, K.A.; Neumann, A.C.

    1992-01-01

    Northern Great Bahama Bank (NGBB) combines geomorphic aspects of rimmed platforms and carbonate ramps in a windward (high-energy) environment. Analysis of Holocene sediment cores, seismic reflection mapping of the Holocene-Pleistocene unconformity and transgressive Holocene deposits, and petrographic study of excavated Holocene submarine-cemented horizons provide an integrated view of evolving depositional geometries within both rimmed platform and ramp settings. Cores display gross textural and compositional homogeneity; all sediments are medium to coarse sands comprised of composite peloids, Halimeda sp., benthic foraminifera and molluscs. Three-dimensional seismic mapping reveals that this basal unconformity exhibits variation in topographic relief related to both constructional and erosional processes; rimmed portions of the platform are associated with topographic "plateaus" with fringing eolianite ridges or (rarely) reefs. These "plateaus" are separated by a somewhat deeper (ca. 5 m deep) "trough" exhibiting little relief, but sloping seaward to form a ramp. Multiple intrasequence cemented horizons are a common feature of the thinner deposits of the NGBB ramp where tidal exchange is vigorous and sediment deposition is episodic or in dynamic balance with sediment export. Thus, rimmed carbonate platform facies are thick marine sands with relatively little submarine cementation while open, unsheltered ramp facies are characterized by thin sediment sequences containing numerous, discontinuous submarine-cemented horizons. In the absence of other obvious facies or geomorphic indicators (e.g. preserved reefal rims), the preservation of similar depositional features in ancient limestones may serve as a useful discriminant of rimmed platform versus carbonate ramp settings.

  4. Next-generation sequencing in clinical virology: Discovery of new viruses.

    PubMed

    Datta, Sibnarayan; Budhauliya, Raghvendra; Das, Bidisha; Chatterjee, Soumya; Vanlalhmuaka; Veer, Vijay

    2015-08-12

    Viruses are a cause of significant health problems worldwide, especially in developing nations. Due to different anthropological activities, human populations are exposed to different viral pathogens, many of which emerge as outbreaks. In such situations, discovery of novel viruses is of utmost importance for deciding prevention and treatment strategies. Since the last century, a number of different virus discovery methods, based on cell culture inoculation and sequence-independent PCR, have been used for identification of a variety of viruses. However, the recent emergence and commercial availability of next-generation sequencers (NGS) has entirely changed the field of virus discovery. These massively parallel sequencing platforms can sequence a mixture of genetic materials from a very heterogeneous mix, with high sensitivity. Moreover, these platforms work in a sequence-independent manner, making them ideal tools for virus discovery. However, for their application in clinics, sample preparation or enrichment is necessary to detect low-abundance virus populations. A number of techniques have also been developed for enrichment of viral nucleic acids. In this manuscript, we review the evolution of sequencing, the NGS technologies available today, and widely used virus enrichment technologies. We also discuss the challenges associated with their application in clinical virus discovery.

  5. Next-generation sequencing: the future of molecular genetics in poultry production and food safety.

    PubMed

    Diaz-Sanchez, S; Hanning, I; Pendleton, Sean; D'Souza, Doris

    2013-02-01

    The era of molecular biology and automation of the Sanger chain-terminator sequencing method has led to discovery and advances in diagnostics and biotechnology. The Sanger methodology dominated research for over 2 decades, leading to significant accomplishments and technological improvements in DNA sequencing. Next-generation high-throughput sequencing (HT-NGS) technologies were developed subsequently to overcome the limitations of this first-generation technology, offering higher speed, less labor, and lower cost. Various platforms developed include sequencing-by-synthesis 454 Life Sciences, Illumina (Solexa) sequencing, SOLiD sequencing (among others), and the Ion Torrent semiconductor sequencing technologies that use different detection principles. As technology advances, progress toward third-generation sequencing technologies is being reported, including Nanopore Sequencing and real-time monitoring of PCR activity through fluorescent resonant energy transfer. The advantages of these technologies include scalability, simplicity, increasing DNA polymerase performance and yields, lower error rates, and greater economic feasibility, with the eventual goal of obtaining real-time results. These technologies can be directly applied to improve poultry production and enhance food safety. For example, sequence-based (determination of the gut microbial community, genes for metabolic pathways, or presence of plasmids) and function-based (screening for function such as antibiotic resistance, or vitamin production) metagenomic analysis can be carried out. Gut microbial flora/communities of poultry can be sequenced to determine the changes that affect health and disease along with the efficacy of methods to control pathogenic growth. Thus, the purpose of this review is to provide an overview of the principles of these current technologies and their potential application to improve poultry production and food safety as well as public health.

  6. New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing

    PubMed Central

    Song, Kai; Ren, Jie; Reinert, Gesine; Deng, Minghua

    2014-01-01

    With the development of next-generation sequencing (NGS) technologies, a large amount of short read data has been generated. Assembly of these short reads can be challenging for genomes and metagenomes without template sequences, making alignment-based genome sequence comparison difficult. In addition, sequence reads from NGS can come from different regions of various genomes and they may not be alignable. Sequence signature-based methods for genome comparison based on the frequencies of word patterns in genomes and metagenomes can potentially be useful for the analysis of short reads data from NGS. Here we review the recent development of alignment-free genome and metagenome comparison based on the frequencies of word patterns with emphasis on the dissimilarity measures between sequences, the statistical power of these measures when two sequences are related and the applications of these measures to NGS data. PMID:24064230
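
    The word-frequency idea in the abstract above can be illustrated with a minimal Python sketch. It counts overlapping words (k-mers) in two sequences, computes the D2 statistic (the inner product of the two word-count vectors), and derives a cosine-style dissimilarity from it. The function names, the example sequences, and the choice of normalization are assumptions made for illustration; the review covers several measures and this sketch does not claim to reproduce any one of them exactly.

```python
from collections import Counter
from math import sqrt

def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2(x, y):
    """D2 statistic: inner product of two word-count vectors."""
    return sum(count * y[word] for word, count in x.items())

def cosine_dissimilarity(x, y):
    """1 minus the normalized D2; 0 for identical word profiles, 1 for disjoint ones."""
    norm = sqrt(d2(x, x) * d2(y, y))
    return 1.0 - d2(x, y) / norm if norm else 1.0

# Hypothetical short sequences, compared without any alignment step.
a = kmer_counts("ACGTACGTAC", 3)
b = kmer_counts("GGGGGGGGGG", 3)
print(cosine_dissimilarity(a, a), cosine_dissimilarity(a, b))
```

    Because only word counts are compared, the same functions apply to unassembled short reads pooled per sample, which is the setting the abstract has in mind.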

  7. Next generation barcode tagged sequencing for monitoring microbial community dynamics.

    PubMed

    Breakwell, Katy; Tetu, Sasha G; Elbourne, Liam D H

    2014-01-01

    Microbial identification using 16S rDNA variable regions has become increasingly popular over the past decade. The application of next-generation amplicon sequencing to these regions allows microbial communities to be sequenced in far greater depth than previous techniques, as well as allowing for the identification of unculturable or rare organisms within a sample. Multiplexing can be used to sequence multiple samples in tandem through the use of sample-specific identification sequences which are attached to each amplicon, making this a cost-effective method for large-scale microbial identification experiments.
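
    The multiplexing step described above, in which a sample-specific identification sequence is attached to each amplicon, implies a demultiplexing step after sequencing. A minimal Python sketch follows, assuming the barcode sits at the start of each read and must match exactly; the barcode sequences and sample names are hypothetical, and real pipelines usually tolerate mismatches.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Group reads by the sample whose barcode they start with.

    The barcode is stripped from each assigned read; reads that match
    no barcode are collected under the key "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for sample, tag in barcodes.items():
            if read.startswith(tag):
                bins[sample].append(read[len(tag):])
                break
        else:
            bins["unassigned"].append(read)
    return dict(bins)

# Hypothetical barcodes and reads.
barcodes = {"sample_A": "ACGT", "sample_B": "TGCA"}
reads = ["ACGTTTGACC", "TGCAGGATCC", "ACGTCCCGGA", "NNNNACGTAC"]
print(demultiplex(reads, barcodes))
```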

  8. Next generation sequencing technologies for insect virus discovery.

    PubMed

    Liu, Sijun; Vijayendran, Diveena; Bonning, Bryony C

    2011-10-01

    Insects are commonly infected with multiple viruses including those that cause sublethal, asymptomatic, and latent infections. Traditional methods for virus isolation typically lack the sensitivity required for detection of such viruses that are present at low abundance. In this respect, next generation sequencing technologies have revolutionized methods for the discovery and identification of new viruses from insects. Here we review both traditional and modern methods for virus discovery, and outline analysis of transcriptome and small RNA data for identification of viral sequences. We will introduce methods for de novo assembly of viral sequences, identification of potential viral sequences from BLAST data, and bioinformatics for generating full-length or near full-length viral genome sequences. We will also discuss implications of the ubiquity of viruses in insects and in insect cell lines. All of the methods described in this article can also apply to the discovery of viruses in other organisms.

  9. Next Generation Sequencing Technologies for Insect Virus Discovery

    PubMed Central

    Liu, Sijun; Vijayendran, Diveena; Bonning, Bryony C.

    2011-01-01

    Insects are commonly infected with multiple viruses including those that cause sublethal, asymptomatic, and latent infections. Traditional methods for virus isolation typically lack the sensitivity required for detection of such viruses that are present at low abundance. In this respect, next generation sequencing technologies have revolutionized methods for the discovery and identification of new viruses from insects. Here we review both traditional and modern methods for virus discovery, and outline analysis of transcriptome and small RNA data for identification of viral sequences. We will introduce methods for de novo assembly of viral sequences, identification of potential viral sequences from BLAST data, and bioinformatics for generating full-length or near full-length viral genome sequences. We will also discuss implications of the ubiquity of viruses in insects and in insect cell lines. All of the methods described in this article can also apply to the discovery of viruses in other organisms. PMID:22069519

  10. QColors: an algorithm for conservative viral quasispecies reconstruction from short and non-contiguous next generation sequencing reads.

    PubMed

    Huang, Austin; Kantor, Rami; DeLong, Allison; Schreier, Leeann; Istrail, Sorin

    Next generation sequencing technologies have recently been applied to characterize mutational spectra of the heterogeneous population of viral genotypes (known as a quasispecies) within HIV-infected patients. Such information is clinically relevant because minority genetic subpopulations of HIV within patients enable viral escape from selection pressures such as the immune response and antiretroviral therapy. However, methods for quasispecies sequence reconstruction from next generation sequencing reads are not yet widely used and remain an emerging area of research. Furthermore, the majority of research methodology in HIV has focused on 454 sequencing, while many next-generation sequencing platforms used in practice are limited to shorter read lengths relative to 454 sequencing. Little work has been done in determining how best to address the read length limitations of other platforms. The approach described here incorporates graph representations of both read differences and read overlap to conservatively determine the regions of the sequence with sufficient variability to separate quasispecies sequences. Within these tractable regions of quasispecies inference, we use constraint programming to solve for an optimal quasispecies subsequence determination via vertex coloring of the conflict graph, a representation which also lends itself to data with non-contiguous reads such as paired-end sequencing. We demonstrate the utility of the method by applying it to simulations based on actual intra-patient clonal HIV-1 sequencing data.
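
    The conflict-graph idea in the abstract above can be sketched in Python as follows. Two reads conflict when they overlap and disagree at a shared position, and reads given the same color are mutually compatible and could belong to one quasispecies sequence. This sketch swaps in a simple greedy coloring in place of the constraint-programming solver the abstract mentions, so it is not guaranteed to use the minimum number of colors; the read representation as (start, sequence) pairs and the example reads are assumptions.

```python
def conflicts(a, b):
    """True if two (start, sequence) reads overlap and disagree at a shared position."""
    (sa, qa), (sb, qb) = a, b
    lo, hi = max(sa, sb), min(sa + len(qa), sb + len(qb))
    return any(qa[p - sa] != qb[p - sb] for p in range(lo, hi))

def greedy_coloring(reads):
    """Give each read the smallest color not used by an earlier conflicting read."""
    colors = []
    for i, read in enumerate(reads):
        used = {colors[j] for j in range(i) if conflicts(read, reads[j])}
        color = 0
        while color in used:
            color += 1
        colors.append(color)
    return colors

# Hypothetical reads: the third disagrees with the first two at position 3.
reads = [(0, "ACGT"), (2, "GTAA"), (2, "GAAA"), (10, "TTTT")]
print(greedy_coloring(reads))
```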

  11. The Feasibility Study of Non-Invasive Fetal Trisomy 18 and 21 Detection with Semiconductor Sequencing Platform

    PubMed Central

    Guo, Qiwei; Chen, Jinchun; Quan, Shengmao; Zhang, Ahong; Zheng, Hailing; Zhu, Xingqiang; Lin, Jin; Xu, Huan; Wu, Ayang; Park, Sin-Gi; Kim, Byung Chul; Joo, Hee Jae; Chen, Hongliang; Bhak, Jong

    2014-01-01

    Objective: Recent non-invasive prenatal testing (NIPT) technologies are based on next-generation sequencing (NGS). NGS allows rapid and effective clinical diagnoses to be determined with two common sequencing systems: Illumina and Ion Torrent platforms. The majority of NIPT technology is associated with Illumina platform. We investigated whether fetal trisomy 18 and 21 were sensitively and specifically detectable by semiconductor sequencer: Ion Proton. Methods: From March 2012 to October 2013, we enrolled 155 pregnant women with fetuses who were diagnosed as high risk of fetal defects at Xiamen Maternal & Child Health Care Hospital (Xiamen, Fujian, China). Adapter-ligated DNA libraries were analyzed by the Ion Proton™ System (Life Technologies, Grand Island, NY, USA) with an average 0.3× sequencing coverage per nucleotide. Average total raw reads per sample was 6.5 million and mean rate of uniquely mapped reads was 59.0%. The results of this study were derived from BWA mapping. Z-score was used for fetal trisomy 18 and 21 detection. Results: Interactive dot diagrams showed the minimal z-score values to discriminate negative versus positive cases of fetal trisomy 18 and 21. For fetal trisomy 18, the minimal z-score value of 2.459 showed 100% positive predictive and negative predictive values. The minimal z-score of 2.566 was used to classify negative versus positive cases of fetal trisomy 21. Conclusion: These results provide the evidence that fetal trisomy 18 and 21 detection can be performed with semiconductor sequencer. Our data also suggest that a prospective study should be performed with a larger cohort of clinically diverse obstetrics patients. PMID:25329639
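
    A z-score test of the kind mentioned above can be sketched in Python. The abstract does not spell out how the z-score was computed, so this sketch assumes the common approach: the fraction of uniquely mapped reads that fall on the chromosome of interest is compared with the mean and standard deviation of the same fraction in a reference set of unaffected pregnancies. The reference fractions and test values below are invented for illustration; only the 2.566 cutoff for trisomy 21 comes from the abstract.

```python
from statistics import mean, stdev

def z_score(sample_fraction, reference_fractions):
    """Standardize a sample's chromosome read fraction against a reference set."""
    return (sample_fraction - mean(reference_fractions)) / stdev(reference_fractions)

# Hypothetical chr21 read fractions from unaffected reference pregnancies.
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0130]
T21_CUTOFF = 2.566  # minimal z-score reported in the abstract for trisomy 21

for fraction in (0.01302, 0.0135):  # hypothetical test samples
    z = z_score(fraction, reference)
    print(fraction, round(z, 2), "positive" if z > T21_CUTOFF else "negative")
```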

  12. The feasibility study of non-invasive fetal trisomy 18 and 21 detection with semiconductor sequencing platform.

    PubMed

    Jeon, Young Joo; Zhou, Yulin; Li, Yihan; Guo, Qiwei; Chen, Jinchun; Quan, Shengmao; Zhang, Ahong; Zheng, Hailing; Zhu, Xingqiang; Lin, Jin; Xu, Huan; Wu, Ayang; Park, Sin-Gi; Kim, Byung Chul; Joo, Hee Jae; Chen, Hongliang; Bhak, Jong

    2014-01-01

    Recent non-invasive prenatal testing (NIPT) technologies are based on next-generation sequencing (NGS). NGS allows rapid and effective clinical diagnoses to be determined with two common sequencing systems: Illumina and Ion Torrent platforms. The majority of NIPT technology is associated with Illumina platform. We investigated whether fetal trisomy 18 and 21 were sensitively and specifically detectable by semiconductor sequencer: Ion Proton. From March 2012 to October 2013, we enrolled 155 pregnant women with fetuses who were diagnosed as high risk of fetal defects at Xiamen Maternal & Child Health Care Hospital (Xiamen, Fujian, China). Adapter-ligated DNA libraries were analyzed by the Ion Proton™ System (Life Technologies, Grand Island, NY, USA) with an average 0.3× sequencing coverage per nucleotide. Average total raw reads per sample was 6.5 million and mean rate of uniquely mapped reads was 59.0%. The results of this study were derived from BWA mapping. Z-score was used for fetal trisomy 18 and 21 detection. Interactive dot diagrams showed the minimal z-score values to discriminate negative versus positive cases of fetal trisomy 18 and 21. For fetal trisomy 18, the minimal z-score value of 2.459 showed 100% positive predictive and negative predictive values. The minimal z-score of 2.566 was used to classify negative versus positive cases of fetal trisomy 21. These results provide the evidence that fetal trisomy 18 and 21 detection can be performed with semiconductor sequencer. Our data also suggest that a prospective study should be performed with a larger cohort of clinically diverse obstetrics patients.

  13. Evaluation of exome variants using the Ion Proton Platform to sequence error-prone regions

    PubMed Central

    Min, Byung Joo; Seo, Myung Eui; Kim, Ju Han

    2017-01-01

    The Ion Proton sequencer from Thermo Fisher accurately determines sequence variants from target regions with a rapid turnaround time at a low cost. However, misleading variant-calling errors can occur. We performed a systematic evaluation and manual curation of read-level alignments for the 675 ultrarare variants reported by the Ion Proton sequencer from 27 whole-exome sequencing data sets but not present in either the 1000 Genomes Project or the Exome Aggregation Consortium. We classified positive variant calls into 393 highly likely false positives, 126 likely false positives, and 156 likely true positives, which comprised 58.2%, 18.7%, and 23.1% of the variants, respectively. We identified four distinct error patterns of variant calling that may be bioinformatically corrected when using different strategies: simplicity region, SNV cluster, peripheral sequence read, and base inversion. Local de novo assembly successfully corrected 201 (38.7%) of the 519 highly likely or likely false positives. We also demonstrate that the two sequencing kits from Thermo Fisher (the Ion PI Sequencing 200 kit V3 and the Ion PI Hi-Q kit) exhibit different error profiles across different error types. A refined calling algorithm with better polymerase may improve the performance of the Ion Proton sequencing platform. PMID:28742110
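
    The proportions quoted above follow directly from the reported counts, as this short Python check shows. The counts are taken from the abstract; the variable and function names are chosen here for illustration.

```python
# Counts reported in the abstract for the 675 curated ultrarare variant calls.
counts = {
    "highly likely false positive": 393,
    "likely false positive": 126,
    "likely true positive": 156,
}
total = sum(counts.values())

def percent(part, whole):
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * part / whole, 1)

shares = {label: percent(n, total) for label, n in counts.items()}
false_positives = counts["highly likely false positive"] + counts["likely false positive"]
corrected_share = percent(201, false_positives)  # calls fixed by local de novo assembly

print(total, shares, false_positives, corrected_share)
```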

  14. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data.

    PubMed

    Patel, Ravi K; Jain, Mukesh

    2012-01-01

    Next generation sequencing (NGS) technologies provide a high-throughput means to generate a large amount of sequence data. However, quality control (QC) of sequence data generated from these technologies is extremely important for meaningful downstream analysis. Further, highly efficient and fast processing tools are required to handle the large volume of datasets. Here, we have developed an application, NGS QC Toolkit, for quality check and filtering of high-quality data. This toolkit is a standalone and open source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application have been implemented in the Perl programming language. The toolkit is comprised of user-friendly tools for QC of sequencing data generated using Roche 454 and Illumina platforms, and additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options have been provided to facilitate the QC at user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data to facilitate better downstream analysis.
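
    To make the idea of quality filtering at user-defined parameters concrete, here is a minimal Python sketch of one common kind of read filter. It is not the toolkit's Perl implementation and does not reproduce its options: the Phred+33 quality encoding, the function names, and both thresholds are assumptions chosen for illustration.

```python
def phred33(quality):
    """Decode a Phred+33 encoded quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]

def passes_qc(quality, min_quality=20, min_fraction=0.7):
    """Keep a read if at least min_fraction of its bases score min_quality or higher."""
    scores = phred33(quality)
    good = sum(score >= min_quality for score in scores)
    return bool(scores) and good / len(scores) >= min_fraction

# Hypothetical quality strings: 'I' decodes to 40, '!' decodes to 0.
for quality in ("IIIIIIIIII", "!!!!!!!!II"):
    print(quality, passes_qc(quality))
```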

  15. NGS QC Toolkit: A Toolkit for Quality Control of Next Generation Sequencing Data

    PubMed Central

    Patel, Ravi K.; Jain, Mukesh

    2012-01-01

    Next generation sequencing (NGS) technologies provide a high-throughput means to generate a large amount of sequence data. However, quality control (QC) of sequence data generated from these technologies is extremely important for meaningful downstream analysis. Further, highly efficient and fast processing tools are required to handle the large volume of datasets. Here, we have developed an application, NGS QC Toolkit, for quality check and filtering of high-quality data. This toolkit is a standalone and open source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application have been implemented in the Perl programming language. The toolkit is comprised of user-friendly tools for QC of sequencing data generated using Roche 454 and Illumina platforms, and additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options have been provided to facilitate the QC at user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data to facilitate better downstream analysis. PMID:22312429

  16. Strategy for microbiome analysis using 16S rRNA gene sequence analysis on the Illumina sequencing platform.

    PubMed

    Ram, Jeffrey L; Karim, Aos S; Sendler, Edward D; Kato, Ikuko

    2011-06-01

    Understanding the identity and changes of organisms in the urogenital and other microbiomes of the human body may be key to discovering causes and new treatments of many ailments, such as vaginosis. High-throughput sequencing technologies have recently enabled discovery of the great diversity of the human microbiome. The cost per base of many of these sequencing platforms remains high (thousands of dollars per sample); however, the Illumina Genome Analyzer (IGA) is estimated to have a cost per base less than one-fifth of its nearest competitor. The main disadvantage of the IGA for sequencing PCR-amplified 16S rRNA genes is that the maximum read length of the IGA is only 100 bases, whereas at least 300 bases are needed to obtain phylogenetically informative data down to the genus and species level. In this paper we describe and conduct a pilot test of a multiplex sequencing strategy suitable for achieving total reads of > 300 bases per extracted DNA molecule on the IGA. Results show that all proposed primers produce products of the expected size and that correct sequences can be obtained with all proposed forward primers. Various bioinformatic optimizations of the Illumina Bustard analysis pipeline proved necessary to extract the correct sequence from IGA image data, and these modifications of the data files indicate that further optimization of the analysis pipeline may improve the quality rankings of the data and enable more sequence to be correctly analyzed. The successful application of this method could result in an unprecedentedly deep description (800,000 taxonomic identifications per sample) of the urogenital and other microbiomes in a large number of samples at a reasonable cost per sample.

  17. The Generation Challenge Programme Platform: Semantic Standards and Workbench for Crop Science

    PubMed Central

    Bruskiewich, Richard; Senger, Martin; Davenport, Guy; Ruiz, Manuel; Rouard, Mathieu; Hazekamp, Tom; Takeya, Masaru; Doi, Koji; Satoh, Kouji; Costa, Marcos; Simon, Reinhard; Balaji, Jayashree; Akintunde, Akinnola; Mauleon, Ramil; Wanchana, Samart; Shah, Trushar; Anacleto, Mylah; Portugal, Arllet; Ulat, Victor Jun; Thongjuea, Supat; Braak, Kyle; Ritter, Sebastian; Dereeper, Alexis; Skofic, Milko; Rojas, Edwin; Martins, Natalia; Pappas, Georgios; Alamban, Ryan; Almodiel, Roque; Barboza, Lord Hendrix; Detras, Jeffrey; Manansala, Kevin; Mendoza, Michael Jonathan; Morales, Jeffrey; Peralta, Barry; Valerio, Rowena; Zhang, Yi; Gregorio, Sergio; Hermocilla, Joseph; Echavez, Michael; Yap, Jan Michael; Farmer, Andrew; Schiltz, Gary; Lee, Jennifer; Casstevens, Terry; Jaiswal, Pankaj; Meintjes, Ayton; Wilkinson, Mark; Good, Benjamin; Wagner, James; Morris, Jane; Marshall, David; Collins, Anthony; Kikuchi, Shoshi; Metz, Thomas; McLaren, Graham; van Hintum, Theo

    2008-01-01

    The Generation Challenge programme (GCP) is a global crop research consortium directed toward crop improvement through the application of comparative biology and genetic resources characterization to plant breeding. A key consortium research activity is the development of a GCP crop bioinformatics platform to support GCP research. This platform includes the following: (i) shared, public platform-independent domain models, ontology, and data formats to enable interoperability of data and analysis flows within the platform; (ii) web service and registry technologies to identify, share, and integrate information across diverse, globally dispersed data sources, as well as to access high-performance computational (HPC) facilities for computationally intensive, high-throughput analyses of project data; (iii) platform-specific middleware reference implementations of the domain model integrating a suite of public (largely open-access/-source) databases and software tools into a workbench to facilitate biodiversity analysis, comparative analysis of crop genomic data, and plant breeding decision making. PMID:18483570

  18. Phylogenetic properties of 50 nuclear loci in Medicago (Leguminosae) generated using multiplexed sequence capture and next-generation sequencing.

    PubMed

    de Sousa, Filipe; Bertrand, Yann J K; Nylinder, Stephan; Oxelman, Bengt; Eriksson, Jonna S; Pfeil, Bernard E

    2014-01-01

    Next-generation sequencing technology has increased the capacity to generate molecular data for plant biological research, including phylogenetics, and can potentially contribute to resolving complex phylogenetic problems. The evolutionary history of Medicago L. (Leguminosae: Trifolieae) remains unresolved due to incongruence between published phylogenies. Identification of the processes causing this genealogical incongruence is essential for the inference of a correct species phylogeny of the genus and requires that more molecular data, preferably from low-copy nuclear genes, are obtained across different species. Here we report the development of 50 novel LCN markers in Medicago and assess the phylogenetic properties of each marker. We used the genomic resources available for Medicago truncatula Gaertn., hybridisation-based gene enrichment (sequence capture) techniques and Next-Generation Sequencing to generate sequences. This alternative proves to be a cost-effective approach to amplicon sequencing in phylogenetic studies at the genus or tribe level and allows for an increase in number and size of targeted loci. Substitution rate estimates for each of the 50 loci are provided, and an overview of the variation in substitution rates among a large number of low-copy nuclear genes in plants is presented for the first time. Aligned sequences of major species lineages of Medicago and its sister genus are made available and can be used in further probe development for sequence-capture of the same markers.

  19. Phylogenetic Properties of 50 Nuclear Loci in Medicago (Leguminosae) Generated Using Multiplexed Sequence Capture and Next-Generation Sequencing

    PubMed Central

    de Sousa, Filipe; Bertrand, Yann J. K.; Nylinder, Stephan; Oxelman, Bengt; Eriksson, Jonna S.; Pfeil, Bernard E.

    2014-01-01

    Next-generation sequencing technology has increased the capacity to generate molecular data for plant biological research, including phylogenetics, and can potentially contribute to resolving complex phylogenetic problems. The evolutionary history of Medicago L. (Leguminosae: Trifolieae) remains unresolved due to incongruence between published phylogenies. Identification of the processes causing this genealogical incongruence is essential for the inference of a correct species phylogeny of the genus and requires that more molecular data, preferably from low-copy nuclear genes, are obtained across different species. Here we report the development of 50 novel LCN markers in Medicago and assess the phylogenetic properties of each marker. We used the genomic resources available for Medicago truncatula Gaertn., hybridisation-based gene enrichment (sequence capture) techniques and Next-Generation Sequencing to generate sequences. This alternative proves to be a cost-effective approach to amplicon sequencing in phylogenetic studies at the genus or tribe level and allows for an increase in number and size of targeted loci. Substitution rate estimates for each of the 50 loci are provided, and an overview of the variation in substitution rates among a large number of low-copy nuclear genes in plants is presented for the first time. Aligned sequences of major species lineages of Medicago and its sister genus are made available and can be used in further probe development for sequence-capture of the same markers. PMID:25329401

  20. Third Generation Sequencing Techniques and Applications to Drug Discovery

    PubMed Central

    Ozsolak, Fatih

    2012-01-01

    Introduction: There is an immediate need for functional and molecular studies to decipher differences between disease and “normal” settings to identify large quantities of validated targets with the highest therapeutic utilities. Furthermore, drug mechanism of action and biomarkers to predict drug efficacy and safety need to be identified for effective design of clinical trials, decreasing attrition rates, regulatory agency approval process and drug repositioning. By expanding the power of genetics and pharmacogenetics studies, next generation nucleic acid sequencing technologies have started to play an important role in all stages of drug discovery. Areas covered: This article reviews the first and second generation sequencing technologies (SGSTs) and challenges they pose to biomedicine. The article then focuses on the emerging third generation sequencing technologies (TGSTs), their technological foundations and potential contributions to drug discovery. Expert opinion: Despite the scientific and commercial success of SGSTs, the goal of rapid, comprehensive and unbiased sequencing of nucleic acids has not been achieved. TGSTs promise to increase sequencing throughput and read lengths, decrease costs, run times and error rates, eliminate biases inherent in SGSTs, and offer capabilities beyond nucleic acid sequencing. Such changes will have positive impact in all sequencing applications to drug discovery. PMID:22468954

  1. Neural Sequence Generation Using Spatiotemporal Patterns of Inhibition

    PubMed Central

    Cannon, Jonathan; Kopell, Nancy; Gardner, Timothy; Markowitz, Jeffrey

    2015-01-01

    Stereotyped sequences of neural activity are thought to underlie reproducible behaviors and cognitive processes ranging from memory recall to arm movement. One of the most prominent theoretical models of neural sequence generation is the synfire chain, in which pulses of synchronized spiking activity propagate robustly along a chain of cells connected by highly redundant feedforward excitation. But recent experimental observations in the avian song production pathway during song generation have shown excitatory activity interacting strongly with the firing patterns of inhibitory neurons, suggesting a process of sequence generation more complex than feedforward excitation. Here we propose a model of sequence generation inspired by these observations in which a pulse travels along a spatially recurrent excitatory chain, passing repeatedly through zones of local feedback inhibition. In this model, synchrony and robust timing are maintained not through redundant excitatory connections, but rather through the interaction between the pulse and the spatiotemporal pattern of inhibition that it creates as it circulates the network. These results suggest that spatially and temporally structured inhibition may play a key role in sequence generation. PMID:26536029

  2. A Pulse Generator Based on an Arduino Platform for Ultrasonic Applications

    NASA Astrophysics Data System (ADS)

    Acevedo, Pedro; Vázquez, Mónica; Durán, Joel; Petrearce, Rodolfo

    The objective of this work is to use the Arduino platform as an ultrasonic pulse generator to excite PVDF ultrasonic arrays in transmission. An experimental setup was implemented using a through-transmission configuration to evaluate the performance of the generator.

  3. A Robust High Throughput Platform to Generate Functional Recombinant Monoclonal Antibodies Using Rabbit B Cells from Peripheral Blood

    PubMed Central

    Seeber, Stefan; Ros, Francesca; Thorey, Irmgard; Tiefenthaler, Georg; Kaluza, Klaus; Lifke, Valeria; Fischer, Jens André Alexander; Klostermann, Stefan; Endl, Josef; Kopetzki, Erhard; Pashine, Achal; Siewe, Basile; Kaluza, Brigitte; Platzer, Josef; Offner, Sonja

    2014-01-01

    We have developed a robust platform to generate and functionally characterize rabbit-derived antibodies using B cells from peripheral blood. The rapid high throughput procedure generates a diverse set of antibodies, yet requires only a few animals to be immunized without the need to sacrifice them. The workflow includes (i) the identification and isolation of single B cells from rabbit blood expressing IgG antibodies, (ii) an elaborate short term B-cell cultivation to produce sufficient monoclonal antigen specific IgG for comprehensive phenotype screens, (iii) the isolation of VH and VL coding regions via PCR from B-cell clones producing antigen specific and functional antibodies followed by the sequence determination, and (iv) the recombinant expression and purification of IgG antibodies. The fully integrated and to a large degree automated platform (demonstrated in this paper using IL1RL1 immunized rabbits) yielded clonal and very diverse IL1RL1-specific and functional IL1RL1-inhibiting rabbit antibodies. These functional IgGs from individual animals were obtained within a short time after immunization and could be identified already during primary screening, thus substantially lowering the workload for the subsequent B-cell PCR workflow. Early availability of sequence information permits early selection of function- and sequence-diverse antibodies for further characterization. In summary, this powerful technology platform has proven to be an efficient and robust method for the rapid generation of antigen specific and functional monoclonal rabbit antibodies without sacrificing the immunized animal. PMID:24503933

  4. Microbial Contamination in Next Generation Sequencing: Implications for Sequence-Based Analysis of Clinical Samples

    PubMed Central

    Strong, Michael J.; Xu, Guorong; Morici, Lisa; Splinter Bon-Durant, Sandra; Baddoo, Melody; Lin, Zhen; Fewell, Claire; Taylor, Christopher M.; Flemington, Erik K.

    2014-01-01

    The high level of accuracy and sensitivity of next generation sequencing for quantifying genetic material across organismal boundaries gives it tremendous potential for pathogen discovery and diagnosis in human disease. Despite this promise, substantial bacterial contamination is routinely found in existing human-derived RNA-seq datasets that likely arises from environmental sources. This raises the need for stringent sequencing and analysis protocols for studies investigating sequence-based microbial signatures in clinical samples. PMID:25412476

  5. NeSSM: a Next-generation Sequencing Simulator for Metagenomics.

    PubMed

    Jia, Ben; Xuan, Liming; Cai, Kaiye; Hu, Zhiqiang; Ma, Liangxiao; Wei, Chaochun

    2013-01-01

    Metagenomics can reveal the vast majority of microbes that have been missed by traditional cultivation-based methods. Due to its extremely wide range of application areas, fast metagenome sequencing simulation systems with high fidelity are in great demand to facilitate the development and comparison of metagenomics analysis tools. We present here a customizable metagenome simulation system: NeSSM (Next-generation Sequencing Simulator for Metagenomics). Combining complete genomes currently available, a community composition table, and sequencing parameters, it can simulate metagenome sequencing better than existing systems. Sequencing error models based on the explicit distribution of errors at each base and sequencing coverage bias are incorporated in the simulation. In order to improve the fidelity of simulation, tools are provided by NeSSM to estimate the sequencing error models, sequencing coverage bias and the community composition directly from existing metagenome sequencing data. Currently, NeSSM supports single-end and paired-end sequencing for both 454 and Illumina platforms. In addition, a GPU (graphics processing unit) version of NeSSM has also been developed to accelerate the simulation. By comparing the simulated sequencing data from NeSSM with experimental metagenome sequencing data, we have demonstrated that NeSSM performs better in many aspects than existing popular metagenome simulators, such as MetaSim, GemSIM and Grinder. The GPU version of NeSSM is more than one order of magnitude faster than MetaSim. NeSSM is a fast simulation system for high-throughput metagenome sequencing. It can be helpful to develop tools and evaluate strategies for metagenomics analysis, and it is freely available for academic users at http://cbb.sjtu.edu.cn/~ccwei/pub/software/NeSSM.php.

  6. Comparison of Two Massively Parallel Sequencing Platforms using 83 Single Nucleotide Polymorphisms for Human Identification.

    PubMed

    Apaga, Dame Loveliness T; Dennis, Sheila E; Salvador, Jazelyn M; Calacal, Gayvelline C; De Ungria, Maria Corazon A

    2017-03-24

    The potential of Massively Parallel Sequencing (MPS) technology to vastly expand the capabilities of human identification led to the emergence of different MPS platforms that use forensically relevant genetic markers. Two of the MPS platforms that are currently available are the MiSeq(®) FGx™ Forensic Genomics System (Illumina) and the HID-Ion Personal Genome Machine (PGM)™ (Thermo Fisher Scientific). These are coupled with the ForenSeq™ DNA Signature Prep kit (Illumina) and the HID-Ion AmpliSeq™ Identity Panel (Thermo Fisher Scientific), respectively. In this study, we compared the genotyping performance of the two MPS systems based on 83 SNP markers that are present in both MPS marker panels. Results show that MiSeq(®) FGx™ has greater sample-to-sample variation than the HID-Ion PGM™ in terms of read counts for all the 83 SNP markers. Allele coverage ratio (ACR) values show generally balanced heterozygous reads for both platforms. Two and four SNP markers from the MiSeq(®) FGx™ and HID-Ion PGM™, respectively, have average ACR values lower than the recommended value of 0.67. Comparison of genotype calls showed 99.7% concordance between the two platforms.
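
    The allele coverage ratio mentioned above can be illustrated with a minimal sketch. It assumes the common definition of ACR at a heterozygous SNP (read count of the less-covered allele divided by that of the more-covered allele); the abstract does not spell out the formula, and the marker names and read counts below are hypothetical.

```python
# Minimal sketch: allele coverage ratio (ACR) at heterozygous SNPs.
# Assumption: ACR = lower allele read count / higher allele read count.

ACR_THRESHOLD = 0.67  # recommended value quoted in the abstract


def allele_coverage_ratio(reads_a: int, reads_b: int) -> float:
    """Return the ACR for one heterozygous marker."""
    if reads_a <= 0 or reads_b <= 0:
        raise ValueError("a heterozygous call needs reads for both alleles")
    return min(reads_a, reads_b) / max(reads_a, reads_b)


def flag_imbalanced(counts, threshold=ACR_THRESHOLD):
    """Return the sorted marker names whose ACR falls below the threshold."""
    return sorted(
        marker
        for marker, (a, b) in counts.items()
        if allele_coverage_ratio(a, b) < threshold
    )


# Hypothetical read counts per marker: (allele 1 reads, allele 2 reads).
example_counts = {"snp_01": (480, 520), "snp_02": (300, 700)}
imbalanced = flag_imbalanced(example_counts)  # only snp_02 is below 0.67
```

    In a real comparison the ratio would be averaged per marker across samples before being compared with the threshold, as the abstract reports average ACR values.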

  7. Next-Generation Sequencing: From Understanding Biology to Personalized Medicine

    PubMed Central

    Frese, Karen S.; Katus, Hugo A.; Meder, Benjamin

    2013-01-01

    Within just a few years, the new methods for high-throughput next-generation sequencing have generated completely novel insights into the heritability and pathophysiology of human disease. In this review, we wish to highlight the benefits of the current state-of-the-art sequencing technologies for genetic and epigenetic research. We illustrate how these technologies help to constantly improve our understanding of genetic mechanisms in biological systems and summarize the progress made so far. This can be exemplified by the case of heritable heart muscle diseases, so-called cardiomyopathies. Here, next-generation sequencing is able to identify novel disease genes, and first clinical applications demonstrate the successful translation of this technology into personalized patient care. PMID:24832667

  8. Pittosporum cryptic virus 1: genome sequence completion using next-generation sequencing.

    PubMed

    Elbeaino, Toufic; Kubaa, Raied Abou; Tuzlali, Hasan Tuna; Digiaro, Michele

    2016-07-01

    Next-generation sequencing (NGS) was applied to dsRNAs extracted from an Italian pittosporum plant infected with pittosporum cryptic virus 1 (PiCV1). NGS allowed assembly of the full genome sequence of PiCV1, comprising dsRNA1 (1.9 kbp) and dsRNA2 (1.5 kbp), which encode the RNA-dependent RNA polymerase and capsid protein genes, respectively. Phylogenetic and sequence analyses confirmed that PiCV1 is a new member of the genus Deltapartitivirus, family Partitiviridae. From the same plant, NGS also permitted assembly of the complete genome sequence of eggplant mottled dwarf virus (EMDV), which shared 86 % to 98 % nucleotide sequence identity with complete and partial sequences (ca 6750 nt) of other known EMDV isolates with sequences available in the GenBank database.

  9. High-Throughput Next-Generation Sequencing of Polioviruses.

    PubMed

    Montmayeur, Anna M; Ng, Terry Fei Fan; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A; Oberste, M Steven; Burns, Cara C

    2017-02-01

    The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance.
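
    The two summary metrics quoted above (viral reads per total reads, and the fraction of the reference length covered) can be sketched as follows. This is an illustrative calculation only, not the authors' pipeline; the function names and all numbers are hypothetical.

```python
# Minimal sketch of two run-level metrics: sequencing efficiency and
# breadth of genome coverage. All inputs below are made-up examples.
from statistics import mean, stdev


def viral_read_fraction(viral_reads: int, total_reads: int) -> float:
    """Efficiency: share of all reads that map to the viral genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads / total_reads


def breadth_of_coverage(depth_per_position) -> float:
    """Fraction of reference positions covered by at least one read."""
    depths = list(depth_per_position)
    if not depths:
        raise ValueError("empty depth profile")
    return sum(1 for d in depths if d > 0) / len(depths)


# Hypothetical (viral reads, total reads) pairs for three samples.
runs = [(8400, 10000), (9900, 10000), (6900, 10000)]
fractions = [viral_read_fraction(v, t) for v, t in runs]
summary = (mean(fractions), stdev(fractions))  # reported as mean ± SD

# Hypothetical per-base depth over a five-base reference.
breadth = breadth_of_coverage([0, 3, 5, 0, 2])
```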

  10. Manipulating attentional load in sequence learning through random number generation.

    PubMed

    Wierzchoń, Michał; Gaillard, Vinciane; Asanowicz, Dariusz; Cleeremans, Axel

    2012-01-01

    Implicit learning is often assumed to be an effortless process. However, some artificial grammar learning and sequence learning studies using dual tasks seem to suggest that attention is essential for implicit learning to occur. This discrepancy probably results from the specific type of secondary task that is used. Different secondary tasks may engage attentional resources differently and therefore may bias performance on the primary task in different ways. Here, we used a random number generation (RNG) task, which may allow for a closer monitoring of a participant's engagement in a secondary task than the popular secondary task in sequence learning studies: tone counting (TC). In the first two experiments, we investigated the interference associated with performing RNG concurrently with a serial reaction time (SRT) task. In a third experiment, we compared the effects of RNG and TC. In all three experiments, we directly evaluated participants' knowledge of the sequence with a subsequent sequence generation task. Sequence learning was consistently observed in all experiments, but was impaired under dual-task conditions. Most importantly, our data suggest that RNG is more demanding and impairs learning to a greater extent than TC. Nevertheless, we failed to observe effects of the secondary task in subsequent sequence generation. Our studies indicate that RNG is a promising task to explore the involvement of attention in the SRT task.

  11. Manipulating attentional load in sequence learning through random number generation

    PubMed Central

    Wierzchoń, Michał; Gaillard, Vinciane; Asanowicz, Dariusz; Cleeremans, Axel

    2012-01-01

    Implicit learning is often assumed to be an effortless process. However, some artificial grammar learning and sequence learning studies using dual tasks seem to suggest that attention is essential for implicit learning to occur. This discrepancy probably results from the specific type of secondary task that is used. Different secondary tasks may engage attentional resources differently and therefore may bias performance on the primary task in different ways. Here, we used a random number generation (RNG) task, which may allow for a closer monitoring of a participant’s engagement in a secondary task than the popular secondary task in sequence learning studies: tone counting (TC). In the first two experiments, we investigated the interference associated with performing RNG concurrently with a serial reaction time (SRT) task. In a third experiment, we compared the effects of RNG and TC. In all three experiments, we directly evaluated participants’ knowledge of the sequence with a subsequent sequence generation task. Sequence learning was consistently observed in all experiments, but was impaired under dual-task conditions. Most importantly, our data suggest that RNG is more demanding and impairs learning to a greater extent than TC. Nevertheless, we failed to observe effects of the secondary task in subsequent sequence generation. Our studies indicate that RNG is a promising task to explore the involvement of attention in the SRT task. PMID:22723816

  12. Clinical Next Generation Sequencing for Precision Medicine in Cancer

    PubMed Central

    Dong, Ling; Wang, Wanheng; Li, Alvin; Kansal, Rina; Chen, Yuhan; Chen, Hong; Li, Xinmin

    2015-01-01

    Rapid adoption of next generation sequencing (NGS) in genomic medicine has been driven by low cost, high throughput sequencing and rapid advances in our understanding of the genetic bases of human diseases. Today, the NGS method has dominated sequencing space in genomic research, and quickly entered clinical practice. Because unique features of NGS perfectly meet the clinical reality (need to do more with less), the NGS technology is becoming a driving force to realize the dream of precision medicine. This article describes the strengths of NGS, NGS panels used in precision medicine, current applications of NGS in cytology, and its challenges and future directions for routine clinical use. PMID:27006629

  13. The Motif Tool Assessment Platform (MTAP) for sequence-based transcription factor binding site prediction tools.

    PubMed

    Quest, Daniel; Ali, Hesham

    2010-01-01

    Predicting transcription factor binding sites (TFBS) from sequence is one of the most challenging problems in computational biology. The development of (semi-)automated computer-assisted prediction methods is needed to find TFBS over an entire genome, which is a first step in reconstructing mechanisms that control gene activity. Bioinformatics journals continue to publish diverse methods for predicting TFBS on a monthly basis. To help practitioners in deciding which method to use to predict for a particular TFBS, we provide a platform to assess the quality and applicability of the available methods. Assessment tools allow researchers to determine how methods can be expected to perform on specific organisms or on specific transcription factor families. This chapter introduces the TFBS detection problem and reviews current strategies for evaluating algorithm effectiveness. In this chapter, a novel and robust assessment tool, the Motif Tool Assessment Platform (MTAP), is introduced and discussed.

  14. Collaborative Effort for a Centralized Worldwide Tuberculosis Relational Sequencing Data Platform

    PubMed Central

    Starks, Angela M.; Avilés, Enrique; Cirillo, Daniela M.; Denkinger, Claudia M.; Dolinger, David L.; Emerson, Claudia; Gallarda, Jim; Hanna, Debra; Kim, Peter S.; Liwski, Richard; Miotto, Paolo; Schito, Marco; Zignol, Matteo

    2015-01-01

    Continued progress in addressing challenges associated with detection and management of tuberculosis requires new diagnostic tools. These tools must be able to provide rapid and accurate information for detecting resistance to guide selection of the treatment regimen for each patient. To achieve this goal, globally representative genotypic, phenotypic, and clinical data are needed in a standardized and curated data platform. A global partnership of academic institutions, public health agencies, and nongovernmental organizations has been established to develop a tuberculosis relational sequencing data platform (ReSeqTB) that seeks to increase understanding of the genetic basis of resistance by correlating molecular data with results from drug susceptibility testing and, optimally, associated patient outcomes. These data will inform development of new diagnostics, facilitate clinical decision making, and improve surveillance for drug resistance. ReSeqTB offers an opportunity for collaboration to achieve improved patient outcomes and to advance efforts to prevent and control this devastating disease. PMID:26409275

  15. A resampling procedure for generating conditioned daily weather sequences

    USGS Publications Warehouse

    Clark, M.P.; Gangopadhyay, S.; Brandon, D.; Werner, K.; Hay, L.; Rajagopalan, B.; Yates, D.

    2004-01-01

    A method is introduced to generate conditioned daily precipitation and temperature time series at multiple stations. The method resamples data from the historical record "nens" times for the period of interest (nens = number of ensemble members) and reorders the ensemble members to reconstruct the observed spatial (intersite) and temporal correlation statistics. The weather generator model is applied to 2307 stations in the contiguous United States and is shown to reproduce the observed spatial correlation between neighboring stations, the observed correlation between variables (e.g., between precipitation and temperature), and the observed temporal correlation between subsequent days in the generated weather sequence. The weather generator model is extended to produce sequences of weather that are conditioned on climate indices (in this case the Niño 3.4 index). Example illustrations of conditioned weather sequences are provided for a station in Arizona (Petrified Forest, 34.8°N, 109.9°W), where El Niño and La Niña conditions have a strong effect on winter precipitation. The conditioned weather sequences generated using the methods described in this paper are appropriate for use as input to hydrologic models to produce multiseason forecasts of streamflow.

  16. Minimizing Next-Generation Sequencing Errors for HIV Drug Resistance Testing.

    PubMed

    Fernández-Caballero, José A; Chueca, Natalia; Poveda, Eva; García, Federico

    2017-05-23

    Next-generation sequencing prototypes for the routine diagnosis of resistance to antiretrovirals approved for the treatment of HIV infection are now being used in many clinical diagnostic laboratories. As some of the next-generation sequencing platforms may be a source of errors, it is necessary to improve the currently available protocols and implement bioinformatic tools that may help to correctly identify the presence of resistance mutations with clinical impact. Several studies have addressed these issues in recent years. Some of them are mainly focused on improving protocols for decreasing the magnitude of errors during the polymerase chain reaction. Other studies propose specific bioinformatic tools, able to reach both a 93-98% reduction of indels (insertions/deletions) and a sensitivity and specificity close to 100% in single nucleotide polymorphism variant calling. The implementation of new protocols and bioinformatic tools improving the accuracy of next-generation sequencing results must be considered for a correct analysis of HIV resistance mutations for making clinical decisions. This review summarizes the most relevant data available for the optimization of next-generation sequencing applied to HIV resistance testing.

  17. Next-generation sequencing for endocrine cancers: Recent advances and challenges.

    PubMed

    Suresh, Padmanaban S; Venkatesh, Thejaswini; Tsutsumi, Rie; Shetty, Abhishek

    2017-05-01

    Contemporary molecular biology research tools have enriched numerous areas of biomedical research that address challenging diseases, including endocrine cancers (pituitary, thyroid, parathyroid, adrenal, testicular, ovarian, and neuroendocrine cancers). These tools have placed several intriguing clues before the scientific community. Endocrine cancers pose a major challenge in health care and research despite considerable attempts by researchers to understand their etiology. Microarray analyses have provided gene signatures from many cells, tissues, and organs that can differentiate healthy states from diseased ones, and even show patterns that correlate with stages of a disease. Microarray data can also elucidate the responses of endocrine tumors to therapeutic treatments. The rapid progress in next-generation sequencing methods has overcome many of the initial challenges of these technologies, and their advantages over microarray techniques have enabled them to emerge as valuable aids for clinical research applications (prognosis, identification of drug targets, etc.). A comprehensive review describing the recent advances in next-generation sequencing methods and their application in the evaluation of endocrine and endocrine-related cancers is lacking. The main purpose of this review is to illustrate the concepts that collectively constitute our current view of the possibilities offered by next-generation sequencing technological platforms, challenges to relevant applications, and perspectives on the future of clinical genetic testing of patients with endocrine tumors. We focus on recent discoveries in the use of next-generation sequencing methods for clinical diagnosis of endocrine tumors in patients and conclude with a discussion on persisting challenges and future objectives.

  18. Performance Evaluation Tools for Next Generation Scalable Computing Platforms

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Sarukkai, Sekhar; Craw, James (Technical Monitor)

    1995-01-01

    The Federal High Performance Computing and Communications (HPCC) Program continues to focus on R&D in a wide range of high performance computing and communications technologies. Using its accomplishments in the past four years as building blocks towards a Global Information Infrastructure (GII), an Implementation Plan that identifies six Strategic Focus Areas for R&D has been proposed. This white paper argues that a new generation of system software and programming tools must be developed to support these focus areas, so that the R&D we invest today can lead to technology pay-off a decade from now. The Global Computing Infrastructure (GCI) in the Year 2000 and Beyond would consist of thousands of powerful computing nodes connected via high-speed networks across the globe. Users will be able to obtain computing and information services from the GCI with the ease of plugging a toaster into the electrical outlet on the wall anywhere in the country. Developing and managing the GCI requires performance prediction and monitoring capabilities that do not exist. Various accomplishments in this field today must be integrated and expanded to support this vision.

  19. Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data

    PubMed Central

    Sandmann, Sarah; de Graaf, Aniek O.; Karimi, Mohsen; van der Reijden, Bert A.; Hellström-Lindberg, Eva; Jansen, Joop H.; Dugas, Martin

    2017-01-01

    Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies, recommendations and thus, also in the output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, on a different platform and expert-based review. In addition we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. Influence of varying coverage and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate that there is a need to improve reproducibility of the results in the context of multithreading. PMID:28233799
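
    The sensitivity and precision trade-off described above can be made concrete with a minimal sketch. It assumes variants are compared as exact (chromosome, position, ref, alt) tuples against a validated truth set; that matching rule, the function name, and the toy variants are illustrative assumptions, not the evaluation code used in the study.

```python
# Minimal sketch: sensitivity and precision of a variant caller against a
# validated truth set. Variants are hypothetical (chrom, pos, ref, alt) tuples.

def sensitivity_precision(called, validated):
    """Return (sensitivity, precision) for two collections of variants."""
    called, validated = set(called), set(validated)
    true_pos = len(called & validated)
    false_neg = len(validated - called)   # validated variants that were missed
    false_pos = len(called - validated)   # calls that failed validation
    sensitivity = true_pos / (true_pos + false_neg) if validated else 0.0
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    return sensitivity, precision


truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "GA"),
    ("chr3", 900, "T", "C"),
}
calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "GA"),
    ("chr2", 300, "A", "T"),   # not in the truth set
    ("chr4", 10, "G", "C"),    # not in the truth set
}
sens, prec = sensitivity_precision(calls, truth)  # 3 of 4 found; 3 of 5 calls correct
```

    A caller tuned to report more low-frequency candidates would raise the first number and lower the second, which is the pattern the abstract reports.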

  20. Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data

    NASA Astrophysics Data System (ADS)

    Sandmann, Sarah; de Graaf, Aniek O.; Karimi, Mohsen; van der Reijden, Bert A.; Hellström-Lindberg, Eva; Jansen, Joop H.; Dugas, Martin

    2017-02-01

    Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies, recommendations and thus, also in the output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, on a different platform and expert-based review. In addition we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. Influence of varying coverage and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate that there is a need to improve reproducibility of the results in the context of multithreading.

  1. Future Developments of the Next Generation Manned Space Platforms (European and Russian Space Students Perspectives)

    NASA Astrophysics Data System (ADS)

    Robinson, Douglas K. R.

    2002-01-01

    The opportunities for research made available by in-orbit manned space platforms are extensive. Research topics range from space life science and biotechnology to material science and structural mechanics, and from astrophysics to the Low Earth Orbit environment, to name a few. The list is long and has been growing steadily since the launch of Salyut 1 in 1971 till the present day ISS. With the construction of the ISS now into its final phase, what is the future of such research platforms? What will the "Next Generation" space station comprise? What of manned research platforms beyond LEO, and what constraints are foreseen after ISS? This paper presents current issues concerning the conceptual design of the "Next Generation" manned space platforms, the obstacles that are predicted concerning major subsystems of such platforms and also predictions of where the foci of research will concentrate. Future developments of the next generation manned space platforms presents research by the author in both his previous academic institutions, personal opinions and the opinions of other young space research students and space professionals including Super Aero (France), Leicester University and Space Research Centre (UK) and Moscow State University (Russia). Here the author will detail the areas in which the contributors (representing the next generation space professionals) believe manned space platform architectures will be evolved, new technological developments and barriers to be overcome. In addition, new methods of spacecraft design will also be presented, referring in the main to the Space Station Design Workshop 2002 (ESTEC Concurrent Design Facility), a week-long workshop where a group of 30 young space professionals were brought together to design a conceptual space station. Future developments of the next generation manned space platforms has been composed with two aims. Firstly, to convey to both young space enthusiasts and more mature space professionals the ideas

  2. Detection of Bacillus anthracis DNA in Complex Soil and Air Samples Using Next-Generation Sequencing

    PubMed Central

    Be, Nicholas A.; Thissen, James B.; Gardner, Shea N.; McLoughlin, Kevin S.; Fofanov, Viacheslav Y.; Koshinsky, Heather; Ellingson, Sally R.; Brettin, Thomas S.; Jackson, Paul J.; Jaing, Crystal J.

    2013-01-01

    Bacillus anthracis is the potentially lethal etiologic agent of anthrax disease, and is a significant concern in the realm of biodefense. One of the cornerstones of an effective biodefense strategy is the ability to detect infectious agents with a high degree of sensitivity and specificity in the context of a complex sample background. The nature of the B. anthracis genome, however, renders specific detection difficult, due to close homology with B. cereus and B. thuringiensis. We therefore elected to determine the efficacy of next-generation sequencing analysis and microarrays for detection of B. anthracis in an environmental background. We applied next-generation sequencing to titrated genome copy numbers of B. anthracis in the presence of background nucleic acid extracted from aerosol and soil samples. We found next-generation sequencing to be capable of detecting as few as 10 genomic equivalents of B. anthracis DNA per nanogram of background nucleic acid. Detection was accomplished by mapping reads to either a defined subset of reference genomes or to the full GenBank database. Moreover, sequence data obtained from B. anthracis could be reliably distinguished from sequence data mapping to either B. cereus or B. thuringiensis. We also demonstrated the efficacy of a microbial census microarray in detecting B. anthracis in the same samples, representing a cost-effective and high-throughput approach, complementary to next-generation sequencing. Our results, in combination with the capacity of sequencing for providing insights into the genomic characteristics of complex and novel organisms, suggest that these platforms should be considered important components of a biosurveillance strategy. PMID:24039948

  3. Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.

    PubMed

    Oyola, Samuel O; Otto, Thomas D; Gu, Yong; Maslen, Gareth; Manske, Magnus; Campino, Susana; Turner, Daniel J; Macinnis, Bronwyn; Kwiatkowski, Dominic P; Swerdlow, Harold P; Quail, Michael A

    2012-01-03

    Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) has rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge to the currently available NGS platforms. The genomes of some important pathogenic organisms like Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content) display extremes of base composition. The standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage particularly across AT and GC rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low quantity starting material and tolerant to extremely high AT content sequences. We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of sequence data generated, we show that our optimized conditions, which involve a PCR additive (TMAC), produce amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC neutral templates. We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of either extreme of base composition. This development will greatly benefit sequencing clinical samples that often require amplification due to low mass of

  4. Pattern Recognition on Read Positioning in Next Generation Sequencing

    PubMed Central

    Byeon, Boseon; Kovalchuk, Igor

    2016-01-01

    The usefulness and the utility of the next generation sequencing (NGS) technology are based on the assumption that the DNA or cDNA cleavage required to generate short sequence reads is random. Several previous reports suggest the existence of sequencing bias of NGS reads. To address this question in greater detail, we analyze NGS data from four organisms with different GC content, Plasmodium falciparum (19.39%), Arabidopsis thaliana (36.03%), Homo sapiens (40.91%) and Streptomyces coelicolor (72.00%). Using machine learning techniques, we recognize the pattern that the NGS read start is positioned in the local region where the nucleotide distribution is dissimilar from the global nucleotide distribution. We also demonstrate that the mono-nucleotide distribution underestimates sequencing bias, and the recognized pattern is explained largely by the distribution of multi-nucleotides (di-, tri-, and tetra- nucleotides) rather than mono-nucleotides. This implies that the correction of sequencing bias needs to be performed on the basis of the multi-nucleotide distribution. Providing companion software to quantify the effect of the recognized pattern on read positioning, we exemplify that the bias correction based on the mono-nucleotide distribution may not be sufficient to clean sequencing bias. PMID:27299343
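
    The contrast between mono-nucleotide and multi-nucleotide distributions around read starts can be illustrated with a small sketch. This is not the authors' machine-learning method or their companion software: it simply compares k-mer frequencies in windows around read start positions with the genome-wide frequencies, using total variation distance as an assumed dissimilarity measure. The toy genome, the `flank` parameter, and the function names are hypothetical.

```python
# Minimal sketch: compare the local k-mer distribution around read starts
# with the global k-mer distribution of the reference sequence.
from collections import Counter


def kmer_freqs(seq: str, k: int) -> dict:
    """Relative frequency of every overlapping k-mer in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence shorter than k")
    return {kmer: n / total for kmer, n in counts.items()}


def total_variation(p: dict, q: dict) -> float:
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def local_vs_global(genome: str, read_starts, k: int, flank: int = 10) -> float:
    """Distance between k-mer frequencies near read starts and genome-wide."""
    local = Counter()
    for start in read_starts:
        window = genome[max(0, start - flank):start + flank]
        local.update(window[i:i + k] for i in range(len(window) - k + 1))
    total = sum(local.values())
    if total == 0:
        raise ValueError("no k-mers found around the given read starts")
    local_freqs = {kmer: n / total for kmer, n in local.items()}
    return total_variation(local_freqs, kmer_freqs(genome, k))


# Toy genome: A and C are equally frequent everywhere, but the dinucleotide
# composition differs between the two halves.
toy_genome = "AC" * 10 + "AACC" * 5
mono_bias = local_vs_global(toy_genome, [28], k=1, flank=4)
di_bias = local_vs_global(toy_genome, [28], k=2, flank=4)
```

    In this toy case the mono-nucleotide comparison reports no difference while the dinucleotide comparison does, which mirrors the abstract's point that mono-nucleotide statistics can underestimate positional bias.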

  5. Next-Generation Sequencing in the Understanding of Kaposi's Sarcoma-Associated Herpesvirus (KSHV) Biology.

    PubMed

    Strahan, Roxanne; Uppal, Timsy; Verma, Subhash C

    2016-03-31

    Non-Sanger-based novel nucleic acid sequencing techniques, referred to as Next-Generation Sequencing (NGS), provide a rapid, reliable, high-throughput, and massively parallel sequencing methodology that has improved our understanding of human cancers and cancer-related viruses. NGS has become a quintessential research tool for more effective characterization of complex viral and host genomes through its ever-expanding repertoire, which consists of whole-genome sequencing, whole-transcriptome sequencing, and whole-epigenome sequencing. These new NGS platforms provide a comprehensive and systematic genome-wide analysis of genomic sequences and a full transcriptional profile at a single nucleotide resolution. When combined, these techniques help unlock the function of novel genes and the related pathways that contribute to the overall viral pathogenesis. Ongoing research in the field of virology endeavors to identify the role of various underlying mechanisms that control the regulation of the herpesvirus biphasic lifecycle in order to discover potential therapeutic targets and treatment strategies. In this review, we have compiled the most recent findings about the application of NGS in Kaposi's sarcoma-associated herpesvirus (KSHV) biology, including identification of novel genomic features and whole-genome KSHV diversities, global gene regulatory network profiling for intricate transcriptome analyses, and surveying of epigenetic marks (DNA methylation, modified histones, and chromatin remodelers) during de novo, latent, and productive KSHV infections.

  6. Generation and Analysis of a Mouse Intestinal Metatranscriptome through Illumina Based RNA-Sequencing

    PubMed Central

    Robertson, Charles E.; Hung, Stacy S.; Markle, Janet; Canty, Angelo J.; McCoy, Kathy D.; Macpherson, Andrew J.; Poussier, Philippe; Danska, Jayne S.; Parkinson, John

    2012-01-01

    With the advent of high-throughput sequencing (HTS), the emerging science of metagenomics is transforming our understanding of the relationships of microbial communities with their environments. While metagenomics aims to catalogue the genes present in a sample, metatranscriptomics, by assessing which genes are actively expressed, can provide a mechanistic understanding of community inter-relationships. To achieve these goals, several challenges need to be addressed from sample preparation to sequence processing, statistical analysis and functional annotation. Here we use an inbred non-obese diabetic (NOD) mouse model in which germ-free animals were colonized with a defined mixture of eight commensal bacteria, to explore methods of RNA extraction and to develop a pipeline for the generation and analysis of metatranscriptomic data. Applying the Illumina HTS platform, we sequenced 12 NOD cecal samples prepared using multiple RNA-extraction protocols. The absence of a complete set of reference genomes necessitated a peptide-based search strategy. Up to 16% of sequence reads could be matched to a known bacterial gene. Phylogenetic analysis of the mapped ORFs revealed a distribution consistent with ribosomal RNA, the majority from Bacteroides or Clostridium species. To place these HTS data within a systems context, we mapped the relative abundance of corresponding Escherichia coli homologs onto metabolic and protein-protein interaction networks. These maps identified bacterial processes with components that were well-represented in the datasets. In summary, this study highlights the potential of exploiting the economy of HTS platforms for metatranscriptomics. PMID:22558305

  7. An Evolution Based Biosensor Receptor DNA Sequence Generation Algorithm

    PubMed Central

    Kim, Eungyeong; Lee, Malrey; Gatton, Thomas M.; Lee, Jaewan; Zang, Yupeng

    2010-01-01

    A biosensor is composed of a bioreceptor, an associated recognition molecule, and a signal transducer that can selectively detect target substances for analysis. DNA based biosensors utilize receptor molecules that allow hybridization with the target analyte. However, most DNA biosensor research uses oligonucleotides as the target analytes and does not address the potential problems of real samples. The identification of recognition molecules suitable for real target analyte samples is an important step towards further development of DNA biosensors. This study examines the characteristics of DNA used as bioreceptors and proposes a hybrid evolution-based DNA sequence generating algorithm, based on DNA computing, to identify suitable DNA bioreceptor recognition molecules for stable hybridization with real target substances. The Traveling Salesman Problem (TSP) approach is applied in the proposed algorithm to evaluate the safety and fitness of the generated DNA sequences. This approach improves efficiency and stability for enhanced and variable-length DNA sequence generation and allows extension to generation of variable-length DNA sequences with diverse receptor recognition requirements. PMID:22315543

  8. Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline

    PubMed Central

    2014-01-01

    Background: Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results. Results: To address these challenges, we have developed the Mercury analysis pipeline and deployed it in local hardware and the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts. Conclusions: By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples. PMID:24475911

  9. Plant virology and next generation sequencing: experiences with a Potyvirus.

    PubMed

    Kehoe, Monica A; Coutts, Brenda A; Buirchell, Bevan J; Jones, Roger A C

    2014-01-01

    Next generation sequencing is quickly emerging as the go-to tool for plant virologists when sequencing whole virus genomes and undertaking plant metagenomic studies for new virus discoveries. This study aims to compare the genomic and biological properties of Bean yellow mosaic virus (BYMV) (genus Potyvirus) isolates from Lupinus angustifolius plants with black pod syndrome (BPS), systemic necrosis or non-necrotic symptoms, and from two other plant species. When one Clover yellow vein virus (ClYVV) (genus Potyvirus) and 22 BYMV isolates were sequenced on the Illumina HiSeq2000, one new ClYVV and 23 new BYMV sequences were obtained. When the 23 new BYMV genomes were compared with 17 other BYMV genomes available on GenBank, phylogenetic analysis provided strong support for the existence of nine phylogenetic groupings. Biological studies involving seven isolates of BYMV and one of ClYVV gave no symptoms or reactions that could be used to distinguish BYMV isolates from L. angustifolius plants with black pod syndrome from other isolates. Here, we propose that the current system of nomenclature based on biological properties be replaced by numbered groups (I-IX). This is because use of whole genomes revealed that the previous phylogenetic grouping system based on partial sequences of virus genomes and original isolation hosts was unsustainable. This study also demonstrated that, where next generation sequencing is used to obtain complete plant virus genomes, consideration needs to be given to issues regarding sample preparation, adequate levels of coverage across a genome and methods of assembly. It also provided important lessons that will be helpful to other plant virologists using next generation sequencing in the future.

  10. Generation and analysis of expressed sequence tags from the ciliate protozoan parasite Ichthyophthirius multifiliis

    PubMed Central

    Abernathy, Jason W; Xu, Peng; Li, Ping; Xu, De-Hai; Kucuktas, Huseyin; Klesius, Phillip; Arias, Covadonga; Liu, Zhanjiang

    2007-01-01

    Background: The ciliate protozoan Ichthyophthirius multifiliis (Ich) is an important parasite of freshwater fish that causes 'white spot disease' leading to significant losses. A genomic resource for large-scale studies of this parasite has been lacking. To study gene expression involved in Ich pathogenesis and virulence, our goal was to generate expressed sequence tags (ESTs) for the development of a powerful microarray platform for the analysis of global gene expression in this species. Here, we initiated a project to sequence and analyze over 10,000 ESTs. Results: We sequenced 10,368 EST clones using a normalized cDNA library made from pooled samples of the trophont, tomont, and theront life-cycle stages, and generated 9,769 sequences (94.2% success rate). Post-sequencing processing led to 8,432 high quality sequences. Clustering analysis of these ESTs allowed identification of 4,706 unique sequences containing 976 contigs and 3,730 singletons. These unique sequences represent over two million base pairs (~10% of Plasmodium falciparum genome, a phylogenetically related protozoan). BLASTX searches produced 2,518 significant (E-value < 10⁻⁵) hits and further Gene Ontology (GO) analysis annotated 1,008 of these genes. The ESTs were analyzed comparatively against the genomes of the related protozoa Tetrahymena thermophila and P. falciparum, allowing putative identification of additional genes. All the EST sequences were deposited by dbEST in GenBank (GenBank: EG957858–EG966289). Gene discovery and annotations are presented and discussed. Conclusion: This set of ESTs represents a significant proportion of the Ich transcriptome, and provides a material basis for the development of microarrays useful for gene expression studies concerning Ich development, pathogenesis, and virulence. PMID:17577414
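
    The counts reported in the abstract above can be cross-checked with a few lines of arithmetic. This is only a consistency check on the abstract's own numbers; the variable names are illustrative and nothing here reproduces the authors' pipeline.

```python
# Figures taken from the abstract above.
clones_sequenced = 10_368
sequences_generated = 9_769
contigs = 976
singletons = 3_730

# Sequencing success rate as a percentage of clones attempted.
success_rate = 100 * sequences_generated / clones_sequenced
# Unique sequences are the contigs plus the unclustered singletons.
unique_sequences = contigs + singletons

print(f"success rate: {success_rate:.1f}%")
print(f"unique sequences: {unique_sequences}")
```

    Both values agree with the figures quoted in the abstract (94.2% and 4,706).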

  11. Application of next generation sequencing technology in Mendelian movement disorders.

    PubMed

    Wang, Yumin; Pan, Xuya; Xue, Dan; Li, Yuwei; Zhang, Xueying; Kuang, Biao; Zheng, Jiabo; Deng, Hao; Li, Xiaoling; Xiong, Wei; Zeng, Zhaoyang; Li, Guiyuan

    2016-02-01

    Next generation sequencing (NGS) has developed very rapidly in the last decade. Compared with Sanger sequencing, NGS has the advantages of high sensitivity and high throughput. Movement disorders are a common type of neurological disease. Although traditional linkage analysis has become a standard method to identify the pathogenic genes in diseases, it is getting difficult to find new pathogenic genes in rare Mendelian disorders, such as movement disorders, due to a lack of appropriate families with high penetrance or enough affected individuals. Thus, NGS is an ideal approach to identify the causal alleles for inherited disorders. NGS is used to identify genes in several diseases and new mutant sites in Mendelian movement disorders. This article reviewed the recent progress in NGS and the use of NGS in Mendelian movement disorders from genome sequencing and transcriptome sequencing. A perspective on how NGS could be employed in rare Mendelian disorders is also provided.

  12. Using chaos to generate variations on movement sequences

    NASA Astrophysics Data System (ADS)

    Bradley, Elizabeth; Stuart, Joshua

    1998-12-01

    We describe a method for introducing variations into predefined motion sequences using a chaotic symbol-sequence reordering technique. A progression of symbols representing the body positions in a dance piece, martial arts form, or other motion sequence is mapped onto a chaotic trajectory, establishing a symbolic dynamics that links the movement sequence and the attractor structure. A variation on the original piece is created by generating a trajectory with slightly different initial conditions, inverting the mapping, and using special corpus-based graph-theoretic interpolation schemes to smooth any abrupt transitions. Sensitive dependence guarantees that the variation is different from the original; the attractor structure and the symbolic dynamics guarantee that the two resemble one another in both aesthetic and mathematical senses.
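
    A much-simplified sketch of the idea in the abstract above follows. It is not the authors' method: it substitutes the one-dimensional logistic map for whatever attractor the paper uses, attaches each symbol to one point of a reference trajectory, and omits the interpolation step that smooths abrupt transitions. The function names and parameter values are assumptions made for illustration.

```python
def logistic_trajectory(x0: float, n: int, r: float = 4.0) -> list[float]:
    """First n points of the logistic map x -> r*x*(1-x), which is chaotic at r = 4."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def chaotic_variation(symbols: list[str], x0: float = 0.3, eps: float = 1e-3) -> list[str]:
    """Reorder `symbols` by following a trajectory started at x0 + eps."""
    n = len(symbols)
    # Mapping: symbol i is attached to the i-th point of the reference trajectory.
    reference = logistic_trajectory(x0, n)
    # Variation: same map, slightly different initial condition.
    varied = logistic_trajectory(x0 + eps, n)
    out = []
    for x in varied:
        # Inverse mapping: emit the symbol whose reference point lies closest to x.
        nearest = min(range(n), key=lambda i: abs(reference[i] - x))
        out.append(symbols[nearest])
    return out


# Hypothetical 20-step movement sequence, one letter per body position.
dance = list("ABCDEFGHIJKLMNOPQRST")
print("".join(chaotic_variation(dance)))
```

    With a small `eps` the variation typically starts out identical to the original and departs from it once the two trajectories have diverged, which is the sensitive dependence the abstract relies on. Setting `eps=0.0` returns the original sequence.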

  13. A Real-Time de novo DNA Sequencing Assembly Platform Based on an FPGA Implementation.

    PubMed

    Hu, Yuanqi; Georgiou, Pantelis

    2016-01-01

    This paper presents an FPGA based DNA comparison platform which can be run concurrently with the sensing phase of DNA sequencing and shortens the overall time needed for de novo DNA assembly. A hybrid overlap searching algorithm is applied which is scalable and can deal with incremental detection of new bases. To handle the incomplete data set which gradually increases during sequencing time, all-against-all comparisons are broken down into successive window-against-window comparison phases and executed using a novel dynamic suffix comparison algorithm combined with a partitioned dynamic programming method. The complete system has been designed to facilitate parallel processing in hardware, which allows real-time comparison and full scalability as well as a decrease in the number of computations required. A base pair comparison rate of 51.2 G/s is achieved when implemented on an FPGA with successful DNA comparison when using data sets from real genomes.
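
    The notion of searching for overlaps while bases are still arriving can be illustrated in software. The sketch below is a naive suffix-prefix overlap search recomputed after every new base. It stands in for, and is far simpler than, the dynamic suffix comparison and partitioned dynamic programming the abstract describes; the function names and the `min_len` threshold are assumptions.

```python
from collections.abc import Iterable, Iterator


def longest_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def incremental_overlaps(a_stream: Iterable[str], b: str, min_len: int = 3) -> Iterator[int]:
    """Yield the overlap of the growing read a against b after each newly sensed base."""
    a = ""
    for base in a_stream:
        a += base
        yield longest_overlap(a, b, min_len)


# Hypothetical reads: the suffix "TAC" of the first matches the prefix of the second.
print(list(incremental_overlaps("ACGTAC", "TACGGA")))
```

    Recomputing from scratch on every base is quadratic and is used here only for clarity. A practical design would update the comparison incrementally, as the abstract indicates.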

  14. Genome wide SNP discovery in flax through next generation sequencing of reduced representation libraries.

    PubMed

    Kumar, Santosh; You, Frank M; Cloutier, Sylvie

    2012-12-06

    Flax (Linum usitatissimum L.) is a significant fibre and oilseed crop. Current flax molecular markers, including isozymes, RAPDs, AFLPs and SSRs are of limited use in the construction of high density linkage maps and for association mapping applications due to factors such as low reproducibility, intense labour requirements and/or limited numbers. We report here on the use of a reduced representation library strategy combined with next generation Illumina sequencing for rapid and large scale discovery of SNPs in eight flax genotypes. SNP discovery was performed through in silico analysis of the sequencing data against the whole genome shotgun sequence assembly of flax genotype CDC Bethune. Genotyping-by-sequencing of an F6-derived recombinant inbred line population provided validation of the SNPs. Reduced representation libraries of eight flax genotypes were sequenced on the Illumina sequencing platform resulting in sequence coverage ranging from 4.33 to 15.64X (genome equivalents). Depending on the relatedness of the genotypes and the number and length of the reads, between 78% and 93% of the reads mapped onto the CDC Bethune whole genome shotgun sequence assembly. A total of 55,465 SNPs were discovered with the largest number of SNPs belonging to the genotypes with the highest mapping coverage percentage. Approximately 84% of the SNPs discovered were identified in a single genotype, 13% were shared between any two genotypes and the remaining 3% in three or more. Nearly a quarter of the SNPs were found in genic regions. A total of 4,706 out of 4,863 SNPs discovered in Macbeth were validated using genotyping-by-sequencing of 96 F6 individuals from a recombinant inbred line population derived from a cross between CDC Bethune and Macbeth, corresponding to a validation rate of 96.8%. Next generation sequencing of reduced representation libraries was successfully implemented for genome-wide SNP discovery from flax. The genotyping-by-sequencing approach proved to be

  15. Genome wide SNP discovery in flax through next generation sequencing of reduced representation libraries

    PubMed Central

    2012-01-01

    Background: Flax (Linum usitatissimum L.) is a significant fibre and oilseed crop. Current flax molecular markers, including isozymes, RAPDs, AFLPs and SSRs are of limited use in the construction of high density linkage maps and for association mapping applications due to factors such as low reproducibility, intense labour requirements and/or limited numbers. We report here on the use of a reduced representation library strategy combined with next generation Illumina sequencing for rapid and large scale discovery of SNPs in eight flax genotypes. SNP discovery was performed through in silico analysis of the sequencing data against the whole genome shotgun sequence assembly of flax genotype CDC Bethune. Genotyping-by-sequencing of an F6-derived recombinant inbred line population provided validation of the SNPs. Results: Reduced representation libraries of eight flax genotypes were sequenced on the Illumina sequencing platform resulting in sequence coverage ranging from 4.33 to 15.64X (genome equivalents). Depending on the relatedness of the genotypes and the number and length of the reads, between 78% and 93% of the reads mapped onto the CDC Bethune whole genome shotgun sequence assembly. A total of 55,465 SNPs were discovered with the largest number of SNPs belonging to the genotypes with the highest mapping coverage percentage. Approximately 84% of the SNPs discovered were identified in a single genotype, 13% were shared between any two genotypes and the remaining 3% in three or more. Nearly a quarter of the SNPs were found in genic regions. A total of 4,706 out of 4,863 SNPs discovered in Macbeth were validated using genotyping-by-sequencing of 96 F6 individuals from a recombinant inbred line population derived from a cross between CDC Bethune and Macbeth, corresponding to a validation rate of 96.8%. Conclusions: Next generation sequencing of reduced representation libraries was successfully implemented for genome-wide SNP discovery from flax. The genotyping-by-sequencing

  16. Development of microsatellite markers for the Korean Mussel, Mytilus coruscus (Mytilidae) using next-generation sequencing.

    PubMed

    An, Hye Suck; Lee, Jang Wook

    2012-01-01

    Mytilus coruscus (family Mytilidae) is one of the most important marine shellfish species in Korea. During the past few decades, this species has become endangered due to the loss of habitats and overfishing. Despite this species' importance, information on its genetic background is scarce. In this study, we developed microsatellite markers for M. coruscus using next-generation sequencing. A total of 263,900 raw reads were obtained from a quarter-plate run on the 454 GS-FLX titanium platform, and 176,327 unique sequences were generated with an average length of 381 bp; 2,569 (1.45%) sequences contained a minimum of five di- to tetra-nucleotide repeat motifs. Of the 51 loci screened, 46 were amplified successfully, and 22 were polymorphic among 30 individuals, including seven with trinucleotide repeats and three with tetranucleotide repeats. All loci exhibited high genetic variability, with an average of 17.32 alleles per locus, and the mean observed and expected heterozygosities were 0.67 and 0.90, respectively. In addition, cross-amplification was tested for all 22 loci in a congeneric species, M. galloprovincialis. None of the primer pairs resulted in effective amplification, which might be due to their high mutation rates. Our work demonstrated the utility of next-generation 454 sequencing as a method for the rapid and cost-effective identification of microsatellites. The high degree of polymorphism exhibited by the 22 newly developed microsatellites will be useful in future conservation genetic studies of this species.
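
    The observed and expected heterozygosities quoted in the abstract above are standard per-locus summary statistics. The sketch below shows how they are commonly computed from diploid genotypes at one locus, using the simple form He = 1 - Σ p_i² without a small-sample correction. The abstract does not say which estimator or software the authors used, and the genotype data and function name here are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: one (allele, allele) pair per diploid individual.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(a != b for a, b in genotypes) / n
    # Expected: 1 minus the sum of squared allele frequencies.
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    return observed, expected


# Hypothetical genotypes, labelled by allele size in base pairs.
locus = [("180", "180"), ("180", "184"), ("184", "184"), ("180", "184")]
print(heterozygosity(locus))
```

    Averaging these two values over all loci gives the kind of mean figures reported in the abstract.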

  17. Large disclosing the nature of computational tools for the analysis of next generation sequencing data.

    PubMed

    Cordero, Francesca; Beccuti, Marco; Donatelli, Susanna; Calogero, Raffaele A

    2012-01-01

    Next-generation sequencing (NGS) technologies are rapidly changing the approach to complex genomic studies, opening the way to personalized drug development and personalized medicine. NGS technologies are characterized by a massive throughput for relatively short sequences (30-100), and they are currently the most reliable and accurate method for grouping individuals on the basis of their genetic profiles. The first and crucial step in sequence analysis is the conversion of millions of short sequences (reads) into valuable genetic information by their mapping to a known (reference) genome. New computational methods, specifically designed for the type and the amount of data generated by NGS technologies, are replacing earlier widespread genome alignment algorithms, which are unable to cope with such massive amounts of data. This review provides an overview of the bioinformatics techniques that have been developed for the mapping of NGS data onto a reference genome, with a special focus on polymorphism rate and sequence error detection. The different techniques have been tested on an appropriately defined dataset, to investigate their relative computational costs and usability, as seen from a user perspective. Since NGS platforms interrogate the genome using either the conventional nucleotide space or the more recent color space, this review considers techniques in both nucleotide and color space, emphasizing similarities and differences.

  18. Software updates in the Illumina HiSeq platform affect whole-genome bisulfite sequencing.

    PubMed

    Toh, Hidehiro; Shirane, Kenjiro; Miura, Fumihito; Kubo, Naoki; Ichiyanagi, Kenji; Hayashi, Katsuhiko; Saitou, Mitinori; Suyama, Mikita; Ito, Takashi; Sasaki, Hiroyuki

    2017-01-05

    Methylation of cytosine in genomic DNA is a well-characterized epigenetic modification involved in many cellular processes and diseases. Whole-genome bisulfite sequencing (WGBS), such as MethylC-seq and post-bisulfite adaptor tagging sequencing (PBAT-seq), uses the power of high-throughput DNA sequencers and provides genome-wide DNA methylation profiles at single-base resolution. However, the accuracy and consistency of WGBS outputs in relation to the operating conditions of high-throughput sequencers have not been explored. We have used the Illumina HiSeq platform for our PBAT-based WGBS, and found that different versions of HiSeq Control Software (HCS) and Real-Time Analysis (RTA) installed on the system provided different global CpG methylation levels (approximately 5% overall difference) for the same libraries. This problem was reproduced multiple times with different WGBS libraries and is likely to be associated with the low sequence diversity of bisulfite-converted DNA. We found that HCS was the major determinant in the observed differences. To determine which version of HCS is most suitable for WGBS, we used substrates with predetermined CpG methylation levels, and found that HCS v2.0.5 is the best among the examined versions. HCS v2.0.12 showed the poorest performance and provided artificially lower CpG methylation levels when 5-methylcytosine is read as guanine (first read of PBAT-seq and second read of MethylC-seq). In addition, paired-end sequencing of low diversity libraries using HCS v2.2.38 or the latest HCS v2.2.58 was greatly affected by cluster densities. Software updates in the Illumina HiSeq platform can affect the outputs from low-diversity sequencing libraries such as WGBS libraries. More recent versions are not necessarily better, and HCS v2.0.5 is currently the best for WGBS among the examined HCS versions. Thus, together with other experimental conditions, special care has to be taken on this point when CpG methylation levels are to be

  19. Temporally consistent virtual camera generation from stereo image sequences

    NASA Astrophysics Data System (ADS)

    Fox, Simon R.; Flack, Julien; Shao, Juliang; Harman, Phil

    2004-05-01

    The recent emergence of auto-stereoscopic 3D viewing technologies has increased demand for the creation of 3D video content. A range of glasses-free multi-viewer screens have been developed that require as many as 9 views generated for each frame of video. This presents difficulties in both view generation and transmission bandwidth. This paper examines the use of stereo video capture as a means to generate multiple scene views via disparity analysis. A machine learning approach is applied to learn relationships between disparity generated depth information and source footage, and to generate depth information in a temporally smooth manner for both left and right eye image sequences. A view morphing approach to multiple view rendering is described which provides an excellent 3D effect on a range of glasses-free displays, while providing robustness to inaccurate stereo disparity calculations.

  20. Repetitive reef to ooid sequences near leeward margin of Caicos Platform, British West Indies

    SciTech Connect

    Waltz, M.; Rossinsky, V.; Wanless, H.R.

    1987-05-01

    Drill core transects and outcrops near the leeward margin of the Caicos Platform, BWI, reveal repetitive (one Holocene and two Pleistocene) shallowing-upward sequences of either (a) reefal boundstones overlain by layered oolitic grainstones or (b) burrowed oolitic grainstones overlain by layered oolitic grainstones. Each sediment sequence is separated from the other by a calcrete exposure surface. A transect, perpendicular to the trend of an exposed Pleistocene barrier reef/ooid sand complex, shows two separate sediment packages of reefal boundstones and reef-derived skeletal packstones overlain by layered oolitic grainstones. The well-exposed upper package consists of a shallowing-upward barrier reef, which is immediately overlain by burrowed and cross-bedded oolitic grainstones, beach rock blocks, and coral rubble, capped by layered oolitic grainstones. Separated by an exposure horizon, the lowermost package consists of coral and skeletal sands overlain by layered oolitic grainstones. Cores from a transect in a non-reefal setting north of the barrier reef complex reveal highly burrowed oolitic grainstones capped by layered oolitic grainstones. As a Holocene example, immediately offshore of this transect, modern reefs and bioturbated oolitic grainstones are presently being buried beneath coral rubble, beach rock blocks, and prograding oolitic beaches. Deposition of the capping layered oolitic grainstones appears to occur during stable and falling sea levels. This co-occurrence of reefal sediment and ooid sands suggests that the two are not mutually exclusive and that reef-ooid succession is a reoccurring part of leeward margin platform margin-building.

  1. In Silico Proficiency Testing for Clinical Next-Generation Sequencing.

    PubMed

    Duncavage, Eric J; Abel, Haley J; Pfeifer, John D

    2017-01-01

    Quality assurance for clinical next-generation sequencing (NGS)-based assays is difficult given the complex methods and the range of sequence variants such assays can detect. As the number and range of mutations detected by clinical NGS assays has increased, it is difficult to apply standard analyte-specific proficiency testing (PT). Most current proficiency testing challenges for NGS are methods-based PT surveys that use DNA from reference samples engineered to harbor specific mutations that test both sequence generation and bioinformatics analysis. These methods-based PTs are limited by the number and types of mutations that can be physically introduced into a single DNA sample. In silico proficiency testing, which evaluates only the bioinformatics component of NGS assays, is a recently introduced PT method that allows for evaluation of numerous mutations spanning a range of variant classes. In silico PT data sets can be generated from simulated or actual sequencing data and are used to test alignment through variant detection and annotation steps. In silico PT has several advantages over the use of physical samples, including greater flexibility in tested variants, the ability to design laboratory-specific challenges, and lower costs. Herein, we review the use of in silico PT as an alternative to traditional methods-based PT as it is evolving in oncology applications and discuss how the approach is applicable more broadly.

  2. Next generation sequencing in sporadic retinoblastoma patients reveals somatic mosaicism.

    PubMed

    Amitrano, Sara; Marozza, Annabella; Somma, Serena; Imperatore, Valentina; Hadjistilianou, Theodora; De Francesco, Sonia; Toti, Paolo; Galimberti, Daniela; Meloni, Ilaria; Cetta, Francesco; Piu, Pietro; Di Marco, Chiara; Dosa, Laura; Lo Rizzo, Caterina; Carignani, Giulia; Mencarelli, Maria Antonietta; Mari, Francesca; Renieri, Alessandra; Ariani, Francesca

    2015-11-01

    In about 50% of sporadic cases of retinoblastoma, no constitutive RB1 mutations are detected by conventional methods. However, recent research suggests that, at least in some of these cases, there is somatic mosaicism with respect to RB1 normal and mutant alleles. The increased availability of next generation sequencing improves our ability to detect the exact percentage of patients with mosaicism. Using this technology, we re-tested a series of 40 patients with sporadic retinoblastoma: 10 of them had been previously classified as constitutional heterozygotes, whereas in 30 no RB1 mutations had been found in lymphocytes. In 3 of these 30 patients, we have now identified low-level mosaic variants, varying in frequency between 8 and 24%. In 7 out of the 10 cases previously classified as heterozygous from testing blood cells, we were able to test additional tissues (ocular tissues, urine and/or oral mucosa): in three of them, next generation sequencing has revealed mosaicism. Present results thus confirm that a significant fraction (6/40; 15%) of sporadic retinoblastoma cases are due to postzygotic events and that deep sequencing is an efficient method to unambiguously distinguish mosaics. Re-testing of retinoblastoma patients through next generation sequencing can thus provide new information that may have important implications with respect to genetic counseling and family care.
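
    The proportions in the abstract above rest on two simple calculations: the variant allele fraction at a site, and the share of the cohort found to be mosaic. A minimal sketch follows. The read counts are hypothetical, since the abstract reports only the resulting frequencies (8 to 24%), and the function name is an assumption.

```python
def variant_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


# Hypothetical depth: 120 variant reads out of 500 gives a 24% variant fraction.
print(f"{100 * variant_allele_fraction(120, 500):.0f}%")

# Cohort arithmetic from the abstract: 3 mosaics among the 30 previously negative
# patients plus 3 among the 10 previously classified as heterozygous, out of 40.
mosaic_cases = 3 + 3
cohort_size = 40
print(f"{mosaic_cases}/{cohort_size} = {100 * mosaic_cases / cohort_size:.0f}%")
```

    The second line of output reproduces the 6/40 (15%) figure given in the abstract.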

  3. Eye movement sequence generation in humans: Motor or goal updating?

    PubMed Central

    Quaia, Christian; Joiner, Wilsaan M.; FitzGibbon, Edmond J.; Optican, Lance M.; Smith, Maurice A.

    2011-01-01

    Saccadic eye movements are often grouped in pre-programmed sequences. The mechanism underlying the generation of each saccade in a sequence is currently poorly understood. Broadly speaking, two alternative schemes are possible: first, after each saccade the retinotopic location of the next target could be estimated, and an appropriate saccade could be generated. We call this the goal updating hypothesis. Alternatively, multiple motor plans could be pre-computed, and they could then be updated after each movement. We call this the motor updating hypothesis. We used McLaughlin’s intra-saccadic step paradigm to artificially create a condition under which these two hypotheses make discriminable predictions. We found that in human subjects, when sequences of two saccades are planned, the motor updating hypothesis predicts the landing position of the second saccade in two-saccade sequences much better than the goal updating hypothesis. This finding suggests that the human saccadic system is capable of executing sequences of saccades to multiple targets by planning multiple motor commands, which are then updated by serial subtraction of ongoing motor output. PMID:21191134

  4. Next-generation sequencing technologies: breaking the sound barrier of human genetics.

    PubMed

    Bahassi, El Mustapha; Stambrook, Peter J

    2014-09-01

    Demand for new technologies that deliver fast, inexpensive and accurate genome information has never been greater. This challenge has catalysed the rapid development of advances in next-generation sequencing (NGS). The generation of large volumes of sequence data and the speed of data acquisition are the primary advantages over previous, more standard methods. In 2013, the Food and Drug Administration granted marketing authorisation for the first high-throughput NG sequencer, Illumina's MiSeqDx, which allowed the development and use of a large number of new genome-based tests. Here, we present a review of template preparation, nucleic acid sequencing and imaging, genome assembly and alignment approaches as well as recent advances in current and near-term commercially available NGS instruments. We also outline the broad range of applications for NGS technologies and provide guidelines for platform selection to best address biological questions of interest. DNA sequencing has revolutionised biological and medical research, and is poised to have a similar impact on the practice of medicine. This tool is but one of an increasing arsenal of developing tools that enhance our capabilities to identify, quantify and functionally characterise the components of biological networks that keep us healthy or make us sick. Despite advances in other 'omic' technologies, DNA sequencing and analysis, in many respects, have played the leading role to date. The new technologies provide a bridge between genotype and phenotype, both in man and model organisms, and have revolutionised how risk of developing a complex human disease may be assessed. The generation of large DNA sequence data sets is producing a wealth of medically relevant information on a large number of individuals and populations that will potentially form the basis of truly individualised medical care in the future.

  5. Sequence variation of 22 autosomal STR loci detected by next generation sequencing.

    PubMed

    Gettings, Katherine Butler; Kiesler, Kevin M; Faith, Seth A; Montano, Elizabeth; Baker, Christine H; Young, Brian A; Guerrieri, Richard A; Vallone, Peter M

    2016-03-01

    Sequencing short tandem repeat (STR) loci allows for determination of repeat motif variations within the STR (or entire PCR amplicon) which cannot be ascertained by size-based PCR fragment analysis. Sanger sequencing has been used in research laboratories to further characterize STR loci, but is impractical for routine forensic use due to the laborious nature of the procedure in general and additional steps required to separate heterozygous alleles. Recent advances in library preparation methods enable high-throughput next generation sequencing (NGS) and technological improvements in sequencing chemistries now offer sufficient read lengths to encompass STR alleles. Herein, we present sequencing results from 183 DNA samples, including African American, Caucasian, and Hispanic individuals, at 22 autosomal forensic STR loci using an assay designed for NGS. The resulting dataset has been used to perform population genetic analyses of allelic diversity by length compared to sequence, and exemplifies which loci are likely to achieve the greatest gains in discrimination via sequencing. Within this data set, six loci demonstrate greater than double the number of alleles obtained by sequence compared to the number of alleles obtained by length: D12S391, D2S1338, D21S11, D8S1179, vWA, and D3S1358. As expected, repeat region sequences which had not previously been reported in forensic literature were identified.
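
    The gain from sequencing over size-based typing described above can be shown with a small counting example. This is a minimal sketch with made-up allele sequences (the repeat motifs and the function name are assumptions, not data from the study): two alleles of identical length are indistinguishable by fragment size but differ by one internal motif.

```python
def allele_counts(alleles):
    """Return (distinct alleles by length, distinct alleles by sequence)."""
    by_length = {len(seq) for seq in alleles}
    by_sequence = set(alleles)
    return len(by_length), len(by_sequence)


# Hypothetical repeat regions for one STR locus.
alleles = [
    "TCTA" * 10,                       # 10 uniform repeats
    "TCTA" * 4 + "TCTG" + "TCTA" * 5,  # also 10 repeats, one variant motif
    "TCTA" * 11,                       # 11 uniform repeats
]

n_by_length, n_by_sequence = allele_counts(alleles)
print(f"alleles by length: {n_by_length}, by sequence: {n_by_sequence}")
```

    Here the first two alleles collapse into a single length-based allele, so the locus shows two alleles by length and three by sequence.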

  6. Sequence variation of 22 autosomal STR loci detected by next generation sequencing

    PubMed Central

    Gettings, Katherine Butler; Kiesler, Kevin M.; Faith, Seth A.; Montano, Elizabeth; Baker, Christine H.; Young, Brian A.; Guerrieri, Richard A.; Vallone, Peter M.

    2016-01-01

    Sequencing short tandem repeat (STR) loci allows for determination of repeat motif variations within the STR (or entire PCR amplicon) which cannot be ascertained by size-based PCR fragment analysis. Sanger sequencing has been used in research laboratories to further characterize STR loci, but is impractical for routine forensic use due to the laborious nature of the procedure in general and additional steps required to separate heterozygous alleles. Recent advances in library preparation methods enable high-throughput next generation sequencing (NGS) and technological improvements in sequencing chemistries now offer sufficient read lengths to encompass STR alleles. Herein, we present sequencing results from 183 DNA samples, including African American, Caucasian, and Hispanic individuals, at 22 autosomal forensic STR loci using an assay designed for NGS. The resulting dataset has been used to perform population genetic analyses of allelic diversity by length compared to sequence, and exemplifies which loci are likely to achieve the greatest gains in discrimination via sequencing. Within this data set, six loci demonstrate greater than double the number of alleles obtained by sequence compared to the number of alleles obtained by length: D12S391, D2S1338, D21S11, D8S1179, vWA, and D3S1358. As expected, repeat region sequences which had not previously been reported in forensic literature were identified. PMID:26701720

  7. Automatic generation of primary sequence patterns from sets of related protein sequences.

    PubMed Central

    Smith, R F; Smith, T F

    1990-01-01

    We have developed a computer algorithm that can extract the pattern of conserved primary sequence elements common to all members of a homologous protein family. The method involves clustering the pairwise similarity scores among a set of related sequences to generate a binary dendrogram (tree). The tree is then reduced in a stepwise manner by progressively replacing the node connecting the two most similar termini by one common pattern until only a single common "root" pattern remains. A pattern is generated at a node by (i) performing a local optimal alignment on the sequence/pattern pair connected by the node with the use of an extended dynamic programming algorithm and then (ii) constructing a single common pattern from this alignment with a nested hierarchy of amino acid classes to identify the minimal inclusive amino acid class covering each paired set of elements in the alignment. Gaps within an alignment are created and/or extended using a "pay once" gap penalty rule, and gapped positions are converted into gap characters that function as 0 or 1 amino acid of any type during subsequent alignment. This method has been used to generate a library of covering patterns for homologous families in the National Biomedical Research Foundation/Protein Identification Resource protein sequence data base. We show that a covering pattern can be more diagnostic for sequence family membership than any of the individual sequences used to construct the pattern. PMID:2296575
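
    Step (ii) above, choosing the minimal inclusive amino acid class for each aligned pair, can be sketched as follows. The class hierarchy here is a toy one invented for illustration and is not the hierarchy used in the paper. The function names are likewise hypothetical, and the sketch assumes two gap-free, already aligned sequences over the 20 standard residues.

```python
# Toy nested hierarchy of amino acid classes (hypothetical, for illustration only).
CLASSES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "charged": set("DEKRH"),
    "aliphatic": set("ILV"),
    "aromatic": set("FWY"),
    "hydrophobic": set("ILVMFWY"),
    "any": set("ACDEFGHIKLMNPQRSTVWY"),
}


def covering_class(a, b):
    """Return the residue itself if both match, else the smallest class holding both."""
    if a == b:
        return a
    candidates = [
        (len(members), name)
        for name, members in CLASSES.items()
        if a in members and b in members
    ]
    return min(candidates)[1]


def common_pattern(seq1, seq2):
    """Build a covering pattern, position by position, from two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return [covering_class(a, b) for a, b in zip(seq1, seq2)]


print(common_pattern("DIKA", "EFRW"))
```

    Because every class is nested inside a larger one and "any" covers all residues, a covering class always exists for standard residues; gap handling and the dendrogram reduction described in the abstract are left out of this sketch.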

  8. Fourth Generation of Next-Generation Sequencing Technologies: Promise and Consequences.

    PubMed

    Ke, Rongqin; Mignardi, Marco; Hauling, Thomas; Nilsson, Mats

    2016-12-01

    In this review, we discuss the emergence of the fourth-generation sequencing technologies that preserve the spatial coordinates of RNA and DNA sequences with up to subcellular resolution, thus enabling back mapping of sequencing reads to the original histological context. This information is used, for example, in two current large-scale projects that aim to unravel the function of the brain. Also in cancer research, fourth-generation sequencing has the potential to revolutionize the field. Cancer Research UK has named "Mapping the molecular and cellular tumor microenvironment in order to define new targets for therapy and prognosis" one of the grand challenges in tumor biology. We discuss the advantages of sequencing nucleic acids directly in fixed cells over traditional next-generation sequencing (NGS) methods, the limitations and challenges that these new methods have to face to become broadly applicable, and the impact that the information generated by the combination of in situ sequencing and NGS methods will have in research and diagnostics.

  9. Fourth Generation of Next‐Generation Sequencing Technologies: Promise and Consequences

    PubMed Central

    Ke, Rongqin; Mignardi, Marco; Hauling, Thomas

    2016-01-01

    In this review, we discuss the emergence of the fourth‐generation sequencing technologies that preserve the spatial coordinates of RNA and DNA sequences with up to subcellular resolution, thus enabling back mapping of sequencing reads to the original histological context. This information is used, for example, in two current large‐scale projects that aim to unravel the function of the brain. Also in cancer research, fourth‐generation sequencing has the potential to revolutionize the field. Cancer Research UK has named “Mapping the molecular and cellular tumor microenvironment in order to define new targets for therapy and prognosis” one of the grand challenges in tumor biology. We discuss the advantages of sequencing nucleic acids directly in fixed cells over traditional next‐generation sequencing (NGS) methods, the limitations and challenges that these new methods have to face to become broadly applicable, and the impact that the information generated by the combination of in situ sequencing and NGS methods will have in research and diagnostics. PMID:27406789

  10. Rapid Generation and Testing of a Lassa Fever Vaccine Using VaxCelerate Platform

    DTIC Science & Technology

    2014-08-28

    that are restricted by HLA-A2. J Virol 2006;80(17):8351-61. Bredenbeek PJ, Molenkamp R, Spaan WJM. A recombinant Yellow Fever 17D vaccine ...SECURITY CLASSIFICATION OF: In this project, the VaxCelerate Consortium completed the generation and testing of a new vaccine against Lassa fever ...Dec-2013 Approved for Public Release; Distribution Unlimited Rapid Generation and Testing of a Lassa Fever Vaccine Using VaxCelerate Platform The views

  11. Third generation sequencing technologies applied to diagnostic microbiology: benefits and challenges in applications and data analysis.

    PubMed

    Lavezzo, Enrico; Barzon, Luisa; Toppo, Stefano; Palù, Giorgio

    2016-09-01

    The diagnosis of infectious diseases is among the most successful areas of application of new generation sequencing technologies. The field has seen the development of numerous experimental and analytical approaches for the detection and the fine description of pathogenic and non-pathogenic microorganisms. Without claiming to be exhaustive with respect to all applications and methods developed over the years, this review focuses on the advantages and the issues brought by the new technologies, with an eye in particular to third generation sequencing methods. Both experimental procedures and algorithmic strategies are presented, following the most relevant publications which have led to progress in our ability to detect infectious agents. Expert commentary: The technical advance brought by third generation sequencing platforms has the potential to significantly expand the range of diagnostic tools that will be available to clinicians. Nonetheless, the implementation of these technologies in clinical practice is still far from being actionable and will temporally follow the path undertaken by second generation methods, which still require the setup of standardized pipelines in both wet and dry laboratory procedures.

  12. Next-generation sequencing in schizophrenia and other neuropsychiatric disorders.

    PubMed

    Schreiber, Matthew; Dorschner, Michael; Tsuang, Debby

    2013-10-01

    Schizophrenia is a debilitating lifelong illness that lacks a cure and poses a worldwide public health burden. The disease is characterized by a heterogeneous clinical and genetic presentation that complicates research efforts to identify causative genetic variations. This review examines the potential of current findings in schizophrenia and in other related neuropsychiatric disorders for application in next-generation technologies, particularly whole-exome sequencing (WES) and whole-genome sequencing (WGS). These approaches may lead to the discovery of underlying genetic factors for schizophrenia and may thereby identify and target novel therapeutic targets for this devastating disorder.

  13. Next-Generation Technologies for Multiomics Approaches Including Interactome Sequencing

    PubMed Central

    Ohashi, Hiroyuki; Miyamoto-Sato, Etsuko

    2015-01-01

    The development of high-speed analytical techniques such as next-generation sequencing and microarrays allows high-throughput analysis of biological information at a low cost. These techniques contribute to medical and bioscience advancements and provide new avenues for scientific research. Here, we outline a variety of new innovative techniques and discuss their use in omics research (e.g., genomics, transcriptomics, metabolomics, proteomics, and interactomics). We also discuss the possible applications of these methods, including an interactome sequencing technology that we developed, in future medical and life science research. PMID:25649523

  14. DNA extraction from vegetative tissue for next-generation sequencing.

    PubMed

    Furtado, Agnelo

    2014-01-01

    The quality of extracted DNA is crucial for several applications in molecular biology. If the DNA is to be used for next-generation sequencing (NGS), then microgram quantities of good-quality DNA are required. In addition, the DNA must substantially be of high molecular weight so that it can be used for library preparation and NGS sequencing. Contaminating phenol or starch in the isolated DNA can be easily removed by filtration through kit-based cartridges. In this chapter we describe a simple two-reagent DNA extraction protocol which yields DNA of high quality and quantity that can be used for different applications including NGS.

  15. Sequencing, De novo Assembly, Functional Annotation and Analysis of Phyllanthus amarus Leaf Transcriptome Using the Illumina Platform

    PubMed Central

    Bose Mazumdar, Aparupa; Chattopadhyay, Sharmila

    2016-01-01

    Phyllanthus amarus Schum. and Thonn., a widely distributed annual medicinal herb has a long history of use in the traditional system of medicine for over 2000 years. However, the lack of genomic data for P. amarus, a non-model organism hinders research at the molecular level. In the present study, high-throughput sequencing technology has been employed to enhance better understanding of this herb and provide comprehensive genomic information for future work. Here P. amarus leaf transcriptome was sequenced using the Illumina Miseq platform. We assembled 85,927 non-redundant (nr) “unitranscript” sequences with an average length of 1548 bp, from 18,060,997 raw reads. Sequence similarity analyses and annotation of these unitranscripts were performed against databases like green plants nr protein database, Gene Ontology (GO), Clusters of Orthologous Groups (COG), PlnTFDB, KEGG databases. As a result, 69,394 GO terms, 583 enzyme codes (EC), 134 KEGG maps, and 59 Transcription Factor (TF) families were generated. Functional and comparative analyses of assembled unitranscripts were also performed with the most closely related species like Populus trichocarpa and Ricinus communis using TRAPID. KEGG analysis showed that a number of assembled unitranscripts were involved in secondary metabolites, mainly phenylpropanoid, flavonoid, terpenoids, alkaloids, and lignan biosynthetic pathways that have significant medicinal attributes. Further, Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values of the identified secondary metabolite pathway genes were determined and Reverse Transcription PCR (RT-PCR) of a few of these genes were performed to validate the de novo assembled leaf transcriptome dataset. In addition 65,273 simple sequence repeats (SSRs) were also identified. To the best of our knowledge, this is the first transcriptomic dataset of P. amarus till date. Our study provides the largest genetic resource that will lead to drug development and pave

  16. Alignment-free sequence comparison based on next-generation sequencing reads.

    PubMed

    Song, Kai; Ren, Jie; Zhai, Zhiyuan; Liu, Xuemei; Deng, Minghua; Sun, Fengzhu

    2013-02-01

    Next-generation sequencing (NGS) technologies have generated enormous amounts of shotgun read data, and assembly of the reads can be challenging, especially for organisms without template sequences. We study the power of genome comparison based on shotgun read data without assembly using three alignment-free sequence comparison statistics, $D_2$, $D_2^*$ and $D_2^s$, both theoretically and by simulations. Theoretical formulas for the power of detecting the relationship between two sequences related through a common motif model are derived. It is shown that both $D_2^*$ and $D_2^s$ outperform $D_2$ for detecting the relationship between two sequences based on NGS data. We then study the effects of length of the tuple, read length, coverage, and sequencing error on the power of $D_2^*$ and $D_2^s$. Finally, variations of these statistics, $d_2$, $d_2^*$ and $d_2^s$, respectively, are used to first cluster five mammalian species with known phylogenetic relationships, and then cluster 13 tree species whose complete genome sequences are not available using NGS shotgun reads. The clustering results using $d_2^s$ are consistent with biological knowledge for the 5 mammalian and 13 tree species, respectively. Thus, the statistic $d_2^s$ provides a powerful alignment-free comparison tool to study the relationships among different organisms based on NGS read data without assembly.
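
    To make the statistics named in the abstract above concrete, the following sketch computes word-count statistics directly from unassembled reads. It uses the definitions as they commonly appear in the alignment-free literature: D2 is the sum over k-tuples of the product of the two samples' counts, and D2S replaces raw counts with counts centred on their expectation and normalises each term. The i.i.d. background model estimated from each sample's own base frequencies, the function names and the example reads are assumptions for illustration. The paper's exact treatment of read length, coverage and strand is not reproduced here.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt

BASES = "ACGT"


def kmer_counts(reads, k):
    """Count k-tuples over all reads, skipping tuples with non-ACGT characters."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if set(word) <= set(BASES):
                counts[word] += 1
    return counts


def base_freqs(reads):
    """Estimate base frequencies from the reads (i.i.d. background assumption)."""
    counts = Counter(c for read in reads for c in read if c in BASES)
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}


def d2(reads_x, reads_y, k):
    """D2: sum over shared k-tuples of the product of their counts."""
    x, y = kmer_counts(reads_x, k), kmer_counts(reads_y, k)
    return sum(x[w] * y[w] for w in x.keys() & y.keys())


def d2s(reads_x, reads_y, k):
    """D2S: centred counts, each term normalised by sqrt(xc^2 + yc^2)."""
    x, y = kmer_counts(reads_x, k), kmer_counts(reads_y, k)
    nx, ny = sum(x.values()), sum(y.values())
    fx, fy = base_freqs(reads_x), base_freqs(reads_y)
    total = 0.0
    for letters in product(BASES, repeat=k):
        w = "".join(letters)
        xc = x[w] - nx * prod(fx[c] for c in w)  # observed minus expected
        yc = y[w] - ny * prod(fy[c] for c in w)
        denom = sqrt(xc * xc + yc * yc)
        if denom > 0:
            total += xc * yc / denom
    return total


# Hypothetical reads, for illustration only.
sample_a = ["ACGTACGT", "CGTACG"]
sample_b = ["ACGTTCGT", "GTACGA"]
print(d2(sample_a, sample_b, 2), d2s(sample_a, sample_b, 2))
```

    Enumerating all 4^k tuples keeps the sketch short but is only practical for small k; a real implementation would iterate over observed tuples and handle the remainder analytically.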

  17. Biomarker discovery by CE-MS enables sequence analysis via MS/MS with platform-independent separation.

    PubMed

    Zürbig, Petra; Renfrow, Matthew B; Schiffer, Eric; Novak, Jan; Walden, Michael; Wittke, Stefan; Just, Ingo; Pelzing, Matthias; Neusüss, Christian; Theodorescu, Dan; Root, Karen E; Ross, Mark M; Mischak, Harald

    2006-06-01

    CE-MS is a successful proteomic platform for the definition of biomarkers in different body fluids. Besides the biomarker defining experimental parameters, CE migration time and molecular weight, especially biomarker's sequence identity is an indispensable cornerstone for deeper insights into the pathophysiological pathways of diseases or for made-to-measure therapeutic drug design. Therefore, this report presents a detailed discussion of different peptide sequencing platforms consisting of high performance separation method either coupled on-line or off-line to different MS/MS devices, such as MALDI-TOF-TOF, ESI-IT, ESI-QTOF and Fourier transform ion cyclotron resonance, for sequencing indicative peptides. This comparison demonstrates the unique feature of CE-MS technology to serve as a reliable basis for the assignment of peptide sequence data obtained using different separation MS/MS methods to the biomarker defining parameters, CE migration time and molecular weight. Discovery of potential biomarkers by CE-MS enables sequence analysis via MS/MS with platform-independent sample separation. This is due to the fact that the number of basic and neutral polar amino acids of biomarkers sequences distinctly correlates with their CE-MS migration time/molecular weight coordinates. This uniqueness facilitates the independent entry of different sequencing platforms for peptide sequencing of CE-MS-defined biomarkers from highly complex mixtures.

  18. Histoimmunogenetics Markup Language 1.0: Reporting next generation sequencing-based HLA and KIR genotyping.

    PubMed

    Milius, Robert P; Heuer, Michael; Valiga, Daniel; Doroschak, Kathryn J; Kennedy, Caleb J; Bolon, Yung-Tsi; Schneider, Joel; Pollack, Jane; Kim, Hwa Ran; Cereb, Nezih; Hollenbach, Jill A; Mack, Steven J; Maiers, Martin

    2015-12-01

    We present an electronic format for exchanging data for HLA and KIR genotyping with extensions for next-generation sequencing (NGS). This format addresses NGS data exchange by refining the Histoimmunogenetics Markup Language (HML) to conform to the proposed Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines (miring.immunogenomics.org). Our refinements of HML include two major additions. First, NGS is supported by new XML structures to capture additional NGS data and metadata required to produce a genotyping result, including analysis-dependent (dynamic) and method-dependent (static) components. A full genotype, consensus sequence, and the surrounding metadata are included directly, while the raw sequence reads and platform documentation are externally referenced. Second, genotype ambiguity is fully represented by integrating Genotype List Strings, which use a hierarchical set of delimiters to represent allele and genotype ambiguity in a complete and accurate fashion. HML also continues to enable the transmission of legacy methods (e.g. site-specific oligonucleotide, sequence-specific priming, and Sequence Based Typing (SBT)), adding features such as allowing multiple group-specific sequencing primers, and fully leveraging techniques that combine multiple methods to obtain a single result, such as SBT integrated with NGS.
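
    The "hierarchical set of delimiters" used by Genotype List Strings lends itself to a short recursive parser. The sketch below assumes the delimiter set and precedence as commonly described for GL Strings ("^" between loci, "|" between ambiguous genotypes, "+" between gene copies, "~" between phased alleles, "/" between ambiguous alleles). This ordering should be checked against the GL String specification before being relied on, and the function name and example string are made up for illustration.

```python
# Assumed GL String delimiters, from lowest to highest binding strength.
DELIMITERS = ["^", "|", "+", "~", "/"]


def parse_gl_string(text, level=0):
    """Split on each delimiter in turn, producing nested lists of allele names."""
    if level == len(DELIMITERS):
        return text
    return [parse_gl_string(part, level + 1) for part in text.split(DELIMITERS[level])]


# Hypothetical example: one locus, one genotype, two gene copies,
# the first of which is ambiguous between two alleles.
example = "HLA-A*01:01/HLA-A*01:02+HLA-A*24:02"
parsed = parse_gl_string(example)
print(parsed)
```

    The nesting order of the result is loci, genotype alternatives, gene copies, phased alleles, then allele alternatives, so `parsed[0][0]` holds the gene copies of the first genotype at the first locus.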

  19. Detection of false positive mutations in BRCA gene by next generation sequencing.

    PubMed

    Suryavanshi, Moushumi; Kumar, Dushyant; Panigrahi, Manoj Kumar; Chowdhary, Meenakshi; Mehta, Anurag

    2016-11-15

    BRCA1 and BRCA2 genes are implicated in 20-25% of hereditary breast and ovarian cancers. New age sequencing platforms have revolutionized massively parallel sequencing in clinical practice by providing cost effective, rapid, and sensitive sequencing. This study critically evaluates the false positives in multiplex panels and suggests the need for careful analysis. We employed a multiplex PCR-based BRCA1 and BRCA2 community panel with the Ion Torrent PGM machine for evaluation of these mutations. Of the 41 samples analyzed for BRCA1 and BRCA2, five were found with 950_951 insA(Asn319fs) at position Chr13:32906565 and one sample with 1032_1033 insA(Asn346fs) at Chr13:32906647, both being frame-shift mutations in the BRCA2 gene. The 950_951 insA(Asn319fs) mutation is reported as a pathogenic allele in NCBI dbSNP. On examination in IGV of all these samples, both mutations appeared as an 'A' nucleotide insertion, at positions 950 and 1032 in exon 10 of the BRCA2 gene. Sanger sequencing did not confirm these insertions. Next-generation sequencing shows great promise by allowing rapid mutational analysis of multiple genes in human cancer, but our results indicate the need for careful sequence analysis to avoid false positive results.

  20. Using Next Generation RAD Sequencing to Isolate Multispecies Microsatellites for Pilosocereus (Cactaceae)

    PubMed Central

    Bonatelli, Isabel A. S.; Carstens, Bryan C.; Moraes, Evandro M.

    2015-01-01

    Microsatellite markers (also known as SSRs, Simple Sequence Repeats) are widely used in plant science and are among the most informative molecular markers for population genetic investigations, but the development of such markers presents substantial challenges. In this report, we discuss how next generation sequencing can replace the cloning, Sanger sequencing, identification of polymorphic loci, and testing cross-amplification that were previously required to develop microsatellites. We report the development of a large set of microsatellite markers for five species of the Neotropical cactus genus Pilosocereus using a restriction-site-associated DNA sequencing (RAD-seq) on a Roche 454 platform. We identified an average of 165 microsatellites per individual, with the absolute numbers across individuals proportional to the sequence reads obtained per individual. Frequency distribution of the repeat units was similar in the five species, with shorter motifs such as di- and trinucleotide being the most abundant repeats. In addition, we provide 72 microsatellites that could be potentially amplified in the sampled species and 22 polymorphic microsatellites validated in two populations of the species Pilosocereus machrisii. Although low coverage sequencing among individuals was observed for most of the loci, which we suggest to be more related to the nature of the microsatellite markers and the possible bias inserted by the restriction enzymes than to the genome size, our work demonstrates that an NGS approach is an efficient method to isolate multispecies microsatellites even in non-model organisms. PMID:26561396

  1. Using Next Generation RAD Sequencing to Isolate Multispecies Microsatellites for Pilosocereus (Cactaceae).

    PubMed

    Bonatelli, Isabel A S; Carstens, Bryan C; Moraes, Evandro M

    2015-01-01

    Microsatellite markers (also known as SSRs, Simple Sequence Repeats) are widely used in plant science and are among the most informative molecular markers for population genetic investigations, but the development of such markers presents substantial challenges. In this report, we discuss how next generation sequencing can replace the cloning, Sanger sequencing, identification of polymorphic loci, and testing cross-amplification that were previously required to develop microsatellites. We report the development of a large set of microsatellite markers for five species of the Neotropical cactus genus Pilosocereus using a restriction-site-associated DNA sequencing (RAD-seq) on a Roche 454 platform. We identified an average of 165 microsatellites per individual, with the absolute numbers across individuals proportional to the sequence reads obtained per individual. Frequency distribution of the repeat units was similar in the five species, with shorter motifs such as di- and trinucleotide being the most abundant repeats. In addition, we provide 72 microsatellites that could be potentially amplified in the sampled species and 22 polymorphic microsatellites validated in two populations of the species Pilosocereus machrisii. Although low coverage sequencing among individuals was observed for most of the loci, which we suggest to be more related to the nature of the microsatellite markers and the possible bias inserted by the restriction enzymes than to the genome size, our work demonstrates that an NGS approach is an efficient method to isolate multispecies microsatellites even in non-model organisms.

  2. Large-scale MHC class II genotyping of a wild lemur population by next generation sequencing.

    PubMed

    Huchard, Elise; Albrecht, Christina; Schliehe-Diecks, Susanne; Baniel, Alice; Roos, Christian; Kappeler, Peter M; Brameier, Markus

    2012-12-01

    The critical role of major histocompatibility complex (MHC) genes in disease resistance, along with their putative function in sexual selection, reproduction and chemical ecology, make them an important genetic system in evolutionary ecology. Studying selective pressures acting on MHC genes in the wild nevertheless requires population-wide genotyping, which has long been challenging because of their extensive polymorphism. Here, we report on large-scale genotyping of the MHC class II loci of the grey mouse lemur (Microcebus murinus) from a wild population in western Madagascar. The second exons from MHC-DRB and -DQB of 772 and 672 individuals were sequenced, respectively, using a 454 sequencing platform, generating more than 800,000 reads. Sequence analysis, through a stepwise variant validation procedure, allowed reliable typing of more than 600 individuals. The quality of our genotyping was evaluated through three independent methods, namely genotyping the same individuals by both cloning and 454 sequencing, running duplicates, and comparing parent-offspring dyads; each displaying very high accuracy. A total of 61 (including 20 new) and 60 (including 53 new) alleles were detected at DRB and DQB genes, respectively. Both loci were non-duplicated, in tight linkage disequilibrium and in Hardy-Weinberg equilibrium, despite the fact that sequence analysis revealed clear evidence of historical selection. Our results highlight the potential of 454 sequencing technology in attempts to investigate patterns of selection shaping MHC variation in contemporary populations. The power of this approach will nevertheless be conditional upon strict quality control of the genotyping data.

  3. SeqHound: biological sequence and structure database as a platform for bioinformatics research

    PubMed Central

    2002-01-01

    Background: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results: SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions: The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit. PMID:12401134

  4. The 2013 seismic sequence close to gas injection platform of the Castor project, offshore Spain

    NASA Astrophysics Data System (ADS)

    Cesca, Simone; Grigoli, Francesco; Heimann, Sebastian; Gonzalez, Alvaro; Buforn, Elisa; Maghsoudi, Samira; Blanch, Estefania; Dahm, Torsten

    2014-05-01

    A spatially localized seismic sequence originated a few tens of kilometres off the Mediterranean coast of Spain, starting on September 5, 2013, and lasting at least until October 2013. The sequence culminated in a maximal moment magnitude Mw 4.3 earthquake, on October 1, 2013. The epicentral region is located near the offshore platform of the Castor project, where gas is conducted through a pipeline from the mainland and where it was recently injected into a depleted oil reservoir, at about 2 km depth. We analyse the temporal evolution of the seismic sequence and use full waveform techniques to derive absolute and relative locations, estimate depths and focal mechanisms for the largest events in the sequence (with magnitude mbLg larger than 3), and compare them to a previous event (April 8, 2012, mbLg 3.3) taking place in the same region prior to the gas injection. Moment tensor inversion results show that the overall seismicity in this sequence is characterized by oblique mechanisms with a normal fault component, with a 30° low-dip angle plane oriented NNE-SSW and a sub-vertical plane oriented NW-SE. The combined analysis of hypocentral location and focal mechanisms could indicate that the seismic sequence corresponds to rupture processes along sub-horizontal shallow surfaces, which could have been triggered by the gas injection in the reservoir. An alternative scenario includes the iterated triggering of a system of steep faults oriented NW-SE, which were identified by prior marine seismic investigations. The most relevant seismogenic feature in the area is the Fosa de Amposta fault system, which includes different strands mapped at different distances to the coast, with a general NE-SW orientation, roughly parallel to the coastline. No significant known historical seismicity has involved this fault in the past. Both of our scenarios exclude its activation, as its known orientation is inconsistent with focal mechanism results.

  5. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.

    PubMed

    Kearse, Matthew; Moir, Richard; Wilson, Amy; Stones-Havas, Steven; Cheung, Matthew; Sturrock, Shane; Buxton, Simon; Cooper, Alex; Markowitz, Sidney; Duran, Chris; Thierer, Tobias; Ashton, Bruce; Meintjes, Peter; Drummond, Alexei

    2012-06-15

    The two main functions of bioinformatics are the organization and analysis of biological data using computational resources. Geneious Basic has been designed to be an easy-to-use and flexible desktop software application framework for the organization and analysis of biological data, with a focus on molecular sequences and related data types. It integrates numerous industry-standard discovery analysis tools, with interactive visualizations to generate publication-ready images. One key contribution to researchers in the life sciences is the Geneious public application programming interface (API) that affords the ability to leverage the existing framework of the Geneious Basic software platform for virtually unlimited extension and customization. The result is an increase in the speed and quality of development of computation tools for the life sciences, due to the functionality and graphical user interface available to the developer through the public API. Geneious Basic represents an ideal platform for the bioinformatics community to leverage existing components and to integrate their own specific requirements for the discovery, analysis and visualization of biological data. Binaries and public API freely available for download at http://www.geneious.com/basic, implemented in Java and supported on Linux, Apple OSX and MS Windows. The software is also available from the Bio-Linux package repository at http://nebc.nerc.ac.uk/news/geneiousonbl.

  6. Into the unknown: expression profiling without genome sequence information in CHO by next generation sequencing.

    PubMed

    Birzele, Fabian; Schaub, Jochen; Rust, Werner; Clemens, Christoph; Baum, Patrick; Kaufmann, Hitto; Weith, Andreas; Schulz, Torsten W; Hildebrandt, Tobias

    2010-07-01

    The arrival of next-generation sequencing (NGS) technologies has led to novel opportunities for expression profiling and genome analysis by utilizing vast amounts of short read sequence data. Here, we demonstrate that expression profiling in organisms lacking any genome or transcriptome sequence information is feasible by combining Illumina's mRNA-seq technology with a novel bioinformatics pipeline that integrates assembled and annotated Chinese hamster ovary (CHO) sequences with information derived from related organisms. We applied this pipeline to the analysis of CHO cells, which were chosen as a model system owing to their relevance in the production of therapeutic proteins. Specifically, we analysed CHO cells undergoing butyrate treatment, which is known to affect cell cycle regulation and to increase the specific productivity of recombinant proteins. By this means, we identified sequences for >13,000 CHO genes, which added sequence information for approximately 5000 novel genes to the CHO model. More than 6000 transcript sequences are predicted to be complete, as they covered >95% of the corresponding mouse orthologs. Detailed analysis of selected biological functions, such as DNA replication and cell cycle control, demonstrated the potential of NGS expression profiling in organisms without extended genome sequence to improve both data quantity and quality.

  7. Clinical Application of Targeted Next Generation Sequencing for Colorectal Cancers

    PubMed Central

    Fontanges, Quitterie; De Mendonca, Ricardo; Salmon, Isabelle; Le Mercier, Marie; D’Haene, Nicky

    2016-01-01

    Promising targeted therapy and personalized medicine are making molecular profiling of tumours a priority. For colorectal cancer (CRC) patients, international guidelines made RAS (KRAS and NRAS) status a prerequisite for the use of anti-epidermal growth factor receptor agents (anti-EGFR). Daily, new data emerge on the theranostic and prognostic role of molecular biomarkers, which is a strong incentive for a validated, sensitive and broadly available molecular screening test in order to implement and improve multi-modal therapy strategy and clinical trials. Next generation sequencing (NGS) has begun to supplant other technologies for genomic profiling. Targeted NGS is a method that allows parallel sequencing of thousands of short DNA sequences in a single test offering a cost-effective approach for detecting multiple genetic alterations with a minimum amount of DNA. In the present review, we collected data concerning the clinical application of NGS technology in the setting of colorectal cancer. PMID:27999270

  8. Nanopore-based Fourth-generation DNA Sequencing Technology

    PubMed Central

    Feng, Yanxiao; Zhang, Yuechuan; Ying, Cuifeng; Wang, Deqiang; Du, Chunlei

    2015-01-01

    Nanopore-based sequencers, as the fourth-generation DNA sequencing technology, have the potential to quickly and reliably sequence the entire human genome for less than $1000, and possibly for even less than $100. The single-molecule techniques used by this technology allow us to further study the interaction between DNA and protein, as well as between protein and protein. Nanopore analysis opens a new door to molecular biology investigation at the single-molecule scale. In this article, we have reviewed academic achievements in nanopore technology from the past as well as the latest advances, including both biological and solid-state nanopores, and discussed their recent and potential applications. PMID:25743089

  9. Generating Researcher Networks with Identified Persons on a Semantic Service Platform

    NASA Astrophysics Data System (ADS)

    Jung, Hanmin; Lee, Mikyoung; Kim, Pyung; Lee, Seungwoo

    This paper describes a Semantic Web-based method to acquire researcher networks by means of an identification scheme, ontology, and reasoning. Three steps are required to realize it: resolving co-references, finding experts, and generating researcher networks. We adopt OntoFrame as the underlying semantic service platform and apply reasoning to make direct relations between far-off classes in the ontology schema. 453,124 Elsevier journal articles with metadata and full-text documents in information technology and biomedical domains have been loaded and served on the platform as a test set.

  10. Automatic Generation of Randomized Trial Sequences for Priming Experiments

    PubMed Central

    Ihrke, Matthias; Behrendt, Jörg

    2011-01-01

    In most psychological experiments, a randomized presentation of successive displays is crucial for the validity of the results. For some paradigms, such as priming paradigms, this is not a trivial issue because trials are interdependent. We present software that automatically generates optimized trial sequences for (negative-) priming experiments. Our implementation is based on an optimization heuristic known as genetic algorithms, which allows for an intuitive interpretation due to its similarity to natural evolution. The program features a graphical user interface that allows the user to generate trial sequences and to interactively improve them. The software is based on freely available software and is released under the GNU General Public License. PMID:22007178
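
    The optimization idea can be sketched in a few lines. The example below is a stripped-down evolutionary search (selection and swap mutation only, no crossover), not the authors' program: the cost function `repeats`, the helper names and all parameter values are assumptions made for illustration, and real priming designs impose more constraints than avoiding immediate repetitions.

```python
import random


def repeats(sequence):
    """Cost: number of positions where a condition repeats immediately."""
    return sum(a == b for a, b in zip(sequence, sequence[1:]))


def mutate(sequence, rng):
    """Swap two positions, which keeps the count of each condition fixed."""
    child = list(sequence)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def evolve(conditions, copies, generations=200, population_size=30, seed=0):
    """Return (best sequence found, best cost in the initial population)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    base = list(conditions) * copies
    population = [rng.sample(base, len(base)) for _ in range(population_size)]
    initial_best = min(map(repeats, population))
    for _ in range(generations):
        population.sort(key=repeats)
        survivors = population[: population_size // 2]  # keep the fitter half
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return min(population, key=repeats), initial_best


# Hypothetical design: four conditions, ten trials each.
best, initial_best = evolve(["A", "B", "C", "D"], copies=10)
```

    Because the fitter half always survives unchanged, the best cost can never get worse from one generation to the next, and swap mutation guarantees that every candidate still contains each condition the required number of times.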

  11. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    PubMed Central

    2011-01-01

    Background: Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results: An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, of which 90% is repetitive sequences. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 were sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false positive SNP discovery rate, DNA containing putative SNPs was

  12. All-optical pseudorandom bit sequences generator based on TOADs

    NASA Astrophysics Data System (ADS)

    Sun, Zhenchao; Wang, Zhi; Wu, Chongqing; Wang, Fu; Li, Qiang

    2016-03-01

    A scheme for an all-optical pseudorandom bit sequence (PRBS) generator is demonstrated with an optical 'XNOR' logic gate and an all-optical wavelength converter based on cascaded Terahertz Optical Asymmetric Demultiplexers (TOADs). Its feasibility is verified by generation of a return-to-zero on-off keying (RZ-OOK) 2^63-1 PRBS at a speed of 1 Gb/s with a 10% duty ratio. The high randomness of the ultra-long-cycle PRBS is validated by successfully passing the standard benchmark test.
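
    The XNOR-feedback principle behind such a generator can be illustrated in software. The sketch below is an electronic analogue only, not the authors' optical circuit: it models a linear feedback shift register whose feedback bit is the XNOR of two tapped stages. The function name `xnor_lfsr_bits`, the tap positions and the 7-stage demonstration register are assumptions chosen for illustration; the abstract does not state which taps the TOAD-based design uses.

```python
def xnor_lfsr_bits(width, taps, count, state=0):
    """Return `count` output bits of a Fibonacci LFSR with XNOR feedback.

    width: number of register stages.
    taps:  two 1-based stage indices feeding the XNOR gate (hypothetical).
    state: starting register value; all-ones is the lock-up state here.
    """
    mask = (1 << width) - 1
    if state == mask:
        raise ValueError("all-ones state never changes under XNOR feedback")
    bits = []
    for _ in range(count):
        # Starting from 1 and XOR-ing in the two taps yields their XNOR.
        feedback = 1
        for tap in taps:
            feedback ^= (state >> (tap - 1)) & 1
        bits.append((state >> (width - 1)) & 1)   # output the last stage
        state = ((state << 1) | feedback) & mask  # shift, insert feedback
    return bits


# 7-stage demo with taps (7, 6), i.e. x^7 + x^6 + 1: period 2^7 - 1 = 127.
demo = xnor_lfsr_bits(width=7, taps=(7, 6), count=254)
```

    A maximal-length register with 63 stages has a period of 2^63 - 1 bits. A commonly tabulated tap pair for that length is (63, 62), but using it here would be an assumption rather than a detail taken from the abstract.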

  13. Next generation sequencing: an application in forensic sciences?

    PubMed

    Alvarez-Cubero, Maria Jesus; Saiz, Maria; Martínez-García, Belén; Sayalero, Sara M; Entrala, Carmen; Lorente, Jose Antonio; Martinez-Gonzalez, Luis Javier

    2017-09-26

    Over the last few decades, sequencing has advanced greatly. One of the most important achievements of Next Generation Sequencing (NGS) is to produce millions of sequence reads in a short period of time, and to produce large sequences of DNA in fragments of any size. Libraries can be generated from whole genomes or any DNA or RNA region of interest without the need to know its sequence beforehand. This makes it possible to look for variations and facilitates genetic identification. We present an in-depth analysis of current NGS technologies and their applications, especially in forensics, including a discussion of the pros and cons of these technologies in genetic identification. A systematic literature search in the PubMed, Science Direct and Scopus electronic databases was performed for the period of December 2012 to June 2015. In the forensic field, one of the main problems is the limited amount of sample available, as well as its degraded state. If the amount of DNA input required for preparing NGS libraries continues to decrease, nearly any sample could be sequenced; therefore, the maximum information from any biological remains could be obtained. Additionally, microbiome typification could be an interesting application to study for crime scene characterisation. NGS technologies are going to be crucial for human DNA typing in cases like mass disasters or other events where forensic specimens and samples are compromised and degraded. With the use of NGS it will be possible to achieve the simultaneous analysis of the standard autosomal DNA (STRs and SNPs), mitochondrial DNA, and X and Y chromosomal markers.

  14. Versatile ion S5XL sequencer for targeted next generation sequencing of solid tumors in a clinical laboratory.

    PubMed

    Mehrotra, Meenakshi; Duose, Dzifa Yawa; Singh, Rajesh R; Barkoh, Bedia A; Manekia, Jawad; Harmon, Michael A; Patel, Keyur P; Routbort, Mark J; Medeiros, L Jeffrey; Wistuba, Ignacio I; Luthra, Rajyalakshmi

    2017-01-01

    Next generation sequencing based tumor tissue genotyping involves a complex workflow and a relatively long turnaround time. Semiconductor-based next generation platforms range from the low-throughput Ion PGM to the high-throughput Ion Proton and Ion S5XL sequencers. In this study, we compared the Ion PGM and Ion Proton with the new Ion S5XL NGS system for workflow scalability, analytical sensitivity and specificity, turnaround time and sequencing performance in a clinical laboratory. Eighteen solid tumor samples positive for various mutations, as detected previously by Ion PGM and Ion Proton, were selected for the study. Libraries were prepared using DNA (range 10-40 ng) from micro-dissected formalin-fixed, paraffin-embedded (FFPE) specimens using the Ion Ampliseq Library Kit 2.0 for the comprehensive cancer panel (CCP), Oncomine comprehensive cancer panel (OCP) and cancer hotspot panel v2 (CHPv2) as per the manufacturer's instructions. The CHPv2 libraries were sequenced using the Ion PGM, whereas the CCP and OCP libraries were sequenced using the Ion Proton. All three libraries were further sequenced individually (S540) or multiplexed (S530) using the Ion S5XL. For the S5XL, the Ion Chef was used to automate template preparation, enrichment of ion spheres and chip loading. Data analysis was performed using Torrent Suite 4.6 software on board the S5XL and Ion Reporter. Limit of detection and reproducibility studies were performed using a serially diluted DLD1 cell line. A total of 241 variant calls (235 single nucleotide variants and 6 indels) expected in the studied cohort were successfully detected by the S5XL, with 100% and 97% concordance with the Ion PGM and Proton, respectively. Sequencing run time was reduced from 4.5 to 2.5 hours, with an output range of 3-5 Gb (S530) and 8-9.3 Gb (S540). Data analysis for the Ion S5XL is faster, at 1 h (S520), 2.5 h (S530) and 5 h (S540 chip), compared with the Ion PGM (3.5-5 h) and Ion Proton (8 h). A limit of detection of 5% allelic frequency was established along with high inter

  15. Estimating individual admixture proportions from next generation sequencing data.

    PubMed

    Skotte, Line; Korneliussen, Thorfinn Sand; Albrechtsen, Anders

    2013-11-01

    Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next generation sequencing technologies it is possible to obtain genetic data for all accessible genetic variations in the genome. Existing methods for admixture analysis rely on known genotypes. However, individual genotypes cannot be inferred from low-depth sequencing data without introducing errors. This article presents a new method for inferring an individual's ancestry that takes the uncertainty introduced in next generation sequencing data into account. This is achieved by working directly with genotype likelihoods that contain all relevant information of the unobserved genotypes. Using simulations as well as publicly available sequencing data, we demonstrate that the presented method has great accuracy even for very low-depth data. At the same time, we demonstrate that applying existing methods to genotypes called from the same data can introduce severe biases. The presented method is implemented in the NGSadmix software available at http://www.popgen.dk/software.
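
    Working with genotype likelihoods rather than called genotypes can be illustrated for a single biallelic site. The sketch below is a simplified textbook-style model, not the NGSadmix implementation: it assumes independent reads, a known per-base error probability, and that a sequencing error turns one allele into the other. The function name `genotype_likelihoods` and the example reads are hypothetical.

```python
def genotype_likelihoods(reads, ref, alt):
    """Return P(reads | genotype) for genotypes with 0, 1 and 2 alt alleles.

    reads: list of (base, error_probability) pairs covering one site.
    Simplifying assumptions: reads are independent, the site is biallelic,
    and a sequencing error turns one allele into the other.
    """
    likelihoods = []
    for alt_copies in (0, 1, 2):
        alt_fraction = alt_copies / 2  # chance a read samples the alt allele
        total = 1.0
        for base, err in reads:
            if base == alt:
                p = alt_fraction * (1 - err) + (1 - alt_fraction) * err
            elif base == ref:
                p = (1 - alt_fraction) * (1 - err) + alt_fraction * err
            else:
                continue  # ignore bases matching neither allele
            total *= p
        likelihoods.append(total)
    return likelihoods


# Two reads at low depth: one supports each allele.
gl = genotype_likelihoods([("A", 0.01), ("G", 0.01)], ref="A", alt="G")
```

    With only two reads, the heterozygous genotype is the most likely, yet both homozygous genotypes keep a non-zero likelihood. A hard genotype call would discard exactly this uncertainty, which is why low-depth methods carry the three likelihoods forward instead.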

  16. New Generations: Sequencing Machines and Their Computational Challenges

    PubMed Central

    Schwartz, David C.; Waterman, Michael S.

    2011-01-01

    New generation sequencing systems are changing how molecular biology is practiced. The widely promoted $1000 genome will be a reality with attendant changes for healthcare, including personalized medicine. More broadly the genomes of many new organisms with large samplings from populations will be commonplace. What is less appreciated is the explosive demands on computation, both for CPU cycles and storage as well as the need for new computational methods. In this article we will survey some of these developments and demands. PMID:22121326

  17. New Generations: Sequencing Machines and Their Computational Challenges.

    PubMed

    Schwartz, David C; Waterman, Michael S

    2010-01-01

    New generation sequencing systems are changing how molecular biology is practiced. The widely promoted $1000 genome will be a reality with attendant changes for healthcare, including personalized medicine. More broadly the genomes of many new organisms with large samplings from populations will be commonplace. What is less appreciated is the explosive demands on computation, both for CPU cycles and storage as well as the need for new computational methods. In this article we will survey some of these developments and demands.

  18. Nucleotide-resolution DNA double-strand break mapping by next-generation sequencing.

    PubMed

    Crosetto, Nicola; Mitra, Abhishek; Silva, Maria Joao; Bienko, Magda; Dojer, Norbert; Wang, Qi; Karaca, Elif; Chiarle, Roberto; Skrzypczak, Magdalena; Ginalski, Krzysztof; Pasero, Philippe; Rowicka, Maga; Dikic, Ivan

    2013-04-01

    We present a genome-wide approach to map DNA double-strand breaks (DSBs) at nucleotide resolution by a method we termed BLESS (direct in situ breaks labeling, enrichment on streptavidin and next-generation sequencing). We validated and tested BLESS using human and mouse cells and different DSBs-inducing agents and sequencing platforms. BLESS was able to detect telomere ends, Sce endonuclease-induced DSBs and complex genome-wide DSB landscapes. As a proof of principle, we characterized the genomic landscape of sensitivity to replication stress in human cells, and we identified >2,000 nonuniformly distributed aphidicolin-sensitive regions (ASRs) overrepresented in genes and enriched in satellite repeats. ASRs were also enriched in regions rearranged in human cancers, with many cancer-associated genes exhibiting high sensitivity to replication stress. Our method is suitable for genome-wide mapping of DSBs in various cells and experimental conditions, with a specificity and resolution unachievable by current techniques.

  19. Rapid evaluation and quality control of next generation sequencing data with FaQCs

    DOE PAGES

    Lo, Chien-Chi; Chain, Patrick S. G.

    2014-12-01

    Background: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Results: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.
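
    Since read quality often declines towards the end of a read, a common first step is to trim low-quality bases from the 3' end. The sketch below shows one deliberately simple rule and is not the trimming algorithm used by FaQCs; the function names, the Phred+33 encoding and the Q20 default threshold are assumptions made for illustration.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string that uses the Phred+33 offset."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20):
    """Drop trailing bases whose quality is below `min_quality`.

    Trimming stops at the first base, counted from the 3' end, that
    meets the threshold; bases further upstream are left untouched.
    """
    scores = phred33_scores(quality_string)
    keep = len(scores)
    while keep > 0 and scores[keep - 1] < min_quality:
        keep -= 1
    return sequence[:keep], quality_string[:keep]


# 'I' encodes Q40, '#' encodes Q2 and '!' encodes Q0 under Phred+33.
seq, qual = trim_3prime("ACGTACGT", "IIIIII#!")
```

    Here the last two bases fall below Q20 and are removed, so the read shrinks from eight bases to six. Production tools typically add further checks, such as a minimum remaining length or a sliding-window average.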

  20. Rapid evaluation and quality control of next generation sequencing data with FaQCs

    SciTech Connect

    Lo, Chien-Chi; Chain, Patrick S. G.

    2014-12-01

    Background: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Results: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.

  1. Mapping Sensorimotor Sequences to Word Sequences: A Connectionist Model of Language Acquisition and Sentence Generation

    ERIC Educational Resources Information Center

    Takac, Martin; Benuskova, Lubica; Knott, Alistair

    2012-01-01

    In this article we present a neural network model of sentence generation. The network has both technical and conceptual innovations. Its main technical novelty is in its semantic representations: the messages which form the input to the network are structured as sequences, so that message elements are delivered to the network one at a time. Rather…

  2. Mapping Sensorimotor Sequences to Word Sequences: A Connectionist Model of Language Acquisition and Sentence Generation

    ERIC Educational Resources Information Center

    Takac, Martin; Benuskova, Lubica; Knott, Alistair

    2012-01-01

    In this article we present a neural network model of sentence generation. The network has both technical and conceptual innovations. Its main technical novelty is in its semantic representations: the messages which form the input to the network are structured as sequences, so that message elements are delivered to the network one at a time. Rather…

  3. Generation of control sequences for a pilot-disassembly system

    NASA Astrophysics Data System (ADS)

    Seliger, Guenther; Kim, Hyung-Ju; Keil, Thomas

    2002-02-01

    Closing the product and material cycles has emerged as a paradigm for industry in the 21st century. Disassembly plays a key role in a life cycle economy since it enables the recovery of resources. A partly automated disassembly system should adapt to a large variety of products and different degrees of devaluation. The quantities of products to be disassembled can also vary strongly. To cope with these demands, an approach to generate on-line disassembly control sequences is presented. Technological feasibility is considered within a procedure for the generation of disassembly control sequences. Procedures are designed to find available and technologically feasible disassembly processes. The control system is formed by modularised and parameterised control units at the cell level within the entire control architecture. In the first development stage, product and process analyses were carried out on a sample product, a washing machine. Furthermore, a generalized disassembly process was defined. Afterwards, these processes were structured into primary and secondary functions. In the second stage, disassembly control at the technological level was investigated. The factors considered were the availability of the disassembly tools and the technological feasibility of the disassembly processes within the disassembly system. Technically alternative disassembly processes are determined from the availability of the tools and the technological feasibility of the processes. The fourth phase was the concept for the generation of the disassembly control sequences. The approach will be validated in a prototypical disassembly system.

  4. Random Sequence for Optimal Low-Power Laser Generated Ultrasound

    NASA Astrophysics Data System (ADS)

    Vangi, D.; Virga, A.; Gulino, M. S.

    2017-08-01

    Low-power laser-generated ultrasound has lately been gaining importance in research, thanks to the possibility of investigating the structural integrity of a mechanical component through a non-contact, Non-Destructive Testing (NDT) procedure. The ultrasonic signals are, however, very low in amplitude, making pre-processing and post-processing operations on the signals necessary to detect them. The cross-correlation technique is used in this work, meaning that a random signal must be used as the laser input. For this purpose, a highly random and simple-to-create code called the T sequence, capable of enhancing ultrasound detectability, is introduced (not previously available in the state of the art). Several important parameters that characterize the T sequence can influence the process: the number of pulses N_pulses, the pulse duration δ and the distance between pulses d_pulses. A Finite Element (FE) model of a 3 mm steel disk was initially developed to analytically study the longitudinal ultrasound generation mechanism and the obtainable outputs. Later, experimental tests showed that the T sequence is highly flexible for ultrasound detection purposes, and that it is optimal to use high N_pulses and δ but low d_pulses. Finally, apart from describing all phenomena that arise in the low-power laser generation process, the results of this study are also important for setting up an effective NDT procedure using this technology.
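
    Why cross-correlation with a random excitation code helps can be shown on synthetic data. The sketch below does not reproduce the T sequence, whose construction is not given in the abstract: it uses a generic random on/off code, Gaussian noise and made-up values for the code length, delay and amplitude, all of which are assumptions for illustration.

```python
import random


def cross_correlate(received, code):
    """Slide `code` along `received` and return the correlation at each lag."""
    n = len(code)
    return [
        sum(received[lag + i] * code[i] for i in range(n))
        for lag in range(len(received) - n + 1)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
code = [rng.choice((0, 1)) for _ in range(2000)]  # random on/off pulse train
true_delay = 137   # hypothetical arrival time, in samples
amplitude = 0.5    # echo amplitude is half the noise standard deviation

# Received trace: unit-variance noise plus a weak, delayed copy of the code.
received = [rng.gauss(0.0, 1.0) for _ in range(2500)]
for i, c in enumerate(code):
    received[true_delay + i] += amplitude * c

# Remove the code's mean so its constant part does not bias the correlation.
mean = sum(code) / len(code)
zero_mean_code = [c - mean for c in code]

correlation = cross_correlate(received, zero_mean_code)
estimated_delay = max(range(len(correlation)), key=correlation.__getitem__)
```

    Each individual sample is dominated by noise, but the correlation sums the contributions of all pulses, so the peak at the true delay grows in proportion to the number of pulses while the noise grows only with its square root. This is consistent with the abstract's finding that a high number of pulses is favourable.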

  5. Vidjil: A Web Platform for Analysis of High-Throughput Repertoire Sequencing

    PubMed Central

    Duez, Marc; Herbert, Ryan; Rocher, Tatiana; Salson, Mikaël; Thonier, Florian

    2016-01-01

    Background: B and T lymphocytes are white blood cells that play a key role in adaptive immunity. A part of their DNA, called the V(D)J recombination, is specific to each lymphocyte and enables recognition of specific antigens. Today, with new sequencing techniques, one can get billions of DNA sequences from these regions. With dedicated Repertoire Sequencing (RepSeq) methods, it is now possible to picture populations of lymphocytes, and to monitor more accurately the immune response as well as pathologies such as leukemia. Methods and Results: Vidjil is an open-source platform for the interactive analysis of high-throughput sequencing data from lymphocyte recombinations. It contains an algorithm gathering reads into clonotypes according to their V(D)J junctions, a web application made of a sample, experiment and patient database, and a visualization for the analysis of clonotypes over time. Vidjil is implemented in C++, Python and Javascript and licensed under the GPLv3 open-source license. Source code, binaries and a public web server are available at http://www.vidjil.org and at http://bioinfo.lille.inria.fr/vidjil. Using the Vidjil web application consists of four steps: 1. uploading a raw sequence file (typically a FASTQ); 2. running RepSeq analysis software; 3. visualizing the results; 4. annotating the results and saving them for future use. For the end-user, the Vidjil web application needs no specific installation and just requires a connection and a modern web browser. Vidjil is used by labs in hematology or immunology for research and clinical applications. PMID:27835690

  6. Reducing ligation bias of small RNAs in libraries for next generation sequencing.

    PubMed

    Sorefan, Karim; Pais, Helio; Hall, Adam E; Kozomara, Ana; Griffiths-Jones, Sam; Moulton, Vincent; Dalmay, Tamas

    2012-05-30

    The use of nucleic acid-modifying enzymes has driven the rapid advancement in molecular biology. Understanding their function is important for modifying or improving their activity. However, functional analysis usually relies upon low-throughput experiments. Here we present a method for functional analysis of nucleic acid-modifying enzymes using next generation sequencing. We demonstrate that sequencing data of libraries generated by RNA ligases can reveal novel secondary structure preferences of these enzymes, which are used in small RNA cloning and library preparation for NGS. Using this knowledge we demonstrate that the cloning bias in small RNA libraries is RNA ligase-dependent. We developed a high definition (HD) protocol that reduces the RNA ligase-dependent cloning bias. The HD protocol doubled read coverage, is quantitative and found previously unidentified microRNAs. In addition, we show that microRNAs in miRBase are those preferred by the adapters of the main sequencing platform. Sequencing bias of small RNAs partially influenced which microRNAs have been studied in depth; therefore most previous small RNA profiling experiments should be re-evaluated. New microRNAs are likely to be found, which were selected against by existing adapters. Preference of currently used adapters towards known microRNAs suggests that the annotation of all existing small RNAs, including miRNAs, siRNAs and piRNAs, has been biased.

  7. TELP, a sensitive and versatile library construction method for next-generation sequencing

    PubMed Central

    Peng, Xu; Wu, Jingyi; Brunmeir, Reinhard; Kim, Sun-Yee; Zhang, Qiongyi; Ding, Chunming; Han, Weiping; Xie, Wei; Xu, Feng

    2015-01-01

    Next-generation sequencing has been widely used for the genome-wide profiling of histone modifications, transcription factor binding and gene expression through chromatin immunoprecipitated DNA sequencing (ChIP-seq) and cDNA sequencing (RNA-seq). Here, we describe a versatile library construction method that can be applied to both ChIP-seq and RNA-seq on the widely used Illumina platforms. Standard methods for ChIP-seq library construction require nanograms of starting DNA, substantially limiting its application to rare cell types or limited clinical samples. By minimizing the DNA purification steps that cause major sample loss, our method achieved a high sensitivity in ChIP-seq library preparation. Using this method, we achieved the following: (i) generated high-quality epigenomic and transcription factor-binding maps using ChIP-seq for murine adipocytes; (ii) successfully prepared a ChIP-seq library from as little as 25 pg of starting DNA; (iii) achieved paired-end sequencing of the ChIP-seq libraries; (iv) systematically profiled gene expression dynamics during murine adipogenesis using RNA-seq and (v) preserved the strand specificity of the transcripts in RNA-seq. Given its sensitivity and versatility in both double-stranded and single-stranded DNA library construction, this method has wide applications in genomic, epigenomic, transcriptomic and interactomic studies. PMID:25223787

  8. Rep-Seq: uncovering the immunological repertoire through next-generation sequencing

    PubMed Central

    Benichou, Jennifer; Ben-Hamo, Rotem; Louzoun, Yoram; Efroni, Sol

    2012-01-01

    Recent scientific discoveries fuelled by the application of next-generation DNA and RNA sequencing technologies highlight the striking impact of these platforms in characterizing multiple aspects in genomics research. This technology has been used in the study of the B-cell and T-cell receptor repertoire. The novelty of immunosequencing comes from the recent rapid development of techniques and the exponential reduction in cost of sequencing. Here, we describe some of the technologies, which we collectively refer to as Rep-Seq (repertoire sequencing), to portray achievements in the field and to present the essential and inseparable role of next-generation sequencing to the understanding of entities in immune response. The large Rep-Seq data sets that should be available in the near future call for new computational algorithms to segue the transition from ‘classic’ molecular-based analysis to system-wide analysis. The combination of new algorithms with high-throughput data will form the basis for possible new clinical implications in personalized medicine and deeper understanding of immune behaviour and immune response. PMID:22043864

  9. Quantifying population genetic differentiation from next-generation sequencing data.

    PubMed

    Fumagalli, Matteo; Vieira, Filipe G; Korneliussen, Thorfinn Sand; Linderoth, Tyler; Huerta-Sánchez, Emilia; Albrechtsen, Anders; Nielsen, Rasmus

    2013-11-01

    Over the past few years, new high-throughput DNA sequencing technologies have dramatically increased speed and reduced sequencing costs. However, the use of these sequencing technologies is often challenged by errors and biases associated with the bioinformatical methods used for analyzing the data. In particular, the use of naïve methods to identify polymorphic sites and infer genotypes can inflate downstream analyses. Recently, explicit modeling of genotype probability distributions has been proposed as a method for taking genotype call uncertainty into account. Based on this idea, we propose a novel method for quantifying population genetic differentiation from next-generation sequencing data. In addition, we present a strategy for investigating population structure via principal components analysis. Through extensive simulations, we compare the new method herein proposed to approaches based on genotype calling and demonstrate a marked improvement in estimation accuracy for a wide range of conditions. We apply the method to a large-scale genomic data set of domesticated and wild silkworms sequenced at low coverage. We find that we can infer the fine-scale genetic structure of the sampled individuals, suggesting that employing this new method is useful for investigating the genetic relationships of populations sampled at low coverage.
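
    The idea of carrying genotype uncertainty forward instead of committing to hard calls can be illustrated with a small sketch. This is not the estimator proposed in the record above; it only contrasts an allele frequency computed from genotype posterior probabilities with one computed from the most probable genotype per individual. The function names and the toy posteriors are assumptions made for illustration.

```python
# Illustrative sketch only: contrasts probability-weighted and hard-call
# allele frequency estimates at one biallelic site. Names and data are
# hypothetical and are not taken from the cited work.

def expected_allele_frequency(posteriors):
    """Mean expected alternate-allele dosage divided by 2.

    posteriors: list of (P(hom ref), P(het), P(hom alt)) per individual.
    """
    if not posteriors:
        raise ValueError("at least one individual is required")
    dosage = sum(p_het + 2.0 * p_alt for _, p_het, p_alt in posteriors)
    return dosage / (2.0 * len(posteriors))


def called_allele_frequency(posteriors):
    """Allele frequency after calling the most probable genotype."""
    if not posteriors:
        raise ValueError("at least one individual is required")
    dosage = 0
    for probs in posteriors:
        # Index of the largest probability is the called alt-allele count.
        dosage += max(range(3), key=lambda g: probs[g])
    return dosage / (2.0 * len(posteriors))


# Toy low-coverage data: two individuals are only weakly "hom ref".
toy_posteriors = [(0.60, 0.40, 0.00), (0.55, 0.45, 0.00), (0.10, 0.80, 0.10)]
freq_expected = expected_allele_frequency(toy_posteriors)
freq_called = called_allele_frequency(toy_posteriors)
```

    In this toy case the hard calls discard the substantial heterozygote probability of the first two individuals, so the called frequency is lower than the probability-weighted one.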

  10. Evaluation of GS Junior and MiSeq next-generation sequencing technologies as an alternative to Trugene population sequencing in the clinical HIV laboratory.

    PubMed

    Ram, Daniela; Leshkowitz, Dena; Gonzalez, Dimitri; Forer, Relly; Levy, Itzchak; Chowers, Michal; Lorber, Margalit; Hindiyeh, Musa; Mendelson, Ella; Mor, Orna

    2015-02-01

    Population HIV-1 sequencing is currently the method of choice for the identification and follow-up of HIV-1 antiretroviral drug resistance. It has limited sensitivity and results in a consensus sequence showing the most prevalent nucleotide per position. Moreover, concomitant sequencing and interpretation of the results for several samples together is laborious and time consuming. In this study, the practical use of GS Junior and MiSeq bench-top next generation sequencing (NGS) platforms as an alternative to Trugene Sanger-based population sequencing in the clinical HIV laboratory was assessed. DeepChek®-HIV TherapyEdge software was used for processing all the protease and reverse transcriptase sequences and for resistance interpretation. Plasma samples from nine HIV-1 carriers, representing the major HIV-1 subtypes in Israel, were compared. The total number of amino acid substitutions identified in the nine samples by GS Junior (232 substitutions) and MiSeq (243 substitutions) was similar and higher than with Trugene (181 substitutions), emphasizing the advantage of deep sequencing over population sequencing. More than 80% of the identified substitutions were identical between the GS Junior and MiSeq platforms, most of them (184 of 199) at similar frequency. Low abundance substitutions accounted for 20.9% of the MiSeq and 21.9% of the GS Junior output, the majority of which were not detected by Trugene. More drug resistance mutations were identified by both the NGS platforms, primarily, but not only, at low abundance. In conclusion, in combination with DeepChek, both GS Junior and MiSeq were found to be more sensitive than Trugene and adequate for HIV-1 resistance analysis in the clinical HIV laboratory.

  11. Continuous flow generation of magnetoliposomes in a low-cost portable microfluidic platform.

    PubMed

    Conde, Alvaro J; Batalla, Milena; Cerda, Belén; Mykhaylyk, Olga; Plank, Christian; Podhajcer, Osvaldo; Cabaleiro, Juan M; Madrid, Rossana E; Policastro, Lucia

    2014-12-07

    We present a low-cost, portable microfluidic platform that uses laminated polymethylmethacrylate chips, peristaltic micropumps and LEGO® Mindstorms components for the generation of magnetoliposomes that does not require extrusion steps. Mixtures of lipids reconstituted in ethanol and an aqueous phase were injected independently in order to generate a combination of laminar flows in such a way that we could effectively achieve four hydrodynamic focused nanovesicle generation streams. Monodisperse magnetoliposomes with characteristics comparable to those obtained by traditional methods have been obtained. The magnetoliposomes are responsive to external magnetic field gradients, a result that suggests that the nanovesicles can be used in research and applications in nanomedicine.

  12. Next-generation sequencing techniques for eukaryotic microorganisms: sequencing-based solutions to biological problems.

    PubMed

    Nowrousian, Minou

    2010-09-01

    Over the past 5 years, large-scale sequencing has been revolutionized by the development of several so-called next-generation sequencing (NGS) technologies. These have drastically increased the number of bases obtained per sequencing run while at the same time decreasing the costs per base. Compared to Sanger sequencing, NGS technologies yield shorter read lengths; however, despite this drawback, they have greatly facilitated genome sequencing, first for prokaryotic genomes and within the last year also for eukaryotic ones. This advance was possible due to a concomitant development of software that allows the de novo assembly of draft genomes from large numbers of short reads. In addition, NGS can be used for metagenomics studies as well as for the detection of sequence variations within individual genomes, e.g., single-nucleotide polymorphisms (SNPs), insertions/deletions (indels), or structural variants. Furthermore, NGS technologies have quickly been adopted for other high-throughput studies that were previously performed mostly by hybridization-based methods like microarrays. This includes the use of NGS for transcriptomics (RNA-seq) or the genome-wide analysis of DNA/protein interactions (ChIP-seq). This review provides an overview of NGS technologies that are currently available and the bioinformatics analyses that are necessary to obtain information from the flood of sequencing data as well as applications of NGS to address biological questions in eukaryotic microorganisms.

  13. Efficient and sensitive identification and quantification of airborne pollen using next-generation DNA sequencing.

    PubMed

    Kraaijeveld, Ken; de Weger, Letty A; Ventayol García, Marina; Buermans, Henk; Frank, Jeroen; Hiemstra, Pieter S; den Dunnen, Johan T

    2015-01-01

    Pollen monitoring is an important and widely used tool in allergy research and creation of awareness in pollen-allergic patients. Current pollen monitoring methods are microscope-based, labour intensive and cannot identify pollen to the genus level in some relevant allergenic plant groups. Therefore, a more efficient, cost-effective and sensitive method is needed. Here, we present a method for identification and quantification of airborne pollen using DNA sequencing. Pollen is collected from ambient air using standard techniques. DNA is extracted from the collected pollen, and a fragment of the chloroplast gene trnL is amplified using PCR. The PCR product is subsequently sequenced on a next-generation sequencing platform (Ion Torrent). Amplicon molecules are sequenced individually, allowing identification of different sequences from a mixed sample. We show that this method provides an accurate qualitative and quantitative view of the species composition of samples of airborne pollen grains. We also show that it correctly identifies the individual grass genera present in a mixed sample of grass pollen, which cannot be achieved using microscopic pollen identification. We conclude that our method is more efficient and sensitive than current pollen monitoring techniques and therefore has the potential to increase the throughput of pollen monitoring.
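
    Because each amplicon molecule is sequenced individually, the composition of a mixed sample can be summarized by counting reads per assigned taxon. The sketch below assumes each read has already been assigned a genus label by some upstream classification step, which is not shown; the function name and the toy labels are hypothetical and do not reproduce the authors' pipeline.

```python
# Illustrative sketch only: turns per-read taxon assignments into relative
# abundances. The read labels below are made-up toy data.
from collections import Counter


def relative_abundance(read_taxa):
    """Return {taxon: fraction of reads} for a list of per-read labels."""
    counts = Counter(read_taxa)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads were supplied")
    return {taxon: n / total for taxon, n in counts.items()}


# Ten hypothetical reads assigned to three genera.
toy_reads = ["Poa"] * 5 + ["Lolium"] * 3 + ["Betula"] * 2
abundance = relative_abundance(toy_reads)
```

    Read fractions are only a proxy for pollen grain proportions; factors such as amplification bias can distort them, so a real analysis would need calibration against known mixtures.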

  14. Applications of next-generation sequencing to phylogeography and phylogenetics.

    PubMed

    McCormack, John E; Hird, Sarah M; Zellmer, Amanda J; Carstens, Bryan C; Brumfield, Robb T

    2013-02-01

    This is a time of unprecedented transition in DNA sequencing technologies. Next-generation sequencing (NGS) clearly holds promise for fast and cost-effective generation of multilocus sequence data for phylogeography and phylogenetics. However, the focus on non-model organisms, together with uncertainty about which sample preparation methods and analyses are appropriate for different research questions and evolutionary timescales, has contributed to a lag in the application of NGS to these fields. Here, we outline some of the major obstacles specific to the application of NGS to phylogeography and phylogenetics, including the focus on non-model organisms, the necessity of obtaining orthologous loci in a cost-effective manner, and the predominant use of gene trees in these fields. We describe the most promising methods of sample preparation that address these challenges. Methods that reduce the genome by restriction digest and manual size selection are most appropriate for studies at the intraspecific level, whereas methods that target specific genomic regions (i.e., target enrichment or sequence capture) have wider applicability from the population level to deep-level phylogenomics. Additionally, we give an overview of how to analyze NGS data to arrive at data sets applicable to the standard toolkit of phylogeography and phylogenetics, including initial data processing to alignment and genotype calling (both SNPs and loci involving many SNPs). Even though whole-genome sequencing is likely to become affordable rather soon, because phylogeography and phylogenetics rely on analysis of hundreds of individuals in many cases, methods that reduce the genome to a subset of loci should remain more cost-effective for some time to come.

  15. Genomic Selection in the Era of Next Generation Sequencing for Complex Traits in Plant Breeding

    PubMed Central

    Bhat, Javaid A.; Ali, Sajad; Salgotra, Romesh K.; Mir, Zahoor A.; Dutta, Sutapa; Jadon, Vasudha; Tyagi, Anshika; Mushtaq, Muntazir; Jain, Neelu; Singh, Pradeep K.; Singh, Gyanendra P.; Prabhu, K. V.

    2016-01-01

    Genomic selection (GS) is a promising approach that exploits molecular genetic markers to design novel breeding programs and to develop new marker-based models for genetic evaluation. In plant breeding, it provides opportunities to increase the genetic gain of complex traits per unit time and cost. The cost-benefit balance was an important consideration for GS to work in crop plants. The availability of genome-wide, high-throughput, cost-effective and flexible markers with low ascertainment bias, suitable for large population sizes as well as for both model and non-model crop species with or without a reference genome sequence, was the most important factor for its successful and effective implementation in crop species. These requirements were the major limitations of earlier marker systems, viz. SSR and array-based markers, and such markers were unimaginable before the availability of next-generation sequencing (NGS) technologies, which have provided novel SNP genotyping platforms, especially genotyping by sequencing. These marker technologies have changed the entire scenario of marker applications and made the use of GS routine for crop improvement in both model and non-model crop species. NGS-based genotyping has increased genomic-estimated breeding value prediction accuracies over other established marker platforms in cereals and other crop species, and has made the dream of GS come true in crop breeding. But to harness the true benefits of GS, these marker technologies will need to be combined with high-throughput phenotyping to achieve valuable genetic gain for complex traits. Moreover, the continuous decline in sequencing cost will make whole-genome sequencing (WGS) feasible and cost-effective for GS in the near future. Until then, targeted sequencing seems to be the more cost-effective option for large-scale marker discovery and GS, particularly in the case of large and un-decoded genomes. PMID:28083016

  16. Next generation sequencing improves detection of drug resistance mutations in infants after PMTCT failure.

    PubMed

    Fisher, Randall G; Smith, Davey M; Murrell, Ben; Slabbert, Ruhan; Kirby, Bronwyn M; Edson, Clair; Cotton, Mark F; Haubrich, Richard H; Kosakovsky Pond, Sergei L; Van Zyl, Gert U

    2015-01-01

    Next generation sequencing (NGS) allows the detection of minor variant HIV drug resistance mutations (DRMs). However, data from new NGS platforms after Prevention-of-Mother-to-Child-Transmission (PMTCT) regimen failure are limited. The objective was to compare major and minor variant HIV DRMs with Illumina MiSeq and Life Technologies Ion Personal Genome Machine (PGM) in infants infected despite a PMTCT regimen. We conducted a cross-sectional study of NGS for detecting DRMs in infants infected despite a zidovudine (AZT) and nevirapine (NVP) regimen, before initiation of combination antiretroviral therapy. Sequencing was performed on PCR products from plasma samples on PGM and MiSeq platforms. Bioinformatic analyses were undertaken using a codon-aware version of the Smith-Waterman mapping algorithm and a mixture multinomial error filtering statistical model. Of 15 infants, tested at a median age of 3.4 months after birth, 2 (13%) had non-nucleoside reverse transcriptase inhibitor (NNRTI) DRMs (K103N and Y181C) by bulk sequencing, whereas PGM detected 4 (26%) and MiSeq 5 (30%). NGS enabled the detection of additional minor variant DRMs in the infant with K103N. Coverage and instrument quality scores were higher with MiSeq, increasing the confidence of minor variant calls. NGS followed by bioinformatic analyses detected multiple minor variant DRMs in HIV-1 RT among infants where PMTCT failed. The high coverage of MiSeq and high read quality improved the confidence of identified DRMs and may make this platform ideal for minor variant detection.

  17. Development and Evaluation of a Panel of Filovirus Sequence Capture Probes for Pathogen Detection by Next-Generation Sequencing

    PubMed Central

    Koehler, Jeffrey W.; Hall, Adrienne T.; Rolfe, P. Alexander; Honko, Anna N.; Palacios, Gustavo F.; Fair, Joseph N.; Muyembe, Jean-Jacques; Mulembekani, Prime; Schoepp, Randal J.; Adesokan, Adeyemi; Minogue, Timothy D.

    2014-01-01

    A detailed understanding of the circulating pathogens in a particular geographic location aids in effectively utilizing targeted, rapid diagnostic assays, thus allowing for appropriate therapeutic and containment procedures. This is especially important in regions prevalent for highly pathogenic viruses co-circulating with other endemic pathogens such as the malaria parasite. The importance of biosurveillance is highlighted by the ongoing Ebola virus disease outbreak in West Africa. For example, a more comprehensive assessment of the regional pathogens could have identified the risk of a filovirus disease outbreak earlier and led to an improved diagnostic and response capacity in the region. In this context, being able to rapidly screen a single sample for multiple pathogens in a single tube reaction could improve both diagnostics as well as pathogen surveillance. Here, probes were designed to capture identifying filovirus sequence for the ebolaviruses Sudan, Ebola, Reston, Taï Forest, and Bundibugyo and the Marburg virus variants Musoke, Ci67, and Angola. These probes were combined into a single probe panel, and the captured filovirus sequence was successfully identified using the MiSeq next-generation sequencing platform. This panel was then used to identify the specific filovirus from nonhuman primates experimentally infected with Ebola virus as well as Bundibugyo virus in human sera samples from the Democratic Republic of the Congo, thus demonstrating the utility for pathogen detection using clinical samples. While not as sensitive and rapid as real-time PCR, this panel, along with incorporating additional sequence capture probe panels, could be used for broad pathogen screening and biosurveillance. PMID:25207553

  18. Generating animated sequences from 3D whole-body scans

    NASA Astrophysics Data System (ADS)

    Pargas, Roy P.; Chhatriwala, Murtuza; Mulfinger, Daniel; Deshmukh, Pushkar; Vadhiyar, Sathish

    1999-03-01

    3D images of human subjects are, today, easily obtained using 3D whole-body scanners. 3D human images can provide static information about the physical characteristics of a person, information valuable to professionals such as clothing designers, anthropometrists, medical doctors, physical therapists, athletic trainers, and sculptors. Can 3D human images be used to provide more than static physical information? The research described in this paper attempts to answer the question by explaining a way that animated sequences may be generated from a single 3D scan. The process starts by subdividing the human image into segments and mapping the segments to those of a human model defined in a human-motion simulation package. The simulation software provides information used to display movement of the human image. Snapshots of the movement are captured and assembled to create an animated sequence. All of the postures and motion of the human images come from a single 3D scan. This paper describes the process involved in animating human figures from static 3D whole-body scans, presents an example of a generated animated sequence, and discusses possible applications of this approach.

  19. Unraveling genomic variation from next generation sequencing data

    PubMed Central

    2013-01-01

    Elucidating the content of a DNA sequence is critical to understand more deeply and decode the genetic information for any biological system. As next generation sequencing (NGS) techniques have become cheaper and more advanced in throughput over time, great innovations and breakthrough conclusions have been generated in various biological areas. A few of these areas, which are being shaped by the new technological advances, involve evolution of species, microbial mapping, population genetics, genome-wide association studies (GWAs), comparative genomics, variant analysis, gene expression, gene regulation, epigenetics and personalized medicine. While NGS techniques stand as key players in modern biological research, the analysis and interpretation of the vast amount of data produced is not an easy or trivial task and still remains a great challenge in the field of bioinformatics. Therefore, efficient tools to cope with information overload, tackle the high complexity and provide meaningful visualizations to make knowledge extraction easier are essential. In this article, we briefly refer to the sequencing methodologies and the available equipment to serve these analyses and we describe the data formats of the files which they produce. We conclude with a thorough review of tools developed to efficiently store, analyze and visualize such data, with emphasis on structural variation analysis and comparative genomics. We finally comment on their functionality, strengths and weaknesses and we discuss how future applications could further develop in this field. PMID:23885890

  20. Unraveling genomic variation from next generation sequencing data.

    PubMed

    Pavlopoulos, Georgios A; Oulas, Anastasis; Iacucci, Ernesto; Sifrim, Alejandro; Moreau, Yves; Schneider, Reinhard; Aerts, Jan; Iliopoulos, Ioannis

    2013-07-25

    Elucidating the content of a DNA sequence is critical to understand more deeply and decode the genetic information for any biological system. As next generation sequencing (NGS) techniques have become cheaper and more advanced in throughput over time, great innovations and breakthrough conclusions have been generated in various biological areas. A few of these areas, which are being shaped by the new technological advances, involve evolution of species, microbial mapping, population genetics, genome-wide association studies (GWAs), comparative genomics, variant analysis, gene expression, gene regulation, epigenetics and personalized medicine. While NGS techniques stand as key players in modern biological research, the analysis and interpretation of the vast amount of data produced is not an easy or trivial task and still remains a great challenge in the field of bioinformatics. Therefore, efficient tools to cope with information overload, tackle the high complexity and provide meaningful visualizations to make knowledge extraction easier are essential. In this article, we briefly refer to the sequencing methodologies and the available equipment to serve these analyses and we describe the data formats of the files which they produce. We conclude with a thorough review of tools developed to efficiently store, analyze and visualize such data, with emphasis on structural variation analysis and comparative genomics. We finally comment on their functionality, strengths and weaknesses and we discuss how future applications could further develop in this field.

  1. Next generation sequencing: new tools in immunology and hematology.

    PubMed

    Mori, Antonio; Deola, Sara; Xumerle, Luciano; Mijatovic, Vladan; Malerba, Giovanni; Monsurrò, Vladia

    2013-12-01

    One of the hallmarks of the adaptive immune system is the specificity of B and T cell receptors. Thanks to somatic recombination, a large repertoire of receptors can be generated within an individual that guarantee the recognition of a vast number of antigens. Monoclonal antibodies have limited applicability, given the high degree of diversity among these receptors, in BCR and TCR monitoring. Furthermore, with regard to cancer, better characterization of complex genomes and the ability to monitor tumor-specific cryptic mutations or translocations are needed to develop better tailored therapies. Novel technologies, by enhancing the ability of BCR and TCR monitoring, can help in the search for minimal residual disease during hematological malignancy diagnosis and follow-up, and can aid in improving bone marrow transplantation techniques. Recently, a novel technology known as next generation sequencing has been developed; this allows the recognition of unique sequences and provides depth of coverage, heterogeneity, and accuracy of sequencing. This provides a powerful tool that, along with microarray analysis for gene expression, may become integral in resolving the remaining key problems in hematology. This review describes the state of the art of this novel technology, its application in the immunological and hematological fields, and the possible benefits it will provide for the hematology and immunology community.

  2. Analysis of Metagenomics Next Generation Sequence Data for Fungal ITS Barcoding: Do You Need Advance Bioinformatics Experience?

    PubMed

    Ahmed, Abdalla

    2016-01-01

    During the last few decades, most microbiology laboratories have become familiar with analyzing Sanger sequence data for ITS barcoding. However, with the availability of next-generation sequencing platforms in many centers, it has become important for medical mycologists to know how to make sense of the massive sequence data generated by these new sequencing technologies. In many reference laboratories, the analysis of such data is not a big deal, since suitable IT infrastructure and well-trained bioinformatics scientists are always available. However, in small research laboratories and clinical microbiology laboratories such resources are always lacking. In this report, a simple and user-friendly bioinformatics workflow is suggested for fast and reproducible ITS barcoding of fungi.

  3. Next-generation sequencing for diagnosis of rare diseases in the neonatal intensive care unit

    PubMed Central

    Daoud, Hussein; Luco, Stephanie M.; Li, Rui; Bareke, Eric; Beaulieu, Chandree; Jarinova, Olga; Carson, Nancy; Nikkel, Sarah M.; Graham, Gail E.; Richer, Julie; Armour, Christine; Bulman, Dennis E.; Chakraborty, Pranesh; Geraghty, Michael; Lines, Matthew A.; Lacaze-Masmonteil, Thierry; Majewski, Jacek; Boycott, Kym M.; Dyment, David A.

    2016-01-01

    Background: Rare diseases often present in the first days and weeks of life and may require complex management in the setting of a neonatal intensive care unit (NICU). Exhaustive consultations and traditional genetic or metabolic investigations are costly and often fail to arrive at a final diagnosis when no recognizable syndrome is suspected. For this pilot project, we assessed the feasibility of next-generation sequencing as a tool to improve the diagnosis of rare diseases in newborns in the NICU. Methods: We retrospectively identified and prospectively recruited newborns and infants admitted to the NICU of the Children’s Hospital of Eastern Ontario and the Ottawa Hospital, General Campus, who had been referred to the medical genetics or metabolics inpatient consult service and had features suggesting an underlying genetic or metabolic condition. DNA from the newborns and parents was enriched for a panel of clinically relevant genes and sequenced on a MiSeq sequencing platform (Illumina Inc.). The data were interpreted with a standard informatics pipeline and reported to care providers, who assessed the importance of genotype–phenotype correlations. Results: Of 20 newborns studied, 8 received a diagnosis on the basis of next-generation sequencing (diagnostic rate 40%). The diagnoses were renal tubular dysgenesis, SCN1A-related encephalopathy syndrome, myotubular myopathy, FTO deficiency syndrome, cranioectodermal dysplasia, congenital myasthenic syndrome, autosomal dominant intellectual disability syndrome type 7 and Denys–Drash syndrome. Interpretation: This pilot study highlighted the potential of next-generation sequencing to deliver molecular diagnoses rapidly with a high success rate. With broader use, this approach has the potential to alter health care delivery in the NICU. PMID:27241786

  4. Suppression Subtractive Hybridization Versus Next-Generation Sequencing in Plant Genetic Engineering: Challenges and Perspectives.

    PubMed

    Sahebi, Mahbod; Hanafi, Mohamed M; Azizi, Parisa; Hakim, Abdul; Ashkani, Sadegh; Abiri, Rambod

    2015-10-01

    Suppression subtractive hybridization (SSH) is an effective method to identify different genes with different expression levels involved in a variety of biological processes. This method has often been used to study molecular mechanisms of plants in complex relationships with different pathogens and a variety of biotic stresses. Compared to other techniques used in gene expression profiling, SSH needs relatively smaller amounts of the initial materials, with lower costs, and fewer false positives present within the results. Extraction of total RNA from plant species rich in phenolic compounds, carbohydrates, and polysaccharides that easily bind to nucleic acids through cellular mechanisms is difficult and needs to be considered. Remarkable advancement has been achieved in the next-generation sequencing (NGS) field. As a result of progress within fields related to molecular chemistry and biology as well as specialized engineering, parallelization in the sequencing reaction has exceptionally enhanced the overall read number of generated sequences per run. Currently available sequencing platforms support an earlier unparalleled view directly into complex mixes associated with RNA in addition to DNA samples. NGS technology has demonstrated the ability to sequence DNA with remarkable swiftness, therefore allowing previously unthinkable scientific accomplishments along with novel biological purposes. However, the massive amounts of data generated by NGS impose a substantial challenge with regard to data safe-keeping and analysis. This review examines some simple but vital points involved in preparing the initial material for SSH and introduces this method as well as its associated applications to detect different novel genes from different plant species. This review evaluates general concepts, basic applications, plus the probable results of NGS technology in genomics, with unique mention of feasible potential tools as well as bioinformatics.

  5. A prototypic microfluidic platform generating stepwise concentration gradients for real-time study of cell apoptosis.

    PubMed

    Dai, Wen; Zheng, Yizhe; Luo, Kathy Qian; Wu, Hongkai

    2010-04-16

    This work describes the development of a prototypic microfluidic platform for the generation of stepwise concentration gradients of drugs. A sensitive apoptotic analysis method is integrated into this microfluidic system for studying apoptosis of HeLa cells under the influence of the anticancer drug etoposide at various concentrations in parallel; it measures the yellow fluorescent protein-cyan fluorescent protein fluorescence resonance energy transfer (FRET) signal that responds to the activation of caspase-3, an indicator of cell apoptosis. Sets of microfluidic valves on the chip generate a stepwise concentration gradient of etoposide in various cell-culture microchambers. The FRET signals from multiple chambers are simultaneously monitored under a fluorescent microscope for long-term observation, and the on-chip results are compared with those from a 96-well plate study and the methylthiazolyldiphenyl-tetrazolium bromide (MTT) assay. The microfluidic platform shows several advantages including high-throughput capacity, low drug consumption, and high sensitivity.

  6. Molecular Characterization of Transgenic Events Using Next Generation Sequencing Approach

    PubMed Central

    Mammadov, Jafar; Ye, Liang; Soe, Khaing; Richey, Kimberly; Cruse, James; Zhuang, Meibao; Gao, Zhifang; Evans, Clive; Rounsley, Steve; Kumpatla, Siva P.

    2016-01-01

    Demand for the commercial use of genetically modified (GM) crops has been increasing in light of the projected growth of world population to nine billion by 2050. A prerequisite of paramount importance for regulatory submissions is the rigorous safety assessment of GM crops. One of the components of safety assessment is molecular characterization at the DNA level, which helps to determine the copy number, integrity and stability of a transgene; characterize the integration site within a host genome; and confirm the absence of vector DNA. Historically, molecular characterization has been carried out using Southern blot analysis coupled with Sanger sequencing. While this is a robust approach to characterize transgenic crops, it is both time- and resource-consuming. The emergence of next-generation sequencing (NGS) technologies has provided a highly sensitive and cost- and labor-effective alternative for molecular characterization compared to traditional Southern blot analysis. Herein, we have demonstrated the successful application of both whole genome sequencing and target capture sequencing approaches for the characterization of single and stacked transgenic events and compared the results and inferences with the traditional method with respect to key criteria required for regulatory submissions. PMID:26908260

  7. BING: biomedical informatics pipeline for Next Generation Sequencing.

    PubMed

    Kriseman, Jeffrey; Busick, Christopher; Szelinger, Szabolcs; Dinu, Valentin

    2010-06-01

    High throughput parallel genomic sequencing (Next Generation Sequencing, NGS) shifts the bottleneck in sequencing processes from experimental data production to computationally intensive informatics-based data analysis. This manuscript introduces a biomedical informatics pipeline (BING) for the analysis of NGS data that offers several novel computational approaches to (1) image alignment; (2) signal correlation, compensation, separation, and pixel-based cluster registration; (3) signal measurement and base calling; and (4) quality control and accuracy measurement. These approaches address many of the informatics challenges, including image processing, computational performance, and accuracy. These new algorithms are benchmarked against the Illumina Genome Analysis Pipeline. BING is one of the first software tools to perform pixel-based analysis of NGS data. When compared to the Illumina informatics tool, BING's pixel-based approach produces a significant increase in the number of sequence reads, while reducing the computational time per experiment and error rate (<2%). This approach has the potential of increasing the density and throughput of NGS technologies.

  8. Light-generated oligonucleotide arrays for rapid DNA sequence analysis.

    PubMed Central

    Pease, A C; Solas, D; Sullivan, E J; Cronin, M T; Holmes, C P; Fodor, S P

    1994-01-01

    In many areas of molecular biology there is a need to rapidly extract and analyze genetic information; however, current technologies for DNA sequence analysis are slow and labor intensive. We report here how modern photolithographic techniques can be used to facilitate sequence analysis by generating miniaturized arrays of densely packed oligonucleotide probes. These probe arrays, or DNA chips, can then be applied to parallel DNA hybridization analysis, directly yielding sequence information. In a preliminary experiment, a 1.28 x 1.28 cm array of 256 different octanucleotides was produced in 16 chemical reaction cycles, requiring 4 hr to complete. The hybridization pattern of fluorescently labeled oligonucleotide targets was then detected by epifluorescence microscopy. The fluorescence signals from complementary probes were 5-35 times stronger than those with single or double base-pair hybridization mismatches, demonstrating specificity in the identification of complementary sequences. This method should prove to be a powerful tool for rapid investigations in human genetics and diagnostics, pathogen detection, and DNA molecular recognition. PMID:8197176

  9. Trends in Next-Generation Sequencing and a New Era for Whole Genome Sequencing

    PubMed Central

    2016-01-01

    This article is a mini-review that provides a general overview for next-generation sequencing (NGS) and introduces one of the most popular NGS applications, whole genome sequencing (WGS), developed from the expansion of human genomics. NGS technology has brought massively high throughput sequencing data to bear on research questions, enabling a new era of genomic research. Development of bioinformatic software for NGS has provided more opportunities for researchers to use various applications in genomic fields. De novo genome assembly and large scale DNA resequencing to understand genomic variations are popular genomic research tools for processing a tremendous amount of data at low cost. Studies on transcriptomes are now available, from previous-hybridization based microarray methods. Epigenetic studies are also available with NGS applications such as whole genome methylation sequencing and chromatin immunoprecipitation followed by sequencing. Human genetics has faced a new paradigm of research and medical genomics by sequencing technologies since the Human Genome Project. The trend of NGS technologies in human genomics has brought a new era of WGS by enabling the building of human genomes databases and providing appropriate human reference genomes, which is a necessary component of personalized medicine and precision medicine. PMID:27915479

  10. Trends in Next-Generation Sequencing and a New Era for Whole Genome Sequencing.

    PubMed

    Park, Sang Tae; Kim, Jayoung

    2016-11-01

    This article is a mini-review that provides a general overview for next-generation sequencing (NGS) and introduces one of the most popular NGS applications, whole genome sequencing (WGS), developed from the expansion of human genomics. NGS technology has brought massively high throughput sequencing data to bear on research questions, enabling a new era of genomic research. Development of bioinformatic software for NGS has provided more opportunities for researchers to use various applications in genomic fields. De novo genome assembly and large scale DNA resequencing to understand genomic variations are popular genomic research tools for processing a tremendous amount of data at low cost. Studies on transcriptomes are now available, from previous-hybridization based microarray methods. Epigenetic studies are also available with NGS applications such as whole genome methylation sequencing and chromatin immunoprecipitation followed by sequencing. Human genetics has faced a new paradigm of research and medical genomics by sequencing technologies since the Human Genome Project. The trend of NGS technologies in human genomics has brought a new era of WGS by enabling the building of human genomes databases and providing appropriate human reference genomes, which is a necessary component of personalized medicine and precision medicine.

  11. rMotifGen: random motif generator for DNA and protein sequences.

    PubMed

    Rouchka, Eric C; Hardin, C Timothy

    2007-08-07

    Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains such as transcription factor and DNA binding sites as well as conserved protein domains. In order to help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed a computational tool, rMotifGen, with the sole purpose of generating a number of random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are created according to user-defined parameters and substitution matrices. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms. Two implementations of rMotifGen have been created, one providing a graphical user interface (GUI) for random motif construction, and the other serving as a command line interface. The second implementation has the added advantages of platform independence and being able to be called in a batch mode. rMotifGen was used to construct sample sets of sequences containing DNA motifs and amino acid motifs that were then tested against the Gibbs sampler and MEME packages. rMotifGen provides an efficient and convenient method for creating random DNA or amino acid sequences with a variable number of motifs, where the instance of each motif can be incorporated using a position-specific scoring matrix (PSSM) or by creating an instance mutated from its corresponding consensus using an evolutionary model based on substitution matrices. rMotifGen is freely available at: http://bioinformatics.louisville.edu/brg/rMotifGen/.
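
    The motif-implanting idea in the abstract above can be illustrated with a minimal Python sketch. This is not rMotifGen's code or interface: the function names, the dictionary-based PSSM layout, the uniform background model, and the example matrix are all assumptions made for illustration, and insertions and substitution-matrix mutations are left out.

```python
import random

def sample_motif_from_pssm(pssm, rng):
    """Draw one motif instance from a PSSM.

    `pssm` is a list with one {residue: probability} dict per motif position
    (a hypothetical layout chosen for this sketch).
    """
    instance = []
    for column in pssm:
        residues = list(column.keys())
        weights = list(column.values())
        # Pick one residue for this position according to its column weights.
        instance.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(instance)

def random_sequence_with_motif(length, pssm, rng, alphabet="ACGT"):
    """Build a uniform random background and implant one motif instance."""
    motif = sample_motif_from_pssm(pssm, rng)
    if len(motif) > length:
        raise ValueError("motif is longer than the requested sequence")
    background = [rng.choice(alphabet) for _ in range(length)]
    # Choose a start so that the whole motif fits inside the sequence.
    start = rng.randrange(length - len(motif) + 1)
    background[start:start + len(motif)] = motif
    return "".join(background), start, motif

# Hypothetical 4-position DNA PSSM with a strong "TATA"-like consensus.
EXAMPLE_PSSM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    seq, start, motif = random_sequence_with_motif(30, EXAMPLE_PSSM, rng)
    print(seq, start, motif)
```

    Recording the implant position alongside each sequence gives a ground truth against which a motif finder's predictions can be scored.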

  12. rMotifGen: random motif generator for DNA and protein sequences

    PubMed Central

    Rouchka, Eric C; Hardin, C Timothy

    2007-01-01

    Background: Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains such as transcription factor and DNA binding sites as well as conserved protein domains. In order to help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed a computational tool, rMotifGen, with the sole purpose of generating a number of random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are created according to user-defined parameters and substitution matrices. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms. Results: Two implementations of rMotifGen have been created, one providing a graphical user interface (GUI) for random motif construction, and the other serving as a command line interface. The second implementation has the added advantages of platform independence and being able to be called in a batch mode. rMotifGen was used to construct sample sets of sequences containing DNA motifs and amino acid motifs that were then tested against the Gibbs sampler and MEME packages. Conclusion: rMotifGen provides an efficient and convenient method for creating random DNA or amino acid sequences with a variable number of motifs, where the instance of each motif can be incorporated using a position-specific scoring matrix (PSSM) or by creating an instance mutated from its corresponding consensus using an evolutionary model based on substitution matrices. rMotifGen is freely available at: http://bioinformatics.louisville.edu/brg/rMotifGen/. PMID:17683637

  13. BamView: visualizing and interpretation of next-generation sequencing read alignments

    PubMed Central

    Harris, Simon R.; Otto, Thomas D.; Berriman, Matthew; Parkhill, Julian; McQuillan, Jacqueline A.

    2013-01-01

    So-called next-generation sequencing (NGS) has provided the ability to sequence on a massive scale at low cost, enabling biologists to perform powerful experiments and gain insight into biological processes. BamView has been developed to visualize and analyse sequence reads from NGS platforms, which have been aligned to a reference sequence. It is a desktop application for browsing the aligned or mapped reads [Ruffalo, M, LaFramboise, T, Koyutürk, M. Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics 2011;27:2790–6] at different levels of magnification, from nucleotide level, where the base qualities can be seen, to genome or chromosome level where overall coverage is shown. To enable in-depth investigation of NGS data, various views are provided that can be configured to highlight interesting aspects of the data. Multiple read alignment files can be overlaid to compare results from different experiments, and filters can be applied to facilitate the interpretation of the aligned reads. As well as being a standalone application, it can be used as an integrated part of the Artemis genome browser. BamView allows the user to study NGS data in the context of the sequence and annotation of the reference genome. Single nucleotide polymorphism (SNP) density and candidate SNP sites can be highlighted and investigated, and read-pair information can be used to discover large structural insertions and deletions. The application will also calculate simple analyses of the read mapping, including reporting the read counts and reads per kilobase per million mapped reads (RPKM) for genes selected by the user. Availability: BamView and Artemis are freely available software. These can be downloaded from their home pages: http://bamview.sourceforge.net/; http://www.sanger.ac.uk/resources/software/artemis/. Requirements: Java 1.6 or higher. PMID:22253280
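
    The RPKM measure named in the abstract above follows directly from its expansion: reads mapped to a gene, divided by the gene length in kilobases and by the total mapped reads in millions. The short Python sketch below uses that standard definition; it is not taken from BamView, and the function name and example numbers are illustrative assumptions.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads.

    RPKM = read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

if __name__ == "__main__":
    # Hypothetical gene: 500 reads on a 2,000 bp gene, 10 million mapped reads.
    print(rpkm(500, 2_000, 10_000_000))  # 25.0
```

    Normalising by both gene length and library size lets counts be compared across genes of different lengths and across runs of different depth.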

  14. Drug resistance analysis by next generation sequencing in Leishmania

    PubMed Central

    Leprohon, Philippe; Fernandez-Prada, Christopher; Gazanion, Élodie; Monte-Neto, Rubens; Ouellette, Marc

    2014-01-01

    The use of next generation sequencing has the power to expedite the identification of drug resistance determinants and biomarkers and was applied successfully to drug resistance studies in Leishmania. This allowed the identification of modulation in gene expression, gene dosage alterations, changes in chromosome copy numbers and single nucleotide polymorphisms that correlated with resistance in Leishmania strains derived from the laboratory and from the field. An impressive heterogeneity at the population level was also observed, individual clones within populations often differing in both genotypes and phenotypes, hence complicating the elucidation of resistance mechanisms. This review summarizes the most recent highlights that whole genome sequencing brought to our understanding of Leishmania drug resistance and likely new directions. PMID:25941624

  15. Next generation sequencing--implications for clinical practice.

    PubMed

    Raffan, Eleanor; Semple, Robert K

    2011-01-01

    Genetic testing in inherited disease has traditionally relied upon recognition of the presenting clinical syndrome and targeted analysis of genes known to be linked to that syndrome. Consequently, many patients with genetic syndromes remain without a specific diagnosis. New 'next-generation' sequencing (NGS) techniques permit simultaneous sequencing of enormous amounts of DNA. A slew of research publications have recently demonstrated the tremendous power of these technologies in increasing understanding of human genetic disease. These approaches are likely to be increasingly employed in routine diagnostic practice, but the scale of the genetic information yielded about individuals means that caution must be exercised to avoid net harm in this setting. Use of NGS in a research setting will increasingly have a major but indirect beneficial impact on clinical practice. However, important technical, ethical and social challenges need to be addressed through informed professional and public dialogue before it finds its mature niche as a direct tool in the clinical diagnostic armoury.

  16. Improved timing sequence generator on the DIII-D tokamak

    NASA Astrophysics Data System (ADS)

    Colio, R. A.; Finkenthal, D. F.; Deterly, T. M.

    2011-10-01

    The DIII-D tokamak uses a central clock source and trigger system to synchronize plant operations and diagnostics. The system uses a bi-phase encoding technique to send both clock and trigger signals to remote receivers, and supports both pre-programmed sequences of triggers as well as event-driven triggers. A 1 MHz timebase is used and triggers are encoded as eight-bit hexadecimal words. Currently, the system relies on a cascaded series of CAMAC-based delay generators to produce the trigger sequence. We present a modern and more versatile implementation based on a single FPGA (field programmable gate array) capable of providing clock rates upward of 100 MHz while maintaining compatibility with existing equipment. A proposal for system clock synchronization with GPS for improved precision is also presented. Work supported in part by US DOE under DE-FC02-04ER54698 and the National Undergraduate Fellowship in Fusion Science and Engineering.

  17. Actionable Diagnosis of Neuroleptospirosis by Next-Generation Sequencing

    PubMed Central

    Wilson, Michael R.; Naccache, Samia N.; Samayoa, Erik; Biagtan, Mark; Bashir, Hiba; Yu, Guixia; Salamat, Shahriar M.; Somasekar, Sneha; Federman, Scot; Miller, Steve; Sokolic, Robert; Garabedian, Elizabeth; Candotti, Fabio; Buckley, Rebecca H.; Reed, Kurt D.; Meyer, Teresa L.; Seroogy, Christine M.; Galloway, Renee; Henderson, Sheryl L.; Gern, James E.; DeRisi, Joseph L.; Chiu, Charles Y.

    2014-01-01

    SUMMARY A 14-year-old boy with severe combined immunodeficiency presented three times to a medical facility over a period of 4 months with fever and headache that progressed to hydrocephalus and status epilepticus necessitating a medically induced coma. Diagnostic workup including brain biopsy was unrevealing. Unbiased next-generation sequencing of the cerebrospinal fluid identified 475 of 3,063,784 sequence reads (0.016%) corresponding to leptospira infection. Clinical assays for leptospirosis were negative. Targeted antimicrobial agents were administered, and the patient was discharged home 32 days later with a status close to his premorbid condition. Polymerase-chain-reaction (PCR) and serologic testing at the Centers for Disease Control and Prevention (CDC) subsequently confirmed evidence of Leptospira santarosai infection. PMID:24896819
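
    The read fraction quoted in the summary above can be checked with a few lines of Python. The function name is an assumption for illustration; the two counts are the ones reported in the summary.

```python
def percent_of_reads(matching_reads, total_reads):
    """Return the share of reads matching a target, as a percentage."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * matching_reads / total_reads

if __name__ == "__main__":
    # 475 leptospira reads out of 3,063,784 total reads, as reported above.
    print(f"{percent_of_reads(475, 3_063_784):.3f}%")  # 0.016%
```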

  18. Relationship between peptide amino acid sequence and membrane curvature generation

    NASA Astrophysics Data System (ADS)

    Schmidt, Nathan; Kuo, David; Hwee Lai, Ghee; Mishra, Abhijit; Wong, Gerard

    2012-02-01

    Amphipathic peptides and amphipathic domains in proteins can perturb and restructure biological membranes. For example, it is believed that the cationic, amphipathic motif found in membrane active antimicrobial peptides (AMPs) is responsible for their membrane disruption mechanisms of action. And ApoA-I, the main apolipoprotein in high density lipoprotein contains a series of amphipathic α-helical repeats which are responsible for its lipid associating properties. We use small angle x-ray scattering (SAXS) to investigate the interaction of model cell membranes with prototypical AMPs and consensus peptides derived from the helical structural motif of ApoA-I. The relationship between peptide sequence and the peptide-induced changes in membrane curvature and topology is examined. By comparing the membrane rearrangement and corresponding phase behavior induced by these two distinct classes of membrane restructuring peptides we will discuss the role of amino acid sequence on membrane curvature generation.

  19. Second-generation environmental sequencing unmasks marine metazoan biodiversity

    PubMed Central

    Fonseca, Vera G.; Carvalho, Gary R.; Sung, Way; Johnson, Harriet F.; Power, Deborah M.; Neill, Simon P.; Packer, Margaret; Blaxter, Mark L.; Lambshead, P. John D.; Thomas, W. Kelley; Creer, Simon

    2010-01-01

    Biodiversity is of crucial importance for ecosystem functioning, sustainability and resilience, but the magnitude and organization of marine diversity at a range of spatial and taxonomic scales are undefined. In this paper, we use second-generation sequencing to unmask putatively diverse marine metazoan biodiversity in a Scottish temperate benthic ecosystem. We show that remarkable differences in diversity occurred at microgeographical scales and refute currently accepted ecological and taxonomic paradigms of meiofaunal identity, rank abundance and concomitant understanding of trophic dynamics. Richness estimates from the current benchmarked Operational Clustering of Taxonomic Units from Parallel UltraSequencing analyses are broadly aligned with those derived from morphological assessments. However, the slope of taxon rarefaction curves for many phyla remains incomplete, suggesting that the true alpha diversity is likely to exceed current perceptions. The approaches provide a rapid, objective and cost-effective taxonomic framework for exploring links between ecosystem structure and function of all hitherto intractable, but ecologically important, communities. PMID:20981026

  20. Using next generation transcriptome sequencing to predict an ectomycorrhizal metabolome.

    SciTech Connect

    Larsen, P. E.; Sreedasyam, A.; Trivedi, G; Podila, G. K.; Cseke, L. J.; Collart, F. R.

    2011-05-13

    Mycorrhizae, symbiotic interactions between soil fungi and tree roots, are ubiquitous in terrestrial ecosystems. The fungi contribute phosphorus, nitrogen and mobilized nutrients from organic matter in the soil and in return the fungus receives photosynthetically-derived carbohydrates. This union of plant and fungal metabolisms is the mycorrhizal metabolome. Understanding this symbiotic relationship at a molecular level provides important contributions to the understanding of forest ecosystems and global carbon cycling. We generated next generation short-read transcriptomic sequencing data from fully-formed ectomycorrhizae between Laccaria bicolor and aspen (Populus tremuloides) roots. The transcriptomic data was used to identify statistically significantly expressed gene models using a bootstrap-style approach, and these expressed genes were mapped to specific metabolic pathways. Integration of expressed genes that code for metabolic enzymes and the set of expressed membrane transporters generates a predictive model of the ectomycorrhizal metabolome. The generated model of mycorrhizal metabolome predicts that the specific compounds glycine, glutamate, and allantoin are synthesized by L. bicolor and that these compounds or their metabolites may be used for the benefit of aspen in exchange for the photosynthetically-derived sugars fructose and glucose. The analysis illustrates an approach to generate testable biological hypotheses to investigate the complex molecular interactions that drive ectomycorrhizal symbiosis. These models are consistent with experimental environmental data and provide insight into the molecular exchange processes for organisms in this complex ecosystem. The method used here for predicting metabolomic models of mycorrhizal systems from deep RNA sequencing data can be generalized and is broadly applicable to transcriptomic data derived from complex systems.

  1. Using next generation transcriptome sequencing to predict an ectomycorrhizal metabolome

    PubMed Central

    2011-01-01

    Background: Mycorrhizae, symbiotic interactions between soil fungi and tree roots, are ubiquitous in terrestrial ecosystems. The fungi contribute phosphorus, nitrogen and mobilized nutrients from organic matter in the soil and in return the fungus receives photosynthetically-derived carbohydrates. This union of plant and fungal metabolisms is the mycorrhizal metabolome. Understanding this symbiotic relationship at a molecular level provides important contributions to the understanding of forest ecosystems and global carbon cycling. Results: We generated next generation short-read transcriptomic sequencing data from fully-formed ectomycorrhizae between Laccaria bicolor and aspen (Populus tremuloides) roots. The transcriptomic data was used to identify statistically significantly expressed gene models using a bootstrap-style approach, and these expressed genes were mapped to specific metabolic pathways. Integration of expressed genes that code for metabolic enzymes and the set of expressed membrane transporters generates a predictive model of the ectomycorrhizal metabolome. The generated model of mycorrhizal metabolome predicts that the specific compounds glycine, glutamate, and allantoin are synthesized by L. bicolor and that these compounds or their metabolites may be used for the benefit of aspen in exchange for the photosynthetically-derived sugars fructose and glucose. Conclusions: The analysis illustrates an approach to generate testable biological hypotheses to investigate the complex molecular interactions that drive ectomycorrhizal symbiosis. These models are consistent with experimental environmental data and provide insight into the molecular exchange processes for organisms in this complex ecosystem. The method used here for predicting metabolomic models of mycorrhizal systems from deep RNA sequencing data can be generalized and is broadly applicable to transcriptomic data derived from complex systems. PMID:21569493

  2. Next-generation sequencing technology in clinical virology.

    PubMed

    Capobianchi, M R; Giombini, E; Rozera, G

    2013-01-01

    Recent advances in nucleic acid sequencing technologies, referred to as 'next-generation' sequencing (NGS), have produced a true revolution and opened new perspectives for research and diagnostic applications, owing to the high speed and throughput of data generation. So far, NGS has been applied to metagenomics-based strategies for the discovery of novel viruses and the characterization of viral communities. Additional applications include whole viral genome sequencing, detection of viral genome variability, and the study of viral dynamics. These applications are particularly suitable for viruses such as human immunodeficiency virus, hepatitis B virus, and hepatitis C virus, whose error-prone replication machinery, combined with the high replication rate, results, in each infected individual, in the formation of many genetically related viral variants referred to as quasi-species. The viral quasi-species, in turn, represents the substrate for the selective pressure exerted by the immune system or by antiviral drugs. With traditional approaches, it is difficult to detect and quantify minority genomes present in viral quasi-species that, in fact, may have biological and clinical relevance. NGS provides, for each patient, a dataset of clonal sequences that is some order of magnitude higher than those obtained with conventional approaches. Hence, NGS is an extremely powerful tool with which to investigate previously inaccessible aspects of viral dynamics, such as the contribution of different viral reservoirs to replicating virus in the course of the natural history of the infection, co-receptor usage in minority viral populations harboured by different cell lineages, the dynamics of development of drug resistance, and the re-emergence of hidden genomes after treatment interruptions. The diagnostic application of NGS is just around the corner.

  3. Four methods of preparing mRNA 5' end libraries using the Illumina sequencing platform.

    PubMed

    Machida, Ryuji J; Lin, Ya-Ying

    2014-01-01

    The 5' untranslated regions of mRNA play an important role in their translation. Here, we describe the development of four methods of profiling mRNA 5' ends using the Illumina sequencing platform; the first method utilizes SMART (Switching Mechanism At 5' end of RNA Transcript) technology, while the second involves replacing the 5' cap structure with RNA oligomers via ligation. The third and fourth methods are modifications of SMART, and involve enriching mRNA molecules with (nuclear transcripts) and without (mitochondrial transcripts) 5' end cap structures, respectively. Libraries prepared using SMART technology gave more reproducible results, but the ligation method was advantageous in that it only sequenced mRNAs with a cap structure at the 5' end. These methods are suitable for global mapping of mRNA 5' ends, both with and without cap structures, at a single molecule resolution. In addition, comparison of the present results obtained using different methods revealed the presence of abundant messenger RNAs without a cap structure.

  4. Four Methods of Preparing mRNA 5′ End Libraries Using the Illumina Sequencing Platform

    PubMed Central

    Machida, Ryuji J.; Lin, Ya-Ying

    2014-01-01

    Background: The 5′ untranslated regions of mRNA play an important role in their translation. Results: Here, we describe the development of four methods of profiling mRNA 5′ ends using the Illumina sequencing platform; the first method utilizes SMART (Switching Mechanism At 5′ end of RNA Transcript) technology, while the second involves replacing the 5′ cap structure with RNA oligomers via ligation. The third and fourth methods are modifications of SMART, and involve enriching mRNA molecules with (nuclear transcripts) and without (mitochondrial transcripts) 5′ end cap structures, respectively. Libraries prepared using SMART technology gave more reproducible results, but the ligation method was advantageous in that it only sequenced mRNAs with a cap structure at the 5′ end. Conclusions: These methods are suitable for global mapping of mRNA 5′ ends, both with and without cap structures, at a single molecule resolution. In addition, comparison of the present results obtained using different methods revealed the presence of abundant messenger RNAs without a cap structure. PMID:25003736

  5. Preliminary Sequence stratigraphy framework of the SW part of the Actopan Platform, Lower Cretaceous, Hidalgo, Mexico

    NASA Astrophysics Data System (ADS)

    Abascal, G.; Murillo-Muñeton, G.

    2013-05-01

    The oldest sedimentary rocks in what is known as the Actopan Platform, in the State of Hidalgo, Mexico, are superbly exposed toward the southwestern part of the platform. A detailed stratigraphic/sedimentologic study was carried out on a 623 m-thick section, with the aim of establishing a sequence stratigraphic framework. The base of the section consists of a Lower Cretaceous 6223-m thick, mixed siliciclastic-carbonate sedimentary succession that has been named the Santuario Formation. The terrigenous facies of this unit correspond to red beds that consist of shales, sandstones and a few conglomerates deposited under continental (fluvial) conditions. White and yellowish sandstones, possibly deposited by deltaic systems, occur in minor amounts. A tuff layer is found in its lower part. The carbonate facies of the Santuario Formation consist mainly of bioclastic-peloidal skeletal mudstones/wackestones and subordinate quantities of sandy dolostones, skeletal packstones/grainstones and rudist (requeniid) boundstones. The middle and upper parts of the studied stratigraphic section correspond to an essentially carbonate succession that is known as the El Abra Formation. This unit comprises the following facies: skeletal mudstones/wackestones, skeletal packstones/grainstones, and minor rudist (requeniid and Chondrodonta) boundstones and cryptalgal laminites deposited in shallow subtidal lagoon to tidal flat conditions. At this location, a "Middle" Cretaceous age (Albian-Cenomanian) has been assigned to the El Abra Formation. However, the common presence of the benthic foraminifer Choffatella decipiens Schlumberger in these facies indicates that their age extends, at least, to the Lower Cretaceous (Barremian). This age was confirmed by dating zircons from the tuff deposited at the base of the section. The carbonate facies of the Santuario Formation stack forming fifth-order subtidal cycles or parasequences. While the carbonate facies of the El Abra Formation also stack

  6. Ultrasensitive single-genome sequencing: accurate, targeted, next generation sequencing of HIV-1 RNA.

    PubMed

    Boltz, Valerie F; Rausch, Jason; Shao, Wei; Hattori, Junko; Luke, Brian; Maldarelli, Frank; Mellors, John W; Kearney, Mary F; Coffin, John M

    2016-12-20

    Although next generation sequencing (NGS) offers the potential for studying virus populations in unprecedented depth, PCR error, amplification bias and recombination during library construction have limited its use to population sequencing and measurements of unlinked allele frequencies. Here we report a method, termed ultrasensitive Single-Genome Sequencing (uSGS), for NGS library construction and analysis that eliminates PCR errors and recombinants, and generates single-genome sequences of the same quality as the "gold-standard" of HIV-1 single-genome sequencing assay but with more than 100-fold greater depth. Primer ID tagged cDNA was synthesized from mixtures of cloned BH10 wild-type and mutant HIV-1 transcripts containing ten drug resistance mutations. First, the resultant cDNA was divided and NGS libraries were generated in parallel using two methods: uSGS and a method applying long PCR primers to attach the NGS adaptors (LP-PCR-1). Second, cDNA was divided and NGS libraries were generated in parallel comparing 3 methods: uSGS and 2 methods adapted from more recent reports using variations of the long PCR primers to attach the adaptors (LP-PCR-2 and LP-PCR-3). Consistently, the uSGS method amplified a greater proportion of cDNAs, averaging 30% compared to 13% for LP-PCR-1, 21% for LP-PCR-2 and 14% for LP-PCR-3. Most importantly, when the uSGS sequences were binned according to their primer IDs, 94% of the bins did not contain PCR recombinant sequences versus only 55, 75 and 65% for LP-PCR-1, 2 and 3, respectively. Finally, when uSGS was applied to plasma samples from HIV-1 infected donors, both frequent and rare variants were detected in each sample and neighbor-joining trees revealed clusters of genomes driven by the linkage of these mutations, showing the lack of PCR recombinants in the datasets. The uSGS assay can be used for accurate detection of rare variants and for identifying linkage of rare alleles associated with HIV-1 drug resistance. 
In addition
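
    The primer ID binning step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes reads arrive as (primer_id, sequence) pairs that are already aligned and of equal length, and it takes a per-position majority vote within each bin. The function names, the `min_reads` cutoff, and the majority rule are illustrative assumptions, not the uSGS authors' pipeline.

```python
from collections import Counter, defaultdict

def bin_by_primer_id(reads):
    """Group (primer_id, sequence) pairs into bins keyed by primer ID."""
    bins = defaultdict(list)
    for primer_id, seq in reads:
        bins[primer_id].append(seq)
    return dict(bins)

def consensus(seqs):
    """Majority base at each position; assumes equal-length, aligned reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def single_genome_sequences(reads, min_reads=3):
    """Return one consensus per primer ID bin holding at least min_reads reads."""
    return {
        pid: consensus(seqs)
        for pid, seqs in bin_by_primer_id(reads).items()
        if len(seqs) >= min_reads
    }

# Hypothetical toy input: three reads share one primer ID, one read stands alone.
demo_reads = [
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGAAC"),  # one PCR error at position 4, outvoted by the other two
    ("CCTA", "ACGTTC"),  # singleton bin, dropped by the min_reads cutoff
]
demo_result = single_genome_sequences(demo_reads)
```

    Collapsing each bin to a consensus is what lets sporadic PCR errors drop out; detecting recombinant reads inside a bin would need an additional check that is not shown here.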

  7. Complete plastid genome sequence of Vaccinium macrocarpon: structure, gene content and rearrangements revealed by next generation sequencing

    USDA-ARS's Scientific Manuscript database

    The complete plastid genome sequence of the American cranberry was reconstructed using next-generation sequencing data by in silico procedures. We used Roche 454 shotgun sequence data to isolate cranberry plastid-specific sequences of the cultivar ‘HyRed’ via homology comparisons with complete seque...

  8. A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance

    PubMed Central

    Ahrenfeldt, Johanne; Cisneros, Jose Luis Bellod; Jurtz, Vanessa; Larsen, Mette Voldby; Hasman, Henrik; Aarestrup, Frank Møller; Lund, Ole

    2016-01-01

    Recent advances in whole genome sequencing have made the technology available for routine use in microbiological laboratories. However, a major obstacle for using this technology is the limited availability of simple and automatic bioinformatics tools. Based on previously published and already available web-based tools we developed a single pipeline for batch uploading of whole genome sequencing data from multiple bacterial isolates. The pipeline will automatically identify the bacterial species and, if applicable, assemble the genome, identify the multilocus sequence type, plasmids, virulence genes and antimicrobial resistance genes. A short printable report for each sample will be provided and an Excel spreadsheet containing all the metadata and a summary of the results for all submitted samples can be downloaded. The pipeline was benchmarked using datasets previously used to test the individual services. The reported results enable a rapid overview of the major results, and a comparison with the previously found results showed that the platform is reliable and able to correctly predict the species and find most of the expected genes automatically. In conclusion, a combined bioinformatics platform was developed and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance as a guide for investigators using whole genome sequencing for clinical diagnostics and surveillance. The platform is freely available at: https://cge.cbs.dtu.dk/services/CGEpipeline-1.1 and it is the intention that it will continue to be expanded with new features as these become available. PMID:27327771

  9. A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance.

    PubMed

    Thomsen, Martin Christen Frølund; Ahrenfeldt, Johanne; Cisneros, Jose Luis Bellod; Jurtz, Vanessa; Larsen, Mette Voldby; Hasman, Henrik; Aarestrup, Frank Møller; Lund, Ole

    2016-01-01

    Recent advances in whole genome sequencing have made the technology available for routine use in microbiological laboratories. However, a major obstacle for using this technology is the limited availability of simple and automatic bioinformatics tools. Based on previously published and already available web-based tools we developed a single pipeline for batch uploading of whole genome sequencing data from multiple bacterial isolates. The pipeline will automatically identify the bacterial species and, if applicable, assemble the genome, identify the multilocus sequence type, plasmids, virulence genes and antimicrobial resistance genes. A short printable report for each sample will be provided and an Excel spreadsheet containing all the metadata and a summary of the results for all submitted samples can be downloaded. The pipeline was benchmarked using datasets previously used to test the individual services. The reported results enable a rapid overview of the major results, and a comparison with the previously found results showed that the platform is reliable and able to correctly predict the species and find most of the expected genes automatically. In conclusion, a combined bioinformatics platform was developed and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance as a guide for investigators using whole genome sequencing for clinical diagnostics and surveillance. The platform is freely available at: https://cge.cbs.dtu.dk/services/CGEpipeline-1.1 and it is the intention that it will continue to be expanded with new features as these become available.

  10. The next generation of target capture technologies - large DNA fragment enrichment and sequencing determines regional genomic variation of high complexity.

    PubMed

    Dapprich, Johannes; Ferriola, Deborah; Mackiewicz, Kate; Clark, Peter M; Rappaport, Eric; D'Arcy, Monica; Sasson, Ariella; Gai, Xiaowu; Schug, Jonathan; Kaestner, Klaus H; Monos, Dimitri

    2016-07-09

    The ability to capture and sequence large contiguous DNA fragments represents a significant advancement towards the comprehensive characterization of complex genomic regions. While emerging sequencing platforms are capable of producing several kilobases-long reads, the fragment sizes generated by current DNA target enrichment technologies remain a limiting factor, producing DNA fragments generally shorter than 1 kbp. The DNA enrichment methodology described herein, Region-Specific Extraction (RSE), produces DNA segments in excess of 20 kbp in length. Coupling this enrichment method to appropriate sequencing platforms will significantly enhance the ability to generate complete and accurate sequence characterization of any genomic region without the need for reference-based assembly. RSE is a long-range DNA target capture methodology that relies on the specific hybridization of short (20-25 base) oligonucleotide primers to selected sequence motifs within the DNA target region. These capture primers are then enzymatically extended on the 3'-end, incorporating biotinylated nucleotides into the DNA. Streptavidin-coated beads are subsequently used to pull-down the original, long DNA template molecules via the newly synthesized, biotinylated DNA that is bound to them. We demonstrate the accuracy, simplicity and utility of the RSE method by capturing and sequencing a 4 Mbp stretch of the major histocompatibility complex (MHC). Our results show an average depth of coverage of 164X for the entire MHC. This depth of coverage contributes significantly to a 99.94 % total coverage of the targeted region and to an accuracy that is over 99.99 %. RSE represents a cost-effective target enrichment method capable of producing sequencing templates in excess of 20 kbp in length. The utility of our method has been proven to generate superior coverage across the MHC as compared to other commercially available methodologies, with the added advantage of producing longer sequencing
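
    The two coverage figures quoted in the abstract above (average depth and percentage of the target covered) can be illustrated with a small sketch. It assumes aligned reads are given as half-open (start, end) intervals on a single target region; the function name and the toy intervals are hypothetical and do not come from the RSE study.

```python
def coverage_stats(intervals, region_start, region_end):
    """Mean depth and covered fraction over the half-open region [start, end)."""
    length = region_end - region_start
    depth = [0] * length
    for start, end in intervals:
        # Clip each aligned interval to the target region before counting.
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos - region_start] += 1
    mean_depth = sum(depth) / length
    breadth = sum(1 for d in depth if d > 0) / length
    return mean_depth, breadth

# Hypothetical toy alignment: three overlapping reads on a 12-base target.
demo_mean_depth, demo_breadth = coverage_stats([(0, 6), (2, 8), (4, 10)], 0, 12)
```

    A per-base list is fine for a toy region; for a target of several megabases a sorted-interval sweep or an array library would be the more practical choice.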

  11. The Antibody Genetics of Multiple Sclerosis: Comparing Next-Generation Sequencing to Sanger Sequencing

    PubMed Central

    Rounds, William H.; Ligocki, Ann J.; Levin, Mikhail K.; Greenberg, Benjamin M.; Bigwood, Douglas W.; Eastman, Eric M.; Cowell, Lindsay G.; Monson, Nancy L.

    2014-01-01

    We previously identified a distinct mutation pattern in the antibody genes of B cells isolated from cerebrospinal fluid (CSF) that can identify patients who have relapsing-remitting multiple sclerosis (RRMS) and patients with clinically isolated syndromes who will convert to RRMS. This antibody gene signature (AGS) was developed using Sanger sequencing of single B cells. While potentially helpful to patients, Sanger sequencing is not an assay that can be practically deployed in clinical settings. In order to provide AGS evaluations to patients as part of their diagnostic workup, we developed protocols to generate AGS scores using next-generation DNA sequencing (NGS) on CSF-derived cell pellets without the need to isolate single cells. This approach has the potential to increase the coverage of the B-cell population being analyzed, reduce the time needed to generate AGS scores, and may improve the overall performance of the AGS approach as a diagnostic test in the future. However, no investigations have focused on whether NGS-based repertoires will properly reflect antibody gene frequencies and somatic hypermutation patterns defined by Sanger sequencing. To address this issue, we isolated paired CSF samples from eight patients who either had MS or were at risk to develop MS. Here, we present data that antibody gene frequencies and somatic hypermutation patterns are similar in Sanger and NGS-based antibody repertoires from these paired CSF samples. In addition, AGS scores derived from the NGS database correctly identified the patients who initially had or subsequently converted to RRMS, with precision similar to that of the Sanger sequencing approach. Further investigation of the utility of the AGS in predicting conversion to MS using NGS-derived antibody repertoires in a larger cohort of patients is warranted. PMID:25278930

  12. Generation of Weibull distribution clutter based on correlated Gaussian sequence

    NASA Astrophysics Data System (ADS)

    Wang, Bin; Xin, Fengming

    2017-08-01

    With the continuous development of science and technology, the electromagnetic environment is becoming more complex. Accurate clutter modeling is becoming increasingly difficult, which adversely affects echo analysis. In this paper, in order to overcome electromagnetic interference, we use a correlated Gaussian sequence to generate Weibull-distributed clutter. Simulation results show that the estimates produced by the proposed method are close to the theoretical values in terms of probability density and power spectral density, which demonstrates the validity of our method. Finally, conclusions are given.
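
    One common way to turn a correlated Gaussian sequence into Weibull-distributed samples is a memoryless transform: map each Gaussian value through the normal CDF, then through the inverse Weibull CDF. The sketch below uses that approach with a first-order autoregressive Gaussian source. Both choices are assumptions made for illustration; the abstract does not state which correlation model or transform the authors used.

```python
import math
import random
from statistics import NormalDist

def correlated_gaussian(n, rho, rng):
    """Unit-variance AR(1) Gaussian sequence with lag-1 correlation rho."""
    x = rng.gauss(0.0, 1.0)
    out = [x]
    innovation_sd = math.sqrt(1.0 - rho * rho)
    for _ in range(n - 1):
        x = rho * x + innovation_sd * rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def weibull_clutter(n, shape, scale, rho, seed=0):
    """Weibull(shape, scale) samples driven by a correlated Gaussian sequence."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    samples = []
    for g in correlated_gaussian(n, rho, rng):
        u = std_normal.cdf(g)                # Gaussian value -> uniform on (0, 1)
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep the logarithm below finite
        samples.append(scale * (-math.log(1.0 - u)) ** (1.0 / shape))
    return samples

demo_clutter = weibull_clutter(20000, shape=2.0, scale=1.0, rho=0.5, seed=1)
```

    The transform preserves the Weibull marginal distribution exactly, but the correlation of the output is not the same as that of the Gaussian input, so matching a target power spectral density would require pre-distorting the Gaussian correlation.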

  13. An integrated approach for analyzing clinical genomic variant data from next-generation sequencing.

    PubMed

    Crowgey, Erin L; Stabley, Deborah L; Chen, Chuming; Huang, Hongzhan; Robbins, Katherine M; Polson, Shawn W; Sol-Church, Katia; Wu, Cathy H

    2015-04-01

    Next-generation sequencing (NGS) technologies provide the potential for developing high-throughput and low-cost platforms for clinical diagnostics. A limiting factor to clinical applications of genomic NGS is downstream bioinformatics analysis for data interpretation. We have developed an integrated approach for end-to-end clinical NGS data analysis from variant detection to functional profiling. Robust bioinformatics pipelines were implemented for genome alignment, single nucleotide polymorphism (SNP), small insertion/deletion (InDel), and copy number variation (CNV) detection of whole exome sequencing (WES) data from the Illumina platform. Quality-control metrics were analyzed at each step of the pipeline by use of a validated training dataset to ensure data integrity for clinical applications. We annotate the variants with data regarding the disease population and variant impact. Custom algorithms were developed to filter variants based on criteria, such as quality of variant, inheritance pattern, and impact of variant on protein function. The developed clinical variant pipeline links the identified rare variants to Integrated Genome Viewer for visualization in a genomic context and to the Protein Information Resource's iProXpress for rich protein and disease information. With the application of our system of annotations, prioritizations, inheritance filters, and functional profiling and analysis, we have created a unique methodology for downstream variant filtering that empowers clinicians and researchers to interpret more effectively the relevance of genomic alterations within a rare genetic disease.

  14. NGS-Trex: Next Generation Sequencing Transcriptome profile explorer

    PubMed Central

    2013-01-01

    Background: Next-Generation Sequencing (NGS) technology has greatly increased the ability to sequence DNA in a massively parallel and cost-effective manner. Nevertheless, NGS data analysis requires bioinformatics skills and computational resources well beyond the means of many "wet biology" laboratories. Moreover, most projects require only a few sequencing cycles and standard tools or workflows to carry out suitable analyses for the identification and annotation of genes, transcripts and splice variants found in the biological samples under investigation. These projects can benefit from easy-to-use systems that automatically analyse sequences and mine data without requiring a strong bioinformatics background or hardware infrastructure. Results: To address this issue we developed an automatic system targeted to the analysis of NGS data obtained from large-scale transcriptome studies. This system, which we named NGS-Trex (NGS Transcriptome profile explorer), is available through a simple web interface at http://www.ngs-trex.org and allows the user to upload raw sequences and easily obtain an accurate characterization of the transcriptome profile after setting the few parameters required to tune the analysis procedure. The system is also able to assess differential expression at both gene and transcript level (i.e. splicing isoforms) by comparing the expression profiles of different samples. By using simple query forms the user can obtain lists of genes, transcripts and splice sites ranked and filtered according to several criteria. Data can be viewed as tables, text files or through a simple genome browser which helps the visual inspection of the data. Conclusions: NGS-Trex is a simple tool for RNA-Seq data analysis mainly targeted to "wet biology" researchers with limited bioinformatics skills. It offers simple data mining tools to explore transcriptome profiles of samples investigated taking advantage of NGS technologies

  15. Mapping sensorimotor sequences to word sequences: a connectionist model of language acquisition and sentence generation.

    PubMed

    Takac, Martin; Benuskova, Lubica; Knott, Alistair

    2012-11-01

    In this article we present a neural network model of sentence generation. The network has both technical and conceptual innovations. Its main technical novelty is in its semantic representations: the messages which form the input to the network are structured as sequences, so that message elements are delivered to the network one at a time. Rather than learning to linearise a static semantic representation as a sequence of words, our network rehearses a sequence of semantic signals, and learns to generate words from selected signals. Conceptually, the network's use of rehearsed sequences of semantic signals is motivated by work in embodied cognition, which posits that the structure of semantic representations has its origin in the serial structure of sensorimotor processing. The rich sequential structure of the network's semantic inputs also allows it to incorporate certain Chomskyan ideas about innate syntactic knowledge and parameter-setting, as well as a more empiricist account of the acquisition of idiomatic syntactic constructions. Copyright © 2012 Elsevier B.V. All rights reserved.

  16. Computational characterisation of cancer molecular profiles derived using next generation sequencing

    PubMed Central

    Oleksiewicz, Urszula; Tomczak, Katarzyna; Woropaj, Jakub; Markowska, Monika; Stępniak, Piotr

    2015-01-01

    Our current understanding of cancer genetics is grounded on the principle that cancer arises from a clone that has accumulated the requisite somatically acquired genetic aberrations, leading to the malignant transformation. It also results in aberrant gene and protein expression. Next generation sequencing (NGS) or deep sequencing platforms are being used to create large catalogues of changes in copy numbers, mutations, structural variations, gene fusions, gene expression, and other types of information for cancer patients. However, inferring different types of biological changes from raw reads generated using the sequencing experiments is algorithmically and computationally challenging. In this article, we outline common steps for the quality control and processing of NGS data. We highlight the importance of accurate and application-specific alignment of these reads and the methodological steps and challenges in obtaining different types of information. We comment on the importance of integrating these data and building infrastructure to analyse them. We also provide exhaustive lists of available software to obtain information and point the readers to articles comparing software for deeper insight in specialised areas. We hope that the article will guide readers in choosing the right tools for analysing oncogenomic datasets. PMID:25691827

  17. A Next-Generation Sequencing Method for Genotyping-by-Sequencing of Highly Heterozygous Autotetraploid Potato

    PubMed Central

    Uitdewilligen, Jan G. A. M. L.; Wolters, Anne-Marie A.; D’hoop, Bjorn B.; Borm, Theo J. A.; Visser, Richard G. F.; van Eck, Herman J.

    2013-01-01

    Assessment of genomic DNA sequence variation and genotype calling in autotetraploids implies the ability to distinguish among five possible alternative allele copy number states. This study demonstrates the accuracy of genotyping-by-sequencing (GBS) of a large collection of autotetraploid potato cultivars using next-generation sequencing. It is still costly to reach sufficient read depths on a genome-wide scale, across the cultivated gene pool. Therefore, we enriched cultivar-specific DNA sequencing libraries using an in-solution hybridisation method (SureSelect). This complexity reduction allowed us to confine our study to 807 target genes distributed across the genomes of 83 tetraploid cultivars and one reference (DM 1–3 511). Indexed sequencing libraries were paired-end sequenced in 7 pools of 12 samples using Illumina HiSeq2000. After filtering and processing the raw sequence data, 12.4 gigabases of high-quality sequence data were obtained, which mapped to 2.1 Mb of the potato reference genome, with a median average read depth of 63× per cultivar. We detected 129,156 sequence variants and genotyped the allele copy number of each variant for every cultivar. In this cultivar panel a variant density of 1 SNP/24 bp in exons and 1 SNP/15 bp in introns was obtained. The average minor allele frequency (MAF) of a variant was 0.14. Potato germplasm displayed a large number of relatively rare variants and/or haplotypes, with 61% of the variants having a MAF below 0.05. A very high average nucleotide diversity (π = 0.0107) was observed. Nucleotide diversity varied among potato chromosomes. Several genes under selection were identified. Genotyping-by-sequencing results, with allele copy number estimates, were validated with a KASP genotyping assay. This validation showed that read depths of ∼60–80× can be used as a lower boundary for reliable assessment of allele copy number of sequence variants in autotetraploids. Genotypic data were associated with traits, and
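
    The "five possible allele copy number states" in the abstract above are the dosages 0 to 4 of the alternative allele in a tetraploid. A minimal sketch of how read counts can be mapped to a dosage is a maximum-likelihood choice under a binomial read model, shown below. The binomial model, the fixed `error_rate`, and the function names are assumptions for illustration and are not taken from the study.

```python
import math

def call_dosage(alt_reads, total_reads, ploidy=4, error_rate=0.01):
    """Pick the alternative-allele dosage (0..ploidy) with the highest likelihood."""
    best_dosage, best_loglik = None, -math.inf
    for dosage in range(ploidy + 1):
        frac = dosage / ploidy
        # Expected alt-read fraction, allowing for a symmetric sequencing error rate.
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        # The binomial coefficient is the same for every dosage, so it is omitted.
        loglik = (alt_reads * math.log(p_alt)
                  + (total_reads - alt_reads) * math.log(1 - p_alt))
        if loglik > best_loglik:
            best_dosage, best_loglik = dosage, loglik
    return best_dosage

def minor_allele_frequency(dosages, ploidy=4):
    """Minor allele frequency across individuals from their called dosages."""
    freq = sum(dosages) / (ploidy * len(dosages))
    return min(freq, 1 - freq)

# Hypothetical read counts at 60x depth, one per expected dosage class.
demo_calls = [call_dosage(alt, 60) for alt in (0, 15, 30, 45, 60)]
```

    Adjacent dosage classes differ by only a quarter of the reads, which is why shallow depth makes the classes hard to separate and why the abstract reports a lower bound on read depth.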

  18. Investigation of a steam generator tube rupture sequence using VICTORIA

    SciTech Connect

    Bixler, N.E.; Erickson, C.M.; Schaperow, J.H.

    1995-12-31

    VICTORIA-92 is a mechanistic computer code for analyzing fission product behavior within the reactor coolant system (RCS) during a severe reactor accident. It provides detailed predictions of the release of radionuclides and nonradioactive materials from the core and transport of these materials within the RCS. The modeling accounts for the chemical and aerosol processes that affect radionuclide behavior. Coupling of detailed chemistry and aerosol packages is a unique feature of VICTORIA; it allows exploration of phenomena involving deposition, revaporization, and re-entrainment that cannot be resolved with other codes. The purpose of this work is to determine the attenuation of fission products in the RCS and on the secondary side of the steam generator in an accident initiated by a steam generator tube rupture (SGTR). As a class, bypass sequences have been identified in NUREG-1150 as being risk dominant for the Surry and Sequoyah pressurized water reactor (PWR) plants.

  19. Generation and functional assessment of 3D multicellular spheroids in droplet based microfluidics platform.

    PubMed

    Sabhachandani, P; Motwani, V; Cohen, N; Sarkar, S; Torchilin, V; Konry, T

    2016-02-07

    Here we describe a robust microfluidic technique to generate and analyze 3D tumor spheroids, which resemble the tumor microenvironment and can be used as a more effective preclinical drug testing and screening model. Monodisperse cell-laden alginate droplets were generated in polydimethylsiloxane (PDMS) microfluidic devices that combine T-junction droplet generation and external gelation for spheroid formation. The proposed approach has the capability to incorporate multiple cell types. For the purposes of our study, we generated spheroids with breast cancer cell lines (MCF-7 drug sensitive and resistant) and co-culture spheroids of MCF-7 together with a fibroblast cell line (HS-5). The device has the capability to house 1000 spheroids on chip for drug screening and other functional analysis. Cellular viability of spheroids in the array part of the device was maintained for two weeks by continuous perfusion of complete media into the device. The functional performance of our 3D tumor models and the dose-dependent response to a standard chemotherapeutic drug, doxorubicin (Dox), and to a standard drug combination, Dox and paclitaxel (PCT), were analyzed on our chip-based platform. Altogether, our work provides a simple and novel in vitro platform to generate, image and analyze uniform, 3D monodisperse alginate hydrogel tumors for various omic studies and therapeutic efficiency screening, an important translational step before in vivo studies.

  20. Integrated platform for optimized solar PV system design and engineering plan set generation

    SciTech Connect

    Adeyemo, Samuel

    2015-12-30

    The Aurora team has developed software that allows users to quickly generate a three-dimensional model for a building, with a corresponding irradiance map, from any two-dimensional image with associated geo-coordinates. The purpose of this project is to build upon that technology by developing and distributing to solar installers a software platform that automatically retrieves engineering, financial and geographic data for a specific site, and quickly generates an optimal customer proposal and corresponding engineering plans for that site. At the end of the project, Aurora’s optimization platform would have been used to make at least one thousand proposals from at least ten unique solar installation companies, two of whom would sign economically viable contracts to use the software. Furthermore, Aurora’s algorithms would be tested to show that in at least seventy percent of cases, Aurora automatically generated a design equivalent to or better than what a human could have done manually. A ‘better’ design is one that generates more energy for the same cost, or that generates a higher return on investment, while complying with all site-specific aesthetic, electrical and spatial requirements.

  1. Detection of Inter-Lineage Natural Recombination in Avian Paramyxovirus Serotype 1 Using Simplified Deep Sequencing Platform.

    PubMed

    Satharasinghe, Dilan A; Murulitharan, Kavitha; Tan, Sheau W; Yeap, Swee K; Munir, Muhammad; Ideris, Aini; Omar, Abdul R

    2016-01-01

    Newcastle disease virus (NDV) is a prototype member of avian paramyxovirus serotype 1 (APMV-1), which causes severe and contagious disease in commercial poultry and wild birds. Despite extensive vaccination programs and other control measures, the disease remains endemic around the globe, especially in Asia, Africa, and the Middle East. Because APMV-1 is a single serotype, genotype II-based vaccines have remained the most acceptable means of immunization. However, evidence is emerging of vaccine failures, mainly due to the evolving nature of the virus and larger genetic gaps between vaccine and field strains of APMV-1. Most of the epidemiological and genetic characterizations of APMVs are based on conventional methods, which are prone to mask the diverse population of viruses in complex samples. In this study, we report the application of a simple, robust, and less resource-demanding methodology for the whole genome sequencing of NDV, using next-generation sequencing (NGS) on the Illumina MiSeq platform. Using this platform, we sequenced full genomes of five virulent Malaysian NDV strains collected during 2004-2013. All isolates clustered within highly prevalent lineage 5 (specifically in lineage 5a); however, a significantly greater genetic divergence was observed in isolates collected from 2004 to 2011. Interestingly, genetic characterization of one isolate collected in 2013 (IBS025/13) showed natural recombination between lineage 2 and lineage 5. As a result of the recombination, the isolate (IBS025/13) carried the nucleocapsid protein (nucleotides (nts) 55-1801) and near-complete phosphoprotein (nts 1804-3254) genes of lineage 2, whereas the surface glycoprotein (fusion, hemagglutinin-neuraminidase) and large polymerase genes were of lineage 5. Additionally, the recombinant virus has a genome size of 15,186 nts, which is characteristic of the old genotypes I-IV isolated from 1930 to 1960. Taken together, we report the occurrence of a natural recombination in circulating strains of NDV in

  2. Detection of Inter-Lineage Natural Recombination in Avian Paramyxovirus Serotype 1 Using Simplified Deep Sequencing Platform

    PubMed Central

    Satharasinghe, Dilan A.; Murulitharan, Kavitha; Tan, Sheau W.; Yeap, Swee K.; Munir, Muhammad; Ideris, Aini; Omar, Abdul R.

    2016-01-01

    Newcastle disease virus (NDV) is a prototype member of avian paramyxovirus serotype 1 (APMV-1), which causes severe and contagious disease in commercial poultry and wild birds. Despite extensive vaccination programs and other control measures, the disease remains endemic around the globe, especially in Asia, Africa, and the Middle East. Because APMV-1 is a single serotype, genotype II-based vaccines have remained the most acceptable means of immunization. However, evidence is emerging of vaccine failures, mainly due to the evolving nature of the virus and larger genetic gaps between vaccine and field strains of APMV-1. Most of the epidemiological and genetic characterizations of APMVs are based on conventional methods, which are prone to mask the diverse population of viruses in complex samples. In this study, we report the application of a simple, robust, and less resource-demanding methodology for the whole genome sequencing of NDV, using next-generation sequencing (NGS) on the Illumina MiSeq platform. Using this platform, we sequenced full genomes of five virulent Malaysian NDV strains collected during 2004–2013. All isolates clustered within highly prevalent lineage 5 (specifically in lineage 5a); however, a significantly greater genetic divergence was observed in isolates collected from 2004 to 2011. Interestingly, genetic characterization of one isolate collected in 2013 (IBS025/13) showed natural recombination between lineage 2 and lineage 5. As a result of the recombination, the isolate (IBS025/13) carried the nucleocapsid protein (nucleotides (nts) 55–1801) and near-complete phosphoprotein (nts 1804–3254) genes of lineage 2, whereas the surface glycoprotein (fusion, hemagglutinin-neuraminidase) and large polymerase genes were of lineage 5. Additionally, the recombinant virus has a genome size of 15,186 nts, which is characteristic of the old genotypes I–IV isolated from 1930 to 1960. Taken together, we report the occurrence of a natural recombination in circulating strains of

  3. A Comprehensive Platform for NGS Data Analysis

    SciTech Connect

    Kravitz, Saul

    2010-06-03

    Saul Kravitz of CLC Bio discusses the company's Genomic Workbench and how it can be used with data from next generation sequencing platforms on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  4. SRAdb: query and use public next-generation sequencing data from within R.

    PubMed

    Zhu, Yuelin; Stephens, Robert M; Meltzer, Paul S; Davis, Sean R

    2013-01-17

    The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms including Illumina (Genome Analyzer, HiSeq, MiSeq, etc.), Roche 454 GS System, Applied Biosystems SOLiD System, Helicos Heliscope, PacBio RS, and others. SRAdb is an attempt to make queries of the metadata associated with SRA submission, study, sample, experiment and run more robust and precise, and make access to sequencing data in the SRA easier. We have parsed all the SRA metadata into a SQLite database that is routinely updated and can be easily distributed. The SRAdb R/Bioconductor package then utilizes this SQLite database for querying and accessing metadata. Full text search functionality makes querying metadata very flexible and powerful. Fastq files associated with query results can be downloaded easily for local analysis. The package also includes an interface from R to a popular genome browser, the Integrative Genomics Viewer. The SRAdb Bioconductor package provides a convenient and integrated framework to query and access SRA metadata quickly and powerfully from within R.
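
    The idea of querying run metadata held in a local SQLite file can be sketched with Python's standard `sqlite3` module. The table name `run_metadata`, its columns, and the `DEMO...` accessions below are hypothetical and invented for this illustration; they are not the SRAdb schema, and the SRAdb package itself is used from R rather than Python.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical run-metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE run_metadata ("
        "run_accession TEXT PRIMARY KEY, platform TEXT, study_title TEXT)"
    )
    conn.executemany(
        "INSERT INTO run_metadata VALUES (?, ?, ?)",
        [
            ("DEMO0001", "ILLUMINA", "Potato transcriptome survey"),
            ("DEMO0002", "ILLUMINA", "Bacterial whole genome sequencing"),
            ("DEMO0003", "LS454", "Ginkgo transcriptome ESTs"),
        ],
    )
    return conn

def search_runs(conn, keyword, platform=None):
    """Return run accessions whose study title contains keyword."""
    sql = "SELECT run_accession FROM run_metadata WHERE study_title LIKE ?"
    params = [f"%{keyword}%"]
    if platform is not None:
        sql += " AND platform = ?"
        params.append(platform)
    sql += " ORDER BY run_accession"
    return [row[0] for row in conn.execute(sql, params)]

demo_conn = build_demo_db()
demo_hits = search_runs(demo_conn, "transcriptome")
demo_illumina_hits = search_runs(demo_conn, "transcriptome", platform="ILLUMINA")
```

    A plain `LIKE` match stands in for full-text search here so that the sketch runs on any SQLite build; a real full-text index would scale better on a large metadata table.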

  5. SRAdb: query and use public next-generation sequencing data from within R

    PubMed Central

    2013-01-01

    Background: The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms including Illumina (Genome Analyzer, HiSeq, MiSeq, etc.), Roche 454 GS System, Applied Biosystems SOLiD System, Helicos Heliscope, PacBio RS, and others. Results: SRAdb is an attempt to make queries of the metadata associated with SRA submission, study, sample, experiment and run more robust and precise, and make access to sequencing data in the SRA easier. We have parsed all the SRA metadata into a SQLite database that is routinely updated and can be easily distributed. The SRAdb R/Bioconductor package then utilizes this SQLite database for querying and accessing metadata. Full text search functionality makes querying metadata very flexible and powerful. Fastq files associated with query results can be downloaded easily for local analysis. The package also includes an interface from R to a popular genome browser, the Integrative Genomics Viewer. Conclusions: The SRAdb Bioconductor package provides a convenient and integrated framework to query and access SRA metadata quickly and powerfully from within R. PMID:23323543

  6. Functional genomics of a living fossil tree, Ginkgo, based on next-generation sequencing technology.

    PubMed

    Lin, Xiaohan; Zhang, Jin; Li, Ying; Luo, Hongmei; Wu, Qiong; Sun, Chao; Song, Jingyuan; Li, Xiwen; Wei, Jianhe; Lu, Aiping; Qian, Zhongzhi; Khan, Ikhlas A; Chen, Shilin

    2011-11-01

    Ginkgo biloba is a monotypic species native to China and has old, dioecious, medicinally important characteristics. The functional genes related to these characteristics have not been effectively explored due to a limited number of expressed sequence tags (ESTs) from Ginkgo. To discover novel functional genes efficiently and to understand the development of a living fossil tree, Ginkgo, we used massively parallel pyrosequencing on the Roche 454 GS FLX Titanium platform to generate 64 057 ESTs. The ESTs combined with the 21 590 Ginkgo ESTs in GenBank were assembled into 22 304 unique putative transcripts, in which 13 922 novel unique putative transcripts were identified by 454 sequencing. After being assigned to putative functions with Gene Ontology terms, a detailed view of the Ginkgo biological systems was displayed, including characterization of unique putative transcripts with homology to known key enzymes and transcription factors involved in ginkgolide/bilobalide and flavonoid biosynthetic pathways, as well as unique putative transcripts related to development, response to disease and defence. The fact that three full-length Ginkgo genes encoding key enzymes were found and cloned suggests that high-throughput sequencing technology is superior to the traditional gene-by-gene approach in the discovery of genes. Additionally, a total of 204 simple sequence repeat motifs were detected. Our study not only lays the foundations for transcriptome-led studies in biosynthetic mechanisms, but also contributes significantly to the understanding of functional genomics and development in non-model plants.

  7. The Sequencing Bead Array (SBA), a next-generation digital suspension array.

    PubMed

    Akhras, Michael S; Pettersson, Erik; Diamond, Lisa; Unemo, Magnus; Okamoto, Jennifer; Davis, Ronald W; Pourmand, Nader

    2013-01-01

    Here we describe the novel Sequencing Bead Array (SBA), a complete assay for molecular diagnostics and typing applications. SBA is a digital suspension array using Next-Generation Sequencing (NGS), to replace conventional optical readout platforms. The technology allows for reducing the number of instruments required in a laboratory setting, where the same NGS instrument could be employed from whole-genome and targeted sequencing to SBA broad-range biomarker detection and genotyping. As proof-of-concept, a model assay was designed that could distinguish ten Human Papillomavirus (HPV) genotypes associated with cervical cancer progression. SBA was used to genotype 20 cervical tumor samples and, when compared with amplicon pyrosequencing, was able to detect two additional co-infections due to increased sensitivity. We also introduce in-house software Sphix, enabling easy accessibility and interpretation of results. The technology offers a multi-parallel, rapid, robust, and scalable system that is readily adaptable for a multitude of microarray diagnostic and typing applications, e.g. genetic signatures, single nucleotide polymorphisms (SNPs), structural variations, and immunoassays. SBA has the potential to dramatically change the way we perform probe-based applications, and allow for a smooth transition towards the technology offered by genomic sequencing.

  8. Evaluation of next generation sequencing for the analysis of Eimeria communities in wildlife.

    PubMed

    Vermeulen, Elke T; Lott, Matthew J; Eldridge, Mark D B; Power, Michelle L

    2016-05-01

    Next-generation sequencing (NGS) techniques are well-established for studying bacterial communities but not yet for microbial eukaryotes. Parasite communities remain poorly studied, due in part to the lack of reliable and accessible molecular methods to analyse eukaryotic communities. We aimed to develop and evaluate a methodology to analyse communities of the protozoan parasite Eimeria from populations of the Australian marsupial Petrogale penicillata (brush-tailed rock-wallaby) using NGS. An oocyst purification method for small sample sizes and polymerase chain reaction (PCR) protocol for the 18S rRNA locus targeting Eimeria was developed and optimised prior to sequencing on the Illumina MiSeq platform. A data analysis approach was developed by modifying methods from bacterial metagenomics and utilising existing Eimeria sequences in GenBank. Operational taxonomic unit (OTU) assignment at a high similarity threshold (97%) was more accurate at assigning Eimeria contigs into Eimeria OTUs but at a lower threshold (95%) there was greater resolution between OTU consensus sequences. The assessment of two amplification PCR methods prior to Illumina MiSeq, single and nested PCR, determined that single PCR was more sensitive to Eimeria as more Eimeria OTUs were detected in single amplicons. We have developed a simple and cost-effective approach to a data analysis pipeline for community analysis of eukaryotic organisms using Eimeria communities as a model. The pipeline provides a basis for evaluation using other eukaryotic organisms and potential for diverse community analysis studies.
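
    How a similarity threshold changes OTU assignment can be illustrated with a toy greedy clustering. This is a minimal sketch, not the pipeline used in the study: it assumes sequences are already aligned and of equal length, and the function names `identity` and `greedy_otus` are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)

def greedy_otus(seqs, threshold):
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes a new centroid.
    Returns a list of OTU indices, one per input sequence.
    """
    centroids = []
    assignment = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                assignment.append(i)
                break
        else:
            centroids.append(s)
            assignment.append(len(centroids) - 1)
    return assignment

def mutate(seq, n):
    """Return seq with its first n positions replaced by 'C' (toy mismatches)."""
    return "C" * n + seq[n:]

base = "A" * 100
reads = [base, mutate(base, 2), mutate(base, 4)]  # 100%, 98%, 96% identity to base

print(greedy_otus(reads, 0.97))  # the 96% read falls outside the first OTU
print(greedy_otus(reads, 0.95))  # all three reads fall into one OTU
```

    The same reads form two OTUs at 97% and one OTU at 95%, which is why the choice of threshold has to be evaluated for each target organism rather than carried over unchanged from bacterial studies.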

  9. Transcriptome Sequencing and De Novo Analysis for Ma Bamboo (Dendrocalamus latiflorus Munro) Using the Illumina Platform

    PubMed Central

    Liu, Mingying; Qiao, Guirong; Jiang, Jing; Yang, Huiqin; Xie, Lihua; Xie, Jinzhong; Zhuo, Renying

    2012-01-01

    Background Bamboo occupies an important phylogenetic node in the grass family with remarkable sizes, woodiness and a striking life history. However, limited genetic research has focused on bamboo partially because of the lack of genomic resources. The advent of high-throughput sequencing technologies enables generation of genomic resources in a short time and at a minimal cost, and therefore provides a turning point for bamboo research. In the present study, we performed de novo transcriptome sequencing for the first time to produce a comprehensive dataset for the Ma bamboo (Dendrocalamus latiflorus Munro). Results The Ma bamboo transcriptome was sequenced using the Illumina paired-end sequencing technology. We produced 15,138,726 reads and assembled them into 103,354 scaffolds. A total of 68,229 unigenes were identified, among which 46,087 were annotated in the NCBI non-redundant protein database and 28,165 were annotated in the Swiss-Prot database. Of these annotated unigenes, 11,921 and 10,147 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. We could map 45,649 unigenes onto 292 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database. The annotated unigenes were compared against Moso bamboo, rice and millet. Unigenes that did not match any of those three sequence datasets are considered to be Ma bamboo unique. We predicted 105 unigenes encoding eight key enzymes involved in lignin biosynthesis. In addition, 621 simple sequence repeats (SSRs) were detected. Conclusion Our data provide the most comprehensive transcriptomic resource currently available for D. latiflorus Munro. Candidate genes potentially involved in growth and development were identified, and those predicted to be unique to Ma bamboo are expected to give a better insight on Ma bamboo gene diversity. Numerous SSRs characterized contributed to marker development. These data constitute a new valuable resource for genomic studies

  10. Transcriptome sequencing and de novo analysis for Ma bamboo (Dendrocalamus latiflorus Munro) using the Illumina platform.

    PubMed

    Liu, Mingying; Qiao, Guirong; Jiang, Jing; Yang, Huiqin; Xie, Lihua; Xie, Jinzhong; Zhuo, Renying

    2012-01-01

    Bamboo occupies an important phylogenetic node in the grass family with remarkable sizes, woodiness and a striking life history. However, limited genetic research has focused on bamboo partially because of the lack of genomic resources. The advent of high-throughput sequencing technologies enables generation of genomic resources in a short time and at a minimal cost, and therefore provides a turning point for bamboo research. In the present study, we performed de novo transcriptome sequencing for the first time to produce a comprehensive dataset for the Ma bamboo (Dendrocalamus latiflorus Munro). The Ma bamboo transcriptome was sequenced using the Illumina paired-end sequencing technology. We produced 15,138,726 reads and assembled them into 103,354 scaffolds. A total of 68,229 unigenes were identified, among which 46,087 were annotated in the NCBI non-redundant protein database and 28,165 were annotated in the Swiss-Prot database. Of these annotated unigenes, 11,921 and 10,147 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. We could map 45,649 unigenes onto 292 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database. The annotated unigenes were compared against Moso bamboo, rice and millet. Unigenes that did not match any of those three sequence datasets are considered to be Ma bamboo unique. We predicted 105 unigenes encoding eight key enzymes involved in lignin biosynthesis. In addition, 621 simple sequence repeats (SSRs) were detected. Our data provide the most comprehensive transcriptomic resource currently available for D. latiflorus Munro. Candidate genes potentially involved in growth and development were identified, and those predicted to be unique to Ma bamboo are expected to give a better insight on Ma bamboo gene diversity. Numerous SSRs characterized contributed to marker development. These data constitute a new valuable resource for genomic studies on D. latiflorus Munro and

  11. Next generation sequencing technologies: tool to study avian virus diversity.

    PubMed

    Kapgate, S S; Barbuddhe, S B; Kumanan, K

    2015-03-01

    Increased globalisation, climatic changes and the wildlife-livestock interface have led to the emergence of novel viral pathogens or zoonoses that have become a serious concern to avian, animal and human health. High biodiversity and bird migration facilitate spread of the pathogen and provide reservoirs for emerging infectious diseases. Current classical diagnostic methods, designed to be virus-specific or limited to a group of viral agents, hinder the identification of novel viruses or viral variants. Recently developed approaches of next-generation sequencing (NGS) provide culture-independent methods that are useful for understanding viral diversity and discovering novel viruses, thereby enabling better diagnosis and disease control. This review discusses the different possible steps of an NGS study utilizing sequence-independent amplification, high-throughput sequencing and bioinformatics approaches to identify novel avian viruses and their diversity. NGS has led to the identification of a wide range of new viruses such as picobirnavirus, picornavirus, orthoreovirus and avian gamma coronavirus associated with fulminating disease in guinea fowl and is also used in describing viral diversity among avian species. The review also briefly discusses areas of viral-host interaction and disease associated causalities with newly identified avian viruses.

  12. Application of next-generation sequencing technologies in Neurology

    PubMed Central

    Jiang, Teng; Tan, Meng-Shan

    2014-01-01

    Genetic risk factors that underlie many rare and common neurological diseases remain poorly understood because of the multi-factorial and heterogeneous nature of these disorders. Although genome-wide association studies (GWAS) have successfully uncovered numerous susceptibility genes for these diseases, odds ratios associated with risk alleles are generally low and account for only a small proportion of estimated heritability. These results imply that rare (present in <5% of the population) but not causative variants exist in the pathogenesis of these diseases, which usually have a large effect size and cannot be captured by GWAS. With the decreasing cost of next-generation sequencing (NGS) technologies, whole-genome sequencing (WGS) and whole-exome sequencing (WES) have enabled the rapid identification of rare variants with large effect size, which has brought huge progress in understanding the basis of many Mendelian neurological conditions as well as complex neurological diseases. In this article, we review recent NGS-based studies that aimed to investigate genetic causes for neurological diseases, including Alzheimer’s disease, Parkinson’s disease, epilepsy, multiple sclerosis, stroke, amyotrophic lateral sclerosis and spinocerebellar ataxias. We also discuss the future directions of NGS applications. PMID:25568878

  13. Next Generation Sequencing in Predicting Gene Function in Podophyllotoxin Biosynthesis*

    PubMed Central

    Marques, Joaquim V.; Kim, Kye-Won; Lee, Choonseok; Costa, Michael A.; May, Gregory D.; Crow, John A.; Davin, Laurence B.; Lewis, Norman G.

    2013-01-01

    Podophyllum species are sources of (−)-podophyllotoxin, an aryltetralin lignan used for semi-synthesis of various powerful and extensively employed cancer-treating drugs. Its biosynthetic pathway, however, remains largely unknown, with the last unequivocally demonstrated intermediate being (−)-matairesinol. Herein, massively parallel sequencing of Podophyllum hexandrum and Podophyllum peltatum transcriptomes and subsequent bioinformatics analyses of the corresponding assemblies were carried out. Validation of the assembly process was first achieved through confirmation of assembled sequences with those of various genes previously established as involved in podophyllotoxin biosynthesis as well as other candidate biosynthetic pathway genes. This contribution describes characterization of two of the latter, namely the cytochrome P450s, CYP719A23 from P. hexandrum and CYP719A24 from P. peltatum. Both enzymes were capable of converting (−)-matairesinol into (−)-pluviatolide by catalyzing methylenedioxy bridge formation and did not act on other possible substrates tested. Interestingly, the enzymes described herein were highly similar to methylenedioxy bridge-forming enzymes from alkaloid biosynthesis, whereas candidates more similar to lignan biosynthetic enzymes were catalytically inactive with the substrates employed. This overall strategy has thus enabled facile further identification of enzymes putatively involved in (−)-podophyllotoxin biosynthesis and underscores the deductive power of next generation sequencing and bioinformatics to probe and deduce medicinal plant biosynthetic pathways. PMID:23161544

  14. Application of next-generation sequencing technologies in Neurology.

    PubMed

    Jiang, Teng; Tan, Meng-Shan; Tan, Lan; Yu, Jin-Tai

    2014-12-01

    Genetic risk factors that underlie many rare and common neurological diseases remain poorly understood because of the multi-factorial and heterogeneous nature of these disorders. Although genome-wide association studies (GWAS) have successfully uncovered numerous susceptibility genes for these diseases, odds ratios associated with risk alleles are generally low and account for only a small proportion of estimated heritability. These results imply that rare (present in <5% of the population) but not causative variants exist in the pathogenesis of these diseases, which usually have a large effect size and cannot be captured by GWAS. With the decreasing cost of next-generation sequencing (NGS) technologies, whole-genome sequencing (WGS) and whole-exome sequencing (WES) have enabled the rapid identification of rare variants with large effect size, which has brought huge progress in understanding the basis of many Mendelian neurological conditions as well as complex neurological diseases. In this article, we review recent NGS-based studies that aimed to investigate genetic causes for neurological diseases, including Alzheimer's disease, Parkinson's disease, epilepsy, multiple sclerosis, stroke, amyotrophic lateral sclerosis and spinocerebellar ataxias. We also discuss the future directions of NGS applications.

  15. Generation of animation sequences of three dimensional models

    NASA Technical Reports Server (NTRS)

    Poi, Sharon (Inventor); Bell, Brad N. (Inventor)

    1990-01-01

    The invention is directed toward a method and apparatus for generating an animated sequence through the movement of three-dimensional graphical models. A plurality of pre-defined graphical models are stored and manipulated in response to interactive commands or by means of a pre-defined command file. The models may be combined as part of a hierarchical structure to represent physical systems without need to create a separate model which represents the combined system. System motion is simulated through the introduction of translation, rotation and scaling parameters upon a model within the system. The motion is then transmitted down through the system hierarchy of models in accordance with hierarchical definitions and joint movement limitations. The present invention also calls for a method of editing hierarchical structure in response to interactive commands or a command file such that a model may be included, deleted, copied or moved within multiple system model hierarchies. The present invention also calls for the definition of multiple viewpoints or cameras which may exist as part of a system hierarchy or as an independent camera. The simulated movement of the models and systems is graphically displayed on a monitor and a frame is recorded by means of a video controller. Multiple movement and hierarchy manipulations are then recorded as a sequence of frames which may be played back as an animation sequence on a video cassette recorder.

  16. Applications for next-generation sequencing in fish ecotoxicogenomics

    PubMed Central

    Mehinto, Alvine C.; Martyniuk, Christopher J.; Spade, Daniel J.; Denslow, Nancy D.

    2012-01-01

    The new technologies for next-generation sequencing (NGS) and global gene expression analyses that are widely used in molecular medicine are increasingly applied to the field of fish biology. This has facilitated new directions to address research areas that could not be previously considered due to the lack of molecular information for ecologically relevant species. Over the past decade, the cost of NGS has decreased significantly, making it possible to use non-model fish species to investigate emerging environmental issues. NGS technologies have permitted researchers to obtain large amounts of raw data in short periods of time. There have also been significant improvements in bioinformatics to assemble the sequences and annotate the genes, thus facilitating the management of these large datasets. The combination of DNA sequencing and bioinformatics has improved our abilities to design custom microarrays and study the genome and transcriptome of a wide variety of organisms. Despite the promising results obtained using these techniques in fish studies, NGS technologies are currently underused in ecotoxicogenomics and few studies have employed these methods. These issues should be addressed in order to exploit the full potential of NGS in ecotoxicological studies and expand our understanding of the biology of non-model organisms. PMID:22539934

  17. Application of next-generation sequencing technologies in virology

    PubMed Central

    Chapman, David; Dixon, Linda; Chantrey, Julian; Darby, Alistair C.; Hall, Neil

    2012-01-01

    The progress of science is punctuated by the advent of revolutionary technologies that provide new ways and scales to formulate scientific questions and advance knowledge. Following on from electron microscopy, cell culture and PCR, next-generation sequencing is one of these methodologies that is now changing the way that we understand viruses, particularly in the areas of genome sequencing, evolution, ecology, discovery and transcriptomics. Possibilities for these methodologies are only limited by our scientific imagination and, to some extent, by their cost, which has restricted their use to relatively small numbers of samples. Challenges remain, including the storage and analysis of the large amounts of data generated. As the chemistries employed mature, costs will decrease. In addition, improved methods for analysis will become available, opening yet further applications in virology including routine diagnostic work on individuals, and new understanding of the interaction between viral and host transcriptomes. An exciting era of viral exploration has begun, and will set us new challenges to understand the role of newly discovered viral diversity in both disease and health. PMID:22647373

  18. Application of next-generation sequencing technologies in virology.

    PubMed

    Radford, Alan D; Chapman, David; Dixon, Linda; Chantrey, Julian; Darby, Alistair C; Hall, Neil

    2012-09-01

    The progress of science is punctuated by the advent of revolutionary technologies that provide new ways and scales to formulate scientific questions and advance knowledge. Following on from electron microscopy, cell culture and PCR, next-generation sequencing is one of these methodologies that is now changing the way that we understand viruses, particularly in the areas of genome sequencing, evolution, ecology, discovery and transcriptomics. Possibilities for these methodologies are only limited by our scientific imagination and, to some extent, by their cost, which has restricted their use to relatively small numbers of samples. Challenges remain, including the storage and analysis of the large amounts of data generated. As the chemistries employed mature, costs will decrease. In addition, improved methods for analysis will become available, opening yet further applications in virology including routine diagnostic work on individuals, and new understanding of the interaction between viral and host transcriptomes. An exciting era of viral exploration has begun, and will set us new challenges to understand the role of newly discovered viral diversity in both disease and health.

  19. Generation and sequencing of pulmonary carcinoid tumor cell lines.

    PubMed

    Asiedu, Michael K; Thomas, Charles F; Tomaszek, Sandra C; Peikert, Tobias; Sanyal, Bharati; Sutor, Shari L; Aubry, Marie-Christine; Li, Peter; Wigle, Dennis A

    2014-12-01

    Pulmonary carcinoid tumors account for approximately 5% of all lung malignancies in adults, and comprise 30% of all carcinoid tumors. There are limited reagents available to study these rare tumors, and consequently no major advances have been made for patient treatment. We report the generation and characterization of human pulmonary carcinoid tumor cell lines to study underlying biology, and to provide models for testing novel chemotherapeutic agents. Tissue was harvested from three patients with primary pulmonary typical carcinoid tumors undergoing surgical resection. The tumor was dissociated and plated onto dishes in culture media. The established cell lines were characterized by immunohistochemistry, Western blotting, and cell proliferation assays. Tumorigenicity was confirmed by soft agar growth and the ability to form tumors in a mouse xenograft model. Exome and RNA sequencing of patient tumor samples and cell lines was performed using standard protocols. Three typical carcinoid tumor lines grew as adherent monolayers in vitro, expressed neuroendocrine markers consistent with the primary tumor, and formed colonies in soft agar. A single cell line produced lung tumors in nude mice after intravenous injection. Exome and RNA sequencing of this cell line showed lineage relationship with the primary tumor, and demonstrated mutations in a number of genes related to neuronal differentiation. Three human pulmonary typical carcinoid tumor cell lines have been generated and characterized as a tool for studying the biology and novel treatment approaches for these rare tumors.

  20. [Next generation sequencing for the diagnostics and epidemiology of tuberculosis].

    PubMed

    Comas, Iñaki; Gil, Ana

    2016-07-01

    Tuberculosis (TB) has overtaken HIV (human immunodeficiency virus) and malaria as the leading cause of death by an infectious disease worldwide. The reduction in the TB incidence is a modest 2% of cases per year, thus we will need 200 years to eradicate the disease. Part of the problem is that TB control tools are decades old and cannot anymore contribute to accelerate eradication of TB. New diagnostics, treatments and vaccines are urgently needed. Next generation sequencing has the potential to become one of these new tools. Genomic characterization of TB isolates is already showing its potential for epidemiology and diagnostics, particularly to identify drug resistance mutations. However, the experimental and bioinformatics skills needed are still far from being standardized and are not easy to incorporate as a routine in clinical laboratories. In this review we will describe current next generation sequencing approaches applied to the Mycobacterium tuberculosis complex, their contribution to the diagnostics and epidemiology of the disease and the efforts that are being undertaken to make the technology accessible to public health and clinical microbiology laboratories. Copyright © 2016 Elsevier España, S.L.U. All rights reserved.

  1. High Resolution Near Surface 3D Seismic Experiments: A Carbonate Platform vs. a Siliciclastic Sequence

    NASA Astrophysics Data System (ADS)

    Filippidou, N.; Drijkoningen, G.; Braaksma, H.; Verwer, K.; Kenter, J.

    2005-05-01

    Interest in high-resolution 3D seismic experiments for imaging shallow targets has increased over the past years. Many published case studies show that producing clear seismic images with this non-invasive method is still a challenge. We use two test sites where nearby outcrops are present so that an accurate geological model can be built and the seismic result validated. The first so-called natural field laboratory is located in Boulonnais (N. France). It is an upper Jurassic siliciclastic sequence, the age equivalent of the source rock of the N. Sea. The second one is located in Cap Blanc, to the southwest of the island of Mallorca (Spain). It is an excellent example of a Miocene prograding reef platform (Llucmajor Platform) and a textbook analog for carbonate reservoirs. In both cases, the multidisciplinary experiment included the use of multicomponent and quasi- or 3D seismic recordings. The target depth does not exceed 120 m. Vertical and shear portable vibrators were used as sources. In the center of the setups, boreholes were drilled and Vertical Seismic Profiles were shot, along with core and borehole measurements both in situ and in the laboratory. These two geologically different sites, with different seismic stratigraphy, have provided us with exceptionally high resolution seismic images. In general, seismic data were processed more or less following standard procedures; a few innovative techniques applied to the Mallorca data, such as rotation of horizontal components, 3D F-K filtering and addition of parallel profiles, have improved the seismic image. In this paper we discuss the basic differences as seen on the seismic sections. The Boulonnais data present highly continuous reflection patterns of extremely high resolution. This facilitated a high resolution stratigraphic description. Results from the VSP showed substantial wave energy attenuation. However, the high-fold (330 traces) Mallorca seismic experiment returned a rather discontinuous pattern of possible reflectors

  2. Next-Generation Sequencing and Genome Editing in Plant Virology.

    PubMed

    Hadidi, Ahmed; Flores, Ricardo; Candresse, Thierry; Barba, Marina

    2016-01-01

    Next-generation sequencing (NGS) has been applied to plant virology since 2009. NGS provides highly efficient, rapid, low-cost DNA or RNA high-throughput sequencing of the genomes of plant viruses and viroids and of the specific small RNAs generated during the infection process. These small RNAs, which frequently cover the whole genome of the infectious agent, are 21-24 nt long and are known as vsRNAs for viruses and vd-sRNAs for viroids. NGS has been used in a number of studies in plant virology including, but not limited to, discovery of novel viruses and viroids as well as detection and identification of those pathogens already known, analysis of genome diversity and evolution, and study of pathogen epidemiology. The genome editing method based on the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system has been successfully used recently to engineer resistance to DNA geminiviruses (family, Geminiviridae) by targeting different viral genome sequences in infected Nicotiana benthamiana or Arabidopsis plants. The DNA viruses targeted include tomato yellow leaf curl virus and merremia mosaic virus (begomovirus); beet curly top virus and beet severe curly top virus (curtovirus); and bean yellow dwarf virus (mastrevirus). The technique has also been used against the RNA viruses zucchini yellow mosaic virus, papaya ringspot virus and turnip mosaic virus (potyvirus) and cucumber vein yellowing virus (ipomovirus, family, Potyviridae) by targeting the translation initiation genes eIF4E in cucumber or Arabidopsis plants. From these recent advances of major importance, it is expected that NGS and CRISPR-Cas technologies will play a significant role in the very near future in advancing the field of plant virology and connecting it with other related fields of biology.

  3. Next-Generation Sequencing and Genome Editing in Plant Virology

    PubMed Central

    Hadidi, Ahmed; Flores, Ricardo; Candresse, Thierry; Barba, Marina

    2016-01-01

    Next-generation sequencing (NGS) has been applied to plant virology since 2009. NGS provides highly efficient, rapid, low-cost DNA or RNA high-throughput sequencing of the genomes of plant viruses and viroids and of the specific small RNAs generated during the infection process. These small RNAs, which frequently cover the whole genome of the infectious agent, are 21–24 nt long and are known as vsRNAs for viruses and vd-sRNAs for viroids. NGS has been used in a number of studies in plant virology including, but not limited to, discovery of novel viruses and viroids as well as detection and identification of those pathogens already known, analysis of genome diversity and evolution, and study of pathogen epidemiology. The genome editing method based on the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system has been successfully used recently to engineer resistance to DNA geminiviruses (family, Geminiviridae) by targeting different viral genome sequences in infected Nicotiana benthamiana or Arabidopsis plants. The DNA viruses targeted include tomato yellow leaf curl virus and merremia mosaic virus (begomovirus); beet curly top virus and beet severe curly top virus (curtovirus); and bean yellow dwarf virus (mastrevirus). The technique has also been used against the RNA viruses zucchini yellow mosaic virus, papaya ringspot virus and turnip mosaic virus (potyvirus) and cucumber vein yellowing virus (ipomovirus, family, Potyviridae) by targeting the translation initiation genes eIF4E in cucumber or Arabidopsis plants. From these recent advances of major importance, it is expected that NGS and CRISPR-Cas technologies will play a significant role in the very near future in advancing the field of plant virology and connecting it with other related fields of biology. PMID:27617007

  4. Comprehensive transcriptome analysis of the highly complex Pisum sativum genome using next generation sequencing

    PubMed Central

    2011-01-01

    Background The garden pea, Pisum sativum, is among the best-investigated legume plants and of significant agro-commercial relevance. Pisum sativum has a large and complex genome and accordingly few comprehensive genomic resources exist. Results We analyzed the pea transcriptome at the highest possible amount of accuracy by current technology. We used next generation sequencing with the Roche/454 platform and evaluated and compared a variety of approaches, including diverse tissue libraries, normalization, alternative sequencing technologies, saturation estimation and diverse assembly strategies. We generated libraries from flowers, leaves, cotyledons, epi- and hypocotyl, and etiolated and light treated etiolated seedlings, comprising a total of 450 megabases. Libraries were assembled into 324,428 unigenes in a first pass assembly. A second pass assembly reduced the amount to 81,449 unigenes but caused a significant number of chimeras. Analyses of the assemblies identified the assembly step as a major possibility for improvement. By recording frequencies of Arabidopsis orthologs hit by randomly drawn reads and fitting parameters of the saturation curve we concluded that sequencing was exhaustive. For leaf libraries we found normalization allows partial recovery of expression strength aside the desired effect of increased coverage. Based on theoretical and biological considerations we concluded that the sequence reads in the database tagged the vast majority of transcripts in the aerial tissues. A pathway representation analysis showed the merits of sampling multiple aerial tissues to increase the number of tagged genes. All results have been made available as a fully annotated database in fasta format. Conclusions We conclude that the approach taken resulted in a high-quality dataset which serves well as a first comprehensive reference set for the model legume pea. We suggest future deep sequencing transcriptome projects of species lacking a genomics backbone will

  5. SNP discovery in the transcriptome of white Pacific shrimp Litopenaeus vannamei by next generation sequencing.

    PubMed

    Yu, Yang; Wei, Jiankai; Zhang, Xiaojun; Liu, Jingwen; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai

    2014-01-01

    The application of next generation sequencing technology has greatly facilitated high throughput single nucleotide polymorphism (SNP) discovery and genotyping in genetic research. In the present study, SNPs were discovered based on two transcriptomes of Litopenaeus vannamei (L. vannamei) generated from Illumina sequencing platform HiSeq 2000. One transcriptome of L. vannamei was obtained through sequencing on the RNA from larvae at mysis stage and its reference sequence was de novo assembled. The data from another transcriptome were downloaded from NCBI and the reads of the two transcriptomes were mapped separately to the assembled reference by BWA. SNP calling was performed using SAMtools. A total of 58,717 and 36,277 SNPs with high quality were predicted from the two transcriptomes, respectively. SNP calling was also performed using the reads of two transcriptomes together, and a total of 96,040 SNPs with high quality were predicted. Among these 96,040 SNPs, 5,242 and 29,129 were predicted as non-synonymous and synonymous SNPs respectively. Characterization analysis of the predicted SNPs in L. vannamei showed that the estimated SNP frequency was 0.21% (one SNP per 476 bp) and the estimated ratio for transition to transversion was 2.0. Fifty SNPs were randomly selected for validation by Sanger sequencing after PCR amplification and 76% of SNPs were confirmed, which indicated that the SNPs predicted in this study were reliable. These SNPs will be very useful for genetic study in L. vannamei, especially for the high density linkage map construction and genome-wide association studies.
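
    The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The snippet below is only an illustrative check; the variable names are invented here, and the values are copied from the abstract.

```python
# Figures quoted in the abstract above; variable names are illustrative only.
bp_per_snp = 476            # "one SNP per 476 bp"
total_snps = 96_040         # SNPs called from both transcriptomes together
non_synonymous = 5_242
synonymous = 29_129
validated_fraction = 0.76   # share of the 50 Sanger-checked SNPs confirmed

# One SNP per 476 bp expressed as a percentage of sites.
snp_frequency_pct = 100 / bp_per_snp

# SNPs that were classified as either non-synonymous or synonymous.
classified_snps = non_synonymous + synonymous

# Number of the 50 randomly selected SNPs that were confirmed.
confirmed = round(50 * validated_fraction)

print(f"SNP frequency: {snp_frequency_pct:.2f}%")
print(f"Classified as synonymous or non-synonymous: {classified_snps} of {total_snps}")
print(f"Confirmed by Sanger sequencing: {confirmed} of 50")
```

    The computed frequency rounds to the 0.21% stated in the abstract.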

  6. Microfluidic platform for on-demand generation of spatially indexed combinatorial droplets.

    PubMed

    Zec, Helena; Rane, Tushar D; Wang, Tza-Huei

    2012-09-07

    We propose a highly versatile and programmable nanolitre droplet-based platform that accepts an unlimited number of sample plugs from a multi-well plate, performs digitization of these sample plugs into smaller daughter droplets and subsequent synchronization-free, robust injection of multiple reagents into the sample daughter droplets on-demand. This platform combines excellent control of valve-based microfluidics with the high-throughput capability of droplet microfluidics. We demonstrate the functioning of a proof-of-concept device which generates combinatorial mixture droplets from a linear array of sample plugs and four different reagents, using food dyes to mimic samples and reagents. Generation of a one dimensional array of the combinatorial mixture droplets on the device leads to automatic spatial indexing of these droplets, precluding the need to include a barcode in each droplet to identify its contents. We expect this platform to further expand the range of applications of droplet microfluidics to include applications requiring a high degree of multiplexing as well as high throughput analysis of multiple samples.

  7. Addressing Benefits, Risks and Consent in Next Generation Sequencing Studies

    PubMed Central

    Meller, R

    2016-01-01

    The sequencing of the human genome and technological advances in DNA sequencing have led to a revolution with respect to DNA sequencing and its potential to diagnose genetic disorders. However, requests for open access to genomic data must be balanced against the guiding principles of the Common Rule for human subject research. Unfortunately, the risks to patients involved in genomic studies are still evolving and as such may not be clear to learned and well-intentioned scientists. Central to this issue are the strategies that enable human participants in such studies to remain anonymous, or de-identified. The wealth of genomic data on the Internet in genomic data repositories and other databases has enabled de-identified data to be broken and research subjects to be identified. The security of de-identification neglects the fact that DNA itself is an identifying element. Therefore, it is questionable whether data security standards can ever truly protect the identity of a patient, under the current conditions or in the future. As Big Data methodologies advance, additional sources of data may enable the re-identification of patients enrolled in next-generation sequencing (NGS) studies. As such, it is time to re-evaluate the risks of sharing genomic data and establish new guidelines for good practices. In this commentary, I address the challenges facing federally funded investigators who need to strike a balance between compliance with federal (US) rules for human subjects and the recent requirement for open access/sharing of data from National Institute for Health (NIH)-funded studies involving human subjects. PMID:27375922

  8. SMITH: a LIMS for handling next-generation sequencing workflows

    PubMed Central

    2014-01-01

    Background Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, maintain a high quality standard, and reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). Methods SMITH is a web application with a MySQL server at the backend. SMITH was developed by wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description with each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. Results SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also helps save time. The
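
    The attribute-value table mentioned in this abstract can be pictured with a small sketch. The example below is not SMITH's schema: the table and column names are hypothetical, and SQLite (from the Python standard library) stands in for the MySQL backend named in the abstract.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL backend (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Hypothetical attribute-value table: any free-text attribute can be
    -- attached to a sample without changing the schema.
    CREATE TABLE sample_attribute (
        sample_id INTEGER NOT NULL REFERENCES sample(id),
        attribute TEXT NOT NULL,
        value     TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO sample (id, name) VALUES (1, 'S1'), (2, 'S2')")
conn.executemany(
    "INSERT INTO sample_attribute VALUES (?, ?, ?)",
    [
        (1, "tissue", "liver"),
        (1, "assay", "ChIP-Seq"),
        (2, "tissue", "liver"),
        (2, "assay", "RNA-Seq"),
    ],
)


def find_samples(conn, attribute, value):
    """Return the names of samples carrying the given attribute/value pair."""
    rows = conn.execute(
        "SELECT s.name FROM sample s "
        "JOIN sample_attribute a ON a.sample_id = s.id "
        "WHERE a.attribute = ? AND a.value = ? ORDER BY s.name",
        (attribute, value),
    )
    return [name for (name,) in rows]


print(find_samples(conn, "assay", "RNA-Seq"))
```

    A design of this kind trades strict column typing for flexibility, which matches the abstract's emphasis on unconstrained descriptions that remain searchable.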

  9. Next-generation polyploid phylogenetics: rapid resolution of hybrid polyploid complexes using PacBio single-molecule sequencing.

    PubMed

    Rothfels, Carl J; Pryer, Kathleen M; Li, Fay-Wei

    2017-01-01

    Difficulties in generating nuclear data for polyploids have impeded phylogenetic study of these groups. We describe a high-throughput protocol and an associated bioinformatics pipeline (Pipeline for Untangling Reticulate Complexes (Purc)) that is able to generate these data quickly and conveniently, and demonstrate its efficacy on accessions from the fern family Cystopteridaceae. We conclude with a demonstration of the downstream utility of these data by inferring a multi-labeled species tree for a subset of our accessions. We amplified four c. 1-kb-long nuclear loci and sequenced them in a parallel-tagged amplicon sequencing approach using the PacBio platform. Purc infers the final sequences from the raw reads via an iterative approach that corrects PCR and sequencing errors and removes PCR-mediated recombinant sequences (chimeras). We generated data for all gene copies (homeologs, paralogs, and segregating alleles) present in each of three sets of 50 mostly polyploid accessions, for four loci, in three PacBio runs (one run per set). From the raw sequencing reads, Purc was able to accurately infer the underlying sequences. This approach makes it easy and economical to study the phylogenetics of polyploids, and, in conjunction with recent analytical advances, facilitates investigation of broad patterns of polyploid evolution.

  10. ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence

    PubMed Central

    2011-01-01

    Background The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. Moreover, the combination of NGS sequencing and affordable high-throughput genotyping technologies is facilitating the rapid discovery and use of SNPs in non-model species. However, this abundance of sequences and polymorphisms creates new software needs. To fulfill these needs, we have developed a powerful, yet easy-to-use application. Results The ngs_backbone software is a parallel pipeline capable of analyzing Sanger, 454, Illumina and SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequence reads. Its main supported analyses are: read cleaning, transcriptome assembly and annotation, read mapping and single nucleotide polymorphism (SNP) calling and selection. In order to build a truly useful tool, the software development was paired with a laboratory experiment. All public tomato Sanger EST reads plus 14.2 million Illumina reads were employed to test the tool and predict polymorphism in tomato. The cleaned reads were mapped to the SGN tomato transcriptome obtaining a coverage of 4.2 for Sanger and 8.5 for Illumina. 23,360 single nucleotide variations (SNVs) were predicted. A total of 76 SNVs were experimentally validated, and 85% were found to be real. Conclusions ngs_backbone is a new software package capable of analyzing sequences produced by NGS technologies and predicting SNVs with great accuracy. In our tomato example, we created a highly polymorphic collection of SNVs that will be a useful resource for tomato researchers and breeders. The software developed along with its documentation is freely available under the AGPL license and can be downloaded from http://bioinf.comav.upv.es/ngs_backbone/ or http://github.com/JoseBlanca/franklin. PMID:21635747

  11. Characterization of NIST human mitochondrial DNA SRM-2392 and SRM-2392-I standard reference materials by next generation sequencing.

    PubMed

    Riman, Sarah; Kiesler, Kevin M; Borsuk, Lisa A; Vallone, Peter M

    2017-07-01

    Standard Reference Materials SRM 2392 and 2392-I are intended to provide quality control when amplifying and sequencing human mitochondrial genome sequences. The National Institute of Standards and Technology (NIST) offers these SRMs to laboratories performing DNA-based forensic human identification, molecular diagnosis of mitochondrial diseases, mutation detection, evolutionary anthropology, and genetic genealogy. The entire mtGenome (∼16569bp) of SRM 2392 and 2392-I have previously been characterized at NIST by Sanger sequencing. Herein, we used the sensitivity, specificity, and accuracy offered by next generation sequencing (NGS) to: (1) re-sequence the certified values of the SRM 2392 and 2392-I; (2) confirm Sanger data with a high coverage new sequencing technology; (3) detect lower level heteroplasmies (<20%); and thus (4) support mitochondrial sequencing communities in the adoption of NGS methods. To obtain a consensus sequence for the SRMs as well as identify and control any bias, sequencing was performed using two NGS platforms and data was analyzed using different bioinformatics pipelines. Our results confirm five low level heteroplasmy sites that were not previously observed with Sanger sequencing: three sites in the GM09947A template in SRM 2392 and two sites in the HL-60 template in SRM 2392-I. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. PAFFT: A new homology search algorithm for third-generation sequencers.

    PubMed

    Misawa, Kazuharu; Ootsuki, Ryo

    2015-11-01

    DNA sequencers that can conduct real-time sequencing from a single polymerase molecule are known as third-generation sequencers. Third-generation sequencers enable sequencing of reads that are several kilobases long. However, the raw data generated from third-generation sequencers are known to be error-prone. Because of sequencing errors, it is difficult to identify which genes are homologous to the reads obtained using third-generation sequencers. In this study, a new homology search algorithm, PAFFT, is developed. This method is an extension of the MAFFT algorithm, which is used for multiple alignments. PAFFT detects global homology rather than local homology so that homologous regions can be detected even when the error rate of sequencing is high. PAFFT will boost application of third-generation sequencers. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Transcriptome sequencing and analysis of leaf tissue of Avicennia marina using the Illumina platform.

    PubMed

    Huang, Jianzi; Lu, Xiang; Zhang, Wanke; Huang, Rongfeng; Chen, Shouyi; Zheng, Yizhi

    2014-01-01

    Avicennia marina is a widely distributed mangrove species that thrives in high-salinity habitats. It plays a significant role in supporting coastal ecosystem and holds unique potential for studying molecular mechanisms underlying ecological adaptation. Despite and sometimes because of its numerous merits, this species is facing increasing pressure of exploitation and deforestation. Both study on adaptation mechanisms and conservation efforts necessitate more genomic resources for A. marina. In this study, we used Illumina sequencing of an A. marina foliar cDNA library to generate a transcriptome dataset for gene and marker discovery. We obtained 40 million high-quality reads and assembled them into 91,125 unigenes with a mean length of 463 bp. These unigenes covered most of the publicly available A. marina Sanger ESTs and greatly extended the repertoire of transcripts for this species. A total of 54,497 and 32,637 unigenes were annotated based on homology to sequences in the NCBI non-redundant and the Swiss-prot protein databases, respectively. Both Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed some transcriptomic signatures of stress adaptation for this halophytic species. We also detected an extraordinary amount of transcripts derived from fungal endophytes and demonstrated the utility of transcriptome sequencing in surveying endophyte diversity without isolating them out of plant tissues. Additionally, we identified 3,423 candidate simple sequence repeats (SSRs) from 3,141 unigenes with a density of one SSR locus every 8.25 kb sequence. Our transcriptomic data will provide valuable resources for ecological, genetic and evolutionary studies in A. marina.

  14. Transcriptome Sequencing and Analysis of Leaf Tissue of Avicennia marina Using the Illumina Platform

    PubMed Central

    Zhang, Wanke; Huang, Rongfeng; Chen, Shouyi; Zheng, Yizhi

    2014-01-01

    Avicennia marina is a widely distributed mangrove species that thrives in high-salinity habitats. It plays a significant role in supporting coastal ecosystem and holds unique potential for studying molecular mechanisms underlying ecological adaptation. Despite and sometimes because of its numerous merits, this species is facing increasing pressure of exploitation and deforestation. Both study on adaptation mechanisms and conservation efforts necessitate more genomic resources for A. marina. In this study, we used Illumina sequencing of an A. marina foliar cDNA library to generate a transcriptome dataset for gene and marker discovery. We obtained 40 million high-quality reads and assembled them into 91,125 unigenes with a mean length of 463 bp. These unigenes covered most of the publicly available A. marina Sanger ESTs and greatly extended the repertoire of transcripts for this species. A total of 54,497 and 32,637 unigenes were annotated based on homology to sequences in the NCBI non-redundant and the Swiss-prot protein databases, respectively. Both Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed some transcriptomic signatures of stress adaptation for this halophytic species. We also detected an extraordinary amount of transcripts derived from fungal endophytes and demonstrated the utility of transcriptome sequencing in surveying endophyte diversity without isolating them out of plant tissues. Additionally, we identified 3,423 candidate simple sequence repeats (SSRs) from 3,141 unigenes with a density of one SSR locus every 8.25 kb sequence. Our transcriptomic data will provide valuable resources for ecological, genetic and evolutionary studies in A. marina. PMID:25265387

  15. Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing.

    PubMed

    Misra, Sanchit; Agrawal, Ankit; Liao, Wei-keng; Choudhary, Alok

    2011-01-15

    Recently, a number of programs have been proposed for mapping short reads to a reference genome. Many of them are heavily optimized for short-read mapping and hence are very efficient for shorter queries, but that makes them inefficient or not applicable for reads longer than 200 bp. However, many sequencers are already generating longer reads and more are expected to follow. For long read sequence mapping, there are limited options; BLAT, SSAHA2, FANGS and BWA-SW are among the popular ones. However, resequencing and personalized medicine need much faster software to map these long sequencing reads to a reference genome to identify SNPs or rare transcripts. We present AGILE (AliGnIng Long rEads), a hash table based high-throughput sequence mapping algorithm for longer 454 reads that uses diagonal multiple seed-match criteria, customized q-gram filtering and a dynamic incremental search approach among other heuristics to optimize every step of the mapping process. In our experiments, we observe that AGILE is more accurate than BLAT, and comparable to BWA-SW and SSAHA2. For practical error rates (< 5%) and read lengths (200-1000 bp), AGILE is significantly faster than BLAT, SSAHA2 and BWA-SW. Even for the other cases, AGILE is comparable to BWA-SW and several times faster than BLAT and SSAHA2. http://www.ece.northwestern.edu/~smi539/agile.html.
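
    The general idea of hash-table seeding with a diagonal criterion can be sketched in a few lines. This is a generic illustration, not AGILE's implementation: the function names, the value of q, and the min_seeds threshold are all assumptions made for the example, and the q-gram filtering and incremental search described in the abstract are not reproduced.

```python
from collections import Counter, defaultdict


def build_qgram_index(reference, q):
    """Map every q-gram in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index


def best_diagonal(read, index, q, min_seeds=2):
    """Return (reference_offset, seed_count) for the best-supported diagonal.

    A seed at read position i and reference position j lies on diagonal j - i.
    Several seeds on the same diagonal suggest the read maps at that offset.
    Returns None when no diagonal collects at least min_seeds seeds.
    """
    votes = Counter()
    for i in range(len(read) - q + 1):
        for j in index.get(read[i:i + q], ()):
            votes[j - i] += 1
    if not votes:
        return None
    diagonal, count = votes.most_common(1)[0]
    return (diagonal, count) if count >= min_seeds else None


# Toy data: the read is reference[8:20] with one substituted base.
reference = "ACGTACGTTAGCCGATAGGCTTAACG"
read = "TAGCCTATAGGC"
index = build_qgram_index(reference, q=4)
print(best_diagonal(read, index, q=4))
```

    Requiring several seeds on one diagonal is what lets a mapper tolerate a mismatch: the substituted base destroys the seeds overlapping it, but the remaining seeds still agree on the offset.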

  16. High-Throughput, Amplicon-Based Sequencing of the CREBBP Gene as a Tool to Develop a Universal Platform-Independent Assay

    PubMed Central

    Fuellgrabe, Marc W.; Herrmann, Dietrich; Knecht, Henrik; Kuenzel, Sven; Kneba, Michael; Pott, Christiane; Brüggemann, Monika

    2015-01-01

    High-throughput sequencing technologies are widely used to analyse genomic variants or rare mutational events in different fields of genomic research, with a fast development of new or adapted platforms and technologies, enabling amplicon-based analysis of single target genes or even whole genome sequencing within a short period of time. Each sequencing platform is characterized by well-defined types of errors, resulting from different steps in the sequencing workflow. Here we describe a universal method to prepare amplicon libraries that can be used for sequencing on different high-throughput sequencing platforms. We have sequenced distinct exons of the CREB binding protein (CREBBP) gene and analysed the output resulting from three major deep-sequencing platforms. Platform-specific errors were adjusted according to the result of sequence analysis from the remaining platforms. Additionally, bioinformatic methods are described to determine platform-dependent errors. Summarizing the results, we present a platform-independent, cost-efficient and timesaving method that can be used as an alternative to commercially available sample-preparation kits. PMID:26057250

  17. Genomics of medulloblastoma: from Giemsa-banding to next-generation sequencing in 20 years.

    PubMed

    Northcott, Paul A; Rutka, James T; Taylor, Michael D

    2010-01-01

    Advances in the field of genomics have recently enabled the unprecedented characterization of the cancer genome, providing novel insight into the molecular mechanisms underlying malignancies in humans. The application of high-resolution microarray platforms to the study of medulloblastoma has revealed new oncogenes and tumor suppressors and has implicated changes in DNA copy number, gene expression, and methylation state in its etiology. Additionally, the integration of medulloblastoma genomics with patient clinical data has confirmed molecular markers of prognostic significance and highlighted the potential utility of molecular disease stratification. The advent of next-generation sequencing technologies promises to greatly transform our understanding of medulloblastoma pathogenesis in the next few years, permitting comprehensive analyses of all aspects of the genome and increasing the likelihood that genomic medicine will become part of the routine diagnosis and treatment of medulloblastoma.

  18. Identification of conserved genomic regions and variation therein amongst Cetartiodactyla species using next generation sequencing

    USDA-ARS's Scientific Manuscript database

    Background Next Generation Sequencing has created an opportunity to genetically characterize an individual both inexpensively and comprehensively. In earlier work produced in our collaboration [1], it was demonstrated that, for animals without a reference genome, their Next Generation Sequence data ...

  19. Sequence capture and next-generation sequencing of ultraconserved elements in a large-genome salamander.

    PubMed

    Newman, Catherine E; Austin, Christopher C

    2016-12-01

    Amidst the rapid advancement in next-generation sequencing (NGS) technology over the last few years, salamanders have been left behind. Salamanders have enormous genomes (up to 40 times the size of the human genome), and this poses challenges to generating NGS data sets of quality and quantity similar to those of other vertebrates. However, optimization of laboratory protocols is time-consuming and often cost prohibitive, and continued omission of salamanders from novel phylogeographic research is detrimental to species facing decline. Here, we use a salamander endemic to the southeastern United States, Plethodon serratus, to test the utility of an established protocol for sequence capture of ultraconserved elements (UCEs) in resolving intraspecific phylogeographic relationships and delimiting cryptic species. Without modifying the standard laboratory protocol, we generated a data set consisting of over 600 million reads for 85 P. serratus samples. Species delimitation analyses support recognition of seven species within P. serratus sensu lato, and all phylogenetic relationships among the seven species are fully resolved under a coalescent model. Results also corroborate previous data suggesting nonmonophyly of the Ouachita and Louisiana regions. Our results demonstrate that established UCE protocols can successfully be used in phylogeographic studies of salamander species, providing a powerful tool for future research on evolutionary history of amphibians and other organisms with large genomes.

  20. Generative Technologies for Model Animation in the TopCased Platform

    NASA Astrophysics Data System (ADS)

    Crégut, Xavier; Combemale, Benoit; Pantel, Marc; Faudoux, Raphaël; Pavei, Jonatas

    Domain Specific Modeling Languages (DSML) are more and more used to handle high level concepts, and thus bring complex software development under control. The increasingly recurring definition of new languages raises the problem of the definition of support tools such as editor, simulator, compiler, etc. In this paper we propose generative technologies that have been designed to ease the development of model animation tools inside the TopCased platform. These tools rely on the automatically generated graphical editors of TopCased and provide additional generators for building model animator graphical interface. We also rely on an architecture for executable metamodel (i.e., the TopCased model execution metamodeling pattern) to bind the behavioral semantics of the modeling language. These tools were designed in a pragmatic manner by abstracting the various model animators that had been hand-coded in the TopCased project, and then validated by refactoring these animators.

  1. Transcriptome sequencing as a platform to elucidate molecular components of the diapause response in the Asian tiger mosquito, Aedes albopictus.

    PubMed

    Poelchau, Monica F; Reynolds, Julie A; Denlinger, David L; Elsik, Christine G; Armbruster, Peter A

    2013-06-01

    Diapause has long been recognized as a crucial ecological adaptation to spatio-temporal environmental variation. More recently, rapid evolution of the diapause response has been implicated in response to contemporary global warming and during the range expansion of invasive species. Although the molecular regulation of diapause remains largely unresolved, rapidly emerging next-generation sequencing (NGS) technologies provide exciting opportunities to address this longstanding question. Herein, a new assembly from life-history stages relevant to diapause in the Asian tiger mosquito, Aedes albopictus (Skuse) is presented, along with unique methods for the analysis of NGS data and transcriptome assembly. A digital normalization procedure that significantly reduces computational resources required for transcriptome assembly is evaluated. Additionally, a method for protein reference-based and genomic reference-based merged assembly of 454 and Illumina reads is described. Finally, a gene ontology analysis is presented, which creates a platform to identify physiological processes associated with diapause. Taken together, these methods provide valuable tools for analyzing the transcriptional underpinnings of many complex phenotypes, including diapause, and provide a basis for determining the molecular regulation of diapause in Ae. albopictus.
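
    The abstract does not spell out the digital normalization procedure it evaluates. One common formulation keeps a read only while the median abundance of its k-mers, counted over the reads kept so far, is below a cutoff. The sketch below follows that formulation under stated assumptions: the function names, k, and the cutoff are illustrative, and an exact in-memory counter replaces the memory-efficient counting structures that real tools typically use.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Return all overlapping k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def digital_normalize(reads, k=4, cutoff=2):
    """Keep a read only if its median k-mer abundance so far is below cutoff.

    Counts are updated only with reads that are kept, so redundant reads from
    highly covered regions are dropped once enough copies have been seen.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


# Toy data: five copies of one read plus a single unrelated read.
reads = ["ACGTTGCAAGCT"] * 5 + ["TTGACCAGTA"]
kept = digital_normalize(reads, k=4, cutoff=2)
print(len(reads), "reads in,", len(kept), "reads kept")
```

    Because redundant reads are discarded before assembly, the assembler sees fewer reads, which is the source of the resource savings the abstract refers to.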

  2. Targeted Next Generation Sequencing Identifies Markers of Response to PD-1 Blockade.

    PubMed

    Johnson, Douglas B; Frampton, Garrett M; Rioth, Matthew J; Yusko, Erik; Xu, Yaomin; Guo, Xingyi; Ennis, Riley C; Fabrizio, David; Chalmers, Zachary R; Greenbowe, Joel; Ali, Siraj M; Balasubramanian, Sohail; Sun, James X; He, Yuting; Frederick, Dennie T; Puzanov, Igor; Balko, Justin M; Cates, Justin M; Ross, Jeffrey S; Sanders, Catherine; Robins, Harlan; Shyr, Yu; Miller, Vincent A; Stephens, Philip J; Sullivan, Ryan J; Sosman, Jeffrey A; Lovly, Christine M

    2016-11-01

    Therapeutic antibodies blocking programmed death-1 and its ligand (PD-1/PD-L1) induce durable responses in a substantial fraction of melanoma patients. We sought to determine whether the number and/or type of mutations identified using a next-generation sequencing (NGS) panel available in the clinic was correlated with response to anti-PD-1 in melanoma. Using archival melanoma samples from anti-PD-1/PD-L1-treated patients, we performed hybrid capture-based NGS on 236-315 genes and T-cell receptor (TCR) sequencing on initial and validation cohorts from two centers. Patients who responded to anti-PD-1/PD-L1 had higher mutational loads in an initial cohort (median, 45.6 vs. 3.9 mutations/MB; P = 0.003) and a validation cohort (37.1 vs. 12.8 mutations/MB; P = 0.002) compared with nonresponders. Response rate, progression-free survival, and overall survival were superior in the high, compared with intermediate and low, mutation load groups. Melanomas with NF1 mutations harbored high mutational loads (median, 62.7 mutations/MB) and high response rates (74%), whereas BRAF/NRAS/NF1 wild-type melanomas had a lower mutational load. In these archival samples, TCR clonality did not predict response. Mutation numbers in the 315 genes in the NGS platform strongly correlated with those detected by whole-exome sequencing in The Cancer Genome Atlas samples, but were not associated with survival. In conclusion, mutational load, as determined by an NGS platform available in the clinic, effectively stratified patients by likelihood of response. This approach may provide a clinically feasible predictor of response to anti-PD-1/PD-L1. Cancer Immunol Res; 4(11); 959-67. ©2016 AACR.

  3. Low diversity in the mitogenome of sperm whales revealed by next-generation sequencing.

    PubMed

    Alexander, Alana; Steel, Debbie; Slikas, Beth; Hoekzema, Kendra; Carraher, Colm; Parks, Matthew; Cronn, Richard; Baker, C Scott

    2013-01-01

    Large population sizes and global distributions generally associate with high mitochondrial DNA control region (CR) diversity. The sperm whale (Physeter macrocephalus) is an exception, showing low CR diversity relative to other cetaceans; however, diversity levels throughout the remainder of the sperm whale mitogenome are unknown. We sequenced 20 mitogenomes from 17 sperm whales representative of worldwide diversity using Next Generation Sequencing (NGS) technologies (Illumina GAIIx, Roche 454 GS Junior). Resequencing of three individuals with both NGS platforms and partial Sanger sequencing showed low discrepancy rates (454-Illumina: 0.0071%; Sanger-Illumina: 0.0034%; and Sanger-454: 0.0023%) confirming suitability of both NGS platforms for investigating low mitogenomic diversity. Using the 17 sperm whale mitogenomes in a phylogenetic reconstruction with 41 other species, including 11 new dolphin mitogenomes, we tested two hypotheses for the low CR diversity. First, the hypothesis that CR-specific constraints have reduced diversity solely in the CR was rejected as diversity was low throughout the mitogenome, not just in the CR (overall diversity π = 0.096%; protein-coding 3rd codon = 0.22%; CR = 0.35%), and CR phylogenetic signal was congruent with protein-coding regions. Second, the hypothesis that slow substitution rates reduced diversity throughout the sperm whale mitogenome was rejected as sperm whales had significantly higher rates of CR evolution and no evidence of slow coding region evolution relative to other cetaceans. The estimated time to most recent common ancestor for sperm whale mitogenomes was 72,800 to 137,400 years ago (95% highest probability density interval), consistent with previous hypotheses of a bottleneck or selective sweep as likely causes of low mitogenome diversity.

  4. Low Diversity in the Mitogenome of Sperm Whales Revealed by Next-Generation Sequencing

    PubMed Central

    Alexander, Alana; Steel, Debbie; Slikas, Beth; Hoekzema, Kendra; Carraher, Colm; Parks, Matthew; Cronn, Richard; Baker, C. Scott

    2013-01-01

    Large population sizes and global distributions generally associate with high mitochondrial DNA control region (CR) diversity. The sperm whale (Physeter macrocephalus) is an exception, showing low CR diversity relative to other cetaceans; however, diversity levels throughout the remainder of the sperm whale mitogenome are unknown. We sequenced 20 mitogenomes from 17 sperm whales representative of worldwide diversity using Next Generation Sequencing (NGS) technologies (Illumina GAIIx, Roche 454 GS Junior). Resequencing of three individuals with both NGS platforms and partial Sanger sequencing showed low discrepancy rates (454-Illumina: 0.0071%; Sanger-Illumina: 0.0034%; and Sanger-454: 0.0023%) confirming suitability of both NGS platforms for investigating low mitogenomic diversity. Using the 17 sperm whale mitogenomes in a phylogenetic reconstruction with 41 other species, including 11 new dolphin mitogenomes, we tested two hypotheses for the low CR diversity. First, the hypothesis that CR-specific constraints have reduced diversity solely in the CR was rejected as diversity was low throughout the mitogenome, not just in the CR (overall diversity π = 0.096%; protein-coding 3rd codon = 0.22%; CR = 0.35%), and CR phylogenetic signal was congruent with protein-coding regions. Second, the hypothesis that slow substitution rates reduced diversity throughout the sperm whale mitogenome was rejected as sperm whales had significantly higher rates of CR evolution and no evidence of slow coding region evolution relative to other cetaceans. The estimated time to most recent common ancestor for sperm whale mitogenomes was 72,800 to 137,400 years ago (95% highest probability density interval), consistent with previous hypotheses of a bottleneck or selective sweep as likely causes of low mitogenome diversity. PMID:23254394

  5. Metre-scale cyclicity in Middle Eocene platform carbonates in northern Egypt: Implications for facies development and sequence stratigraphy

    NASA Astrophysics Data System (ADS)

    Tawfik, Mohamed; El-Sorogy, Abdelbaset; Moussa, Mahmoud

    2016-07-01

    The shallow-water carbonates of the Middle Eocene in northern Egypt represent a Tethyan reef-rimmed carbonate platform with bedded inner-platform facies. Based on extensive micro- and biofacies documentation, five lithofacies associations were defined and their respective depositional environments were interpreted. Investigated sections were subdivided into three third-order sequences, named S1, S2 and S3. Sequence S1 is interpreted to correspond to the Lutetian, S2 corresponds to the Late Lutetian and Early Bartonian, and S3 represents the Late Bartonian. Each of the three sequences was further subdivided into fourth-order cycle sets and fifth-order cycles. The complete hierarchy of cycles can be correlated along 190 km across the study area, highlighting a general "layer-cake" stratigraphic architecture. The documentation of the studied outcrops may contribute to a better regional understanding of the Middle Eocene formations in northern Egypt and of Tethyan pericratonic carbonate models in general.

  6. Application of next-generation sequencing technology in forensic science.

    PubMed

    Yang, Yaran; Xie, Bingbing; Yan, Jiangwei

    2014-10-01

    Next-generation sequencing (NGS) technology, with its high-throughput capacity and low cost, has developed rapidly in recent years and become an important analytical tool for many genomics researchers. New opportunities in forensic research emerge by harnessing the power of NGS technology, which can be applied to the simultaneous analysis of multiple loci of forensic interest in different genetic contexts, such as autosomes, mitochondrial and sex chromosomes. Furthermore, NGS technology has potential applications in many other aspects of research. These include DNA database construction, ancestry and phenotypic inference, monozygotic twin studies, body fluid and species identification, and forensic animal, plant and microbiological analyses. Here we review the application of NGS technology in the field of forensic science with the aim of providing a reference for future forensics studies and practice. Copyright © 2014 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.

  7. Next-Generation Sequencing in Genetic Hearing Loss

    PubMed Central

    Yan, Denise; Tekin, Mustafa; Blanton, Susan H.

    2013-01-01

    The advent of the $1000 genome has the potential to revolutionize the identification of genes and their mutations underlying genetic disorders. This is especially true for extremely heterogeneous Mendelian conditions such as deafness, where the mutation, and indeed the gene, may be private. The recent technological advances in target-enrichment methods and next generation sequencing offer a unique opportunity to break through the limitations imposed by gene arrays. These approaches now allow for the complete analysis of all known deafness-causing genes and will result in a new wave of discoveries of the remaining genes for Mendelian disorders. In this review, we describe commonly used genomic technologies as well as the application of these technologies to the genetic diagnosis of hearing loss (HL) and to the discovery of novel genes for syndromic and nonsyndromic HL. PMID:23738631

  8. Utility of Next Generation Sequencing in Clinical Primary Immunodeficiencies

    PubMed Central

    Raje, Nikita; Soden, Sarah; Swanson, Douglas; Ciaccio, Christina E.; Kingsmore, Stephen F.; Dinwiddie, Darrell L.

    2015-01-01

    Primary immunodeficiencies (PIDs) are a group of genetically heterogeneous disorders that present with very similar symptoms, complicating definitive diagnosis. More than 240 genes have hitherto been associated with PIDs, of which more than 30 have been identified in the last 3 years. Next generation sequencing (NGS) of genomes or exomes of informative families has played a central role in the discovery of novel PID genes. Furthermore, NGS has the potential to transform clinical molecular testing for established PIDs, allowing all PID differential diagnoses to be tested at once, leading to increased diagnostic yield, while decreasing both the time and cost of obtaining a molecular diagnosis. Given that treatment of PID varies by disease gene, early achievement of a molecular diagnosis is likely to enhance treatment decisions and improve patient outcomes. PMID:25149170

  9. Current next generation sequencing technology may not meet forensic standards.

    PubMed

    Bandelt, Hans-Jürgen; Salas, Antonio

    2012-01-01

    In a Nature paper of 2010, the concern was raised that intra-individual mtDNA variation may be more pronounced than previously believed, in that heteroplasmies are common and vary markedly from tissue to tissue. This claim taken at face value would have considerable impact on forensic casework. It turns out, however, that the employed technology detected the germ-line variation relative to the reference sequence only incompletely: on average at least five mutations were missed per sample, as an in silico reassessment of the data reveals. Before one can really set out to access entire mtDNA genome data with relative ease for forensic purposes, one needs careful calibration studies under strict forensic conditions, or might have to wait for another generation.

  10. Next generation sequencing in epigenetics: insights and challenges.

    PubMed

    Meaburn, Emma; Schulz, Reiner

    2012-04-01

    The epigenetics community was an early adopter of next generation sequencing (NGS). NGS-based studies have provided detailed and comprehensive views of epigenetic modifications for the genomes of many species and cell types. Recently, DNA methylation has attracted much attention due to the discovery of 5-hydroxymethyl-cytosine and its role in epigenetic reprogramming and pluripotency. This renewed interest has been concomitant with methodological progress enabling, for example, high coverage and single base resolution profiling of the mammalian methylome in small numbers of cells. We summarise this progress and highlight resulting key findings about the complexity of eukaryotic DNA methylation, its role in metazoan genome evolution, epigenetic reprogramming, and its close ties with histone modifications in the context of transcription. Finally, we discuss how fundamental insights gained by NGS, particularly the discovery of widespread allele-specific epigenetic variation in the human genome, have the potential to significantly contribute to the understanding of human common complex diseases.

  11. Prenatal diagnosis of Gaucher disease using next-generation sequencing.

    PubMed

    Yoshida, Shinichiro; Kido, Jun; Matsumoto, Shirou; Momosaki, Ken; Mitsubuchi, Hiroshi; Shimazu, Tomoyuki; Sugawara, Keishin; Endo, Fumio; Nakamura, Kimitoshi

    2016-09-01

    In the prenatal diagnosis of Gaucher disease (GD), glucocerebrosidase (GBA) activity is measured with fetal cells, and gene analysis is performed when pathogenic mutations in GBA are identified in advance. Herein is described prenatal diagnosis in a family in which two children had GD. Although prior genetic information for this GD family was not obtained, next-generation sequencing (NGS) was carried out for this family because immediate prenatal diagnosis was necessary. Three mutations were identified in this GD family. The father had one mutation in intron 3 (IVS2 + 1), the mother had two mutations in exons 3 (I[-20]V) and 5 (M85T), and child 1 had all three of these mutations; child 3 had none of these mutations. On NGS the present fetus (child 3) was not a carrier of GD-related mutations. NGS may facilitate early detection and treatment before disease onset. © 2016 Japan Pediatric Society.

  12. Application of Next-generation Sequencing Technology in Forensic Science

    PubMed Central

    Yang, Yaran; Xie, Bingbing; Yan, Jiangwei

    2014-01-01

    Next-generation sequencing (NGS) technology, with its high-throughput capacity and low cost, has developed rapidly in recent years and become an important analytical tool for many genomics researchers. New opportunities in forensic research emerge by harnessing the power of NGS technology, which can be applied to the simultaneous analysis of multiple loci of forensic interest in different genetic contexts, such as autosomes, mitochondrial and sex chromosomes. Furthermore, NGS technology has potential applications in many other aspects of research. These include DNA database construction, ancestry and phenotypic inference, monozygotic twin studies, body fluid and species identification, and forensic animal, plant and microbiological analyses. Here we review the application of NGS technology in the field of forensic science with the aim of providing a reference for future forensics studies and practice. PMID:25462152

  13. Molecular diagnostics of a single drug-resistant multiple myeloma case using targeted next-generation sequencing

    PubMed Central

    Ikeda, Hiroshi; Ishiguro, Kazuya; Igarashi, Tetsuyuki; Aoki, Yuka; Hayashi, Toshiaki; Ishida, Tadao; Sasaki, Yasushi; Tokino, Takashi; Shinomura, Yasuhisa

    2015-01-01

    A 69-year-old man was diagnosed with IgG λ-type multiple myeloma (MM), Stage II in October 2010. He was treated with one cycle of high-dose dexamethasone. After three cycles of bortezomib, the patient exhibited slow elevations in the free light-chain levels and developed a significant new increase of serum M protein. Bone marrow cytogenetic analysis revealed a complex karyotype characteristic of malignant plasma cells. To better understand the molecular pathogenesis of this patient, we sequenced for mutations in the entire coding regions of 409 cancer-related genes using a semiconductor-based sequencing platform. Sequencing analysis revealed eight nonsynonymous somatic mutations in addition to several copy number variants, including CCND1 and RB1. These alterations may play roles in the pathobiology of this disease. This targeted next-generation sequencing can allow for the prediction of drug resistance and facilitate improvements in the treatment of MM patients. PMID:26491355

  14. Metagenome of microorganisms associated with the toxic Cyanobacteria Microcystis aeruginosa analyzed using the 454 sequencing platform

    NASA Astrophysics Data System (ADS)

    Li, Nan; Zhang, Lei; Li, Fuchao; Wang, Yuezhu; Zhu, Yongqiang; Kang, Hui; Wang, Shengyue; Qin, Song

    2011-05-01

    In this study, the 454 pyrosequencing technology was used to analyze the DNA of the Microcystis aeruginosa symbiosis system from cyanobacterial algal blooms in Taihu Lake, China. We generated 183 228 reads with an average length of 248 bp. Running the 454 assembly algorithm over our sequences yielded 22 239 significant contigs. After excluding the M. aeruginosa sequences, we obtained 1 322 assembled contigs longer than 1 000 bp. Taxonomic analysis indicated that four kingdoms were represented in the community: Archaea (n = 9; 0.01%), Bacteria (n = 98 921; 99.6%), Eukaryota (n = 373; 0.38%), and Viruses (n = 18; 0.02%). The bacterial sequences were predominantly Alphaproteobacteria (n = 41 805; 83.3%), Betaproteobacteria (n = 5 254; 10.5%) and Gammaproteobacteria (n = 1 180; 2.4%). Gene annotations and assignment of COG (clusters of orthologous groups) functional categories indicate that a large number of the predicted genes are involved in metabolic, genetic, and environmental information processes. Our results demonstrate the extraordinary diversity of a microbial community in an ectosymbiotic system and further establish the tremendous utility of pyrosequencing.
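
    The kingdom-level shares can be recomputed from the read counts quoted above, which gives a quick consistency check on the percentages. This is a minimal sketch that assumes the denominator is simply the sum of the four kingdom counts; the abstract does not state the denominator, so that choice is an assumption.

```python
# Read counts per kingdom as quoted in the abstract.
counts = {"Archaea": 9, "Bacteria": 98_921, "Eukaryota": 373, "Viruses": 18}

# Assumed denominator: all reads assigned to one of the four kingdoms.
total = sum(counts.values())
shares = {name: 100 * n / total for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name:<10} {counts[name]:>7}  {pct:6.2f}%")
```

    By construction the four shares sum to 100%, so any quoted set of percentages that adds up to noticeably more than 100% points to a transcription slip somewhere.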

  15. Identification of Disease-Causing Mutations in Autosomal Dominant Retinitis Pigmentosa (adRP) Using Next-Generation DNA Sequencing

    PubMed Central

    Bowne, Sara J.; Sullivan, Lori S.; Koboldt, Daniel C.; Ding, Li; Fulton, Robert; Abbott, Rachel M.; Sodergren, Erica J.; Birch, David G.; Wheaton, Dianna H.; Heckenlively, John R.; Liu, Qin; Pierce, Eric A.; Weinstock, George M.

    2011-01-01

    Purpose. To determine whether massively parallel next-generation DNA sequencing offers rapid and efficient detection of disease-causing mutations in patients with monogenic inherited diseases. Retinitis pigmentosa (RP) is a challenging application for this technology because it is a monogenic disease in individuals and families but is highly heterogeneous in patient populations. RP has multiple patterns of inheritance, with mutations in many genes for each inheritance pattern and numerous, distinct, disease-causing mutations at each locus; further, many RP genes have not been identified yet. Methods. Next-generation sequencing was used to identify mutations in pairs of affected individuals from 21 families with autosomal dominant RP, selected from a cohort of families without mutations in “common” RP genes. One thousand amplicons targeting 249,267 unique bases of 46 candidate genes were sequenced with the 454GS FLX Titanium (Roche Diagnostics, Indianapolis, IN) and GAIIx (Illumina/Solexa, San Diego, CA) platforms. Results. An average sequence depth of 70× and 125× was obtained for the 454GS FLX and GAIIx platforms, respectively. More than 9000 sequence variants were identified and analyzed, to assess the likelihood of pathogenicity. One hundred twelve of these were selected as likely candidates and tested for segregation with traditional di-deoxy capillary electrophoresis sequencing of additional family members and control subjects. Five disease-causing mutations (24%) were identified in the 21 families. Conclusion. This project demonstrates that next-generation sequencing is an effective approach for detecting novel, rare mutations causing heterogeneous monogenic disorders such as RP. With the addition of this technology, disease-causing mutations can now be identified in 65% of autosomal dominant RP cases. PMID:20861475

  16. Next-generation sequencing applications for wheat crop improvement.

    PubMed

    Berkman, Paul J; Lai, Kaitao; Lorenc, Michal T; Edwards, David

    2012-02-01

    Bread wheat (Triticum aestivum; Poaceae) is a crop plant of great importance. It provides nearly 20% of the world's daily food supply measured by calorie intake, similar to that provided by rice. The yield of wheat has doubled over the last 40 years due to a combination of advanced agronomic practice and improved germplasm through selective breeding. More recently, yield growth has been less dramatic, and a significant improvement in wheat production will be required if demand from the growing human population is to be met. Next-generation sequencing (NGS) technologies are revolutionizing biology and can be applied to address critical issues in plant biology. These technologies can produce draft genome sequences at a significantly lower cost and in less time than traditional technologies. In addition, NGS technologies can be used to assess gene structure and expression, and importantly, to identify heritable genome variation underlying important agronomic traits. This review provides an overview of the wheat genome and NGS technologies, details some of the problems in applying NGS technology to wheat, and describes how NGS technologies are starting to impact wheat crop improvement.

  17. Next generation sequencing reveals genetic landscape of hepatocellular carcinomas.

    PubMed

    Li, Shuyu; Mao, Mao

    2013-11-01

    Liver cancer is one of the most deadly cancers worldwide. Hepatocellular carcinoma (HCC) represents a major histological subtype of liver cancers. As cancer is a genetic disease, genetic lesions play a major role in HCC tumorigenesis and progression. Although significant progress has been made to uncover genetic alterations in HCCs, our understanding of the genetics involved in the initiation and progression of HCC is far from complete. Next generation sequencing (NGS) has provided a new paradigm in biomedical research to delineate the genetic basis of human diseases. While identification of cancer somatic mutations has been serendipitous, genome sequencing has provided an unbiased approach to systematically catalog somatic mutations and elucidate the mechanisms of tumorigenesis. A number of recently published NGS studies on HCCs have not only confirmed previously known mutations in CTNNB1 and TP53 in HCC, but also identified novel genetic alterations in HCC including mutations in genes involved in epigenetic regulation. WNT, cell cycle and chromatin remodeling pathways have emerged as key oncogenic drivers in HCCs. The frequently altered genes and pathways in HCC reflect classical cancer hallmarks. These findings have started to depict a genetic landscape in HCC and will facilitate development of novel therapeutics for the treatment of this deadly disease. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  18. De novo generation of simple sequence during gene amplification.

    PubMed Central

    Kirschner, L S

    1996-01-01

    Mammalian cells that have undergone gene amplification and/or gene rearrangement have been used as resources to gain insight into the questions of chromosome structure and dynamics. The multidrug resistant murine cell line J7.V2-1 has been shown previously to contain two distinct forms of the highly amplified mdr2 gene, a member of the mouse gene family responsible for the multidrug resistant (MDR) phenotype [Kirschner, L. S. (1995) DNA Cell Biol. 14, 47-59]. Characterization of both forms of the gene revealed that one form corresponded to the wild-type structure of the gene, whereas the other represented a rearrangement. Investigation of this altered gene demonstrated a deletion of 1.6 kb of the wild-type sequence, and replacement of this region with a poly(AT) tract that appears to have been generated de novo. Analysis of the native sequence in this region demonstrated the absence of repetitive elements, but was notable for the presence of two long stretches of polypurine: polypyrimidine strand asymmetry. Analysis of mdr2 transcripts in this cell line revealed that nearly all of the mRNA is transcribed from the rearranged form of the gene. This message is unable to code for a functional mdr2 gene product, owing to a deletion of the fourth exon during this event. Mechanisms of the rearrangement, as well as the significance of this curious effect on transcription, are discussed. PMID:8759018

  19. Next generation sequencing: Coping with rare genetic diseases in China

    PubMed Central

    Cram, David S; Zhou, Daixing

    2016-01-01

    Summary. With a population of 1.4 billion, China shares the largest burden of rare genetic diseases worldwide. Current estimates suggest that there are over ten million individuals afflicted with chromosome disease syndromes and well over one million individuals with monogenic disease. Care of patients with rare genetic diseases remains a largely unmet need due to the paucity of available and affordable treatments. Over recent years, there has been increasing recognition of the need for affirmative action by government, health providers, clinicians and patients. The advent of new next generation sequencing (NGS) technologies, such as whole genome/exome sequencing, offers an unprecedented opportunity to provide large-scale screening of the Chinese population to identify the molecular causes of rare genetic diseases. As a surrogate for the lack of effective treatments, the recent development and implementation of noninvasive prenatal testing (NIPT) in China has the greatest potential, as a single technology, for reducing the number of children born with rare genetic diseases. PMID:27672536

  20. Next generation sequencing for disorders of sex development.

    PubMed

    Tobias, Edward S; McElreavey, Ken

    2014-01-01

    Advances in sequencing technologies are having a major impact on our understanding of the genetic causes of many human congenital disorders. Next generation sequencing (NGS) approaches are particularly important for determining the inherited genetic changes leading to disorders of sex development (DSD). Knowledge of the genetic pathways involved in ovary or testis development is incomplete and, currently, a molecular diagnosis is made in a minority of DSD cases. Here, we review the different NGS strategies applied to the analysis of rare diseases and highlight the potential pitfalls and advantages that are associated with each approach. We also discuss the problems of variant calling as well as the challenges involved in the identification and interpretation of pathogenic mutations from NGS datasets. As clinics start to use NGS on a routine basis, a close collaboration between the molecular and clinical geneticists is essential. This is particularly relevant in the context of unsolicited genetic findings, where clear guidelines regarding counseling, truly informed consent and precise data interpretation will be invaluable.

  1. Recommendations on e-infrastructures for next-generation sequencing.

    PubMed

    Spjuth, Ola; Bongcam-Rudloff, Erik; Dahlberg, Johan; Dahlö, Martin; Kallio, Aleksi; Pireddu, Luca; Vezzi, Francesco; Korpelainen, Eija

    2016-06-07

    With ever-increasing amounts of data being produced by next-generation sequencing (NGS) experiments, the requirements placed on supporting e-infrastructures have grown. In this work, we provide recommendations based on the collective experiences from participants in the EU COST Action SeqAhead for the tasks of data preprocessing, upstream processing, data delivery, and downstream analysis, as well as long-term storage and archiving. We cover demands on computational and storage resources, networks, software stacks, automation of analysis, education, and also discuss emerging trends in the field. E-infrastructures for NGS require substantial effort to set up and maintain over time, and with sequencing technologies and best practices for data analysis evolving rapidly it is important to prioritize both processing capacity and e-infrastructure flexibility when making strategic decisions to support the data analysis demands of tomorrow. Due to increasingly demanding technical requirements we recommend that e-infrastructure development and maintenance be handled by a professional service unit, be it internal or external to the organization, and emphasis should be placed on collaboration between researchers and IT professionals.

  2. Family-Based Association Studies for Next-Generation Sequencing

    PubMed Central

    Zhu, Yun; Xiong, Momiao

    2012-01-01

    An individual's disease risk is determined by the compounded action of both common variants, inherited from remote ancestors, that segregated within the population and rare variants, inherited from recent ancestors, that segregated mainly within pedigrees. Next-generation sequencing (NGS) technologies generate high-dimensional data that allow a nearly complete evaluation of genetic variation. Despite their promise, NGS technologies also suffer from remarkable limitations: high error rates, enrichment of rare variants, and a large proportion of missing values, as well as the fact that most current analytical methods are designed for population-based association studies. To meet the analytical challenges raised by NGS, we propose a general framework for sequence-based association studies that can use various types of family and unrelated-individual data sampled from any population structure and a universal procedure that can transform any population-based association test statistic for use in family-based association tests. We develop family-based functional principal-component analysis (FPCA) with or without smoothing, a generalized T², combined multivariate and collapsing (CMC) method, and single-marker association test statistics. Through intensive simulations, we demonstrate that the family-based smoothed FPCA (SFPCA) has the correct type I error rates and much more power to detect association of (1) common variants, (2) rare variants, (3) both common and rare variants, and (4) variants with opposite directions of effect than other population-based or family-based association analysis methods. The proposed statistics are applied to two data sets with pedigree structures. The results show that the smoothed FPCA has a much smaller p value than other statistics. PMID:22682329
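
    To make the "collapsing" idea behind the CMC method concrete, the sketch below shows only the collapsing step as it is usually described for population-based data: rare variants in a region are merged into a single carrier indicator per individual. It is not the family-based statistic proposed in the abstract. The function name, the frequency threshold, and the toy genotypes are all hypothetical.

```python
def collapse_rare_variants(genotypes, maf_threshold=0.01):
    """Collapse rare variants into one carrier indicator per individual.

    genotypes: list of per-individual lists of minor-allele counts (0, 1 or 2),
    one entry per variant site. Returns (indicators, rare_site_indices).
    """
    n_individuals = len(genotypes)
    n_sites = len(genotypes[0])
    rare_sites = []
    for j in range(n_sites):
        # Minor-allele frequency at site j: allele count over 2N chromosomes.
        maf = sum(row[j] for row in genotypes) / (2 * n_individuals)
        if 0 < maf < maf_threshold:
            rare_sites.append(j)
    # An individual is a carrier if any rare site has at least one minor allele.
    indicators = [int(any(row[j] > 0 for j in rare_sites)) for row in genotypes]
    return indicators, rare_sites


# Toy data (hypothetical): 5 individuals x 3 sites. The threshold is raised to
# 0.2 only because the sample is tiny.
toy_genotypes = [
    [0, 1, 0],
    [0, 1, 0],
    [1, 2, 0],
    [0, 0, 0],
    [0, 1, 1],
]
indicators, rare_sites = collapse_rare_variants(toy_genotypes, maf_threshold=0.2)
print("rare sites:", rare_sites)
print("carrier indicators:", indicators)
```

    In a full CMC-style analysis the collapsed indicators would then be tested jointly with the common variants in a multivariate statistic; that step is omitted here.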

  3. Exome sequencing covers >98% of mutations identified on targeted next generation sequencing panels

    PubMed Central

    LaDuca, Holly; Farwell, Kelly D.; Vuong, Huy; Lu, Hsiao-Mei; Mu, Wenbo; Shahmirzadi, Layla; Tang, Sha; Chen, Jefferey; Bhide, Shruti; Chao, Elizabeth C.

    2017-01-01

    Background. With the expanded availability of next generation sequencing (NGS)-based clinical genetic tests, clinicians seeking to test patients with Mendelian diseases must weigh the superior coverage of targeted gene panels with the greater number of genes included in whole exome sequencing (WES) when considering their first-tier testing approach. Here, we use an in silico analysis to predict the analytic sensitivity of WES using pathogenic variants identified on targeted NGS panels as a reference. Methods. Corresponding nucleotide positions for 1533 different alterations classified as pathogenic or likely pathogenic identified on targeted NGS multi-gene panel tests in our laboratory were interrogated in data from 100 randomly-selected clinical WES samples to quantify the sequence coverage at each position. Pathogenic variants represented 91 genes implicated in hereditary cancer, X-linked intellectual disability, primary ciliary dyskinesia, Marfan syndrome/aortic aneurysms, cardiomyopathies and arrhythmias. Results. When assessing coverage among 100 individual WES samples for each pathogenic variant (153,300 individual assessments), 99.7% (n = 152,798) would likely have been detected on WES. All pathogenic variants had at least some coverage on exome sequencing, with a total of 97.3% (n = 1491) detectable across all 100 individuals. For the remaining 42 pathogenic variants, the number of WES samples with adequate coverage ranged from 35 to 99. Factors such as location in GC-rich, repetitive, or homologous regions likely explain why some of these alterations were not detected across all samples. To validate study findings, a similar analysis was performed against coverage data from 60,706 exomes available through the Exome Aggregation Consortium (ExAC). Results from this validation confirmed that 98.6% (91,743,296/93,062,298) of pathogenic variants demonstrated adequate depth for detection. Conclusions. Results from this in silico analysis suggest that exome
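
    The counts and percentages reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only figures quoted above; the variable names are illustrative, and reading the ExAC denominator as "variants × exomes" is an inference from the numbers rather than something the abstract states.

```python
variants = 1533                 # pathogenic / likely pathogenic alterations
wes_samples = 100               # randomly selected clinical WES samples
assessments = variants * wes_samples

detected_assessments = 152_798  # variant-sample pairs with adequate coverage
fully_detected_variants = 1491  # variants detectable in all 100 samples

exac_exomes = 60_706
exac_adequate = 91_743_296
exac_total = variants * exac_exomes  # inferred denominator

print(f"individual assessments: {assessments:,}")
print(f"detected on WES: {100 * detected_assessments / assessments:.1f}%")
print(f"detectable in all samples: {100 * fully_detected_variants / variants:.1f}%")
print(f"variants not covered in every sample: {variants - fully_detected_variants}")
print(f"ExAC assessments: {exac_total:,}")
print(f"ExAC adequate depth: {100 * exac_adequate / exac_total:.1f}%")
```

    Each derived value agrees with the figure given in the abstract, which supports the interpretation that every percentage is computed per variant-sample pair except the 97.3%, which is per variant.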

  4. Mutation Detection in Patients with Retinal Dystrophies Using Targeted Next Generation Sequencing

    PubMed Central

    Weisschuh, Nicole; Mayer, Anja K.; Strom, Tim M.; Kohl, Susanne; Glöckle, Nicola; Schubach, Max; Andreasson, Sten; Bernd, Antje; Birch, David G.; Hamel, Christian P.; Heckenlively, John R.; Jacobson, Samuel G.; Kamme, Christina; Kellner, Ulrich; Kunstmann, Erdmute; Maffei, Pietro; Reiff, Charlotte M.; Rohrschneider, Klaus; Rosenberg, Thomas; Rudolph, Günther; Vámos, Rita; Varsányi, Balázs; Weleber, Richard G.; Wissinger, Bernd

    2016-01-01

    Retinal dystrophies (RD) constitute a group of blinding diseases that are characterized by clinical variability and pronounced genetic heterogeneity. The different nonsyndromic and syndromic forms of RD can be attributed to mutations in more than 200 genes. Consequently, next generation sequencing (NGS) technologies are among the most promising approaches to identify mutations in RD. We screened a large cohort of patients comprising 89 independent cases and families with various subforms of RD applying different NGS platforms. While mutation screening in 50 cases was performed using an RD gene capture panel, 47 cases were analyzed using whole exome sequencing. One family was analyzed using whole genome sequencing. A detection rate of 61% was achieved including mutations in 34 known and two novel RD genes. A total of 69 distinct mutations were identified, including 39 novel mutations. Notably, genetic findings in several families were not consistent with the initial clinical diagnosis. Clinical reassessment resulted in refinement of the clinical diagnosis in some of these families and confirmed the broad clinical spectrum associated with mutations in RD genes. PMID:26766544

  5. A new insight into CFTR allele frequency in Brazil through next generation sequencing.

    PubMed

    Nunes, Luisa M; Ribeiro, Roberto; Niewiadonski, Vivian D T; Sabino, Ester; Yamamoto, Guilherme L; Bertola, Débora R; Gaburo, Nelson; da Silva Filho, Luiz Vicente R F

    2017-10-01

    As of 2013, fewer than 20% of patients in the Brazilian CF Registry had two CFTR mutations identified. The aim of this study was to sequence the coding region of the CFTR in Brazilian CF patients and determine the frequency of mutations in this cohort. Patients with CF and those with suspected atypical CF or CFTR-related disorders were invited to enroll. Total DNA was extracted from blood samples, quantified, and purified. Libraries were prepared using the Ion Xpress™ Plus gDNA and Amplicon Library preparation kits (Life Technologies) and sequenced on the Ion Torrent platform (Life Technologies). A total of 141 patients were enrolled, and 45 mutations were identified. Among 126 CF patients, we identified mutations in 97.2% of alleles. The three most common mutations were F508del, G542X, and 3120+1G>A. Five novel pathogenic mutations were also identified. Next generation sequencing (NGS) allowed the identification of mutations in most CF alleles and confirmed allelic heterogeneity in our population. © 2017 Wiley Periodicals, Inc.

  6. Small RNA Profiling by Next-Generation Sequencing Using High-Definition Adapters.

    PubMed

    Billmeier, Martina; Xu, Ping

    2017-01-01

    Small RNAs (sRNAs) as key regulators of gene expression play fundamental roles in many biological processes. Next-generation sequencing (NGS) has become an important tool for sRNA discovery and profiling. However, NGS data often show bias for or against certain sequences which is mainly caused by adapter oligonucleotides that are ligated to sRNAs more or less efficiently by RNA ligases. In order to reduce ligation bias, High-definition (HD) adapters for the Illumina sequencing platform were developed. However, a large amount of direct 5' and 3' adapter ligation products are often produced when the current commercially available kits are used for cloning with HD adapters. In this chapter we describe a protocol for sRNA library construction using HD adapters with drastically reduced direct 5' adapter-3' adapter ligation product. The protocol can be used for sRNA library preparation from total RNA or sRNA of various plant, animal, insect, or fungal samples. The protocol includes total RNA extraction from plant leaf tissue and cultured mammalian cells and sRNA library construction using HD adapters.

  7. Applic