Draft versus finished sequence data for DNA and protein diagnostic signature development
Gardner, Shea N.; Lam, Marisa W.; Smith, Jason R.; Torres, Clinton L.; Slezak, Tom R.
2005-01-01
Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. We use SAP to assess whether draft data are sufficient or finished sequencing is required using Marburg and variola virus sequences. Simulations indicate that intermediate to high-quality draft with error rates of 10^-3 to 10^-5 (∼8× coverage) of target organisms is suitable for DNA signature prediction. Low-quality draft with error rates of ∼1% (3× to 6× coverage) of target isolates is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient, as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target and low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures. PMID:16243783
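To put the quoted error rates in more familiar terms, the sketch below converts a per-base error probability into a Phred score using the standard relation Q = -10·log10(p), and estimates how likely a candidate signature region is to be read without any error. The independent-error model and the 100 bp signature length are illustrative assumptions, not details taken from the abstract or from SAP.

```python
import math


def phred_from_error_rate(p):
    """Convert a per-base error probability into a Phred quality score."""
    return -10.0 * math.log10(p)


def error_free_probability(error_rate, length):
    """Chance that `length` consecutive bases contain no sequencing error.

    Assumes errors are independent and equally likely at every base,
    which is a simplification of real sequencing error profiles.
    """
    return (1.0 - error_rate) ** length


if __name__ == "__main__":
    signature_length = 100  # hypothetical length, not from the abstract
    for rate in (1e-2, 1e-3, 1e-5):
        q = phred_from_error_rate(rate)
        ok = error_free_probability(rate, signature_length)
        print(f"error rate {rate:g}: Q{q:.0f}, "
              f"P(no error in {signature_length} bp) = {ok:.3f}")
```

Under this toy model the gap between a ∼1% draft and a 10^-3 draft is large for exact-match signatures, which is at least consistent with the direction of the reported conclusion.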
SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.
Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao
2014-08-08
Next-generation sequencers (NGSs) have become one of the main tools for current biology. To obtain useful insights from the NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. We developed SUGAR (subtile-based GUI-assisted refiner), software that can handle ultra-high-throughput data with a user-friendly graphical user interface (GUI) and interactive analysis capability. SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during the sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to public whole human genome sequencing data, and we showed that the overall mapping quality improved. The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analyses that require high-quality sequence data and mapping results. Therefore, the software will be especially useful for controlling the quality of variant calls for low-population cells, e.g., cancers, in a sample affected by technical errors in the sequencing procedure.
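As a rough illustration of cleaning by read quality (Phred) scores, the sketch below decodes FASTQ quality strings and drops reads whose mean score falls under a threshold. It is not SUGAR's subtile-based method. The Phred+33 offset, the mean-score criterion, and the threshold of 20 are assumptions chosen for the example.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def mean_quality(quality_string, offset=33):
    """Arithmetic mean of the Phred scores (0.0 for an empty string)."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(reads, min_mean_quality=20.0):
    """Keep (name, sequence, quality) records at or above the threshold."""
    return [r for r in reads if mean_quality(r[2]) >= min_mean_quality]
```

Averaging Phred scores is a crude summary, because the scores are logarithmic; a stricter filter might average the underlying error probabilities instead.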
Nakazato, Takeru; Bono, Hidemasa
2017-01-01
It is important for public data repositories to promote the reuse of archived data. In the growing field of omics science, however, the increasing number of submissions of high-throughput sequencing (HTSeq) data to public repositories prevents users from choosing a suitable data set from among the large number of search results. Repository users need to be able to set a threshold to reduce the number of results to obtain a suitable subset of high-quality data for reanalysis. We calculated the quality of sequencing data archived in a public data repository, the Sequence Read Archive (SRA), by using the quality control software FastQC. We obtained quality values for 1 171 313 experiments, which can be used to evaluate the suitability of data for reuse. We also visualized the data distribution in SRA by integrating the quality information and metadata of experiments and samples. We provide quality information of all of the archived sequencing data, which enable users to obtain sufficient quality sequencing data for reanalyses. The calculated quality data are available to the public in various formats. Our data also provide an example of enhancing the reuse of public data by adding metadata to published research data by a third party. PMID:28449062
Ohta, Tazro; Nakazato, Takeru; Bono, Hidemasa
2017-06-01
It is important for public data repositories to promote the reuse of archived data. In the growing field of omics science, however, the increasing number of submissions of high-throughput sequencing (HTSeq) data to public repositories prevents users from choosing a suitable data set from among the large number of search results. Repository users need to be able to set a threshold to reduce the number of results to obtain a suitable subset of high-quality data for reanalysis. We calculated the quality of sequencing data archived in a public data repository, the Sequence Read Archive (SRA), by using the quality control software FastQC. We obtained quality values for 1 171 313 experiments, which can be used to evaluate the suitability of data for reuse. We also visualized the data distribution in SRA by integrating the quality information and metadata of experiments and samples. We provide quality information of all of the archived sequencing data, which enable users to obtain sufficient quality sequencing data for reanalyses. The calculated quality data are available to the public in various formats. Our data also provide an example of enhancing the reuse of public data by adding metadata to published research data by a third party. © The Authors 2017. Published by Oxford University Press.
USDA-ARS's Scientific Manuscript database
Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole...
Kavousi, Niloofar; Eng, Wilhelm Wei Han; Lee, Yin Peng; Tan, Lian Huat; Thuraisingham, Ravindran; Yule, Catherine M; Gan, Han Ming
2016-03-03
We report here the first high-quality draft genome sequence of Pasteurella multocida sequence type 128, which was isolated from the infected finger bone of an adult female who was bitten by a domestic dog. The draft genome will be a valuable addition to the scarce genomic resources available for P. multocida. Copyright © 2016 Kavousi et al.
A high-throughput Sanger strategy for human mitochondrial genome sequencing
2013-01-01
Background: A population reference database of complete human mitochondrial genome (mtGenome) sequences is needed to enable the use of mitochondrial DNA (mtDNA) coding region data in forensic casework applications. However, the development of entire mtGenome haplotypes to forensic data quality standards is difficult and laborious. A Sanger-based amplification and sequencing strategy that is designed for automated processing, yet routinely produces high quality sequences, is needed to facilitate high-volume production of these mtGenome data sets. Results: We developed a robust 8-amplicon Sanger sequencing strategy that regularly produces complete, forensic-quality mtGenome haplotypes in the first pass of data generation. The protocol works equally well on samples representing diverse mtDNA haplogroups and DNA input quantities ranging from 50 pg to 1 ng, and can be applied to specimens of varying DNA quality. The complete workflow was specifically designed for implementation on robotic instrumentation, which increases throughput and reduces both the opportunities for error inherent to manual processing and the cost of generating full mtGenome sequences. Conclusions: The described strategy will assist efforts to generate complete mtGenome haplotypes which meet the highest data quality expectations for forensic genetic and other applications. Additionally, high-quality data produced using this protocol can be used to assess mtDNA data developed using newer technologies and chemistries. Further, the amplification strategy can be used to enrich for mtDNA as a first step in sample preparation for targeted next-generation sequencing. PMID:24341507
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ghodhbane-Gtari, Faten; Beauchemin, Nicholas; Louati, Moussa; ...
2016-08-04
Here, we report the first genome sequence of a Nocardia plant endophyte, N. casuarinae strain BMG51109, isolated from Casuarina glauca root nodules. The improved high-quality draft genome sequence contains 8,787,999 bp with a 68.90% GC content and 7,307 predicted protein-coding genes.
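The GC content reported above is a simple ratio that can be computed directly from an assembly. The sketch below shows one common convention, counting only unambiguous bases in the denominator. That convention is an assumption, since the abstract does not say how the 68.90% figure was derived.

```python
def gc_content(sequence):
    """Percentage of G and C among unambiguous bases (A, C, G, T).

    Ambiguity codes such as N are ignored; returns 0.0 when the
    sequence holds no unambiguous bases.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0
```

For a multi-contig draft, the same function can be applied to the concatenation of all contigs.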
Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.
Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil
2015-07-17
In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.
Library construction for next-generation sequencing: Overviews and challenges
Head, Steven R.; Komori, H. Kiyomi; LaMere, Sarah A.; Whisenant, Thomas; Van Nieuwerburgh, Filip; Salomon, Daniel R.; Ordoukhanian, Phillip
2014-01-01
High-throughput sequencing, also known as next-generation sequencing (NGS), has revolutionized genomic research. In recent years, NGS technology has steadily improved, with costs dropping and the number and range of sequencing applications increasing exponentially. Here, we examine the critical role of sequencing library quality and consider important challenges when preparing NGS libraries from DNA and RNA sources. Factors such as the quantity and physical characteristics of the RNA or DNA source material as well as the desired application (i.e., genome sequencing, targeted sequencing, RNA-seq, ChIP-seq, RIP-seq, and methylation) are addressed in the context of preparing high quality sequencing libraries. In addition, the current methods for preparing NGS libraries from single cells are also discussed. PMID:24502796
Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C
2012-01-01
The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and their sources of error should be considered prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).
USDA-ARS's Scientific Manuscript database
The current pig reference genome sequence (Sscrofa10.2) was established using Sanger sequencing and following the clone-by-clone hierarchical shotgun sequencing approach used in the public human genome project. However, as sequence coverage was low (4-6x) the resulting assembly was only of draft qua...
AfterQC: automatic filtering, trimming, error removing and quality control for fastq data.
Chen, Shifu; Huang, Tanxiao; Zhou, Yanqing; Han, Yue; Xu, Mingyan; Gu, Jia
2017-03-14
Some applications, especially those clinical applications requiring high accuracy of sequencing data, usually have to face the troubles caused by unavoidable sequencing errors. Several tools have been proposed to profile the sequencing quality, but few of them can quantify or correct the sequencing errors. This unmet requirement motivated us to develop AfterQC, a tool with functions to profile sequencing errors and correct most of them, plus highly automated quality control and data filtering features. Different from most tools, AfterQC analyses the overlapping of paired sequences for pair-end sequencing data. Based on overlapping analysis, AfterQC can detect and cut adapters, and furthermore it gives a novel function to correct wrong bases in the overlapping regions. Another new feature is to detect and visualise sequencing bubbles, which can be commonly found on the flowcell lanes and may raise sequencing errors. Besides normal per cycle quality and base content plotting, AfterQC also provides features like polyX (a long sub-sequence of a same base X) filtering, automatic trimming and K-MER based strand bias profiling. For each single or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates sequencer's bubble effects, trims reads at front and tail, detects the sequencing errors and corrects part of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support, it can run with a single FastQ file, a single pair of FastQ files (for pair-end sequencing), or a folder for all included FastQ files to be processed automatically. Based on overlapping analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent. 
Much more than just another new quality control (QC) tool, AfterQC is able to perform quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC can help to eliminate the sequencing errors for pair-end sequencing data to provide much cleaner outputs, and consequently help to reduce the false-positive variants, especially for the low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all the options automatically and require no argument in most cases.
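The overlap-based correction idea can be sketched as follows: reverse-complement the second read, locate where it overlaps the first, and at each disagreeing position keep the base with the higher quality character. This is a simplified illustration, not AfterQC's actual implementation. The function names, the `min_overlap` and `max_mismatches` defaults, and the restriction to fragments at least as long as a read are all assumptions.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap(read1, read2_rc, min_overlap=10, max_mismatches=2):
    """Return the offset in read1 where read2_rc begins, or None.

    Only handles the case where read2_rc starts at or after the
    start of read1 (fragment at least as long as a read).
    """
    for offset in range(0, len(read1) - min_overlap + 1):
        overlap_len = min(len(read1) - offset, len(read2_rc))
        if overlap_len < min_overlap:
            break  # overlap only shrinks as the offset grows
        mismatches = sum(1 for a, b in zip(read1[offset:], read2_rc) if a != b)
        if mismatches <= max_mismatches:
            return offset
    return None


def correct_pair(read1, qual1, read2, qual2, min_overlap=10, max_mismatches=2):
    """Correct mismatches in the overlap using the higher-quality base.

    Returns (corrected_read1, corrected_read2, number_of_corrections).
    Quality strings are compared character by character, which works
    because FASTQ quality characters sort in Phred order.
    """
    r2 = reverse_complement(read2)
    q2 = qual2[::-1]
    offset = find_overlap(read1, r2, min_overlap, max_mismatches)
    if offset is None:
        return read1, read2, 0
    r1_bases, r2_bases = list(read1), list(r2)
    corrected = 0
    for j in range(min(len(read1) - offset, len(r2))):
        i = offset + j
        if r1_bases[i] == r2_bases[j]:
            continue
        if qual1[i] > q2[j]:
            r2_bases[j] = r1_bases[i]
            corrected += 1
        elif qual1[i] < q2[j]:
            r1_bases[i] = r2_bases[j]
            corrected += 1
        # equal qualities: leave both bases untouched
    return "".join(r1_bases), reverse_complement("".join(r2_bases)), corrected
```

Taking the first acceptable offset keeps the sketch short; a more careful version would score every candidate offset and pick the best.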
Evaluating Quality of Aged Archival Formalin-Fixed Paraffin-Embedded Samples for RNA-Sequencing
Archival formalin-fixed paraffin-embedded (FFPE) samples offer a vast, untapped source of genomic data for biomarker discovery. However, the quality of FFPE samples is often highly variable, and conventional methods to assess RNA quality for RNA-sequencing (RNA-seq) are not infor...
Analysis of quality raw data of second generation sequencers with Quality Assessment Software.
Ramos, Rommel Tj; Carneiro, Adriana R; Baumbach, Jan; Azevedo, Vasco; Schneider, Maria Pc; Silva, Artur
2011-04-18
Second generation technologies have advantages over Sanger; however, they have resulted in new challenges for the genome construction process, especially because of the small size of the reads, despite the high degree of coverage. Independent of the program chosen for the construction process, DNA sequences are superimposed, based on identity, to extend the reads, generating contigs; mismatches indicate a lack of homology and are not included. This process improves our confidence in the sequences that are generated. We developed Quality Assessment Software, with which one can review graphs showing the distribution of quality values from the sequencing reads. This software allows us to adopt more stringent quality standards for sequence data, based on quality-graph analysis and estimated coverage after applying the quality filter, providing acceptable sequence coverage for genome construction from short reads. Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments that are caused by measuring errors, which can occur during the construction process due to the size of the reads, provoking misassemblies. Application of quality filters to sequence data, using the software Quality Assessment, along with graphing analyses, provided greater precision in the definition of cutoff parameters, which increased the accuracy of genome construction.
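The trade-off described above, a stricter quality cutoff against the coverage that remains, can be sketched with the usual estimate coverage = total retained bases / genome size. The function names, the `(length, mean_quality)` read representation, and the idea of scanning a list of candidate cutoffs are assumptions for illustration, not the behaviour of the Quality Assessment software itself.

```python
def estimated_coverage(read_lengths, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_size


def coverage_after_filter(reads, genome_size, min_mean_quality):
    """Coverage left after dropping reads below a mean-quality cutoff.

    `reads` is an iterable of (length, mean_quality) pairs.
    """
    kept = [length for length, quality in reads if quality >= min_mean_quality]
    return estimated_coverage(kept, genome_size)


def strictest_cutoff(reads, genome_size, cutoffs, min_coverage):
    """Highest cutoff that still leaves at least `min_coverage`, or None."""
    reads = list(reads)
    acceptable = [
        c for c in cutoffs
        if coverage_after_filter(reads, genome_size, c) >= min_coverage
    ]
    return max(acceptable) if acceptable else None
```

For example, a user could scan cutoffs such as 10, 20 and 30 and keep the most stringent one that still leaves the coverage their assembler needs.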
The diploid genome sequence of an Asian individual
Wang, Jun; Wang, Wei; Li, Ruiqiang; Li, Yingrui; Tian, Geng; Goodman, Laurie; Fan, Wei; Zhang, Junqing; Li, Jun; Zhang, Juanbin; Guo, Yiran; Feng, Binxiao; Li, Heng; Lu, Yao; Fang, Xiaodong; Liang, Huiqing; Du, Zhenglin; Li, Dong; Zhao, Yiqing; Hu, Yujie; Yang, Zhenzhen; Zheng, Hancheng; Hellmann, Ines; Inouye, Michael; Pool, John; Yi, Xin; Zhao, Jing; Duan, Jinjie; Zhou, Yan; Qin, Junjie; Ma, Lijia; Li, Guoqing; Yang, Zhentao; Zhang, Guojie; Yang, Bin; Yu, Chang; Liang, Fang; Li, Wenjie; Li, Shaochuan; Li, Dawei; Ni, Peixiang; Ruan, Jue; Li, Qibin; Zhu, Hongmei; Liu, Dongyuan; Lu, Zhike; Li, Ning; Guo, Guangwu; Zhang, Jianguo; Ye, Jia; Fang, Lin; Hao, Qin; Chen, Quan; Liang, Yu; Su, Yeyang; san, A.; Ping, Cuo; Yang, Shuang; Chen, Fang; Li, Li; Zhou, Ke; Zheng, Hongkun; Ren, Yuanyuan; Yang, Ling; Gao, Yang; Yang, Guohua; Li, Zhuo; Feng, Xiaoli; Kristiansen, Karsten; Wong, Gane Ka-Shu; Nielsen, Rasmus; Durbin, Richard; Bolund, Lars; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian
2009-01-01
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics. PMID:18987735
Sharma, Davinder; Golla, Naresh; Singh, Dheer; Onteru, Suneel K
2018-03-01
Next-generation sequencing (NGS)-based RNA sequencing (RNA-Seq) and transcriptome profiling offer an opportunity to unveil complex biological processes. Successful RNA-Seq and transcriptome profiling require a large amount of high-quality RNA. However, NGS-quality RNA isolation is extremely difficult from recalcitrant adipose tissue (AT) with high lipid content and low cell numbers. Further, the amount and biochemical composition of AT lipid vary depending upon the animal species, which can pose different degrees of resistance to RNA extraction. Currently available approaches may work effectively in one species but can be almost unproductive in another species. Herein, we report a two-step protocol for the extraction of NGS-quality RNA from AT across a broad range of animal species. © 2017 Wiley Periodicals, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daum, Christopher; Zane, Matthew; Han, James
2011-01-31
The U.S. Department of Energy (DOE) Joint Genome Institute's (JGI) Production Sequencing group is committed to the generation of high-quality genomic DNA sequence to support the mission areas of renewable energy generation, global carbon management, and environmental characterization and clean-up. Within the JGI's Production Sequencing group, a robust Illumina Genome Analyzer and HiSeq pipeline has been established. Optimization of the sequencer pipelines has been ongoing with the aim of continual process improvement of the laboratory workflow, reducing operational costs and project cycle times to increase sample throughput, and improving the overall quality of the sequence generated. A sequence QC analysis pipeline has been implemented to automatically generate read and assembly level quality metrics. The foremost of these optimization projects, along with sequencing and operational strategies, throughput numbers, and sequencing quality results will be presented.
Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R; Kang, Hyun Min
2015-06-01
The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires less computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. © 2015 Jun et al.; Published by Cold Spring Harbor Laboratory Press.
Tian, Rui; Parker, Matthew; Seshadri, Rekha; ...
2015-05-16
Bradyrhizobium sp. Th.b2 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing root nodule of Amphicarpaea bracteata collected in Johnson City, New York. Here we describe the features of Bradyrhizobium sp. Th.b2, together with high-quality permanent draft genome sequence information and annotation. The 10,118,060 bp high-quality draft genome is arranged in 266 scaffolds of 274 contigs, contains 9,809 protein-coding genes and 108 RNA-only encoding genes. In conclusion, this rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.
USDA-ARS's Scientific Manuscript database
Using next-generation-sequencing technology to assess entire transcriptomes requires high quality starting RNA. Currently, RNA quality is routinely judged using automated microfluidic gel electrophoresis platforms and associated algorithms. Here we report that such automated methods generate false-n...
Hosseinkhani, Farideh; Emaneini, Mohammad; van Leeuwen, Willem
2017-07-20
Using Illumina HiSeq and PacBio technologies, we sequenced the genome of the multidrug-resistant bacterium Staphylococcus haemolyticus, originating from a bloodstream infection in a neonate. The sequence data can be used as an accurate reference sequence. Copyright © 2017 Hosseinkhani et al.
Fritz, Jan; Ahlawat, Shivani; Demehri, Shadpour; Thawait, Gaurav K; Raithel, Esther; Gilson, Wesley D; Nittka, Mathias
2016-10-01
The aim of this study was to prospectively test the hypothesis that a compressed sensing-based slice encoding for metal artifact correction (SEMAC) turbo spin echo (TSE) pulse sequence prototype facilitates high-resolution metal artifact reduction magnetic resonance imaging (MRI) of cobalt-chromium knee arthroplasty implants within acquisition times of less than 5 minutes, thereby yielding better image quality than high-bandwidth (BW) TSE of similar length and similar image quality than lengthier SEMAC standard of reference pulse sequences. This prospective study was approved by our institutional review board. Twenty asymptomatic subjects (12 men, 8 women; mean age, 56 years; age range, 44-82 years) with total knee arthroplasty implants underwent MRI of the knee using a commercially available, clinical 1.5 T MRI system. Two compressed sensing-accelerated SEMAC prototype pulse sequences with 8-fold undersampling and acquisition times of approximately 5 minutes each were compared with commercially available high-BW and SEMAC pulse sequences with acquisition times of approximately 5 minutes and 11 minutes, respectively. For each pulse sequence type, sagittal intermediate-weighted (TR, 3750-4120 milliseconds; TE, 26-28 milliseconds; voxel size, 0.5 × 0.5 × 3 mm) and short tau inversion recovery (TR, 4010 milliseconds; TE, 5.2-7.5 milliseconds; voxel size, 0.8 × 0.8 × 4 mm) were acquired. Outcome variables included image quality, display of the bone-implant interfaces and pertinent knee structures, artifact size, signal-to-noise ratio (SNR), and contrast-to-noise ratio (CNR). Statistical analysis included Friedman, repeated measures analysis of variances, and Cohen weighted k tests. Bonferroni-corrected P values of 0.005 and less were considered statistically significant. 
Image quality, bone-implant interfaces, anatomic structures, artifact size, SNR, and CNR parameters were statistically similar between the compressed sensing-accelerated SEMAC prototype and SEMAC commercial pulse sequences. There was mild blur on images of both SEMAC sequences when compared with high-BW images (P < 0.001), which however did not impair the assessment of knee structures. Metal artifact reduction and visibility of central knee structures and bone-implant interfaces were good to very good and significantly better on both types of SEMAC than on high-BW images (P < 0.004). All 3 pulse sequences showed peripheral structures similarly well. The implant artifact size was 46% to 51% larger on high-BW images when compared with both types of SEMAC images (P < 0.0001). Signal-to-noise ratios and CNRs of fat tissue, tendon tissue, muscle tissue, and fluid were statistically similar on intermediate-weighted MR images of all 3 pulse sequence types. On short tau inversion recovery images, the SNRs of tendon tissue and the CNRs of fat and fluid, fluid and muscle, as well as fluid and tendon were significantly higher on SEMAC and compressed sensing SEMAC images (P < 0.005, respectively). We accept the hypothesis that prospective compressed sensing acceleration of SEMAC is feasible for high-quality metal artifact reduction MRI of cobalt-chromium knee arthroplasty implants in less than 5 minutes and yields better quality than high-BW TSE and similarly high quality than lengthier SEMAC pulse sequences.
What can we learn about lyssavirus genomes using 454 sequencing?
Höper, Dirk; Finke, Stefan; Freuling, Conrad M; Hoffmann, Bernd; Beer, Martin
2012-01-01
The main task of individual project number four, "Whole genome sequencing, virus-host adaptation, and molecular epidemiological analyses of lyssaviruses", within the network "Lyssaviruses--a potential re-emerging public health threat" is to provide high-quality complete genome sequences from lyssaviruses. These sequences are analysed in depth with regard to the diversity of the viral populations as to both quasi-species and so-called defective interfering RNAs. Moreover, the sequence data will facilitate further epidemiological analyses, will provide insight into the evolution of lyssaviruses and will be the basis for the design of novel nucleic acid based diagnostics. The first results presented here indicate that not only can high-quality full-length lyssavirus genome sequences be generated, but efficient analysis of the viral population also becomes feasible.
Tian, Rui; Parker, Matthew; Seshadri, Rekha; ...
2015-05-17
Bradyrhizobium sp. Tv2a.2 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing root nodule of Tachigali versicolor collected in Barro Colorado Island of Panama. Here we describe the features of Bradyrhizobium sp. Tv2a.2, together with high-quality permanent draft genome sequence information and annotation. The 8,496,279 bp high-quality draft genome is arranged in 87 scaffolds of 87 contigs, contains 8,109 protein-coding genes and 72 RNA-only encoding genes. In conclusion, this rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.
Nakazato, Takeru; Ohta, Tazro; Bono, Hidemasa
2013-01-01
High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called “Gendoo”. We generated hyperlinks between the diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called “DBCLS SRA” (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA. PMID:24167589
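The extraction of SRA ID-PMID pairs from article text can be approximated with a regular expression over accession-like strings. The sketch below is an assumption about how such matching might be done, not the authors' pipeline. The pattern covers the common INSDC prefixes (SRA/ERA/DRA, plus P, R, S and X variants) followed by at least six digits, and the function names and sample inputs are hypothetical.

```python
import re

# Accession-like tokens such as SRP000001, ERR123456 or DRA000003.
ACCESSION_RE = re.compile(r"\b[SED]R[APRSX]\d{6,}\b")


def extract_sra_ids(text):
    """Return unique accession-like strings in order of first appearance."""
    seen = []
    for match in ACCESSION_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def pair_with_pmid(articles):
    """Build (sra_id, pmid) pairs from a mapping of PMID -> full text."""
    return [
        (sra_id, pmid)
        for pmid, text in articles.items()
        for sra_id in extract_sra_ids(text)
    ]
```

A pattern this loose can also match unrelated tokens, so any real use would need to validate hits against the archive itself.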
Non-ECG-gated unenhanced MRA of the carotids: optimization and clinical feasibility.
Raoult, H; Gauvrit, J Y; Schmitt, P; Le Couls, V; Bannier, E
2013-11-01
To optimise and assess the clinical feasibility of a carotid non-ECG-gated unenhanced MRA sequence. Sixteen healthy volunteers and 11 patients presenting with internal carotid artery (ICA) disease underwent large field-of-view balanced steady-state free precession (bSSFP) unenhanced MRA at 3T. Sampling schemes acquiring the k-space centre either early (kCE) or late (kCL) in the acquisition window were evaluated. Signal and image quality was scored in comparison to ECG-gated kCE unenhanced MRA and TOF. For patients, computed tomography angiography was used as the reference. In volunteers, kCE sampling yielded higher image quality than kCL and TOF, with fewer flow artefacts and improved signal homogeneity. kCE unenhanced MRA image quality was higher without ECG-gating. Arterial signal and artery/vein contrast were higher with both bSSFP sampling schemes than with TOF. The kCE sequence allowed correct quantification of ten significant stenoses, and it facilitated the identification of an infrapetrous dysplasia, which was outside of the TOF imaging coverage. Non-ECG-gated bSSFP carotid imaging offers high-quality images and is a promising sequence for carotid disease diagnosis in a short acquisition time with high spatial resolution and a large field of view. • Non-ECG-gated unenhanced bSSFP MRA offers high-quality imaging of the carotid arteries. • Sequences using early acquisition of the k-space centre achieve higher image quality. • Non-ECG-gated unenhanced bSSFP MRA allows quantification of significant carotid stenosis. • Short MR acquisition times and ungated sequences are helpful in clinical practice. • High 3D spatial resolution and a large field of view improve diagnostic performance.
2012-01-01
Background Genetic mapping and QTL detection are powerful methodologies in plant improvement and breeding. Construction of a high-density and high-quality genetic map would be of great benefit in the production of superior grapes to meet human demand. High throughput and low cost of the recently developed next generation sequencing (NGS) technology have resulted in its wide application in genome research. Sequencing restriction-site associated DNA (RAD) might be an efficient strategy to simplify genotyping. Combining NGS with RAD has proven to be powerful for single nucleotide polymorphism (SNP) marker development. Results An F1 population of 100 individual plants was developed. In-silico digestion-site prediction was used to select an appropriate restriction enzyme for construction of a RAD sequencing library. Next generation RAD sequencing was applied to genotype the F1 population and its parents. Applying a cluster strategy for SNP modulation, a total of 1,814 high-quality SNP markers were developed: 1,121 of these were mapped to the female genetic map, 759 to the male map, and 1,646 to the integrated map. A comparison of the genetic maps to the published Vitis vinifera genome revealed both conservation and variations. Conclusions The applicability of next generation RAD sequencing for genotyping a grape F1 population was demonstrated, leading to the successful development of a genetic map with high density and quality using our designed SNP markers. Detailed analysis revealed that this newly developed genetic map can be used for a variety of genome investigations, such as QTL detection, sequence assembly and genome comparison. PMID:22908993
Jeong, Hee-Won; Bang, Man-Seok; Lee, Yea-Jin; Lee, Su Ji; Lee, Sang-Cheol; Shin, Jang-In; Oh, Chung-Hun
2018-06-21
We present here the complete genome sequence of Bacillus subtilis strain DKU_NT_03 isolated from the traditional Korean food chung-gook-jang, which is made from soybeans. This strain was chosen to identify genetic factors with high-quality nattokinase activity. Copyright © 2018 Jeong et al.
Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.
Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami
2012-08-01
Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature ( http://hivdb.Stanford.edu ). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if they have >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or <0.5% or >15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.
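The flagging rules listed in the SQUAT abstract can be read as a table of per-gene upper limits. The following is a minimal Python sketch of that reading, not SQUAT's actual R code: the field names, the `flag_sequence` function, and the interpretation of the genetic-distance rule (flag if any pairwise distance falls outside 0.5%–15%) are all assumptions made for illustration. Only the numeric limits come from the abstract.

```python
# Upper limits quoted in the abstract; a count strictly above the limit is flagged.
# All field names here are hypothetical, not SQUAT's own identifiers.
MAX_ALLOWED = {
    "short_insertions": {"PR": 5, "RT": 5},          # 1-2-base insertions
    "codon_insertions": {"PR": 1, "RT": 1},          # 3-base insertions
    "deletions": {"PR": 1, "RT": 1},
    "ambiguous_bases": {"PR": 6, "RT": 18},
    "consecutive_na_mutations": {"PR": 3, "RT": 4},
    "stop_codons": {"PR": 0, "RT": 0},
    "ambiguous_amino_acids": {"PR": 3, "RT": 6},
    "consecutive_aa_mutations": {"PR": 3, "RT": 4},
    "unique_amino_acids": {"PR": 0, "RT": 0},
}
DISTANCE_RANGE_PCT = (0.5, 15.0)  # acceptable pairwise genetic distance, in percent


def flag_sequence(gene, counts, distances_pct=()):
    """Return the names of the rules that a sequence violates.

    gene          -- "PR" or "RT"
    counts        -- dict of per-sequence counts keyed like MAX_ALLOWED
    distances_pct -- pairwise distances (percent) to other submitted sequences
    """
    reasons = [
        name
        for name, limits in MAX_ALLOWED.items()
        if counts.get(name, 0) > limits[gene]
    ]
    low, high = DISTANCE_RANGE_PCT
    if any(d < low or d > high for d in distances_pct):
        reasons.append("genetic_distance")
    return reasons
```

Because the abstract notes that thresholds are user modifiable, keeping them in a plain dictionary rather than hard-coding them in conditionals seems a natural fit; a caller could copy `MAX_ALLOWED` and override individual entries.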
Improved High-Quality Draft Genome Sequence and Annotation of Burkholderia contaminans LMG 23361T.
Jung, Ji Young; Ahn, Youngbeom; Kweon, Ohgew; LiPuma, John J; Hussong, David; Marasa, Bernard S; Cerniglia, Carl E
2017-04-20
Burkholderia contaminans LMG 23361 is the type strain of the species isolated from the milk of a dairy sheep with mastitis. Some pharmaceutical products contain disinfectants such as benzalkonium chloride (BZK) and previously we reported that B. contaminans LMG 23361 T possesses the ability to inactivate BZK with high biodegradation rates. Here, we report an improved high-quality draft genome sequence of this strain. Copyright © 2017 Jung et al.
Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
Chin, Chen-Shan; Alexander, David H; Marks, Patrick; Klammer, Aaron A; Drake, James; Heiner, Cheryl; Clum, Alicia; Copeland, Alex; Huddleston, John; Eichler, Evan E; Turner, Stephen W; Korlach, Jonas
2013-06-01
We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
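To put the quoted 99.999% consensus accuracy on the Phred scale commonly used for quality values, the standard relation Q = -10·log10(error probability) can be applied. The short Python sketch below is only a unit conversion under that definition; the function names and the 5 Mb genome size in the usage comment are illustrative assumptions, not figures from the abstract.

```python
import math


def phred_from_accuracy(accuracy):
    """Convert a per-base accuracy (e.g. 0.99999) to a Phred quality value."""
    return -10.0 * math.log10(1.0 - accuracy)


def accuracy_from_phred(q):
    """Convert a Phred quality value back to a per-base accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


def expected_errors(genome_length, accuracy):
    """Expected number of erroneous bases in a consensus of the given length."""
    return genome_length * (1.0 - accuracy)


# Illustrative use: 99.999% accuracy is roughly Q50; for a hypothetical
# 5 Mb microbial genome that corresponds to about 50 expected base errors.
q = phred_from_accuracy(0.99999)
errors = expected_errors(5_000_000, 0.99999)
```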
Woo, Hannah L.; DeAngelis, Kristen M.; Teshima, Hazuki; ...
2017-05-04
In this paper, we report the high-quality draft genome sequences of four phylogenetically diverse lignocellulose-degrading bacteria isolated from tropical soil ( Gordonia sp., Paenibacillus sp., Variovorax sp., and Vogesella sp.) to elucidate the genetic basis of their ability to degrade lignocellulose. These isolates may provide novel enzymes for biofuel production.
Progress in ion torrent semiconductor chip based sequencing.
Merriman, Barry; Rothberg, Jonathan M
2012-12-01
In order for next-generation sequencing to become widely used as a diagnostic in the healthcare industry, sequencing instrumentation will need to be mass produced with a high degree of quality and economy. One way to achieve this is to recast DNA sequencing in a format that fully leverages the manufacturing base created for computer chips, complementary metal-oxide semiconductor chip fabrication, which is the current pinnacle of large scale, high quality, low-cost manufacturing of high technology. To achieve this, ideally the entire sensory apparatus of the sequencer would be embodied in a standard semiconductor chip, manufactured in the same fab facilities used for logic and memory chips. Recently, such a sequencing chip, and the associated sequencing platform, has been developed and commercialized by Ion Torrent, a division of Life Technologies, Inc. Here we provide an overview of this semiconductor chip based sequencing technology, and summarize the progress made since its commercial introduction. We describe in detail the progress in chip scaling, sequencing throughput, read length, and accuracy. We also summarize the enhancements in the associated platform, including sample preparation, data processing, and engagement of the broader development community through open source and crowdsourcing initiatives. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
van den Akker, Jeroen; Mishne, Gilad; Zimmer, Anjali D; Zhou, Alicia Y
2018-04-17
Next generation sequencing (NGS) has become a common technology for clinical genetic tests. The quality of NGS calls varies widely and is influenced by features like reference sequence characteristics, read depth, and mapping accuracy. With recent advances in NGS technology and software tools, the majority of variants called using NGS alone are in fact accurate and reliable. However, a small subset of difficult-to-call variants that still do require orthogonal confirmation exist. For this reason, many clinical laboratories confirm NGS results using orthogonal technologies such as Sanger sequencing. Here, we report the development of a deterministic machine-learning-based model to differentiate between these two types of variant calls: those that do not require confirmation using an orthogonal technology (high confidence), and those that require additional quality testing (low confidence). This approach allows reliable NGS-based calling in a clinical setting by identifying the few important variant calls that require orthogonal confirmation. We developed and tested the model using a set of 7179 variants identified by a targeted NGS panel and re-tested by Sanger sequencing. The model incorporated several signals of sequence characteristics and call quality to determine if a variant was identified at high or low confidence. The model was tuned to eliminate false positives, defined as variants that were called by NGS but not confirmed by Sanger sequencing. The model achieved very high accuracy: 99.4% (95% confidence interval: +/- 0.03%). It categorized 92.2% (6622/7179) of the variants as high confidence, and 100% of these were confirmed to be present by Sanger sequencing. Among the variants that were categorized as low confidence, defined as NGS calls of low quality that are likely to be artifacts, 92.1% (513/557) were found to be not present by Sanger sequencing. 
This work shows that NGS data contains sufficient characteristics for a machine-learning-based model to differentiate low from high confidence variants. Additionally, it reveals the importance of incorporating site-specific features as well as variant call features in such a model.
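The percentages reported for the variant-confirmation model follow directly from the counts given in the abstract. The short Python check below redoes that arithmetic; the variable names are illustrative, and the derived figure of 44 low-confidence calls that Sanger sequencing did confirm is a consequence of the quoted counts rather than a number stated by the authors.

```python
total_variants = 7179          # variants re-tested by Sanger sequencing
high_confidence = 6622         # categorized as high confidence by the model
low_conf_not_present = 513     # low-confidence calls absent on Sanger

# The remaining variants are the low-confidence set.
low_confidence = total_variants - high_confidence

# Low-confidence calls that Sanger sequencing nevertheless confirmed.
low_conf_confirmed = low_confidence - low_conf_not_present


def pct(part, whole):
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100.0 * part / whole, 1)


high_conf_share = pct(high_confidence, total_variants)
artifact_share = pct(low_conf_not_present, low_confidence)
```

Seen this way, orthogonal confirmation would be needed for 557 of 7179 calls, and most of those turn out to be artifacts.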
Mitra, Abhishek; Skrzypczak, Magdalena; Ginalski, Krzysztof; Rowicka, Maga
2015-01-01
Sequencing microRNA, reduced representation sequencing, Hi-C technology and any method requiring the use of in-house barcodes result in sequencing libraries with low initial sequence diversity. Sequencing such data on the Illumina platform typically produces low quality data due to the limitations of the Illumina cluster calling algorithm. Moreover, even in the case of diverse samples, these limitations are causing substantial inaccuracies in multiplexed sample assignment (sample bleeding). Such inaccuracies are unacceptable in clinical applications, and in some other fields (e.g. detection of rare variants). Here, we discuss how both problems with quality of low-diversity samples and sample bleeding are caused by incorrect detection of clusters on the flowcell during initial sequencing cycles. We propose simple software modifications (Long Template Protocol) that overcome this problem. We present experimental results showing that our Long Template Protocol remarkably increases data quality for low diversity samples, as compared with the standard analysis protocol; it also substantially reduces sample bleeding for all samples. For comprehensiveness, we also discuss and compare experimental results from alternative approaches to sequencing low diversity samples. First, we discuss how the low diversity problem, if caused by barcodes, can be avoided altogether at the barcode design stage. Second and third, we present modified guidelines, which are more stringent than the manufacturer’s, for mixing low diversity samples with diverse samples and lowering cluster density, which in our experience consistently produces high quality data from low diversity samples. Fourth and fifth, we present rescue strategies that can be applied when sequencing results in low quality data and when there is no more biological material available. In such cases, we propose that the flowcell be re-hybridized and sequenced again using our Long Template Protocol. 
Alternatively, we discuss how analysis can be repeated from saved sequencing images using the Long Template Protocol to increase accuracy. PMID:25860802
Augmenting Chinese hamster genome assembly by identifying regions of high confidence.
Vishwanathan, Nandita; Bandyopadhyay, Arpan A; Fu, Hsu-Yuan; Sharma, Mohit; Johnson, Kathryn C; Mudge, Joann; Ramaraj, Thiruvarangan; Onsongo, Getiria; Silverstein, Kevin A T; Jacob, Nitya M; Le, Huong; Karypis, George; Hu, Wei-Shou
2016-09-01
Chinese hamster Ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of genome sequence of Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, the mammalian genomes assembled using shot-gun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. We assembled two independent drafts of Chinese hamster genome by de novo assembly from shotgun sequencing reads and by re-scaffolding and gap-filling the draft genome from NCBI for improved scaffold lengths and gap fractions. We then used the two independent assemblies to identify high confidence regions using two different approaches. First, the two independent assemblies were compared at the sequence level to identify their consensus regions as "high confidence regions" which accounts for at least 78 % of the assembled genome. Further, a genome wide comparison of the Chinese hamster scaffolds with mouse chromosomes revealed scaffolds with large blocks of collinearity, which were also compiled as high-quality scaffolds. Genome scale collinearity was complemented with EST based synteny which also revealed conserved gene order compared to mouse. As cell line sequencing becomes more commonly practiced, the approaches reported here are useful for assessing the quality of assembly and potentially facilitate the engineering of cell lines. Copyright © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Assembly of cucumber (Cucumis sativus L.) somaclones
NASA Astrophysics Data System (ADS)
Skarzyńska, Agnieszka; Kuśmirek, Wiktor; Pawełkowicz, Magdalena; Pląder, Wojciech; Nowak, Robert M.
2017-08-01
The development of next generation sequencing opens the possibility of using sequencing in various plant studies, such as finding structural changes and small polymorphisms between species and within them. Most analyses rely on genomic sequences, so it is crucial to use well-assembled genomes of high quality and completeness. Herein we compare commonly available programs for genome assembly with newly developed software, dnaasm. Assemblies were tested on cucumber (Cucumis sativus L.) lines obtained by in vitro regeneration (somaclones), showing different phenotypes. The results show that the dnaasm assembler is a good tool for short-read assembly, yielding genomes of high quality and completeness.
NASA Astrophysics Data System (ADS)
Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.
2017-07-01
A DNA sequence can be defined as a succession of letters representing the order of nucleotides within DNA, using the four DNA base codes adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of a sequence is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have now become advanced, high-throughput technologies. So far, DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing produces ambiguous results, making it difficult to determine whether a given code is A, T, G, or C. To solve this problem, in this study we introduce another representation of DNA codes, namely the Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probabilities that the bases A, T, G, C appear in Q, and PA + PT + PG + PC = 1. Furthermore, using Quaternion representations we are able to construct an improved scoring matrix for global sequence alignment by applying a dot product method. This scoring matrix produces better, higher-quality match and mismatch scores between two DNA base codes. In implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are the DNA sequences of Streptococcus pneumoniae families obtained from GenBank, while the target DNA sequence was received from our collaborator's database. As a result, we found that the Quaternion representations improve the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae.
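The idea of scoring probabilistic base codes with a dot product can be illustrated with a small Needleman-Wunsch sketch. This is written in Python rather than the authors' Octave, and it is not their scoring matrix: the linear blend of match and mismatch scores by the dot product, the gap penalty, and all function names are assumptions chosen to make the example concrete.

```python
from typing import Sequence, Tuple

# A base is a probability 4-tuple (P_A, P_T, P_G, P_C) that sums to 1.
Base = Tuple[float, float, float, float]
ORDER = "ATGC"


def one_hot(letter: str) -> Base:
    """Represent an unambiguous base call as a probability 4-tuple."""
    return tuple(1.0 if letter == b else 0.0 for b in ORDER)


def dot(p: Base, q: Base) -> float:
    """Probability that two independent base codes denote the same base."""
    return sum(a * b for a, b in zip(p, q))


def pair_score(p: Base, q: Base, match: float = 1.0, mismatch: float = -1.0) -> float:
    """Assumed scoring rule: blend match and mismatch scores by the dot product."""
    d = dot(p, q)
    return match * d + mismatch * (1.0 - d)


def needleman_wunsch_score(s: Sequence[Base], t: Sequence[Base], gap: float = -1.0) -> float:
    """Optimal global alignment score with a linear gap penalty."""
    rows, cols = len(s) + 1, len(t) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]),  # align two bases
                dp[i - 1][j] + gap,                                   # gap in t
                dp[i][j - 1] + gap,                                   # gap in s
            )
    return dp[-1][-1]


# Illustrative use: the second target base is ambiguous between A and G.
subject = [one_hot(c) for c in "AA"]
target = [one_hot("A"), (0.5, 0.0, 0.5, 0.0)]
score = needleman_wunsch_score(subject, target)
```

With unambiguous bases the rule reduces to the usual match/mismatch scores, so the sketch behaves like ordinary Needleman-Wunsch and only differs where a base call is uncertain.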
ScanRanker: Quality Assessment of Tandem Mass Spectra via Sequence Tagging
Ma, Ze-Qiang; Chambers, Matthew C.; Ham, Amy-Joan L.; Cheek, Kristin L.; Whitwell, Corbin W.; Aerni, Hans-Rudolf; Schilling, Birgit; Miller, Aaron W.; Caprioli, Richard M.; Tabb, David L.
2011-01-01
In shotgun proteomics, protein identification by tandem mass spectrometry relies on bioinformatics tools. Despite recent improvements in identification algorithms, a significant number of high quality spectra remain unidentified for various reasons. Here we present ScanRanker, an open-source tool that evaluates the quality of tandem mass spectra via sequence tagging with reliable performance in data from different instruments. The superior performance of ScanRanker enables it not only to find unassigned high quality spectra that evade identification through database search, but also to select spectra for de novo sequencing and cross-linking analysis. In addition, we demonstrate that the distribution of ScanRanker scores predicts the richness of identifiable spectra among multiple LC-MS/MS runs in an experiment, and ScanRanker scores assist the process of peptide assignment validation to increase confident spectrum identifications. The source code and executable versions of ScanRanker are available from http://fenchurch.mc.vanderbilt.edu. PMID:21520941
NASA Astrophysics Data System (ADS)
Liu, Qiong; Wang, Wen-xi; Zhu, Ke-ren; Zhang, Chao-yong; Rao, Yun-qing
2014-11-01
Mixed-model assembly line sequencing is significant in reducing the production time and overall cost of production. To improve production efficiency, a mathematical model aiming simultaneously to minimize overtime, idle time and total set-up costs is developed. To obtain high-quality and stable solutions, an advanced scatter search approach is proposed. In the proposed algorithm, a new diversification generation method based on a genetic algorithm is presented to generate a set of potentially diverse and high-quality initial solutions. Many methods, including reference set update, subset generation, solution combination and improvement methods, are designed to maintain the diversification of populations and to obtain high-quality ideal solutions. The proposed model and algorithm are applied and validated in a case company. The results indicate that the proposed advanced scatter search approach is significant for mixed-model assembly line sequencing in this company.
Blom, Mozes P K
2015-08-05
Recently developed molecular methods enable geneticists to target and sequence thousands of orthologous loci and infer evolutionary relationships across the tree of life. Large numbers of genetic markers benefit species tree inference but visual inspection of alignment quality, as traditionally conducted, is challenging with thousands of loci. Furthermore, due to the impracticality of repeated visual inspection with alternative filtering criteria, the potential consequences of using datasets with different degrees of missing data remain nominally explored in most empirical phylogenomic studies. In this short communication, I describe a flexible high-throughput pipeline designed to assess alignment quality and filter exonic sequence data for subsequent inference. The stringency criteria for alignment quality and missing data can be adapted based on the expected level of sequence divergence. Each alignment is automatically evaluated based on the stringency criteria specified, significantly reducing the number of alignments that require visual inspection. By developing a rapid method for alignment filtering and quality assessment, the consistency of phylogenetic estimation based on exonic sequence alignments can be further explored across distinct inference methods, while accounting for different degrees of missing data.
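As a rough illustration of automated filtering on missing data, the Python sketch below drops sequences whose fraction of missing characters exceeds a threshold and rejects alignments left with too few taxa. It is not the pipeline described in the abstract: the set of missing-data characters, both threshold parameters, and the function names are assumptions chosen for the example.

```python
# Characters treated as missing data (an assumption for this sketch).
MISSING = set("-?Nn")


def missing_fraction(seq):
    """Fraction of positions in a sequence that are gaps or unknown bases."""
    if not seq:
        return 1.0
    return sum(ch in MISSING for ch in seq) / len(seq)


def filter_alignment(alignment, max_missing_per_seq=0.5, min_taxa=4):
    """Filter one alignment given as a dict of taxon name -> aligned sequence.

    Sequences with more than max_missing_per_seq missing data are removed.
    Returns the filtered dict, or None if fewer than min_taxa sequences remain.
    """
    kept = {
        name: seq
        for name, seq in alignment.items()
        if missing_fraction(seq) <= max_missing_per_seq
    }
    return kept if len(kept) >= min_taxa else None
```

Exposing the two thresholds as parameters mirrors the abstract's point that stringency can be adapted to the expected level of sequence divergence, and makes it cheap to rerun the filter under alternative criteria.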
Chaitanya, Lakshmi; van Oven, Mannis; Brauer, Silke; Zimmermann, Bettina; Huber, Gabriela; Xavier, Catarina; Parson, Walther; de Knijff, Peter; Kayser, Manfred
2016-03-01
The use of mitochondrial DNA (mtDNA) for maternal lineage identification often marks the last resort when investigating forensic and missing-person cases involving highly degraded biological materials. As with all comparative DNA testing, a match between evidence and reference sample requires a statistical interpretation, for which high-quality mtDNA population frequency data are crucial. Here, we determined, under high quality standards, the complete mtDNA control-region sequences of 680 individuals from across the Netherlands sampled at 54 sites, covering the entire country with 10 geographic sub-regions. The complete mtDNA control region (nucleotide positions 16,024-16,569 and 1-576) was amplified with two PCR primers and sequenced with ten different sequencing primers using the EMPOP protocol. Haplotype diversity of the entire sample set was very high at 99.63% and, accordingly, the random-match probability was 0.37%. No population substructure within the Netherlands was detected with our dataset. Phylogenetic analyses were performed to determine mtDNA haplogroups. Inclusion of these high-quality data in the EMPOP database (accession number: EMP00666) will improve its overall data content and geographic coverage in the interest of all EMPOP users worldwide. Moreover, this dataset will serve as (the start of) a national reference database for mtDNA applications in forensic and missing person casework in the Netherlands. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
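The reported haplotype diversity (99.63%) and random-match probability (0.37%) sum to 100%, which is what one obtains if diversity is taken as one minus the sum of squared haplotype frequencies. The abstract does not say how the figures were computed, so the Python sketch below simply assumes that relation and also offers the common sample-size-corrected estimator (multiplying by n/(n-1)) as an option. The function names and the toy haplotype labels are illustrative.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in the sample."""
    n = len(haplotypes)
    return sum((count / n) ** 2 for count in Counter(haplotypes).values())


def haplotype_diversity(haplotypes, corrected=False):
    """One minus the random-match probability.

    With corrected=True the value is scaled by n / (n - 1), the usual
    small-sample correction; this requires at least two samples.
    """
    n = len(haplotypes)
    h = 1.0 - random_match_probability(haplotypes)
    return h * n / (n - 1) if corrected else h


# Toy sample of four individuals carrying three distinct haplotypes.
sample = ["H1", "H1", "H2", "H3"]
rmp = random_match_probability(sample)
diversity = haplotype_diversity(sample)
```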
USDA-ARS's Scientific Manuscript database
Next-generation sequencing technology such as genotyping-by-sequencing (GBS) made low-cost, but often low-coverage, whole-genome sequencing widely available. Extensive inbreeding in crop plants provides an untapped, high quality source of phased haplotypes for imputing missing genotypes. We introduc...
Primer and platform effects on 16S rRNA tag sequencing
Tremblay, Julien; Singh, Kanwar; Fern, Alison; ...
2015-08-04
Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, sequencing technologies; as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6–V8, and V7–V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enabled the use of more aggressive quality control parameters over 454, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. In conclusion, beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.
Protein sequence annotation in the genome era: the annotation concept of SWISS-PROT+TREMBL.
Apweiler, R; Gateau, A; Contrino, S; Martin, M J; Junker, V; O'Donovan, C; Lang, F; Mitaritonna, N; Kappus, S; Bairoch, A
1997-01-01
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation, a minimal level of redundancy and high level of integration with other databases. Ongoing genome sequencing projects have dramatically increased the number of protein sequences to be incorporated into SWISS-PROT. Since we do not want to dilute the quality standards of SWISS-PROT by incorporating sequences without proper sequence analysis and annotation, we cannot speed up the incorporation of new incoming data indefinitely. However, as we also want to make the sequences available as fast as possible, we introduced TREMBL (TRanslation of EMBL nucleotide sequence database), a supplement to SWISS-PROT. TREMBL consists of computer-annotated entries in SWISS-PROT format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except for CDS already included in SWISS-PROT. While TREMBL is already of immense value, its computer-generated annotation does not match the quality of SWISS-PROTs. The main difference is in the protein functional information attached to sequences. With this in mind, we are dedicating substantial effort to develop and apply computer methods to enhance the functional information attached to TREMBL entries.
Recording high quality speech during tagged cine-MRI studies using a fiber optic microphone.
NessAiver, Moriel S; Stone, Maureen; Parthasarathy, Vijay; Kahana, Yuvi; Paritsky, Alexander; Paritsky, Alex
2006-01-01
To investigate the feasibility of obtaining high quality speech recordings during cine imaging of tongue movement using a fiber optic microphone. A Complementary Spatial Modulation of Magnetization (C-SPAMM) tagged cine sequence triggered by an electrocardiogram (ECG) simulator was used to image a volunteer while speaking the syllable pairs /a/-/u/, /i/-/u/, and the words "golly" and "Tamil" in sync with the imaging sequence. A noise-canceling, optical microphone was fastened approximately 1-2 inches above the mouth of the volunteer. The microphone was attached via optical fiber to a laptop computer, where the speech was sampled at 44.1 kHz. A reference recording of gradient activity with no speech was subtracted from target recordings. Good quality speech was discernible above the background gradient sound using the fiber optic microphone without reference subtraction. The audio waveform of gradient activity was extremely stable and reproducible. Subtraction of the reference gradient recording further reduced gradient noise by roughly 21 dB, resulting in exceptionally high quality speech waveforms. It is possible to obtain high quality speech recordings using an optical microphone even during exceptionally loud cine imaging sequences. This opens up the possibility of more elaborate MRI studies of speech including spectral analysis of the speech signal in all types of MRI.
Atropos: specific, sensitive, and speedy trimming of sequencing reads.
Didion, John P; Martin, Marcel; Collins, Francis S
2017-01-01
A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos makes it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos. PMID:28875074
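To make the two operations named in this abstract concrete (removing low-quality bases and removing adapter sequence), here is a deliberately simple sketch. It is not Atropos's algorithm and does not use its API: it assumes exact adapter matches only and a fixed Phred cutoff, and the function names and the `min_q` and `min_overlap` parameters are hypothetical.

```python
def trim_low_quality_tail(seq, quals, min_q=20):
    """Drop trailing bases whose Phred score is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def trim_adapter_exact(seq, adapter, min_overlap=3):
    """Remove a 3' adapter, assuming an error-free (exact) match.

    Handles a full adapter anywhere in the read, or a partial adapter
    (a prefix of the adapter) at the very end of the read.
    """
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # Try the longest adapter prefix that equals the read suffix.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:len(seq) - k]
    return seq


adapter = "AGATCGGAAGAGC"
print(trim_adapter_exact("ACGTACGTAGATCGGAAG", adapter))
print(trim_low_quality_tail("ACGTAC", [30, 30, 30, 30, 10, 5]))
```

Real trimmers tolerate mismatches and indels in the adapter alignment, which is where the accuracy differences the abstract discusses come from; exact matching is used here only to keep the sketch short.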
Monitoring Error Rates In Illumina Sequencing.
Manley, Leigh J; Ma, Duanduan; Levine, Stuart S
2016-12-01
Guaranteeing high-quality next-generation sequencing data in a rapidly changing environment is an ongoing challenge. The introduction of the Illumina NextSeq 500 and the deprecation of specific metrics from Illumina's Sequencing Analysis Viewer (SAV; Illumina, San Diego, CA, USA) have made it more difficult to determine directly the baseline error rate of sequencing runs. To improve our ability to measure base quality, we have created an open-source tool to construct the Percent Perfect Reads (PPR) plot, previously provided by the Illumina sequencers. The PPR program is compatible with HiSeq 2000/2500, MiSeq, and NextSeq 500 instruments and provides an alternative to Illumina's quality value (Q) scores for determining run quality. Whereas Q scores are representative of run quality, they are often overestimated and are sourced from different look-up tables for each platform. The PPR's unique capabilities as a cross-instrument comparison device, as a troubleshooting tool, and as a tool for monitoring instrument performance can provide an increase in clarity over SAV metrics that is often crucial for maintaining instrument health. These capabilities are highlighted.
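For orientation, the two quantities contrasted in this abstract can be sketched as below. The Phred relation Q = -10 log10(p) is standard. The percent-perfect-reads calculation is an assumption about what such a plot shows (the share of reads with no mismatch up to each cycle) and is not taken from the PPR tool itself; the function names and the input format are hypothetical.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to an estimated per-base error probability."""
    return 10 ** (-q / 10)


def percent_perfect_by_cycle(first_error_cycles, n_cycles):
    """Percent of reads that are still error-free at each cycle.

    first_error_cycles: one entry per read, giving the 1-based cycle of the
    first mismatch against a known reference, or None if the read has no
    mismatch. (Assumed input format, for illustration only.)
    """
    total = len(first_error_cycles)
    result = []
    for cycle in range(1, n_cycles + 1):
        perfect = sum(
            1 for first in first_error_cycles if first is None or first > cycle
        )
        result.append(100.0 * perfect / total)
    return result


print(phred_to_error_prob(30))
print(percent_perfect_by_cycle([None, 3, None, 1], n_cycles=4))
```

A curve built this way is measured from observed mismatches rather than predicted from a look-up table, which is the distinction the abstract draws between PPR and Q scores.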
dBBQs: dataBase of Bacterial Quality scores.
Wanchai, Visanu; Patumcharoenpol, Preecha; Nookaew, Intawat; Ussery, David
2017-12-28
It is well known that genome sequencing technologies are becoming significantly cheaper and faster. As a result, the exponential growth of sequencing data in public databases allows us to explore ever-larger collections of genome sequences. However, it is less well known that the majority of sequenced genomes available in public databases are not complete, but are drafts of varying quality. We have calculated quality scores for around 100,000 bacterial genomes from all major genome repositories and put them in a fast and easy-to-use database. Prokaryotic genomic data from all sources were collected and combined to make a non-redundant set of bacterial genomes. The genome quality score for each was calculated from four different measurements: assembly quality, number of rRNA genes, number of tRNA genes, and the occurrence of conserved functional domains. The dataBase of Bacterial Quality scores (dBBQs) was designed to store and retrieve quality scores. It offers fast searching and download features, and the results can be used for further analysis. In addition, the search results are shown in an interactive JavaScript chart framework using DC.js. The analysis of quality scores across major public genome databases finds that around 68% of the genomes are of acceptable quality for many uses. dBBQs (available at http://arc-gem.uams.edu/dbbqs ) provides genome quality scores for all available prokaryotic genome sequences with a user-friendly Web interface. These scores can be used as cut-offs to get a high-quality set of genomes for testing bioinformatics tools or improving the analysis. Moreover, all data of the four measurements that were combined to make the quality score for each genome can potentially be used for further analysis. dBBQs will be updated regularly and is free to use for non-commercial purposes.
MIPS bacterial genomes functional annotation benchmark dataset.
Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Werner
2005-05-15
Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab
Fei, Xiaolu; Li, Shanshan; Gao, Shan; Wei, Lan; Wang, Lihong
2014-09-04
Radio Frequency Identification (RFID) has been widely used in healthcare facilities, but little attention has been paid to whether RFID applications are safe enough in the healthcare environment. The purpose of this study is to assess the effects of RFID tags on Magnetic Resonance (MR) imaging in a typical electromagnetic environment in hospitals, and to evaluate the safety of their applications. A Magphan phantom was used to simulate the imaging objects, while active RFID tags were placed at different distances (0, 4, 8, 10 cm) from the phantom border. The phantom was scanned using three typical sequences: spin-echo (SE), gradient-echo (GRE) and inversion-recovery (IR). The quality of the image was quantitatively evaluated using signal-to-noise ratio (SNR), uniformity, high-contrast resolution, and geometric distortion. RFID tags were read by an RFID reader to calculate their usable rate. RFID tags could be read properly after being placed in a high magnetic field for up to 30 minutes. SNR: There were no differences between the group with RFID tags and the group without RFID tags using the SE and IR sequences, but SNR was lower when using the GRE sequence. Uniformity: There was a significant difference between the group with RFID tags and the group without RFID tags using the SE and GRE sequences. Geometric distortion and high-contrast resolution: There were no obvious differences found. Active RFID tags can affect MR imaging quality, especially using the GRE sequence. Increasing the distance from the RFID tags to the imaging objects can reduce that influence. When the distance was longer than 8 cm, MR imaging quality was almost unaffected. However, gradient-echo-related sequences are not recommended when patients wear an RFID wristband.
Morgan, Martin; Anders, Simon; Lawrence, Michael; Aboyoun, Patrick; Pagès, Hervé; Gentleman, Robert
2009-01-01
Summary: ShortRead is a package for input, quality assessment, manipulation and output of high-throughput sequencing data. ShortRead is provided in the R and Bioconductor environments, allowing ready access to additional facilities for advanced statistical analysis, data transformation, visualization and integration with diverse genomic resources. Availability and Implementation: This package is implemented in R and available at the Bioconductor web site; the package contains a ‘vignette’ outlining typical work flows. Contact: mtmorgan@fhcrc.org PMID:19654119
Simultaneous phylogeny reconstruction and multiple sequence alignment
Yue, Feng; Shi, Jian; Tang, Jijun
2009-01-01
Background A phylogeny is the evolutionary history of a group of organisms. To date, sequence data is still the most used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to affect the quality of the inferred phylogeny. At the same time, all the current multiple sequence alignment programs use a guide tree to produce the alignment, and experiments have shown that good guide trees can significantly improve the multiple alignment quality. Results We devise a new algorithm to simultaneously align multiple sequences and search for the phylogenetic tree that leads to the best alignment. We also implemented the algorithm as a C program package, which can handle both DNA and protein data and can take a simple cost model as well as complex substitution matrices, such as PAM250 or BLOSUM62. The performance of the new method is compared with that of other popular multiple sequence alignment tools, including widely used programs such as ClustalW and T-Coffee. Experimental results suggest that this method has good performance in terms of both phylogeny accuracy and alignment quality. Conclusion We present an algorithm to align multiple sequences and reconstruct the phylogenies that minimize the alignment score, which is based on an efficient algorithm to solve the median problems for three sequences. Our extensive experiments suggest that this method is very promising and can produce high quality phylogenies and alignments. PMID:19208110
A high resolution radiation hybrid map of wheat chromosome 4A
USDA-ARS's Scientific Manuscript database
Bread wheat has a large and complex allohexaploid genome with low recombination level at chromosome centromeric and peri-centromeric regions. This significantly hampers ordering of markers, contigs of physical maps and sequence scaffolds and impedes obtaining of high-quality reference genome sequenc...
High quality de novo sequencing and assembly of the Saccharomyces arboricolus genome
2013-01-01
Background Comparative genomics is a formidable tool to identify functional elements throughout a genome. In the past ten years, studies in the budding yeast Saccharomyces cerevisiae and a set of closely related species have been instrumental in showing the benefit of analyzing patterns of sequence conservation. Increasing the number of closely related genome sequences makes the comparative genomics approach more powerful and accurate. Results Here, we report the genome sequence and analysis of Saccharomyces arboricolus, a yeast species recently isolated in China, that is closely related to S. cerevisiae. We obtained high quality de novo sequence and assemblies using a combination of next generation sequencing technologies, established the phylogenetic position of this species and considered its phenotypic profile under multiple environmental conditions in the light of its gene content and phylogeny. Conclusions We suggest that the genome of S. arboricolus will be useful in future comparative genomics analysis of the Saccharomyces sensu stricto yeasts. PMID:23368932
Sequence independent amplification of DNA
Bohlander, Stefan K.
1998-03-24
The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example, the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei. 25 figs.
SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read
2010-01-01
Background High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms. Results SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming. Conclusions SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts. PMID:20089148
Venturia carpophila draft genome sequence
USDA-ARS's Scientific Manuscript database
Venturia carpophila causes peach scab, a disease that renders peach fruit unmarketable. We report a high-quality draft genome sequence (36.9 Mb) of V. carpophila from an isolate collected from a peach tree in central Georgia in the United States. The genome sequence described will be a useful resour...
Hu, Zhi-Liang; Ramos, Antonio M.; Humphray, Sean J.; Rogers, Jane; Reecy, James M.; Rothschild, Max F.
2011-01-01
The newly available pig genome sequence has provided new information to fine map quantitative trait loci (QTL) in order to eventually identify causal variants. With targeted genomic sequencing efforts, we were able to obtain high quality BAC sequences that cover a region on pig chromosome 17 where a number of meat quality QTL have been previously discovered. Sequences from 70 BAC clones were assembled to form an 8-Mbp contig. Subsequently, we successfully mapped five previously identified QTL, three for meat color and two for lactate related traits, to the contig. With an additional 25 genetic markers that were identified by sequence comparison, we were able to carry out further linkage disequilibrium analysis to narrow down the genomic locations of these QTL, which allowed identification of the chromosomal regions that likely contain the causative variants. This research has provided one practical approach to combine genetic and molecular information for QTL mining. PMID:22303339
Improved growth of GaN layers on ultra thin silicon nitride/Si (1 1 1) by RF-MBE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kumar, Mahesh; Roul, Basanta; Central Research Laboratory, Bharat Electronics, Bangalore 560013
High-quality GaN epilayers were grown on Si (1 1 1) substrates by molecular beam epitaxy using a new growth process sequence which involved a substrate nitridation at low temperatures, annealing at high temperatures, followed by nitridation at high temperatures, deposition of a low-temperature buffer layer, and a high-temperature overgrowth. The material quality of the GaN films was also investigated as a function of nitridation time and temperature. Crystallinity and surface roughness of GaN were found to improve when the Si substrate was treated under the new growth process sequence. Micro-Raman and photoluminescence (PL) measurement results indicate that the GaN film grown by the new process sequence has less tensile stress and is optically good. The surface and interface structures of an ultra thin silicon nitride film grown on the Si surface were investigated by core-level photoelectron spectroscopy, which clearly indicates that the quality of the silicon nitride notably affects the properties of GaN growth.
T1 weighted fat/water separated PROPELLER acquired with dual bandwidths.
Rydén, Henric; Berglund, Johan; Norbeck, Ola; Avventi, Enrico; Skare, Stefan
2018-04-24
To describe a fat/water separated dual receiver bandwidth (rBW) spin echo PROPELLER sequence that eliminates the dead time associated with single rBW sequences. A nonuniform noise whitening by regularization of the fat/water inverse problem is proposed, to enable dual rBW reconstructions. Bipolar, flyback, and dual spin echo sequences were developed. All sequences acquire two echoes with different rBW without dead time. Chemical shift displacement was corrected by performing the fat/water separation in k-space, prior to gridding. The proposed sequences were compared to fat saturation, and single rBW sequences, in terms of SNR and CNR efficiency, using clinically relevant acquisition parameters. The impact of motion was investigated. Chemical shift correction greatly improved the image quality, especially at high resolution acquired with low rBW, and also improved motion estimates. SNR efficiency of the dual spin echo sequence was up to 20% higher than the single rBW acquisition, while CNR efficiency was 50% higher for the bipolar acquisition. Noise whitening was deemed necessary for all dual rBW acquisitions, rendering high image quality with strong and homogenous fat suppression. Dual rBW sequences eliminate the dead time present in single rBW sequences, which improves SNR efficiency. In combination with the proposed regularization, this enables highly efficient T1-weighted PROPELLER images without chemical shift displacement. © 2018 International Society for Magnetic Resonance in Medicine.
MytiBase: a knowledgebase of mussel (M. galloprovincialis) transcribed sequences
Venier, Paola; De Pittà, Cristiano; Bernante, Filippo; Varotto, Laura; De Nardi, Barbara; Bovo, Giuseppe; Roch, Philippe; Novoa, Beatriz; Figueras, Antonio; Pallavicini, Alberto; Lanfranchi, Gerolamo
2009-01-01
Background Although bivalves are among the most studied marine organisms due to their ecological role, economic importance and use in pollution biomonitoring, very little information is available on the genome sequences of mussels. This study reports the functional analysis of a large-scale Expressed Sequence Tag (EST) sequencing from different tissues of Mytilus galloprovincialis (the Mediterranean mussel) challenged with toxic pollutants, temperature and potentially pathogenic bacteria. Results We have constructed and sequenced seventeen cDNA libraries from different Mediterranean mussel tissues: gills, digestive gland, foot, anterior and posterior adductor muscle, mantle and haemocytes. A total of 24,939 clones were sequenced from these libraries generating 18,788 high-quality ESTs which were assembled into 2,446 overlapping clusters and 4,666 singletons resulting in a total of 7,112 non-redundant sequences. In particular, a high-quality normalized cDNA library (Nor01) was constructed as determined by the high rate of gene discovery (65.6%). Bioinformatic screening of the non-redundant M. galloprovincialis sequences identified 159 microsatellite-containing ESTs. Clusters, consensuses, related similarities and gene ontology searches have been organized in a dedicated, searchable database. Conclusion We defined the first species-specific catalogue of M. galloprovincialis ESTs including 7,112 unique transcribed sequences. Putative microsatellite markers were identified. This annotated catalogue represents a valuable platform for expression studies, marker validation and genetic linkage analysis for investigations in the biology of Mediterranean mussels. PMID:19203376
Moll, Karen M; Zhou, Peng; Ramaraj, Thiruvarangan; Fajardo, Diego; Devitt, Nicholas P; Sadowsky, Michael J; Stupar, Robert M; Tiffin, Peter; Miller, Jason R; Young, Nevin D; Silverstein, Kevin A T; Mudge, Joann
2017-08-04
Third generation sequencing technologies, with sequencing reads in the tens of kilobases, facilitate genome assembly by spanning ambiguous regions and improving continuity. This has been critical for plant genomes, which are difficult to assemble due to high repeat content, gene family expansions, segmental and tandem duplications, and polyploidy. Recently, high-throughput mapping and scaffolding strategies have further improved continuity. Together, these long-range technologies enable quality draft assemblies of complex genomes in a cost-effective and timely manner. Here, we present high quality genome assemblies of the model legume plant, Medicago truncatula (R108) using PacBio, Dovetail Chicago (hereafter, Dovetail) and BioNano technologies. To test these technologies for plant genome assembly, we generated five assemblies using all possible combinations and ordering of these three technologies in the R108 assembly. While the BioNano and Dovetail joins overlapped, they also showed complementary gains in continuity and join numbers. Both technologies spanned repetitive regions that PacBio alone was unable to bridge. Combining technologies, particularly Dovetail followed by BioNano, resulted in notable improvements compared to Dovetail or BioNano alone. A combination of PacBio, Dovetail, and BioNano was used to generate a high quality draft assembly of R108, a M. truncatula accession widely used in studies of functional genomics. As a test for the usefulness of the resulting genome sequence, the new R108 assembly was used to pinpoint breakpoints and characterize flanking sequence of a previously identified translocation between chromosomes 4 and 8, identifying more than 22.7 Mb of novel sequence not present in the earlier A17 reference assembly. Adding Dovetail followed by BioNano data yielded complementary improvements in continuity over the original PacBio assembly.
This strategy proved efficient and cost-effective for developing a quality draft assembly compared to traditional reference assemblies.
Whole-Genome Sequencing of Lactobacillus salivarius Strains BCRC 14759 and BCRC 12574
Chiu, Shih-Hau; Wang, Li-Ting; Huang, Lina
2017-01-01
ABSTRACT Lactobacillus salivarius BCRC 14759 has been identified as a high-exopolysaccharide-producing strain with potential as a probiotic or fermented dairy product. Here, we report the genome sequences of L. salivarius BCRC 14759 and the comparable strain BCRC 12574, isolated from human saliva. The PacBio RSII sequencing platform was used to obtain high-quality assemblies for characterization of this probiotic candidate. PMID:29167259
Olivieri, Laura J; Cross, Russell R; O'Brien, Kendall E; Ratnayaka, Kanishka; Hansen, Michael S
2015-09-01
Cardiac magnetic resonance (MR) imaging is a valuable tool in congenital heart disease; however patients frequently have metal devices in the chest from the treatment of their disease that complicate imaging. Methods are needed to improve imaging around metal implants near the heart. Basic sequence parameter manipulations have the potential to minimize artifact while limiting effects on image resolution and quality. Our objective was to design cine and static cardiac imaging sequences to minimize metal artifact while maintaining image quality. Using systematic variation of standard imaging parameters on a fluid-filled phantom containing commonly used metal cardiac devices, we developed optimized sequences for steady-state free precession (SSFP), gradient recalled echo (GRE) cine imaging, and turbo spin-echo (TSE) black-blood imaging. We imaged 17 consecutive patients undergoing routine cardiac MR with 25 metal implants of various origins using both standard and optimized imaging protocols for a given slice position. We rated images for quality and metal artifact size by measuring metal artifact in two orthogonal planes within the image. All metal artifacts were reduced with optimized imaging. The average metal artifact reduction for the optimized SSFP cine was 1.5+/-1.8 mm, and for the optimized GRE cine the reduction was 4.6+/-4.5 mm (P < 0.05). Quality ratings favored the optimized GRE cine. Similarly, the average metal artifact reduction for the optimized TSE images was 1.6+/-1.7 mm (P < 0.05), and quality ratings favored the optimized TSE imaging. Imaging sequences tailored to minimize metal artifact are easily created by modifying basic sequence parameters, and images are superior to standard imaging sequences in both quality and artifact size. Specifically, for optimized cine imaging a GRE sequence should be used with settings that favor short echo time, i.e. flow compensation off, weak asymmetrical echo and a relatively high receiver bandwidth. 
For static black-blood imaging, a TSE sequence should be used with fat saturation turned off and high receiver bandwidth.
454-pyrosequencing: A tool for discovery and biomarker development
USDA-ARS's Scientific Manuscript database
The Roche GS-FLX (454) sequencer has made possible what was thought impossible just a few years ago: sequence >1 million high-quality nucleotide reads (mean 400 bp) in less than 12 h. This technology provides valuable species-specific sequence information, and is a valuable tool to discover and und...
Analysis of Illumina Microbial Assemblies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clum, Alicia; Foster, Brian; Froula, Jeff
2010-05-28
Since the emergence of second generation sequencing technologies, the evaluation of different sequencing approaches and their assembly strategies for different types of genomes has become an important undertaking. Next generation sequencing technologies dramatically increase sequence throughput while decreasing cost, making them an attractive tool for whole genome shotgun sequencing. To compare different approaches for de-novo whole genome assembly, appropriate tools and a solid understanding of both quantity and quality of the underlying sequence data are crucial. Here, we performed an in-depth analysis of short-read Illumina sequence assembly strategies for bacterial and archaeal genomes. Different types of Illumina libraries as well as different trim parameters and assemblers were evaluated. Results of the comparative analysis and sequencing platforms will be presented. The goal of this analysis is to develop a cost-effective approach for the increased throughput of the generation of high quality microbial genomes.
Unassigned MS/MS Spectra: Who Am I?
Pathan, Mohashin; Samuel, Monisha; Keerthikumar, Shivakumar; Mathivanan, Suresh
2017-01-01
Recent advances in high resolution tandem mass spectrometry (MS) have resulted in the accumulation of high quality data. In parallel with these advances in instrumentation, bioinformatics software has been developed to analyze such quality datasets. In spite of these advances, data analysis in mass spectrometry still remains critical for protein identification. In addition, the complexity of the generated MS/MS spectra, the unpredictable nature of peptide fragmentation, sequence annotation errors, and posttranslational modifications have impeded the protein identification process. In a typical MS data analysis, about 60% of the MS/MS spectra remain unassigned. While some of these could be attributed to the low quality of the MS/MS spectra, a proportion can be classified as high quality. Further analysis may reveal how many of the unassigned MS spectra can be attributed to search space, sequence annotation errors, mutations, and/or posttranslational modifications. In this chapter, the tools used to identify proteins and ways to assign unassigned tandem MS spectra are discussed.
Research, development and pilot production of high output thin silicon solar cells
NASA Technical Reports Server (NTRS)
Iles, P. A.
1976-01-01
Work was performed to define and apply processes which could lead to high output from thin (2-8 mils) silicon solar cells. The overall problems are outlined, and two satisfactory process sequences were developed. These sequences led to good output cells in the thickness range to just below 4 mils; although the initial contract scope was reduced, one of these sequences proved capable of operating beyond a pilot line level, to yield good quality 4-6 mil cells of high output.
Use of the Fluidigm C1 platform for RNA sequencing of single mouse pancreatic islet cells.
Xin, Yurong; Kim, Jinrang; Ni, Min; Wei, Yi; Okamoto, Haruka; Lee, Joseph; Adler, Christina; Cavino, Katie; Murphy, Andrew J; Yancopoulos, George D; Lin, Hsin Chieh; Gromada, Jesper
2016-03-22
This study provides an assessment of the Fluidigm C1 platform for RNA sequencing of single mouse pancreatic islet cells. The system combines microfluidic technology and nanoliter-scale reactions. We sequenced 622 cells, allowing identification of 341 islet cells with high-quality gene expression profiles. The cells clustered into populations of α-cells (5%), β-cells (92%), δ-cells (1%), and pancreatic polypeptide cells (2%). We identified cell-type-specific transcription factors and pathways primarily involved in nutrient sensing and oxidation and cell signaling. Unexpectedly, 281 cells had to be removed from the analysis due to low viability, low sequencing quality, or contamination resulting in the detection of more than one islet hormone. Collectively, we provide a resource for identification of high-quality gene expression datasets to help expand insights into genes and pathways characterizing islet cell types. We reveal limitations in the C1 Fluidigm cell capture process resulting in contaminated cells with altered gene expression patterns. This calls for caution when interpreting single-cell transcriptomics data using the C1 Fluidigm system.
A new and fast method for preparing high quality lambda DNA suitable for sequencing.
Manfioletti, G; Schneider, C
1988-01-01
A method is described for the rapid purification of high quality lambda DNA. The method can be used from either liquid or plate lysates and on a small scale or a large scale. It relies on the preadsorption of all polyanions present in the lysate to an "insoluble" anion-exchange matrix (DEAE or TEAE). Phage particles are then disrupted by combined treatment with EDTA/proteinase K and the resulting DNA is precipitated by the addition of the cationic detergent cetyl (or hexadecyl)-trimethyl ammonium bromide-CTAB ("soluble" anion-exchange matrix). The precipitated CTAB-DNA complex is then exchanged to Na-DNA and ethanol precipitated. The resultant purified DNA is suitable for enzymatic reactions and provides a high quality template for dideoxy-sequence analysis. PMID:2966928
Irwin, Jodi A; Saunier, Jessica L; Strouss, Katharine M; Sturk, Kimberly A; Diegoli, Toni M; Just, Rebecca S; Coble, Michael D; Parson, Walther; Parsons, Thomas J
2007-06-01
In an effort to increase the quantity, breadth and availability of mtDNA databases suitable for forensic comparisons, we have developed a high-throughput process to generate approximately 5000 control region sequences per year from regional US populations, global populations from which the current US population is derived and global populations currently under-represented in available forensic databases. The system utilizes robotic instrumentation for all laboratory steps from pre-extraction through sequence detection, and a rigorous eight-step, multi-laboratory data review process with entirely electronic data transfer. Over the past 3 years, nearly 10,000 control region sequences have been generated using this approach. These data are being made publicly available and should further address the need for consistent, high-quality mtDNA databases for forensic testing.
USDA-ARS's Scientific Manuscript database
The effect of refrigeration on bacterial communities within raw and pasteurized buffalo milk was studied using high-throughput sequencing. High quality samples of raw buffalo milk were obtained from five dairy farms in the Guangxi province of China. A sample of each milk was pasteurized, and both r...
High-Quality Draft Genome Sequence of Candida apicola NRRL Y-50540
Vega-Alvarado, Leticia; Gómez-Angulo, Jorge; Escalante-García, Zazil; Grande, Ricardo; Gschaedler-Mathis, Anne; Amaya-Delgado, Lorena
2015-01-01
Candida apicola, a highly osmotolerant ascomycetous yeast, produces sophorolipids (biosurfactants), membrane fatty acids, and enzymes of biotechnological interest. The genome obtained is a high-quality draft for this species and can be used as a reference for further analyses, such as differential gene expression in yeasts of the genus Candida. PMID:26067948
Fischbach, Katharina; Kosiek, Otrud; Friebe, Björn; Wybranski, Christian; Schnackenburg, Bernhard; Schmeisser, Alexander; Smid, Jan; Ricke, Jens; Pech, Maciej
2017-01-01
Cardiac magnetic resonance imaging (cMRI) has become the non-invasive reference standard for the evaluation of cardiac function and viability. The introduction of open, high-field, 1.0T (HFO) MR scanners offers advantages for examinations of obese, claustrophobic and paediatric patients. The aim of our study was to compare standard cMRI sequences from an HFO scanner and those from a cylindrical, 1.5T MR system. Fifteen volunteers underwent cMRI both in an open HFO and in a cylindrical MR system. The protocol consisted of cine and unenhanced tissue sequences. The signal-to-noise ratio (SNR) for each sequence and blood-myocardium contrast for the cine sequences were assessed. Image quality and artefacts were rated. The location and number of non-diagnostic segments were determined. Volunteers' tolerance to examinations in both scanners was investigated. SNR was significantly lower in the HFO scanner (all p<0.001). However, the contrast of the cine sequence was significantly higher in the HFO platform compared to the 1.5T MR scanner (0.685±0.41 vs. 0.611±0.54; p<0.001). Image quality was comparable for all sequences (all p>0.05). Overall, only a few non-diagnostic myocardial segments were recorded: 6/960 (0.6%) by the HFO and 17/960 (1.8%) segments by the cylindrical system. The volunteers expressed a preference for the open MR system (p<0.01). Standard cardiac MRI sequences in an HFO platform offer a high image quality that is comparable to the quality of images acquired in a cylindrical 1.5T MR scanner. An open scanner design may potentially improve tolerance of cardiac MRI and therefore allow examination of an even broader patient spectrum.
Whole-Genome Sequencing of Lactobacillus salivarius Strains BCRC 14759 and BCRC 12574.
Chiu, Shih-Hau; Chen, Chien-Chi; Wang, Li-Ting; Huang, Lina
2017-11-22
Lactobacillus salivarius BCRC 14759 has been identified as a high-exopolysaccharide-producing strain with potential as a probiotic or fermented dairy product. Here, we report the genome sequences of L. salivarius BCRC 14759 and the comparable strain BCRC 12574, isolated from human saliva. The PacBio RSII sequencing platform was used to obtain high-quality assemblies for characterization of this probiotic candidate. Copyright © 2017 Chiu et al.
Nanoliter reactors improve multiple displacement amplification of genomes from single cells.
Marcy, Yann; Ishoey, Thomas; Lasken, Roger S; Stockwell, Timothy B; Walenz, Brian P; Halpern, Aaron L; Beeson, Karen Y; Goldberg, Susanne M D; Quake, Stephen R
2007-09-01
Since only a small fraction of environmental bacteria are amenable to laboratory culture, there is great interest in genomic sequencing directly from single cells. Sufficient DNA for sequencing can be obtained from one cell by the Multiple Displacement Amplification (MDA) method, thereby eliminating the need to develop culture methods. Here we used a microfluidic device to isolate individual Escherichia coli and amplify genomic DNA by MDA in 60-nl reactions. Our results confirm a report that reduced MDA reaction volume lowers nonspecific synthesis that can result from contaminant DNA templates and unfavourable interaction between primers. The quality of the genome amplification was assessed by qPCR and compared favourably to single-cell amplifications performed in standard 50-μl volumes. Amplification bias was greatly reduced in nanoliter volumes, thereby providing a more even representation of all sequences. Single-cell amplicons from both microliter and nanoliter volumes provided high-quality sequence data by high-throughput pyrosequencing, thereby demonstrating a straightforward route to sequencing genomes from single cells.
Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing.
Fang, Chao; Zhong, Huanzi; Lin, Yuxiang; Chen, Bing; Han, Mo; Ren, Huahui; Lu, Haorong; Luber, Jacob M; Xia, Min; Li, Wangsheng; Stein, Shayna; Xu, Xun; Zhang, Wenwei; Drmanac, Radoje; Wang, Jian; Yang, Huanming; Hammarström, Lennart; Kostic, Aleksandar D; Kristiansen, Karsten; Li, Junhua
2018-03-01
More extensive use of metagenomic shotgun sequencing in microbiome research relies on the development of high-throughput, cost-effective sequencing. Here we present a comprehensive evaluation of the performance of the new high-throughput sequencing platform BGISEQ-500 for metagenomic shotgun sequencing and compare its performance with that of 2 Illumina platforms. Using fecal samples from 20 healthy individuals, we evaluated the intra-platform reproducibility for metagenomic sequencing on the BGISEQ-500 platform in a setup comprising 8 library replicates and 8 sequencing replicates. Cross-platform consistency was evaluated by comparing 20 pairwise replicates on the BGISEQ-500 platform vs the Illumina HiSeq 2000 platform and the Illumina HiSeq 4000 platform. In addition, we compared the performance of the 2 Illumina platforms against each other. By a newly developed overall accuracy quality control method, an average of 82.45 million high-quality reads (96.06% of raw reads) per sample, with 90.56% of bases scoring Q30 and above, was obtained using the BGISEQ-500 platform. Quantitative analyses revealed extremely high reproducibility between BGISEQ-500 intra-platform replicates. Cross-platform replicates differed slightly more than intra-platform replicates, yet a high consistency was observed. Only a low percentage (2.02%-3.25%) of genes exhibited significant differences in relative abundance comparing the BGISEQ-500 and HiSeq platforms, with a bias toward genes with higher GC content being enriched on the HiSeq platforms. Our study provides the first set of performance metrics for human gut metagenomic sequencing data using BGISEQ-500. The high accuracy and technical reproducibility confirm the applicability of the new platform for metagenomic studies, though caution is still warranted when combining metagenomic data from different platforms.
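As an illustration of the Q30 figure quoted above: a Phred score Q corresponds to an estimated per-base error probability of 10^(-Q/10), so Q30 means roughly one error in 1,000 bases. The following is a minimal Python sketch, not the quality control method used in the study; the function names are hypothetical, and the Phred+33 ASCII encoding of the quality string is an assumption.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)


def fraction_at_or_above(quality_string, threshold=30, offset=33):
    """Fraction of bases whose Phred score is >= threshold.

    Assumes a FASTQ-style quality string with Phred+33 encoding
    (an assumption; other encodings need a different offset).
    """
    if not quality_string:
        raise ValueError("empty quality string")
    passing = sum(1 for ch in quality_string if ord(ch) - offset >= threshold)
    return passing / len(quality_string)


if __name__ == "__main__":
    print(phred_to_error_prob(30))           # error probability at Q30
    print(fraction_at_or_above("IIII5555"))  # toy read: 'I' is Q40, '5' is Q20
```

A read set in which 90.56% of bases score Q30 or above would return about 0.9056 from `fraction_at_or_above` when all quality strings are concatenated.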
Beaton, Ainsley; Lood, Cédric; Cunningham-Oakes, Edward; MacFadyen, Alison; Mullins, Alex J; Bestawy, Walid El; Botelho, João; Chevalier, Sylvie; Dalzell, Chloe; Dolan, Stephen K; Faccenda, Alberto; Ghequire, Maarten G K; Higgins, Steven; Kutschera, Alexander; Murray, Jordan; Redway, Martha; Salih, Talal; Smith, Brian A; Smits, Nathan; Thomson, Ryan; Woodcock, Stuart; Cornelis, Pierre; Lavigne, Rob; van Noort, Vera
2018-01-01
Pseudomonas baetica strain a390T is the type strain of this recently described species and here we present its high-contiguity draft genome. To celebrate the 16th International Conference on Pseudomonas, the genome of P. baetica strain a390T was sequenced using a unique combination of Ion Torrent semiconductor and Oxford Nanopore methods as part of a collaborative community-led project. The use of high-quality Ion Torrent sequences with long Nanopore reads rapidly gave a high-contiguity, high-quality, 16-contig genome sequence. Whole genome phylogenetic analysis places P. baetica within the P. koreensis clade of the P. fluorescens group. Comparison of the main genomic features of P. baetica with a variety of other Pseudomonas spp. suggests that it is a highly adaptable organism, typical of the genus. This strain was originally isolated from the liver of a diseased wedge sole fish, and genotypic and phenotypic analyses show that it is tolerant to osmotic stress and to oxytetracycline. PMID:29579234
Mrochen, Michael; Schelling, Urs; Wuellner, Christian; Donitzky, Christof
2009-02-01
To investigate the effect of temporal and spatial distributions of laser spots (scan sequences) on the corneal surface quality after ablation and the maximum ablation of a given refractive correction after photoablation with a high-repetition-rate scanning-spot laser. IROC AG, Zurich, Switzerland, and WaveLight AG, Erlangen, Germany. Bovine corneas and poly(methyl methacrylate) (PMMA) plates were photoablated using a 1050 Hz excimer laser prototype for corneal laser surgery. Four temporal and spatial spot distributions (scan sequences) with different temporal overlapping factors were created for 3 myopic, 3 hyperopic, and 3 phototherapeutic keratectomy ablation profiles. Surface quality and maximum ablation depth were measured using a surface profiling system. The surface quality factor increased (rough surfaces) as the amount of temporal overlapping in the scan sequence and the amount of correction increased. The rise in surface quality factor was less for bovine corneas than for PMMA. The scan sequence might cause systematic substructures at the surface of the ablated material depending on the overlapping factor. The maximum ablation varied within the scan sequence. The temporal and spatial distribution of the laser spots (scan sequence) during a corneal laser procedure affected the surface quality and maximum ablation depth of the ablation profile. Corneal laser surgery could theoretically benefit from smaller spot sizes and higher repetition rates. The temporal and spatial spot distributions are relevant to achieving these aims.
Draft genome sequence of Venturia carpophila, the causal agent of peach scab
USDA-ARS's Scientific Manuscript database
Venturia carpophila causes peach scab, a disease that renders peach fruit unmarketable. We report a high-quality draft genome sequence (36.9 Mb) of V. carpophila from an isolate collected from a peach tree in central Georgia in the United States. The genome sequence described will be a useful resour...
USDA-ARS's Scientific Manuscript database
The low cost of next generation sequencing (NGS) technology and the availability of a large number of well annotated plant genomes has made sequencing technology useful to breeding programs. With the published high quality tomato reference genome of the processing cultivar Heinz 1706, we can now uti...
Methodologic European external quality assurance for DNA sequencing: the EQUALseq program.
Ahmad-Nejad, Parviz; Dorn-Beineke, Alexandra; Pfeiffer, Ulrike; Brade, Joachim; Geilenkeuser, Wolf-Jochen; Ramsden, Simon; Pazzagli, Mario; Neumaier, Michael
2006-04-01
DNA sequencing is a key technique in molecular diagnostics, but to date no comprehensive methodologic external quality assessment (EQA) programs have been instituted. Between 2003 and 2005, the European Union funded, as specific support actions, the EQUAL initiative to develop methodologic EQA schemes for genotyping (EQUALqual), quantitative PCR (EQUALquant), and sequencing (EQUALseq). Here we report on the results of the EQUALseq program. The participating laboratories received a 4-sample set comprising 2 DNA plasmids, a PCR product, and a finished sequencing reaction to be analyzed. Data and information from detailed questionnaires were uploaded online and evaluated by use of a scoring system for technical skills and proficiency of data interpretation. Sixty laboratories from 21 European countries registered, and 43 participants (72%) returned data and samples. Capillary electrophoresis was the predominant platform (n = 39; 91%). The median contiguous correct sequence stretch was 527 nucleotides with considerable variation in quality of both primary data and data evaluation. The association between laboratory performance and the number of sequencing assays/year was statistically significant (P <0.05). Interestingly, more than 30% of participants neither added comments to their data nor made efforts to identify the gene sequences or mutational positions. Considerable variations exist even in a highly standardized methodology such as DNA sequencing. Methodologic EQAs are appropriate tools to uncover strengths and weaknesses in both technique and proficiency, and our results emphasize the need for mandatory EQAs. The results of EQUALseq should help improve the overall quality of molecular genetics findings obtained by DNA sequencing.
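The abstract does not describe how the contiguous correct sequence stretch was scored. One plausible reading, assuming the submitted and reference sequences are already aligned position by position, is the longest run of matching bases. The sketch below uses a hypothetical function name and toy sequences; it is not the EQUALseq scoring system.

```python
def longest_correct_stretch(called, reference):
    """Length of the longest run of positions where the called base
    equals the reference base.

    Assumes both sequences are pre-aligned and compared position by
    position; insertions and deletions are not handled here.
    """
    best = 0
    run = 0
    for called_base, reference_base in zip(called, reference):
        if called_base == reference_base:
            run += 1
            best = max(best, run)
        else:
            run = 0  # a mismatch ends the current correct stretch
    return best


if __name__ == "__main__":
    # Toy example: one mismatch at the fourth position.
    print(longest_correct_stretch("ACGTTACG", "ACGATACG"))
```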
Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi
2015-11-20
The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
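N50 is commonly defined as the length L such that contigs of length L or longer contain at least half of the total assembled bases. A minimal Python sketch under that common definition follows; the function name and the toy contig lengths are illustrative assumptions, not data from the marmoset assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembled length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    fragmented = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy lengths, total 54
    merged = [9, 10, 15, 20]                    # same total, fewer contigs
    print(n50(fragmented), n50(merged))
```

Merging short contigs into longer ones raises this value without changing the total length, which is why N50 is used as a contiguity measure.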
quality, cutting-edge genomic services and technologies in order to expand our understanding of disease high quality next generation sequencing and genotyping services to investigators working to discover issues as they relate to study design, data production and quality control. Completed studies encompass
Murillo, Gabriel H; You, Na; Su, Xiaoquan; Cui, Wei; Reilly, Muredach P; Li, Mingyao; Ning, Kang; Cui, Xinping
2016-05-15
Single nucleotide variant (SNV) detection procedures are being utilized as never before to analyze the recent abundance of high-throughput DNA sequencing data, both on single and multiple sample datasets. Building on previously published work with the single sample SNV caller genotype model selection (GeMS), a multiple sample version of GeMS (MultiGeMS) is introduced. Unlike other popular multiple sample SNV callers, the MultiGeMS statistical model accounts for enzymatic substitution sequencing errors. It also addresses the multiple testing problem endemic to multiple sample SNV calling and utilizes high performance computing (HPC) techniques. A simulation study demonstrates that MultiGeMS ranks highest in precision among a selection of popular multiple sample SNV callers, while showing exceptional recall in calling common SNVs. Further, both simulation studies and real data analyses indicate that MultiGeMS is robust to low-quality data. We also demonstrate that accounting for enzymatic substitution sequencing errors not only improves SNV call precision at low mapping quality regions, but also improves recall at reference allele-dominated sites with high mapping quality. The MultiGeMS package can be downloaded from https://github.com/cui-lab/multigems. Contact: xinping.cui@ucr.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species
2013-01-01
Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another. PMID:23870653
The need for high-quality whole-genome sequence databases in microbial forensics.
Sjödin, Andreas; Broman, Tina; Melefors, Öjar; Andersson, Gunnar; Rasmusson, Birgitta; Knutsson, Rickard; Forsman, Mats
2013-09-01
Microbial forensics is an important part of a strengthened capability to respond to biocrime and bioterrorism incidents to aid in the complex task of distinguishing between natural outbreaks and deliberate acts. The goal of a microbial forensic investigation is to identify and criminally prosecute those responsible for a biological attack, and it involves a detailed analysis of the weapon--that is, the pathogen. The recent development of next-generation sequencing (NGS) technologies has greatly increased the resolution that can be achieved in microbial forensic analyses. It is now possible to identify, quickly and in an unbiased manner, previously undetectable genome differences between closely related isolates. This development is particularly relevant for the most deadly bacterial diseases that are caused by bacterial lineages with extremely low levels of genetic diversity. Whole-genome analysis of pathogens is envisaged to be increasingly essential for this purpose. In a microbial forensic context, whole-genome sequence analysis is the ultimate method for strain comparisons as it is informative during identification, characterization, and attribution--all 3 major stages of the investigation--and at all levels of microbial strain identity resolution (ie, it resolves the full spectrum from family to isolate). Given these capabilities, one bottleneck in microbial forensics investigations is the availability of high-quality reference databases of bacterial whole-genome sequences. To be of high quality, databases need to be curated and accurate in terms of sequences, metadata, and genetic diversity coverage. The development of whole-genome sequence databases will be instrumental in successfully tracing pathogens in the future.
Moazzam Jazi, Maryam; Rajaei, Saideh; Seyedi, Seyed Mahdi
2015-10-01
The quality and quantity of RNA are critical for successful downstream transcriptome-based studies such as microarrays and RNA sequencing (RNA-Seq). RNA isolation from woody plants, such as Pistacia vera, with very high amounts of polyphenols and polysaccharides is an enormous challenge. Here, we describe a highly efficient protocol that overcomes the limitations posed by poor quality and low yield of isolated RNA from pistachio and various recalcitrant woody plants. The key factors that resulted in a yield of 150 μg of high quality RNA per 200 mg of plant tissue include the elimination of phenol from the extraction buffer, raising the concentration of β-mercaptoethanol, prolonged incubation at 65 °C, and nucleic acid precipitation with optimized volumes of NaCl and isopropyl alcohol. Also, the A260/A280 and A260/A230 ratios of the extracted RNA were about 1.9-2.1 and 2.2-2.3, respectively, indicating high purity. Since the isolated RNA passed highly stringent quality control standards for sensitive reactions, including RNA sequencing and real-time PCR, this protocol can be considered a reliable and cost-effective method for RNA extraction from woody plants.
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Meyer, Sofie E.; Fabiano, Elena; Tian, Rui; ...
2015-04-11
Cupriavidus sp. strain UYPR2.512 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from a root nodule of Parapiptadenia rigida grown in soils from a native forest of Uruguay. Here we describe the features of Cupriavidus sp. strain UYPR2.512, together with sequence and annotation. We find the 7,858,949 bp high-quality permanent draft genome is arranged in 365 scaffolds of 369 contigs, contains 7,411 protein-coding genes and 76 RNA-only encoding genes, and is part of the GEBA-RNB project proposal.
A High Quality Draft Consensus Sequence of the Genome of a Heterozygous Grapevine Variety
Cartwright, Dustin A.; Cestaro, Alessandro; Pruss, Dmitry; Pindo, Massimo; FitzGerald, Lisa M.; Vezzulli, Silvia; Reid, Julia; Malacarne, Giulia; Iliev, Diana; Coppola, Giuseppina; Wardell, Bryan; Micheletti, Diego; Macalma, Teresita; Facci, Marco; Mitchell, Jeff T.; Perazzolli, Michele; Eldredge, Glenn; Gatto, Pamela; Oyzerski, Rozan; Moretto, Marco; Gutin, Natalia; Stefanini, Marco; Chen, Yang; Segala, Cinzia; Davenport, Christine; Demattè, Lorenzo; Mraz, Amy; Battilana, Juri; Stormo, Keith; Costa, Fabrizio; Tao, Quanzhou; Si-Ammour, Azeddine; Harkins, Tim; Lackey, Angie; Perbost, Clotilde; Taillon, Bruce; Stella, Alessandra; Solovyev, Victor; Fawcett, Jeffrey A.; Sterck, Lieven; Vandepoele, Klaas; Grando, Stella M.; Toppo, Stefano; Moser, Claudio; Lanchbury, Jerry; Bogden, Robert; Skolnick, Mark; Sgaramella, Vittorio; Bhatnagar, Satish K.; Fontana, Paolo; Gutin, Alexander; Van de Peer, Yves; Salamini, Francesco; Viola, Roberto
2007-01-01
Background: Worldwide, grapes and their derived products have a large market. The cultivated grape species Vitis vinifera has potential to become a model for fruit tree genetics. Like many plant species, it is highly heterozygous, which is an additional challenge to modern whole genome shotgun sequencing. In this paper a high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir is presented. Principal Findings: We estimate the genome size of V. vinifera to be 504.6 Mb. Genomic sequences corresponding to 477.1 Mb were assembled in 2,093 metacontigs and 435.1 Mb were anchored to the 19 linkage groups (LGs). The number of predicted genes is 29,585, of which 96.1% were assigned to LGs. This assembly of the grape genome provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality, via secondary metabolites, and those connected with the extreme susceptibility of grape to pathogens. Single nucleotide polymorphism (SNP) distribution was consistent with a diffuse haplotype structure across the genome. Of around 2,000,000 SNPs, 1,751,176 were mapped to chromosomes and one or more of them were identified in 86.7% of anchored genes. The relative age of grape duplicated genes was estimated, and this made it possible to reveal a relatively recent Vitis-specific large scale duplication event concerning at least 10 chromosomes (duplication not reported before). Conclusions: Sanger shotgun sequencing and highly efficient sequencing by synthesis (SBS), together with dedicated assembly programs, resolved a complex heterozygous genome. A consensus sequence of the genome and a set of mapped marker loci were generated. Homologous chromosomes of Pinot Noir differ by 11.2% of their DNA (hemizygous DNA plus chromosomal gaps). SNP markers are offered as a tool with the potential of introducing a new era in the molecular breeding of grape. PMID:18094749
Losada, Liliana; Varga, John J.; Hostetler, Jessica; Radune, Diana; Kim, Maria; Durkin, Scott; Schneewind, Olaf; Nierman, William C.
2011-04-29
Yersinia pestis is the causative agent of the plague. Y. pestis KIM 10+ strain was passaged and selected for loss of the 102 kb pgm locus, resulting in an attenuated strain, KIM D27. In this study, whole genome sequencing was performed on KIM D27 in order to identify any additional differences. Initial assemblies of 454 data were highly fragmented, and various bioinformatic tools detected between 15 and 465 SNPs and INDELs when comparing both strains, the vast majority associated with A or T homopolymer sequences. Consequently, Illumina sequencing was performed to improve the quality of the assembly. Hybrid sequence assemblies were performed and a total of 56 validated SNP/INDELs and 5 repeat differences were identified in the D27 strain relative to published KIM 10+ sequence. However, further analysis showed that 55 of these SNP/INDELs and 3 repeats were errors in the KIM 10+ reference sequence. We conclude that both 454 and Illumina sequencing were required to obtain the most accurate and rapid sequence results for Y. pestis KIMD27. SNP and INDELS calls were most accurate when both Newbler and CLC Genomics Workbench were employed. For purposes of obtaining high quality genome sequence differences between strains, any identified differences should be verified in both the new and reference genomes. PMID:21559501
2010-01-01
Background: Little genomic or transcriptomic information on Ganoderma lucidum (Lingzhi) is known. This study aims to discover the transcripts involved in secondary metabolite biosynthesis and developmental regulation of G. lucidum using an expressed sequence tag (EST) library. Methods: A cDNA library was constructed from the G. lucidum fruiting body. Its high-quality ESTs were assembled into unique sequences with contigs and singletons. The unique sequences were annotated according to sequence similarities to genes or proteins available in public databases. The detection of simple sequence repeats (SSRs) was performed by online analysis. Results: A total of 1,023 clones were randomly selected from the G. lucidum library and sequenced, yielding 879 high-quality ESTs. These ESTs showed similarities to a diverse range of genes. The sequences encoding squalene epoxidase (SE) and farnesyl-diphosphate synthase (FPS) were identified in this EST collection. Several candidate genes, such as hydrophobin, MOB2, profilin and PHO84 were detected for the first time in G. lucidum. Thirteen (13) potential SSR-motif microsatellite loci were also identified. Conclusion: The present study demonstrates a successful application of EST analysis in the discovery of transcripts involved in the secondary metabolite biosynthesis and the developmental regulation of G. lucidum. PMID:20230644
Image quality assessment of silent T2 PROPELLER sequence for brain imaging in infants.
Kim, Hyun Gi; Choi, Jin Wook; Yoon, Soo Han; Lee, Sieun
2018-02-01
Infants are vulnerable to high acoustic noise. Acoustic noise generated by MR scanning can be reduced by a silent sequence. The purpose of this study is to compare the image quality of the conventional and silent T2 PROPELLER sequences for brain imaging in infants. A total of 36 scans were acquired from 24 infants using a 3 T MR scanner. Each patient underwent both conventional and silent T2 PROPELLER sequences. Acoustic noise level was measured. Quantitative and qualitative assessments were performed with the images taken with each sequence. The sound pressure level of the conventional T2 PROPELLER imaging sequence was 92.1 dB and that of the silent T2 PROPELLER imaging sequence was 73.3 dB (reduction of 20%). On quantitative assessment, the two sequences (conventional vs silent T2 PROPELLER) did not show significant difference in relative contrast (0.069 vs 0.068, p value = 0.536) and signal-to-noise ratio (75.4 vs 114.8, p value = 0.098). Qualitative assessment of overall image quality (p value = 0.572), grey-white differentiation (p value = 0.986), shunt-related artefact (p value > 0.999), motion artefact (p value = 0.801) and myelination degree in different brain regions (p values ≥ 0.092) did not show significant difference between the two sequences. The silent T2 PROPELLER sequence reduced acoustic noise and generated image quality comparable to that of the conventional sequence. Advances in knowledge: This is the first report to compare silent T2 PROPELLER images with conventional T2 PROPELLER images in children.
Riesgo, Ana; Pérez-Porro, Alicia R; Carmona, Susana; Leys, Sally P; Giribet, Gonzalo
2012-03-01
Transcriptome sequencing with next-generation sequencing technologies has the potential for addressing many long-standing questions about the biology of sponges. Transcriptome sequence quality depends on good cDNA libraries, which require high-quality mRNA. Standard protocols for preserving and isolating mRNA often require optimization for unusual tissue types. Our aim was to assess the efficiency of two preservation modes, (i) flash freezing with liquid nitrogen (LN₂) and (ii) immersion in RNAlater, for the recovery of high-quality mRNA from sponge tissues. We also tested whether the long-term storage of samples at -80 °C affects the quantity and quality of mRNA. We extracted mRNA from nine sponge species and analysed the quantity and quality (A260/230 and A260/280 ratios) of mRNA according to preservation method, storage time, and taxonomy. The quantity and quality of mRNA depended significantly on the preservation method used (LN₂ outperforming RNAlater), the sponge species, and the interaction between them. When the preservation was analysed in combination with either storage time or species, the quantity and A260/230 ratio were both significantly higher for LN₂-preserved samples. Interestingly, individual comparisons for each preservation method over time indicated that both methods performed equally efficiently during the first month, but RNAlater lost efficiency in storage times longer than 2 months compared with flash-frozen samples. In summary, we find that for long-term preservation of samples, flash freezing is the preferred method. If LN₂ is not available, RNAlater can be used, but mRNA extraction during the first month of storage is advised. © 2011 Blackwell Publishing Ltd.
Purification of High Molecular Weight Genomic DNA from Powdery Mildew for Long-Read Sequencing.
Feehan, Joanna M; Scheibel, Katherine E; Bourras, Salim; Underwood, William; Keller, Beat; Somerville, Shauna C
2017-03-31
The powdery mildew fungi are a group of economically important fungal plant pathogens. Relatively little is known about the molecular biology and genetics of these pathogens, in part due to a lack of well-developed genetic and genomic resources. These organisms have large, repetitive genomes, which have made genome sequencing and assembly prohibitively difficult. Here, we describe methods for the collection, extraction, purification and quality control assessment of high molecular weight genomic DNA from one powdery mildew species, Golovinomyces cichoracearum. The protocol described includes mechanical disruption of spores followed by an optimized phenol/chloroform genomic DNA extraction. A typical yield was 7 µg DNA per 150 mg conidia. The genomic DNA that is isolated using this procedure is suitable for long-read sequencing (i.e., > 48.5 kbp). Quality control measures to ensure the size, yield, and purity of the genomic DNA are also described in this method. Sequencing of the genomic DNA of the quality described here will allow for the assembly and comparison of multiple powdery mildew genomes, which in turn will lead to a better understanding and improved control of this agricultural pathogen.
Adachi, Noboru; Umetsu, Kazuo; Shojo, Hideki
2014-01-01
Mitochondrial DNA (mtDNA) is widely used for DNA analysis of highly degraded samples because of its polymorphic nature and high number of copies in a cell. However, as endogenous mtDNA in deteriorated samples is scarce and highly fragmented, it is not easy to obtain reliable data. In the current study, we report the risks of direct sequencing of mtDNA in highly degraded material, and suggest a strategy to ensure the quality of sequencing data. It was observed that direct sequencing data of the hypervariable segment (HVS) 1 obtained using primer sets that generate an amplicon of 407 bp (long-primer sets) were different from results obtained using newly designed primer sets that produce an amplicon of 120-139 bp (mini-primer sets). The data aligned with the results of mini-primer sets analysis in an amplicon length-dependent manner; the shorter the amplicon, the more evident the endogenous sequence became. Coding region analysis using multiplex amplified product-length polymorphisms revealed the incongruence of single nucleotide polymorphisms between the coding region and HVS 1 caused by contamination with exogenous mtDNA. Although the sequencing data obtained using long-primer sets turned out to be erroneous, they were unambiguous and reproducible. These findings suggest that PCR primers that produce amplicons shorter than those currently recognized should be used for mtDNA analysis in highly degraded samples. Haplogroup motif analysis of the coding region and HVS should also be performed to improve the reliability of forensic mtDNA data. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Altahawi, Faysal F; Blount, Kevin J; Morley, Nicholas P; Raithel, Esther; Omar, Imran M
2017-01-01
To compare a faster, new, high-resolution accelerated 3D-fast-spin-echo (3D-FSE) acquisition sequence (CS-SPACE) to traditional 2D and high-resolution 3D sequences for knee 3-T magnetic resonance imaging (MRI). Twenty patients received knee MRIs that included routine 2D (T1, PD ± FS, T2-FS; 0.5 × 0.5 × 3 mm³; ∼10 min), traditional 3D FSE (SPACE-PD-FS; 0.5 × 0.5 × 0.5 mm³; ∼7.5 min), and accelerated 3D-FSE prototype (CS-SPACE-PD-FS; 0.5 × 0.5 × 0.5 mm³; ∼5 min) acquisitions on a 3-T MRI system (Siemens MAGNETOM Skyra). Three musculoskeletal radiologists (MSKRs) prospectively and independently reviewed the studies with graded surveys comparing image and diagnostic quality. Tissue-specific signal-to-noise ratios (SNR) and contrast-to-noise ratios (CNR) were also compared. MSKR-perceived diagnostic quality of cartilage was significantly higher for CS-SPACE than for SPACE and 2D sequences (p < 0.001). Assessment of diagnostic quality of menisci and synovial fluid was higher for CS-SPACE than for SPACE (p < 0.001). CS-SPACE was not significantly different from SPACE but had lower assessments than 2D sequences for evaluation of bones, ligaments, muscles, and fat (p ≤ 0.004). 3D sequences had higher spatial resolution, but lower overall assessed contrast (p < 0.001). Overall image quality from CS-SPACE was assessed as higher than SPACE (p = 0.007), but lower than 2D sequences (p < 0.001). Compared to SPACE, CS-SPACE had higher fluid SNR and CNR against all other tissues (all p < 0.001). The CS-SPACE prototype allows for faster isotropic acquisitions of knee MRIs over currently used protocols. High fluid-to-cartilage CNR and higher spatial resolution over routine 2D sequences may present a valuable role for CS-SPACE in the evaluation of cartilage and menisci.
DNA Extraction Protocols for Whole-Genome Sequencing in Marine Organisms.
Panova, Marina; Aronsson, Henrik; Cameron, R Andrew; Dahl, Peter; Godhe, Anna; Lind, Ulrika; Ortega-Martinez, Olga; Pereyra, Ricardo; Tesson, Sylvie V M; Wrange, Anna-Lisa; Blomberg, Anders; Johannesson, Kerstin
2016-01-01
The marine environment harbors a large proportion of the total biodiversity on this planet, including the majority of the Earth's different phyla and classes. Studying the genomes of marine organisms can bring interesting insights into genome evolution. Today, almost all marine organismal groups are understudied with respect to their genomes. One potential reason is that extraction of high-quality DNA in sufficient amounts is challenging for many marine species. This is due to high polysaccharide content, polyphenols and other secondary metabolites that will inhibit downstream DNA library preparations. Consequently, protocols developed for vertebrates and plants do not always perform well for invertebrates and algae. In addition, many marine species have large population sizes and, as a consequence, highly variable genomes. Thus, to facilitate the sequence read assembly process during genome sequencing, it is desirable to obtain enough DNA from a single individual, which is a challenge in many species of invertebrates and algae. Here, we present DNA extraction protocols for seven marine species (four invertebrates, two algae, and a marine yeast), optimized to provide sufficient DNA quality and yield for de novo genome sequencing projects.
Yield and Economic Responses of Peanut to Crop Rotation Sequence
USDA-ARS's Scientific Manuscript database
National Peanut Research Laboratory, Dawson, GA 39842. Proper crop rotation is essential to maintaining high peanut yield and quality. However, the economic considerations of maintaining or altering crop rotation sequences must incorporate the commodity prices, production costs, and yield responses...
Sma3s: a three-step modular annotator for large sequence datasets.
Muñoz-Mérida, Antonio; Viguera, Enrique; Claros, M Gonzalo; Trelles, Oswaldo; Pérez-Pulido, Antonio J
2014-08-01
Automatic sequence annotation is an essential component of modern 'omics' studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ~85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that outperforms the accuracy of other well-established annotation algorithms, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Cuesta, Isabel; González, Luis M.; Estrada, Karel; Grande, Ricardo; Zaballos, Ángel; Lobo, Cheryl A.; Barrera, Jorge
2014-01-01
Babesia divergens causes significant morbidity and mortality in cattle and splenectomized or immunocompromised individuals. Here, we present a 10.7-Mb high-quality draft genome of this parasite close to chromosome resolution that will enable comparative genome analyses and synteny studies among related parasites. PMID:25395649
High-quality genome of the peach scab pathogen, Venturia carpophila
USDA-ARS's Scientific Manuscript database
Venturia carpophila causes peach scab, a disease that renders peach (Prunus persica) fruit unmarketable. We report a high-quality draft genome (36.9 Mb) of V. carpophila from an isolate collected from a peach tree in central Georgia. The genome was sequenced by MiSeq using an Illumina paired-end lib...
Brandstätter, Anita; Peterson, Christine T; Irwin, Jodi A; Mpoke, Solomon; Koech, Davy K; Parson, Walther; Parsons, Thomas J
2004-10-01
Large forensic mtDNA databases that adhere to strict guidelines for generation and maintenance are not available for many populations outside of the United States and western Europe. We have established a high quality mtDNA control region sequence database for urban Nairobi as both a reference database for forensic investigations, and as a tool to examine the genetic variation of Kenyan sequences in the context of known African variation. The Nairobi sequences exhibited high variation and a low random match probability, indicating utility for forensic testing. Haplogroup identification and frequencies were compared with those reported from other published studies on African, or African-origin populations from Mozambique, Sierra Leone, and the United States, and suggest significant differences in the mtDNA compositions of the various populations. The quality of the sequence data in our study was investigated and supported using phylogenetic measures. Our data demonstrate the diversity and distinctiveness of African populations, and underline the importance of establishing additional forensic mtDNA databases of indigenous African populations.
Identification of Microbial Profile of Koji Using Single Molecule, Real-Time Sequencing Technology.
Hui, Wenyan; Hou, Qiangchuan; Cao, Chenxia; Xu, Haiyan; Zhen, Yi; Kwok, Lai-Yu; Sun, Tiansong; Zhang, Heping; Zhang, Wenyi
2017-05-01
Koji is a kind of Japanese traditional fermented starter that has been used for centuries. Many fermented foods are made from koji, such as sake, miso, and soy sauce. This study used the single molecule real-time sequencing technology (SMRT) to investigate the bacterial and fungal microbiota of 3 Japanese koji samples. After SMRT analysis, a total of 39121 high-quality sequences were generated, including 14354 bacterial and 24767 fungal sequence reads. The high-quality gene sequences were assigned to 5 bacterial and 2 fungal phyla, dominated by Proteobacteria and Ascomycota, respectively. At the genus level, Ochrobactrum and Wickerhamomyces were the most abundant bacterial and fungal genera, respectively. The predominant bacterial and fungal species were Ochrobactrum lupini and Wickerhamomyces anomalus, respectively. Our study profiled the microbiota composition of 3 Japanese koji samples to species-level precision. The results may be useful for further development of traditional fermented products, especially optimization of koji preparation. Meanwhile, this study has demonstrated that SMRT is a robust tool for analyzing the microbial composition in food samples. © 2017 Institute of Food Technologists®.
One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly.
Koren, Sergey; Phillippy, Adam M
2015-02-01
Like a jigsaw puzzle with large pieces, a genome sequenced with long reads is easier to assemble. However, recent sequencing technologies have favored lowering per-base cost at the expense of read length. This has dramatically reduced sequencing cost, but resulted in fragmented assemblies, which negatively affect downstream analyses and hinder the creation of finished (gapless, high-quality) genomes. In contrast, emerging long-read sequencing technologies can now produce reads tens of kilobases in length, enabling the automated finishing of microbial genomes for under $1000. This promises to improve the quality of reference databases and facilitate new studies of chromosomal structure and variation. We present an overview of these new technologies and the methods used to assemble long reads into complete genomes. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
2011-01-01
Background BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. Results This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Conclusions Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed. PMID:21794110
Feltus, Frank A; Saski, Christopher A; Mockaitis, Keithanne; Haiminen, Niina; Parida, Laxmi; Smith, Zachary; Ford, James; Staton, Margaret E; Ficklin, Stephen P; Blackmon, Barbara P; Cheng, Chun-Huai; Schnell, Raymond J; Kuhn, David N; Motamayor, Juan-Carlos
2011-07-27
BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed.
2012-01-01
Background Cotton is the world’s most important natural textile fiber and a significant oilseed crop. Decoding cotton genomes will provide the ultimate reference and resource for research and utilization of the species. Integration of high-density genetic maps with genomic sequence information will greatly accelerate the process of whole-genome assembly in cotton. Results In this paper, we update a high-density interspecific genetic linkage map of allotetraploid cultivated cotton. An additional 1,167 marker loci have been added to our previously published map of 2,247 loci. Three new marker types, InDel (insertion-deletion) and SNP (single nucleotide polymorphism) developed from gene information, and REMAP (retrotransposon-microsatellite amplified polymorphism), were used to increase map density. The updated map consists of 3,414 loci in 26 linkage groups covering 3,667.62 cM with an average inter-locus distance of 1.08 cM. Furthermore, genome-wide sequence analysis was finished using 3,324 informative sequence-based markers and publicly-available Gossypium DNA sequence information. A total of 413,113 EST and 195 BAC sequences were physically anchored and clustered by 3,324 sequence-based markers. Of these, 14,243 ESTs and 188 BACs from different species of Gossypium were clustered and specifically anchored to the high-density genetic map. A total of 2,748 candidate unigenes from 2,111 EST clusters and 63 BACs were mined for functional annotation and classification. The 337 ESTs/genes related to fiber quality traits were integrated with 132 previously reported cotton fiber quality quantitative trait loci, which demonstrated the important roles of these genes in fiber quality. Higher-level sequence conservation between different cotton species and between the A- and D-subgenomes in tetraploid cotton was found, indicating a common evolutionary origin for orthologous and paralogous loci in Gossypium.
Conclusion This study will serve as a valuable genomic resource for tetraploid cotton genome assembly, for cloning genes related to superior agronomic traits, and for further comparative genomic analyses in Gossypium. PMID:23046547
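The map statistics reported in the abstract above are internally consistent, which a short calculation can confirm. The sketch below is an illustration only: the function name is hypothetical, and the abstract does not state how the average inter-locus distance was computed. Dividing the map length by the number of intervals (loci minus linkage groups, since each group of n loci has n - 1 intervals) reproduces the reported 1.08 cM, whereas dividing by the number of loci gives about 1.07 cM.

```python
def average_interlocus_distance(map_length_cm: float, loci: int,
                                linkage_groups: int) -> float:
    """Mean distance between adjacent loci, in cM.

    Assumes the average is taken over intervals: a linkage group with n
    loci has n - 1 intervals, so the whole map has loci - linkage_groups.
    """
    intervals = loci - linkage_groups
    return map_length_cm / intervals


previous_loci = 2247
added_loci = 1167
total_loci = previous_loci + added_loci  # matches the 3,414 loci reported

per_interval = average_interlocus_distance(3667.62, total_loci, 26)
per_locus = 3667.62 / total_loci

print(total_loci, round(per_interval, 2), round(per_locus, 2))
```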
Zook, Justin M.; Samarov, Daniel; McDaniel, Jennifer; Sen, Shurjo K.; Salit, Marc
2012-01-01
While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set used to calculate association of SSEs with various features in the reads and sequence context. This data set is typically either from a part of the data set being “recalibrated” (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has fewer dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration. PMID:22859977
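To put the "5 Phred units" and "13 units" figures above in perspective, it helps to convert Phred-scaled scores into error probabilities. The sketch below uses the standard definition Q = -10·log10(p); the function names are hypothetical and the code is not taken from GATK or from the study.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred-scaled quality score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred-scaled quality score for a given error probability."""
    return -10 * math.log10(p)


def fold_change_in_error(delta_q: float) -> float:
    """Factor by which the implied error probability changes when a
    quality score shifts by delta_q Phred units."""
    return 10 ** (delta_q / 10)


print(phred_to_error_prob(30))                  # Q30 is 1 error in 1000 calls
print(round(fold_change_in_error(5), 2))        # shift of 5 units
print(round(fold_change_in_error(13), 2))       # shift of 13 units
```

On this scale a 5-unit shift corresponds to roughly a threefold change in the implied error probability, and a 13-unit shift to roughly a twentyfold change.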
PuLSE: Quality control and quantification of peptide sequences explored by phage display libraries.
Shave, Steven; Mann, Stefan; Koszela, Joanna; Kerr, Alastair; Auer, Manfred
2018-01-01
The design of highly diverse phage display libraries is based on the assumption that DNA bases are incorporated at similar rates within the randomized sequence. As library complexity increases and expected copy numbers of unique sequences decrease, the exploration of library space becomes sparser and the presence of truly random sequences becomes critical. We present the program PuLSE (Phage Library Sequence Evaluation) as a tool for assessing randomness and therefore diversity of phage display libraries. PuLSE runs on a collection of sequence reads in the fastq file format and generates tables profiling the library in terms of unique DNA sequence counts and positions, translated peptide sequences, and normalized 'expected' occurrences from base to residue codon frequencies. The output allows at-a-glance quantitative quality control of a phage library in terms of sequence coverage both at the DNA base and translated protein residue level, which has been missing from toolsets and literature. The open source program PuLSE is available in two formats, a C++ source code package for compilation and integration into existing bioinformatics pipelines and precompiled binaries for ease of use.
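The idea of "expected" residue occurrences derived from base frequencies can be illustrated with a small calculation. The sketch below is not PuLSE's implementation (PuLSE is a C++ program and its internals are not described in the abstract); it is a minimal Python illustration that assumes fully randomized codons, independent base incorporation, and the standard genetic code. All names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expected_residue_freqs(base_probs=None):
    """Expected frequency of each residue at one randomized codon.

    base_probs maps each base to its incorporation probability; by default
    all four bases are equally likely (the design assumption in the text).
    """
    if base_probs is None:
        base_probs = {base: 0.25 for base in BASES}
    freqs = Counter()
    for codon, aa in CODON_TABLE.items():
        prob = 1.0
        for base in codon:
            prob *= base_probs[base]
        freqs[aa] += prob
    return dict(freqs)


freqs = expected_residue_freqs()
for residue in "LW*":
    print(residue, freqs[residue])
```

Even with perfectly uniform bases, residues are not equally likely: leucine is encoded by six codons and tryptophan by one, so observed peptide counts have to be compared against these codon-weighted expectations rather than against a flat distribution.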
Kamada, Mayumi; Hase, Sumitaka; Sato, Kengo; Toyoda, Atsushi; Fujiyama, Asao; Sakakibara, Yasubumi
2014-01-01
De novo microbial genome sequencing reached a turning point with third-generation sequencing (TGS) platforms, and several microbial genomes have been improved by TGS long reads. Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and it has a function in the production of the traditional Japanese fermented food “natto.” The B. subtilis natto BEST195 genome was previously sequenced with short reads, but it included some incomplete regions. We resequenced the BEST195 genome using a PacBio RS sequencer, and we successfully obtained a complete genome sequence from one scaffold without any gaps, and we also applied Illumina MiSeq short reads to enhance quality. Compared with the previous BEST195 draft genome and Marburg 168 genome, we found that incomplete regions in the previous genome sequence were attributed to GC-bias and repetitive sequences, and we also identified some novel genes that are found only in the new genome. PMID:25329997
QSRA: a quality-value guided de novo short read assembler.
Bryant, Douglas W; Wong, Weng-Keen; Mockler, Todd C
2009-02-24
New rapid high-throughput sequencing technologies have sparked the creation of a new class of assembler. Since all high-throughput sequencing platforms incorporate errors in their output, short-read assemblers must be designed to account for this error while utilizing all available data. We have designed and implemented an assembler, Quality-value guided Short Read Assembler, created to take advantage of quality-value scores as a further method of dealing with error. Compared to previously published algorithms, our assembler shows significant improvements not only in speed but also in output quality. QSRA generally produced the highest genomic coverage, while being faster than VCAKE. QSRA is extremely competitive in its longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA provides a step closer to the goal of de novo assembly of complex genomes, improving upon the original VCAKE algorithm by not only drastically reducing runtimes but also increasing the viability of the assembly algorithm through further error handling capabilities.
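The N50 and N80 contig lengths mentioned above are summary statistics of an assembly. The sketch below shows one common way to compute them; it is not code from QSRA, the function name is hypothetical, and the contig lengths are made-up values for illustration. Under the definition used here, Nx is the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches x% of the assembly size.

```python
def nx(contig_lengths, fraction=0.5):
    """Return the Nx contig length (N50 when fraction is 0.5).

    Contigs are sorted from longest to shortest and accumulated until the
    running total reaches `fraction` of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    threshold = fraction * sum(lengths)
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= threshold:
            return length
    return lengths[-1]  # only reachable through floating-point rounding


# Made-up contig lengths (total 290) for illustration only.
example_contigs = [80, 70, 50, 40, 30, 20]
print("N50:", nx(example_contigs, 0.5))
print("N80:", nx(example_contigs, 0.8))
```

A higher N50 means more of the assembly sits in long contigs, which is why it is reported alongside the longest contig and genomic coverage when assemblers are compared.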
Solving the Problem: Genome Annotation Standards before the Data Deluge.
Klimke, William; O'Donovan, Claire; White, Owen; Brister, J Rodney; Clark, Karen; Fedorov, Boris; Mizrachi, Ilene; Pruitt, Kim D; Tatusova, Tatiana
2011-10-15
The promise of genome sequencing was that the vast undiscovered country would be mapped out by comparison of the multitude of sequences available and would aid researchers in deciphering the role of each gene in every organism. Researchers recognize that there is a need for high quality data. However, different annotation procedures, numerous databases, and a diminishing percentage of experimentally determined gene functions have resulted in a spectrum of annotation quality. NCBI in collaboration with sequencing centers, archival databases, and researchers, has developed the first international annotation standards, a fundamental step in ensuring that high quality complete prokaryotic genomes are available as gold standard references. Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is an historic milestone. The use of these standards in existing genomes and future submissions will increase the quality of databases, enabling researchers to make accurate biological discoveries.
Solving the Problem: Genome Annotation Standards before the Data Deluge
Klimke, William; O'Donovan, Claire; White, Owen; Brister, J. Rodney; Clark, Karen; Fedorov, Boris; Mizrachi, Ilene; Pruitt, Kim D.; Tatusova, Tatiana
2011-01-01
The promise of genome sequencing was that the vast undiscovered country would be mapped out by comparison of the multitude of sequences available and would aid researchers in deciphering the role of each gene in every organism. Researchers recognize that there is a need for high quality data. However, different annotation procedures, numerous databases, and a diminishing percentage of experimentally determined gene functions have resulted in a spectrum of annotation quality. NCBI in collaboration with sequencing centers, archival databases, and researchers, has developed the first international annotation standards, a fundamental step in ensuring that high quality complete prokaryotic genomes are available as gold standard references. Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is an historic milestone. The use of these standards in existing genomes and future submissions will increase the quality of databases, enabling researchers to make accurate biological discoveries. PMID:22180819
Yield and Economic Responses of Peanut to Crop Rotation Sequence
USDA-ARS's Scientific Manuscript database
Proper crop rotation is essential to maintaining high peanut yield and quality. However, the economic considerations of maintaining or altering crop rotation sequences must incorporate the commodity prices, production costs, and yield responses of all crops in, or potentially in, the crop rotation ...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tian, Rui; Parker, Matthew; Seshadri, Rekha
Bradyrhizobium sp. Tv2a.2 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing root nodule of Tachigali versicolor collected in Barro Colorado Island of Panama. Here we describe the features of Bradyrhizobium sp. Tv2a.2, together with high-quality permanent draft genome sequence information and annotation. The 8,496,279 bp high-quality draft genome is arranged in 87 scaffolds of 87 contigs, contains 8,109 protein-coding genes and 72 RNA-only encoding genes. In conclusion, this rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.
1.5 versus 3 versus 7 Tesla in abdominal MRI: A comparative study.
Laader, Anja; Beiderwellen, Karsten; Kraff, Oliver; Maderwald, Stefan; Wrede, Karsten; Ladd, Mark E; Lauenstein, Thomas C; Forsting, Michael; Quick, Harald H; Nassenstein, Kai; Umutlu, Lale
2017-01-01
The aim of this study was to investigate and compare the feasibility as well as potential impact of altered magnetic field properties on image quality and potential artifacts of 1.5 Tesla, 3 Tesla and 7 Tesla non-enhanced abdominal MRI. Magnetic Resonance (MR) imaging of the upper abdomen was performed in 10 healthy volunteers on a 1.5 Tesla, a 3 Tesla and a 7 Tesla MR system. The study protocol comprised a (1) T1-weighted fat-saturated spoiled gradient-echo sequence (2D FLASH), (2) T1-weighted fat-saturated volumetric interpolated breath hold examination sequence (3D VIBE), (3) T1-weighted 2D in and opposed phase sequence, (4) True fast imaging with steady-state precession sequence (TrueFISP) and (5) T2-weighted turbo spin-echo (TSE) sequence. For comparison reasons field of view and acquisition times were kept comparable for each correlating sequence at all three field strengths, while trying to achieve the highest possible spatial resolution. Qualitative and quantitative analyses were tested for significant differences. While 1.5 and 3 Tesla MRI revealed comparable results in all assessed features and sequences, 7 Tesla MRI yielded considerable differences in T1 and T2 weighted imaging. Benefits of 7 Tesla MRI encompassed a higher spatial resolution and a non-enhanced hyperintense vessel signal, potentially offering a more accurate diagnosis of abdominal parenchymatous and vasculature disease. 7 Tesla MRI was also shown to be more impaired by artifacts, including residual B1 inhomogeneities, susceptibility and chemical shift artifacts, resulting in reduced overall image quality and overall image impairment ratings. While 1.5 and 3 Tesla T2w imaging showed equivalently high image quality, 7 Tesla revealed strong impairments in its diagnostic value.
Our results demonstrate the feasibility and overall comparable imaging ability of T1-weighted 7 Tesla abdominal MRI towards 3 Tesla and 1.5 Tesla MRI, yielding a promising diagnostic potential for non-enhanced Magnetic Resonance Angiography (MRA). 1.5 Tesla and 3 Tesla offer comparably high-quality T2w imaging, showing superior diagnostic quality over 7 Tesla MRI.
1.5 versus 3 versus 7 Tesla in abdominal MRI: A comparative study
Beiderwellen, Karsten; Kraff, Oliver; Maderwald, Stefan; Wrede, Karsten; Ladd, Mark E.; Lauenstein, Thomas C.; Forsting, Michael; Quick, Harald H.; Nassenstein, Kai; Umutlu, Lale
2017-01-01
Objectives The aim of this study was to investigate and compare the feasibility as well as potential impact of altered magnetic field properties on image quality and potential artifacts of 1.5 Tesla, 3 Tesla and 7 Tesla non-enhanced abdominal MRI. Materials and methods Magnetic Resonance (MR) imaging of the upper abdomen was performed in 10 healthy volunteers on a 1.5 Tesla, a 3 Tesla and a 7 Tesla MR system. The study protocol comprised (1) a T1-weighted fat-saturated spoiled gradient-echo sequence (2D FLASH), (2) a T1-weighted fat-saturated volumetric interpolated breath hold examination sequence (3D VIBE), (3) a T1-weighted 2D in and opposed phase sequence, (4) a True fast imaging with steady-state precession sequence (TrueFISP) and (5) a T2-weighted turbo spin-echo (TSE) sequence. For comparability, field of view and acquisition times were kept similar for each corresponding sequence at all three field strengths, while aiming for the highest possible spatial resolution. Qualitative and quantitative analyses were tested for significant differences. Results While 1.5 and 3 Tesla MRI revealed comparable results in all assessed features and sequences, 7 Tesla MRI yielded considerable differences in T1 and T2 weighted imaging. Benefits of 7 Tesla MRI included higher spatial resolution and a non-enhanced hyperintense vessel signal, potentially offering a more accurate diagnosis of abdominal parenchymal and vascular disease. 7 Tesla MRI was also shown to be more impaired by artifacts, including residual B1 inhomogeneities, susceptibility and chemical shift artifacts, resulting in poorer overall image quality and overall image impairment ratings. While 1.5 and 3 Tesla T2w imaging showed equivalently high image quality, 7 Tesla showed strongly impaired diagnostic value.
Conclusions Our results demonstrate the feasibility and overall comparable imaging ability of T1-weighted 7 Tesla abdominal MRI relative to 3 Tesla and 1.5 Tesla MRI, with promising diagnostic potential for non-enhanced Magnetic Resonance Angiography (MRA). 1.5 Tesla and 3 Tesla offer comparably high-quality T2w imaging, with diagnostic quality superior to 7 Tesla MRI. PMID:29125850
2004-01-01
The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5′-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline. PMID:15489334
High-throughput sequencing: a failure mode analysis.
Yang, George S; Stott, Jeffery M; Smailus, Duane; Barber, Sarah A; Balasundaram, Miruna; Marra, Marco A; Holt, Robert A
2005-01-04
Basic manufacturing principles are becoming increasingly important in high-throughput sequencing facilities where there is a constant drive to increase quality, increase efficiency, and decrease operating costs. While high-throughput centres report failure rates typically on the order of 10%, the causes of sporadic sequencing failures are seldom analyzed in detail and have not, in the past, been formally reported. Here we report the results of a failure mode analysis of our production sequencing facility based on detailed evaluation of 9,216 ESTs generated from two cDNA libraries. Two categories of failures are described: process-related failures (failures due to equipment or sample handling) and template-related failures (failures that are revealed by close inspection of electropherograms and are likely due to properties of the template DNA sequence itself). Preventative action based on a detailed understanding of failure modes is likely to improve the performance of other production sequencing pipelines.
Verde, Ignazio; Jenkins, Jerry; Dondini, Luca; Micali, Sabrina; Pagliarani, Giulia; Vendramin, Elisa; Paris, Roberta; Aramini, Valeria; Gazza, Laura; Rossini, Laura; Bassi, Daniele; Troggio, Michela; Shu, Shengqiang; Grimwood, Jane; Tartarini, Stefano; Dettori, Maria Teresa; Schmutz, Jeremy
2017-03-11
The availability of the peach genome sequence has fostered relevant research in peach and related Prunus species enabling the identification of genes underlying important horticultural traits as well as the development of advanced tools for genetic and genomic analyses. The first release of the peach genome (Peach v1.0) represented a high-quality WGS (Whole Genome Shotgun) chromosome-scale assembly with high contiguity (contig L50 214.2 kb), large portions of mapped sequences (96%) and high base accuracy (99.96%). The aim of this work was to improve the quality of the first assembly by increasing the portion of mapped and oriented sequences, correcting misassemblies and improving the contiguity and base accuracy using high-throughput linkage mapping and deep resequencing approaches. Four linkage maps with 3,576 molecular markers were used to improve the portion of mapped and oriented sequences (from 96.0% and 85.6% of Peach v1.0 to 99.2% and 98.2% of v2.0, respectively) and enabled a more detailed identification of discernible misassemblies (10.4 Mb in total). The deep resequencing approach fixed 859 homozygous SNPs (Single Nucleotide Polymorphisms) and 1347 homozygous indels. Moreover, the assembled NGS contigs enabled the closing of 212 gaps with an improvement in the contig L50 of 19.2%. The improved high quality peach genome assembly (Peach v2.0) represents a valuable tool for the analysis of the genetic diversity, domestication, and as a vehicle for genetic improvement of peach and related Prunus species. Moreover, the important phylogenetic position of peach and the absence of recent whole genome duplication (WGD) events make peach a pivotal species for comparative genomics studies aiming at elucidating plant speciation and diversification processes.
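The contiguity figure quoted above can be illustrated with a short sketch. The abstract reports "contig L50" as a length (214.2 kb); under the more widely used naming, that length is called N50 and L50 is the number of contigs needed to reach it. The function name and the toy contig lengths below are assumptions for illustration only, not values from the peach assembly.

```python
def contig_n50(lengths):
    """Return (n50_length, l50_count) for a list of contig lengths.

    N50 is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches half of
    the assembly size. L50 is how many contigs that takes.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy contig lengths (hypothetical, arbitrary units).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = contig_n50(toy_contigs)
print(n50, l50)
```

Closing gaps merges neighbouring contigs into longer ones, which is why gap closure tends to raise this statistic.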
ERIC Educational Resources Information Center
O'Day, Danton H.
2006-01-01
There is accumulating evidence that animations aid learning of dynamic concepts in cell biology. However, existing animation packages are expensive and difficult to learn, and the subsequent production of even short animations can take weeks to months. Here I outline the principles and sequence of steps for producing high-quality PowerPoint…
Non-Enhanced MR Imaging of Cerebral Arteriovenous Malformations at 7 Tesla.
Wrede, Karsten H; Dammann, Philipp; Johst, Sören; Mönninghoff, Christoph; Schlamann, Marc; Maderwald, Stefan; Sandalcioglu, I Erol; Ladd, Mark E; Forsting, Michael; Sure, Ulrich; Umutlu, Lale
2016-03-01
To evaluate prospectively 7 Tesla time-of-flight (TOF) magnetic resonance angiography (MRA) and 7 Tesla non-contrast-enhanced magnetization-prepared rapid acquisition gradient-echo (MPRAGE) for delineation of intracerebral arteriovenous malformations (AVMs) in comparison to 1.5 Tesla TOF MRA and digital subtraction angiography (DSA). Twenty patients with single or multifocal AVMs were enrolled in this trial. The study protocol comprised 1.5 and 7 Tesla TOF MRA and 7 Tesla non-contrast-enhanced MPRAGE sequences. All patients underwent an additional four-vessel 3D DSA. Image analysis of the following five AVM features was performed individually by two radiologists on a five-point scale: nidus, feeder(s), draining vein(s), relationship to adjacent vessels, and overall image quality and presence of artefacts. A total of 21 intracerebral AVMs were detected. Both sequences at 7 Tesla were rated superior over 1.5 Tesla TOF MRA in the assessment of all considered AVM features. Image quality at 7 Tesla was comparable with DSA considering both sequences. Inter-observer accordance was good to excellent for the majority of ratings. This study demonstrates excellent image quality for depiction of intracerebral AVMs using non-contrast-enhanced 7 Tesla MRA, comparable with DSA. Assessment of untreated AVMs is a promising clinical application of ultra-high-field MRA. • Non-contrast-enhanced 7 Tesla MRA demonstrates excellent image quality for intracerebral AVM depiction. • Image quality at 7 Tesla was comparable with DSA considering both sequences. • Assessment of intracerebral AVMs is a promising clinical application of ultra-high-field MRA.
Image Quality in High-resolution and High-cadence Solar Imaging
NASA Astrophysics Data System (ADS)
Denker, C.; Dineva, E.; Balthasar, H.; Verma, M.; Kuckein, C.; Diercke, A.; González Manrique, S. J.
2018-03-01
Broad-band imaging and even imaging with a moderate bandpass (about 1 nm) provides a photon-rich environment, where frame selection (lucky imaging) becomes a helpful tool in image restoration, allowing us to perform a cost-benefit analysis on how to design observing sequences for imaging with high spatial resolution in combination with real-time correction provided by an adaptive optics (AO) system. This study presents high-cadence (160 Hz) G-band and blue continuum image sequences obtained with the High-resolution Fast Imager (HiFI) at the 1.5-meter GREGOR solar telescope, where the speckle-masking technique is used to restore images with nearly diffraction-limited resolution. The HiFI employs two synchronized large-format and high-cadence sCMOS detectors. The median filter gradient similarity (MFGS) image-quality metric is applied, among others, to AO-corrected image sequences of a pore and a small sunspot observed on 2017 June 4 and 5. A small region of interest, which was selected for fast-imaging performance, covered these contrast-rich features and their neighborhood, which were part of Active Region NOAA 12661. Modifications of the MFGS algorithm uncover the field- and structure-dependency of this image-quality metric. However, MFGS still remains a good choice for determining image quality without a priori knowledge, which is an important characteristic when classifying the huge number of high-resolution images contained in data archives. In addition, this investigation demonstrates that a fast cadence and millisecond exposure times are still insufficient to reach the coherence time of daytime seeing. Nonetheless, the analysis shows that data acquisition rates exceeding 50 Hz are required to capture a substantial fraction of the best seeing moments, significantly boosting the performance of post-facto image restoration.
Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel; Ten Have, Arjen
2018-01-01
Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignation. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignation by which this task is mostly achieved based on the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles show cluster or subfamily member detection with 100% precision and recall (P&R), thereby generating a specific threshold as inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences to groups and the corresponding HMMER profiles results in high sensitivity while specificity is maintained by imposing 100% P&R self detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. HMMERCTTER is a promising protein superfamily sequence classifier provided high quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignation in the twilight zone of sequence similarity. All relevant data and source codes are available from the Github repository at the following URL: https://github.com/BBCMdP/HMMERCTTER.
Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel
2018-01-01
Background Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignation. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignation by which this task is mostly achieved based on the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. Results HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles show cluster or subfamily member detection with 100% precision and recall (P&R), thereby generating a specific threshold as inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences to groups and the corresponding HMMER profiles results in high sensitivity while specificity is maintained by imposing 100% P&R self detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. Conclusions HMMERCTTER is a promising protein superfamily sequence classifier provided high quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignation in the twilight zone of sequence similarity. All relevant data and source codes are available from the Github repository at the following URL: https://github.com/BBCMdP/HMMERCTTER. PMID:29579071
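The idea of an inclusion cut-off that gives 100% precision and recall on the training set can be sketched as follows. This is a minimal illustration, not the HMMERCTTER implementation: it assumes each training sequence already has a profile score, and it takes the lowest member score as the cut-off, which is one plausible choice among several. The function name and scores are hypothetical.

```python
def inclusion_cutoff(member_scores, nonmember_scores):
    """Return a score cut-off that separates members from non-members.

    A sequence is accepted when its score is >= the cut-off. The
    cut-off yields 100% precision and recall on the training data
    only if every member outscores every non-member; otherwise no
    such cut-off exists and None is returned.
    """
    lowest_member = min(member_scores)
    highest_outsider = max(nonmember_scores, default=float("-inf"))
    if lowest_member <= highest_outsider:
        return None  # score ranges overlap
    return lowest_member


# Hypothetical profile scores for one cluster.
members = [310.5, 295.0, 402.1]
outsiders = [120.3, 88.0]
print(inclusion_cutoff(members, outsiders))
```

When the function returns None, the cluster as defined cannot be detected perfectly by its own profile, which is the situation the clustering step described above is meant to avoid.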
Template-based protein structure modeling using the RaptorX web server.
Källberg, Morten; Wang, Haipeng; Wang, Sheng; Peng, Jian; Wang, Zhiyong; Lu, Hui; Xu, Jinbo
2012-07-19
A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world.
Template-based protein structure modeling using the RaptorX web server
Källberg, Morten; Wang, Haipeng; Wang, Sheng; Peng, Jian; Wang, Zhiyong; Lu, Hui; Xu, Jinbo
2016-01-01
A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world. PMID:22814390
Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data
Kosugi, Shunichi; Natsume, Satoshi; Yoshida, Kentaro; MacLean, Daniel; Cano, Liliana; Kamoun, Sophien; Terauchi, Ryohei
2013-01-01
Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. PMID:24116042
Laurie, Matthew T; Bertout, Jessica A; Taylor, Sean D; Burton, Joshua N; Shendure, Jay A; Bielas, Jason H
2013-08-01
Due to the high cost of failed runs and suboptimal data yields, quantification and determination of fragment size range are crucial steps in the library preparation process for massively parallel sequencing (or next-generation sequencing). Current library quality control methods commonly involve quantification using real-time quantitative PCR and size determination using gel or capillary electrophoresis. These methods are laborious and subject to a number of significant limitations that can make library calibration unreliable. Herein, we propose and test an alternative method for quality control of sequencing libraries using droplet digital PCR (ddPCR). By exploiting a correlation we have discovered between droplet fluorescence and amplicon size, we achieve the joint quantification and size determination of target DNA with a single ddPCR assay. We demonstrate the accuracy and precision of applying this method to the preparation of sequencing libraries.
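For orientation, the quantification side of a droplet digital PCR assay is usually based on Poisson statistics: if a fraction p of droplets is positive, the mean number of target copies per droplet is estimated as -ln(1 - p). The sketch below shows only that standard correction; it does not model the fluorescence-to-amplicon-size correlation that the abstract describes, and the droplet counts are made up.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson estimate of mean target copies per droplet.

    positive: number of droplets scored positive
    total:    number of droplets read
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1 - positive / total)


# Hypothetical run: 5,000 of 15,000 droplets positive.
lam = copies_per_droplet(5000, 15000)
print(round(lam, 4))
```

Dividing the estimate by the droplet volume (an instrument-specific value, not given here) would convert it to a concentration.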
Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko
2017-07-12
Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.
Three-dimensional brain MRI for DBS patients within ultra-low radiofrequency power limits.
Sarkar, Subhendra N; Papavassiliou, Efstathios; Hackney, David B; Alsop, David C; Shih, Ludy C; Madhuranthakam, Ananth J; Busse, Reed F; La Ruche, Susan; Bhadelia, Rafeeque A
2014-04-01
For patients with deep brain stimulators (DBS), local absorbed radiofrequency (RF) power is unknown and is much higher than what the system estimates. We developed a comprehensive, high-quality brain magnetic resonance imaging (MRI) protocol for DBS patients utilizing three-dimensional (3D) magnetic resonance sequences at very low RF power. Six patients with DBS were imaged (10 sessions) using a transmit/receive head coil at 1.5 Tesla with modified 3D sequences within ultra-low specific absorption rate (SAR) limits (0.1 W/kg) using T2, fast fluid-attenuated inversion recovery (FLAIR) and T1-weighted image contrast. Tissue signal and tissue contrast from the low-SAR images were subjectively and objectively compared with routine clinical images of six age-matched controls. Low-SAR images of DBS patients demonstrated tissue contrast comparable to high-SAR images and were of diagnostic quality except for slightly reduced signal. Although preliminary, we demonstrated diagnostic quality brain MRI with optimized, volumetric sequences in DBS patients within very conservative RF safety guidelines offering a greater safety margin. © 2014 International Parkinson and Movement Disorder Society.
"First generation" automated DNA sequencing technology.
Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M
2011-10-01
Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.
Curtobacterium sp. Genome Sequencing Underlines Plant Growth Promotion-Related Traits
Bulgari, Daniela; Minio, Andrea; Casati, Paola; Quaglino, Fabio; Delledonne, Massimo
2014-01-01
Endophytic bacteria are microorganisms residing in plant tissues without causing disease symptoms. Here, we provide the high-quality genome sequence of Curtobacterium sp. strain S6, isolated from grapevine plant. The genome assembly contains 2,759,404 bp in 13 contigs and 2,456 predicted genes. PMID:25035321
USDA-ARS's Scientific Manuscript database
Cultivated citrus are selections from, or hybrids of, wild progenitor species whose identities and contributions to citrus domestication remain controversial. Here we sequence and compare citrus genomes—a high-quality reference haploid clementine genome and mandarin, pummelo, sweet-orange and sour-o...
Verifying Digital Components of Physical Systems: Experimental Evaluation of Test Quality
NASA Astrophysics Data System (ADS)
Laputenko, A. V.; López, J. E.; Yevtushenko, N. V.
2018-03-01
This paper continues the study of high quality test derivation for verifying digital components which are used in various physical systems; those are sensors, data transfer components, etc. We have used logic circuits b01-b010 of the package of ITC'99 benchmarks (Second Release) for experimental evaluation which, as stated before, describe digital components of physical systems designed for various applications. Test sequences are derived for detecting the best-known faults of the reference logic circuit using three different approaches to test derivation. Three widely used fault types, namely stuck-at faults, bridges, and faults which slightly modify the behavior of one gate, are considered as possible faults of the reference behavior. The most interesting test sequences are short test sequences that can provide appropriate guarantees after testing, and thus, we experimentally study various approaches to the derivation of the so-called complete test suites which detect all fault types. In the first series of experiments, we compare two approaches for deriving complete test suites. In the first approach, a shortest test sequence is derived for testing each fault. In the second approach, a test sequence is pseudo-randomly generated using appropriate software for logic synthesis and verification (the ABC system in our study) and thus can be longer. However, after deleting sequences detecting the same set of faults, a test suite returned by the second approach is shorter. The latter underlines the fact that in many cases it is useless to spend 'time and effort' on deriving a shortest distinguishing sequence; it is better to use test minimization afterwards. The performed experiments also show that the use of only randomly generated test sequences is not very efficient since such sequences do not detect all the faults of any type. After reaching a fault coverage of around 70%, saturation is observed, and the fault coverage cannot be increased anymore.
For deriving high quality short test suites, combining randomly generated sequences with sequences aimed at detecting the faults missed by random tests makes it possible to reach good fault coverage using shortest test sequences.
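The minimization step mentioned in the abstract, deleting test sequences that detect the same set of faults, can be sketched as below. This is an illustration under stated assumptions, not the authors' tooling: fault detection results are assumed to be available as a mapping from each test sequence to the set of faults it detects, and the function names, input vectors and fault labels are hypothetical.

```python
def deduplicate_tests(detected):
    """Keep one test sequence per distinct set of detected faults.

    detected: dict mapping a test sequence (tuple of input vectors)
              to the set of faults that sequence detects.
    When several sequences detect exactly the same faults, the
    shortest one is kept.
    """
    best = {}
    for sequence, faults in detected.items():
        key = frozenset(faults)
        if key not in best or len(sequence) < len(best[key]):
            best[key] = sequence
    return sorted(best.values())


def fault_coverage(detected, kept, all_faults):
    """Fraction of all_faults detected by the kept sequences."""
    covered = set().union(*(detected[s] for s in kept))
    return len(covered & set(all_faults)) / len(all_faults)


# Hypothetical detection results for three test sequences.
results = {
    ("00", "11"): {"f1", "f2"},
    ("01",): {"f1", "f2"},          # same faults, shorter sequence
    ("10", "10", "01"): {"f3"},
}
suite = deduplicate_tests(results)
print(suite)
print(fault_coverage(results, suite, {"f1", "f2", "f3", "f4"}))
```

A fault such as the hypothetical f4, which no generated sequence detects, is the kind of case where a targeted sequence would have to be added to the random ones.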
2011-01-01
Background The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. Results We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range. Conclusions The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low abundance polymorphisms. PMID:22067484
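Quality filtering of the kind discussed above relies on Phred quality values, which encode a nominal per-base error probability as Q = -10 log10(P). The sketch below converts quality values to error probabilities and applies a simple mean-error filter. The threshold and function names are assumptions for illustration and are not the filtering criteria used in the study; as the abstract notes, reported quality values may not match observed error rates exactly.

```python
def phred_to_error(q):
    """Nominal error probability for a Phred quality value q."""
    return 10 ** (-q / 10)


def expected_errors(quals):
    """Expected number of wrong base calls in a read."""
    return sum(phred_to_error(q) for q in quals)


def passes_filter(quals, max_error_rate=0.01):
    """Keep a read if its mean nominal error rate is low enough.

    max_error_rate is a hypothetical threshold.
    """
    return expected_errors(quals) / len(quals) <= max_error_rate


print(phred_to_error(20))               # Q20 is nominally 1 error in 100
print(passes_filter([30, 30, 30, 30]))
print(passes_filter([10, 10, 30, 30]))
```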
Kaya, Hilal Betul; Cetin, Oznur; Kaya, Hulya; Sahin, Mustafa; Sefer, Filiz; Kahraman, Abdullah; Tanyolac, Bahattin
2013-01-01
Background: The olive tree (Olea europaea L.) is a diploid (2n = 2x = 46) outcrossing species mainly grown in the Mediterranean area, where it is the most important oil-producing crop. Because of its economic, cultural and ecological importance, various DNA markers have been used in the olive to characterize and elucidate homonyms, synonyms and unknown accessions. However, a comprehensive characterization and a full sequence of its transcriptome are unavailable, leading to the importance of an efficient large-scale single nucleotide polymorphism (SNP) discovery in olive. The objectives of this study were (1) to discover olive SNPs using next-generation sequencing and to identify SNP primers for cultivar identification and (2) to characterize 96 olive genotypes originating from different regions of Turkey. Methodology/Principal Findings: Next-generation sequencing technology was used with five distinct olive genotypes and generated cDNA, producing 126,542,413 reads using an Illumina Genome Analyzer IIx. Following quality and size trimming, the high-quality reads were assembled into 22,052 contigs with an average length of 1,321 bases and 45 singletons. The SNPs were filtered and 2,987 high-quality putative SNP primers were identified. The assembled sequences and singletons were subjected to BLAST similarity searches and annotated with a Gene Ontology identifier. To identify the 96 olive genotypes, these SNP primers were applied to the genotypes in combination with amplified fragment length polymorphism (AFLP) and simple sequence repeats (SSR) markers. Conclusions/Significance: This study marks the highest number of SNP markers discovered to date from olive genotypes using transcriptome sequencing. The developed SNP markers will provide a useful source for molecular genetic studies, such as genetic diversity and characterization, high density quantitative trait locus (QTL) analysis, association mapping and map-based gene cloning in the olive. High levels of genetic variation among Turkish olive genotypes revealed by SNPs, AFLPs and SSRs allowed us to characterize the Turkish olive genotype. PMID:24058483
Quality scores for 32,000 genomes
Land, Miriam L.; Hyatt, Doug; Jun, Se-Ran; ...
2014-12-08
More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October, 2013). In this study, we have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. The score can be used to set thresholds for screening data when analyzing “all published genomes” and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Finally and unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.
Goordial, Jacqueline; Raymond-Bouchard, Isabelle; Riley, Robert; ...
2016-03-17
Here, we report the draft genome sequence of Rhodotorula sp. strain JG1b, a yeast that was isolated from ice-cemented permafrost in the upper-elevation McMurdo Dry Valleys, Antarctica. The sequenced genome size is 19.39 Mb, consisting of 156 scaffolds and containing a total of 5,625 predicted genes. This is the first known cold-adapted Rhodotorula sp. sequenced to date.
The complete genome sequence of a Neandertal from the Altai Mountains
Prüfer, Kay; Racimo, Fernando; Patterson, Nick; Jay, Flora; Sankararaman, Sriram; Sawyer, Susanna; Heinze, Anja; Renaud, Gabriel; Sudmant, Peter H.; de Filippo, Cesare; Li, Heng; Mallick, Swapan; Dannemann, Michael; Fu, Qiaomei; Kircher, Martin; Kuhlwilm, Martin; Lachmann, Michael; Meyer, Matthias; Ongyerth, Matthias; Siebauer, Michael; Theunert, Christoph; Tandon, Arti; Moorjani, Priya; Pickrell, Joseph; Mullikin, James C.; Vohr, Samuel H.; Green, Richard E.; Hellmann, Ines; Johnson, Philip L. F.; Blanche, Hélène; Cann, Howard; Kitzman, Jacob O.; Shendure, Jay; Eichler, Evan E.; Lein, Ed S.; Bakken, Trygve E.; Golovanova, Liubov V.; Doronichev, Vladimir B.; Shunkov, Michael V.; Derevianko, Anatoli P.; Viola, Bence; Slatkin, Montgomery; Reich, David; Kelso, Janet; Pääbo, Svante
2014-01-01
We present a high-quality genome sequence of a Neandertal woman from Siberia. We show that her parents were related at the level of half siblings and that mating among close relatives was common among her recent ancestors. We also sequenced the genome of a Neandertal from the Caucasus to low coverage. An analysis of the relationships and population history of available archaic genomes and 25 present-day human genomes shows that several gene flow events occurred among Neandertals, Denisovans and early modern humans, possibly including gene flow into Denisovans from an unknown archaic group. Thus, interbreeding, albeit of low magnitude, occurred among many hominin groups in the Late Pleistocene. In addition, the high quality Neandertal genome allows us to establish a definitive list of substitutions that became fixed in modern humans after their separation from the ancestors of Neandertals and Denisovans. PMID:24352235
High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development.
Daccord, Nicolas; Celton, Jean-Marc; Linsmith, Gareth; Becker, Claude; Choisne, Nathalie; Schijlen, Elio; van de Geest, Henri; Bianco, Luca; Micheletti, Diego; Velasco, Riccardo; Di Pierro, Erica Adele; Gouzy, Jérôme; Rees, D Jasper G; Guérif, Philippe; Muranty, Hélène; Durel, Charles-Eric; Laurens, François; Lespinasse, Yves; Gaillard, Sylvain; Aubourg, Sébastien; Quesneville, Hadi; Weigel, Detlef; van de Weg, Eric; Troggio, Michela; Bucher, Etienne
2017-07-01
Using the latest sequencing and optical mapping technologies, we have produced a high-quality de novo assembly of the apple (Malus domestica Borkh.) genome. Repeat sequences, which represented over half of the assembly, provided an unprecedented opportunity to investigate the uncharacterized regions of a tree genome; we identified a new hyper-repetitive retrotransposon sequence that was over-represented in heterochromatic regions and estimated that a major burst of different transposable elements (TEs) occurred 21 million years ago. Notably, the timing of this TE burst coincided with the uplift of the Tian Shan mountains, which is thought to be the center of the location where the apple originated, suggesting that TEs and associated processes may have contributed to the diversification of the apple ancestor and possibly to its divergence from pear. Finally, genome-wide DNA methylation data suggest that epigenetic marks may contribute to agronomically relevant aspects, such as apple fruit development.
The OGCleaner: filtering false-positive homology clusters.
Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Snell, Quinn; Bybee, Seth M
2017-01-01
Detecting homologous sequences in organisms is an essential step in protein structure and function prediction, gene annotation and phylogenetic tree construction. Heuristic methods are often employed for quality control of putative homology clusters. These heuristics, however, usually only apply to pairwise sequence comparison and do not examine clusters as a whole. We present the Orthology Group Cleaner (the OGCleaner), a tool designed for filtering putative orthology groups as homology or non-homology clusters by considering all sequences in a cluster. The OGCleaner relies on high-quality orthologous groups identified in OrthoDB to train machine learning algorithms that are able to distinguish between true-positive and false-positive homology groups. This package aims to improve the quality of phylogenetic tree construction especially in instances of lower-quality transcriptome assemblies. The software is available at https://github.com/byucsl/ogcleaner (contact: sfujimoto@gmail.com). Supplementary data are available at Bioinformatics online.
De Meyer, Sofie E.; Fabiano, Elena; Tian, Rui; ...
2015-06-04
We report that Burkholderia sp. strain UYPR1.413 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from a root nodule of Parapiptadenia rigida collected at the Angico plantation, Mandiyu, Uruguay, in December 2006. A survey of symbionts of P. rigida in Uruguay demonstrated that this species is nodulated predominantly by Burkholderia microsymbionts. Moreover, Burkholderia sp. strain UYPR1.413 is a highly efficient nitrogen fixing symbiont with this host. Currently, the only other sequenced isolate to fix with this host is Cupriavidus sp. UYPR2.512. Therefore, Burkholderia sp. strain UYPR1.413 was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the GEBA-RNB project. Here we describe the features of Burkholderia sp. strain UYPR1.413, together with sequence and annotation. The 10,373,764 bp high-quality permanent draft genome is arranged in 336 scaffolds of 342 contigs, contains 9759 protein-coding genes and 77 RNA-only encoding genes.
[The principle and application of the single-molecule real-time sequencing technology].
Yanhu, Liu; Lu, Wang; Li, Yu
2015-03-01
The last decade witnessed the explosive development of third-generation sequencing strategies, including single-molecule real-time (SMRT) sequencing, true single-molecule sequencing (tSMS) and single-molecule nanopore DNA sequencing. In this review, we summarize the principle, performance and application of the SMRT sequencing technology. Compared with the traditional Sanger method and next-generation sequencing (NGS) technologies, the SMRT approach has several advantages, including long read length, high speed, PCR-free operation and the capability of directly detecting epigenetic modifications. However, its low accuracy, with most errors resulting from insertions and deletions, is a notable disadvantage, so the raw sequence data need to be corrected before assembly. To date, SMRT is a good fit for de novo genome sequencing and high-quality assemblies of small genomes. In the future, it is expected to play an important role in epigenetics, transcriptomic sequencing, and assemblies of large genomes.
The zebrafish reference genome sequence and its relationship to the human genome.
Howe, Kerstin; Clark, Matthew D; Torroja, Carlos F; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T; Guerra-Assunção, José A; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F; Laird, Gavin K; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Elliot, David; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Begum, Sharmin; Mortimore, Beverley; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Lloyd, Christine; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; 
Woodmansey, Rebecca; Clark, Graham; Cooper, James D; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Lanz, Christa; Raddatz, Günter; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Schuster, Stephan C; Carter, Nigel P; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M J; Enright, Anton; Geisler, Robert; Plasterk, Ronald H A; Lee, Charles; Westerfield, Monte; de Jong, Pieter J; Zon, Leonard I; Postlethwait, John H; Nüsslein-Volhard, Christiane; Hubbard, Tim J P; Roest Crollius, Hugues; Rogers, Jane; Stemple, Derek L
2013-04-25
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
The zebrafish reference genome sequence and its relationship to the human genome
Howe, Kerstin; Clark, Matthew D.; Torroja, Carlos F.; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E.; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C.; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T.; Guerra-Assunção, José A.; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F.; Laird, Gavin K.; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James; Tromans, 
Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M.; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Carter, Nigel P.; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M. J.; Enright, Anton; Geisler, Robert; Plasterk, Ronald H. A.; Lee, Charles; Westerfield, Monte; de Jong, Pieter J.; Zon, Leonard I.; Postlethwait, John H.; Nüsslein-Volhard, Christiane; Hubbard, Tim J. P.; Crollius, Hugues Roest; Rogers, Jane; Stemple, Derek L.
2013-01-01
Zebrafish have become a popular organism for the study of vertebrate gene function [1,2]. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease [3–5]. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes [6], the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination. PMID:23594743
Wright, Imogen A.; Travers, Simon A.
2014-01-01
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
Ultrafast Brain MRI: Clinical Deployment and Comparison to Conventional Brain MRI at 3T.
Prakkamakul, Supada; Witzel, Thomas; Huang, Susie; Boulter, Daniel; Borja, Maria J; Schaefer, Pamela; Rosen, Bruce; Heberlein, Keith; Ratai, Eva; Gonzalez, Gilberto; Rapalino, Otto
2016-09-01
To compare an ultrafast brain magnetic resonance imaging (MRI) protocol to the conventional protocol in motion-prone inpatient clinical settings. This retrospective study was HIPAA compliant and approved by the Institutional Review Board with waived informed consent. Fifty-nine inpatients (30 males, 29 females; mean age 55.1, range 23-93 years) who underwent 3-Tesla brain MRI using ultrafast and conventional protocols, both including five sequences, were included in the study. The total scan time for five ultrafast sequences was 4 minutes 59 seconds. The ideal conventional acquisition time was 10 minutes 32 seconds but the actual acquisition took 15-20 minutes. The average scan times for ultrafast localizers, T1-weighted, T2-weighted, fluid-attenuated inversion recovery (FLAIR), diffusion-weighted, and T2*-weighted sequences were 14, 41, 62, 96, 80, and 6 seconds, respectively. Two blinded neuroradiologists independently assessed three aspects: (1) image quality, (2) gray-white matter (GM-WM) differentiation, and (3) diagnostic concordance for the detection of six clinically relevant imaging findings. The Wilcoxon signed-rank test was used to compare image quality and GM-WM scores. Interobserver reproducibility was calculated. The ultrafast T1-weighted sequence demonstrated significantly better image quality (P = .005) and GM-WM differentiation (P < .001) compared to the conventional sequence. There was high agreement (>85%) between both protocols for the detection of mass-like lesion, hemorrhage, diffusion restriction, WM FLAIR hyperintensities, subarachnoid FLAIR hyperintensities, and hydrocephalus. The ultrafast protocol achieved at least comparable image quality and high diagnostic concordance compared to the conventional protocol. This fast protocol can be a viable option to replace the conventional protocol in motion-prone inpatient clinical settings.
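The timing figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary keys and the helper name `format_min_sec` are made up for this example, and the values are the six per-sequence averages quoted above. It shows that those averages add up to the reported ultrafast total and how that total compares with the ideal conventional acquisition time.

```python
# Average ultrafast scan times in seconds, copied from the abstract.
# The key names are labels chosen for this illustration.
scan_times_s = {
    "localizers": 14,
    "T1-weighted": 41,
    "T2-weighted": 62,
    "FLAIR": 96,
    "diffusion-weighted": 80,
    "T2*-weighted": 6,
}


def format_min_sec(total_seconds: int) -> str:
    """Render a duration in seconds as 'M min S s'."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes} min {seconds} s"


# Sum of the listed averages; matches the reported 4 minutes 59 seconds.
ultrafast_total_s = sum(scan_times_s.values())

# Ideal conventional acquisition time reported in the abstract: 10 min 32 s.
conventional_ideal_s = 10 * 60 + 32

# How many times longer the ideal conventional protocol is.
speedup = conventional_ideal_s / ultrafast_total_s

print(ultrafast_total_s, format_min_sec(ultrafast_total_s))
print(f"ideal conventional / ultrafast = {speedup:.2f}")
```

On these numbers the ideal conventional protocol is roughly twice as long as the ultrafast one; the actual conventional acquisitions (15-20 minutes) were longer still.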
Peck, Michelle A; Sturk-Andreaggi, Kimberly; Thomas, Jacqueline T; Oliver, Robert S; Barritt-Ross, Suzanne; Marshall, Charla
2018-05-01
Generating mitochondrial genome (mitogenome) data from reference samples in a rapid and efficient manner is critical to harnessing the greater power of discrimination of the entire mitochondrial DNA (mtDNA) marker. The method of long-range target enrichment, Nextera XT library preparation, and Illumina sequencing on the MiSeq is a well-established technique for generating mitogenome data from high-quality samples. To this end, a validation was conducted for this mitogenome method processing up to 24 samples simultaneously along with analysis in the CLC Genomics Workbench and utilizing the AQME (AFDIL-QIAGEN mtDNA Expert) tool to generate forensic profiles. This validation followed the Federal Bureau of Investigation's Quality Assurance Standards (QAS) for forensic DNA testing laboratories and the Scientific Working Group on DNA Analysis Methods (SWGDAM) validation guidelines. The evaluation of control DNA, non-probative samples, blank controls, mixtures, and nonhuman samples demonstrated the validity of this method. Specifically, the sensitivity was established at ≥25 pg of nuclear DNA input for accurate mitogenome profile generation. Unreproducible low-level variants were observed in samples with low amplicon yields. Further, variant quality was shown to be a useful metric for identifying sequencing error and crosstalk. Success of this method was demonstrated with a variety of reference sample substrates and extract types. These studies further demonstrate the advantages of using NGS techniques by highlighting the quantitative nature of heteroplasmy detection. The results presented herein, from more than 175 samples processed in ten sequencing runs, show this mitogenome sequencing method and analysis strategy to be valid for the generation of reference data.
Development of a reference material of a single DNA molecule for the quality control of PCR testing.
Mano, Junichi; Hatano, Shuko; Futo, Satoshi; Yoshii, Junji; Nakae, Hiroki; Naito, Shigehiro; Takabatake, Reona; Kitta, Kazumi
2014-09-02
We developed a reference material of a single DNA molecule with a specific nucleotide sequence. A double-stranded linear DNA with PCR target sequences at both ends was prepared as the reference DNA molecule, and we named the PCR targets on each side the confirmation sequence and the standard sequence. A highly diluted solution of the reference molecule was dispensed into the 96 wells of a plastic PCR plate so that the average number of molecules per well was below one. Subsequently, the presence or absence of the reference molecule in each well was checked by real-time PCR targeting the confirmation sequence. After an enzymatic treatment of the reaction mixture in the positive wells to digest the PCR products, the resultant solution was used as the reference material of a single DNA molecule with the standard sequence. PCR analyses revealed that the prepared samples included only one reference molecule with high probability. The single-molecule reference material developed in this study will be useful for the absolute evaluation of the detection limit of PCR-based testing methods, the quality control of PCR analyses, performance evaluations of PCR reagents and instruments, and the preparation of an accurate calibration curve for real-time PCR quantitation.
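The dilution step described above can be illustrated with a short calculation. This is a minimal sketch under an assumption the abstract does not state: that the number of molecules dispensed into a well follows a Poisson distribution with mean `lam`. The function names and the example means are hypothetical. It shows why a mean below one molecule per well makes a PCR-positive well likely to hold exactly one molecule.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k molecules in a well, assuming Poisson(lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def prob_single_given_positive(lam: float) -> float:
    """P(exactly one molecule | at least one molecule) under the Poisson assumption."""
    p_positive = 1.0 - poisson_pmf(0, lam)
    return poisson_pmf(1, lam) / p_positive


# Example mean occupancies (illustrative values, not taken from the paper).
for lam in (1.0, 0.5, 0.1):
    print(
        f"mean={lam}: "
        f"P(empty well)={poisson_pmf(0, lam):.3f}, "
        f"P(exactly one | positive)={prob_single_given_positive(lam):.3f}"
    )
```

Under this model there is a trade-off: a lower mean occupancy raises the chance that a positive well contains a single molecule, but leaves more wells empty.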
A no-reference image and video visual quality metric based on machine learning
NASA Astrophysics Data System (ADS)
Frantc, Vladimir; Voronin, Viacheslav; Semenishchev, Evgenii; Minkin, Maxim; Delov, Aliy
2018-04-01
The paper presents a novel visual quality metric for lossy compressed video quality assessment. The high degree of correlation with subjective estimates of quality is due to the use of a convolutional neural network trained on a large number of pairs of video sequences and subjective quality scores. We demonstrate how our predicted no-reference quality metric correlates with qualitative opinion in a human observer study. Results are shown on the EVVQ dataset in comparison with existing approaches.
Zheng, Xiasheng; Zhang, Peng; Liao, Baosheng; Li, Jing; Liu, Xingyun; Shi, Yuhua; Cheng, Jinle; Lai, Zhitian; Xu, Jiang; Chen, Shilin
2017-01-01
Herbal medicine is a major component of complementary and alternative medicine, contributing significantly to the health of many people and communities. Quality control of herbal medicine is crucial to ensure that it is safe and sound for use. Here, we investigated a comprehensive quality evaluation system for a classic herbal medicine, Danggui Buxue Formula, by applying genetic-based and analytical chemistry approaches to authenticate and evaluate the quality of its samples. For authenticity, we successfully applied two novel technologies, third-generation sequencing and PCR-DGGE (denaturing gradient gel electrophoresis), to analyze the ingredient composition of the tested samples. For quality evaluation, we used high performance liquid chromatography assays to determine the content of chemical markers to help estimate the dosage relationship between its two raw materials, plant roots of Huangqi and Danggui. A series of surveys were then conducted against several exogenous contaminations, aiming to further assess the efficacy and safety of the samples. In conclusion, the quality evaluation system demonstrated here can potentially address the authenticity, quality, and safety of herbal medicines, thus providing novel insight for enhancing their overall quality control. Highlight: We established a comprehensive quality evaluation system for herbal medicine by combining two genetic-based approaches, third-generation sequencing and DGGE (denaturing gradient gel electrophoresis), with analytical chemistry approaches to achieve the authentication and quality connotation of the samples. PMID:28955365
Yu, Yang; Wei, Jiankai; Zhang, Xiaojun; Liu, Jingwen; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai
2014-01-01
The application of next generation sequencing technology has greatly facilitated high throughput single nucleotide polymorphism (SNP) discovery and genotyping in genetic research. In the present study, SNPs were discovered based on two transcriptomes of Litopenaeus vannamei (L. vannamei) generated from Illumina sequencing platform HiSeq 2000. One transcriptome of L. vannamei was obtained through sequencing on the RNA from larvae at mysis stage and its reference sequence was de novo assembled. The data from another transcriptome were downloaded from NCBI and the reads of the two transcriptomes were mapped separately to the assembled reference by BWA. SNP calling was performed using SAMtools. A total of 58,717 and 36,277 SNPs with high quality were predicted from the two transcriptomes, respectively. SNP calling was also performed using the reads of two transcriptomes together, and a total of 96,040 SNPs with high quality were predicted. Among these 96,040 SNPs, 5,242 and 29,129 were predicted as non-synonymous and synonymous SNPs respectively. Characterization analysis of the predicted SNPs in L. vannamei showed that the estimated SNP frequency was 0.21% (one SNP per 476 bp) and the estimated ratio for transition to transversion was 2.0. Fifty SNPs were randomly selected for validation by Sanger sequencing after PCR amplification and 76% of SNPs were confirmed, which indicated that the SNPs predicted in this study were reliable. These SNPs will be very useful for genetic study in L. vannamei, especially for the high density linkage map construction and genome-wide association studies. PMID:24498047
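The two summary statistics quoted in the abstract above (SNP frequency and transition/transversion ratio) follow from simple arithmetic. The sketch below is illustrative only: the function names and the toy SNP list are hypothetical and are not taken from the study, which used BWA and SAMtools for the actual calling.

```python
# Purines and pyrimidines; a transition stays within one class,
# a transversion switches between the two classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Return True for A<->G or C<->T substitutions."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for (ref, alt) pairs."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts / tv


def snp_frequency_percent(bp_per_snp):
    """Convert 'one SNP per N bp' into a percentage."""
    return 100.0 / bp_per_snp


# Hypothetical toy data: four transitions and two transversions.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(toy_snps))                  # 2.0
print(round(snp_frequency_percent(476), 2))   # 0.21
```

The second printed value reproduces the abstract's conversion of one SNP per 476 bp into a frequency of 0.21%.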
A comprehensive evaluation of assembly scaffolding tools
2014-01-01
Background: Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics. Results: Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behaviour of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remain unidentified when using real data. Conclusions: The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity. PMID:24581555
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saha, Malay C.; Brummer, E. Charles; Kaeppler, Shawn
Switchgrass (Panicum virgatum L.) is a C4 grass with high biomass yield potential and a model species for bioenergy feedstock development. Understanding the genetic basis of quantitative traits is essential to facilitate genome-enabled breeding programs. The nested association mapping (NAM) analysis combines the best features of both bi-parental and association analyses and can provide high power and high resolution in QTL detection and will ensure significant improvements in biomass yield and quality. To develop a NAM population of switchgrass, 15 highly diverse genotypes with specific characteristics were selected from a diversity panel and crossed to a recurrent parent, AP13, a genotype selected for whole genome sequencing and parent of a mapping population. Ten genotypes from each of the 15 F1 families were then chain crossed. Progenies from each family were randomly selected to develop the NAM population. The switchgrass NAM population consists of a total of 2000 genotypes from 15 families. All the progenies, founder parents, F1 parents (n=2350) were evaluated in replicated field trials at Ardmore, OK and Knoxville, TN. Phenotypic data on plant height, tillering ability, regrowth, flowering time, and biomass yield were collected. Dried biomass samples were also analyzed using prediction equations of NIRS at the Noble Foundation and for lignin content, S/G ratio, and sugar release characteristics at the NREL. Genomic shotgun sequencing of 15 switchgrass NAM founder parental genomes at JGI produced 28-66 Gb high-quality sequence data. Alignment of these sequences with the reference genome, AP13 (v3.0), revealed that up to 99% of the genomic sequences mapped to the reference genome. A total of 2,149 individuals from NAM populations were sequenced by exome capture and two sets of 15 SNP matrices (one for each family) were generated. QTL associated with important traits have been identified and verified in breeding populations. The QTL detected and their associated markers can be used in molecular breeding programs to facilitate development of improved switchgrass cultivars for biofuel production.
Genome analysis and identification of gelatinase encoded gene in Enterobacter aerogenes
NASA Astrophysics Data System (ADS)
Shahimi, Safiyyah; Mutalib, Sahilah Abdul; Khalid, Rozida Abdul; Repin, Rul Aisyah Mat; Lamri, Mohd Fadly; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat
2016-11-01
In this study, bioinformatic analysis of the genome sequence of E. aerogenes was performed to identify genes encoding gelatinase. Enterobacter aerogenes was isolated from hot spring water and is a bacterium whose gelatinase is species-specific to porcine and fish gelatin. This bacterium offers the possibility of producing enzymes specific to the gelatin of each of these species. The genome of Enterobacter aerogenes was partially sequenced, yielding a total sequence size of 5.0 mega base pairs (Mbp). The pre-processing pipeline reported 87.6 Mbp of total reads and 68.8 Mbp of high-quality reads, a high-quality percentage of 78.58. Genome assembly produced 120 contigs, 67.5% of which were over 1 kilo base pair (kbp), with an N50 contig length of 124,856 bp and a GC content of 55.17%. About 4,705 protein-coding genes were identified by protein prediction analysis. Two candidate genes were selected because they had the highest percentage identity to gelatinase enzymes available in the Swiss-Prot and NCBI online databases. They were NODE_9_length_26866_cov_148.013245_12, containing a 1029 base pair (bp) sequence with a 342-amino-acid sequence, and NODE_24_length_155103_cov_177.082458_62, containing a 717 bp sequence with a 238-amino-acid sequence. Two pairs of primers (forward and reverse) were therefore designed based on the open reading frames (ORFs) of the selected genes. In summary, genome analysis of E. aerogenes identified genes encoding gelatinase.
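The assembly statistics reported above (N50 contig length and the share of contigs over 1 kbp) can be computed from a list of contig lengths. The following is a minimal sketch using the common definition of N50; the function names and the toy contig lengths are hypothetical and do not come from the E. aerogenes assembly.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one contig with non-zero length")


def percent_at_least(lengths, min_length):
    """Percentage of contigs whose length is at least min_length."""
    hits = sum(1 for length in lengths if length >= min_length)
    return 100.0 * hits / len(lengths)


# Hypothetical toy assembly: 300 bp in total, so half is 150 bp.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))                   # 70
print(percent_at_least(toy_contigs, 40))  # 4 of 7 contigs
```

In the toy case the two longest contigs (80 + 70 bp) already reach half of the 300 bp total, so the N50 is 70.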
Error and Error Mitigation in Low-Coverage Genome Assemblies
Hubisz, Melissa J.; Lin, Michael F.; Kellis, Manolis; Siepel, Adam
2011-01-01
The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ∼2× coverage. Here we examine the extent of sequencing error in these 2× assemblies, and its potential impact in downstream analyses. By comparing 2× assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1–4 errors per kilobase. While this error rate is fairly modest, sequencing error can still have surprising effects. For example, an apparent lineage-specific insertion in a coding region is more likely to reflect sequencing error than a true biological event, and the length distribution of coding indels is strongly distorted by error. We find that most errors are contributed by a small fraction of bases with low quality scores, in particular, by the ends of reads in regions of single-read coverage in the assembly. We explore several approaches for automatic sequencing error mitigation (SEM), making use of the localized nature of sequencing error, the fact that it is well predicted by quality scores, and information about errors that comes from comparisons across species. Our automatic methods for error mitigation cannot replace the need for additional sequencing, but they do allow substantial fractions of errors to be masked or eliminated at the cost of modest amounts of over-correction, and they can reduce the impact of error in downstream phylogenomic analyses. Our error-mitigated alignments are available for download. PMID:21340033
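To relate the error rates above to base quality scores, the sketch below assumes the standard Phred convention, Q = -10 log10(p), which the abstract does not spell out. The masking helper is a hypothetical illustration of quality-based masking in general, not the authors' SEM method, and the threshold and toy sequence are made up.

```python
import math


def phred_from_error(p):
    """Phred-scaled quality for an error probability p (assumed convention)."""
    return -10.0 * math.log10(p)


def error_from_phred(q):
    """Error probability implied by a Phred-scaled quality q."""
    return 10.0 ** (-q / 10.0)


def mask_low_quality(sequence, qualities, min_q):
    """Replace bases whose quality is below min_q with 'N'."""
    return "".join(
        base if q >= min_q else "N" for base, q in zip(sequence, qualities)
    )


# 1 and 4 errors per kilobase expressed as average Phred-scaled qualities.
for errors_per_kb in (1, 4):
    print(errors_per_kb, round(phred_from_error(errors_per_kb / 1000), 1))

# Hypothetical read with two low-quality bases at its end.
print(mask_low_quality("ACGTAC", [40, 38, 35, 30, 12, 8], min_q=20))  # ACGTNN
```

Under this convention, 1 error per kilobase corresponds to an average quality of about Q30 and 4 errors per kilobase to about Q24.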
Genome Sequencing and Assembly by Long Reads in Plants
Li, Changsheng; Lin, Feng; An, Dong; Huang, Ruidong
2017-01-01
Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications due to high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of long-read sequencing by Third Generation Sequencing (TGS) methods such as PacBio and Nanopore have shown promise in producing high-quality assemblies for complex genomes. Here, we review the development of sequencing, introducing the application as well as considerations of experimental design in TGS of plant genomes. We also introduce recent revolutionary scaffolding technologies including BioNano, Hi-C, and 10× Genomics. We expect that the informative guidance for genome sequencing and assembly by long reads will benefit the initiation of scientists’ projects. PMID:29283420
Maize - GO annotation methods, evaluation, and review (Maize-GAMER)
USDA-ARS's Scientific Manuscript database
Making a genome sequence accessible and useful involves three basic steps: genome assembly, structural annotation, and functional annotation. The quality of data generated at each step influences the accuracy of inferences that can be made, with high-quality analyses produce better datasets resultin...
NASA Astrophysics Data System (ADS)
Mantel, Claire; Korhonen, Jari; Pedersen, Jesper M.; Bech, Søren; Andersen, Jakob Dahl; Forchhammer, Søren
2015-01-01
This paper focuses on the influence of ambient light on the perceived quality of videos displayed on Liquid Crystal Display (LCD) with local backlight dimming. A subjective test assessing the quality of videos with two backlight dimming methods and three lighting conditions, i.e. no light, low light level (5 lux) and higher light level (60 lux) was organized to collect subjective data. Results show that participants prefer the method exploiting local dimming possibilities to the conventional full backlight but that this preference varies depending on the ambient light level. The clear preference for one method at the low light conditions decreases at the high ambient light, confirming that the ambient light significantly attenuates the perception of the leakage defect (light leaking through dark pixels). Results are also highly dependent on the content of the sequence, which can modulate the effect of the ambient light from having an important influence on the quality grades to no influence at all.
Automated sample-preparation technologies in genome sequencing projects.
Hilbert, H; Lauber, J; Lubenow, H; Düsterhöft, A
2000-01-01
A robotic workstation system (BioRobot 9600, QIAGEN) and a 96-well UV spectrophotometer (Spectramax 250, Molecular Devices) were integrated into the process of high-throughput automated sequencing of double-stranded plasmid DNA templates. An automated 96-well miniprep kit protocol (QIAprep Turbo, QIAGEN) provided high-quality plasmid DNA from shotgun clones. The DNA prepared by this procedure was used to generate more than two megabases of final sequence data for two genomic projects (Arabidopsis thaliana and Schizosaccharomyces pombe), three thousand expressed sequence tags (ESTs) plus half a megabase of human full-length cDNA clones, and approximately 53,000 single reads for a whole genome shotgun project (Pseudomonas putida).
Deep sequencing in library selection projects: what insight does it bring?
Glanville, J; D'Angelo, S; Khan, T A; Reddy, S T; Naranjo, L; Ferrara, F; Bradbury, A R M
2015-08-01
High throughput sequencing is poised to change all aspects of the way antibodies and other binders are discovered and engineered. Millions of available sequence reads provide an unprecedented sampling depth able to guide the design and construction of effective, high quality naïve libraries containing tens of billions of unique molecules. Furthermore, during selections, high throughput sequencing enables quantitative tracing of enriched clones and position-specific guidance to amino acid variation under positive selection during antibody engineering. Successful application of the technologies relies on specific PCR reagent design, correct sequencing platform selection, and effective use of computational tools and statistical measures to remove error, identify antibodies, estimate diversity, and extract signatures of selection from the clone down to individual structural positions. Here we review these considerations and discuss some of the remaining challenges to the widespread adoption of the technology. Copyright © 2015 Elsevier Ltd. All rights reserved.
Deep sequencing in library selection projects: what insight does it bring?
Glanville, J; D’Angelo, S; Khan, T.A.; Reddy, S. T.; Naranjo, L.; Ferrara, F.; Bradbury, A.R.M.
2015-01-01
High throughput sequencing is poised to change all aspects of the way antibodies and other binders are discovered and engineered. Millions of available sequence reads provide an unprecedented sampling depth able to guide the design and construction of effective, high quality naïve libraries containing tens of billions of unique molecules. Furthermore, during selections, high throughput sequencing enables quantitative tracing of enriched clones and position-specific guidance to amino acid variation under positive selection during antibody engineering. Successful application of the technologies relies on specific PCR reagent design, correct sequencing platform selection, and effective use of computational tools and statistical measures to remove error, identify antibodies, estimate diversity, and extract signatures of selection from the clone down to individual structural positions. Here we review these considerations and discuss some of the remaining challenges to the widespread adoption of the technology. PMID:26451649
Holtz, Yan; Ardisson, Morgane; Ranwez, Vincent; Besnard, Alban; Leroy, Philippe; Poux, Gérard; Roumet, Pierre; Viader, Véronique; Santoni, Sylvain; David, Jacques
2016-01-01
Targeted sequence capture is a promising technology which helps reduce costs for sequencing and genotyping numerous genomic regions in large sets of individuals. Bait sequences are designed to capture specific alleles previously discovered in parents or reference populations. We studied a set of 135 RILs originating from a cross between an emmer cultivar (Dic2) and a recent durum elite cultivar (Silur). Six thousand sequence baits were designed to target Dic2 vs. Silur polymorphisms discovered in a previous RNAseq study. These baits were exposed to genomic DNA of the RIL population. Eighty percent of the targeted SNPs were recovered, 65% of which were of high quality and coverage. The final high density genetic map consisted of more than 3,000 markers, whose genetic and physical mapping were consistent with those obtained with large arrays. PMID:27171472
MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach
Watson, Mick; Minot, Samuel S.; Rivera, Maria C.; Franklin, Rima B.
2017-01-01
Abstract Background: Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10⁴ bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level. 
Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at <0.05% of DNA in the total mixture), 91% of reads were assigned to the correct species, 93% of reads were assigned to the correct genus, and >99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 10³ 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included: high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis limit. Together, these limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads. 
Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve. With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities. PMID:28327976
MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach.
Brown, Bonnie L; Watson, Mick; Minot, Samuel S; Rivera, Maria C; Franklin, Rima B
2017-03-01
Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10⁴ bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level. 
Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at <0.05% of DNA in the total mixture), 91% of reads were assigned to the correct species, 93% of reads were assigned to the correct genus, and >99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 10³ 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included: high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis limit. Together, these limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads. 
Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve. With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities. © The Author 2017. Published by Oxford University Press.
Osman, Wan Adnawani Meor; van Berkum, Peter; León-Barrios, Milagros; Velázquez, Encarna; Elia, Patrick; Tian, Rui; Ardley, Julie; Gollagher, Margaret; Seshadri, Rekha; Reddy, T B K; Ivanova, Natalia; Woyke, Tanja; Pati, Amrita; Markowitz, Victor; Baeshen, Mohamed N; Baeshen, Naseebh Nabeeh; Kyrpides, Nikos; Reeve, Wayne
2017-01-01
10.1601/nm.1335 Mlalz-1 (INSDC = ATZD00000000) is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing nodule of Medicago laciniata (L.) Miller from a soil sample collected near the town of Guatiza on the island of Lanzarote, the Canary Islands, Spain. This strain nodulates and forms an effective symbiosis with the highly specific host M. laciniata . This rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) sequencing project. Here the features of 10.1601/nm.1335 Mlalz-1 are described, together with high-quality permanent draft genome sequence information and annotation. The 6,664,116 bp high-quality draft genome is arranged in 99 scaffolds of 100 contigs, containing 6314 protein-coding genes and 74 RNA-only encoding genes. Strain Mlalz-1 is closely related to 10.1601/nm.1335 10.1601/strainfinder?urlappend=%3Fid%3DIAM+12611 T , 10.1601/nm.1334 A 321 T and 10.1601/nm.17831 10.1601/strainfinder?urlappend=%3Fid%3DORS+1407 T , based on 16S rRNA gene sequences. gANI values of ≥98.1% support the classification of strain Mlalz-1 as 10.1601/nm.1335. Nodulation of M. laciniata requires a specific nodC allele, and the nodC gene of strain Mlalz-1 shares ≥98% sequence identity with nodC of M. laciniata -nodulating 10.1601/nm.1328 strains, but ≤93% with nodC of 10.1601/nm.1328 strains that nodulate other Medicago species. Strain Mlalz-1 is unique among sequenced 10.1601/nm.1335 strains in possessing genes encoding components of a T2SS and in having two versions of the adaptive acid tolerance response lpiA-acvB operon. In 10.1601/nm.1334 strain 10.1601/strainfinder?urlappend=%3Fid%3DWSM+419, lpiA is essential for enhancing survival in lethal acid conditions. 
The second copy of the lpiA-acvB operon of strain Mlalz-1 has highest sequence identity (>96%) with that of 10.1601/nm.1334 strains, which suggests genetic recombination between strain Mlalz-1 and 10.1601/nm.1334 and the horizontal gene transfer of lpiA-acvB.
Genome Sequences of 19 Novel Erwinia amylovora Bacteriophages
Esplin, Ian N. D.; Berg, Jordan A.; Sharma, Ruchira; Allen, Robert C.; Arens, Daniel K.; Ashcroft, Cody R.; Bairett, Shannon R.; Beatty, Nolan J.; Bickmore, Madeline; Bloomfield, Travis J.; Brady, T. Scott; Bybee, Rachel N.; Carter, John L.; Choi, Minsey C.; Duncan, Steven; Fajardo, Christopher P.; Foy, Brayden B.; Fuhriman, David A.; Gibby, Paul D.; Grossarth, Savannah E.; Harbaugh, Kala; Harris, Natalie; Hilton, Jared A.; Hurst, Emily; Hyde, Jonathan R.; Ingersoll, Kayleigh; Jacobson, Caitlin M.; James, Brady D.; Jarvis, Todd M.; Jaen-Anieves, Daniella; Jensen, Garrett L.; Knabe, Bradley K.; Kruger, Jared L.; Merrill, Bryan D.; Pape, Jenny A.; Payne Anderson, Ashley M.; Payne, David E.; Peck, Malia D.; Pollock, Samuel V.; Putnam, Micah J.; Ransom, Ethan K.; Ririe, Devin B.; Robinson, David M.; Rogers, Spencer L.; Russell, Kerri A.; Schoenhals, Jonathan E.; Shurtleff, Christopher A.; Simister, Austin R.; Smith, Hunter G.; Stephenson, Michael B.; Staley, Lyndsay A.; Stettler, Jason M.; Stratton, Mallorie L.; Tateoka, Olivia B.; Tatlow, P. J.; Taylor, Alexander S.; Thompson, Suzanne E.; Townsend, Michelle H.; Thurgood, Trever L.; Usher, Brittian K.; Whitley, Kiara V.; Ward, Andrew T.; Ward, Megan E. H.; Webb, Charles J.; Wienclaw, Trevor M.; Williamson, Taryn L.; Wells, Michael J.; Wright, Cole K.; Breakwell, Donald P.; Hope, Sandra
2017-01-01
ABSTRACT Erwinia amylovora is the causal agent of fire blight, a devastating disease affecting some plants of the Rosaceae family. We isolated bacteriophages from samples collected from infected apple and pear trees along the Wasatch Front in Utah. We announce 19 high-quality complete genome sequences of E. amylovora bacteriophages. PMID:29146842
Curtobacterium sp. Genome Sequencing Underlines Plant Growth Promotion-Related Traits.
Bulgari, Daniela; Minio, Andrea; Casati, Paola; Quaglino, Fabio; Delledonne, Massimo; Bianco, Piero A
2014-07-17
Endophytic bacteria are microorganisms residing in plant tissues without causing disease symptoms. Here, we provide the high-quality genome sequence of Curtobacterium sp. strain S6, isolated from grapevine plant. The genome assembly contains 2,759,404 bp in 13 contigs and 2,456 predicted genes. Copyright © 2014 Bulgari et al.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sen, Arnab; Beauchemin, Nicholas; Bruce, David
Members of the actinomycete genus Frankia form a nitrogen-fixing symbiosis with 8 different families of actinorhizal plants. We report a high-quality draft genome sequence for Frankia sp. strain QA3, a nitrogen-fixing actinobacterium isolated from root nodules of Alnus nitida.
An improved filtering algorithm for big read datasets and its application to single-cell assembly.
Wedemeyer, Axel; Kliemann, Lasse; Srivastav, Anand; Schielke, Christian; Reusch, Thorsten B; Rosenstiel, Philip
2017-07-03
For single-cell or metagenomic sequencing projects, it is necessary to sequence with a very high mean coverage in order to make sure that all parts of the sample DNA get covered by the reads produced. This leads to huge datasets with lots of redundant data. A filtering of this data prior to assembly is advisable. Brown et al. (2012) presented the algorithm Diginorm for this purpose, which filters reads based on the abundance of their k-mers. We present Bignorm, a faster and quality-conscious read filtering algorithm. An important new algorithmic feature is the use of phred quality scores together with a detailed analysis of the k-mer counts to decide which reads to keep. We qualify and recommend parameters for our new read filtering algorithm. Guided by these parameters, we remove a median of 97.15% of the reads while keeping the mean phred score of the filtered dataset high. Using the SPAdes assembler, we produce assemblies of high quality from these filtered datasets in a fraction of the time needed for an assembly from the datasets filtered with Diginorm. We conclude that read filtering is a practical and efficient method for reducing read data and for speeding up the assembly process. This applies not only to single-cell assembly, as shown in this paper, but also to other projects with high mean coverage datasets like metagenomic sequencing projects. Our Bignorm algorithm allows assemblies of competitive quality in comparison to Diginorm, while being much faster. Bignorm is available for download at https://git.informatik.uni-kiel.de/axw/Bignorm.
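The k-mer abundance filtering idea described above can be illustrated with a simplified sketch in the spirit of Diginorm: a read is kept only if the median count of its k-mers, among reads kept so far, is still below a cutoff. This is not the Bignorm algorithm (it ignores phred scores) and it uses an exact in-memory counter instead of a memory-efficient sketch; the function names, the values of k and the cutoff, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def normalize(reads, k=4, cutoff=2):
    """Keep a read only while the median abundance of its k-mers is below cutoff."""
    counts = Counter()  # k-mer counts over the reads kept so far
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k
        if median(counts[kmer] for kmer in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


# Hypothetical toy input: one highly redundant read and one rare read.
toy_reads = ["ACGTACGT"] * 5 + ["TTTTGGGG"]
print(normalize(toy_reads))  # ['ACGTACGT', 'ACGTACGT', 'TTTTGGGG']
```

In the toy run, the redundant read is kept twice before its k-mers reach the cutoff, while the rare read is always kept, which is the behaviour that lets such filters discard most of a high-coverage dataset without losing low-coverage regions.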
QuickProbs 2: Towards rapid construction of high-quality alignments of large protein families
Gudyś, Adam; Deorowicz, Sebastian
2017-01-01
The ever-increasing size of sequence databases, caused by the development of high-throughput sequencing, poses one of the greatest challenges yet to multiple alignment algorithms. As we show, well-established techniques employed for increasing alignment quality, i.e., refinement and consistency, are ineffective when large protein families are investigated. We present QuickProbs 2, an algorithm for multiple sequence alignment. Based on probabilistic models and equipped with novel column-oriented refinement and selective consistency, it offers outstanding accuracy. When analysing hundreds of sequences, QuickProbs 2 is noticeably better than ClustalΩ and MAFFT, the previous leaders for processing numerous protein families. In the case of smaller sets, for which consistency-based methods are the best performing, QuickProbs 2 is also superior to the competitors. Due to the low computational requirements of selective consistency and the utilization of massively parallel architectures, the presented algorithm has execution times similar to ClustalΩ and is orders of magnitude faster than full consistency approaches such as MSAProbs or PicXAA. All this makes QuickProbs 2 an excellent tool for aligning families ranging from a few to hundreds of proteins. PMID:28139687
Childs, Kevin L; Konganti, Kranti; Buell, C Robin
2012-01-01
Major feedstock sources for future biofuel production are likely to be high biomass producing plant species such as poplar, pine, switchgrass, sorghum and maize. One active area of research in these species is genome-enabled improvement of lignocellulosic biofuel feedstock quality and yield. To facilitate genomic-based investigations in these species, we developed the Biofuel Feedstock Genomic Resource (BFGR), a database and web-portal that provides high-quality, uniform and integrated functional annotation of gene and transcript assembly sequences from species of interest to lignocellulosic biofuel feedstock researchers. The BFGR includes sequence data from 54 species and permits researchers to view, analyze and obtain annotation at the gene, transcript, protein and genome level. Annotation of biochemical pathways permits the identification of key genes and transcripts central to the improvement of lignocellulosic properties in these species. The integrated nature of the BFGR in terms of annotation methods, orthologous/paralogous relationships and linkage to seven species with complete genome sequences allows comparative analyses for biofuel feedstock species with limited sequence resources. Database URL: http://bfgr.plantbiology.msu.edu.
SCARF: maximizing next-generation EST assemblies for evolutionary and population genomic analyses.
Barker, Michael S; Dlugosch, Katrina M; Reddy, A Chaitanya C; Amyotte, Sarah N; Rieseberg, Loren H
2009-02-15
Scaffolded and Corrected Assembly of Roche 454 (SCARF) is a next-generation sequence assembly tool for evolutionary genomics that is designed especially for assembling 454 EST sequences against high-quality reference sequences from related species. The program was created to knit together 454 contigs that do not assemble during traditional de novo assembly, using a reference sequence library to orient the 454 sequences. SCARF is freely available at http://msbarker.com/software.htm, and is released under the open source GPLv3 license (http://www.opensource.org/licenses/gpl-3.0.html).
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Meyer, Sofie E.; Tian, Rui; Seshadri, Rekha; ...
2015-09-19
Burkholderia dilworthii strain WSM3556T is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective N2-fixing root nodule of Lebeckia ambigua collected near Grotto Bay Nature Reserve, in the Western Cape of South Africa, in October 2004. This plant persists in infertile and deep sandy soils with acidic pH, and is therefore an ideal candidate for a perennial based agriculture system in Western Australia. WSM3556T thus represents a potential inoculant quality strain for L. ambigua for which we describe the general features, together with genome sequence and annotation. Lastly, the 7,679,067 bp high-quality permanent draft genome is arranged in 140 scaffolds of 141 contigs, contains 7,059 protein-coding genes and 64 RNA-only encoding genes, and is part of the GEBA-RNB project proposal.
De Meyer, Sofie E.; Tian, Rui; Seshadri, Rekha; ...
2015-10-16
We report that Burkholderia sp. strain WSM4176 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective N2-fixing root nodule of Lebeckia ambigua collected in Nieuwoudtville, Western Cape of South Africa, in October 2007. This plant persists in infertile, acidic and deep sandy soils, and is therefore an ideal candidate for a perennial based agriculture system in Western Australia. Here we describe the features of Burkholderia sp. strain WSM4176, which represents a potential inoculant quality strain for L. ambigua, together with sequence and annotation. The 9,065,247 bp high-quality-draft genome is arranged in 13 scaffolds of 65 contigs, contains 8369 protein-coding genes and 128 RNA-only encoding genes, and is part of the GEBA-RNB project proposal (Project ID 882).
Blåhed, Ida-Maria; Königsson, Helena; Ericsson, Göran; Spong, Göran
2018-01-01
Monitoring of wild animal populations is challenging, yet reliable information about population processes is important for both management and conservation efforts. Access to molecular markers, such as SNPs, enables population monitoring through genotyping of various DNA sources. We have developed 96 high quality SNP markers for individual identification of moose (Alces alces), an economically and ecologically important top-herbivore in boreal regions. Reduced representation libraries constructed from 34 moose were high-throughput de novo sequenced, generating nearly 50 million read pairs. About 50 000 stacks of aligned reads containing one or more SNPs were discovered with the Stacks pipeline. Several quality criteria were applied to the candidate SNPs to find markers that are informative at the individual level and representative of the population. An empirical validation by genotyping of the sequenced individuals and additional moose resulted in the selection of a final panel of 86 high quality autosomal SNPs. Additionally, five sex-specific SNPs and five SNPs for sympatric species diagnostics are included in the panel. The genotyping error rate was 0.002 for the total panel, and the probabilities of identity were low enough to separate individuals with high confidence. Moreover, the autosomal SNPs were also highly informative for population level analyses. The potential applications of this SNP panel are thus many, including investigations of population size, sex ratios, relatedness, reproductive success and population structure. Ideally, SNP-based studies could improve today's population monitoring and increase our knowledge about moose population dynamics.
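To make the notion of probability of identity concrete, here is a minimal Python sketch. It uses the standard textbook formula for a biallelic locus under Hardy-Weinberg proportions and multiplies across loci assuming independence; the abstract does not state which estimator the authors used, and the allele frequencies below are hypothetical.

```python
from math import prod


def pid_biallelic(p):
    """Probability that two random individuals share a genotype at one
    biallelic locus with allele frequency p: p^4 + q^4 + (2pq)^2."""
    q = 1.0 - p
    return p ** 4 + q ** 4 + (2 * p * q) ** 2


def panel_pid(freqs):
    """Combine loci by multiplication, assuming they are independent."""
    return prod(pid_biallelic(p) for p in freqs)


# Hypothetical panel: 86 autosomal SNPs, each with allele frequency 0.5.
print(pid_biallelic(0.5))       # per-locus value is 0.375
print(panel_pid([0.5] * 86))    # the panel-wide value is vanishingly small
```

A monomorphic locus (p = 1) contributes a factor of 1 and therefore no discriminating power, which is why markers with intermediate allele frequencies are preferred for individual identification.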
Reducing assembly complexity of microbial genomes with single-molecule sequencing.
Koren, Sergey; Harhay, Gregory P; Smith, Timothy P L; Bono, James L; Harhay, Dayna M; Mcvey, Scott D; Radune, Diana; Bergman, Nicholas H; Phillippy, Adam M
2013-01-01
The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Third-generation, single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem. To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads. Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.
Dabir, Darius; Naehle, Claas Philip; Clauberg, Ralf; Gieseke, Juergen; Schild, Hans H; Thomas, Daniel
2012-10-29
With first-pass MRA (FP-MRA), spatial resolution is limited by breath-hold duration. In addition, image quality may be hampered by respiratory and cardiac motion artefacts. In order to overcome these limitations, an ECG- and navigator-gated high-resolution MRA sequence (HR-MRA) with slow infusion of extracellular contrast agent was implemented at 3 Tesla for the assessment of congenital heart disease and compared to standard FP-MRA. 34 patients (median age: 13 years) with congenital heart disease (CHD) were prospectively examined on a 3 Tesla system. The CMR-protocol comprised functional imaging, FP- and HR-MRA, and viability imaging. After the acquisition of the FP-MRA sequence using a single dose of extracellular contrast agent, the motion compensated HR-MRA sequence with isotropic resolution was acquired while injecting the second single dose, utilizing the timeframe before viability imaging. Qualitative scores for image quality (two independent reviewers) as well as quantitative measurements of vessel sharpness and relative contrast were compared using the Wilcoxon signed-rank test. Quantitative measurements of vessel diameters were compared using the Bland-Altman test. The mean image quality score revealed significantly better image quality of the HR-MRA sequence compared to the FP-MRA sequence in all vessels of interest (ascending aorta (AA), left pulmonary artery (LPA), left superior pulmonary vein (LSPV), coronary sinus (CS), and coronary ostia (CO); all p < 0.0001). In comparison to FP-MRA, HR-MRA revealed significantly better vessel sharpness for all considered vessels (AA, LSPV and LPA; all p < 0.0001). The relative contrast of the HR-MRA sequence was lower than that of the FP-MRA sequence (AA: p < 0.028, main pulmonary artery: p < 0.004, LSPV: p < 0.005). Both the intra- and interobserver measurements of the vessel diameters revealed closer correlation and narrower 95% limits of agreement for HR-MRA. HR-MRA revealed one additional clinical finding that was missed by FP-MRA. An ECG- and navigator-gated HR-MRA-protocol with infusion of extracellular contrast agent at 3 Tesla is feasible. HR-MRA delivers significantly better image quality and vessel sharpness compared to FP-MRA. It may be integrated into a standard CMR-protocol for patients with CHD without the need for additional contrast agent injection and without any additional examination time.
High resolution identity testing of inactivated poliovirus vaccines
Mee, Edward T.; Minor, Philip D.; Martin, Javier
2015-01-01
Background: Definitive identification of poliovirus strains in vaccines is essential for quality control, particularly where multiple wild-type and Sabin strains are produced in the same facility. Sequence-based identification provides the ultimate in identity testing and would offer several advantages over serological methods. Methods: We employed random RT-PCR and high throughput sequencing to recover full-length genome sequences from monovalent and trivalent poliovirus vaccine products at various stages of the manufacturing process. Results: All expected strains were detected in previously characterised products and the method permitted identification of strains comprising as little as 0.1% of sequence reads. Highly similar Mahoney and Sabin 1 strains were readily discriminated on the basis of specific variant positions. Analysis of a product known to contain incorrect strains demonstrated that the method correctly identified the contaminants. Conclusion: Random RT-PCR and shotgun sequencing provided high resolution identification of vaccine components. In addition to the recovery of full-length genome sequences, the method could also be easily adapted to the characterisation of minor variant frequencies and distinction of closely related products on the basis of distinguishing consensus and low frequency polymorphisms. PMID:26049003
Yu, Dan-Dan; Xie, Yan-Ming; Liao, Xing; Zhi, Ying-Jie; Jiang, Jun-Jie; Chen, Wei
2018-02-01
To evaluate the methodological quality and reporting quality of randomized controlled trials (RCTs) published in China Journal of Chinese Materia Medica, we searched CNKI and the journal's webpage to collect RCTs published since the journal was established. The Cochrane risk of bias assessment tool was used to evaluate the methodological quality of the RCTs, and the CONSORT 2010 checklist was adopted to evaluate reporting quality. In total, 184 RCTs were included and evaluated methodologically, of which 97 were also evaluated for reporting quality. For methodological quality, 62 trials (33.70%) reported the random sequence generation; 9 trials (4.89%) reported allocation concealment; 25 trials (13.59%) used blinding; 30 trials (16.30%) reported the number of patients who withdrew, dropped out, or were lost to follow-up; 2 trials (1.09%) reported trial registration and none reported the trial protocol; only 8 trials (4.35%) reported the sample size estimation in detail. For reporting quality, 3 of the 25 reporting items were rated high quality (abstract, participant eligibility criteria, and statistical methods); 4 items were rated medium quality (purpose, intervention, random sequence method, and data collection sites and locations); 9 items were rated low quality (title, background, random sequence types, allocation concealment, blinding, recruitment of subjects, baseline data, harms, and funding); the remaining items were of extremely low quality (compliance rate of the reporting item <10%). On the whole, the methodological and reporting quality of RCTs published in the journal is generally low. Further improvement in both the methodological and reporting quality of RCTs of traditional Chinese medicine is warranted. It is recommended that international standards and procedures for RCT design be strictly followed to conduct high-quality trials. At the same time, in order to improve the reporting quality of randomized controlled trials, CONSORT standards should be adopted in the preparation of research reports and submissions. Copyright© by the Chinese Pharmaceutical Association.
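The methodological percentages quoted in this abstract all share the denominator of 184 included trials, which can be checked with a short Python sketch. The counts are taken from the abstract; the dictionary keys and the `share` helper are illustrative names only.

```python
TOTAL_RCTS = 184  # trials evaluated methodologically, per the abstract

# Counts quoted in the abstract for each methodological item.
reported = {
    "random sequence generation": 62,
    "allocation concealment": 9,
    "blinding": 25,
    "withdrawals and loss to follow-up": 30,
    "trial registration": 2,
    "sample size estimation": 8,
}


def share(count, total=TOTAL_RCTS):
    """Return the percentage of trials, formatted to two decimals."""
    return f"{100 * count / total:.2f}%"


for item, count in reported.items():
    print(f"{item}: {count}/{TOTAL_RCTS} = {share(count)}")
```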
2011-01-01
Background: Biodiesel or ethanol derived from lipids or starch produced by microalgae may overcome many of the sustainability challenges previously ascribed to petroleum-based fuels and first generation plant-based biofuels. The paucity of microalgae genome sequences, however, limits gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for the non-model microalgae species, Dunaliella tertiolecta, and identify pathways and genes of importance related to biofuel production. Results: Next generation DNA pyrosequencing technology applied to D. tertiolecta transcripts produced 1,363,336 high quality reads with an average length of 400 bases. Following quality and size trimming, ~45% of the high quality reads were assembled into 33,307 isotigs with a 31-fold coverage and 376,482 singletons. Assembled sequences and singletons were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (KO) identifiers. These analyses identified the majority of lipid and starch biosynthesis and catabolism pathways in D. tertiolecta. Conclusions: The construction of metabolic pathways involved in the biosynthesis and catabolism of fatty acids, triacylglycerols, and starch in D. tertiolecta as well as the assembled transcriptome provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock. PMID:21401935
Sharon, Jeffrey D; Northcutt, Benjamin G; Aygun, Nafi; Francis, Howard W
2016-10-01
To study the quality and usability of magnetic resonance imaging (MRI) obtained with a cochlear implant magnet in situ. Retrospective chart review. Tertiary care center. All patients who underwent brain MRI with a cochlear implant magnet in situ from 2007 to 2016. None. Grade of view of the ipsilateral internal auditory canal (IAC) and cerebellopontine angle (CPA). Inclusion criteria were met by 765 image sequences in 57 MRI brain scans. For the ipsilateral IAC, significant predictors of a grade 1 (normal) view included: absence of fat saturation algorithm (p = 0.001), nonaxial plane of imaging (p = 0.01), and contrast administration (p = 0.001). For the ipsilateral CPA, significant predictors of a grade 1 view included: absence of fat saturation algorithm (p = 0.001), high-resolution images (p = 0.001), and nonaxial plane of imaging (p = 0.001). Overall, coronal T1 high-resolution images produced the highest percentage of grade 1 views (89%). Fat saturation also caused a secondary ring-shaped distortion artifact, which impaired the view of the contralateral CPA 52.7% of the time, and the contralateral IAC 42.8% of the time. MRI scans without any usable (grade 1) sequences had fewer overall sequences (N = 4.3) than scans with at least one usable sequence (N = 7.1, p = 0.001). MRI image quality with a cochlear implant magnet in situ depends on several factors, which can be modified to maximize image quality in this unique patient population.
Wright, Imogen A; Travers, Simon A
2014-07-01
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
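The distinction between codon-sized indels and frameshifts can be shown with a minimal Python sketch that operates on an already aligned reference/read pair. It only checks whether each gap length is a multiple of three; it does not reproduce the profile hidden Markov models that RAMICS uses, and the function names and example sequences are hypothetical.

```python
import re


def gap_runs(aligned):
    """Return (start, length) for each run of '-' in an aligned sequence."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned)]


def classify_indels(aligned_ref, aligned_read):
    """Label every gap run as codon-sized (length divisible by 3) or frameshift.

    Gaps in the reference are insertions in the read; gaps in the read
    are deletions relative to the reference.
    """
    calls = []
    for kind, seq in (("insertion", aligned_ref), ("deletion", aligned_read)):
        for start, length in gap_runs(seq):
            label = "codon-sized" if length % 3 == 0 else "frameshift"
            calls.append((kind, start, length, label))
    return sorted(calls, key=lambda call: call[1])


ref = "ATGGCC---AAATGA"
read = "ATGGCCTTTAAA-GA"
for call in classify_indels(ref, read):
    print(call)
```

In a coding region, a one- or two-base gap in a homopolymer run is more likely to be a sequencing artifact than a real frameshift, which is the kind of case an error-aware aligner is meant to resolve.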
Wei, Yu-Jie; Wu, Yun; Yan, Yin-Zhuo; Zou, Wan; Xue, Jie; Ma, Wen-Rui; Wang, Wei; Tian, Ge; Wang, Li-Ye
2018-01-01
In this study, Illumina MiSeq sequencing was performed to investigate microbial diversity in soil, leaves, grape, grape juice and wine. A total of 1,043,102 fungal Internal Transcribed Spacer (ITS) reads and 2,422,188 high quality bacterial 16S rDNA sequences were used for taxonomic classification, revealing five fungal and eight bacterial phyla. At the genus level, the dominant fungi were Ascomycota, Sordariales, Tetracladium and Geomyces in soil, Aureobasidium and Pleosporaceae in grape leaves, and Aureobasidium in grape and grape juice. The dominant bacteria were Kaistobacter, Arthrobacter, Skermanella and Sphingomonas in soil, Pseudomonas, Acinetobacter and Kaistobacter in grape and grape leaves, and Oenococcus in grape juice and wine. Principal coordinate analysis showed structural separation between the composition of fungi and bacteria in all samples. This is the first study to characterize the microbiome of soil, grape, grape leaves, grape juice and wine in Xinjiang through high-throughput sequencing and to identify microorganisms such as Saccharomyces cerevisiae and Oenococcus spp. that may contribute to the quality and flavor of wine.
Yates, Ron; Howieson, John; De Meyer, Sofie E.; ...
2015-07-24
Rhizobium sullae strain WSM1592 is an aerobic, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen (N2) fixing root nodule formed on the short-lived perennial legume Hedysarum coronarium (also known as Sulla coronaria or Sulla). WSM1592 was isolated from a nodule recovered from H. coronarium roots located in Ottava, bordering Sassari, Sardinia in 1995. WSM1592 is highly effective at fixing nitrogen with H. coronarium, and is currently the commercial Sulla inoculant strain in Australia. Here we describe the features of R. sullae strain WSM1592, together with genome sequence information and its annotation. The 7,530,820 bp high-quality permanent draft genome is arranged into 118 scaffolds of 118 contigs containing 7,453 protein-coding genes and 73 RNA-only encoding genes. In conclusion, this rhizobial genome is sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.
Mhamdi, Ridha; Ardley, Julie; Tian, Rui; ...
2015-07-02
We report that Ensifer meliloti 4H41 is an aerobic, motile, Gram-negative, non-spore-forming rod that can exist as a soil saprophyte or as a legume microsymbiont of common bean (Phaseolus vulgaris). Strain 4H41 was isolated in 2002 from root nodules of P. vulgaris grown in South Tunisia from the oasis of Rjim-Maatoug. Strain 4H41 is salt- and drought-tolerant and highly effective at fixing nitrogen with P. vulgaris. Here we describe the features of E. meliloti 4H41, together with genome sequence information and its annotation. The 6,795,637 bp high-quality permanent draft genome is arranged into 47 scaffolds of 47 contigs containing 6,350 protein-coding genes and 72 RNA-only encoding genes, and is one of the rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project proposal.
Yan, Yin-zhuo; Zou, Wan; Ma, Wen-rui; Wang, Wei; Tian, Ge; Wang, Li-ye
2018-01-01
In this study, Illumina MiSeq sequencing was performed to investigate microbial diversity in soil, leaves, grape, grape juice and wine. A total of 1,043,102 fungal Internal Transcribed Spacer (ITS) reads and 2,422,188 high quality bacterial 16S rDNA sequences were used for taxonomic classification, revealing five fungal and eight bacterial phyla. At the genus level, the dominant fungi were Ascomycota, Sordariales, Tetracladium and Geomyces in soil, Aureobasidium and Pleosporaceae in grape leaves, and Aureobasidium in grape and grape juice. The dominant bacteria were Kaistobacter, Arthrobacter, Skermanella and Sphingomonas in soil, Pseudomonas, Acinetobacter and Kaistobacter in grape and grape leaves, and Oenococcus in grape juice and wine. Principal coordinate analysis showed structural separation between the composition of fungi and bacteria in all samples. This is the first study to characterize the microbiome of soil, grape, grape leaves, grape juice and wine in Xinjiang through high-throughput sequencing and to identify microorganisms such as Saccharomyces cerevisiae and Oenococcus spp. that may contribute to the quality and flavor of wine. PMID:29565999
Genome Sequences of 19 Novel Erwinia amylovora Bacteriophages.
Esplin, Ian N D; Berg, Jordan A; Sharma, Ruchira; Allen, Robert C; Arens, Daniel K; Ashcroft, Cody R; Bairett, Shannon R; Beatty, Nolan J; Bickmore, Madeline; Bloomfield, Travis J; Brady, T Scott; Bybee, Rachel N; Carter, John L; Choi, Minsey C; Duncan, Steven; Fajardo, Christopher P; Foy, Brayden B; Fuhriman, David A; Gibby, Paul D; Grossarth, Savannah E; Harbaugh, Kala; Harris, Natalie; Hilton, Jared A; Hurst, Emily; Hyde, Jonathan R; Ingersoll, Kayleigh; Jacobson, Caitlin M; James, Brady D; Jarvis, Todd M; Jaen-Anieves, Daniella; Jensen, Garrett L; Knabe, Bradley K; Kruger, Jared L; Merrill, Bryan D; Pape, Jenny A; Payne Anderson, Ashley M; Payne, David E; Peck, Malia D; Pollock, Samuel V; Putnam, Micah J; Ransom, Ethan K; Ririe, Devin B; Robinson, David M; Rogers, Spencer L; Russell, Kerri A; Schoenhals, Jonathan E; Shurtleff, Christopher A; Simister, Austin R; Smith, Hunter G; Stephenson, Michael B; Staley, Lyndsay A; Stettler, Jason M; Stratton, Mallorie L; Tateoka, Olivia B; Tatlow, P J; Taylor, Alexander S; Thompson, Suzanne E; Townsend, Michelle H; Thurgood, Trever L; Usher, Brittian K; Whitley, Kiara V; Ward, Andrew T; Ward, Megan E H; Webb, Charles J; Wienclaw, Trevor M; Williamson, Taryn L; Wells, Michael J; Wright, Cole K; Breakwell, Donald P; Hope, Sandra; Grose, Julianne H
2017-11-16
Erwinia amylovora is the causal agent of fire blight, a devastating disease affecting some plants of the Rosaceae family. We isolated bacteriophages from samples collected from infected apple and pear trees along the Wasatch Front in Utah. We announce 19 high-quality complete genome sequences of E. amylovora bacteriophages. Copyright © 2017 Esplin et al.
The use of PacBio and Hi-C data in de novo assembly of the goat genome
USDA-ARS's Scientific Manuscript database
Generating de novo reference genome assemblies for non-model organisms is a laborious task that often requires a large amount of data from several sequencing platforms and cytogenetic surveys. By using PacBio sequence data and new library creation techniques, we present a de novo, high quality refer...
USDA-ARS's Scientific Manuscript database
We needed to obtain an alternative to conventional cloning to generate high-quality DNA sequences from a variety of nuclear orthologs for phylogenetic studies in potato, to save time and money and to avoid problems typically encountered in cloning. We tested a variety of SSCP protocols to include pu...
Wala, Jeremiah; Zhang, Cheng-Zhong; Meyerson, Matthew; Beroukhim, Rameen
2016-07-01
We developed VariantBam, a C++ read filtering and profiling tool for use with BAM, CRAM and SAM sequencing files. VariantBam provides a flexible framework for extracting sequencing reads or read-pairs that satisfy combinations of rules, defined by any number of genomic intervals or variant sites. We have implemented filters based on alignment data, sequence motifs, regional coverage and base quality. For example, VariantBam achieved a median size reduction ratio of 3.1:1 when applied to 10 lung cancer whole genome BAMs by removing large tags and selecting for only high-quality variant-supporting reads and reads matching a large dictionary of sequence motifs. Thus VariantBam enables efficient storage of sequencing data while preserving the most relevant information for downstream analysis. VariantBam and full documentation are available at github.com/jwalabroad/VariantBam. Contact: rameen@broadinstitute.org. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Klonowska, Agnieszka; López-López, Aline; Moulin, Lionel; ...
2017-01-17
Rhizobium mesoamericanum STM6155 (INSCD=ATYY01000000) is an aerobic, motile, Gram-negative, non-spore-forming rod that can exist as a soil saprophyte or as an effective nitrogen fixing microsymbiont of the legume Mimosa pudica L. STM6155 was isolated in 2009 from a nodule of the trap host M. pudica grown in nickel-rich soil collected near Mont Dore, New Caledonia. R. mesoamericanum STM6155 was selected as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) genome sequencing project. Here we describe the symbiotic properties of R. mesoamericanum STM6155, together with its genome sequence information and annotation. The 6,927,906 bp high-quality draft genome is arranged into 147 scaffolds of 152 contigs containing 6855 protein-coding genes and 71 RNA-only encoding genes. Strain STM6155 forms an ANI clique (ID 2435) with the sequenced R. mesoamericanum strain STM3625, and the nodulation genes are highly conserved in these strains and the type strain of Rhizobium grahamii CCGE501T. Within the STM6155 genome, we have identified a chr chromate efflux gene cluster of six genes arranged into two putative operons and we postulate that this cluster is important for the survival of STM6155 in ultramafic soils containing high concentrations of chromate.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klonowska, Agnieszka; López-López, Aline; Moulin, Lionel
Rhizobium mesoamericanum STM6155 (INSCD=ATYY01000000) is an aerobic, motile, Gram-negative, non-spore-forming rod that can exist as a soil saprophyte or as an effective nitrogen fixing microsymbiont of the legume Mimosa pudica L.. STM6155 was isolated in 2009 from a nodule of the trap host M. pudica grown in nickel-rich soil collected near Mont Dore, New Caledonia. R. mesoamericanum STM6155 was selected as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) genome sequencing project. Here we describe the symbiotic properties of R. mesoamericanum STM6155, together with its genome sequence information and annotation. The 6,927,906 bp high-quality draft genome is arranged into 147 scaffolds of 152 contigs containing 6855 protein-coding genes and 71 RNA-only encoding genes. Strain STM6155 forms an ANI clique (ID 2435) with the sequenced R. mesoamericanum strain STM3625, and the nodulation genes are highly conserved in these strains and the type strain of Rhizobium grahamii CCGE501T. Within the STM6155 genome, we have identified a chr chromate efflux gene cluster of six genes arranged into two putative operons and we postulate that this cluster is important for the survival of STM6155 in ultramafic soils containing high concentrations of chromate.
Unmanned aerial vehicles for high-throughput phenotyping and agronomic research
USDA-ARS's Scientific Manuscript database
Advances in automation and data science have led agriculturists to seek real-time, high-quality, high-volume crop data to accelerate crop improvement through breeding and to optimize agronomic practices. Breeders have recently gained massive data-collection capability in genome sequencing of plants....
Walther, Dirk; Bartha, Gábor; Morris, Macdonald
2001-01-01
A pivotal step in electrophoresis sequencing is the conversion of the raw, continuous chromatogram data into the actual sequence of discrete nucleotides, a process referred to as basecalling. We describe a novel algorithm for basecalling implemented in the program LifeTrace. Like Phred, currently the most widely used basecalling software program, LifeTrace takes processed trace data as input. It was designed to be tolerant to variable peak spacing by means of an improved peak-detection algorithm that emphasizes local chromatogram information over global properties. LifeTrace is shown to generate high-quality basecalls and reliable quality scores. It proved particularly effective when applied to MegaBACE capillary sequencing machines. In a benchmark test of 8372 dye-primer MegaBACE chromatograms, LifeTrace generated 17% fewer substitution errors, 16% fewer insertion/deletion errors, and 2.4% more aligned bases to the finished sequence than did Phred. For two sets totaling 6624 dye-terminator chromatograms, the performance improvement was 15% fewer substitution errors, 10% fewer insertion/deletion errors, and 2.1% more aligned bases. The processing time required by LifeTrace is comparable to that of Phred. The predicted quality scores were in line with observed quality scores, permitting direct use for quality clipping and in silico single nucleotide polymorphism (SNP) detection. Furthermore, we introduce a new type of quality score associated with every basecall: the gap-quality. It estimates the probability of a deletion error between the current and the following basecall. This additional quality score improves detection of single basepair deletions when used for locating potential basecalling errors during the alignment. We also describe a new protocol for benchmarking that we believe better discerns basecaller performance differences than methods previously published. PMID:11337481
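The quality scores discussed above follow the Phred convention, in which a score Q encodes an estimated error probability p through Q = -10 log10(p). The following is a minimal sketch of that conversion and of a simple quality-clipping step. It is not LifeTrace's or Phred's implementation; the function names and the clipping rule (longest run of bases at or above a threshold) are assumptions chosen for illustration, and the gap-quality score is not modeled.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability p (0 < p <= 1) to a Phred-scaled score Q = -10*log10(p)."""
    return -10 * math.log10(p)


def longest_high_quality_run(qualities, min_q=20):
    """Return (start, end) of the longest contiguous run with every score >= min_q.

    The interval is half-open, so qualities[start:end] is the clipped read.
    This clipping rule is an illustrative assumption, not a published method.
    """
    best_start, best_end = 0, 0
    run_start = None
    for i, q in enumerate(qualities):
        if q >= min_q:
            if run_start is None:
                run_start = i
            if (i + 1 - run_start) > (best_end - best_start):
                best_start, best_end = run_start, i + 1
        else:
            run_start = None
    return best_start, best_end
```

Under this convention Q20 corresponds to an estimated 1 error in 100 basecalls and Q30 to 1 in 1000, which is why well-calibrated scores can be used directly for clipping.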
NASA Astrophysics Data System (ADS)
Beigi, Maryam; Jafarian, Arman; Javanbakht, Mohammad; Wanas, H. A.; Mattern, Frank; Tabatabaei, Amin
2017-05-01
This study aims to determine the depositional facies, diagenetic processes and sequence stratigraphic elements of the subsurface carbonate-evaporite succession of the Upper Jurassic (Kimmeridgian-Tithonian) Surmeh Formation of the Salman Oil Field (the Persian Gulf, Iran), in an attempt to explore their impacts on reservoir quality. The Surmeh Formation consists mainly of carbonate rocks, intercalated with evaporite layers. Petrographically, the Surmeh Formation consists of nine microfacies (MF1-MF9). These microfacies are grouped into three facies associations related to three depositional environments (peritidal flat, lagoon and high-energy shoal) sited on the inner part of a homoclinal carbonate ramp. The recorded diagenetic processes include dolomitization, anhydritization, compaction, micritization, neomorphism, dissolution and cementation. Vertical stacking patterns of the studied facies reveal the presence of three third-order depositional sequences, each of which consists of a transgressive systems tract (TST) and a highstand systems tract (HST). The TSTs comprise intertidal and lagoon facies whereas the HSTs include supratidal and shoal facies. In terms of their impacts on reservoir quality, the shoal facies represent the best reservoir quality, whereas the peritidal and lagoonal facies exhibit moderate to lowest reservoir quality. Also, poikilotopic anhydrite cement played the most significant role in reducing the reservoir quality, whereas the widespread dissolution of labile grains and formation of moldic and vuggy pores contributed to enhancing the reservoir quality. In addition, the HSTs have a better reservoir quality than the TSTs. This study represents an approach for using the depositional facies, diagenetic alterations and sequence stratigraphic framework of a carbonate-evaporite succession for a more successful reservoir characterization.
Reid-Bayliss, Kate S; Loeb, Lawrence A
2017-08-29
Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.
Linking microarray reporters with protein functions.
Gaj, Stan; van Erk, Arie; van Haaften, Rachel I M; Evelo, Chris T A
2007-09-26
The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset, that was derived from and crosslinked back to the highly curated UniProt database. The resulting alignments were filtered using high quality alignment criteria and further compared with the outcome of a more traditional approach, where reporter sequences were BLASTed against EnsEMBL followed by locating the corresponding protein (UniProt) entry for the high quality hits. Combining the results of both methods resulted in successful annotation of > 58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the amount of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of the total reporters are now linked towards GO nodes and 7.1% on local pathways. Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/.
2013-01-01
Background With high quantity and quality data production and low cost, next generation sequencing has the potential to provide new opportunities for plant phylogeographic studies on single and multiple species. Here we present an approach for in silico chloroplast DNA assembly and single nucleotide polymorphism detection from short-read shotgun sequencing. The approach is simple and effective and can be implemented using standard bioinformatic tools. Results The chloroplast genome of Toona ciliata (Meliaceae), 159,514 base pairs long, was assembled from shotgun sequencing on the Illumina platform using de novo assembly of contigs. To evaluate its practicality, value and quality, we compared the short read assembly with an assembly completed using 454 data obtained after chloroplast DNA isolation. Sanger sequence verifications indicated that the Illumina dataset outperformed the longer read 454 data. Pooling of several individuals during preparation of the shotgun library enabled detection of informative chloroplast SNP markers. Following validation, we used the identified SNPs for a preliminary phylogeographic study of T. ciliata in Australia and to confirm low diversity across the distribution. Conclusions Our approach provides a simple method for construction of whole chloroplast genomes from shotgun sequencing of whole genomic DNA using short-read data and no available closely related reference genome (e.g. from the same species or genus). The high coverage of Illumina sequence data also renders this method appropriate for multiplexing and SNP discovery and therefore a useful approach for landscape level studies of evolutionary ecology. PMID:23497206
McCarthy, Davis J; Campbell, Kieran R; Lun, Aaron T L; Wills, Quin F
2017-04-15
Single-cell RNA sequencing (scRNA-seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre-processing, quality control (QC) and normalization. We have developed the R/Bioconductor package scater to facilitate rigorous pre-processing, quality control, normalization and visualization of scRNA-seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high-quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single-cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development. The open-source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater. Contact: davis@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
Wei, Chaoling; Yang, Hua; Wang, Songbo; Zhao, Jian; Liu, Chun; Gao, Liping; Xia, Enhua; Lu, Ying; Tai, Yuling; She, Guangbiao; Sun, Jun; Cao, Haisheng; Tong, Wei; Gao, Qiang; Li, Yeyun; Deng, Weiwei; Jiang, Xiaolan; Wang, Wenzhao; Chen, Qi; Zhang, Shihua; Li, Haijing; Wu, Junlan; Wang, Ping; Li, Penghui; Shi, Chengying; Zheng, Fengya; Jian, Jianbo; Huang, Bei; Shan, Dai; Shi, Mingming; Fang, Congbing; Yue, Yi; Li, Fangdong; Li, Daxiang; Wei, Shu; Han, Bin; Jiang, Changjun; Yin, Ye; Xia, Tao; Zhang, Zhengzhu; Bennetzen, Jeffrey L; Zhao, Shancen; Wan, Xiaochun
2018-05-01
Tea, one of the world's most important beverage crops, provides numerous secondary metabolites that account for its rich taste and health benefits. Here we present a high-quality sequence of the genome of tea, Camellia sinensis var. sinensis (CSS), using both Illumina and PacBio sequencing technologies. At least 64% of the 3.1-Gb genome assembly consists of repetitive sequences, and the rest yields 33,932 high-confidence predictions of encoded proteins. Divergence between two major lineages, CSS and Camellia sinensis var. assamica (CSA), is calculated to ∼0.38 to 1.54 million years ago (Mya). Analysis of genic collinearity reveals that the tea genome is the product of two rounds of whole-genome duplications (WGDs) that occurred ∼30 to 40 and ∼90 to 100 Mya. We provide evidence that these WGD events, and subsequent paralogous duplications, had major impacts on the copy numbers of secondary metabolite genes, particularly genes critical to producing three key quality compounds: catechins, theanine, and caffeine. Analyses of transcriptome and phytochemistry data show that amplification and transcriptional divergence of genes encoding a large acyltransferase family and leucoanthocyanidin reductases are associated with the characteristic young leaf accumulation of monomeric galloylated catechins in tea, while functional divergence of a single member of the glutamine synthetase gene family yielded theanine synthetase. This genome sequence will facilitate understanding of tea genome evolution and tea metabolite pathways, and will promote germplasm utilization for breeding improved tea varieties. Copyright © 2018 the Author(s). Published by PNAS.
Wei, Chaoling; Yang, Hua; Wang, Songbo; Zhao, Jian; Liu, Chun; Gao, Liping; Xia, Enhua; Lu, Ying; Tai, Yuling; She, Guangbiao; Sun, Jun; Cao, Haisheng; Tong, Wei; Gao, Qiang; Li, Yeyun; Deng, Weiwei; Jiang, Xiaolan; Wang, Wenzhao; Chen, Qi; Zhang, Shihua; Li, Haijing; Wu, Junlan; Wang, Ping; Li, Penghui; Shi, Chengying; Zheng, Fengya; Jian, Jianbo; Huang, Bei; Shan, Dai; Shi, Mingming; Fang, Congbing; Yue, Yi; Li, Fangdong; Li, Daxiang; Wei, Shu; Han, Bin; Jiang, Changjun; Yin, Ye; Xia, Tao; Zhang, Zhengzhu; Bennetzen, Jeffrey L.; Zhao, Shancen; Wan, Xiaochun
2018-01-01
Tea, one of the world’s most important beverage crops, provides numerous secondary metabolites that account for its rich taste and health benefits. Here we present a high-quality sequence of the genome of tea, Camellia sinensis var. sinensis (CSS), using both Illumina and PacBio sequencing technologies. At least 64% of the 3.1-Gb genome assembly consists of repetitive sequences, and the rest yields 33,932 high-confidence predictions of encoded proteins. Divergence between two major lineages, CSS and Camellia sinensis var. assamica (CSA), is calculated to ∼0.38 to 1.54 million years ago (Mya). Analysis of genic collinearity reveals that the tea genome is the product of two rounds of whole-genome duplications (WGDs) that occurred ∼30 to 40 and ∼90 to 100 Mya. We provide evidence that these WGD events, and subsequent paralogous duplications, had major impacts on the copy numbers of secondary metabolite genes, particularly genes critical to producing three key quality compounds: catechins, theanine, and caffeine. Analyses of transcriptome and phytochemistry data show that amplification and transcriptional divergence of genes encoding a large acyltransferase family and leucoanthocyanidin reductases are associated with the characteristic young leaf accumulation of monomeric galloylated catechins in tea, while functional divergence of a single member of the glutamine synthetase gene family yielded theanine synthetase. This genome sequence will facilitate understanding of tea genome evolution and tea metabolite pathways, and will promote germplasm utilization for breeding improved tea varieties. PMID:29678829
Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing
Tourlousse, Dieter M.; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro
2017-01-01
High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. PMID:27980100
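One way a staggered spike-in mixture could be turned into absolute abundance estimates is to fit a calibration line between the known copy numbers of the spike-ins and their observed read counts, then read taxon abundances off that line. The sketch below assumes a log-log linear relationship and uses an ordinary least-squares fit; this model, the function names, and any input values are illustrative assumptions and are not taken from the study itself.

```python
import math


def fit_log_calibration(spike_reads, spike_copies):
    """Least-squares fit of log10(copies) = a + b * log10(reads) over the spike-ins.

    spike_reads:  observed read counts for each spike-in standard (all > 0)
    spike_copies: known copy numbers added for each spike-in (all > 0)
    Returns the intercept a and slope b.
    """
    xs = [math.log10(r) for r in spike_reads]
    ys = [math.log10(c) for c in spike_copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_copies(taxon_reads, a, b):
    """Estimate the absolute copy number of a taxon from its read count."""
    return 10 ** (a + b * math.log10(taxon_reads))
```

A slope b far from 1, or large residuals around the fitted line, would itself be a per-sample quality signal, in the spirit of the quality-control use described above.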
Rutvisuttinunt, Wiriya; Chinnawirotpisan, Piyawan; Simasathien, Sriluck; Shrestha, Sanjaya K; Yoon, In-Kyu; Klungthong, Chonticha; Fernandez, Stefan
2013-11-01
Active global surveillance and characterization of influenza viruses are essential for better preparation against possible pandemic events. Obtaining comprehensive information about the influenza genome can improve our understanding of the evolution of influenza viruses and emergence of new strains, and improve the accuracy when designing preventive vaccines. This study investigated the use of deep sequencing by the next-generation sequencing (NGS) Illumina MiSeq Platform to obtain complete genome sequence information from influenza virus isolates. The influenza virus isolates were cultured from 6 acute respiratory clinical specimens collected in Thailand and Nepal. DNA libraries obtained from each viral isolate were mixed and all were sequenced simultaneously. A total of 2.6 Gbases was obtained from a cluster density of 455 ± 14 K/mm² with 95.76% (8,571,655/8,950,724 clusters) of the clusters passing quality control (QC) filters. Approximately 93.7% of all sequences from Read1 and 83.5% from Read2 contained high quality sequences that were ≥Q30, a base calling QC score standard. Alignment analysis identified three seasonal influenza A H3N2 strains, one 2009 pandemic influenza A H1N1 strain and two influenza B strains. Nearly the entire genome of each of the six virus isolates yielded sequence coverage depth of 600-fold or greater. The MiSeq Platform efficiently identified seasonal influenza A H3N2, 2009 pandemic influenza A H1N1 and influenza B in the DNA library mixtures. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
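The run-level QC figures quoted above (percentage of clusters passing filter, fraction of bases at or above Q30) are simple ratios. The sketch below shows how such figures could be computed; it assumes FASTQ quality strings in Phred+33 encoding, which is an assumption about the data format rather than something stated in the abstract, and the function names are hypothetical.

```python
def percent_passing_filter(passing_clusters, total_clusters):
    """Percentage of clusters that passed the instrument's QC filter."""
    return 100.0 * passing_clusters / total_clusters


def fraction_at_least_q(quality_string, threshold=30, offset=33):
    """Fraction of bases in one FASTQ quality string with Phred score >= threshold.

    Assumes Phred+33 ASCII encoding (offset=33); adjust for other encodings.
    """
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(1 for q in scores if q >= threshold) / len(scores)


# Cluster counts reported in the abstract above.
pf = percent_passing_filter(8_571_655, 8_950_724)
print(f"{pf:.2f}% of clusters passed filter")
```

With the cluster counts from the abstract, the ratio reproduces the reported 95.76%.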
Eshraghi, Leila; De Meyer, Sofie E.; Tian, Rui; ...
2015-10-26
Bradyrhizobium sp. strain WSM1743 is an aerobic, motile, Gram-negative, non-spore-forming rod that can exist as a soil saprophyte or as a legume microsymbiont of an Indigofera sp. WSM1743 was isolated from a nodule recovered from the roots of an Indigofera sp. growing 20 km north of Carnarvon in Australia. It is slow growing, tolerates up to 1 % NaCl and is capable of growth at 37 °C. Here we describe the features of Bradyrhizobium sp. strain WSM1743, together with genome sequence information and its annotation. Finally, the 8,341,956 bp high-quality permanent draft genome is arranged into 163 scaffolds and 167 contigs, contains 7908 protein-coding genes and 75 RNA-only encoding genes and was sequenced as part of the Root Nodule Bacteria chapter of the Genomic Encyclopedia of Bacteria and Archaea project.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eshraghi, Leila; De Meyer, Sofie E.; Tian, Rui
Bradyrhizobium sp. strain WSM1743 is an aerobic, motile, Gram-negative, non-spore-forming rod that can exist as a soil saprophyte or as a legume microsymbiont of an Indigofera sp. WSM1743 was isolated from a nodule recovered from the roots of an Indigofera sp. growing 20 km north of Carnarvon in Australia. It is slow growing, tolerates up to 1 % NaCl and is capable of growth at 37 °C. Here we describe the features of Bradyrhizobium sp. strain WSM1743, together with genome sequence information and its annotation. Finally, the 8,341,956 bp high-quality permanent draft genome is arranged into 163 scaffolds and 167 contigs, contains 7908 protein-coding genes and 75 RNA-only encoding genes and was sequenced as part of the Root Nodule Bacteria chapter of the Genomic Encyclopedia of Bacteria and Archaea project.
Breast MRI at 7 Tesla with a bilateral coil and robust fat suppression.
Brown, Ryan; Storey, Pippa; Geppert, Christian; McGorty, KellyAnne; Klautau Leite, Ana Paula; Babb, James; Sodickson, Daniel K; Wiggins, Graham C; Moy, Linda
2014-03-01
To develop a bilateral coil and fat suppressed T1-weighted sequence for 7 Tesla (T) breast MRI. A dual-solenoid coil and three-dimensional (3D) T1w gradient echo sequence with B1+ insensitive fat suppression (FS) were developed. T1w FS image quality was characterized through image uniformity and fat-water contrast measurements in 11 subjects. Signal-to-noise ratio (SNR) and flip angle maps were acquired to assess the coil performance. Bilateral contrast-enhanced and unilateral high resolution (0.6 mm isotropic, 6.5 min acquisition time) imaging highlighted the 7T SNR advantage. Reliable and effective FS and high image quality was observed in all subjects at 7T, indicating that the custom coil and pulse sequence were insensitive to high-field obstacles such as variable tissue loading. 7T and 3T image uniformity was similar (P=0.24), indicating adequate 7T B1+ uniformity. High 7T SNR and fat-water contrast enabled 0.6 mm isotropic imaging and visualization of a high level of fibroglandular tissue detail. 7T T1w FS bilateral breast imaging is feasible with a custom radiofrequency (RF) coil and pulse sequence. Similar image uniformity was achieved at 7T and 3T, despite different RF field behavior and variable coil-tissue interaction due to anatomic differences that might be expected to alter magnetic field patterns. Copyright © 2013 Wiley Periodicals, Inc.
Breast MRI at 7 Tesla with a Bilateral Coil and Robust Fat Suppression
Brown, Ryan; Storey, Pippa; Geppert, Christian; McGorty, KellyAnne; Leite, Ana Paula Klautau; Babb, James; Sodickson, Daniel K.; Wiggins, Graham C.; Moy, Linda
2013-01-01
Purpose To develop a bilateral coil and optimized fat suppressed T1-weighted sequence for 7T breast MRI. Materials and Methods A dual-solenoid coil and 3D T1w gradient echo sequence with B1+ insensitive fat suppression (FS) were developed for 7T. T1w FS image quality was characterized through image uniformity and fat/water contrast measurements in 11 subjects. Signal-to-noise ratio (SNR) and flip angle maps were acquired to assess the coil performance. Bilateral contrast-enhanced and unilateral high resolution (0.6 mm isotropic, 6.5 min acquisition time) imaging highlighted the 7 T SNR advantage. Results Reliable and effective FS and high image quality was observed in all subjects at 7T, indicating that the custom coil and pulse sequence were insensitive to high-field obstacles such as variable tissue loading. 7T and 3T T1w FS image uniformity was similar (P=0.24), indicating adequate 7T B1+ uniformity. High 7T SNR and fat/water contrast enabled 0.6 mm isotropic imaging and visualization of a high level of fibroglandular tissue detail. Conclusion 7T T1w FS bilateral breast imaging is feasible with a custom RF coil and pulse sequence. Similar image uniformity was achieved at 7T and 3T, despite different RF field behavior and variable coil-tissue interaction due to anatomic differences that might be expected to alter magnetic field patterns. PMID:24123517
Comparing de novo genome assembly: the long and short of it.
Narzisi, Giuseppe; Mishra, Bud
2011-04-29
Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled a growing interest in the whole-genome sequence assembly (WGSA) problem, thereby, inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing the contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) as well as a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-offs between contigs' quality against their sizes. For this purpose, most of the publicly available major sequence assemblers--both for low-coverage long (Sanger) and high-coverage short (Illumina) reads technologies--are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing "next-generation" assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium.
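For reference, N50, one of the standard size-based metrics named above, is the contig length L such that contigs of length L or longer together contain at least half of the total assembled bases. The sketch below computes it; the function name is an assumption, and the Feature-Response Curve is not implemented here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    N50 is the length of the contig at which the running total of lengths,
    taken from longest to shortest, first reaches half of the assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
```

Because the value depends only on lengths, an assembler that joins contigs incorrectly can raise its N50 while lowering accuracy, which is the weakness of size-only metrics that the abstract points out.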
High density FTA plates serve as efficient long-term sample storage for HLA genotyping.
Lange, V; Arndt, K; Schwarzelt, C; Boehme, I; Giani, A S; Schmidt, A H; Ehninger, G; Wassmuth, R
2014-02-01
Storage of dried blood spots (DBS) on high-density FTA® plates could constitute an appealing alternative to frozen storage. However, it remains controversial whether DBS are suitable for high-resolution sequencing of human leukocyte antigen (HLA) alleles. Therefore, we extracted DNA from DBS that had been stored for up to 4 years, using six different methods. We identified those extraction methods that recovered sufficient high-quality DNA for reliable high-resolution HLA sequencing. Further, we confirmed that frozen whole blood samples that had been stored for several years can be transferred to filter paper without compromising HLA genotyping upon extraction. In conclusion, DNA derived from high-density FTA® plates is suitable for high-resolution HLA sequencing, provided that appropriate extraction protocols are employed. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Hulse-Kemp, Amanda M; Maheshwari, Shamoni; Stoffel, Kevin; Hill, Theresa A; Jaffe, David; Williams, Stephen R; Weisenfeld, Neil; Ramakrishnan, Srividya; Kumar, Vijay; Shah, Preyas; Schatz, Michael C; Church, Deanna M; Van Deynze, Allen
2018-01-01
Linked-Read sequencing technology has recently been employed successfully for de novo assembly of human genomes; however, the utility of this technology for complex plant genomes is unproven. We evaluated the technology for this purpose by sequencing the 3.5-gigabase (Gb) diploid pepper (Capsicum annuum) genome with a single Linked-Read library. Plant genomes, including pepper, are characterized by long, highly similar repetitive sequences. Accordingly, significant effort is used to ensure that the sequenced plant is highly homozygous and the resulting assembly is a haploid consensus. With a phased assembly approach, we targeted a heterozygous F1 derived from a wide cross to assess the ability to derive both haplotypes and characterize a pungency gene with a large insertion/deletion. The Supernova software generated a highly ordered, more contiguous sequence assembly than all currently available C. annuum reference genomes. Over 83% of the final assembly was anchored and oriented using four publicly available de novo linkage maps. A comparison of the annotation of conserved eukaryotic genes indicated the completeness of assembly. The validity of the phased assembly is further demonstrated with the complete recovery of both 2.5-Kb insertion/deletion haplotypes of the PUN1 locus in the F1 sample that represents pungent and nonpungent peppers, as well as nearly full recovery of the BUSCO2 gene set within each of the two haplotypes. The most contiguous pepper genome assembly to date has been generated, which demonstrates that Linked-Read library technology provides a tool to de novo assemble complex, highly repetitive heterozygous plant genomes. This technology can provide an opportunity to cost-effectively develop high-quality genome assemblies for other complex plants and compare structural and gene differences through accurate haplotype reconstruction.
Integrative workflows for metagenomic analysis
Ladoukakis, Efthymios; Kolisis, Fragiskos N.; Chatziioannou, Aristotelis A.
2014-01-01
The rapid evolution of all sequencing technologies, described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. They constitute a combination of high-throughput analytical protocols, coupled to delicate measuring techniques, in order to potentially discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields for only a fraction of the cost of traditional processes (i.e., Sanger). From a bioinformatic perspective, this boils down to many GB of data being generated from each single sequencing experiment, rendering the management, or even the storage, critical bottlenecks with respect to the overall analytical endeavor. The enormous complexity is further aggravated by the versatility of the processing steps available, represented by the numerous bioinformatic tools that are essential, for each analytical task, in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, nonetheless non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, requiring a high level of expertise for their proper application or the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale requires grand computational resources, imposing as the sole realistic solution the utilization of cloud computing infrastructures. In this review article we discuss different, integrative, bioinformatic solutions available, which address the aforementioned issues, by performing a critical assessment of the available automated pipelines for data management, quality control, and annotation of metagenomic data, embracing various, major sequencing technologies and applications. PMID:25478562
A filtering method to generate high quality short reads using Illumina paired-end technology.
Eren, A Murat; Vineis, Joseph H; Morrison, Hilary G; Sogin, Mitchell L
2013-01-01
Consensus between independent reads improves the accuracy of genome and transcriptome analyses; however, lack of consensus between very similar sequences in metagenomic studies can and often does represent natural variation of biological significance. The common use of machine-assigned quality scores on next generation platforms does not necessarily correlate with accuracy. Here, we describe using the overlap of paired-end, short sequence reads to identify error-prone reads in marker gene analyses and their contribution to spurious OTUs following clustering analysis using QIIME. Our approach can also reduce error in shotgun sequencing data generated from libraries with small, tightly constrained insert sizes. The open-source implementation of this algorithm in the Python programming language with user instructions can be obtained from https://github.com/meren/illumina-utils.
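The idea of using the overlap between paired-end reads as an error check can be sketched as follows: reverse-complement the second read, compare it with the first read over the region both reads cover, and discard pairs that disagree. This is a simplified illustration under stated assumptions (the overlap length is known in advance and no alignment is performed); the function names and the mismatch threshold are hypothetical, and this is not the illumina-utils implementation.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_mismatches(read1, read2, overlap):
    """Count mismatches between the two reads over their shared region.

    read1 and read2 are given as sequenced (read2 is on the opposite strand).
    overlap is the assumed number of bases covered by both reads.
    """
    if overlap <= 0 or overlap > min(len(read1), len(read2)):
        raise ValueError("overlap must be between 1 and the shorter read length")
    tail_of_read1 = read1[-overlap:]
    head_of_read2 = reverse_complement(read2)[:overlap]
    return sum(1 for a, b in zip(tail_of_read1, head_of_read2) if a != b)


def keep_pair(read1, read2, overlap, max_mismatches=0):
    """Keep a read pair only if the overlapping region agrees closely enough."""
    return overlap_mismatches(read1, read2, overlap) <= max_mismatches
```

The check relies on sequence agreement rather than on machine-assigned quality scores, which matches the motivation given in the abstract.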
USDA-ARS's Scientific Manuscript database
Predicted rising global temperatures due to climate change have generated a demand for crops that are resistant to yield and quality losses from heat stress. Broccoli (Brassica oleracea var. italica) is a cool weather crop with high temperatures during production decreasing both head quality and yie...
Nair, Shalima S; Luu, Phuc-Loi; Qu, Wenjia; Maddugoda, Madhavi; Huschtscha, Lily; Reddel, Roger; Chenevix-Trench, Georgia; Toso, Martina; Kench, James G; Horvath, Lisa G; Hayes, Vanessa M; Stricker, Phillip D; Hughes, Timothy P; White, Deborah L; Rasko, John E J; Wong, Justin J-L; Clark, Susan J
2018-05-28
Comprehensive genome-wide DNA methylation profiling is critical to gain insights into epigenetic reprogramming during development and disease processes. Among the different genome-wide DNA methylation technologies, whole genome bisulphite sequencing (WGBS) is considered the gold standard for assaying genome-wide DNA methylation at single base resolution. However, the high sequencing cost to achieve the optimal depth of coverage limits its application in both basic and clinical research. To achieve 15× coverage of the human methylome, using WGBS, requires approximately three lanes of 100-bp-paired-end Illumina HiSeq 2500 sequencing. It is important, therefore, for advances in sequencing technologies to be developed to enable cost-effective high-coverage sequencing. In this study, we provide an optimised WGBS methodology, from library preparation to sequencing and data processing, to enable 16-20× genome-wide coverage per single lane of HiSeq X Ten, HCS 3.3.76. To process and analyse the data, we developed a WGBS pipeline (METH10X) that is fast and can call SNPs. We performed WGBS on both high-quality intact DNA and degraded DNA from formalin-fixed paraffin-embedded tissue. First, we compared different library preparation methods on the HiSeq 2500 platform to identify the best method for sequencing on the HiSeq X Ten. Second, we optimised the PhiX and genome spike-ins to achieve higher quality and coverage of WGBS data on the HiSeq X Ten. Third, we performed integrated whole genome sequencing (WGS) and WGBS of the same DNA sample in a single lane of HiSeq X Ten to improve data output. Finally, we compared methylation data from the HiSeq 2500 and HiSeq X Ten and found high concordance (Pearson r > 0.9×). Together we provide a systematic, efficient and complete approach to perform and analyse WGBS on the HiSeq X Ten. Our protocol allows for large-scale WGBS studies at reasonable processing time and cost on the HiSeq X Ten platform.
Hong, Xutao; Chen, Jing; Liu, Lin; Wu, Huan; Tan, Haiqin; Xie, Guangfa; Xu, Qian; Zou, Huijun; Yu, Wenjing; Wang, Lan; Qin, Nan
2016-01-01
Chinese Rice Wine (CRW) is a common alcoholic beverage in China. To investigate the influence of microbial composition on the quality of CRW, high throughput sequencing was performed for 110 wine samples on bacterial 16S rRNA gene and fungal Internal Transcribed Spacer II (ITS2). Bioinformatic analyses demonstrated that the quality of yeast starter and final wine correlated with microbial taxonomic composition, which was exemplified by our finding that wine spoilage resulted from a high proportion of genus Lactobacillus. Subsequently, based on Lactobacillus abundance of an early stage, a model was constructed to predict final wine quality. In addition, three batches of 20 representative wine samples selected from a pool of 110 samples were further analyzed in metagenomics. The results revealed that wine spoilage was due to rapid growth of Lactobacillus brevis at the early stage of fermentation. Gene functional analysis indicated the importance of some pathways such as synthesis of biotin, malolactic fermentation and production of short-chain fatty acid. These results led to a conclusion that metabolisms of microbes influence the wine quality. Thus, nurturing of beneficial microbes and inhibition of undesired ones are both important for the mechanized brewery. PMID:27241862
Li, Fuchao; Jiang, Peng; Zheng, Huajun; Wang, Shengyue; Zhao, Guoping; Qin, Song; Liu, Zhaopu
2011-07-01
Streptomyces griseoaurantiacus M045, isolated from marine sediment, produces manumycin and chinikomycin antibiotics. Here we present a high-quality draft genome sequence of S. griseoaurantiacus M045, the first marine Streptomyces species to be sequenced and annotated. The genome encodes several gene clusters for biosynthesis of secondary metabolites and has provided insight into genomic islands linking secondary metabolism to functional adaptation in marine S. griseoaurantiacus M045.
Song, Ju Yeon; Jeong, Haeyoung; Yu, Dong Su; Fischbach, Michael A.; Park, Hong-Seog; Kim, Jae Jong; Seo, Jeong-Sun; Jensen, Susan E.; Oh, Tae Kwang; Lee, Kye Joon; Kim, Jihyun F.
2010-01-01
Streptomyces clavuligerus is an important industrial strain that produces a number of antibiotics, including clavulanic acid and cephamycin C. A high-quality draft genome sequence of the S. clavuligerus NRRL 3585 strain was produced by employing a hybrid approach that involved Sanger sequencing, Roche/454 pyrosequencing, optical mapping, and partial finishing. Its genome, comprising four linear replicons (one chromosome and three plasmids), carries numerous sets of genes involved in the biosynthesis of secondary metabolites, including a variety of antibiotics. PMID:20889745
2011-01-01
Genome targeting methods enable cost-effective capture of specific subsets of the genome for sequencing. We present here an automated, highly scalable method for carrying out the Solution Hybrid Selection capture approach that provides a dramatic increase in scale and throughput of sequence-ready libraries produced. Significant process improvements and a series of in-process quality control checkpoints are also added. These process improvements can also be used in a manual version of the protocol. PMID:21205303
Overcoming bias and systematic errors in next generation sequencing data.
Taub, Margaret A; Corrada Bravo, Hector; Irizarry, Rafael A
2010-12-10
Considerable time and effort has been spent in developing analysis and quality assessment methods to allow the use of microarrays in a clinical setting. As is the case for microarrays and other high-throughput technologies, data from new high-throughput sequencing technologies are subject to technological and biological biases and systematic errors that can impact downstream analyses. Only when these issues can be readily identified and reliably adjusted for will clinical applications of these new technologies be feasible. Although much work remains to be done in this area, we describe consistently observed biases that should be taken into account when analyzing high-throughput sequencing data. In this article, we review current knowledge about these biases, discuss their impact on analysis results, and propose solutions.
Oliveira, R R; Viana, A J C; Reátegui, A C E; Vincentz, M G A
2015-12-29
Determination of gene expression is an important tool to study biological processes and relies on the quality of the extracted RNA. Changes in gene expression profiles may be directly related to mutations in regulatory DNA sequences or alterations in DNA cytosine methylation, which is an epigenetic mark. Correlation of gene expression with DNA sequence or epigenetic mark polymorphism is often desirable; for this, a robust protocol to isolate high-quality RNA and DNA simultaneously from the same sample is required. Although commercial kits and protocols are available, they are mainly optimized for animal tissues and, in general, restricted to RNA or DNA extraction, not both. In the present study, we describe an efficient and accessible method to extract both RNA and DNA simultaneously from the same sample of various plant tissues, using small amounts of starting material. The protocol was efficient in the extraction of high-quality nucleic acids from several Arabidopsis thaliana tissues (e.g., leaf, inflorescence stem, flower, fruit, cotyledon, seedlings, root, and embryo) and from other tissues of non-model plants, such as Avicennia schaueriana (Acanthaceae), Theobroma cacao (Malvaceae), Paspalum notatum (Poaceae), and Sorghum bicolor (Poaceae). The obtained nucleic acids were used as templates for downstream analyses, such as mRNA sequencing, quantitative real time-polymerase chain reaction, bisulfite treatment, and others; the results were comparable to those obtained with commercial kits. We believe that this protocol could be applied to a broad range of plant species, help avoid technical and sampling biases, and facilitate several RNA- and DNA-dependent analyses.
Genome Sequence of Enterohemorrhagic Escherichia coli NCCP15658
Song, Ju Yeon; Yoo, Ran Hee; Jang, Song Yee; Seong, Won-Keun; Kim, Seon-Young; Jeong, Haeyoung; Kang, Sung Gyun; Kim, Byung Kwon; Kwon, Soon-Kyeong; Lee, Choong Hoon; Yu, Dong Su; Park, Mi-Sun
2012-01-01
Enterohemorrhagic Escherichia coli causes severe food-borne disease in the guts of humans and animals. Here, we report the high-quality draft genome sequence of E. coli NCCP15658 isolated from a patient in the Republic of Korea. Its genome size was determined to be 5.46 Mb, and its genomic features, including genes encoding virulence factors, were analyzed. PMID:22740673
NASA Astrophysics Data System (ADS)
Munawar, Muhammad Jawad; Lin, Chengyan; Chunmei, Dong; Zhang, Xianguo; Zhao, Haiyan; Xiao, Shuming; Azeem, Tahir; Zahid, Muhammad Aleem; Ma, Cunfei
2018-05-01
The architecture and quality of lacustrine turbidites that act as petroleum reservoirs are less well documented. Reservoir architecture and multiscale heterogeneity in turbidites represent serious challenges to production performance. Additionally, establishing a hierarchy profile to delineate heterogeneity is a challenging task in lacustrine turbidite deposits. Here, we report on the turbidites in the middle third member of the Eocene Shahejie Formation (Es3), which was deposited during extensive Middle to Late Eocene rifting in the Dongying Depression. Seismic records, wireline log responses, and core observations were integrated to describe the reservoir heterogeneity by delineating the architectural elements, sequence stratigraphic framework and lithofacies assemblage. A petrographic approach was adopted to constrain microscopic heterogeneity using an optical microscope, routine core analyses and X-ray diffraction (XRD) analyses. The Es3m member is interpreted as a sequence set composed of four composite sequences: CS1, CS2, CS3 and CS4. A total of forty-five sequences were identified within these four composite sequences. Sand bodies were mainly deposited as channels, levees, overbank splays, lobes and lobe fringes. The combination of fining-upward and coarsening-upward lithofacies patterns in the architectural elements produces highly complex composite flow units. Microscopic heterogeneity is produced by diagenetic alteration processes (i.e., feldspar dissolution, authigenic clay formation and quartz cementation). The widespread kaolinization of feldspar and mobilization of materials enhanced the quality of the reservoir by producing secondary enlarged pores. In contrast, the formation of pore-filling authigenic illite and illite/smectite clays reduced its permeability. Recovery rates are higher in the axial areas and smaller in the marginal areas of architectural elements. 
This study represents a significant insight into the reservoir architecture and heterogeneity of lacustrine turbidites, and the understanding of compartmentalization and distribution of high-quality sand reservoirs can be applied to improve primary and secondary production in these fields.
Kagale, Sateesh; Koh, Chushin; Nixon, John; Bollina, Venkatesh; Clarke, Wayne E.; Tuteja, Reetu; Spillane, Charles; Robinson, Stephen J.; Links, Matthew G.; Clarke, Carling; Higgins, Erin E.; Huebert, Terry; Sharpe, Andrew G.; Parkin, Isobel A. P.
2014-01-01
Camelina sativa is an oilseed with desirable agronomic and oil-quality attributes for a viable industrial oil platform crop. Here we generated the first chromosome-scale high-quality reference genome sequence for C. sativa and annotated 89,418 protein-coding genes, representing a whole-genome triplication event relative to the crucifer model Arabidopsis thaliana. C. sativa represents the first crop species to be sequenced from lineage I of the Brassicaceae. The well-preserved hexaploid genome structure of C. sativa surprisingly mirrors those of economically important amphidiploid Brassica crop species from lineage II as well as wheat and cotton. The three genomes of C. sativa show no evidence of fractionation bias and limited expression-level bias, both characteristics commonly associated with polyploid evolution. The highly undifferentiated polyploid genome of C. sativa has significant consequences for breeding and genetic manipulation of this industrial oil crop. PMID:24759634
Si, Zengzhi; Du, Bing; Huo, Jinxi; He, Shaozhen; Liu, Qingchang; Zhai, Hong
2016-11-21
Sweetpotato, Ipomoea batatas (L.) Lam., is an important food crop widely grown in the world. However, little is known about the genome of this species because it is a highly heterozygous hexaploid. Gaining a more in-depth knowledge of sweetpotato genome is therefore necessary and imperative. In this study, the first bacterial artificial chromosome (BAC) library of sweetpotato was constructed. Clones from the BAC library were end-sequenced and analyzed to provide genome-wide information about this species. The BAC library contained 240,384 clones with an average insert size of 101 kb and had a 7.93-10.82 × coverage of the genome, and the probability of isolating any single-copy DNA sequence from the library was more than 99%. Both ends of 8310 BAC clones randomly selected from the library were sequenced to generate 11,542 high-quality BAC-end sequences (BESs), with an accumulative length of 7,595,261 bp and an average length of 658 bp. Analysis of the BESs revealed that 12.17% of the sweetpotato genome were known repetitive DNA, including 7.37% long terminal repeat (LTR) retrotransposons, 1.15% Non-LTR retrotransposons and 1.42% Class II DNA transposons etc., 18.31% of the genome were identified as sweetpotato-unique repetitive DNA and 10.00% of the genome were predicted to be coding regions. In total, 3,846 simple sequences repeats (SSRs) were identified, with a density of one SSR per 1.93 kb, from which 288 SSRs primers were designed and tested for length polymorphism using 20 sweetpotato accessions, 173 (60.07%) of them produced polymorphic bands. Sweetpotato BESs had significant hits to the genome sequences of I. trifida and more matches to the whole-genome sequences of Solanum lycopersicum than those of Vitis vinifera, Theobroma cacao and Arabidopsis thaliana. The first BAC library for sweetpotato has been successfully constructed. 
The high quality BESs provide first insights into sweetpotato genome composition, and have significant hits to the genome sequences of I. trifida and more matches to the whole-genome sequences of Solanum lycopersicum. These resources as a robust platform will be used in high-resolution mapping, gene cloning, assembly of genome sequences, comparative genomics and evolution for sweetpotato.
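The library statistics quoted in the sweetpotato abstract above (clone count, mean insert size, genome coverage, probability of recovering a single-copy sequence) are related by simple arithmetic. The sketch below uses the Clarke and Carbon formula, P = 1 - (1 - I/G)^N, which is a common choice for BAC libraries; the abstract does not say which formula the authors used, and the two genome sizes are assumptions back-calculated from the reported 7.93-10.82× coverage range.

```python
# Illustrative sketch; genome sizes below are assumed, not taken from the abstract.

def library_coverage(n_clones, insert_bp, genome_bp):
    """Genome equivalents represented in the library: N * I / G."""
    return n_clones * insert_bp / genome_bp


def clarke_carbon_probability(n_clones, insert_bp, genome_bp):
    """Probability that a given single-copy sequence is in at least one clone."""
    return 1.0 - (1.0 - insert_bp / genome_bp) ** n_clones


if __name__ == "__main__":
    n_clones = 240_384        # from the abstract
    insert_bp = 101_000       # average insert size of 101 kb, from the abstract
    for genome_bp in (2.24e9, 3.06e9):   # assumed genome size bounds
        cov = library_coverage(n_clones, insert_bp, genome_bp)
        p = clarke_carbon_probability(n_clones, insert_bp, genome_bp)
        print(f"G = {genome_bp:.2e} bp: coverage = {cov:.2f}x, P = {p:.5f}")
```

Under these assumed genome sizes, both probabilities come out above 0.99, in line with the figure of more than 99% reported in the abstract.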
Optimisation of DNA extraction from the crustacean Daphnia
Athanasio, Camila Gonçalves; Chipman, James K.; Viant, Mark R.
2016-01-01
Daphnia are key model organisms for mechanistic studies of phenotypic plasticity, adaptation and microevolution, which have led to an increasing demand for genomics resources. A key step in any genomics analysis, such as high-throughput sequencing, is the availability of sufficient and high quality DNA. Although commercial kits exist to extract genomic DNA from several species, preparation of high quality DNA from Daphnia spp. and other chitinous species can be challenging. Here, we optimise methods for tissue homogenisation, DNA extraction and quantification customised for different downstream analyses (e.g., LC-MS/MS, Hiseq, mate pair sequencing or Nanopore). We demonstrate that if Daphnia magna are homogenised as whole animals (including the carapace), absorbance-based DNA quantification methods significantly over-estimate the amount of DNA, resulting in using insufficient starting material for experiments, such as preparation of sequencing libraries. This is attributed to the high refractive index of chitin in Daphnia’s carapace at 260 nm. Therefore, unless the carapace is removed by overnight proteinase digestion, the extracted DNA should be quantified with fluorescence-based methods. However, overnight proteinase digestion will result in partial fragmentation of DNA therefore the prepared DNA is not suitable for downstream methods that require high molecular weight DNA, such as PacBio, mate pair sequencing and Nanopore. In conclusion, we found that the MasterPure DNA purification kit, coupled with grinding of frozen tissue, is the best method for extraction of high molecular weight DNA as long as the extracted DNA is quantified with fluorescence-based methods. This method generated high yield and high molecular weight DNA (3.10 ± 0.63 ng/µg dry mass, fragments >60 kb), free of organic contaminants (phenol, chloroform) and is suitable for large number of downstream analyses. PMID:27190714
High resolution identity testing of inactivated poliovirus vaccines.
Mee, Edward T; Minor, Philip D; Martin, Javier
2015-07-09
Definitive identification of poliovirus strains in vaccines is essential for quality control, particularly where multiple wild-type and Sabin strains are produced in the same facility. Sequence-based identification provides the ultimate in identity testing and would offer several advantages over serological methods. We employed random RT-PCR and high throughput sequencing to recover full-length genome sequences from monovalent and trivalent poliovirus vaccine products at various stages of the manufacturing process. All expected strains were detected in previously characterised products and the method permitted identification of strains comprising as little as 0.1% of sequence reads. Highly similar Mahoney and Sabin 1 strains were readily discriminated on the basis of specific variant positions. Analysis of a product known to contain incorrect strains demonstrated that the method correctly identified the contaminants. Random RT-PCR and shotgun sequencing provided high resolution identification of vaccine components. In addition to the recovery of full-length genome sequences, the method could also be easily adapted to the characterisation of minor variant frequencies and distinction of closely related products on the basis of distinguishing consensus and low frequency polymorphisms. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Visually lossless compression of digital hologram sequences
NASA Astrophysics Data System (ADS)
Darakis, Emmanouil; Kowiel, Marcin; Näsänen, Risto; Naughton, Thomas J.
2010-01-01
Digital hologram sequences have great potential for the recording of 3D scenes of moving macroscopic objects as their numerical reconstruction can yield a range of perspective views of the scene. Digital holograms inherently have large information content and lossless coding of holographic data is rather inefficient due to the speckled nature of the interference fringes they contain. Lossy coding of still holograms and hologram sequences has shown promising results. By definition, lossy compression introduces errors in the reconstruction. In all of the previous studies, numerical metrics were used to measure the compression error and through it, the coding quality. Digital hologram reconstructions are highly speckled and the speckle pattern is very sensitive to data changes. Hence, numerical quality metrics can be misleading. For example, for low compression ratios, a numerically significant coding error can have visually negligible effects. Yet, in several cases, it is of high interest to know how much lossy compression can be achieved, while maintaining the reconstruction quality at visually lossless levels. Using an experimental threshold estimation method, the staircase algorithm, we determined the highest compression ratio that was not perceptible to human observers for objects compressed with Dirac and MPEG-4 compression methods. This level of compression can be regarded as the point below which compression is perceptually lossless although physically the compression is lossy. It was found that up to 4 to 7.5 fold compression can be obtained with the above methods without any perceptible change in the appearance of video sequences.
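The staircase threshold estimation mentioned in the hologram abstract above can be sketched as a simple one-up/one-down procedure. The abstract does not specify which staircase variant, step size, or stopping rule was used, so everything below (function name, parameters, the deterministic stand-in observer) is an assumption for illustration.

```python
# Illustrative one-up/one-down staircase; all names and defaults are hypothetical.

def staircase_threshold(is_perceptible, start, step, n_reversals=6, max_trials=1000):
    """Estimate the level at which a distortion becomes perceptible.

    `is_perceptible(level)` stands in for one observer trial and returns
    True when the distortion at that compression ratio is noticed. The
    level rises after each "not noticed" answer and falls after each
    "noticed" answer; the estimate is the mean of the levels at which
    the direction of travel reversed.
    """
    level = start
    last_direction = 0
    reversals = []
    for _ in range(max_trials):
        direction = -1 if is_perceptible(level) else 1
        if last_direction and direction != last_direction:
            reversals.append(level)
            if len(reversals) == n_reversals:
                break
        last_direction = direction
        level += direction * step
    if not reversals:
        raise ValueError("no reversal observed; widen the range or add trials")
    return sum(reversals) / len(reversals)


if __name__ == "__main__":
    # Stand-in observer who notices any compression ratio above 5.3.
    observer = lambda ratio: ratio > 5.3
    print(staircase_threshold(observer, start=1.0, step=1.0))
```

With a real observer the answers are noisy, so the reversal levels scatter around the threshold rather than alternating between two fixed values; averaging several reversals is what makes the estimate stable.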
Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu
2015-01-01
The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Structural and functional partitioning of bread wheat chromosome 3B.
Choulet, Frédéric; Alberti, Adriana; Theil, Sébastien; Glover, Natasha; Barbe, Valérie; Daron, Josquin; Pingault, Lise; Sourdille, Pierre; Couloux, Arnaud; Paux, Etienne; Leroy, Philippe; Mangenot, Sophie; Guilhot, Nicolas; Le Gouis, Jacques; Balfourier, Francois; Alaux, Michael; Jamilloux, Véronique; Poulain, Julie; Durand, Céline; Bellec, Arnaud; Gaspin, Christine; Safar, Jan; Dolezel, Jaroslav; Rogers, Jane; Vandepoele, Klaas; Aury, Jean-Marc; Mayer, Klaus; Berges, Hélène; Quesneville, Hadi; Wincker, Patrick; Feuillet, Catherine
2014-07-18
We produced a reference sequence of the 1-gigabase chromosome 3B of hexaploid bread wheat. By sequencing 8452 bacterial artificial chromosomes in pools, we assembled a sequence of 774 megabases carrying 5326 protein-coding genes, 1938 pseudogenes, and 85% of transposable elements. The distribution of structural and functional features along the chromosome revealed partitioning correlated with meiotic recombination. Comparative analyses indicated high wheat-specific inter- and intrachromosomal gene duplication activities that are potential sources of variability for adaption. In addition to providing a better understanding of the organization, function, and evolution of a large and polyploid genome, the availability of a high-quality sequence anchored to genetic maps will accelerate the identification of genes underlying important agronomic traits. Copyright © 2014, American Association for the Advancement of Science.
Rochi, Lucia; Diéguez, María José; Burguener, Germán; Darino, Martín Alejandro; Pergolesi, María Fernanda; Ingala, Lorena Romina; Cuyeu, Alba Romina; Turjanski, Adrián; Kreff, Enrique Domingo; Sacco, Francisco
2018-03-01
Rust fungi are among the most devastating pathogens of crop plants. The biotrophic fungus Puccinia sorghi Schwein (Ps) is responsible for maize common rust, an endemic disease of maize (Zea mays L.) in Argentina that causes significant yield losses in corn production. In spite of this, the Ps genomic sequence was not available. We used Illumina sequencing to rapidly produce the 99.6 Mb draft genome sequence of Ps race RO10H11247, derived from a single-uredinial isolate from infected maize leaves collected in the Argentine Corn Belt Region during 2010. High quality reads were obtained from 200 bp paired-end and 5000 bp mate-paired libraries and assembled into 15,722 scaffolds. A pipeline which combined an ab initio program with homology-based models and homology to in planta enriched ESTs from four cereal pathogenic fungi (the three sequenced wheat rusts and Ustilago maydis) was used to identify 21,087 putative coding sequences, of which 1599 might be part of the Ps RO10H11247 secretome. Among the 458 highly conserved protein families from the euKaryotic Orthologous Groups (KOG) that occur in a wide range of eukaryotic organisms, 97.5% have at least one member with high homology in the Ps assembly (TBlastN, E-value ⩽ e-10) covering more than 50% of the length of the KOG protein. Comparative studies with the three sequenced wheat rust fungi, and microsynteny analysis involving Puccinia striiformis f. sp. tritici (Pst, wheat stripe rust fungus), support the quality achieved. The results presented here show the effectiveness of the Illumina strategy for sequencing dikaryotic genomes of non-model organisms and provide reliable DNA sequence information for genomic studies, including pathogenic mechanisms of this maize fungus and molecular marker design. Copyright © 2016 Elsevier Inc. All rights reserved.
High-throughput physical mapping of chromosomes using automated in situ hybridization.
George, Phillip; Sharakhova, Maria V; Sharakhov, Igor V
2012-06-28
Projects to obtain whole-genome sequences for 10,000 vertebrate species and for 5,000 insect and related arthropod species are expected to take place over the next 5 years. For example, the sequencing of the genomes for 15 malaria mosquito species is currently being done using an Illumina platform. This Anopheles species cluster includes both vectors and non-vectors of malaria. When the genome assemblies become available, researchers will have the unique opportunity to perform comparative analysis for inferring evolutionary changes relevant to vector ability. However, it has proven difficult to use next-generation sequencing reads to generate high-quality de novo genome assemblies. Moreover, the existing genome assemblies for Anopheles gambiae, although obtained using the Sanger method, are gapped or fragmented. Success of comparative genomic analyses will be limited if researchers deal with numerous sequencing contigs, rather than with chromosome-based genome assemblies. Fragmented, unmapped sequences create problems for genomic analyses because: (i) unidentified gaps cause incorrect or incomplete annotation of genomic sequences; (ii) unmapped sequences lead to confusion between paralogous genes and genes from different haplotypes; and (iii) the lack of chromosome assignment and orientation of the sequencing contigs does not allow for reconstructing rearrangement phylogeny and studying chromosome evolution. Developing high-resolution physical maps for species with newly sequenced genomes is a timely and cost-effective investment that will facilitate genome annotation, evolutionary analysis, and re-sequencing of individual genomes from natural populations. Here, we present innovative approaches to chromosome preparation, fluorescent in situ hybridization (FISH), and imaging that facilitate rapid development of physical maps. Using An. 
gambiae as an example, we demonstrate that the development of physical chromosome maps can potentially improve genome assemblies and, thus, the quality of genomic analyses. First, we use a high-pressure method to prepare polytene chromosome spreads. This method, originally developed for Drosophila, allows the user to visualize more details on chromosomes than the regular squashing technique. Second, a fully automated, front-end system for FISH is used for high-throughput physical genome mapping. The automated slide staining system runs multiple assays simultaneously and dramatically reduces hands-on time. Third, an automatic fluorescent imaging system, which includes a motorized slide stage, automatically scans and photographs labeled chromosomes after FISH. This system is especially useful for identifying and visualizing multiple chromosomal plates on the same slide. In addition, the scanning process captures a more uniform FISH result. Overall, the automated high-throughput physical mapping protocol is more efficient than a standard manual protocol.
2012-01-01
Background Using first-pass MRA (FP-MRA) spatial resolution is limited by breath-hold duration. In addition, image quality may be hampered by respiratory and cardiac motion artefacts. In order to overcome these limitations an ECG- and navigator-gated high-resolution-MRA sequence (HR-MRA) with slow infusion of extracellular contrast agent was implemented at 3 Tesla for the assessment of congenital heart disease and compared to standard first-pass-MRA (FP-MRA). Methods 34 patients (median age: 13 years) with congenital heart disease (CHD) were prospectively examined on a 3 Tesla system. The CMR-protocol comprised functional imaging, FP- and HR-MRA, and viability imaging. After the acquisition of the FP-MRA sequence using a single dose of extracellular contrast agent the motion compensated HR-MRA sequence with isotropic resolution was acquired while injecting the second single dose, utilizing the timeframe before viability imaging. Qualitative scores for image quality (two independent reviewers) as well as quantitative measurements of vessel sharpness and relative contrast were compared using the Wilcoxon signed-rank test. Quantitative measurements of vessel diameters were compared using the Bland-Altman test. Results The mean image quality score revealed significantly better image quality of the HR-MRA sequence compared to the FP-MRA sequence in all vessels of interest (ascending aorta (AA), left pulmonary artery (LPA), left superior pulmonary vein (LSPV), coronary sinus (CS), and coronary ostia (CO); all p < 0.0001). In comparison to FP-MRA, HR-MRA revealed significantly better vessel sharpness for all considered vessels (AA, LSPV and LPA; all p < 0.0001). The relative contrast of the HR-MRA sequence was less compared to the FP-MRA sequence (AA: p <0.028, main pulmonary artery: p <0.004, LSPV: p <0.005). 
The results of both the intra- and interobserver measurements of the vessel diameters revealed closer correlation and narrower 95% limits of agreement for the HR-MRA. HR-MRA revealed one additional clinical finding that was missed by FP-MRA. Conclusions An ECG- and navigator-gated HR-MRA protocol with infusion of extracellular contrast agent at 3 Tesla is feasible. HR-MRA delivers significantly better image quality and vessel sharpness compared to FP-MRA. It may be integrated into a standard CMR protocol for patients with CHD without the need for additional contrast agent injection and without any additional examination time. PMID:23107424
Land, Sally; Zhou, Julian; Cunningham, Philip; Sohn, Annette H; Singtoroj, Thida; Katzenstein, David; Mann, Marita; Sayer, David; Kantor, Rami
2013-01-01
Background The TREAT Asia Quality Assessment Scheme (TAQAS) was developed as a quality assessment programme through expert education and training, for laboratories in the Asia-Pacific and Africa that perform HIV drug-resistance (HIVDR) genotyping. We evaluated the programme performance and factors associated with high-quality HIVDR genotyping. Methods Laboratories used their standard protocols to test panels of human immunodeficiency virus (HIV)-positive plasma samples or electropherograms. Protocols were documented and performance was evaluated according to a newly developed scoring system, agreement with panel-specific consensus sequence, and detection of drug-resistance mutations (DRMs) and mixtures of wild-type and resistant virus (mixtures). High-quality performance was defined as detection of ≥95% DRMs. Results Over 4.5 years, 23 participating laboratories in 13 countries tested 45 samples (30 HIV-1 subtype B; 15 non-B subtypes) in nine panels. Median detection of DRMs was 88–98% in plasma panels and 90–97% in electropherogram panels. Laboratories were supported to amend and improve their test outcomes as appropriate. Three laboratories that detected <80% DRMs in early panels demonstrated subsequent improvement. Sample complexity factors – number of DRMs (p<0.001) and number of DRMs as mixtures (p<0.001); and laboratory performance factors – detection of mixtures (p<0.001) and agreement with consensus sequence (p<0.001), were associated with high performance; sample format (plasma or electropherogram), subtype and genotyping protocol were not. Conclusion High-quality HIVDR genotyping was achieved in the TAQAS collaborative laboratory network. Sample complexity and detection of mixtures were associated with performance quality. Laboratories conducting HIVDR genotyping are encouraged to participate in quality assessment programmes. PMID:23845227
Shinozuka, Hiroshi; Forster, John W
2016-01-01
Background. Multiplexed sequencing is commonly performed on massively parallel short-read sequencing platforms such as Illumina, and the efficiency of library normalisation can affect the quality of the output dataset. Although several library normalisation approaches have been established, none are ideal for highly multiplexed sequencing due to issues of cost and/or processing time. Methods. An inexpensive and high-throughput library quantification method has been developed, based on an adaptation of the melting curve assay. Sequencing libraries were subjected to the assay using the Bio-Rad Laboratories CFX Connect™ Real-Time PCR Detection System. The library quantity was calculated through summation of reduction of relative fluorescence units between 86 and 95 °C. Results. PCR-enriched sequencing libraries are suitable for this quantification without pre-purification of DNA. Short DNA molecules, which ideally should be eliminated from the library for subsequent processing, were differentiated from the target DNA in a mixture on the basis of differences in melting temperature. Quantification results for long sequences targeted using the melting curve assay were correlated with those from existing methods (R² > 0.77), and that observed from MiSeq sequencing (R² = 0.82). Discussion. The results of multiplexed sequencing suggested that the normalisation performance of the described method is equivalent to that of another recently reported high-throughput bead-based method, BeNUS. However, costs for the melting curve assay are considerably lower and processing times shorter than those of other existing methods, suggesting greater suitability for highly multiplexed sequencing applications.
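The quantification rule stated in the Methods (summing the reduction in relative fluorescence units between 86 and 95 °C) can be illustrated with a minimal sketch. The function name `library_quantity`, the input layout, and the decision to count only decreases between consecutive readings are assumptions made for illustration; the abstract does not specify how the authors handled these details, and the readings below are invented.

```python
def library_quantity(temps_c, rfu, t_low=86.0, t_high=95.0):
    """Sum the drops in relative fluorescence units (RFU) inside a temperature window.

    Assumption: only decreases between consecutive readings are counted,
    so small upward fluctuations do not cancel real melting signal.
    """
    points = sorted(zip(temps_c, rfu))  # order readings by temperature
    in_window = [(t, f) for t, f in points if t_low <= t <= t_high]
    total = 0.0
    for (_, f_prev), (_, f_next) in zip(in_window, in_window[1:]):
        drop = f_prev - f_next
        if drop > 0:
            total += drop
    return total


# Hypothetical melting-curve readings (not data from the study).
temps = [84, 86, 88, 90, 92, 94, 95, 96]
rfu = [1000, 980, 900, 700, 400, 250, 240, 238]
print(library_quantity(temps, rfu))  # 740.0
```

Readings at 84 and 96 °C fall outside the window and do not contribute, which is how short, low-melting fragments would be excluded under this reading of the method.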
Haplotype estimation using sequencing reads.
Delaneau, Olivier; Howie, Bryan; Cox, Anthony J; Zagury, Jean-François; Marchini, Jonathan
2013-10-03
High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb read with 4%-15% error per base), phasing performance was substantially improved. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
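Because the model described above weights read information by base quality scores, it may help to recall how a Phred score maps to an error probability, p = 10^(-Q/10). The sketch below is not the SHAPEIT2 model; it only shows this standard conversion, plus a simplified and assumed quantity (the chance that every heterozygous-site base in a read is called correctly, treating errors as independent). Function names are hypothetical.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def prob_all_bases_correct(quals):
    """Probability that all bases with the given Phred scores are correct.

    Assumption: base-calling errors are independent across sites.
    """
    p = 1.0
    for q in quals:
        p *= 1.0 - phred_to_error_prob(q)
    return p


# A read spanning two heterozygous sites, both called at Q30.
print(prob_all_bases_correct([30, 30]))
```

Under these assumptions, a read covering two heterozygous sites at Q30 carries nearly error-free phase information, whereas low-quality bases reduce the weight such a read should receive.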
Linking microarray reporters with protein functions
Gaj, Stan; van Erk, Arie; van Haaften, Rachel IM; Evelo, Chris TA
2007-01-01
Background The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. Results This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset, that was derived from and crosslinked back to the highly curated UniProt database. The resulting alignments were filtered using high quality alignment criteria and further compared with the outcome of a more traditional approach, where reporter sequences were BLASTed against EnsEMBL followed by locating the corresponding protein (UniProt) entry for the high quality hits. Combining the results of both methods resulted in successful annotation of > 58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the amount of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of the total reporters are now linked towards GO nodes and 7.1% on local pathways. Conclusion Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/. PMID:17897448
Dilliott, Allison A; Farhan, Sali M K; Ghani, Mahdi; Sato, Christine; Liang, Eric; Zhang, Ming; McIntyre, Adam D; Cao, Henian; Racacho, Lemuel; Robinson, John F; Strong, Michael J; Masellis, Mario; Bulman, Dennis E; Rogaeva, Ekaterina; Lang, Anthony; Tartaglia, Carmela; Finger, Elizabeth; Zinman, Lorne; Turnbull, John; Freedman, Morris; Swartz, Rick; Black, Sandra E; Hegele, Robert A
2018-04-04
Next-generation sequencing (NGS) is quickly revolutionizing how research into the genetic determinants of constitutional disease is performed. The technique is highly efficient with millions of sequencing reads being produced in a short time span and at relatively low cost. Specifically, targeted NGS is able to focus investigations to genomic regions of particular interest based on the disease of study. Not only does this further reduce costs and increase the speed of the process, but it lessens the computational burden that often accompanies NGS. Although targeted NGS is restricted to certain regions of the genome, preventing identification of potential novel loci of interest, it can be an excellent technique when faced with a phenotypically and genetically heterogeneous disease, for which there are previously known genetic associations. Because of the complex nature of the sequencing technique, it is important to closely adhere to protocols and methodologies in order to achieve sequencing reads of high coverage and quality. Further, once sequencing reads are obtained, a sophisticated bioinformatics workflow is utilized to accurately map reads to a reference genome, to call variants, and to ensure the variants pass quality metrics. Variants must also be annotated and curated based on their clinical significance, which can be standardized by applying the American College of Medical Genetics and Genomics Pathogenicity Guidelines. The methods presented herein will display the steps involved in generating and analyzing NGS data from a targeted sequencing panel, using the ONDRISeq neurodegenerative disease panel as a model, to identify variants that may be of clinical significance.
Construction of a scFv Library with Synthetic, Non-combinatorial CDR Diversity.
Bai, Xuelian; Shim, Hyunbo
2017-01-01
Many large synthetic antibody libraries have been designed and constructed, and have successfully generated high-quality antibodies suitable for various demanding applications. While synthetic antibody libraries have many advantages, such as optimized framework sequences and a broader sequence landscape than natural antibodies, their sequence diversity is typically generated by random combinatorial synthetic processes, which cause the incorporation of many undesired CDR sequences. Here, we describe the construction of a synthetic scFv library using oligonucleotide mixtures that contain predefined, non-combinatorially synthesized CDR sequences. Each CDR is first inserted into a master scFv framework sequence, and the resulting single-CDR libraries are subjected to a round of proofread panning. The proofread CDR sequences are assembled to produce the final scFv library with six diversified CDRs.
Novel method for high-throughput colony PCR screening in nanoliter-reactors
Walser, Marcel; Pellaux, Rene; Meyer, Andreas; Bechtold, Matthias; Vanderschuren, Herve; Reinhardt, Richard; Magyar, Joseph; Panke, Sven; Held, Martin
2009-01-01
We introduce a technology for the rapid identification and sequencing of conserved DNA elements employing a novel suspension array based on nanoliter (nl)-reactors made from alginate. The reactors have a volume of 35 nl and serve as reaction compartments during monoseptic growth of microbial library clones, colony lysis, thermocycling and screening for sequence motifs via semi-quantitative fluorescence analyses. nl-Reactors were kept in suspension during all high-throughput steps which allowed performing the protocol in a highly space-effective fashion and at negligible expenses of consumables and reagents. As a first application, 11 high-quality microsatellites for polymorphism studies in cassava were isolated and sequenced out of a library of 20 000 clones in 2 days. The technology is widely scalable and we envision that throughputs for nl-reactor based screenings can be increased up to 100 000 and more samples per day thereby efficiently complementing protocols based on established deep-sequencing technologies. PMID:19282448
Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing
Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas
2016-01-01
ABSTRACT Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018
Sequence-Based Genotyping for Marker Discovery and Co-Dominant Scoring in Germplasm and Populations
Truong, Hoa T.; Ramos, A. Marcos; Yalcin, Feyruz; de Ruiter, Marjo; van der Poel, Hein J. A.; Huvenaars, Koen H. J.; Hogers, René C. J.; van Enckevort, Leonora. J. G.; Janssen, Antoine; van Orsouw, Nathalie J.; van Eijk, Michiel J. T.
2012-01-01
Conventional marker-based genotyping platforms are widely available, but not without their limitations. In this context, we developed Sequence-Based Genotyping (SBG), a technology for simultaneous marker discovery and co-dominant scoring, using next-generation sequencing. SBG offers users several advantages including a generic sample preparation method, a highly robust genome complexity reduction strategy to facilitate de novo marker discovery across entire genomes, and a uniform bioinformatics workflow strategy to achieve genotyping goals tailored to individual species, regardless of the availability of a reference sequence. The most distinguishing features of this technology are the ability to genotype any population structure, regardless of whether parental data are included, and the ability to co-dominantly score SNP markers segregating in populations. To demonstrate the capabilities of SBG, we performed marker discovery and genotyping in Arabidopsis thaliana and lettuce, two plant species of diverse genetic complexity and backgrounds. Initially we obtained 1,409 SNPs for arabidopsis, and 5,583 SNPs for lettuce. Further filtering of the SNP dataset produced over 1,000 high-quality SNP markers for each species. We obtained a genotyping rate of 201.2 genotypes/SNP and 58.3 genotypes/SNP for arabidopsis (n = 222 samples) and lettuce (n = 87 samples), respectively. Linkage mapping using these SNPs resulted in stable map configurations. We have therefore shown that the SBG approach presented provides users with the utmost flexibility in garnering high-quality markers that can be directly used for genotyping and downstream applications. Until technological advances and lower costs allow for routine whole-genome sequencing of populations, we expect that sequence-based genotyping technologies such as SBG will be essential for genotyping of model and non-model genomes alike. PMID:22662172
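The reported genotyping rates are easier to compare across the two species when expressed as a fraction of samples. The short sketch below assumes that "genotypes/SNP" means the average number of samples successfully genotyped per SNP, so that dividing by the sample count gives an approximate per-SNP call rate; that interpretation and the function name `call_rate` are assumptions, not statements from the paper.

```python
def call_rate(genotypes_per_snp, n_samples):
    """Approximate fraction of samples genotyped per SNP (assumed interpretation)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return genotypes_per_snp / n_samples


for species, rate, n in [("arabidopsis", 201.2, 222), ("lettuce", 58.3, 87)]:
    print(f"{species}: {call_rate(rate, n):.1%}")
```

Under this reading, roughly nine in ten arabidopsis samples and about two in three lettuce samples received a genotype call at a typical SNP.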
Bergman, Casey M.; Haddrill, Penelope R.
2015-01-01
To contribute to our general understanding of the evolutionary forces that shape variation in genome sequences in nature, we have sequenced genomes from 50 isofemale lines and six pooled samples from populations of Drosophila melanogaster on three continents. Analysis of raw and reference-mapped reads indicates the quality of these genomic sequence data is very high. Comparison of the predicted and experimentally-determined Wolbachia infection status of these samples suggests that strain or sample swaps are unlikely to have occurred in the generation of these data. Genome sequences are freely available in the European Nucleotide Archive under accession ERP009059. Isofemale lines can be obtained from the Drosophila Species Stock Center. PMID:25717372
Goettel, Wolfgang; Xia, Eric; Upchurch, Robert; Wang, Ming-Li; Chen, Pengyin; An, Yong-Qiang Charles
2014-04-23
Variation in seed oil composition and content among soybean varieties is largely attributed to differences in transcript sequences and/or transcript accumulation of oil production related genes in seeds. Discovery and analysis of sequence and expression variations in these genes will accelerate soybean oil quality improvement. In an effort to identify these variations, we sequenced the transcriptomes of soybean seeds from nine lines varying in oil composition and/or total oil content. Our results showed that 69,338 distinct transcripts from 32,885 annotated genes were expressed in seeds. A total of 8,037 transcript expression polymorphisms and 50,485 transcript sequence polymorphisms (48,792 SNPs and 1,693 small Indels) were identified among the lines. Effects of the transcript polymorphisms on their encoded protein sequences and functions were predicted. The studies also provided independent evidence that the lack of FAD2-1A gene activity and a non-synonymous SNP in the coding sequence of FAB2C caused elevated oleic acid and stearic acid levels in soybean lines M23 and FAM94-41, respectively. As a proof-of-concept, we developed an integrated RNA-seq and bioinformatics approach to identify and functionally annotate transcript polymorphisms, and demonstrated its high effectiveness for discovery of genetic and transcript variations that result in altered oil quality traits. The collection of transcript polymorphisms coupled with their predicted functional effects will be a valuable asset for further discovery of genes, gene variants, and functional markers to improve soybean oil quality.
Utturkar, Sagar M.; Klingeman, Dawn Marie; Land, Miriam L.; ...
2014-06-14
Our motivation with this work was to assess the potential of different types of sequence data combined with de novo and hybrid assembly approaches to improve existing draft genome sequences. Illumina, 454 and PacBio sequencing technologies were used to generate de novo and hybrid genome assemblies for four different bacteria, which were assessed for quality using summary statistics (e.g. number of contigs, N50) and in silico evaluation tools. Differences in predictions of multiple copies of rDNA operons for each respective bacterium were evaluated by PCR and Sanger sequencing, and then the validated results were applied as an additional criterion to rank assemblies. In general, assemblies using longer PacBio reads were better able to resolve repetitive regions. In this study, the combination of Illumina and PacBio sequence data assembled through the ALLPATHS-LG algorithm gave the best summary statistics and most accurate rDNA operon number predictions. This study will aid others looking to improve existing draft genome assemblies. As to availability and implementation, all assembly tools except CLC Genomics Workbench are freely available under the GNU General Public License.
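Since N50 is named as one of the summary statistics used to rank assemblies, a minimal sketch of the usual definition may be useful: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name and the example lengths are illustrative and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are taken from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half the assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical assembly: total length 290, half is 145.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

A higher N50 with fewer contigs generally indicates a more contiguous assembly, although it says nothing about correctness, which is why the abstract pairs it with in silico evaluation and experimental validation.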
Identification, validation and high-throughput genotyping of transcribed gene SNPs in cassava.
Ferguson, Morag E; Hearne, Sarah J; Close, Timothy J; Wanamaker, Steve; Moskal, William A; Town, Christopher D; de Young, Joe; Marri, Pradeep Reddy; Rabbi, Ismail Yusuf; de Villiers, Etienne P
2012-03-01
The availability of genomic resources can facilitate progress in plant breeding through the application of advanced molecular technologies for crop improvement. This is particularly important in the case of less researched crops such as cassava, a staple and food security crop for more than 800 million people. Here, expressed sequence tags (ESTs) were generated from five drought stressed and well-watered cassava varieties. Two cDNA libraries were developed: one from root tissue (CASR), the other from leaf, stem and stem meristem tissue (CASL). Sequencing generated 706 contigs and 3,430 singletons. These sequences were combined with those from two other EST sequencing initiatives and filtered based on the sequence quality. Quality sequences were aligned using CAP3 and embedded in a Windows browser called HarvEST:Cassava which is made available. HarvEST:Cassava consists of a Unigene set of 22,903 quality sequences. A total of 2,954 putative SNPs were identified. Of these 1,536 SNPs from 1,170 contigs and 53 cassava genotypes were selected for SNP validation using Illumina's GoldenGate assay. As a result 1,190 SNPs were validated technically and biologically. The location of validated SNPs on scaffolds of the cassava genome sequence (v.4.1) is provided. A diversity assessment of 53 cassava varieties reveals some sub-structure based on the geographical origin, greater diversity in the Americas as opposed to Africa, and similar levels of diversity in West Africa and southern, eastern and central Africa. The resources presented allow for improved genetic dissection of economically important traits and the application of modern genomics-based approaches to cassava breeding and conservation.
Kim, Byung Kwon; Lee, Seong Hyuk; Kim, Seon-Young; Jeong, Haeyoung; Kwon, Soon-Kyeong; Lee, Choong Hoon; Song, Ju Yeon; Yu, Dong Su
2012-01-01
Thermococcus zilligii, a thermophilic anaerobe in freshwater, is useful for physiological research and biotechnological applications. Here we report the high-quality draft genome sequence of T. zilligii AN1T. The genome contains a number of genes for an immune system and adaptation to a microbial biomass-rich environment as well as hydrogenase genes. PMID:22740682
Complete Genome Sequence of the Endophytic Bacterium Burkholderia sp. Strain KJ006
Kwak, Min-Jung; Song, Ju Yeon; Kim, Seon-Young; Jeong, Haeyoung; Kang, Sung Gyun; Kim, Byung Kwon; Kwon, Soon-Kyeong; Lee, Choong Hoon; Yu, Dong Su
2012-01-01
Endophytes live inside plant tissues without causing any harm and may even benefit plants. Here, we provide the high-quality genome sequence of Burkholderia sp. strain KJ006, an endophytic bacterium of rice with antifungal activity. The 6.6-Mb genome, consisting of three chromosomes and a single plasmid, contains genes related to plant growth promotion or degradation of aromatic compounds. PMID:22843575
Schreck, Katharina; Herbold, Craig W.; Daims, Holger; Wagner, Michael; Loy, Alexander
2018-01-01
ABSTRACT The facultative anaerobic chemoorganoheterotrophic alphaproteobacterium Telmatospirillum siberiense 26-4b1 was isolated from a Siberian peatland. We report here a 6.20-Mbp near-complete high-quality draft genome sequence of T. siberiense that reveals expected and novel metabolic potential for the genus Telmatospirillum, including genes for sulfur oxidation. PMID:29371357
Assembly and diploid architecture of an individual human genome via single-molecule technologies
Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun; Ummat, Ajay; Franzen, Oscar; Rausch, Tobias; Stütz, Adrian M; Stedman, William; Anantharaman, Thomas; Hastie, Alex; Dai, Heng; Fritz, Markus Hsi-Yang; Cao, Han; Cohain, Ariella; Deikus, Gintaras; Durrett, Russell E; Blanchard, Scott C; Altman, Roger; Chin, Chen-Shan; Guo, Yan; Paxinos, Ellen E; Korbel, Jan O; Darnell, Robert B; McCombie, W Richard; Kwok, Pui-Yan; Mason, Christopher E; Schadt, Eric E; Bashir, Ali
2015-01-01
We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality. PMID:26121404
Sequence Data for Clostridium autoethanogenum using Three Generations of Sequencing Technologies
Utturkar, Sagar M.; Klingeman, Dawn Marie; Bruno-Barcena, José M.; ...
2015-04-14
During the past decade, DNA sequencing output has been mostly dominated by the second generation sequencing platforms, which are characterized by low cost, high throughput and shorter read lengths (for example, Illumina). The emergence and development of so-called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20 kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high-quality sequence datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, enabling comparison of existing and novel bioinformatics approaches, and will encourage interest in the development of innovative experimental and computational methods for NGS data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Osman, Wan Adnawani Meor; van Berkum, Peter; León-Barrios, Milagros; ...
2017-09-25
Ensifer meliloti Mlalz-1 (INSDC = ATZD00000000) is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing nodule of Medicago laciniata (L.) Miller from a soil sample collected near the town of Guatiza on the island of Lanzarote, the Canary Islands, Spain. This strain nodulates and forms an effective symbiosis with the highly specific host M. laciniata. This rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) sequencing project. Here in this paper, the features of E. meliloti Mlalz-1 are described, together with high-quality permanent draft genome sequence information and annotation. The 6,664,116 bp high-quality draft genome is arranged in 99 scaffolds of 100 contigs, containing 6314 protein-coding genes and 74 RNA-only encoding genes. Strain Mlalz-1 is closely related to Ensifer meliloti IAM 12611T, Ensifer medicae A 321T and Ensifer numidicus ORS 1407T, based on 16S rRNA gene sequences. gANI values of ≥98.1% support the classification of strain Mlalz-1 as E. meliloti. Nodulation of M. laciniata requires a specific nodC allele, and the nodC gene of strain Mlalz-1 shares ≥98% sequence identity with nodC of M. laciniata-nodulating Ensifer strains, but ≤93% with nodC of Ensifer strains that nodulate other Medicago species. Strain Mlalz-1 is unique among sequenced E. meliloti strains in possessing genes encoding components of a T2SS and in having two versions of the adaptive acid tolerance response lpiA-acvB operon. In E. medicae strain WSM419, lpiA is essential for enhancing survival in lethal acid conditions. The second copy of the lpiA-acvB operon of strain Mlalz-1 has highest sequence identity (>96%) with that of E. medicae strains, which suggests genetic recombination between strain Mlalz-1 and E. medicae and the horizontal gene transfer of lpiA-acvB.
Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing.
Tourlousse, Dieter M; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro; Sekiguchi, Yuji
2017-02-28
High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
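As a rough illustration of how a spike-in added in a known amount can turn read counts into absolute abundances, the sketch below scales each taxon's reads by the copies-per-read ratio observed for the spike-in. The function name, the single-spike-in simplification, and all numbers are hypothetical; the study itself used staggered spike-in mixtures, and its actual calibration procedure is not reproduced here.

```python
def estimate_absolute_abundance(taxon_reads, spike_reads, spike_copies_added):
    """Scale taxon read counts to estimated 16S rRNA gene copies.

    Simplifying assumptions (not from the source): reads are proportional
    to template copies, and the spike-in is recovered with the same
    efficiency as the sample DNA.
    """
    if spike_reads <= 0:
        raise ValueError("spike-in reads must be positive to calibrate")
    copies_per_read = spike_copies_added / spike_reads
    return {taxon: reads * copies_per_read for taxon, reads in taxon_reads.items()}


# Hypothetical example values, for illustration only.
reads = {"taxon_A": 6000, "taxon_B": 3000}
estimates = estimate_absolute_abundance(
    reads, spike_reads=1000, spike_copies_added=50000
)
print(estimates)
```

A single calibration factor is the simplest possible design; a staggered mixture of several spike-ins at different concentrations would instead allow a regression across the concentration range.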
DNA extraction for streamlined metagenomics of diverse environmental samples.
Marotz, Clarisse; Amir, Amnon; Humphrey, Greg; Gaffney, James; Gogul, Grant; Knight, Rob
2017-06-01
A major bottleneck for metagenomic sequencing is rapid and efficient DNA extraction. Here, we compare the extraction efficiencies of three magnetic bead-based platforms (KingFisher, epMotion, and Tecan) to a standardized column-based extraction platform across a variety of sample types, including feces, oral, skin, soil, and water. Replicate sample plates were extracted and prepared for 16S rRNA gene amplicon sequencing in parallel to assess extraction bias and DNA quality. The data demonstrate that any effect of extraction method on sequencing results was small compared with the variability across samples; however, the KingFisher platform produced the largest number of high-quality reads in the shortest amount of time. Based on these results, we have identified an extraction pipeline that dramatically reduces sample processing time without sacrificing bacterial taxonomic or abundance information.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tiwari, Ravi; Howieson, John; Yates, Ron; ...
2015-11-30
Bradyrhizobium sp. WSM1253 is a novel N2-fixing bacterium isolated from a root nodule of the herbaceous annual legume Ornithopus compressus that was growing on the Greek Island of Sifnos. WSM1253 emerged as a strain of interest in an Australian program that was selecting inoculant quality bradyrhizobial strains for inoculation of Mediterranean species of lupins (Lupinus angustifolius, L. princei, L. atlanticus, L. pilosus). In this report we describe, for the first time, the genome sequence information and annotation of this legume microsymbiont. The 8,719,808 bp genome has a G + C content of 63.09% with 71 contigs arranged into two scaffolds. The assembled genome contains 8,432 protein-coding genes, 66 RNA genes and a single rRNA operon. In conclusion, this improved-high-quality draft rhizobial genome is one of 20 sequenced through a DOE Joint Genome Institute 2010 Community Sequencing Project.
Center for Inherited Disease Research (CIDR)
The Center for Inherited Disease Research (CIDR) Program at The Johns Hopkins University provides high-quality next generation sequencing and genotyping services to investigators working to discover genes that contribute to common diseases.
DNA barcode identification of Podocarpaceae--the second largest conifer family.
Little, Damon P; Knopf, Patrick; Schulz, Christian
2013-01-01
We have generated matK, rbcL, and nrITS2 DNA barcodes for 320 specimens representing all 18 extant genera of the conifer family Podocarpaceae. The sample includes 145 of the 198 recognized species. Comparative analyses of sequence quality and species discrimination were conducted on the 159 individuals from which all three markers were recovered (representing 15 genera and 97 species). The vast majority of sequences were of high quality (B 30 = 0.596-0.989). Even the lowest quality sequences exceeded the minimum requirements of the BARCODE data standard. In the few instances that low quality sequences were generated, the responsible mechanism could not be discerned. There were no statistically significant differences in the discriminatory power of markers or marker combinations (p = 0.05). The discriminatory power of the barcode markers individually and in combination is low (56.7% of species at maximum). In some instances, species discrimination failed in spite of ostensibly useful variation being present (genotypes were shared among species), but in many cases there was simply an absence of sequence variation. Barcode gaps (maximum intraspecific p-distance > minimum interspecific p-distance) were observed in 50.5% of species when all three markers were considered simultaneously. The presence of a barcode gap was not predictive of discrimination success (p = 0.02) and there was no statistically significant difference in the frequency of barcode gaps among markers (p = 0.05). In addition, there was no correlation between number of individuals sampled per species and the presence of a barcode gap (p = 0.27).
DNA Barcode Identification of Podocarpaceae—The Second Largest Conifer Family
Little, Damon P.; Knopf, Patrick; Schulz, Christian
2013-01-01
We have generated matK, rbcL, and nrITS2 DNA barcodes for 320 specimens representing all 18 extant genera of the conifer family Podocarpaceae. The sample includes 145 of the 198 recognized species. Comparative analyses of sequence quality and species discrimination were conducted on the 159 individuals from which all three markers were recovered (representing 15 genera and 97 species). The vast majority of sequences were of high quality (B 30 = 0.596–0.989). Even the lowest quality sequences exceeded the minimum requirements of the BARCODE data standard. In the few instances that low quality sequences were generated, the responsible mechanism could not be discerned. There were no statistically significant differences in the discriminatory power of markers or marker combinations (p = 0.05). The discriminatory power of the barcode markers individually and in combination is low (56.7% of species at maximum). In some instances, species discrimination failed in spite of ostensibly useful variation being present (genotypes were shared among species), but in many cases there was simply an absence of sequence variation. Barcode gaps (maximum intraspecific p–distance > minimum interspecific p–distance) were observed in 50.5% of species when all three markers were considered simultaneously. The presence of a barcode gap was not predictive of discrimination success (p = 0.02) and there was no statistically significant difference in the frequency of barcode gaps among markers (p = 0.05). In addition, there was no correlation between number of individuals sampled per species and the presence of a barcode gap (p = 0.27). PMID:24312258
Illumina Production Sequencing at the DOE Joint Genome Institute - Workflow and Optimizations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tarver, Angela; Fern, Alison; Diego, Matthew San
2010-06-18
The U.S. Department of Energy (DOE) Joint Genome Institute's (JGI) Production Sequencing group is committed to the generation of high-quality genomic DNA sequence to support the DOE mission areas of renewable energy generation, global carbon management, and environmental characterization and clean-up. Within the JGI's Production Sequencing group, the Illumina Genome Analyzer pipeline has been established as one of three sequencing platforms, along with Roche/454 and ABI/Sanger. Optimization of the Illumina pipeline has been ongoing with the aim of continual process improvement of the laboratory workflow. These process improvement projects are being led by the JGI's Process Optimization, Sequencing Technologies, Instrumentation & Engineering, and the New Technology Production groups. Primary focus has been on improving the procedural ergonomics and the technicians' operating environment, reducing manually intensive technician operations with different tools, reducing associated production costs, and improving the overall process and generated sequence quality. The U.S. DOE JGI was established in 1997 in Walnut Creek, CA, to unite the expertise and resources of five national laboratories (Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge, and Pacific Northwest) along with HudsonAlpha Institute for Biotechnology. JGI is operated by the University of California for the U.S. DOE.
Yang, Jun-Bo; Li, De-Zhu; Li, Hong-Tao
2014-09-01
Chloroplast genomes supply indispensable information that helps improve phylogenetic resolution and can even serve as organelle-scale barcodes. Next-generation sequencing technologies have helped promote sequencing of complete chloroplast genomes, but compared with the number of angiosperms, relatively few chloroplast genomes have been sequenced. There are two major reasons for the paucity of completely sequenced chloroplast genomes: (i) massive amounts of fresh leaves are needed for chloroplast sequencing and (ii) there are considerable gaps in the sequenced chloroplast genomes of many plants because of the difficulty of isolating high-quality chloroplast DNA, preventing complete chloroplast genomes from being assembled. To overcome these obstacles, all known angiosperm chloroplast genomes available to date were analysed, and then we designed nine universal primer pairs corresponding to the highly conserved regions. Using these primers, angiosperm whole chloroplast genomes can be amplified using long-range PCR and sequenced using next-generation sequencing methods. The primers showed high universality, which was tested using 24 species representing major clades of angiosperms. To validate the functionality of the primers, eight species representing major groups of angiosperms, that is, early-diverging angiosperms, magnoliids, monocots, Saxifragales, fabids, malvids and asterids, were sequenced and their complete chloroplast genomes were assembled. In our trials, only 100 mg of fresh leaves was used. The results show that the universal primer set provided an easy, effective and feasible approach for sequencing whole chloroplast genomes in angiosperms. The designed universal primer pairs provide a possibility to accelerate genome-scale data acquisition and will therefore improve phylogenetic resolution and species identification in angiosperms. © 2014 John Wiley & Sons Ltd.
Method and Apparatus for Evaluating the Visual Quality of Processed Digital Video Sequences
NASA Technical Reports Server (NTRS)
Watson, Andrew B. (Inventor)
2002-01-01
A Digital Video Quality (DVQ) apparatus and method that incorporate a model of human visual sensitivity to predict the visibility of artifacts. The DVQ method and apparatus are used for the evaluation of the visual quality of processed digital video sequences and for adaptively controlling the bit rate of the processed digital video sequences without compromising the visual quality. The DVQ apparatus minimizes the required amount of memory and computation. The input to the DVQ apparatus is a pair of color image sequences: an original (R) non-compressed sequence, and a processed (T) sequence. Both sequences (R) and (T) are sampled, cropped, and subjected to color transformations. The sequences are then subjected to blocking and discrete cosine transformation, and the results are transformed to local contrast. The next step is a time filtering operation which implements the human sensitivity to different time frequencies. The results are converted to threshold units by dividing each discrete cosine transform coefficient by its respective visual threshold. At the next stage the two sequences are subtracted to produce an error sequence. The error sequence is subjected to a contrast masking operation, which also depends upon the reference sequence (R). The masked errors can be pooled in various ways to illustrate the perceptual error over various dimensions, and the pooled error can be converted to a visual quality measure.
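The core of the pipeline described above (block DCT, conversion to threshold units, subtraction, pooling) can be sketched for a single grayscale frame as follows. This is a heavily simplified illustration and not the patented method: color transformation, local contrast conversion, temporal filtering and contrast masking are all omitted, and the threshold matrix, the Minkowski exponent `beta`, and every function name are assumptions made for the example.

```python
import numpy as np

BLOCK = 8  # block size assumed for this sketch


def dct_matrix(n=BLOCK):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def block_dct(frame):
    """Apply a blockwise 2-D DCT to a frame whose sides are multiples of BLOCK."""
    h, w = frame.shape
    d = dct_matrix()
    blocks = frame.reshape(h // BLOCK, BLOCK, w // BLOCK, BLOCK).transpose(0, 2, 1, 3)
    return d @ blocks @ d.T  # matmul broadcasts over the block indices


def pooled_error(reference, processed, thresholds, beta=4.0):
    """Divide DCT coefficients by thresholds, subtract, and Minkowski-pool."""
    ref_units = block_dct(reference) / thresholds
    proc_units = block_dct(processed) / thresholds
    err = ref_units - proc_units
    return float(np.mean(np.abs(err) ** beta) ** (1.0 / beta))


rng = np.random.default_rng(0)
ref = rng.random((16, 16))
noisy = ref + 0.05 * rng.standard_normal((16, 16))
thr = np.ones((BLOCK, BLOCK))  # placeholder, not real visual thresholds

print(pooled_error(ref, ref, thr))
print(pooled_error(ref, noisy, thr))
```

Dividing by a per-coefficient threshold before subtracting expresses each error in units of visibility, so the pooled value can be compared across frequencies.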
Ramos, Rommel Thiago Jucá; Carneiro, Adriana Ribeiro; Soares, Siomar de Castro; dos Santos, Anderson Rodrigues; Almeida, Sintia; Guimarães, Luis; Figueira, Flávia; Barbosa, Eudes; Tauch, Andreas; Azevedo, Vasco; Silva, Artur
2013-03-01
New sequencing platforms have enabled rapid decoding of complete prokaryotic genomes at relatively low cost. The Ion Torrent platform is an example of these technologies, characterized by lower coverage, generating challenges for the genome assembly. One particular problem is the lack of genomes that enable reference-based assembly, such as the one used in the present study, Corynebacterium pseudotuberculosis biovar equi, which causes high economic losses in the US equine industry. The quality treatment strategy incorporated into the assembly pipeline enabled a 16-fold greater use of the sequencing data obtained compared with traditional quality filter approaches. Data preprocessing prior to the de novo assembly enabled the use of known methodologies in the next-generation sequencing data assembly. Moreover, manual curation was proved to be essential for ensuring a quality assembly, which was validated by comparative genomics with other species of the genus Corynebacterium. The present study presents a modus operandi that enables a greater and better use of data obtained from semiconductor sequencing for obtaining the complete genome from a prokaryotic microorganism, C. pseudotuberculosis, which is not a traditional biological model such as Escherichia coli. © 2012 The Authors. Published by Society for Applied Microbiology and Blackwell Publishing Ltd. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Chauhan, Arjun; Sharma, J N; Modgil, Manju; Siddappa, Sundaresha
2018-05-29
Marssonina coronaria causes apple blotch disease resulting in severe premature defoliation, and is distributed in many leading apple-growing areas in the world. Effective, reliable and high-quality RNA extraction is an indispensable procedure in any molecular biology study. No method currently exists for RNA extraction from M. coronaria that produces a high quantity of melanin-free RNA. Therefore, we evaluated eight RNA extraction methods including manual and commercial kits, to yield a sufficient quantity of high-quality and melanin-free RNA. Manual methods used here resulted in low quality and black colored RNA pellets showing the presence of melanin, despite all the modifications employed to original procedures. However, these methods when coupled with clean up resulted in melanin-free RNA. On the other hand, all commercial kits used were able to yield high-quality melanin-free RNA having variable yields. TRIzol™ Reagent + RNA Clean & Concentrator™-5 and Ambion-PureLink® RNA Mini Kit were found to be the best methods as the RNA extracted with these methods from 15 day old fungal culture grown on solid medium were free of melanin with good yield. RNA extracted by this improved methodology was applied for RT-PCR, subsequent PCR amplification, and isolation of calmodulin gene sequences from M. coronaria and infected apple leaf pieces. These methods are more time effective than traditional methods and take only an hour to complete. To our knowledge, this is the first report on the method of isolation of high-quality RNA for cDNA synthesis as well as isolation of the calmodulin gene sequence from this fungus. Copyright © 2018 Elsevier B.V. All rights reserved.
Reading, Benjamin J; Chapman, Robert W; Schaff, Jennifer E; Scholl, Elizabeth H; Opperman, Charles H; Sullivan, Craig V
2012-02-21
The striped bass and its relatives (genus Morone) are important fisheries and aquaculture species native to estuaries and rivers of the Atlantic coast and Gulf of Mexico in North America. To open avenues of gene expression research on reproduction and breeding of striped bass, we generated a collection of expressed sequence tags (ESTs) from a complementary DNA (cDNA) library representative of their ovarian transcriptome. Sequences of a total of 230,151 ESTs (51,259,448 bp) were acquired by Roche 454 pyrosequencing of cDNA pooled from ovarian tissues obtained at all stages of oocyte growth, at ovulation (eggs), and during preovulatory atresia. Quality filtering of ESTs allowed assembly of 11,208 high-quality contigs ≥ 100 bp, including 2,984 contigs 500 bp or longer (average length 895 bp). Blastx comparisons revealed 5,482 gene orthologues (E-value < 10^-3), of which 4,120 (36.7% of total contigs) were annotated with Gene Ontology terms (E-value < 10^-6). There were 5,726 remaining unknown unique sequences (51.1% of total contigs). All of the high-quality EST sequences are available in the National Center for Biotechnology Information (NCBI) Short Read Archive (GenBank: SRX007394). Informative contigs were considered to be abundant if they were assembled from groups of ESTs comprising ≥ 0.15% of the total short read sequences (≥ 345 reads/contig). Approximately 52.5% of these abundant contigs were predicted to have predominant ovary expression through digital differential display in silico comparisons to zebrafish (Danio rerio) UniGene orthologues. Over 1,300 Gene Ontology terms from Biological Process classes of Reproduction, Reproductive process, and Developmental process were assigned to this collection of annotated contigs. This first large reference sequence database available for the ecologically and economically important temperate basses (genus Morone) provides a foundation for gene expression studies in these species. 
The predicted predominance of ovary gene expression and assignment of directly relevant Gene Ontology classes suggests a powerful utility of this dataset for analysis of ovarian gene expression related to fundamental questions of oogenesis. Additionally, a high definition Agilent 60-mer oligo ovary 'UniClone' microarray with 8 × 15,000 probe format has been designed based on this striped bass transcriptome (eArray Group: Striper Group, Design ID: 029004).
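The proportions quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the abstract; the variable names are illustrative. The computed values come out close to the reported 345 reads, 36.7% and 51.1%, with any small difference attributable to rounding.

```python
total_reads = 230_151      # ESTs reported in the abstract
total_contigs = 11_208     # high-quality contigs >= 100 bp
annotated_contigs = 4_120  # contigs annotated with Gene Ontology terms
unknown_contigs = 5_726    # remaining unknown unique sequences

# "Abundant" contigs are built from at least 0.15% of all reads.
abundance_cutoff_reads = 0.0015 * total_reads

pct_annotated = 100 * annotated_contigs / total_contigs
pct_unknown = 100 * unknown_contigs / total_contigs

print(f"abundance cutoff: {abundance_cutoff_reads:.1f} reads per contig")
print(f"annotated: {pct_annotated:.2f}% of contigs")
print(f"unknown:   {pct_unknown:.2f}% of contigs")
```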
High-Frame-Rate Doppler Ultrasound Using a Repeated Transmit Sequence
Podkowa, Anthony S.; Oelze, Michael L.; Ketterling, Jeffrey A.
2018-01-01
The maximum detectable velocity of high-frame-rate color flow Doppler ultrasound is limited by the imaging frame rate when using coherent compounding techniques. Traditionally, high quality ultrasonic images are produced at a high frame rate via coherent compounding of steered plane wave reconstructions. However, this compounding operation results in an effective downsampling of the slow-time signal, thereby artificially reducing the frame rate. To alleviate this effect, a new transmit sequence is introduced where each transmit angle is repeated in succession. This transmit sequence allows for direct comparison between low resolution, pre-compounded frames at a short time interval in ways that are resistant to sidelobe motion. Use of this transmit sequence increases the maximum detectable velocity by a scale factor of the transmit sequence length. The performance of this new transmit sequence was evaluated using a rotating cylindrical phantom and compared with traditional methods using a 15-MHz linear array transducer. Axial velocity estimates were recorded for a range of ±300 mm/s and compared to the known ground truth. Using these new techniques, the root mean square error was reduced from over 400 mm/s to below 50 mm/s in the high-velocity regime compared to traditional techniques. The standard deviation of the velocity estimate in the same velocity range was reduced from 250 mm/s to 30 mm/s. This result demonstrates the viability of the repeated transmit sequence methods in detecting and quantifying high-velocity flow. PMID:29910966
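The scaling claimed above can be illustrated with the standard Doppler aliasing limit, v_max = c · PRF / (4 · f0), where PRF is the rate at which the slow-time signal is sampled. The sketch below is a simplified model rather than the authors' estimator: the sound speed, pulse repetition frequency and number of transmit angles are assumed values, and only the 15 MHz center frequency comes from the abstract.

```python
def nyquist_velocity(slow_time_rate_hz, f0_hz, c_m_s=1540.0):
    """Maximum unaliased axial velocity (m/s) for a given slow-time sampling rate."""
    return c_m_s * slow_time_rate_hz / (4.0 * f0_hz)


f0 = 15e6       # transducer center frequency from the abstract (Hz)
prf = 10_000.0  # hypothetical pulse repetition frequency (Hz)
n_angles = 5    # hypothetical transmit sequence length

# Conventional compounding: one slow-time sample per full set of angles.
v_compounded = nyquist_velocity(prf / n_angles, f0)

# Repeated transmit sequence: consecutive transmits at the same angle are
# compared directly, so the slow-time interval is a single pulse period.
v_repeated = nyquist_velocity(prf, f0)

print(f"compounded: {v_compounded * 1000:.1f} mm/s")
print(f"repeated:   {v_repeated * 1000:.1f} mm/s")
print(f"gain:       {v_repeated / v_compounded:.1f}x")
```

Under these assumptions the gain equals the number of angles, which matches the statement that the limit scales with the transmit sequence length.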
A Bayesian framework for extracting human gait using strong prior knowledge.
Zhou, Ziheng; Prügel-Bennett, Adam; Damper, Robert I
2006-11-01
Extracting full-body motion of walking people from monocular video sequences in complex, real-world environments is an important and difficult problem, going beyond simple tracking, whose satisfactory solution demands an appropriate balance between use of prior knowledge and learning from data. We propose a consistent Bayesian framework for introducing strong prior knowledge into a system for extracting human gait. In this work, the strong prior is built from a simple articulated model having both time-invariant (static) and time-variant (dynamic) parameters. The model is easily modified to cater to situations such as walkers wearing clothing that obscures the limbs. The statistics of the parameters are learned from high-quality (indoor laboratory) data and the Bayesian framework then allows us to "bootstrap" to accurate gait extraction on the noisy images typical of cluttered, outdoor scenes. To achieve automatic fitting, we use a hidden Markov model to detect the phases of images in a walking cycle. We demonstrate our approach on silhouettes extracted from fronto-parallel ("sideways on") sequences of walkers under both high-quality indoor and noisy outdoor conditions. As well as high-quality data with synthetic noise and occlusions added, we also test walkers with rucksacks, skirts, and trench coats. Results are quantified in terms of chamfer distance and average pixel error between automatically extracted body points and corresponding hand-labeled points. No one part of the system is novel in itself, but the overall framework makes it feasible to extract gait from very much poorer quality image sequences than hitherto. This is confirmed by comparing person identification by gait using our method and a well-established baseline recognition algorithm.
Technical Considerations for Reduced Representation Bisulfite Sequencing with Multiplexed Libraries
Chatterjee, Aniruddha; Rodger, Euan J.; Stockwell, Peter A.; Weeks, Robert J.; Morison, Ian M.
2012-01-01
Reduced representation bisulfite sequencing (RRBS), which couples bisulfite conversion and next generation sequencing, is an innovative method that specifically enriches genomic regions with a high density of potential methylation sites and enables investigation of DNA methylation at single-nucleotide resolution. Recent advances in the Illumina DNA sample preparation protocol and sequencing technology have vastly improved sequencing throughput capacity. Although the new Illumina technology is now widely used, the unique challenges associated with multiplexed RRBS libraries on this platform have not been previously described. We have made modifications to the RRBS library preparation protocol to sequence multiplexed libraries on a single flow cell lane of the Illumina HiSeq 2000. Furthermore, our analysis incorporates a bioinformatics pipeline specifically designed to process bisulfite-converted sequencing reads and evaluate the output and quality of the sequencing data generated from the multiplexed libraries. We obtained an average of 42 million paired-end reads per sample for each flow-cell lane, with a high unique mapping efficiency to the reference human genome. Here we provide a roadmap of modifications, strategies, and troubleshooting approaches we implemented to optimize sequencing of multiplexed libraries on an RRBS background. PMID:23193365
Optimization of Spiral-Based Pulse Sequences for First Pass Myocardial Perfusion Imaging
Salerno, Michael; Sica, Christopher T.; Kramer, Christopher M.; Meyer, Craig H.
2010-01-01
While spiral trajectories have multiple attractive features such as their isotropic resolution, acquisition efficiency, and robustness to motion, there has been limited application of these techniques to first pass perfusion imaging because of potential off-resonance and inconsistent data artifacts. Spiral trajectories may also be less sensitive to dark-rim artifacts (DRA) that are caused, at least in part, by cardiac motion. By careful consideration of the spiral trajectory readout duration, flip angle strategy, and image reconstruction strategy, spiral artifacts can be abated to create high quality first pass myocardial perfusion images with high SNR. The goal of this paper was to design interleaved spiral pulse sequences for first-pass myocardial perfusion imaging, and to evaluate them clinically for image quality and the presence of dark-rim, blurring, and dropout artifacts. PMID:21590802
Novel primers for complete mitochondrial cytochrome b gene sequencing in mammals
Naidu, Ashwin; Fitak, Robert R.; Munguia-Vega, Adrian; Culver, Melanie
2011-01-01
Sequence-based species identification relies on the extent and integrity of sequence data available in online databases such as GenBank. When identifying species from a sample of unknown origin, partial DNA sequences obtained from the sample are aligned against existing sequences in databases. When the sequence from the matching species is not present in the database, high-scoring alignments with closely related sequences might produce unreliable results on species identity. For species identification in mammals, the cytochrome b (cyt b) gene has been identified to be highly informative; thus, large amounts of reference sequence data from the cyt b gene are much needed. To enhance availability of cyt b gene sequence data on a large number of mammalian species in GenBank and other such publicly accessible online databases, we identified a primer pair for complete cyt b gene sequencing in mammals. Using this primer pair, we successfully PCR amplified and sequenced the complete cyt b gene from 40 of 44 mammalian species representing 10 orders of mammals. We submitted 40 complete, correctly annotated, cyt b protein coding sequences to GenBank. To our knowledge, this is the first single primer pair to amplify the complete cyt b gene in a broad range of mammalian species. This primer pair can be used for the addition of new cyt b gene sequences and to enhance data available on species represented in GenBank. The availability of novel and complete gene sequences as high-quality reference data can improve the reliability of sequence-based species identification.
Genome sequencing of a single tardigrade Hypsibius dujardini individual
Arakawa, Kazuharu; Yoshida, Yuki; Tomita, Masaru
2016-01-01
Tardigrades are ubiquitous microscopic animals that play an important role in the study of metazoan phylogeny. Most terrestrial tardigrades can withstand extreme environments by entering an ametabolic desiccated state termed anhydrobiosis. Due to their small size and the non-axenic nature of laboratory cultures, molecular studies of tardigrades are prone to contamination. To minimize the possibility of microbial contaminations and to obtain high-quality genomic information, we have developed an ultra-low input library sequencing protocol to enable the genome sequencing of a single tardigrade Hypsibius dujardini individual. Here, we describe the details of our sequencing data and the ultra-low input library preparation methodologies. PMID:27529330
Genome sequencing of a single tardigrade Hypsibius dujardini individual.
Arakawa, Kazuharu; Yoshida, Yuki; Tomita, Masaru
2016-08-16
Tardigrades are ubiquitous microscopic animals that play an important role in the study of metazoan phylogeny. Most terrestrial tardigrades can withstand extreme environments by entering an ametabolic desiccated state termed anhydrobiosis. Due to their small size and the non-axenic nature of laboratory cultures, molecular studies of tardigrades are prone to contamination. To minimize the possibility of microbial contaminations and to obtain high-quality genomic information, we have developed an ultra-low input library sequencing protocol to enable the genome sequencing of a single tardigrade Hypsibius dujardini individual. Here, we describe the details of our sequencing data and the ultra-low input library preparation methodologies.
A Modified Protocol for High-Quality RNA Extraction from Oleoresin-Producing Adult Pines.
de Lima, Júlio César; Füller, Thanise Nogueira; de Costa, Fernanda; Rodrigues-Corrêa, Kelly C S; Fett-Neto, Arthur G
2016-01-01
RNA extraction resulting in good yields and quality is a fundamental step for the analyses of transcriptomes through high-throughput sequencing technologies, microarray, and also northern blots, RT-PCR, and RTqPCR. Even though many specific protocols designed for plants with high content of secondary metabolites have been developed, these are often expensive, time consuming, and not suitable for a wide range of tissues. Here we present a modification of the method previously described using the commercially available Concert™ Plant RNA Reagent (Invitrogen) buffer for field-grown adult pine trees with high oleoresin content.
Software for pre-processing Illumina next-generation sequencing short read sequences
2014-01-01
Background When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157 H7. Results Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacteria genomic short read sequences with the focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. 
In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference-based assembly as measured by assembly contiguity and correctness. Conclusions Trimming of short read sequences can improve the quality of de novo and reference-based assembly and assembler performance. The parallel processing capability of ngsShoRT reduces trimming time and improves the memory efficiency when dealing with large datasets. We recommend combining sequencing artifacts removal, and quality score based read filtering and base trimming as the most consistent method for improving sequence quality and downstream assemblies. ngsShoRT source code, user guide and tutorial are available at http://research.bioinformatics.udel.edu/genomics/ngsShoRT/. ngsShoRT can be incorporated as a pre-processing step in genome and transcriptome assembly projects. PMID:24955109
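The quality-score-based base trimming and read filtering recommended in the abstract above can be sketched as follows. This is a minimal illustration, not ngsShoRT's own code (which is written in Perl): the function names, the Phred+33 quality encoding, and the threshold of 20 are all assumptions chosen for the example.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20):
    """Remove bases from the 3' end while their quality is below min_quality.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    scores = phred33_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


def passes_mean_quality(quality_string, min_mean=20.0):
    """Keep a read only if it is non-empty and its mean quality is high enough."""
    scores = phred33_scores(quality_string)
    return bool(scores) and sum(scores) / len(scores) >= min_mean


if __name__ == "__main__":
    seq, qual = trim_3prime("ACGTACGT", "IIIII###")
    print(seq, qual, passes_mean_quality(qual))
```

Trimming before filtering means a read with a poor 3' tail but a good body is kept in shortened form rather than discarded outright.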
SILVA tree viewer: interactive web browsing of the SILVA phylogenetic guide trees.
Beccati, Alan; Gerken, Jan; Quast, Christian; Yilmaz, Pelin; Glöckner, Frank Oliver
2017-09-30
Phylogenetic trees are an important tool to study the evolutionary relationships among organisms. The huge amount of available taxa poses difficulties in their interactive visualization. This hampers the interaction with the users to provide feedback for the further improvement of the taxonomic framework. The SILVA Tree Viewer is a web application designed for visualizing large phylogenetic trees without requiring the download of any software tool or data files. The SILVA Tree Viewer is based on Web Geographic Information Systems (Web-GIS) technology with a PostgreSQL backend. It enables zoom and pan functionalities similar to Google Maps. The SILVA Tree Viewer enables access to two phylogenetic (guide) trees provided by the SILVA database: the SSU Ref NR99 inferred from high-quality, full-length small subunit sequences, clustered at 99% sequence identity and the LSU Ref inferred from high-quality, full-length large subunit sequences. The Tree Viewer provides tree navigation, search and browse tools as well as an interactive feedback system to collect any kinds of requests ranging from taxonomy to data curation and improving the tool itself.
Ba, Hengxing; Jia, Boyin; Wang, Guiwu; Yang, Yifeng; Kedem, Gilead; Li, Chunyi
2017-09-07
Sika deer are an economically valuable species owing to their use in traditional Chinese medicine, particularly their velvet antlers. Sika deer in northeast China are mostly farmed in enclosure. Therefore, genetic management of farmed sika deer would benefit from detailed knowledge of their genetic diversity. In this study, we generated over 1.45 billion high-quality paired-end reads (288 Gbp) across 42 unrelated individuals using double-digest restriction site-associated DNA sequencing (ddRAD-seq). A total of 96,188 (29.63%) putative biallelic SNP loci were identified with an average sequencing depth of 23×. Based on the analysis, we found that the majority of the loci had a deficit of heterozygotes (FIS > 0) and low values of Hobs, which could be due to inbreeding and Wahlund effects. We also developed a collection of high-quality SNP probes that will likely be useful in a variety of applications in genotyping for cervid species in the future. Copyright © 2017 Ba et al.
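The heterozygote deficit reported above (FIS > 0) compares observed heterozygosity with the value expected under Hardy-Weinberg proportions. The sketch below uses the simple per-locus form FIS = 1 - Hobs/Hexp with Hexp = 2pq; the study may well have used a different, sample-size-corrected estimator, and the function name and genotype counts are hypothetical.

```python
def inbreeding_coefficient(n_hom_ref, n_het, n_hom_alt):
    """Per-locus F_IS = 1 - H_obs / H_exp for a biallelic SNP.

    H_exp is the uncorrected Hardy-Weinberg expectation 2pq (an assumption;
    corrected estimators exist and may differ slightly).
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_hom_ref + n_het) / (2 * n)   # reference allele frequency
    h_exp = 2 * p * (1 - p)                 # expected heterozygosity
    if h_exp == 0:
        raise ValueError("locus is monomorphic; F_IS is undefined")
    h_obs = n_het / n                       # observed heterozygosity
    return 1 - h_obs / h_exp


if __name__ == "__main__":
    # Hypothetical counts for 42 individuals with few heterozygotes.
    print(inbreeding_coefficient(20, 2, 20))
```

A positive value indicates fewer heterozygotes than expected, and a value near zero indicates Hardy-Weinberg proportions.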
Ba, Hengxing; Jia, Boyin; Wang, Guiwu; Yang, Yifeng; Kedem, Gilead; Li, Chunyi
2017-01-01
Sika deer are an economically valuable species owing to their use in traditional Chinese medicine, particularly their velvet antlers. Sika deer in northeast China are mostly farmed in enclosure. Therefore, genetic management of farmed sika deer would benefit from detailed knowledge of their genetic diversity. In this study, we generated over 1.45 billion high-quality paired-end reads (288 Gbp) across 42 unrelated individuals using double-digest restriction site-associated DNA sequencing (ddRAD-seq). A total of 96,188 (29.63%) putative biallelic SNP loci were identified with an average sequencing depth of 23×. Based on the analysis, we found that the majority of the loci had a deficit of heterozygotes (FIS >0) and low values of Hobs, which could be due to inbreeding and Wahlund effects. We also developed a collection of high-quality SNP probes that will likely be useful in a variety of applications in genotyping for cervid species in the future. PMID:28751500
[cDNA library construction from panicle meristem of finger millet].
Radchuk, V; Pirko, Ia V; Isaenkov, S V; Emets, A I; Blium, Ia B
2014-01-01
The protocol for producing full-length cDNA with the SuperScript Full-Length cDNA Library Construction Kit II (Invitrogen) was tested, and a high-quality cDNA library was created from meristematic tissue of the finger millet panicle (Eleusine coracana (L.) Gaertn). The titer of the resulting cDNA library averaged 3.01 × 10^5 CFU/ml. The average cDNA insert length was about 1070 base pairs, and the insertion efficiency of cDNA fragments was 99.5%. Selected cDNA clones from the library were sequenced, and their sequences were identified by BLAST search. The library analysis and selective sequencing indicate that the inserted cDNA clones are functional and full length. The cDNA library from meristematic tissue of the finger millet panicle is a valuable source for isolating and identifying key genes that regulate metabolism and meristem development, and for mining new molecular markers for high-quality genetic studies and molecular breeding.
HAMAP in 2013, new developments in the protein family classification and annotation system
Pedruzzi, Ivo; Rivoire, Catherine; Auchincloss, Andrea H.; Coudert, Elisabeth; Keller, Guillaume; de Castro, Edouard; Baratin, Delphine; Cuche, Béatrice A.; Bougueleret, Lydie; Poux, Sylvain; Redaschi, Nicole; Xenarios, Ioannis; Bridge, Alan
2013-01-01
HAMAP (High-quality Automated and Manual Annotation of Proteins—available at http://hamap.expasy.org/) is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated annotation rules that specify annotations that apply to family members. HAMAP was originally developed to support the manual curation of UniProtKB/Swiss-Prot records describing microbial proteins. Here we describe new developments in HAMAP, including the extension of HAMAP to eukaryotic proteins, the use of HAMAP in the automated annotation of UniProtKB/TrEMBL, providing high-quality annotation for millions of protein sequences, and the future integration of HAMAP into a unified system for UniProtKB annotation, UniRule. HAMAP is continuously updated by expert curators with new family profiles and annotation rules as new protein families are characterized. The collection of HAMAP family classification profiles and annotation rules can be browsed and viewed on the HAMAP website, which also provides an interface to scan user sequences against HAMAP profiles. PMID:23193261
Review of functional markers for improving cooking, eating, and the nutritional qualities of rice
Lau, Wendy C. P.; Rafii, Mohd Y.; Ismail, Mohd R.; Puteh, Adam; Latif, Mohammad A.; Ramli, Asfaliza
2015-01-01
After yield, quality is one of the most important aspects of rice breeding. Preference for rice quality varies among cultures and regions; therefore, rice breeders have to tailor the quality according to the preferences of local consumers. Rice quality assessment requires routine chemical analysis procedures. The advancement of molecular marker technology has revolutionized the strategy in breeding programs. The availability of rice genome sequences and the use of forward and reverse genetics approaches facilitate gene discovery and the deciphering of gene functions. A well-characterized gene is the basis for the development of functional markers, which play an important role in plant genotyping and, in particular, marker-assisted breeding. In addition, functional markers offer advantages that counteract the limitations of random DNA markers. Some functional markers have been applied in marker-assisted breeding programs and have successfully improved rice quality to meet local consumers’ preferences. Although functional markers offer a plethora of advantages over random genetic markers, the development and application of functional markers should be conducted with care. The decreasing cost of sequencing will enable more functional markers for rice quality improvement to be developed, and application of these markers in rice quality breeding programs is highly anticipated. PMID:26528304
Hazes, Bart
2014-02-28
Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.
High-speed imaging using 3CCD camera and multi-color LED flashes
NASA Astrophysics Data System (ADS)
Hijazi, Ala; Friedl, Alexander; Cierpka, Christian; Kähler, Christian; Madhavan, Vis
2017-11-01
This paper demonstrates the possibility of capturing full-resolution, high-speed image sequences using a regular 3CCD color camera in conjunction with high-power light emitting diodes of three different colors. This is achieved using a novel approach, referred to as spectral-shuttering, where a high-speed image sequence is captured using short duration light pulses of different colors that are sent consecutively in very close succession. The work presented in this paper demonstrates the feasibility of configuring a high-speed camera system using low cost and readily available off-the-shelf components. This camera can be used for recording six-frame sequences at frame rates up to 20 kHz or three-frame sequences at even higher frame rates. Both color crosstalk and spatial matching between the different channels of the camera are found to be within acceptable limits. A small amount of magnification difference between the different channels is found and a simple calibration procedure for correcting the images is introduced. The images captured using the approach described here are of good quality to be used for obtaining full-field quantitative information using techniques such as digital image correlation and particle image velocimetry. A sequence of six high-speed images of a bubble splash recorded at 400 Hz is presented as a demonstration.
Ning, Yi; Li, Yan-Ling; Zhou, Guo-Ying; Yang, Lu-Cun; Xu, Wen-Hua
2016-04-01
High-throughput sequencing, also called next-generation sequencing (NGS), can sequence hundreds of thousands of sequences from different samples at the same time. In the present study, culture-independent high-throughput sequencing was applied to the fungal internal transcribed spacer 1 (ITS1) region of metagenomic DNA from the root of Sinopodophyllum hexandrum. After quality control, 22,565 reads remained. Clustering at 97% sequence similarity yielded 517 OTUs for the three samples (LD1, LD2 and LD3). Using the RDP classifier with a classification threshold of 0.8, the fungi identified from the OTU reads were assigned to 13 classes, 35 orders, 44 families and 55 genera. Among these, Tetracladium was the dominant genus in all samples (35.49%, 68.55% and 12.96%). The Shannon diversity indices and the Simpson indices of the endophytic fungi in the samples ranged from 1.75 to 2.92 and from 0.11 to 0.32, respectively. This is the first application of high-throughput sequencing to analyze the community composition and diversity of endophytic fungi in the medicinal plant, and the results show high diversity and a complex community composition of endophytic fungi in the root of S. hexandrum. The results also show that high-throughput sequencing has a great advantage for analyzing the community composition and diversity of endophytes in plants. Copyright© by the Chinese Pharmaceutical Association.
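The two diversity indices quoted above can be computed from per-OTU read counts as sketched below. The abstract does not state which variants were used, so the natural-log Shannon index and the dominance form of the Simpson index (sum of squared proportions) are assumptions, as are the function names and the example counts.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i), using the natural log (assumed)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads supplied")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Simpson dominance D = sum(p_i ** 2); lower values mean higher diversity."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads supplied")
    return sum((c / total) ** 2 for c in counts)


if __name__ == "__main__":
    otu_counts = [25, 25, 25, 25]  # hypothetical read counts for four OTUs
    print(shannon_index(otu_counts), simpson_index(otu_counts))
```

For a perfectly even community of S OTUs, the Shannon index equals ln(S) and the Simpson dominance equals 1/S, which gives a quick sanity check on any implementation.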
Hausmann, Bela; Pjevac, Petra; Schreck, Katharina; Herbold, Craig W; Daims, Holger; Wagner, Michael; Loy, Alexander
2018-01-25
The facultative anaerobic chemoorganoheterotrophic alphaproteobacterium Telmatospirillum siberiense 26-4b1 was isolated from a Siberian peatland. We report here a 6.20-Mbp near-complete high-quality draft genome sequence of T. siberiense that reveals expected and novel metabolic potential for the genus Telmatospirillum, including genes for sulfur oxidation. Copyright © 2018 Hausmann et al.
Ekrem, Torbjørn; Stur, Elisabeth
2017-01-01
Chironomidae (Diptera) pupal exuviae samples are commonly used for biological monitoring of aquatic habitats. DNA barcoding has proved useful for species identification of chironomid life stages containing cellular tissue, but the barcoding success of chironomid pupal exuviae is unknown. We assessed whether standard DNA barcoding could be efficiently used for species identification of chironomid pupal exuviae when compared with morphological techniques and if there were differences in performance between temperate and tropical ecosystems, subfamilies, and tribes. PCR, sequence, and identification success differed significantly between geographic regions and taxonomic groups. For Norway, 27 of 190 (14.2%) pupal exuviae resulted in high-quality chironomid sequences that matched species. For Costa Rica, 69 of 190 (36.3%) pupal exuviae resulted in high-quality sequences, but none matched known species. Standard DNA barcoding of chironomid pupal exuviae had limited success in species identification of unknown specimens due to contamination and the lack of matching references in available barcode libraries, especially from Costa Rica. Therefore, we recommend that future biodiversity studies focusing on understudied regions use morphological and molecular identification techniques simultaneously to identify all life stages of chironomids and populate the barcode reference library with identified sequences.
High-resolution MRI of cranial nerves in posterior fossa at 3.0 T.
Guo, Zi-Yi; Chen, Jing; Liang, Qi-Zhou; Liao, Hai-Yan; Cheng, Qiong-Yue; Fu, Shui-Xi; Chen, Cai-Xiang; Yu, Dan
2013-02-01
To evaluate the influence of high-resolution imaging obtainable with the higher field strength of 3.0 T on the visualization of the brain nerves in the posterior fossa. In total, 20 nerves were investigated on MRI of 12 volunteers each and selected for comparison, respectively, with the FSE sequences with 5 mm and 2 mm section thicknesses and gradient recalled echo (GRE) sequences acquired with a 3.0-T scanner. The MR images were evaluated by three independent readers who rated image quality according to depiction of anatomic detail and contrast with use of a rating scale. In general, decrease of the slice thickness showed a significant increase in the detection of nerves as well as in the image quality characteristics. Comparing FSE and GRE imaging, the course of brain nerves and brainstem vessels was visualized best with use of the three-dimensional (3D) pulse sequence. The comparison revealed the clear advantage of a thin section. The increased resolution enabled immediate identification of all brainstem nerves. GRE sequence most distinctly and confidently depicted pertinent structures and enables 3D reconstruction to illustrate complex relations of the brainstem. Copyright © 2013 Hainan Medical College. Published by Elsevier B.V. All rights reserved.
Lejzerowicz, Franck; Esling, Philippe; Pillet, Loïc; Wilding, Thomas A.; Black, Kenneth D.; Pawlowski, Jan
2015-01-01
Environmental diversity surveys are crucial for the bioassessment of anthropogenic impacts on marine ecosystems. Traditional benthic monitoring relying on morphotaxonomic inventories of macrofaunal communities is expensive, time-consuming and expertise-demanding. High-throughput sequencing of environmental DNA barcodes (metabarcoding) offers an alternative to describe biological communities. However, whether the metabarcoding approach meets the quality standards of benthic monitoring remains to be tested. Here, we compared morphological and eDNA/RNA-based inventories of metazoans from samples collected at 10 stations around a fish farm in Scotland, including near-cage and distant zones. For each of 5 replicate samples per station, we sequenced the V4 region of the 18S rRNA gene using the Illumina technology. After filtering, we obtained 841,766 metazoan sequences clustered in 163 Operational Taxonomic Units (OTUs). We assigned the OTUs by combining local BLAST searches with phylogenetic analyses. We calculated two commonly used indices: the Infaunal Trophic Index and the AZTI Marine Biotic Index. We found that the molecular data faithfully reflect the morphology-based indices and provide an equivalent assessment of the impact associated with fish farm activities. We advocate that future benthic monitoring should integrate metabarcoding as a rapid and accurate tool for the evaluation of the quality of marine benthic ecosystems. PMID:26355099
A Method for Preparing DNA Sequencing Templates Using a DNA-Binding Microplate
Yang, Yu; Hebron, Haroun R.; Hang, Jun
2009-01-01
A DNA-binding matrix was immobilized on the surface of a 96-well microplate and used for plasmid DNA preparation for DNA sequencing. The same DNA-binding plate was used for bacterial growth, cell lysis, DNA purification, and storage. In a single step using one buffer, bacterial cells were lysed by enzymes, and released DNA was captured on the plate simultaneously. After two wash steps, DNA was eluted and stored in the same plate. Inclusion of phosphates in the culture medium was found to enhance the yield of plasmid significantly. Purified DNA samples were used successfully in DNA sequencing with high consistency and reproducibility. Eleven vectors and nine libraries were tested using this method. In 10 μl sequencing reactions using 3 μl sample and 0.25 μl BigDye Terminator v3.1, the results from a 3730xl sequencer gave a success rate of 90–95% and read-lengths of 700 bases or more. The method is fully automatable and convenient for manual operation as well. It enables reproducible, high-throughput, rapid production of DNA with purity and yields sufficient for high-quality DNA sequencing at a substantially reduced cost. PMID:19568455
Cardiac cine imaging at 3 Tesla: initial experience with a 32-element body-array coil.
Fenchel, Michael; Deshpande, Vibhas S; Nael, Kambiz; Finn, J Paul; Miller, Stephan; Ruehm, Stefan; Laub, Gerhard
2006-08-01
We sought to assess the feasibility of cardiac cine imaging and evaluate image quality at 3 T using a body-array coil with 32 coil elements. Eight healthy volunteers (3 men; median age 29 years) were examined on a 3-T magnetic resonance scanner (Magnetom Trio, Siemens Medical Solutions) using a 32-element phased-array coil (prototype from In vivo Corp.). Gradient-recalled-echo (GRE) cine (GRAPPAx3), GRE cine with tagging lines, steady-state-free-precession (SSFP) cine (GRAPPAx3 and x4), and SSFP cine (TSENSEx4 and x6) images were acquired in short-axis and 4-chamber view. Reference images with identical scan parameters were acquired using the total-imaging-matrix (Tim) coil system with a total of 12 coil elements. Images were assessed by 2 observers in a consensus reading with regard to image quality, noise and presence of artifacts. Furthermore, signal-to-noise values were determined in phantom measurements. In phantom measurements signal-to-noise values were increased by 115-155% for the various cine sequences using the 32-element coil. Scoring of image quality yielded statistically significant increased image quality with the SSFP-GRAPPAx4, SSFP-TSENSEx4, and SSFP-TSENSEx6 sequence using the 32-element coil (P < 0.05). Similarly, scoring of image noise yielded a statistically significant lower noise rating with the SSFP-GRAPPAx4, GRE-GRAPPAx3, SSFP-TSENSEx4, and SSFP-TSENSEx6 sequence using the 32-element coil (P < 0.05). This study shows that cardiac cine imaging at 3 T using a 32-element body-array coil is feasible in healthy volunteers. Using a large number of coil elements with a favorable sensitivity profile supports faster image acquisition, with high diagnostic image quality even for high parallel imaging factors.
ALLPATHS: Assembling Large Genomes with Short Illumina Reads
Gnerre, Sante
2018-02-06
Sante Gnerre from the Broad Institute speaks on the challenge of developing high quality assemblies of large genomes using short reads at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
Motion adaptive Kalman filter for super-resolution
NASA Astrophysics Data System (ADS)
Richter, Martin; Nasse, Fabian; Schröder, Hartmut
2011-01-01
Superresolution is a sophisticated strategy to enhance image quality of both low and high resolution video, performing tasks like artifact reduction, scaling and sharpness enhancement in one algorithm, all of them reconstructing high frequency components (above Nyquist frequency) in some way. Especially recursive superresolution algorithms can fulfill high quality aspects because they control the video output using a feed-back loop and adapt the result in the next iteration. In addition to excellent output quality, temporal recursive methods are very hardware efficient and therefore even attractive for real-time video processing. A very promising approach is the utilization of Kalman filters as proposed by Farsiu et al. Reliable motion estimation is crucial for the performance of superresolution. Therefore, robust global motion models are mainly used, but this also limits the application of superresolution algorithm. Thus, handling sequences with complex object motion is essential for a wider field of application. Hence, this paper proposes improvements by extending the Kalman filter approach using motion adaptive variance estimation and segmentation techniques. Experiments confirm the potential of our proposal for ideal and real video sequences with complex motion and further compare its performance to state-of-the-art methods like trainable filters.
Herbold, Craig W.; Pelikan, Claus; Kuzyk, Orest; Hausmann, Bela; Angel, Roey; Berry, David; Loy, Alexander
2015-01-01
High throughput sequencing of phylogenetic and functional gene amplicons provides tremendous insight into the structure and functional potential of complex microbial communities. Here, we introduce a highly adaptable and economical PCR approach to barcoding and pooling libraries of numerous target genes. In this approach, we replace gene- and sequencing platform-specific fusion primers with general, interchangeable barcoding primers, enabling nearly limitless customized barcode-primer combinations. Compared to barcoding with long fusion primers, our multiple-target gene approach is more economical because it overall requires lower number of primers and is based on short primers with generally lower synthesis and purification costs. To highlight our approach, we pooled over 900 different small-subunit rRNA and functional gene amplicon libraries obtained from various environmental or host-associated microbial community samples into a single, paired-end Illumina MiSeq run. Although the amplicon regions ranged in size from approximately 290 to 720 bp, we found no significant systematic sequencing bias related to amplicon length or gene target. Our results indicate that this flexible multiplexing approach produces large, diverse, and high quality sets of amplicon sequence data for modern studies in microbial ecology. PMID:26236305
Comparative study of fat-suppression techniques for hip arthroplasty MR imaging.
Molière, Sébastien; Dillenseger, Jean-Philippe; Ehlinger, Matthieu; Kremer, Stéphane; Bierry, Guillaume
2017-09-01
The goal of this study was to evaluate different fat-suppressed fluid-sensitive sequences in association with different metal artifact reduction techniques (MARS) to determine which combination allows better fat suppression around metallic hip implants. An experimental study using an MRI fat-water phantom quantitatively evaluated the contrast shift induced by a metallic hip implant for different fat-suppression techniques and MARS. A clinical study of patients referred to the MRI unit for painful hip prostheses then compared these techniques in terms of fat suppression quality and diagnostic confidence. Among sequences without MARS, both T2 Dixon and short tau inversion recovery (STIR) had significantly lower contrast shift (p < 0.05), with Dixon offering the best fat suppression. Adding MARS (view-angle tilting or slice-encoding for metal artifact correction (SEMAC)) to STIR gave better results than Dixon alone, and also better than SPAIR and fat saturation with MARS (p < 0.05). There were no statistically significant differences between STIR with view-angle tilting and STIR with SEMAC in terms of fat suppression quality. The STIR sequence is the preferred fluid-sensitive MR sequence in patients with metal implants. In combination with MARS (view-angle tilting or SEMAC), STIR appears to be the best option for high-quality fat suppression.
Lisi, Simonetta; Chirichella, Michele; Arisi, Ivan; Goracci, Martina; Cremisi, Federico; Cattaneo, Antonino
2017-01-01
Antibody libraries are important resources to derive antibodies to be used for a wide range of applications, from structural and functional studies to intracellular protein interference studies to developing new diagnostics and therapeutics. Whatever the goal, the key parameter for an antibody library is its complexity (also known as diversity), i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody of sufficiently high affinity against a given antigen. Quantitative evaluation of antibody library complexity and quality has long been inadequately addressed, due to the high similarity and length of the sequences of the library. Complexity was usually inferred from the transformation efficiency and tested by fingerprinting and/or sequencing of a few hundred random library elements. Inferring complexity from such a small sample is, however, very rudimentary and gives limited information about the real diversity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to tackle the assessment of antibody library complexity and quality. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable estimate of antibody library complexity, we present a new PCR-free NGS approach for sequencing antibody libraries on the Illumina platform, coupled with a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that reliably estimates complexity while taking sequencing error into account. PMID:28505201
Fantini, Marco; Pandolfini, Luca; Lisi, Simonetta; Chirichella, Michele; Arisi, Ivan; Terrigno, Marco; Goracci, Martina; Cremisi, Federico; Cattaneo, Antonino
2017-01-01
Antibody libraries are important resources to derive antibodies to be used for a wide range of applications, from structural and functional studies to intracellular protein interference studies to developing new diagnostics and therapeutics. Whatever the goal, the key parameter for an antibody library is its complexity (also known as diversity), i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody of sufficiently high affinity against a given antigen. Quantitative evaluation of antibody library complexity and quality has long been inadequately addressed, due to the high similarity and length of the sequences of the library. Complexity was usually inferred from the transformation efficiency and tested by fingerprinting and/or sequencing of a few hundred random library elements. Inferring complexity from such a small sample is, however, very rudimentary and gives limited information about the real diversity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to tackle the assessment of antibody library complexity and quality. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable estimate of antibody library complexity, we present a new PCR-free NGS approach for sequencing antibody libraries on the Illumina platform, coupled with a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that reliably estimates complexity while taking sequencing error into account.
Compression of computer generated phase-shifting hologram sequence using AVC and HEVC
NASA Astrophysics Data System (ADS)
Xing, Yafei; Pesquet-Popescu, Béatrice; Dufaux, Frederic
2013-09-01
With the capability of achieving twice the compression ratio of Advanced Video Coding (AVC) at similar reconstruction quality, High Efficiency Video Coding (HEVC) is expected to become the new leading technique of video coding. In order to reduce the storage and transmission burden of digital holograms, in this paper we propose to use HEVC for compressing phase-shifting digital hologram sequences (PSDHS). By simulating phase-shifting digital holography (PSDH) interferometry, interference patterns between illuminated three-dimensional (3D) virtual objects and the stepwise phase-changed reference wave are generated as digital holograms. The hologram sequences are obtained by moving the virtual objects and are compressed by AVC and HEVC. The experimental results show that AVC and HEVC compress PSDHS efficiently, with HEVC giving better performance. Good compression rate and reconstruction quality can be obtained with bitrates above 15000 kbps.
ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data
2012-01-01
Background Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results. Results Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and do a better job of discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy. Conclusion ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration. PMID:22946927
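The link between a base quality score and the probability of a sequencing error follows the standard Phred definition, Q = -10 log10(p). The sketch below shows only that conversion and a simple empirical check of reported scores; it is not ReQON's logistic regression model, and the function names and the smoothing used are illustrative assumptions.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10.0 ** (-q / 10.0)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability to a Phred quality score."""
    return -10.0 * math.log10(p)


def empirical_quality(mismatches: int, bases: int) -> float:
    """Phred-scaled observed error rate for bases sharing one reported score.

    The +1/+2 smoothing (an illustrative choice) avoids log10(0) when a bin
    contains no mismatches.
    """
    return error_prob_to_phred((mismatches + 1) / (bases + 2))


print(phred_to_error_prob(30))
print(error_prob_to_phred(0.01))
print(empirical_quality(99, 99998))
```

Comparing the reported score of a bin with its empirical quality is one simple way to see whether scores are over- or under-confident.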
A HIGH COVERAGE GENOME SEQUENCE FROM AN ARCHAIC DENISOVAN INDIVIDUAL
Meyer, Matthias; Kircher, Martin; Gansauge, Marie-Theres; Li, Heng; Racimo, Fernando; Mallick, Swapan; Schraiber, Joshua G.; Jay, Flora; Prüfer, Kay; de Filippo, Cesare; Sudmant, Peter H.; Alkan, Can; Fu, Qiaomei; Do, Ron; Rohland, Nadin; Tandon, Arti; Siebauer, Michael; Green, Richard E.; Bryc, Katarzyna; Briggs, Adrian W.; Stenzel, Udo; Dabney, Jesse; Shendure, Jay; Kitzman, Jacob; Hammer, Michael F.; Shunkov, Michael V.; Derevianko, Anatoli P.; Patterson, Nick; Andrés, Aida M.; Eichler, Evan E.; Slatkin, Montgomery; Reich, David; Kelso, Janet; Pääbo, Svante
2013-01-01
We present a DNA library preparation method that has allowed us to reconstruct a high coverage (30X) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of “missing evolution” in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans. PMID:22936568
Pightling, Arthur W.; Petronella, Nicholas; Pagotto, Franco
2014-01-01
The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers should test a variety of conditions to achieve optimal results. PMID:25144537
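Fold coverage, as in the 50-fold threshold reported above, is conventionally the total number of sequenced bases divided by the genome length. The sketch below applies that definition; the genome size and read length in the example are hypothetical round numbers, not values taken from the study.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average fold coverage: total sequenced bases divided by genome length."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


genome_size = 3_000_000  # hypothetical bacterial genome length in bp
read_length = 100        # hypothetical read length in bp
n = reads_needed(50, read_length, genome_size)
print(n, mean_coverage(n, read_length, genome_size))
```

This is an average only; real coverage varies along the genome, so a nominal 50-fold run will still leave some positions covered more thinly.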
Schulz, Jenni; P Marques, José; Ter Telgte, Annemieke; van Dorst, Anouk; de Leeuw, Frank-Erik; Meijer, Frederick J A; Norris, David G
2018-01-01
As a single-shot sequence with a long train of refocusing pulses, Half-Fourier Acquisition Single-Shot Turbo-Spin-Echo (HASTE) suffers from high power deposition limiting use at high resolutions and high field strengths, particularly if combined with acceleration techniques such as simultaneous multi-slice (SMS) imaging. Using a combination of multiband (MB)-excitation and PINS-refocusing pulses will effectively accelerate the acquisition time while staying within the SAR limitations. In particular, uncooperative and young patients will profit from the speed of the MB-PINS HASTE sequence, as clinical diagnosis can be possible without sedation. Materials and Methods MB-excitation and PINS-refocusing pulses were incorporated into a HASTE-sequence with blipped CAIPIRINHA and TRAPS including an internal FLASH reference scan for online reconstruction. Whole brain MB-PINS HASTE data were acquired on a Siemens 3T-Prisma system from 10 individuals and compared to a clinical HASTE protocol. Results The proposed MB-PINS HASTE protocol accelerates the acquisition by about a factor of 2 compared to the clinical HASTE. The diagnostic image quality proved to be comparable for both sequences for the evaluation of the overall aspect of the brain, the detection of white matter changes and areas of tissue loss, and for the evaluation of the CSF spaces although artifacts were more frequently encountered with MB-PINS HASTE. Conclusions MB-PINS HASTE enables acquisition of slice accelerated highly T2-weighted images and provides good diagnostic image quality while reducing acquisition time. Copyright © 2017 Elsevier B.V. All rights reserved.
Didelot, Audrey; Kotsopoulos, Steve K; Lupo, Audrey; Pekin, Deniz; Li, Xinyu; Atochin, Ivan; Srinivasan, Preethi; Zhong, Qun; Olson, Jeff; Link, Darren R; Laurent-Puig, Pierre; Blons, Hélène; Hutchison, J Brian; Taly, Valerie
2013-05-01
Assessment of DNA integrity and quantity remains a bottleneck for high-throughput molecular genotyping technologies, including next-generation sequencing. In particular, DNA extracted from paraffin-embedded tissues, a major potential source of tumor DNA, varies widely in quality, leading to unpredictable sequencing data. We describe a picoliter droplet-based digital PCR method that enables simultaneous detection of DNA integrity and the quantity of amplifiable DNA. Using a multiplex assay, we detected 4 different target lengths (78, 159, 197, and 550 bp). Assays were validated with human genomic DNA fragmented to sizes of 170 bp to 3000 bp. The technique was validated with DNA quantities as low as 1 ng. We evaluated 12 DNA samples extracted from paraffin-embedded lung adenocarcinoma tissues. One sample contained no amplifiable DNA. The fractions of amplifiable DNA for the 11 other samples were between 0.05% and 10.1% for 78-bp fragments and ≤1% for longer fragments. Four samples were chosen for enrichment and next-generation sequencing. The quality of the sequencing data was in agreement with the results of the DNA-integrity test. Specifically, DNA with low integrity yielded sequencing results with lower levels of coverage and uniformity and had higher levels of false-positive variants. The development of DNA-quality assays will enable researchers to downselect samples or process more DNA to achieve reliable genome sequencing with the highest possible efficiency of cost and effort, as well as minimize the waste of precious samples. © 2013 American Association for Clinical Chemistry.
2014-01-01
Background Variation in seed oil composition and content among soybean varieties is largely attributed to differences in transcript sequences and/or transcript accumulation of oil production related genes in seeds. Discovery and analysis of sequence and expression variations in these genes will accelerate soybean oil quality improvement. Results In an effort to identify these variations, we sequenced the transcriptomes of soybean seeds from nine lines varying in oil composition and/or total oil content. Our results showed that 69,338 distinct transcripts from 32,885 annotated genes were expressed in seeds. A total of 8,037 transcript expression polymorphisms and 50,485 transcript sequence polymorphisms (48,792 SNPs and 1,693 small Indels) were identified among the lines. Effects of the transcript polymorphisms on their encoded protein sequences and functions were predicted. The studies also provided independent evidence that the lack of FAD2-1A gene activity and a non-synonymous SNP in the coding sequence of FAB2C caused elevated oleic acid and stearic acid levels in soybean lines M23 and FAM94-41, respectively. Conclusions As a proof-of-concept, we developed an integrated RNA-seq and bioinformatics approach to identify and functionally annotate transcript polymorphisms, and demonstrated its high effectiveness for discovery of genetic and transcript variations that result in altered oil quality traits. The collection of transcript polymorphisms coupled with their predicted functional effects will be a valuable asset for further discovery of genes, gene variants, and functional markers to improve soybean oil quality. PMID:24755115
Enders, Judith; Rief, Matthias; Zimmermann, Elke; Asbach, Patrick; Diederichs, Gerd; Wetz, Christoph; Siebert, Eberhard; Wagner, Moritz; Hamm, Bernd; Dewey, Marc
2013-01-01
The purpose of the present study was to compare the image quality of spinal magnetic resonance (MR) imaging performed on a high-field horizontal open versus a short-bore MR scanner in a randomized controlled study setup. Altogether, 93 (80% women, mean age 53) consecutive patients underwent spine imaging after random assignment to a 1-T horizontal open MR scanner with a vertical magnetic field or a 1.5-T short-bore MR scanner. This patient subset was part of a larger cohort. Image quality was assessed by determining qualitative parameters, signal-to-noise (SNR) and contrast-to-noise ratios (CNR), and quantitative contour sharpness. The image quality parameters were higher for short-bore MR imaging. Regarding all sequences, the relative differences were 39% for the mean overall qualitative image quality, 53% for the mean SNR values, and 34-37% for the quantitative contour sharpness (P<0.0001). The CNR values were also higher for images obtained with the short-bore MR scanner. No sequence was of very poor (nondiagnostic) image quality. Scanning times were significantly longer for examinations performed on the open MR scanner (mean: 32±22 min versus 20±9 min; P<0.0001). In this randomized controlled comparison of spinal MR imaging with an open versus a short-bore scanner, short-bore MR imaging revealed considerably higher image quality with shorter scanning times. ClinicalTrials.gov NCT00715806.
Zimmermann, Elke; Asbach, Patrick; Diederichs, Gerd; Wetz, Christoph; Siebert, Eberhard; Wagner, Moritz; Hamm, Bernd; Dewey, Marc
2013-01-01
Background The purpose of the present study was to compare the image quality of spinal magnetic resonance (MR) imaging performed on a high-field horizontal open versus a short-bore MR scanner in a randomized controlled study setup. Methods Altogether, 93 (80% women, mean age 53) consecutive patients underwent spine imaging after random assignment to a 1-T horizontal open MR scanner with a vertical magnetic field or a 1.5-T short-bore MR scanner. This patient subset was part of a larger cohort. Image quality was assessed by determining qualitative parameters, signal-to-noise (SNR) and contrast-to-noise ratios (CNR), and quantitative contour sharpness. Results The image quality parameters were higher for short-bore MR imaging. Regarding all sequences, the relative differences were 39% for the mean overall qualitative image quality, 53% for the mean SNR values, and 34–37% for the quantitative contour sharpness (P<0.0001). The CNR values were also higher for images obtained with the short-bore MR scanner. No sequence was of very poor (nondiagnostic) image quality. Scanning times were significantly longer for examinations performed on the open MR scanner (mean: 32±22 min versus 20±9 min; P<0.0001). Conclusions In this randomized controlled comparison of spinal MR imaging with an open versus a short-bore scanner, short-bore MR imaging revealed considerably higher image quality with shorter scanning times. Trial Registration ClinicalTrials.gov NCT00715806 PMID:24391767
Tranchida-Lombardo, Valentina; Aiese Cigliano, Riccardo; Anzar, Irantzu; Landi, Simone; Palombieri, Samuela; Colantuono, Chiara; Bostan, Hamed; Termolino, Pasquale; Aversano, Riccardo; Batelli, Giorgia; Cammareri, Maria; Carputo, Domenico; Chiusano, Maria Luisa; Conicella, Clara; Consiglio, Federica; D'Agostino, Nunzio; De Palma, Monica; Di Matteo, Antonio; Grandillo, Silvana; Sanseverino, Walter; Tucci, Marina; Grillo, Stefania
2017-11-14
Tomato is a high value crop and the primary model for fleshy fruit development and ripening. Breeding priorities include increased fruit quality, shelf life and tolerance to stresses. To contribute towards this goal, we re-sequenced the genomes of Corbarino (COR) and Lucariello (LUC) landraces, which both possess the traits of plant adaptation to water deficit, prolonged fruit shelf-life and good fruit quality. Through the newly developed pipeline Reconstructor, we generated the genome sequences of COR and LUC using datasets of 65.8 M and 56.4 M of 30-150 bp paired-end reads, respectively. New contigs including reads that could not be mapped to the tomato reference genome were assembled, and a total of 43,054 and 44,579 gene loci were annotated in COR and LUC. Both genomes showed novel regions with similarity to Solanum pimpinellifolium and Solanum pennellii. In addition to small deletions and insertions, 2,000 and 1,700 single nucleotide polymorphisms (SNPs) could exert potentially disruptive effects on 1,371 and 1,201 genes in COR and LUC, respectively. A detailed survey of the SNPs occurring in fruit quality, shelf life and stress tolerance related-genes identified several candidates of potential relevance. Variations in ethylene response components may concur in determining peculiar phenotypes of COR and LUC. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Vacca, Davide; Cancila, Valeria; Gulino, Alessandro; Lo Bosco, Giosuè; Belmonte, Beatrice; Di Napoli, Arianna; Florena, Ada Maria; Tripodo, Claudio; Arancio, Walter
2018-02-01
The MinION is a miniaturized high-throughput next generation sequencing platform of novel conception. The use of nucleic acids derived from formalin-fixed paraffin-embedded samples is highly desirable, but their adoption for molecular assays is hindered by the high degree of fragmentation and by the chemically induced mutations stemming from the fixation protocols. In order to investigate the suitability of MinION sequencing on formalin-fixed paraffin-embedded samples, the presence and frequency of the BRAF c.1799T > A mutation was investigated in two archival tissue specimens of Hairy cell leukemia and Hairy cell leukemia Variant. Despite the poor quality of the starting DNA, the BRAF mutation was successfully detected in the Hairy cell leukemia sample, with around 50% of the reads obtained within 2 h of the sequencing start. Notably, the mutational burden of the Hairy cell leukemia sample as derived from nanopore sequencing proved to be comparable to that obtained with a sensitive method for the detection of point mutations, namely Digital PCR, using a validated assay. Nanopore sequencing can be adopted for targeted sequencing of genetic lesions on critical DNA samples such as those extracted from archival routine formalin-fixed paraffin-embedded samples. This result suggests that nanopore sequencing could be reliably adopted for the real-time targeted sequencing of genetic lesions. Our report opens the window for the adoption of nanopore sequencing in molecular pathology for research and diagnostics.
Rapid evaluation and quality control of next generation sequencing data with FaQCs.
Lo, Chien-Chi; Chain, Patrick S G
2014-11-19
Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.
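Because read quality often declines toward the end of a read, a common first step in quality control is to trim low-quality bases from the 3' end. The sketch below shows one simple way to do this for Phred+33 encoded FASTQ quality strings. It is an illustrative assumption, not the trimming algorithm implemented in FaQCs, and the threshold of 20 is an arbitrary example value.

```python
def decode_phred33(quality: str) -> list:
    """Decode a Phred+33 encoded FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20) -> tuple:
    """Remove trailing bases whose quality is below min_q.

    Trimming stops at the first base (scanning from the 3' end) that meets
    the threshold; earlier low-quality bases are left untouched.
    """
    scores = decode_phred33(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(trim_3prime("ACGTACGT", "IIIII###"))
```

Production tools typically use more robust criteria, such as sliding windows or running-sum trimming, so that a single good base near the end does not stop the trim.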
Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R. Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M.; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas
2011-01-01
Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi. PMID:21949797
Fast and accurate de novo genome assembly from long uncorrected reads
Vaser, Robert; Sović, Ivan; Nagarajan, Niranjan
2017-01-01
The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment–based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster. PMID:28100585
High-Throughput Sequencing and De Novo Assembly of the Isatis indigotica Transcriptome
Tang, Xiaoqing; Xiao, Yunhua; Lv, Tingting; Wang, Fangquan; Zhu, QianHao; Zheng, Tianqing; Yang, Jie
2014-01-01
Background Isatis indigotica, the source of the traditional Chinese medicine Radix isatidis (Ban-Lan-Gen), is an extremely important economic crop in China. To facilitate biological, biochemical and molecular research on the medicinal chemicals in I. indigotica, here we report the first I. indigotica transcriptome generated by RNA sequencing (RNA-seq). Results An RNA-seq library was created using RNA extracted from a mixed sample including leaf and root. A total of 33,238 unigenes were assembled from more than 28 million high-quality short reads. The quality of the assembly was experimentally examined by cDNA sequencing of seven randomly selected unigenes. Based on BLAST searches, 28,184 unigenes had a hit in at least one of the protein and nucleotide databases used in this study, and 8 unigenes were found to be associated with biosynthesis of indole and its derivatives. According to Gene Ontology classification, 22,365 unigenes were categorized into 48 functional groups. Furthermore, Clusters of Orthologous Groups and Swiss-Prot annotations were assigned for 7,707 and 18,679 unigenes, respectively. Analysis of repeat motifs identified 6,400 simple sequence repeat markers in 4,509 unigenes. Conclusion Our data provide a comprehensive sequence resource for molecular study of I. indigotica. Our results will facilitate studies on the functions of genes involved in the indole alkaloid biosynthesis pathway and on metabolism of nitrogen and indole alkaloids in I. indigotica and its related species. PMID:25259890
Santagati, Vito Davide; Sestili, Francesco; Lafiandra, Domenico; D'Ovidio, Renato; Rogniaux, Helene; Masci, Stefania
2016-07-01
Wheat high molecular weight glutenin subunit variation is important because of its great influence on glutenin polymer structure, which is related to dough technological properties. Among the different subunits, the pair Bx20 and By20 is known to have a negative effect on quality, but the reasons are not clear: Bx20 has two cysteines, which theoretically make this subunit a chain extender of the glutenin polymer, just like the other Bx subunits, which have four cysteines, two of which should be involved in intra-molecular disulfide bonds. By20 has so far never been characterized at the molecular level. Here we report the nucleotide sequences of Bx20 and By20 genes isolated from the durum wheat cultivar 'Lira 45' and the validation of the corresponding deduced amino acid sequences by using MALDI-TOF and LC-MS/MS. Four nucleotide differences were identified in the Bx20 gene with respect to the deduced sequence present in NCBI, causing two amino acid substitutions. For the By20 subunit, nucleotide and amino acid sequences revealed a great similarity to By15, both at gene and protein levels, showing five nucleotide changes generating two amino acid differences. No evidence of post-translational modifications has been found. Hypotheses are formulated in regard to relationships with technological quality. Copyright © 2016 John Wiley & Sons, Ltd.
Sho, Shonan; Court, Colin M; Winograd, Paul; Lee, Sangjun; Hou, Shuang; Graeber, Thomas G; Tseng, Hsian-Rong; Tomlinson, James S
2017-07-01
Sequencing analysis of circulating tumor cells (CTCs) enables "liquid biopsy" to guide precision oncology strategies. However, this requires low-template whole genome amplification (WGA) that is prone to errors and biases from uneven amplifications. Currently, quality control (QC) methods for WGA products, as well as the number of CTCs needed for reliable downstream sequencing, remain poorly defined. We sought to define strategies for selecting and generating optimal WGA products from low-template input as it relates to their potential applications in precision oncology strategies. Single pancreatic cancer cells (HPAF-II) were isolated using laser microdissection. WGA was performed using multiple displacement amplification (MDA), multiple annealing and looping based amplification (MALBAC) and PicoPLEX. The quality of amplified DNA products was assessed using a multiplex/RT-qPCR based method that evaluates 8 cancer-related genes, and QC-scores were assigned. We utilized this scoring system to assess the impact of de novo modifications to the WGA protocol. WGA products were subjected to Sanger sequencing, array comparative genomic hybridization (aCGH) and next generation sequencing (NGS) to evaluate their performance in the respective downstream analyses, providing validation of the QC-score. Single-cell WGA products exhibited significant sample-to-sample variability in amplified DNA quality as assessed by our 8-gene QC assay. Single-cell WGA products that passed the pre-analysis QC had lower amplification bias and improved aCGH/NGS performance metrics when compared to single-cell WGA products that failed the QC. Increasing the cellular input resulted in improved QC-scores overall, but a resultant WGA product that consistently passed the QC step required a starting cellular input of at least 20 cells. Our modified-WGA protocol effectively reduced this number, achieving reproducible high-quality WGA products from ≥5 cells as a starting template. A starting cellular input of 5 to 10 cells amplified using the modified-WGA achieved aCGH and NGS results that closely matched those of unamplified, batch genomic DNA. The modified-WGA protocol coupled with the 8-gene QC serves as an effective strategy to enhance the quality of low-template WGA reactions. Furthermore, a threshold number of 5-10 cells is likely needed for a reliable WGA reaction and a product with high fidelity to the original starting template.
A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.
Swain, Martin T; Tsai, Isheng J; Assefa, Samual A; Newbold, Chris; Berriman, Matthew; Otto, Thomas D
2012-06-07
Genome projects now produce draft assemblies within weeks owing to advanced high-throughput sequencing technologies. For milestone projects such as Escherichia coli or Homo sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects, and the quality of genomes is generally of a much lower standard. This protocol describes software (PAGIT) that is used to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence and exploit reference genomes (if available) in order to improve scaffolding and generate annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as pathogenic bacteria, malaria and parasitic worms. Applying PAGIT to an E. coli assembly takes ∼24 h: it doubles the average contig size and annotates over 4,300 gene models.
Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences.
Gao, Song; Sung, Wing-Kin; Nagarajan, Niranjan
2011-11-01
Scaffolding, the problem of ordering and orienting contigs, typically using paired-end reads, is a crucial step in the assembly of high-quality draft genomes. Even as sequencing technologies and mate-pair protocols have improved significantly, scaffolding programs still rely on heuristics, with no guarantees on the quality of the solution. In this work, we explored the feasibility of an exact solution for scaffolding and present a first tractable solution for this problem (Opera). We also describe a graph contraction procedure that allows the solution to scale to large scaffolding problems and demonstrate this by scaffolding several large real and synthetic datasets. In comparisons with existing scaffolders, Opera simultaneously produced longer and more accurate scaffolds demonstrating the utility of an exact approach. Opera also incorporates an exact quadratic programming formulation to precisely compute gap sizes (Availability: http://sourceforge.net/projects/operasf/ ).
Opera: Reconstructing Optimal Genomic Scaffolds with High-Throughput Paired-End Sequences
Gao, Song; Sung, Wing-Kin
2011-01-01
Abstract Scaffolding, the problem of ordering and orienting contigs, typically using paired-end reads, is a crucial step in the assembly of high-quality draft genomes. Even as sequencing technologies and mate-pair protocols have improved significantly, scaffolding programs still rely on heuristics, with no guarantees on the quality of the solution. In this work, we explored the feasibility of an exact solution for scaffolding and present a first tractable solution for this problem (Opera). We also describe a graph contraction procedure that allows the solution to scale to large scaffolding problems and demonstrate this by scaffolding several large real and synthetic datasets. In comparisons with existing scaffolders, Opera simultaneously produced longer and more accurate scaffolds demonstrating the utility of an exact approach. Opera also incorporates an exact quadratic programming formulation to precisely compute gap sizes (Availability: http://sourceforge.net/projects/operasf/). PMID:21929371
Mazur, Andrzej; De Meyer, Sofie E.; Tian, Rui; ...
2015-07-16
We report that Rhizobium leguminosarum bv. viciae GB30 is an aerobic, motile, Gram-negative, non-spore-forming rod that can exist as a soil saprophyte or as a legume microsymbiont of Pisum sativum. GB30 was isolated in Poland from a nodule recovered from the roots of Pisum sativum growing at Janow. GB30 is also an effective microsymbiont of the annual forage legumes vetch and pea. Here we describe the features of R. leguminosarum bv. viciae strain GB30, together with sequence and annotation. The 7,468,464 bp high-quality permanent draft genome is arranged in 78 scaffolds of 78 contigs containing 7,227 protein-coding genes and 75 RNA-only encoding genes, and is part of the GEBA-RNB project proposal.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mazur, Andrzej; De Meyer, Sofie E.; Tian, Rui
We report that Rhizobium leguminosarum bv. viciae GB30 is an aerobic, motile, Gram-negative, non-spore-forming rod that can exist as a soil saprophyte or as a legume microsymbiont of Pisum sativum. GB30 was isolated in Poland from a nodule recovered from the roots of Pisum sativum growing at Janow. GB30 is also an effective microsymbiont of the annual forage legumes vetch and pea. Here we describe the features of R. leguminosarum bv. viciae strain GB30, together with sequence and annotation. The 7,468,464 bp high-quality permanent draft genome is arranged in 78 scaffolds of 78 contigs containing 7,227 protein-coding genes and 75 RNA-only encoding genes, and is part of the GEBA-RNB project proposal.
Genome sequence, comparative analysis and haplotype structure of the domestic dog.
Lindblad-Toh, Kerstin; Wade, Claire M; Mikkelsen, Tarjei S; Karlsson, Elinor K; Jaffe, David B; Kamal, Michael; Clamp, Michele; Chang, Jean L; Kulbokas, Edward J; Zody, Michael C; Mauceli, Evan; Xie, Xiaohui; Breen, Matthew; Wayne, Robert K; Ostrander, Elaine A; Ponting, Chris P; Galibert, Francis; Smith, Douglas R; DeJong, Pieter J; Kirkness, Ewen; Alvarez, Pablo; Biagi, Tara; Brockman, William; Butler, Jonathan; Chin, Chee-Wye; Cook, April; Cuff, James; Daly, Mark J; DeCaprio, David; Gnerre, Sante; Grabherr, Manfred; Kellis, Manolis; Kleber, Michael; Bardeleben, Carolyne; Goodstadt, Leo; Heger, Andreas; Hitte, Christophe; Kim, Lisa; Koepfli, Klaus-Peter; Parker, Heidi G; Pollinger, John P; Searle, Stephen M J; Sutter, Nathan B; Thomas, Rachael; Webber, Caleb; Baldwin, Jennifer; Abebe, Adal; Abouelleil, Amr; Aftuck, Lynne; Ait-Zahra, Mostafa; Aldredge, Tyler; Allen, Nicole; An, Peter; Anderson, Scott; Antoine, Claudel; Arachchi, Harindra; Aslam, Ali; Ayotte, Laura; Bachantsang, Pasang; Barry, Andrew; Bayul, Tashi; Benamara, Mostafa; Berlin, Aaron; Bessette, Daniel; Blitshteyn, Berta; Bloom, Toby; Blye, Jason; Boguslavskiy, Leonid; Bonnet, Claude; Boukhgalter, Boris; Brown, Adam; Cahill, Patrick; Calixte, Nadia; Camarata, Jody; Cheshatsang, Yama; Chu, Jeffrey; Citroen, Mieke; Collymore, Alville; Cooke, Patrick; Dawoe, Tenzin; Daza, Riza; Decktor, Karin; DeGray, Stuart; Dhargay, Norbu; Dooley, Kimberly; Dooley, Kathleen; Dorje, Passang; Dorjee, Kunsang; Dorris, Lester; Duffey, Noah; Dupes, Alan; Egbiremolen, Osebhajajeme; Elong, Richard; Falk, Jill; Farina, Abderrahim; Faro, Susan; Ferguson, Diallo; Ferreira, Patricia; Fisher, Sheila; FitzGerald, Mike; Foley, Karen; Foley, Chelsea; Franke, Alicia; Friedrich, Dennis; Gage, Diane; Garber, Manuel; Gearin, Gary; Giannoukos, Georgia; Goode, Tina; Goyette, Audra; Graham, Joseph; Grandbois, Edward; Gyaltsen, Kunsang; Hafez, Nabil; Hagopian, Daniel; Hagos, Birhane; Hall, Jennifer; Healy, Claire; Hegarty, Ryan; Honan, 
Tracey; Horn, Andrea; Houde, Nathan; Hughes, Leanne; Hunnicutt, Leigh; Husby, M; Jester, Benjamin; Jones, Charlien; Kamat, Asha; Kanga, Ben; Kells, Cristyn; Khazanovich, Dmitry; Kieu, Alix Chinh; Kisner, Peter; Kumar, Mayank; Lance, Krista; Landers, Thomas; Lara, Marcia; Lee, William; Leger, Jean-Pierre; Lennon, Niall; Leuper, Lisa; LeVine, Sarah; Liu, Jinlei; Liu, Xiaohong; Lokyitsang, Yeshi; Lokyitsang, Tashi; Lui, Annie; Macdonald, Jan; Major, John; Marabella, Richard; Maru, Kebede; Matthews, Charles; McDonough, Susan; Mehta, Teena; Meldrim, James; Melnikov, Alexandre; Meneus, Louis; Mihalev, Atanas; Mihova, Tanya; Miller, Karen; Mittelman, Rachel; Mlenga, Valentine; Mulrain, Leonidas; Munson, Glen; Navidi, Adam; Naylor, Jerome; Nguyen, Tuyen; Nguyen, Nga; Nguyen, Cindy; Nguyen, Thu; Nicol, Robert; Norbu, Nyima; Norbu, Choe; Novod, Nathaniel; Nyima, Tenchoe; Olandt, Peter; O'Neill, Barry; O'Neill, Keith; Osman, Sahal; Oyono, Lucien; Patti, Christopher; Perrin, Danielle; Phunkhang, Pema; Pierre, Fritz; Priest, Margaret; Rachupka, Anthony; Raghuraman, Sujaa; Rameau, Rayale; Ray, Verneda; Raymond, Christina; Rege, Filip; Rise, Cecil; Rogers, Julie; Rogov, Peter; Sahalie, Julie; Settipalli, Sampath; Sharpe, Theodore; Shea, Terrance; Sheehan, Mechele; Sherpa, Ngawang; Shi, Jianying; Shih, Diana; Sloan, Jessie; Smith, Cherylyn; Sparrow, Todd; Stalker, John; Stange-Thomann, Nicole; Stavropoulos, Sharon; Stone, Catherine; Stone, Sabrina; Sykes, Sean; Tchuinga, Pierre; Tenzing, Pema; Tesfaye, Senait; Thoulutsang, Dawa; Thoulutsang, Yama; Topham, Kerri; Topping, Ira; Tsamla, Tsamla; Vassiliev, Helen; Venkataraman, Vijay; Vo, Andy; Wangchuk, Tsering; Wangdi, Tsering; Weiand, Michael; Wilkinson, Jane; Wilson, Adam; Yadav, Shailendra; Yang, Shuli; Yang, Xiaoping; Young, Geneva; Yu, Qing; Zainoun, Joanne; Zembek, Lisa; Zimmer, Andrew; Lander, Eric S
2005-12-08
Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.
Xia, Chongjing; Wang, Meinan; Yin, Chuntao; Cornejo, Omar E; Hulbert, Scot; Chen, Xianming
2018-05-24
Puccinia striiformis f. sp. tritici (Pst) causes devastating stripe (yellow) rust on wheat and P. striiformis f. sp. hordei (Psh) causes stripe rust on barley. Several Pst genomes are available, but no Psh genome is available. More genomes of Pst and Psh are needed to understand the genome evolution and molecular mechanisms of their pathogenicity. We sequenced Pst isolate 93-210 and Psh isolate 93TX-2 using PacBio and Illumina technologies, and RNA sequencing. Their genomic sequences were assembled into contigs with high continuity and showed significant structural differences. The circular mitochondrial genomes of both were complete. These genomes provide high-quality resources for deciphering the genomic basis of rapid evolution and host adaptation, identifying genes for avirulence and other important traits, and studying host-pathogen interaction.
Standardization and quality management in next-generation sequencing.
Endrullat, Christoph; Glökler, Jörn; Franke, Philipp; Frohme, Marcus
2016-09-01
DNA sequencing continues to evolve quickly even after more than 30 years. Many new platforms have appeared suddenly, and formerly established systems have vanished just as quickly. Since the establishment of next-generation sequencing devices, this progress has gained momentum due to the continually growing demand for higher throughput, lower costs and better data quality. As a consequence of this rapid development, standardized procedures and data formats as well as comprehensive quality management considerations are still scarce. Here, we list and summarize current standardization efforts and quality management initiatives from companies, organizations and societies in the form of published studies and ongoing projects. These comprise, on the one hand, quality documentation issues such as technical notes, accreditation checklists and guidelines for validation of sequencing workflows. On the other hand, general standard proposals and quality metrics are developed and applied to the sequencing workflow steps, with the main focus on upstream processes. Finally, certain standard developments for downstream pipeline data handling, processing and storage are discussed in brief. These standardization approaches represent a first basis for continuing work to implement next-generation sequencing in important areas such as clinical diagnostics, where reliable results and fast processing are crucial. Additionally, these efforts will exert a decisive influence on the traceability and reproducibility of sequence data.
Chen, Xinyuan; Dai, Jianrong
2018-05-01
Magnetic Resonance Imaging (MRI) simulation differs from diagnostic MRI in purpose, technical requirements, and implementation. We propose a semiautomatic method for image acceptance and commissioning for the scanner, the radiofrequency (RF) coils, and pulse sequences for an MRI simulator. The ACR MRI accreditation large phantom was used for image quality analysis with seven parameters. Standard ACR sequences with a split head coil were adopted to examine the scanner's basic performance. The performance of simulation RF coils was measured and compared with that of different clinical diagnostic coils using the standard sequence. We used simulation sequences with simulation coils to test the image quality and advanced performance of the scanner. Codes and procedures were developed for semiautomatic image quality analysis. When using standard ACR sequences with a split head coil, image quality passed all ACR recommended criteria. The image intensity uniformity with a simulation RF coil decreased about 34% compared with the eight-channel diagnostic head coil, while the other six image quality parameters were acceptable. Those two image quality parameters could be improved to more than 85% by built-in intensity calibration methods. In the simulation sequences test, the contrast resolution was sensitive to the FOV and matrix settings. The geometric distortion of simulation sequences such as T1-weighted and T2-weighted images was well-controlled in the isocenter and 10 cm off-center within a range of ±1% (2 mm). We developed a semiautomatic image quality analysis method for quantitative evaluation of images and commissioning of an MRI simulator. The baseline performances of simulation RF coils and pulse sequences have been established for routine QA. © 2018 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
ABACAS: algorithm-based automatic contiguation of assembled sequences
Assefa, Samuel; Keane, Thomas M.; Otto, Thomas D.; Newbold, Chris; Berriman, Matthew
2009-01-01
Summary: Due to the availability of new sequencing technologies, we are now increasingly interested in sequencing closely related strains of existing finished genomes. Recently a number of de novo and mapping-based assemblers have been developed to produce high quality draft genomes from new sequencing technology reads. New tools are necessary to take contigs from a draft assembly through to a fully contiguated genome sequence. ABACAS is intended as a tool to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. The input to ABACAS is a set of contigs which will be aligned to the reference genome, ordered and orientated, visualized in the ACT comparative browser, and optimal primer sequences are automatically generated. Availability and Implementation: ABACAS is implemented in Perl and is freely available for download from http://abacas.sourceforge.net Contact: sa4@sanger.ac.uk PMID:19497936
BLIPPED (BLIpped Pure Phase EncoDing) high resolution MRI with low amplitude gradients
NASA Astrophysics Data System (ADS)
Xiao, Dan; Balcom, Bruce J.
2017-12-01
MRI image resolution is proportional to the maximum k-space value, i.e. the temporal integral of the magnetic field gradient. High resolution imaging usually requires high gradient amplitudes and/or long spatial encoding times. Special gradient hardware is often required for high amplitudes and fast switching. We propose a high resolution imaging sequence that employs low amplitude gradients. This method was inspired by the previously proposed PEPI (π Echo Planar Imaging) sequence, which replaced EPI gradient reversals with multiple RF refocusing pulses. It has been shown that when the refocusing RF pulse is of high quality, i.e. sufficiently close to 180°, the magnetization phase introduced by the spatial encoding magnetic field gradient can be preserved and transferred to the following echo signal without phase rewinding. This phase encoding scheme requires blipped gradients that are identical for each echo, with low and constant amplitude, providing opportunities for high resolution imaging. We now extend the sequence to 3D pure phase encoding with low amplitude gradients. The method is compared with the Hybrid-SESPI (Spin Echo Single Point Imaging) technique to demonstrate the advantages in terms of low gradient duty cycle, compensation of concomitant magnetic field effects and minimal echo spacing, which lead to superior image quality and high resolution. The 3D imaging method was then applied with a parallel plate resonator RF probe, achieving a nominal spatial resolution of 17 μm in one dimension in the 3D image, requiring a maximum gradient amplitude of only 5.8 Gauss/cm.
The origins and impact of primate segmental duplications.
Marques-Bonet, Tomas; Girirajan, Santhosh; Eichler, Evan E
2009-10-01
Duplicated sequences are substrates for the emergence of new genes and are an important source of genetic instability associated with rare and common diseases. Analyses of primate genomes have shown an increase in the proportion of interspersed segmental duplications (SDs) within the genomes of humans and great apes. This contrasts with other mammalian genomes that seem to have their recently duplicated sequences organized in a tandem configuration. In this review, we focus on the mechanistic origin and impact of this difference with respect to evolution, genetic diversity and primate phenotype. Although many genomes will be sequenced in the future, resolution of this aspect of genomic architecture still requires high quality sequences and detailed analyses.
MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping
Lee, Wan-Ping; Stromberg, Michael P.; Ward, Alistair; Stewart, Chip; Garrison, Erik P.; Marth, Gabor T.
2014-01-01
MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK assigned and actual mapping qualities greater than 0.98. In order to ensure that studies of any genome are supported, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me). PMID:24599324
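The Smith-Waterman step mentioned in the abstract above can be illustrated with a minimal sketch of local alignment scoring. This is the textbook dynamic-programming recurrence with a linear gap penalty, not MOSAIK's implementation; the function name and the scoring values (match = 2, mismatch = -1, gap = -2) are assumptions chosen for illustration only.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b.

    Textbook Smith-Waterman with a linear gap penalty. The scoring
    parameters are illustrative assumptions, not values from MOSAIK.
    """
    prev = [0] * (len(b) + 1)   # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # A shared 4-base core flanked by unrelated bases
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Because every cell is floored at zero, the score reflects only the best-matching local region, which is why the approach tolerates mismatches and short insertions or deletions near the ends of a read.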
Manel, S; Perrier, C; Pratlong, M; Abi-Rached, L; Paganini, J; Pontarotti, P; Aurelle, D
2016-01-01
Genome scans represent powerful approaches to investigate the action of natural selection on the genetic variation of natural populations and to better understand local adaptation. This is very useful, for example, in the field of conservation biology and evolutionary biology. Thanks to Next Generation Sequencing, genomic resources are growing exponentially, improving genome scan analyses in non-model species. Thousands of SNPs called using Reduced Representation Sequencing are increasingly used in genome scans. Besides, genome sequences are also becoming increasingly available, allowing better processing of short-read data, offering physical localization of variants, and improving haplotype reconstruction and data imputation. Ultimately, genome sequences are also becoming the raw material for selection inferences. Here, we discuss how the increasing availability of such genomic resources, notably genome sequences, influences the detection of signals of selection. Mainly, increasing data density and having the information of physical linkage data expand genome scans by (i) improving the overall quality of the data, (ii) helping the reconstruction of demographic history for the population studied to decrease false-positive rates and (iii) improving the statistical power of methods to detect the signal of selection. Of particular importance, the availability of a high-quality reference genome can improve the detection of the signal of selection by (i) allowing matching the potential candidate loci to linked coding regions under selection, (ii) rapidly moving the investigation to the gene and function and (iii) ensuring that the highly variable regions of the genomes that include functional genes are also investigated. For all those reasons, using reference genomes in genome scan analyses is highly recommended. © 2015 John Wiley & Sons Ltd.
Novel chytrid lineages dominate fungal sequences in diverse marine and freshwater habitats
NASA Astrophysics Data System (ADS)
Comeau, André M.; Vincent, Warwick F.; Bernier, Louis; Lovejoy, Connie
2016-07-01
In aquatic environments, fungal communities remain little studied despite their taxonomic and functional diversity. To extend the ecological coverage of this group, we conducted an in-depth analysis of fungal sequences within our collection of 3.6 million V4 18S rRNA pyrosequences originating from 319 individual marine (including sea-ice) and freshwater samples from libraries generated within diverse projects studying Arctic and temperate biomes in the past decade. Among the ~1.7 million post-filtered reads of highest taxonomic and phylogenetic quality, 23,263 fungal sequences were identified. The overall mean proportion was 1.35%, but with large variability; for example, from 0.01 to 59% of total sequences for Arctic seawater samples. Almost all sample types were dominated by Chytridiomycota-like sequences, followed by moderate-to-minor contributions of Ascomycota, Cryptomycota and Basidiomycota. Species and/or strain richness was high, with many novel sequences and high niche separation. The affinity of the most common reads to phytoplankton parasites suggests that aquatic fungi deserve renewed attention for their role in algal succession and carbon cycling.
USDA-ARS's Scientific Manuscript database
The rapid advancement in high-throughput SNP genotyping technologies along with next generation sequencing (NGS) platforms has decreased the cost, improved the quality of large-scale genome surveys, and allowed specialty crops with limited genomic resources such as carrot (Daucus carota) to access t...
Are commercial providers a viable option for clinical bacterial sequencing?
Raven, Kathy; Blane, Beth; Churcher, Carol; Parkhill, Julian; Peacock, Sharon J
2018-04-05
Bacterial whole-genome sequencing in the clinical setting has the potential to bring major improvements to infection control and clinical practice. Sequencing instruments are not currently available in the majority of routine microbiology laboratories worldwide, but an alternative is to use external sequencing providers. To foster discussion around this we investigated whether send-out services were a viable option. Four providers offering MiSeq sequencing were selected based on cost and evaluated based on the service provided and sequence data quality. DNA was prepared from five methicillin-resistant Staphylococcus aureus (MRSA) isolates, four of which were investigated during a previously published outbreak in the UK together with a reference MRSA isolate (ST22 HO 5096 0412). Cost of sequencing per isolate ranged from £155 to £342 and turnaround times from DNA postage to arrival of sequence data ranged from 12 to 63 days. Comparison of commercially generated genomes against the original sequence data demonstrated very high concordance, with no more than one single nucleotide polymorphism (SNP) difference on core genome mapping between the original sequences and the new sequence for all four providers. Multilocus sequence type could not be assigned based on assembly for the two cheapest sequence providers due to fragmented assemblies probably caused by a lower output of sequence data per isolate. Our results indicate that external providers returned highly accurate genome data, but that improvements are required in turnaround time to make this a viable option for use in clinical practice.
Bayer, Thomas; Adler, Werner; Janka, Rolf; Uder, Michael; Roemer, Frank
2017-12-01
To study the feasibility of magnetic resonance cinematography of the fingers (MRCF) with comparison of image quality of different protocols for depicting the finger anatomy during motion. MRCF was performed during a full flexion and extension movement in 14 healthy volunteers using a finger-gating device. Three real-time sequences (frame rates 17-59 images/min) and one proton density (PD) sequence (3 images/min) were acquired during incremental and continuous motion. Analyses were performed independently by three readers. Qualitative image analysis included Likert-scale grading from 0 (useless) to 5 (excellent) and specific visual analog scale (VAS) grading from 0 (insufficient) to 100 (excellent). Signal-to-noise calculation was performed. Overall percentage agreement and mean absolute disagreement were calculated. Within the real-time sequences a high frame-rate true fast imaging with steady-state free precession (TRUFI) yielded the best image quality with Likert and overall VAS scores of 3.0 ± 0.2 and 60.4 ± 25.3, respectively. The best sequence regarding image quality was an incremental PD with mean values of 4.8 ± 0.2 and 91.2 ± 9.4, respectively. Overall percentage agreement and mean absolute disagreement were 47.9 and 0.7, respectively. No statistically significant SNR differences were found between continuous and incremental motion for the real-time protocols. MRCF is feasible with appropriate image quality during continuous motion using a finger-gating device. Almost perfect image quality is achievable with incremental PD imaging, which represents a compromise for MRCF with the drawback of prolonged scanning time.
Hykin, Sarah M.; Bi, Ke; McGuire, Jimmy A.
2015-01-01
For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens, particularly for use in phylogenetic analyses, have been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. DNA of sufficient quality was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. 
The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for genetic analysis. PMID:26505622
The value of new genome references.
Worley, Kim C; Richards, Stephen; Rogers, Jeffrey
2017-09-15
Genomic information has become a ubiquitous and almost essential aspect of biological research. Over the last 10-15 years, the cost of generating sequence data from DNA or RNA samples has dramatically declined and our ability to interpret those data increased just as remarkably. Although it is still possible for biologists to conduct interesting and valuable research on species for which genomic data are not available, the impact of having access to a high quality whole genome reference assembly for a given species is nothing short of transformational. Research on a species for which we have no DNA or RNA sequence data is restricted in fundamental ways. In contrast, even access to an initial draft quality genome (see below for definitions) opens a wide range of opportunities that are simply not available without that reference genome assembly. Although a complete discussion of the impact of genome sequencing and assembly is beyond the scope of this short paper, the goal of this review is to summarize the most common and highest impact contributions that whole genome sequencing and assembly has had on comparative and evolutionary biology. Copyright © 2016. Published by Elsevier Inc.
Transcriptome-based differentiation of closely-related Miscanthus lines.
Chouvarine, Philippe; Cooksey, Amanda M; McCarthy, Fiona M; Ray, David A; Baldwin, Brian S; Burgess, Shane C; Peterson, Daniel G
2012-01-01
Distinguishing between individuals is critical to those conducting animal/plant breeding, food safety/quality research, diagnostic and clinical testing, and evolutionary biology studies. Classical genetic identification studies are based on marker polymorphisms, but polymorphism-based techniques are time and labor intensive and often cannot distinguish between closely related individuals. Illumina sequencing technologies provide the detailed sequence data required for rapid and efficient differentiation of related species, lines/cultivars, and individuals in a cost-effective manner. Here we describe the use of Illumina high-throughput exome sequencing, coupled with SNP mapping, as a rapid means of distinguishing between related cultivars of the lignocellulosic bioenergy crop giant miscanthus (Miscanthus × giganteus). We provide the first exome sequence database for Miscanthus species complete with Gene Ontology (GO) functional annotations. A SNP comparative analysis of rhizome-derived cDNA sequences was successfully utilized to distinguish three Miscanthus × giganteus cultivars from each other and from other Miscanthus species. Moreover, the resulting phylogenetic tree generated from SNP frequency data parallels the known breeding history of the plants examined. Some of the giant miscanthus plants exhibit considerable sequence divergence. Here we describe an analysis of Miscanthus in which high-throughput exome sequencing was utilized to differentiate between closely related genotypes despite the current lack of a reference genome sequence. We functionally annotated the exome sequences and provide resources to support Miscanthus systems biology. In addition, we demonstrate the use of commercial high-performance cloud computing for computational GO annotation.
O'Brien, Heath E; Gong, Yunchen; Fung, Pauline; Wang, Pauline W; Guttman, David S
2011-01-01
Next-generation genomic technology has both greatly accelerated the pace of genome research and increased our reliance on draft genome sequences. While groups such as the Genomics Standards Consortium have made strong efforts to promote genome standards, there is still a general lack of uniformity among published draft genomes, leading to challenges for downstream comparative analyses. This lack of uniformity is a particular problem when using standard draft genomes that frequently have large numbers of low-quality sequencing tracts. Here we present a proposal for an "enhanced-quality draft" genome that identifies at least 95% of the coding sequences, thereby effectively providing a full accounting of the genic component of the genome. Enhanced-quality draft genomes are easily attainable through a combination of small- and large-insert next-generation, paired-end sequencing. We illustrate the generation of an enhanced-quality draft genome by re-sequencing the plant pathogenic bacterium Pseudomonas syringae pv. phaseolicola 1448A (Pph 1448A), which has a published, closed genome sequence of 5.93 Mbp. We use a combination of Illumina paired-end and mate-pair sequencing, and surprisingly find that de novo assemblies with 100x paired-end coverage and mate-pair sequencing with as low as 2-5x coverage are substantially better than assemblies based on higher coverage. The rapid and low-cost generation of large numbers of enhanced-quality draft genome sequences will be of particular value for microbial diagnostics and biosecurity, which rely on precise discrimination of potentially dangerous clones from closely related benign strains.
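The coverage figures quoted above (100x paired-end, 2-5x mate-pair, 5.93 Mbp genome) follow from the usual mean-depth relation, coverage = reads × read length / genome size. The sketch below works through that arithmetic. The 100 bp read length is a hypothetical assumption, since the abstract does not state one, and the function names are illustrative.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_for_coverage(target: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target mean depth."""
    return math.ceil(target * genome_size / read_length)


if __name__ == "__main__":
    GENOME_SIZE = 5_930_000   # Pph 1448A genome size from the abstract (5.93 Mbp)
    READ_LENGTH = 100         # hypothetical read length, not given in the abstract

    for depth in (100, 5, 2):
        n = reads_for_coverage(depth, READ_LENGTH, GENOME_SIZE)
        print(f"{depth}x coverage needs about {n:,} reads of {READ_LENGTH} bp")
```

For paired-end or mate-pair libraries, the number of fragments is half the number of reads, because each fragment contributes two reads.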
Rapid evaluation and quality control of next generation sequencing data with FaQCs
Lo, Chien-Chi; Chain, Patrick S. G.
2014-12-01
Background: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Results: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.
Sequencing artifacts in the type A influenza databases and attempts to correct them.
Suarez, David L; Chester, Nikki; Hatfield, Jason
2014-07-01
There are over 276 000 influenza gene sequences in public databases, with the quality of the sequences determined by the contributor. As part of a high school class project, influenza sequences with possible errors were identified in the public databases based on the size of the gene being longer than expected, with the hypothesis that these sequences would have an error. Students contacted sequence submitters, alerting them to the possible sequence issue(s) and requesting that the suspect sequence(s) be corrected as appropriate. Type A influenza viruses were screened, and gene segments longer than the accepted size were identified for further analysis. Attention was placed on sequences with additional nucleotides upstream or downstream of the highly conserved non-coding ends of the viral segments. A total of 1081 sequences were identified that met this criterion. Three types of errors were commonly observed: non-influenza primer sequence was not removed from the sequence; PCR product was cloned and plasmid sequence was included in the sequence; and Taq polymerase added an adenine at the end of the PCR product. Internal insertions of nucleotide sequence were also commonly observed, but in many cases it was unclear if the sequence was correct or actually contained an error. A total of 215 sequences, or 22.8% of the suspect sequences, were corrected in the public databases in the first year of the student project. Unfortunately 138 additional sequences with possible errors were added to the databases in the second year. Additional awareness of the need for data integrity of sequences submitted to public databases is needed to fully reap the benefits of these large data sets. © 2014 The Authors. Influenza and Other Respiratory Viruses Published by John Wiley & Sons Ltd.
Salerno, Michael; Taylor, Angela; Yang, Yang; Kuruvilla, Sujith; Ragosta, Michael; Meyer, Craig H; Kramer, Christopher M
2014-07-01
Adenosine stress cardiovascular magnetic resonance perfusion imaging can be limited by motion-induced dark-rim artifacts, which may be mistaken for true perfusion abnormalities. A high-resolution variable-density spiral pulse sequence with a novel density compensation strategy has been shown to reduce dark-rim artifacts in first-pass perfusion imaging. We aimed to assess the clinical performance of adenosine stress cardiovascular magnetic resonance using this new perfusion sequence to detect obstructive coronary artery disease. Cardiovascular magnetic resonance perfusion imaging was performed during adenosine stress (140 μg/kg per minute) and at rest on a Siemens 1.5-T Avanto scanner in 41 subjects with chest pain scheduled for coronary angiography. Perfusion images were acquired during injection of 0.1 mmol/kg Gadolinium-diethylenetriaminepentacetate at 3 short-axis locations using a saturation recovery interleaved variable-density spiral pulse sequence. Significant stenosis was defined as >50% by quantitative coronary angiography. Two blinded reviewers evaluated the perfusion images for the presence of adenosine-induced perfusion abnormalities and assessed image quality using a 5-point scale (1 [poor] to 5 [excellent]). The prevalence of obstructive coronary artery disease by quantitative coronary angiography was 68%. The average sensitivity, specificity, and accuracy were 89%, 85%, and 88%, respectively, with a positive predictive value and negative predictive value of 93% and 79%, respectively. The average image quality score was 4.4±0.7, with only 1 study with more than mild dark-rim artifacts. There was good inter-reader reliability with a κ statistic of 0.67. Spiral adenosine stress cardiovascular magnetic resonance results in high diagnostic accuracy for the detection of obstructive coronary artery disease with excellent image quality and minimal dark-rim artifacts. © 2014 American Heart Association, Inc.
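The accuracy figures reported above are the standard measures derived from a 2×2 table of test result against reference standard. As a minimal sketch, the counts below are hypothetical: they were chosen only because they reproduce the rounded percentages for 41 subjects at 68% prevalence, and they are not taken from the study, which reports averages over two readers.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return standard 2x2 diagnostic accuracy measures as fractions."""
    total = tp + fn + tn + fp
    return {
        "prevalence": (tp + fn) / total,
        "sensitivity": tp / (tp + fn),   # diseased subjects correctly flagged
        "specificity": tn / (tn + fp),   # non-diseased subjects correctly cleared
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts (assumption): 28 subjects with disease, 13 without.
metrics = diagnostic_metrics(tp=25, fn=3, tn=11, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because positive and negative predictive values depend on prevalence, the same sensitivity and specificity would give different predictive values in a population with less disease.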
Using optical mapping data for the improvement of vertebrate genome assemblies.
Howe, Kerstin; Wood, Jonathan M D
2015-01-01
Optical mapping is a technology that gathers long-range information on genome sequences similar to ordered restriction digest maps. Because it is not subject to cloning, amplification, hybridisation or sequencing bias, it is ideally suited to the improvement of fragmented genome assemblies that can no longer be improved by classical methods. In addition, its low cost and rapid turnaround make it equally useful during the scaffolding process of de novo assembly from high throughput sequencing reads. We describe how optical mapping has been used in practice to produce high quality vertebrate genome assemblies. In particular, we detail the efforts undertaken by the Genome Reference Consortium (GRC), which maintains the reference genomes for human, mouse, zebrafish and chicken, and uses different optical mapping platforms for genome curation.
Holt, Kathryn E; Teo, Yik Y; Li, Heng; Nair, Satheesh; Dougan, Gordon; Wain, John; Parkhill, Julian
2009-08-15
Here, we present a method for estimating the frequencies of SNP alleles present within pooled samples of DNA using high-throughput short-read sequencing. The method was tested on real data from six strains of the highly monomorphic pathogen Salmonella Paratyphi A, sequenced individually and in a pool. A variety of read mapping and quality-weighting procedures were tested to determine the optimal parameters, which afforded ≥80% sensitivity of SNP detection and strong correlation with true SNP frequency at poolwide read depth of 40x, declining only slightly at read depths 20-40x. The method was implemented in Perl and relies on the open-source software Maq for read mapping and SNP calling. The Perl script is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/pools/.
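One way to picture quality-weighted allele frequency estimation in a pool is sketched below. This is an illustration only, not the authors' Perl/Maq procedure: the weighting scheme (each base call weighted by its Phred-derived probability of being correct), the function names, and the example pileup are all assumptions.

```python
from collections import defaultdict


def phred_to_weight(q):
    """Probability that a base call with Phred score q is correct."""
    return 1.0 - 10 ** (-q / 10.0)


def pooled_allele_frequencies(pileup):
    """Estimate allele frequencies at one site from (base, phred) pairs.

    Each observed base contributes its probability of being correct,
    so low-quality calls count for less than high-quality ones.
    """
    weights = defaultdict(float)
    for base, q in pileup:
        weights[base] += phred_to_weight(q)
    total = sum(weights.values())
    if total == 0:
        return {}
    return {base: w / total for base, w in weights.items()}


# Hypothetical pileup at 40x depth: 30 'A' calls and 10 'G' calls, all Q30.
pileup = [("A", 30)] * 30 + [("G", 30)] * 10
freqs = pooled_allele_frequencies(pileup)
print(freqs)
```

With uniform qualities the estimate reduces to the simple read-count proportion; the weighting only matters when base qualities differ between alleles.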
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kalamorz, Falk; Keis, Stefanie; Stanton, Jo-Ann
The genes and molecular machines that allow for a thermoalkaliphilic lifestyle have not been defined. To address this goal, we report on the improved high-quality draft genome sequence of Caldalkalibacillus thermarum strain TA2.A1, an obligately aerobic bacterium that grows optimally at pH 9.5 and 65 to 70 °C on a wide variety of carbon and energy sources.
Gehlot, Hukam Singh; Ardley, Julie; Tak, Nisha; ...
2016-06-23
Ensifer sp. PC2 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from a nitrogen-fixing nodule of the tree legume P. cineraria (L.) Druce (Khejri), which is a keystone species that grows in arid and semi-arid regions of the Indian Thar desert. Strain PC2 exists as a dominant saprophyte in alkaline soils of Western Rajasthan. It is fast growing, well-adapted to arid conditions and is able to form an effective symbiosis with several annual crop legumes as well as species of mimosoid trees and shrubs. Here we describe the features of Ensifer sp. PC2, together with genome sequence information and its annotation. The 8,458,965 bp high-quality permanent draft genome is arranged into 171 scaffolds of 171 contigs containing 8,344 protein-coding genes and 139 RNA-only encoding genes, and is one of the rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project proposal.
Efficient isolation method for high-quality genomic DNA from cicada exuviae.
Nguyen, Hoa Quynh; Kim, Ye Inn; Borzée, Amaël; Jang, Yikweon
2017-10-01
In recent years, animal ethics issues have led researchers to explore nondestructive methods to access materials for genetic studies. Cicada exuviae are among those materials because they are cast skins that individuals leave behind after molting and are easily collected. In this study, we aim to identify the most efficient extraction method to obtain a high quantity and quality of DNA from cicada exuviae. We compared the relative DNA yield and purity of six extraction protocols, including both manual protocols and available commercial kits, extracting from four different exoskeleton parts. Furthermore, amplification and sequencing of genomic DNA were evaluated in terms of availability of sequence at the expected genomic size. Both the choice of protocol and exuvia part significantly affected DNA yield and purity. Only samples that were extracted using the PowerSoil DNA Isolation kit generated gel bands of expected size as well as successful sequencing results. The failed attempts to extract DNA using other protocols could be explained partly by a low DNA yield from cicada exuviae and partly by contamination with humic acids that exist in the soil where cicada nymphs reside before emergence, as shown by spectroscopic measurements. Genomic DNA extracted from cicada exuviae could provide valuable information for species identification, allowing the investigation of genetic diversity across consecutive broods, or spatiotemporal variation among various populations. Consequently, we hope to provide a simple method to acquire pure genomic DNA applicable for multiple research purposes.
Ardley, Julie; Tian, Rui; O’Hara, Graham; ...
2015-12-01
We report that Ensifer medicae WSM244 is an aerobic, motile, Gram-negative, non-spore-forming rod that can exist as a soil saprophyte or as a legume microsymbiont of Medicago species. WSM244 was isolated in 1979 from a nodule recovered from the roots of the annual Medicago polymorpha L. growing in alkaline soil (pH 8.0) in Tel Afer, Iraq. WSM244 is the only acid-sensitive E. medicae strain that has been sequenced to date. It is effective at fixing nitrogen with M. polymorpha L., as well as with more alkaline-adapted Medicago spp. such as M. littoralis Loisel., M. scutellata (L.) Mill., M. tornata (L.) Mill. and M. truncatula Gaertn. This strain is also effective with the perennial M. sativa L. Here we describe the features of E. medicae WSM244, together with genome sequence information and its annotation. The 6,650,282 bp high-quality permanent draft genome is arranged into 91 scaffolds of 91 contigs containing 6,427 protein-coding genes and 68 RNA-only encoding genes, and is one of the rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project proposal.
Long-read sequencing data analysis for yeasts.
Yue, Jia-Xing; Liti, Gianni
2018-06-01
Long-read sequencing technologies have become increasingly popular due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here, we present a modular computational framework named long-read sequencing data analysis for yeasts (LRSDAY), the first one-stop solution that streamlines this process. Starting from the raw sequencing reads, LRSDAY can produce chromosome-level genome assembly and comprehensive genome annotation in a highly automated manner with minimal manual intervention, which is not possible using any alternative tool available to date. The annotated genomic features include centromeres, protein-coding genes, tRNAs, transposable elements (TEs), and telomere-associated elements. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. When applying LRSDAY to an S. cerevisiae strain, it takes ∼41 h to generate a complete and well-annotated genome from ∼100× Pacific Biosciences (PacBio) running the basic workflow with four threads. Basic experience working within the Linux command-line environment is recommended for carrying out the analysis using LRSDAY.
Development of Genetic Markers in Eucalyptus Species by Target Enrichment and Exome Sequencing
Dasgupta, Modhumita Ghosh; Dharanishanthi, Veeramuthu; Agarwal, Ishangi; Krutovsky, Konstantin V.
2015-01-01
The advent of next-generation sequencing has facilitated large-scale discovery, validation and assessment of genetic markers for high density genotyping. The present study was undertaken to identify markers in genes supposedly related to wood property traits in three Eucalyptus species. Ninety four genes involved in xylogenesis were selected for hybridization probe based nuclear genomic DNA target enrichment and exome sequencing. Genomic DNA was isolated from the leaf tissues and used for on-array probe hybridization followed by Illumina sequencing. The raw sequence reads were trimmed, high-quality reads were mapped to the E. grandis reference sequence, and single nucleotide variants (SNVs) and insertions/deletions (InDels) were identified across the three species. The average read coverage was 216X and a total of 2294 SNVs and 479 InDels were discovered in E. camaldulensis, 2383 SNVs and 518 InDels in E. tereticornis, and 1228 SNVs and 409 InDels in E. grandis. Additionally, SNV calling and InDel detection were conducted in pair-wise comparisons of E. tereticornis vs. E. grandis, E. camaldulensis vs. E. tereticornis and E. camaldulensis vs. E. grandis. This study presents an efficient and high throughput method for the development of genetic markers for family-based QTL and association analysis in Eucalyptus. PMID:25602379
DOE Office of Scientific and Technical Information (OSTI.GOV)
Catfish Genome Consortium; Wang, Shaolin; Peatman, Eric
2010-03-23
Background: Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results: A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions: This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from all catfish EST dataset assembly will greatly benefit the catfish introgression breeding program and whole genome association studies.
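The high-quality SNP criterion stated above (at least four sequences in the contig, with the minor allele present in at least two of them) can be sketched as a simple filter. This is a minimal illustration, not the consortium's pipeline: the function name, the treatment of the second most common base as the minor allele, and the use of a single alignment column as the unit of depth are assumptions.

```python
from collections import Counter


def is_high_quality_snp(column, min_depth=4, min_minor=2):
    """Apply the depth and minor-allele thresholds to one alignment column.

    column: bases observed at one aligned contig position, one per EST.
    """
    if len(column) < min_depth:
        return False
    counts = Counter(column).most_common()
    if len(counts) < 2:
        return False  # monomorphic position, not a SNP
    minor_count = counts[1][1]  # second most common base
    return minor_count >= min_minor


# Hypothetical alignment columns.
print(is_high_quality_snp("AAGG"))  # two alleles, minor seen twice
print(is_high_quality_snp("AAAG"))  # minor allele seen only once
```

Requiring the minor allele in two or more sequences guards against counting a single sequencing error in one EST as a polymorphism.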
Long-read sequencing of chicken transcripts and identification of new transcript isoforms.
Thomas, Sean; Underwood, Jason G; Tseng, Elizabeth; Holloway, Alisha K
2014-01-01
The chicken has long served as an important model organism in many fields, and continues to aid our understanding of animal development. Functional genomics studies aimed at probing the mechanisms that regulate development require high-quality genomes and transcript annotations. The quality of these resources has improved dramatically over the last several years, but many isoforms and genes have yet to be identified. We hope to contribute to the process of improving these resources with the data presented here: a set of long cDNA sequencing reads, and a curated set of new genes and transcript isoforms not currently represented in the most up-to-date genome annotation currently available to the community of researchers who rely on the chicken genome.
First insight into the faecal microbiota of the high Arctic muskoxen (Ovibos moschatus)
Bockwoldt, Mathias; Hagen, Live H.; Pope, Phillip B.; Sundset, Monica A.
2016-01-01
The faecal microbiota of muskoxen (n=3) pasturing on Ryøya (69° 33′ N 18° 43′ E), Norway, in late September was characterized using high-throughput sequencing of partial 16S rRNA gene regions. A total of 16 209 high-quality sequence reads from bacterial domains and 19 462 from archaea were generated. Preliminary taxonomic classifications of 806 bacterial operational taxonomic units (OTUs) resulted in 53.7–59.3 % of the total sequences being without designations beyond the family level. Firmicutes (70.7–81.1 % of the total sequences) and Bacteroidetes (16.8–25.3 %) constituted the two major bacterial phyla, with uncharacterized members within the family Ruminococcaceae (28.9–40.9 %) as the major phylotype. Multiple-library comparisons between muskoxen and other ruminants indicated a higher similarity for muskoxen faeces and reindeer caecum (P>0.05) and some samples from cattle faeces. The archaeal sequences clustered into 37 OTUs, with dominating phylotypes affiliated to the methane-producing genus Methanobrevibacter (80–92 % of the total sequences). UniFrac analysis demonstrated heterogeneity between muskoxen archaeal libraries and those from reindeer and roe deer (P=1.0e-02, Bonferroni corrected), but not with foregut fermenters. The high proportion of cellulose-degrading Ruminococcus-affiliated bacteria agrees with the ingestion of a highly fibrous diet. Further experiments are required to elucidate the role played by these novel bacteria in the digestion of this fibrous Arctic diet eaten by muskoxen. PMID:28348861
Mu, John C.; Tootoonchi Afshar, Pegah; Mohiyuddin, Marghoob; Chen, Xi; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B.; Wong, Wing H.; Lam, Hugo Y. K.
2015-01-01
A high-confidence, comprehensive human variant set is critical in assessing accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross validated with deep Illumina sequencing, population datasets, and well-established algorithms. It was a necessary effort to completely reanalyze the HuRef genome as its previously published variants were mostly reported five years ago, suffering from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools. PMID:26412485
Guo, Xiao-Hui; Bi, Zhe-Guang; Wu, Bi-Hua; Wang, Zhen-Zhen; Hu, Ji-Liang; Zheng, You-Liang; Liu, Deng-Cai
2013-12-01
High-molecular-weight glutenin subunits (HMW-GSs) are of considerable interest, because they play a crucial role in determining dough viscoelastic properties and end-use quality of wheat flour. In this paper, ChAy/Bx, a novel chimeric HMW-GS gene from Triticum turgidum ssp. dicoccoides (AABB, 2n=4x=28) accession D129, was isolated and characterized. Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) analysis revealed that the electrophoretic mobility of the glutenin subunit encoded by ChAy/Bx was slightly faster than that of 1Dy12. The complete ORF of ChAy/Bx contained 1,671 bp encoding a deduced polypeptide of 555 amino acid residues (or 534 amino acid residues for the mature protein), making it the smallest HMW-GS gene known from Triticum species. Sequence analysis showed that ChAy/Bx was neither a conventional x-type nor a conventional y-type subunit gene, but a novel chimeric gene. Its first 1305 nt sequence was highly homologous with the corresponding sequence of 1Ay type genes, while its final 366 nt sequence was highly homologous with the corresponding sequence of 1Bx type genes. The mature ChAy/Bx protein consisted of the N-terminus of 1Ay type subunit (the first 414 amino acid residues) and the C-terminus of 1Bx type subunit (the final 120 amino acid residues). Secondary structure prediction showed that ChAy/Bx contained some domains of 1Ay subunit and some domains of 1Bx subunit. The special structure of this HMW glutenin chimera ChAy/Bx subunit might have unique effects on the end-use quality of wheat flour. Here we propose that homoeologous recombination might be a novel pathway for allelic variation or molecular evolution of HMW-GSs. © 2013.
ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis
2011-01-01
Background: Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor bindings and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. Results: Both technologies produce highly reproducible profiles within each platform; ChIP-seq generally produces profiles with a better signal-to-noise ratio, and allows detection of more peaks and narrower peaks. The set of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is a significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from both differences in experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high quality input DNA data for normalization in ChIP-seq analysis. Conclusions: Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis. PMID:21356108
Vos, Sjoerd B; Micallef, Caroline; Barkhof, Frederik; Hill, Andrea; Winston, Gavin P; Ourselin, Sebastien; Duncan, John S
2018-03-02
T2-FLAIR is the single most sensitive MRI contrast to detect lesions underlying focal epilepsies but 3D sequences used to obtain isotropic high-resolution images are susceptible to motion artefacts. Prospective motion correction (PMC) - demonstrated to improve 3D-T1 image quality in a pediatric population - was applied to high-resolution 3D-T2-FLAIR scans in adult epilepsy patients to evaluate its clinical benefit. Coronal 3D-T2-FLAIR scans were acquired with a 1 mm isotropic resolution on a 3T MRI scanner. Two expert neuroradiologists reviewed 40 scans without PMC and 40 with navigator-based PMC. Visual assessment addressed six criteria of image quality (resolution, SNR, WM-GM contrast, intensity homogeneity, lesion conspicuity, diagnostic confidence) on a seven-point Likert scale (from non-diagnostic to outstanding). SNR was also objectively quantified within the white matter. PMC scans had near-identical scores on the criteria of image quality to non-PMC scans, with the notable exception that intensity homogeneity was generally worse. Using PMC, the percentage of scans with bad image quality was substantially lower than without PMC (3.25% vs. 12.5%) on the other five criteria. Quantitative SNR estimates revealed that PMC and non-PMC had no significant difference in SNR (P=0.07). Application of prospective motion correction to 3D-T2-FLAIR sequences decreased the percentage of low-quality scans, reducing the number of scans that need to be repeated to obtain clinically useful data. Copyright © 2018 The Authors. Published by Elsevier Masson SAS. All rights reserved.
The interobserver-validated relevance of intervertebral spacer materials in MRI artifacting
Heidrich, G.; Bruening, T.; Krefft, S.; Buchhorn, G.; Klinger, H.M.
2006-01-01
Intervertebral spacers for anterior spine fusion are made of different materials, such as titanium, carbon or cobalt-chrome, which can affect the post-fusion MRI scans. Implant-related susceptibility artifacts can decrease the quality of MRI scans, thwarting proper evaluation. This cadaver study aimed to demonstrate the extent that implant-related MRI artifacting affects the post-fusion evaluation of intervertebral spacers. In a cadaveric porcine spine, we evaluated the post-implantation MRI scans of three intervertebral spacers that differed in shape, material, surface qualities and implantation technique. A spacer made of human cortical bone was used as a control. The median sagittal MRI slice was divided into 12 regions of interest (ROI). No significant differences were found on 15 different MRI sequences read independently by an interobserver-validated team of specialists (P>0.05). Artifact-affected image quality was rated on a score of 0-1-2. A maximum score of 24 points (100%) was possible. Turbo spin echo sequences produced the best scores for all spacers and the control. Only the control achieved a score of 100%. The carbon, titanium and cobalt-chrome spacers scored 83.3, 62.5 and 50%, respectively. Our scoring system allowed us to create an implant-related ranking of MRI scan quality in reference to the control that was independent of artifact dimensions. The carbon spacer had the lowest percentage of susceptibility artifacts. Even with turbo spin echo sequences, the susceptibility artifacts produced by the metallic spacers showed a high degree of variability. Despite optimum sequencing, implant design and material are relevant factors in MRI artifacting. PMID:16463200
Dyvorne, Hadrien A; Galea, Nicola; Nevers, Thomas; Fiel, M Isabel; Carpenter, David; Wong, Edmund; Orton, Matthew; de Oliveira, Andre; Feiweier, Thorsten; Vachon, Marie-Louise; Babb, James S; Taouli, Bachir
2013-03-01
To optimize intravoxel incoherent motion (IVIM) diffusion-weighted (DW) imaging by estimating the effects of diffusion gradient polarity and breathing acquisition scheme on image quality, signal-to-noise ratio (SNR), IVIM parameters, and parameter reproducibility, as well as to investigate the potential of IVIM in the detection of hepatic fibrosis. In this institutional review board-approved prospective study, 20 subjects (seven healthy volunteers, 13 patients with hepatitis C virus infection; 14 men, six women; mean age, 46 years) underwent IVIM DW imaging with four sequences: (a) respiratory-triggered (RT) bipolar (BP) sequence, (b) RT monopolar (MP) sequence, (c) free-breathing (FB) BP sequence, and (d) FB MP sequence. Image quality scores were assessed for all sequences. A biexponential analysis with the Bayesian method yielded true diffusion coefficient (D), pseudodiffusion coefficient (D*), and perfusion fraction (PF) in liver parenchyma. Mixed-model analysis of variance was used to compare image quality, SNR, IVIM parameters, and interexamination variability between the four sequences, as well as the ability to differentiate areas of liver fibrosis from normal liver tissue. Image quality with RT sequences was superior to that with FB acquisitions (P = .02) and was not affected by gradient polarity. SNR did not vary significantly between sequences. IVIM parameter reproducibility was moderate to excellent for PF and D, while it was less reproducible for D*. PF and D were both significantly lower in patients with hepatitis C virus than in healthy volunteers with the RT BP sequence (PF = 13.5% ± 5.3 [standard deviation] vs 9.2% ± 2.5, P = .038; D = [1.16 ± 0.07] × 10⁻³ mm²/sec vs [1.03 ± 0.1] × 10⁻³ mm²/sec, P = .006). The RT BP DW imaging sequence had the best results in terms of image quality, reproducibility, and ability to discriminate between healthy and fibrotic liver with biexponential fitting.
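The biexponential analysis mentioned above rests on the standard IVIM signal model, S(b)/S0 = PF·exp(−b·D*) + (1 − PF)·exp(−b·D). The sketch below evaluates only this forward model; it does not reproduce the study's Bayesian fitting. PF and D are the first pair of values reported above, while the D* value and the b-values are hypothetical, since the abstract does not give them.

```python
import math


def ivim_signal(b, pf, d_star, d):
    """Normalized IVIM signal S(b)/S0 for a b-value in sec/mm^2.

    pf: perfusion fraction (0..1)
    d_star: pseudodiffusion coefficient in mm^2/sec
    d: true diffusion coefficient in mm^2/sec
    """
    return pf * math.exp(-b * d_star) + (1.0 - pf) * math.exp(-b * d)


PF = 0.135        # perfusion fraction, from the abstract (13.5%)
D = 1.16e-3       # true diffusion coefficient, from the abstract
D_STAR = 50e-3    # hypothetical pseudodiffusion coefficient (assumption)

for b in (0, 50, 200, 800):  # hypothetical b-values
    print(b, round(ivim_signal(b, PF, D_STAR, D), 3))
```

Because D* is much larger than D, the perfusion term decays within the first few tens of sec/mm², which is why low b-values are needed to estimate PF and D* and why D* tends to be the least reproducible parameter.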
Single Cell Total RNA Sequencing through Isothermal Amplification in Picoliter-Droplet Emulsion.
Fu, Yusi; Chen, He; Liu, Lu; Huang, Yanyi
2016-11-15
Prevalent single cell RNA amplification and sequencing chemistries mainly focus on polyadenylated RNAs in eukaryotic cells by using oligo(dT) primers for reverse transcription. We develop a new RNA amplification method, "easier-seq", to reverse transcribe and amplify the total RNAs, both with and without polyadenylate tails, from a single cell for transcriptome sequencing with high efficiency, reproducibility, and accuracy. By distributing the reverse transcribed cDNA molecules into 1.5 × 10⁵ aqueous droplets in oil, the cDNAs are isothermally amplified using random primers in each of these 65-pL reactors separately. This new method greatly improves the ease of single-cell RNA sequencing by reducing the experimental steps. Meanwhile, with less chance to induce errors, this method can easily maintain the quality of single-cell sequencing. In addition, this polyadenylate-tail-independent method can be seamlessly applied to prokaryotic cell RNA sequencing.
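As a quick arithmetic check on the droplet figures quoted above, the total emulsified aqueous volume can be estimated. This sketch assumes every droplet has the nominal 65-pL volume; the function name is hypothetical.

```python
def total_emulsion_volume_ul(n_droplets, droplet_volume_pl):
    """Total aqueous volume in microliters (1 uL = 1e6 pL)."""
    return n_droplets * droplet_volume_pl / 1e6

# 1.5e5 droplets of 65 pL each, as stated in the abstract.
volume_ul = total_emulsion_volume_ul(1.5e5, 65)
```

Under that assumption the reaction volume is just under 10 µL.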
Berger, C; Berger, B; Parson, W
2012-01-01
In recent years, evidence from domestic dogs has increasingly been analyzed by forensic DNA testing. Canine hairs in particular have proved most suitable and practical because of the high rate of hair transfer between dogs and humans. Starting with the description of a contamination-free sample handling procedure, we give a detailed workflow for sequencing hypervariable segments (HVS) of the mtDNA control region from canine evidence. After the hair material is lysed and the DNA extracted with phenol/chloroform, the amplification and sequencing strategy comprises the HVS I and II of the canine control region and is optimized for DNA of medium-to-low quality and quantity. The sequencing procedure is based on the Sanger Big-dye deoxy-terminator method, and the sequencing reaction products are separated on a conventional multicolor fluorescence detection capillary electrophoresis platform. Finally, software-aided base calling and sequence interpretation are illustrated with examples.
Initial sequencing and comparative analysis of the mouse genome
DOE Office of Scientific and Technical Information (OSTI.GOV)
Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan
2002-12-15
The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
Simple chained guide trees give high-quality protein multiple sequence alignments
Boyce, Kieran; Sievers, Fabian; Higgins, Desmond G.
2014-01-01
Guide trees are used to decide the order of sequence alignment in the progressive multiple sequence alignment heuristic. These guide trees are often the limiting factor in making large alignments, and considerable effort has been expended over the years in making these quickly or accurately. In this article we show that, at least for protein families with large numbers of sequences that can be benchmarked with known structures, simple chained guide trees give the most accurate alignments. These also happen to be the fastest and simplest guide trees to construct, computationally. Such guide trees have a striking effect on the accuracy of alignments produced by some of the most widely used alignment packages. There is a marked increase in accuracy and a marked decrease in computational time, once the number of sequences goes much above a few hundred. This is true, even if the order of sequences in the guide tree is random. PMID:25002495
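A chained guide tree is a fully unbalanced ("caterpillar") tree in which each sequence is added to the growing alignment one at a time. A minimal sketch of building one in Newick format is given below; the function name and the sequence labels are hypothetical, and how the tree is passed to a particular alignment package is not shown.

```python
def chained_guide_tree(names):
    """Return a fully chained (caterpillar) guide tree in Newick format.

    Sequences are joined in the order given, so a random order can be
    obtained by shuffling `names` before the call.
    """
    if len(names) < 2:
        raise ValueError("a guide tree needs at least two sequences")
    tree = f"({names[0]},{names[1]})"
    for name in names[2:]:
        # Each new sequence becomes the sibling of everything joined so far.
        tree = f"({tree},{name})"
    return tree + ";"

newick = chained_guide_tree(["seqA", "seqB", "seqC", "seqD"])
```

Construction is linear in the number of sequences and needs no pairwise distance matrix, which is consistent with the abstract's remark that these are the fastest and simplest guide trees to build.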
Flegontov, Pavel; Butenko, Anzhelika; Firsov, Sergei; Kraeva, Natalya; Eliáš, Marek; Field, Mark C.; Filatov, Dmitry; Flegontova, Olga; Gerasimov, Evgeny S.; Hlaváčová, Jana; Ishemgulova, Aygul; Jackson, Andrew P.; Kelly, Steve; Kostygov, Alexei Y.; Logacheva, Maria D.; Maslov, Dmitri A.; Opperdoes, Fred R.; O’Reilly, Amanda; Sádlová, Jovana; Ševčíková, Tereza; Venkatesh, Divya; Vlček, Čestmír; Volf, Petr; Votýpka, Jan; Záhonová, Kristína; Yurchenko, Vyacheslav; Lukeš, Julius
2016-01-01
Many high-quality genomes are available for dixenous (two hosts) trypanosomatid species of the genera Trypanosoma, Leishmania, and Phytomonas, but only fragmentary information is available for monoxenous (single-host) trypanosomatids. In trypanosomatids, monoxeny is ancestral to dixeny, thus it is anticipated that the genome sequences of the key monoxenous parasites will be instrumental for both understanding the origin of parasitism and the evolution of dixeny. Here, we present a high-quality genome for Leptomonas pyrrhocoris, which is closely related to the dixenous genus Leishmania. The L. pyrrhocoris genome (30.4 Mbp in 60 scaffolds) encodes 10,148 genes. Using the L. pyrrhocoris genome, we pinpointed genes gained in Leishmania. Among those genes, 20 genes with unknown function had expression patterns in the Leishmania mexicana life cycle suggesting their involvement in virulence. By combining differential expression data for L. mexicana, L. major and Leptomonas seymouri, we have identified several additional proteins potentially involved in virulence, including SpoU methylase and U3 small nucleolar ribonucleoprotein IMP3. The population genetics of L. pyrrhocoris was also addressed by sequencing thirteen strains of different geographic origin, allowing the identification of 1,318 genes under positive selection. This set of genes was significantly enriched in components of the cytoskeleton and the flagellum. PMID:27021793
Lavdas, Eleftherios; Mavroidis, Panayiotis; Kostopoulos, Spiros; Glotsos, Dimitrios; Roka, Violeta; Koutsiaris, Aristotle G; Batsikas, Georgios; Sakkas, Georgios K; Tsagkalis, Antonios; Notaras, Ioannis; Stathakis, Sotirios; Papanikolaou, Nikos; Vassiou, Katerina
2013-07-01
The purpose of this study is to evaluate the ability of T2 turbo spin echo (TSE) axial and sagittal BLADE sequences to reduce or even eliminate motion, pulsatile flow and cross-talk artifacts in lumbar spine MRI examinations. Forty-four patients, who had routinely undergone a lumbar spine examination, participated in the study. The following pairs of sequences with and without BLADE were compared: a) T2 TSE Sagittal (SAG) in thirty-two cases, and b) T2 TSE Axial (AX) also in thirty-two cases. Both quantitative and qualitative analyses were performed, based on measurements in different normal anatomical structures and examination of seven characteristics, respectively. The qualitative analysis was performed by experienced radiologists. The presence of image motion, pulsatile flow and cross-talk artifacts was also evaluated. Based on the results of the qualitative analysis for the different sequences and anatomical structures, the BLADE sequences were found to be significantly superior to the conventional ones in all the cases. The BLADE sequences eliminated the motion artifacts in all the cases. For both the sagittal and axial sequences, the differences between the BLADE and conventional sequences regarding the elimination of motion, pulsatile flow and cross-talk artifacts were statistically significant. In all the comparisons, the T2 TSE BLADE sequences were significantly superior to the corresponding conventional sequences regarding the classification of their image quality. In conclusion, this technique appears to be capable of potentially eliminating motion, pulsatile flow and cross-talk artifacts in lumbar spine MR images and producing high-quality images in cooperative and uncooperative patients. Copyright © 2013 Elsevier Inc. All rights reserved.
Tracking prominent points in image sequences
NASA Astrophysics Data System (ADS)
Hahn, Michael
1994-03-01
Measuring image motion and inferring scene geometry and camera motion are the main aspects of image sequence analysis. The determination of image motion and the structure-from-motion problem are tasks that can be addressed independently or in cooperative processes. In this paper we focus on tracking prominent points. High stability, reliability, and accuracy are criteria for the extraction of prominent points. This implies that tracking should work quite well with those features; unfortunately, the reality looks quite different. In the experimental investigations we processed a long sequence of 128 images. This mono sequence was taken in an outdoor environment at the experimental field of Mercedes-Benz in Rastatt. Different tracking schemes are explored and the results with respect to stability and quality are reported.
Comparison of the quality of different magnetic resonance image sequences of multiple myeloma.
Sun, Zhao-yong; Zhang, Hai-bo; Li, Shuo; Wang, Yun; Xue, Hua-dan; Jin, Zheng-yu
2015-02-01
To compare the image quality of T1WI fat phase, T1WI water phase, short time inversion recovery (STIR) sequence, and diffusion weighted imaging (DWI) sequence in the evaluation of multiple myeloma (MM). A total of 20 MM patients were enrolled in this study. All patients underwent scanning at coronal T1WI fat phase, coronal T1WI water phase, coronal STIR sequence, and axial DWI sequence. The image quality of the four different sequences was evaluated. The image was divided into seven sections (head and neck, chest, abdomen, pelvis, thigh, leg, and foot), and the signal-to-noise ratio (SNR) was measured at seven segments (skull, spine, pelvis, humerus, femur, tibia and fibula, and ribs). In addition, 20 active MM lesions were selected, and the contrast-to-noise ratio (CNR) of each scan sequence was calculated. The average image quality scores of T1WI fat phase, T1WI water phase, STIR sequence, and DWI sequence were 4.19 ± 0.70, 4.16 ± 0.73, 3.89 ± 0.70, and 3.76 ± 0.68, respectively. The image quality scores at T1-fat phase and T1-water phase were significantly higher than those at STIR (P=0.000 and P=0.001) and DWI sequence (both P=0.000); however, there was no significant difference between T1-fat and T1-water phase (P=0.723) or between STIR and DWI sequence (P=0.167). The SNR of T1WI fat phase was significantly higher than those of the other three sequences (all P=0.000), and there was no significant difference among the other three sequences (all P>0.05). Although the CNR of the DWI sequence was slightly higher than those of the other three sequences, there was no significant difference among all of them (all P>0.05). Imaging at T1WI fat phase, T1WI water phase, STIR sequence, and DWI sequence each has certain advantages, and they should be combined in the diagnosis of MM.
USDA-ARS's Scientific Manuscript database
Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat and an important genetic resource for wheat. A reference-quality sequence for the Ae. tauschii genome was produced with a combination of ordered-clone sequencing, whole-genome shotgun sequencing, and BioNano optical geno...
Wilkie, Joel R.; Matuszak, Martha M.; Feng, Mary; Moran, Jean M.; Fraass, Benedick A.
2013-01-01
Purpose: Plan degradation resulting from compromises made to enhance delivery efficiency is an important consideration for intensity modulated radiation therapy (IMRT) treatment plans. IMRT optimization and/or multileaf collimator (MLC) sequencing schemes can be modified to generate more efficient treatment delivery, but the effect those modifications have on plan quality is often difficult to quantify. In this work, the authors present a method for quantitative assessment of overall plan quality degradation due to tradeoffs between delivery efficiency and treatment plan quality, illustrated using comparisons between plans developed allowing different numbers of intensity levels in IMRT optimization and/or MLC sequencing for static segmental MLC IMRT plans. Methods: A plan quality degradation method to evaluate delivery efficiency and plan quality tradeoffs was developed and used to assess planning for 14 prostate and 12 head and neck patients treated with static IMRT. Plan quality was evaluated using a physician's predetermined “quality degradation” factors for relevant clinical plan metrics associated with the plan optimization strategy. Delivery efficiency and plan quality were assessed for a range of optimization and sequencing limitations. The “optimal” (baseline) plan for each case was derived using a clinical cost function with an unlimited number of intensity levels. These plans were sequenced with a clinical MLC leaf sequencer which uses >100 segments, assuring delivered intensities to be within 1% of the optimized intensity pattern. Each patient's optimal plan was also sequenced limiting the number of intensity levels (20, 10, and 5), and then separately optimized with these same numbers of intensity levels. Delivery time was measured for all plans, and direct evaluation of the tradeoffs between delivery time and plan degradation was performed. 
Results: When considering tradeoffs, the optimal number of intensity levels depends on the treatment site and on the stage in the process at which the levels are limited. The cost of improved delivery efficiency, in terms of plan quality degradation, increased as the number of intensity levels in the sequencer or optimizer decreased. The degradation was more substantial for the head and neck cases relative to the prostate cases, particularly when fewer than 20 intensity levels were used. Plan quality degradation was less severe when the number of intensity levels was limited in the optimizer rather than the sequencer. Conclusions: Analysis of plan quality degradation allows for a quantitative assessment of the compromises in clinical plan quality as delivery efficiency is improved, in order to determine the optimal delivery settings. The technique is based on physician-determined quality degradation factors and can be extended to other clinical situations where investigation of various tradeoffs is warranted. PMID:23822412
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnakumar, Raga; Sinha, Anupama; Bird, Sara W.
Emerging sequencing technologies are allowing us to characterize environmental, clinical and laboratory samples with increasing speed and detail, including real-time analysis and interpretation of data. One example of this is being able to rapidly and accurately detect a wide range of pathogenic organisms, both in the clinic and the field. Genomes can have radically different GC content, however, such that accurate sequence analysis can be challenging depending upon the technology used. Here, we have characterized the performance of the Oxford MinION nanopore sequencer for detection and evaluation of organisms with a range of genomic nucleotide bias. We have diagnosed the quality of base-calling across individual reads and discovered that the position within the read affects base-calling and quality scores. Finally, we have evaluated the performance of the current state-of-the-art neural network-based MinION basecaller, characterizing its behavior with respect to systemic errors as well as context- and sequence-specific errors. Overall, we present a detailed characterization of the capabilities of the MinION in terms of generating high-accuracy sequence data from genomes with a wide range of nucleotide content. This study provides a framework for designing the appropriate experiments that are likely to lead to accurate and rapid field-forward diagnostics.
Krishnakumar, Raga; Sinha, Anupama; Bird, Sara W.; ...
2018-02-16
Emerging sequencing technologies are allowing us to characterize environmental, clinical and laboratory samples with increasing speed and detail, including real-time analysis and interpretation of data. One example of this is being able to rapidly and accurately detect a wide range of pathogenic organisms, both in the clinic and the field. Genomes can have radically different GC content, however, such that accurate sequence analysis can be challenging depending upon the technology used. Here, we have characterized the performance of the Oxford MinION nanopore sequencer for detection and evaluation of organisms with a range of genomic nucleotide bias. We have diagnosed the quality of base-calling across individual reads and discovered that the position within the read affects base-calling and quality scores. Finally, we have evaluated the performance of the current state-of-the-art neural network-based MinION basecaller, characterizing its behavior with respect to systemic errors as well as context- and sequence-specific errors. Overall, we present a detailed characterization of the capabilities of the MinION in terms of generating high-accuracy sequence data from genomes with a wide range of nucleotide content. This study provides a framework for designing the appropriate experiments that are likely to lead to accurate and rapid field-forward diagnostics.
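The two quantities discussed above, GC content and base quality as a function of position within a read, can be computed with a few lines of code. The sketch below is not the authors' pipeline; the function names are hypothetical, and the quality strings are assumed to use the common Phred+33 FASTQ encoding.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0

def mean_quality_by_position(quality_strings, offset=33):
    """Mean Phred score at each read position across reads of varying length.

    `offset=33` ASSUMES Phred+33 encoded FASTQ quality strings.
    """
    totals, counts = [], []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):      # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]

example_gc = gc_content("GGCCAATT")
example_profile = mean_quality_by_position(["II", "5"])
```

Plotting such a per-position profile for reads from genomes with different GC content is one simple way to look for the position-dependent effects the abstract describes.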
Transcriptome characterisation of Pinus tabuliformis and evolution of genes in the Pinus phylogeny
2013-01-01
Background: The Chinese pine (Pinus tabuliformis) is an indigenous conifer species in northern China but is relatively underdeveloped as a genomic resource, thus limiting gene discovery and breeding. Large-scale transcriptome data were obtained using a next-generation sequencing platform to compensate for the lack of P. tabuliformis genomic information. Results: The increasing amount of transcriptome data on Pinus provides an excellent resource for multi-gene phylogenetic analysis and studies on how conserved genes and functions are maintained in the face of species divergence. The first P. tabuliformis transcriptome from a normalised cDNA library of multiple tissues and individuals was sequenced in a full 454 GS-FLX run, producing 911,302 sequencing reads. The high quality overlapping expressed sequence tags (ESTs) were assembled into 46,584 putative transcripts, and more than 700 SSRs and 92,000 SNPs/InDels were characterised. Comparative analysis of the transcriptome of six conifer species yielded 191 orthologues, from which we inferred a phylogenetic tree, evolutionary patterns and calculated rates of gene diversion. We also identified 938 fast evolving sequences that may be useful for identifying genes that perhaps evolved in response to positive selection and might be responsible for speciation in the Pinus lineage. Conclusions: A large collection of high-quality ESTs was obtained, de novo assembled and characterised, which represents a dramatic expansion of the current transcript catalogues of P. tabuliformis and which will gradually be applied in breeding programs of P. tabuliformis. Furthermore, these data will facilitate future studies of the comparative genomics of P. tabuliformis and other related species. PMID:23597112
NASA Technical Reports Server (NTRS)
Zhang, Zhengdong; Willson, Richard C.; Fox, George E.
2002-01-01
MOTIVATION: The phylogenetic structure of the bacterial world has been intensively studied by comparing sequences of 16S ribosomal RNA (16S rRNA). This database of sequences is now widely used to design probes for the detection of specific bacteria or groups of bacteria one at a time. The success of such methods reflects the fact that there are local sequence segments that are highly characteristic of particular organisms or groups of organisms. It is not clear, however, the extent to which such signature sequences exist in the 16S rRNA dataset. A better understanding of the numbers and distribution of highly informative oligonucleotide sequences may facilitate the design of hybridization arrays that can characterize the phylogenetic position of an unknown organism or serve as the basis for the development of novel approaches for use in bacterial identification. RESULTS: A computer-based algorithm that characterizes the extent to which any individual oligonucleotide sequence in 16S rRNA is characteristic of any particular bacterial grouping was developed. A measure of signature quality, Q(s), was formulated and subsequently calculated for every individual oligonucleotide sequence in the size range of 5-11 nucleotides and for 15mers with reference to each cluster and subcluster in a 929 organism representative phylogenetic tree. Subsequently, the perfect signature sequences were compared to the full set of 7322 sequences to see how common false positives were. The work completed here establishes beyond any doubt that highly characteristic oligonucleotides exist in the bacterial 16S rRNA sequence dataset in large numbers. Over 16,000 15mers were identified that might be useful as signatures. Signature oligonucleotides are available for over 80% of the nodes in the representative tree.
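The idea of a "perfect" signature oligonucleotide can be illustrated with a small sketch: an oligonucleotide of length k that occurs in every sequence of a target group and in no sequence outside it. This is not the Q(s) measure from the paper, whose formula is not given in the abstract; the function names and toy sequences below are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def perfect_signatures(target_seqs, other_seqs, k):
    """k-mers found in every target sequence and in no non-target sequence."""
    if not target_seqs:
        return set()
    shared = set.intersection(*(kmers(s, k) for s in target_seqs))
    outside = set().union(*(kmers(s, k) for s in other_seqs))
    return shared - outside

# Toy data only; real input would be aligned or unaligned 16S rRNA sequences.
signatures = perfect_signatures(["ACGTAC", "TACGTA"], ["GGGTAC"], k=3)
```

A graded quality score would relax both conditions, for example by tolerating a small fraction of missing group members or of hits outside the group.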
Nielsen, E E; Morgan, J A T; Maher, S L; Edson, J; Gauthier, M; Pepperell, J; Holmes, B J; Bennett, M B; Ovenden, J R
2017-05-01
Archived specimens are highly valuable sources of DNA for retrospective genetic/genomic analysis. However, often limited effort has been made to evaluate and optimize extraction methods, which may be crucial for downstream applications. Here, we assessed and optimized the usefulness of abundant archived skeletal material from sharks as a source of DNA for temporal genomic studies. Six different methods for DNA extraction, encompassing two different commercial kits and three different protocols, were applied to material, so-called bio-swarf, from contemporary and archived jaws and vertebrae of tiger sharks (Galeocerdo cuvier). Protocols were compared for DNA yield and quality using a qPCR approach. For jaw swarf, all methods provided relatively high DNA yield and quality, while large differences in yield between protocols were observed for vertebrae. Similar results were obtained from samples of white shark (Carcharodon carcharias). Application of the optimized methods to 38 museum and private angler trophy specimens dating back to 1912 yielded sufficient DNA for downstream genomic analysis for 68% of the samples. No clear relationships between age of samples, DNA quality and quantity were observed, likely reflecting different preparation and storage methods for the trophies. Trial sequencing of DNA capture genomic libraries using 20 000 baits revealed that a significant proportion of captured sequences were derived from tiger sharks. This study demonstrates that archived shark jaws and vertebrae are potential high-yield sources of DNA for genomic-scale analysis. It also highlights that even for similar tissue types, a careful evaluation of extraction protocols can vastly improve DNA yield. © 2016 John Wiley & Sons Ltd.
Baptista, Rodrigo P; Reis-Cunha, Joao Luis; DeBarry, Jeremy D; Chiari, Egler; Kissinger, Jessica C; Bartholomeu, Daniella C; Macedo, Andrea M
2018-02-14
Next-generation sequencing (NGS) methods are low-cost high-throughput technologies that produce thousands to millions of sequence reads. Despite the high number of raw sequence reads, their short length, relative to Sanger, PacBio or Nanopore reads, complicates the assembly of genomic repeats. Many genome tools are available, but the assembly of highly repetitive genome sequences using only NGS short reads remains challenging. Genome assembly of organisms responsible for important neglected diseases such as Trypanosoma cruzi, the aetiological agent of Chagas disease, is known to be challenging because of their repetitive nature. Only three of six recognized discrete typing units (DTUs) of the parasite have their draft genomes published and therefore genome evolution analyses in the taxon are limited. In this study, we developed a computational workflow to assemble highly repetitive genomes via a combination of de novo and reference-based assembly strategies to better overcome the intrinsic limitations of each, based on Illumina reads. The highly repetitive genome of the human-infecting parasite T. cruzi 231 strain was used as a test subject. The combined-assembly approach shown in this study benefits from the reference-based assembly ability to resolve highly repetitive sequences and from the de novo capacity to assemble genome-specific regions, improving the quality of the assembly. The acceptable confidence obtained by analyzing our results showed that our combined approach is an attractive option to assemble highly repetitive genomes with NGS short reads. Phylogenomic analysis including the 231 strain, the first representative of DTU III whose genome was sequenced, was also performed and provides new insights into T. cruzi genome evolution.
Aberg, Karolina A.; Xie, Lin Y.; Nerella, Srilaxmi; Copeland, William E.; Costello, E. Jane; van den Oord, Edwin J.C.G.
2013-01-01
The potential importance of DNA methylation in the etiology of complex diseases has led to interest in the development of methylome-wide association studies (MWAS) aimed at interrogating all methylation sites in the human genome. When using blood as biomaterial for a MWAS the DNA is typically extracted directly from fresh or frozen whole blood that was collected via venous puncture. However, DNA extracted from dry blood spots may also be an alternative starting material. In the present study, we apply a methyl-CpG binding domain (MBD) protein enrichment-based technique in combination with next generation sequencing (MBD-seq) to assess the methylation status of the ~27 million CpGs in the human autosomal reference genome. We investigate eight methylomes using DNA from blood spots. These data are compared with 1,500 methylomes previously assayed with the same MBD-seq approach using DNA from whole blood. When investigating the sequence quality and the enrichment profile across biological features, we find that DNA extracted from blood spots gives comparable results with DNA extracted from whole blood. Only if the amount of starting material is ≤ 0.5 µg DNA do we observe a slight decrease in the assay performance. In conclusion, we show that high quality methylome-wide investigations using MBD-seq can be conducted in DNA extracted from archived dry blood spots without sacrificing quality and without bias in enrichment profile as long as the amount of starting material is sufficient. In general, the amount of DNA extracted from a single blood spot is sufficient for methylome-wide investigations with the MBD-seq approach. PMID:23644822
Aberg, Karolina A; Xie, Lin Y; Nerella, Srilaxmi; Copeland, William E; Costello, E Jane; van den Oord, Edwin J C G
2013-05-01
The potential importance of DNA methylation in the etiology of complex diseases has led to interest in the development of methylome-wide association studies (MWAS) aimed at interrogating all methylation sites in the human genome. When using blood as biomaterial for a MWAS the DNA is typically extracted directly from fresh or frozen whole blood that was collected via venous puncture. However, DNA extracted from dry blood spots may also be an alternative starting material. In the present study, we apply a methyl-CpG binding domain (MBD) protein enrichment-based technique in combination with next generation sequencing (MBD-seq) to assess the methylation status of the ~27 million CpGs in the human autosomal reference genome. We investigate eight methylomes using DNA from blood spots. These data are compared with 1,500 methylomes previously assayed with the same MBD-seq approach using DNA from whole blood. When investigating the sequence quality and the enrichment profile across biological features, we find that DNA extracted from blood spots gives comparable results with DNA extracted from whole blood. Only if the amount of starting material is ≤ 0.5 µg DNA do we observe a slight decrease in the assay performance. In conclusion, we show that high quality methylome-wide investigations using MBD-seq can be conducted in DNA extracted from archived dry blood spots without sacrificing quality and without bias in enrichment profile as long as the amount of starting material is sufficient. In general, the amount of DNA extracted from a single blood spot is sufficient for methylome-wide investigations with the MBD-seq approach.
JGI Plant Genomics Gene Annotation Pipeline
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shu, Shengqiang; Rokhsar, Dan; Goodstein, David
2014-07-14
Plant genomes vary in size and are highly complex, with a high amount of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying organisms, and it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, and transcriptomes are also being sequenced. To use these vast amounts of sequence data for gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for details. Here we present the genome annotation of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice, except for chlamy, which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.
Guo, Yinshan; Shi, Guangli; Liu, Zhendong; Zhao, Yuhui; Yang, Xiaoxu; Zhu, Junchi; Li, Kun; Guo, Xiuwu
2015-01-01
In this study, 149 F1 plants from the interspecific cross between 'Red Globe' (Vitis vinifera L.) and 'Shuangyou' (Vitis amurensis Rupr.) and the parents were used to construct a molecular genetic linkage map by using the specific length amplified fragment sequencing technique. DNA sequencing generated 41.282 Gb of data consisting of 206,411,693 paired-end reads. The average sequencing depths were 68.35 for 'Red Globe,' 63.65 for 'Shuangyou,' and 8.01 for each progeny. In all, 115,629 high-quality specific length amplified fragments were detected, of which 42,279 were polymorphic. The genetic map was constructed using 7,199 of these polymorphic markers. These polymorphic markers were assigned to 19 linkage groups; the total length of the map was 1929.13 cM, with an average distance of 0.28 cM between markers. To our knowledge, the genetic maps constructed in this study contain the largest number of molecular markers. These high-density genetic maps might form the basis for the fine quantitative trait loci mapping and molecular-assisted breeding of grape.
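The sequencing yield figures quoted above can be cross-checked with simple arithmetic. The sketch assumes that "Gb" means 10⁹ bases and that each counted paired-end read contributes both mates; the function name is hypothetical.

```python
def bases_per_read(total_bases, n_reads):
    """Average number of sequenced bases contributed by each counted read."""
    return total_bases / n_reads

# 41.282 Gb (ASSUMED to mean 41.282e9 bases) over 206,411,693 paired-end reads.
avg_bases = bases_per_read(41.282e9, 206_411_693)
```

The result is close to 200 bases per counted read, which would be consistent with, for example, two 100-base mates per pair, although the abstract does not state the read length.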
Chen, Dana; Orenstein, Yaron; Golodnitsky, Rada; Pellach, Michal; Avrahami, Dorit; Wachtel, Chaim; Ovadia-Shochat, Avital; Shir-Shapira, Hila; Kedmi, Adi; Juven-Gershon, Tamar; Shamir, Ron; Gerber, Doron
2016-01-01
Transcription factors (TFs) alter gene expression in response to changes in the environment through sequence-specific interactions with the DNA. These interactions are best portrayed as a landscape of TF binding affinities. Current methods to study sequence-specific binding preferences suffer from limited dynamic range, sequence bias, lack of specificity and limited throughput. We have developed a microfluidic-based device for SELEX Affinity Landscape MAPping (SELMAP) of TF binding, which allows high-throughput measurement of 16 proteins in parallel. We used it to measure the relative affinities of Pho4, AtERF2 and Btd full-length proteins to millions of different DNA binding sites, and detected both high and low-affinity interactions in equilibrium conditions, generating a comprehensive landscape of the relative TF affinities to all possible DNA 6-mers, and even DNA 10-mers with increased sequencing depth. Low quantities of both the TFs and DNA oligomers were sufficient for obtaining high-quality results, significantly reducing experimental costs. SELMAP allows in-depth screening of hundreds of TFs, and provides a means for better understanding of the regulatory processes that govern gene expression. PMID:27628341
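The size of the landscape mentioned above follows directly from the number of possible k-mers, 4^k. The short sketch below enumerates them; the function names are hypothetical, and it is not part of the SELMAP analysis itself.

```python
from itertools import product

def all_kmers(k, alphabet="ACGT"):
    """Every possible DNA k-mer, in lexicographic order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

def kmer_space_size(k):
    """Number of distinct DNA k-mers (4**k)."""
    return 4 ** k

six_mers = all_kmers(6)
n_ten_mers = kmer_space_size(10)
```

Moving from 6-mers (4,096 sequences) to 10-mers (1,048,576 sequences) multiplies the landscape by 256, which illustrates why the abstract ties 10-mer coverage to increased sequencing depth.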
A DNA mini-barcode for land plants.
Little, Damon P
2014-05-01
Small portions of the barcode region - mini-barcodes - may be used in place of full-length barcodes to overcome DNA degradation for samples with poor DNA preservation. 591,491,286 rbcL mini-barcode primer combinations were electronically evaluated for PCR universality, and two novel highly universal sets of priming sites were identified. Novel and published rbcL mini-barcode primers were evaluated for PCR amplification [determined with a validated electronic simulation (n = 2765) and empirically (n = 188)], Sanger sequence quality [determined empirically (n = 188)], and taxonomic discrimination [determined empirically (n = 30,472)]. PCR amplification for all mini-barcodes, as estimated by validated electronic simulation, was successful for 90.2-99.8% of species. Overall Sanger sequence quality for mini-barcodes was very low - the best mini-barcode tested produced sequences of adequate quality (B20 ≥ 0.5) for 74.5% of samples. The majority of mini-barcodes provide correct identifications of families in excess of 70.1% of the time. Discriminatory power noticeably decreased at lower taxonomic levels. At the species level, the discriminatory power of the best mini-barcode was less than 38.2%. For samples believed to contain DNA from only one species, an investigator should attempt to sequence, in decreasing order of utility and probability of success, mini-barcodes F (rbcL1/rbcLB), D (F52/R193) and K (F517/R604). For samples believed to contain DNA from more than one species, an investigator should amplify and sequence mini-barcode D (F52/R193). © 2013 John Wiley & Sons Ltd.
The FDA's Experience with Emerging Genomics Technologies-Past, Present, and Future.
Xu, Joshua; Thakkar, Shraddha; Gong, Binsheng; Tong, Weida
2016-07-01
The rapid advancement of emerging genomics technologies and their application for assessing safety and efficacy of FDA-regulated products require a high standard of reliability and robustness supporting regulatory decision-making in the FDA. To facilitate the regulatory application, the FDA implemented a novel data submission program, Voluntary Genomics Data Submission (VGDS), and also to engage the stakeholders. As part of the endeavor, for the past 10 years, the FDA has led an international consortium of regulatory agencies, academia, pharmaceutical companies, and genomics platform providers, which was named MicroArray Quality Control Consortium (MAQC), to address issues such as reproducibility, precision, specificity/sensitivity, and data interpretation. Three projects have been completed so far assessing these genomics technologies: gene expression microarrays, whole genome genotyping arrays, and whole transcriptome sequencing (i.e., RNA-seq). The resultant studies provide the basic parameters for fit-for-purpose application of these new data streams in regulatory environments, and the solutions have been made available to the public through peer-reviewed publications. The latest MAQC project is also called the SEquencing Quality Control (SEQC) project focused on next-generation sequencing. Using reference samples with built-in controls, SEQC studies have demonstrated that relative gene expression can be measured accurately and reliably across laboratories and RNA-seq platforms. Besides prediction performance comparable to microarrays in clinical settings and safety assessments, RNA-seq is shown to have better sensitivity for low expression and reveal novel transcriptomic features. Future effort of MAQC will be focused on quality control of whole genome sequencing and targeted sequencing.
The FDA’s Experience with Emerging Genomics Technologies—Past, Present, and Future
Xu, Joshua; Thakkar, Shraddha; Gong, Binsheng; Tong, Weida
2016-01-01
The rapid advancement of emerging genomics technologies and their application for assessing safety and efficacy of FDA-regulated products require a high standard of reliability and robustness supporting regulatory decision-making in the FDA. To facilitate the regulatory application, the FDA implemented a novel data submission program, Voluntary Genomics Data Submission (VGDS), and also to engage the stakeholders. As part of the endeavor, for the past 10 years, the FDA has led an international consortium of regulatory agencies, academia, pharmaceutical companies, and genomics platform providers, which was named MicroArray Quality Control Consortium (MAQC), to address issues such as reproducibility, precision, specificity/sensitivity, and data interpretation. Three projects have been completed so far assessing these genomics technologies: gene expression microarrays, whole genome genotyping arrays, and whole transcriptome sequencing (i.e., RNA-seq). The resultant studies provide the basic parameters for fit-for-purpose application of these new data streams in regulatory environments, and the solutions have been made available to the public through peer-reviewed publications. The latest MAQC project is also called the SEquencing Quality Control (SEQC) project focused on next-generation sequencing. Using reference samples with built-in controls, SEQC studies have demonstrated that relative gene expression can be measured accurately and reliably across laboratories and RNA-seq platforms. Besides prediction performance comparable to microarrays in clinical settings and safety assessments, RNA-seq is shown to have better sensitivity for low expression and reveal novel transcriptomic features. Future effort of MAQC will be focused on quality control of whole genome sequencing and targeted sequencing. PMID:27116022
RECKONER: read error corrector based on KMC.
Dlugosz, Maciej; Deorowicz, Sebastian
2017-04-01
The presence of sequencing errors in data produced by next-generation sequencers affects the quality of downstream analyses. Their accuracy can be improved by performing error correction of sequencing reads. We introduce a new correction algorithm capable of processing high-error-rate data from eukaryotic genomes of close to 500 Mbp using less than 4 GB of RAM in about 35 min on a 16-core computer. The program is freely available at http://sun.aei.polsl.pl/REFRESH/reckoner. Contact: sebastian.deorowicz@polsl.pl. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Deblurring sequential ocular images from multi-spectral imaging (MSI) via mutual information.
Lian, Jian; Zheng, Yuanjie; Jiao, Wanzhen; Yan, Fang; Zhao, Bojun
2018-06-01
Multi-spectral imaging (MSI) produces a sequence of spectral images to capture the inner structure of different species, which was recently introduced into ocular disease diagnosis. However, the quality of MSI images can be significantly degraded by motion blur caused by the inevitable saccades and exposure time required for maintaining a sufficiently high signal-to-noise ratio. This degradation may confuse an ophthalmologist, reduce the examination quality, or defeat various image analysis algorithms. We propose an early work specially on deblurring sequential MSI images, which is distinguished from many of the current image deblurring techniques by resolving the blur kernel simultaneously for all the images in an MSI sequence. It is accomplished by incorporating several a priori constraints including the sharpness of the latent clear image, the spatial and temporal smoothness of the blur kernel and the similarity between temporally-neighboring images in MSI sequence. Specifically, we model the similarity between MSI images with mutual information considering the different wavelengths used for capturing different images in MSI sequence. The optimization of the proposed approach is based on a multi-scale framework and stepwise optimization strategy. Experimental results from 22 MSI sequences validate that our approach outperforms several state-of-the-art techniques in natural image deblurring.
Kresse, Stine H; Namløs, Heidi M; Lorenz, Susanne; Berner, Jeanne-Marie; Myklebost, Ola; Bjerkehagen, Bodil; Meza-Zepeda, Leonardo A
2018-01-01
Nucleic acid material of adequate quality is crucial for successful high-throughput sequencing (HTS) analysis. DNA and RNA isolated from archival FFPE material are frequently degraded and not readily amplifiable due to chemical damage introduced during fixation. To identify optimal nucleic acid extraction kits, DNA and RNA quantity, quality and performance in HTS applications were evaluated. DNA and RNA were isolated from five sarcoma archival FFPE blocks, using eight extraction protocols from seven kits from three different commercial vendors. For DNA extraction, the truXTRAC FFPE DNA kit from Covaris gave higher yields and better amplifiable DNA, but all protocols gave comparable HTS library yields using Agilent SureSelect XT and performed well in downstream variant calling. For RNA extraction, all protocols gave comparable yields and amplifiable RNA. However, for fusion gene detection using the Archer FusionPlex Sarcoma Assay, the truXTRAC FFPE RNA kit from Covaris and Agencourt FormaPure kit from Beckman Coulter showed the highest percentage of unique read-pairs, providing higher complexity of HTS data and more frequent detection of recurrent fusion genes. truXTRAC simultaneous DNA and RNA extraction gave similar outputs as individual protocols. These findings show that although successful HTS libraries could be generated in most cases, the different protocols gave variable quantity and quality for FFPE nucleic acid extraction. Selecting the optimal procedure is highly valuable and may generate results in borderline quality specimens.
Genome Sequence of an Ammonia-Oxidizing Soil Archaeon, “Candidatus Nitrosoarchaeum koreensis” MY1
Kim, Byung Kwon; Jung, Man-Young; Yu, Dong Su; Park, Soo-Je; Oh, Tae Kwang; Rhee, Sung-Keun; Kim, Jihyun F.
2011-01-01
Ammonia-oxidizing archaea are ubiquitous microorganisms which play important roles in the global nitrogen and carbon cycles on Earth. Here we present the high-quality draft genome sequence of an ammonia-oxidizing archaeon, “Candidatus Nitrosoarchaeum koreensis” MY1, that dominated an enrichment culture of a soil sample from the rhizosphere. Its genome contains genes for survival in the rhizosphere environment as well as those for carbon fixation and ammonium oxidation to nitrite. PMID:21914867
An elementary research on wireless transmission of holographic 3D moving pictures
NASA Astrophysics Data System (ADS)
Takano, Kunihiko; Sato, Koki; Endo, Takaya; Asano, Hiroaki; Fukuzawa, Atsuo; Asai, Kikuo
2009-05-01
In this paper, a process for transmitting a sequence of holograms describing 3D moving objects over a wireless communication network is presented. The sequence of holograms is transformed into bit stream data and then transmitted over wireless LAN and Bluetooth. It is shown that, by applying this technique, holographic data of 3D moving objects are transmitted in high quality and a relatively good reconstruction of the holographic images is achieved.
Apollo 12 photography 70 mm, 16 mm, and 35 mm frame index
NASA Technical Reports Server (NTRS)
1970-01-01
For each 70-mm frame, the index presents information on: (1) the focal length of the camera, (2) the photo scale at the principal point of the frame, (3) the selenographic coordinates at the principal point of the frame, (4) the percentage of forward overlap of the frame, (5) the sun angle (medium, low, high), (6) the quality of the photography, (7) the approximate tilt (minimum and maximum) of the camera, and (8) the direction of tilt. A brief description of each frame is also included. The index to the 16-mm sequence photography includes information concerning the approximate surface coverage of the photographic sequence and a brief description of the principal features shown. A column of remarks is included to indicate: (1) if the sequence is plotted on the photographic index map and (2) the quality of the photography. The pictures taken using the lunar surface closeup stereoscopic camera (35 mm) are also described in this same index format.
Brodsky, Ethan K.; Klaers, Jessica L.; Samsonov, Alexey A.; Kijowski, Richard; Block, Walter F.
2014-01-01
Non-Cartesian imaging sequences and navigational methods can be more sensitive to scanner imperfections that have little impact on conventional clinical sequences, an issue which has repeatedly complicated the commercialization of these techniques by frustrating transitions to multi-center evaluations. One such imperfection is phase errors caused by resonant frequency shifts from eddy currents induced in the cryostat by time-varying gradients, a phenomenon known as B0 eddy currents. These phase errors can have a substantial impact on sequences that use ramp sampling, bipolar gradients, and readouts at varying azimuthal angles. We present a method for measuring and correcting phase errors from B0 eddy currents and examine the results on two different scanner models. This technique yields significant improvements in image quality for high-resolution joint imaging on certain scanners. The results suggest that correction of short-time B0 eddy currents in manufacturer-provided service routines would simplify adoption of non-Cartesian sampling methods. PMID:22488532
Wen, Qiuting; Kodiweera, Chandana; Dale, Brian M; Shivraman, Giri; Wu, Yu-Chien
2018-01-01
To accelerate high-resolution diffusion imaging, rotating single-shot acquisition (RoSA) with composite reconstruction is proposed. Acceleration was achieved by acquiring only one rotating single-shot blade per diffusion direction, and high-resolution diffusion-weighted (DW) images were reconstructed by using similarities of neighboring DW images. A parallel imaging technique was implemented in RoSA to further improve the image quality and acquisition speed. RoSA performance was evaluated by simulation and human experiments. A brain tensor phantom was developed to determine an optimal blade size and rotation angle by considering similarity in DW images, off-resonance effects, and k-space coverage. With the optimal parameters, RoSA MR pulse sequence and reconstruction algorithm were developed to acquire human brain data. For comparison, multishot echo planar imaging (EPI) and conventional single-shot EPI sequences were performed with matched scan time, resolution, field of view, and diffusion directions. The simulation indicated an optimal blade size of 48 × 256 and a 30° rotation angle. For 1 × 1 mm² in-plane resolution, RoSA was 12 times faster than the multishot acquisition with comparable image quality. With the same acquisition time as SS-EPI, RoSA provided superior image quality and minimum geometric distortion. RoSA offers fast, high-quality, high-resolution diffusion images. The composite image reconstruction is model-free and compatible with various diffusion computation approaches including parametric and nonparametric analyses. Magn Reson Med 79:264-275, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
Tempo and mode of genomic mutations unveil human evolutionary history.
Hara, Yuichiro
2015-01-01
Mutations that have occurred in human genomes provide insight into various aspects of evolutionary history such as speciation events and degrees of natural selection. Comparing genome sequences between human and great apes or among humans is a feasible approach for inferring human evolutionary history. Recent advances in high-throughput or so-called 'next-generation' DNA sequencing technologies have enabled the sequencing of thousands of individual human genomes, as well as a variety of reference genomes of hominids, many of which are publicly available. These sequence data can help to unveil the detailed demographic history of the lineage leading to humans as well as the explosion of modern human population size in the last several thousand years. In addition, high-throughput sequencing illustrates the tempo and mode of de novo mutations, which are producing human genetic variation at this moment. Pedigree-based human genome sequencing has shown that mutation rates vary significantly across the human genome. These studies have also provided an improved timescale of human evolution, because the mutation rate estimated from pedigree analysis is half that estimated from traditional analyses based on molecular phylogeny. Because of the dramatic reduction in sequencing cost, sequencing on-demand samples designed for specific studies is now also becoming popular. To produce data of sufficient quality to meet the requirements of the study, it is necessary to set an explicit sequencing plan that includes the choice of sample collection methods, sequencing platforms, and number of sequence reads.
Single haplotype assembly of the human genome from a hydatidiform mole.
Steinberg, Karyn Meltz; Schneider, Valerie A; Graves-Lindsay, Tina A; Fulton, Robert S; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C; Church, Deanna M; Eichler, Evan E; Wilson, Richard K
2014-12-01
A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content show this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicate misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. © 2014 Steinberg et al.; Published by Cold Spring Harbor Laboratory Press.
Leung, Ross Ka-Kit; Dong, Zhi Qiang; Sa, Fei; Chong, Cheong Meng; Lei, Si Wan; Tsui, Stephen Kwok-Wing; Lee, Simon Ming-Yuen
2014-02-01
Minor variants have significant implications in quasispecies evolution, early cancer detection and non-invasive fetal genotyping but their accurate detection by next-generation sequencing (NGS) is hampered by sequencing errors. We generated sequencing data from mixtures at predetermined ratios in order to provide insight into sequencing errors and variations that can arise for which simulation cannot be performed. The information also enables better parameterization in depth of coverage, read quality and heterogeneity, library preparation techniques, technical repeatability for mathematical modeling, theory development and simulation experimental design. We devised minor variant authentication rules that achieved 100% accuracy in both testing and validation experiments. The rules are free from tedious inspection of alignment accuracy, sequencing read quality or errors introduced by homopolymers. The authentication processes only require minor variants to: (1) have minimum depth of coverage larger than 30; (2) be reported by (a) four or more variant callers, or (b) DiBayes or LoFreq, plus SNVer (or BWA when no results are returned by SNVer), and with the interassay coefficient of variation (CV) no larger than 0.1. Quantification accuracy undermined by sequencing errors could neither be overcome by ultra-deep sequencing, nor recruiting more variant callers to reach a consensus, such that consistent underestimation and overestimation (i.e. low CV) were observed. To accommodate stochastic error and adjust the observed ratio within a specified accuracy, we presented a proof of concept for the use of a double calibration curve for quantification, which provides an important reference towards potential industrial-scale fabrication of calibrants for NGS.
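The authentication rules listed in the abstract above can be read as a simple predicate. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name and parameters are hypothetical, the caller names are taken verbatim from the abstract, and the coefficient-of-variation limit is assumed to apply to both branches of rule (2), which the abstract's wording leaves ambiguous.

```python
def authenticate_minor_variant(depth, callers, cv, snver_returned_results=True,
                               min_depth=30, max_cv=0.1):
    """Return True if a candidate minor variant passes the rules as read from
    the abstract. All argument names are illustrative assumptions.

    depth   -- minimum depth of coverage at the site
    callers -- names of the variant callers that reported the variant
    cv      -- interassay coefficient of variation of the variant frequency
    snver_returned_results -- False if SNVer produced no output at all,
                              in which case BWA may stand in for it
    """
    callers = set(callers)

    # Rule (1): minimum depth of coverage larger than 30.
    if depth <= min_depth:
        return False

    # CV must be no larger than 0.1 (assumed here to apply to every case).
    if cv > max_cv:
        return False

    # Rule (2a): reported by four or more variant callers.
    if len(callers) >= 4:
        return True

    # Rule (2b): DiBayes or LoFreq, plus SNVer (or BWA when SNVer is silent).
    has_primary = bool(callers & {"DiBayes", "LoFreq"})
    has_secondary = "SNVer" in callers or (
        not snver_returned_results and "BWA" in callers
    )
    return has_primary and has_secondary


if __name__ == "__main__":
    print(authenticate_minor_variant(50, ["DiBayes", "SNVer"], cv=0.05))
    print(authenticate_minor_variant(30, ["DiBayes", "SNVer"], cv=0.05))
```

If the CV limit was meant to constrain only branch (b), the second early return would move inside that branch; the abstract alone does not settle this.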
Single haplotype assembly of the human genome from a hydatidiform mole
Steinberg, Karyn Meltz; Schneider, Valerie A.; Graves-Lindsay, Tina A.; Fulton, Robert S.; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A.; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C.; Church, Deanna M.; Eichler, Evan E.; Wilson, Richard K.
2014-01-01
A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content show this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicate misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. PMID:25373144
Brichtová, Eva; Šenkyřík, J
2017-05-01
A low radiation burden is essential during diagnostic procedures in pediatric patients due to their high tissue sensitivity. Using MR examination instead of the routinely used CT reduces the radiation exposure and the risk of adverse stochastic effects. Our retrospective study evaluated the possibility of using ultrafast single-shot (SSh) sequences and turbo spin echo (TSE) sequences in rapid MR brain imaging in pediatric patients with hydrocephalus and a programmable ventriculoperitoneal drainage system. SSh sequences seem to be suitable for examining pediatric patients due to the speed of using this technique, but significant susceptibility artifacts due to the programmable drainage valve degrade the image quality. Therefore, a rapid MR examination protocol based on TSE sequences, less sensitive to artifacts due to ferromagnetic components, has been developed. Of 61 pediatric patients who were examined using MR and the SSh sequence protocol, a group of 15 patients with hydrocephalus and a programmable drainage system also underwent TSE sequence MR imaging. The susceptibility artifact volume in both rapid MR protocols was evaluated using a semiautomatic volumetry system. A statistically significant decrease in the susceptibility artifact volume has been demonstrated in TSE sequence imaging in comparison with SSh sequences. Using TSE sequences reduced the influence of artifacts from the programmable valve, and the image quality in all cases was rated as excellent. In all patients, rapid MR examinations were performed without any need for intravenous sedation or general anesthesia. Our study results strongly suggest the superiority of the TSE sequence MR protocol compared to the SSh sequence protocol in pediatric patients with a programmable ventriculoperitoneal drainage system due to a significant reduction of susceptibility artifact volume. 
Both rapid sequence MR protocols provide quick and satisfactory brain imaging with no ionizing radiation and a reduced need for intravenous or general anesthesia.
Metatranscriptomics of Soil Eukaryotic Communities.
Yadav, Rajiv K; Bragalini, Claudia; Fraissinet-Tachet, Laurence; Marmeisse, Roland; Luis, Patricia
2016-01-01
Functions expressed by eukaryotic organisms in soil can be specifically studied by analyzing the pool of eukaryotic-specific polyadenylated mRNA directly extracted from environmental samples. In this chapter, we describe two alternative protocols for the extraction of high-quality RNA from soil samples. Total soil RNA or mRNA can be converted to cDNA for direct high-throughput sequencing. Polyadenylated mRNA-derived full-length cDNAs can also be cloned in expression plasmid vectors to constitute soil cDNA libraries, which can be subsequently screened for functional gene categories. Alternatively, the diversity of specific gene families can also be explored following cDNA sequence capture using exploratory oligonucleotide probes.
Townsley, Brad T; Covington, Michael F; Ichihashi, Yasunori; Zumstein, Kristina; Sinha, Neelima R
2015-01-01
Next Generation Sequencing (NGS) is driving rapid advancement in biological understanding and RNA-sequencing (RNA-seq) has become an indispensable tool for biology and medicine. There is a growing need for access to these technologies although preparation of NGS libraries remains a bottleneck to wider adoption. Here we report a novel method for the production of strand specific RNA-seq libraries utilizing the terminal breathing of double-stranded cDNA to capture and incorporate a sequencing adapter. Breath Adapter Directional sequencing (BrAD-seq) reduces sample handling and requires far fewer enzymatic steps than most available methods to produce high quality strand-specific RNA-seq libraries. The method we present is optimized for 3-prime Digital Gene Expression (DGE) libraries and can easily extend to full transcript coverage shotgun (SHO) type strand-specific libraries and is modularized to accommodate a diversity of RNA and DNA input materials. BrAD-seq offers a highly streamlined and inexpensive option for RNA-seq libraries.
2011-01-01
Background: High-throughput SNP genotyping has become an essential requirement for molecular breeding and population genomics studies in plant species. Large scale SNP developments have been reported for several mainstream crops. A growing interest now exists to expand the speed and resolution of genetic analysis to outbred species with highly heterozygous genomes. When nucleotide diversity is high, a refined diagnosis of the target SNP sequence context is needed to convert queried SNPs into high-quality genotypes using the Golden Gate Genotyping Technology (GGGT). This issue becomes exacerbated when attempting to transfer SNPs across species, a scarcely explored topic in plants, and likely to become significant for population genomics and interspecific breeding applications in less domesticated and less funded plant genera. Results: We have successfully developed the first set of 768 SNPs assayed by the GGGT for the highly heterozygous genome of Eucalyptus from a mixed Sanger/454 database with 1,164,695 ESTs and the preliminary 4.5X draft genome sequence for E. grandis. A systematic assessment of in silico SNP filtering requirements showed that stringent constraints on the SNP surrounding sequences have a significant impact on SNP genotyping performance and polymorphism. SNP assay success was high for the 288 SNPs selected with more rigorous in silico constraints; 93% of them provided high quality genotype calls and 71% of them were polymorphic in a diverse panel of 96 individuals of five different species. SNP reliability was high across nine Eucalyptus species belonging to three sections within subgenus Symphomyrtus and still satisfactory across species of two additional subgenera, although polymorphism declined as phylogenetic distance increased. Conclusions: This study indicates that the GGGT performs well both within and across species of Eucalyptus notwithstanding its nucleotide diversity ≥2%. 
The development of a much larger array of informative SNPs across multiple Eucalyptus species is feasible, although strongly dependent on having a representative and sufficiently deep collection of sequences from many individuals of each target species. A higher density SNP platform will be instrumental to undertake genome-wide phylogenetic and population genomics studies and to implement molecular breeding by Genomic Selection in Eucalyptus. PMID:21492434
Impact of Roadway Stormwater Runoff on Microbial Contamination in the Receiving Stream.
Wyckoff, Kristen N; Chen, Si; Steinman, Andrew J; He, Qiang
2017-09-01
Stormwater runoff from roadways has increasingly become a regulatory concern for water pollution control. Recent work has suggested roadway stormwater runoff as a potential source of microbial pollutants. The objective of this study was to determine the impact of roadway runoff on the microbiological quality of receiving streams. Microbiological quality of roadway stormwater runoff and the receiving stream was monitored during storm events with both cultivation-dependent fecal bacteria enumeration and cultivation-independent high-throughput sequencing techniques. Enumeration of total coliforms as a measure of fecal microbial pollution found consistently lower total coliform counts in roadway runoff than those in the stream water, suggesting that roadway runoff was not a major contributor of microbial pollutants to the receiving stream. Further characterization of the microbial community in the stormwater samples by 16S ribosomal RNA gene-based high-throughput amplicon sequencing revealed significant differences in the microbial composition of stormwater runoff from the roadways and the receiving stream. The differences in microbial composition between the roadway runoff and stream water demonstrate that roadway runoff did not appear to have a major influence on the stream in terms of microbiological quality. Thus, results from both fecal bacteria enumeration and high-throughput amplicon sequencing techniques were consistent that roadway stormwater runoff was not the primary contributor of microbial loading to the stream. Further studies of additional watersheds with distinct characteristics are needed to validate these findings. Understanding gained in this study could support the development of more effective strategies for stormwater management in sensitive watersheds. Copyright © by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America, Inc.
Rhesus monkeys lack a consistent peak-end effect.
Xu, Eric R; Knight, Emily J; Kralik, Jerald D
2011-12-01
In humans, the order of receiving sequential rewards can significantly influence the overall subjective utility of an outcome. For example, people subjectively rate receiving a large reward by itself significantly higher than receiving the same large reward followed by a smaller one (Do, Rupert, & Wolford, 2008). This result is called the peak-end effect. A comparative analysis of order effects can help determine the generality of such effects across primates, and we therefore examined the influence of reward-quality order on decision making in three rhesus macaque monkeys (Macaca mulatta). When given the choice between a high-low reward sequence and a low-high sequence, all three monkeys preferred receiving the high-value reward first. Follow-up experiments showed that for two of the three monkeys their choices depended specifically on reward-quality order and could not be accounted for by delay discounting. These results provide evidence for the influence of outcome order on decision making in rhesus monkeys. Unlike humans, who usually discount choices when a low-value reward comes last, rhesus monkeys show no such peak-end effect.
Flow cytometry for enrichment and titration in massively parallel DNA sequencing
Sandberg, Julia; Ståhl, Patrik L.; Ahmadian, Afshin; Bjursell, Magnus K.; Lundeberg, Joakim
2009-01-01
Massively parallel DNA sequencing is revolutionizing genomics research throughout the life sciences. However, the reagent costs and labor requirements in current sequencing protocols are still substantial, although improvements are continuously being made. Here, we demonstrate an effective alternative to existing sample titration protocols for the Roche/454 system using Fluorescence Activated Cell Sorting (FACS) technology to determine the optimal DNA-to-bead ratio prior to large-scale sequencing. Our method, which eliminates the need for costly pilot sequencing of samples during titration, rapidly provides accurate DNA-to-bead ratios that are not biased by the quantification and sedimentation steps included in current protocols. Moreover, we demonstrate that FACS sorting can be readily used to highly enrich fractions of beads carrying template DNA, with near total elimination of empty beads and no downstream sacrifice of DNA sequencing quality. Automated enrichment by FACS is a simple approach to obtain pure samples for bead-based sequencing systems, and offers an efficient, low-cost alternative to current enrichment protocols. PMID:19304748
Heuristics for multiobjective multiple sequence alignment.
Abbasi, Maryam; Paquete, Luís; Pereira, Francisco B
2016-07-15
Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by current software packages are highly dependent on the parameter settings, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may introduce an undesirable bias in further steps of the analysis and give overly simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment. We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments, which are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments. The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of correctly aligned pairs ratio and columns correctly aligned ratio with respect to reference alignments. Experimental results show that our approaches can obtain better results than TCoffee and Clustal Omega in terms of the first ratio.
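The two objectives named in this abstract (substitution score and number of indels under a sum-of-pairs function) can be illustrated with a minimal Python sketch. The scoring values (`match=1`, `mismatch=-1`), the choice to count every residue-gap pair as one indel, and the function names are assumptions made for illustration; they are not taken from the paper, which may score and count gaps differently.

```python
from itertools import combinations

def sum_of_pairs_objectives(alignment, match=1, mismatch=-1):
    """Return (substitution_score, indel_count) for equal-length aligned rows.

    Assumption: a residue paired with a gap counts as one indel, and
    gap-gap pairs are ignored. This is an illustrative convention only.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    substitution_score = 0
    indel_count = 0
    for row_a, row_b in combinations(alignment, 2):
        for a, b in zip(row_a, row_b):
            if a == "-" and b == "-":
                continue  # both gaps: contributes to neither objective
            if a == "-" or b == "-":
                indel_count += 1
            else:
                substitution_score += match if a == b else mismatch
    return substitution_score, indel_count

def dominates(p, q):
    """True if p is at least as good as q on both objectives and differs from q."""
    return p[0] >= q[0] and p[1] <= q[1] and p != q

def pareto_front(points):
    """Keep the (score, indels) vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

A set of alignments whose objective vectors survive `pareto_front` is the kind of trade-off set a practitioner would inspect; the hypervolume indicator mentioned in the abstract is not computed here.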
Conte, Matthew A; Gammerdinger, William J; Bartie, Kerry L; Penman, David J; Kocher, Thomas D
2017-05-02
Tilapias are the second most farmed fishes in the world and a sustainable source of food. Like many other fish, tilapias are sexually dimorphic and sex is a commercially important trait in these fish. In this study, we developed a significantly improved assembly of the tilapia genome using the latest genome sequencing methods and show how it improves the characterization of two sex determination regions in two tilapia species. A homozygous clonal XX female Nile tilapia (Oreochromis niloticus) was sequenced to 44X coverage using Pacific Biosciences (PacBio) SMRT sequencing. Dozens of candidate de novo assemblies were generated and an optimal assembly (contig NG50 of 3.3 Mbp) was selected using principal component analysis of likelihood scores calculated from several paired-end sequencing libraries. Comparison of the new assembly to the previous O. niloticus genome assembly reveals that recently duplicated portions of the genome are now well represented. The overall number of genes in the new assembly increased by 27.3%, including a 67% increase in pseudogenes. The new tilapia genome assembly correctly represents two recent vasa gene duplication events that have been verified with BAC sequencing. A total of 146 Mbp of additional transposable element sequence is now assembled, a large proportion of which consists of recent insertions. Large centromeric satellite repeats are assembled and annotated in cichlid fish for the first time. Finally, the new assembly identifies the long-range structure of both a ~9 Mbp XY sex determination region on LG1 in O. niloticus, and a ~50 Mbp WZ sex determination region on LG3 in the related species O. aureus. This study highlights the use of long read sequencing to correctly assemble recent duplications and to characterize repeat-filled regions of the genome. The study serves as an example of the need for high quality genome assemblies and provides a framework for identifying sex determining genes in tilapia and related fish species.
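For readers unfamiliar with the contig NG50 statistic quoted in this abstract, the following Python sketch shows the usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the estimated genome size. The function name and the example lengths are hypothetical; the abstract does not describe how the authors computed the value.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or None if it covers < 50% of the genome.

    contig_lengths: iterable of contig lengths (any consistent unit, e.g. bp).
    genome_size: estimated genome size in the same unit (an input assumption).
    """
    half = genome_size / 2
    running_total = 0
    # Walk contigs from longest to shortest until half the genome is covered.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return None  # contigs never reach half of the estimated genome size
```

Passing the total assembly length as `genome_size` gives the ordinary N50 instead.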
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo, Chien -Chi; Chain, Patrick S. G.
Background: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Results: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.
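Because read quality often declines toward the end of a read, a common first step in quality control is trimming low-quality bases from the 3' end. The Python sketch below illustrates that general idea only; it is not FaQCs's trimming algorithm, and the Phred+33 encoding, the thresholds (`min_q=20`, `min_len=5`), and the function names are assumptions chosen for illustration.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string, assuming the Phred+33 ASCII offset."""
    return [ord(ch) - 33 for ch in quality_string]

def error_probability(q):
    """Convert a Phred score Q to an error probability via P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def trim_3prime(seq, qual, min_q=20, min_len=5):
    """Drop trailing bases with quality below min_q.

    Returns (trimmed_seq, trimmed_qual), or None when the surviving read is
    shorter than min_len. Both thresholds are illustrative defaults.
    """
    scores = phred33_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], qual[:end]
```

Dedicated tools typically use more robust criteria (for example, sliding windows or cumulative-score trimming) rather than this simple trailing-base rule.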
Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH.
Volk, Jochen; Herrmann, Torsten; Wüthrich, Kurt
2008-07-01
MATCH (Memetic Algorithm and Combinatorial Optimization Heuristics) is a new memetic algorithm for automated sequence-specific polypeptide backbone NMR assignment of proteins. MATCH employs local optimization for tracing partial sequence-specific assignments within a global, population-based search environment, where the simultaneous application of local and global optimization heuristics guarantees high efficiency and robustness. MATCH thus makes combined use of the two predominant concepts in use for automated NMR assignment of proteins. Dynamic transition and inherent mutation are new techniques that enable automatic adaptation to variable quality of the experimental input data. The concept of dynamic transition is incorporated in all major building blocks of the algorithm, where it enables switching between local and global optimization heuristics at any time during the assignment process. Inherent mutation restricts the intrinsically required randomness of the evolutionary algorithm to those regions of the conformation space that are compatible with the experimental input data. Using intact and artificially deteriorated APSY-NMR input data of proteins, MATCH performed sequence-specific resonance assignment with high efficiency and robustness.
Jackson, K. A.; Stroika, S.; Katz, L. S.; Beal, J.; Brandt, E.; Nadon, C.; Reimer, A.; Major, B.; Conrad, A.; Tarr, C.; Jackson, B. R.; Mody, R. K.
2016-01-01
We report on a case of listeriosis in a patient who probably consumed a prepackaged romaine lettuce–containing product recalled for Listeria monocytogenes contamination. Although definitive epidemiological information demonstrating exposure to the specific recalled product was lacking, the patient reported consumption of a prepackaged romaine lettuce–containing product of either the recalled brand or a different brand. A multinational investigation found that patient and food isolates from the recalled product were indistinguishable by pulsed-field gel electrophoresis and were highly related by whole genome sequencing, differing by four alleles by whole genome multilocus sequence typing and by five high-quality single nucleotide polymorphisms, suggesting a common source. To our knowledge, this is the first time prepackaged lettuce has been identified as a likely source for listeriosis. This investigation highlights the power of whole genome sequencing, as well as the continued need for timely and thorough epidemiological exposure data to identify sources of foodborne infections. PMID:27296429
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andersen, Mikael R.; Salazar, Margarita; Schaap, Peter
2011-06-01
The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compels additional exploration. We therefore undertook whole genome sequencing of the acidogenic A. niger wild type strain (ATCC 1015), and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was utilized to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and identification of 0.8 megabase of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis revealed up-regulation of the electron transport chain, specifically the alternative oxidative pathway in ATCC 1015, while CBS 513.88 showed significant up-regulation of genes associated with biosynthesis of amino acids that are abundant in glucoamylase A, tRNA-synthases and protein transporters.
Information recovery through image sequence fusion under wavelet transformation
NASA Astrophysics Data System (ADS)
He, Qiang
2010-04-01
Remote sensing is widely applied to provide information on areas with limited ground access, with applications such as assessing the destruction from natural disasters and planning relief and recovery operations. However, the collection of aerial digital images is constrained by bad weather, atmospheric conditions, and unstable cameras or camcorders. Therefore, recovering information from low-quality remote sensing images and enhancing image quality become very important for many visual understanding tasks, such as feature detection, object segmentation, and object recognition. The quality of remote sensing imagery can be improved through information fusion, that is, a meaningful combination of images captured from different sensors or under different conditions. Here we particularly address information fusion of remote sensing images under multi-resolution analysis of the employed image sequences. Image fusion recovers complete information by integrating multiple images captured from the same scene. Through image fusion, a new image with higher resolution, or one that is more perceptible to humans and machines, is created from a time series of low-quality images based on image registration between different video frames.
ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia.
Landt, Stephen G; Marinov, Georgi K; Kundaje, Anshul; Kheradpour, Pouya; Pauli, Florencia; Batzoglou, Serafim; Bernstein, Bradley E; Bickel, Peter; Brown, James B; Cayting, Philip; Chen, Yiwen; DeSalvo, Gilberto; Epstein, Charles; Fisher-Aylor, Katherine I; Euskirchen, Ghia; Gerstein, Mark; Gertz, Jason; Hartemink, Alexander J; Hoffman, Michael M; Iyer, Vishwanath R; Jung, Youngsook L; Karmakar, Subhradip; Kellis, Manolis; Kharchenko, Peter V; Li, Qunhua; Liu, Tao; Liu, X Shirley; Ma, Lijia; Milosavljevic, Aleksandar; Myers, Richard M; Park, Peter J; Pazin, Michael J; Perry, Marc D; Raha, Debasish; Reddy, Timothy E; Rozowsky, Joel; Shoresh, Noam; Sidow, Arend; Slattery, Matthew; Stamatoyannopoulos, John A; Tolstorukov, Michael Y; White, Kevin P; Xi, Simon; Farnham, Peggy J; Lieb, Jason D; Wold, Barbara J; Snyder, Michael
2012-09-01
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.
ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia
Landt, Stephen G.; Marinov, Georgi K.; Kundaje, Anshul; Kheradpour, Pouya; Pauli, Florencia; Batzoglou, Serafim; Bernstein, Bradley E.; Bickel, Peter; Brown, James B.; Cayting, Philip; Chen, Yiwen; DeSalvo, Gilberto; Epstein, Charles; Fisher-Aylor, Katherine I.; Euskirchen, Ghia; Gerstein, Mark; Gertz, Jason; Hartemink, Alexander J.; Hoffman, Michael M.; Iyer, Vishwanath R.; Jung, Youngsook L.; Karmakar, Subhradip; Kellis, Manolis; Kharchenko, Peter V.; Li, Qunhua; Liu, Tao; Liu, X. Shirley; Ma, Lijia; Milosavljevic, Aleksandar; Myers, Richard M.; Park, Peter J.; Pazin, Michael J.; Perry, Marc D.; Raha, Debasish; Reddy, Timothy E.; Rozowsky, Joel; Shoresh, Noam; Sidow, Arend; Slattery, Matthew; Stamatoyannopoulos, John A.; Tolstorukov, Michael Y.; White, Kevin P.; Xi, Simon; Farnham, Peggy J.; Lieb, Jason D.; Wold, Barbara J.; Snyder, Michael
2012-01-01
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals. PMID:22955991
Effects of the Ion PGM™ Hi-Q™ sequencing chemistry on sequence data quality.
Churchill, Jennifer D; King, Jonathan L; Chakraborty, Ranajit; Budowle, Bruce
2016-09-01
Massively parallel sequencing (MPS) offers substantial improvements over current forensic DNA typing methodologies such as increased resolution, scalability, and throughput. The Ion PGM™ is a promising MPS platform for analysis of forensic biological evidence. The system employs a sequencing-by-synthesis chemistry on a semiconductor chip that measures a pH change due to the release of hydrogen ions as nucleotides are incorporated into the growing DNA strands. However, implementation of MPS into forensic laboratories requires a robust chemistry. Ion Torrent's Hi-Q™ Sequencing Chemistry was evaluated to determine if it could improve on the quality of the generated sequence data in association with selected genetic marker targets. The whole mitochondrial genome and the HID-Ion STR 10-plex panel were sequenced on the Ion PGM™ system with the Ion PGM™ Sequencing 400 Kit and the Ion PGM™ Hi-Q™ Sequencing Kit. Concordance, coverage, strand balance, noise, and deletion ratios were assessed in evaluating the performance of the Ion PGM™ Hi-Q™ Sequencing Kit. The results indicate that reliable, accurate data are generated and that sequencing through homopolymeric regions can be improved with the use of Ion Torrent's Hi-Q™ Sequencing Chemistry. Overall, the quality of the generated sequencing data supports the potential for use of the Ion PGM™ in forensic genetic laboratories.
Generation and analysis of expressed sequence tags from the bone marrow of Chinese Sika deer.
Yao, Baojin; Zhao, Yu; Zhang, Mei; Li, Juan
2012-03-01
Sika deer is one of the best-known and most highly valued animals of China. Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project for Sika deer to date. With the ultimate goal of sequencing the complete genome of this organism, we first established a bone marrow cDNA library for Sika deer and generated a total of 2,025 reads. After processing the sequences, 2,017 high-quality expressed sequence tags (ESTs) were obtained. These ESTs were assembled into 1,157 unigenes, including 238 contigs and 919 singletons. Comparative analyses indicated that 888 (76.75%) of the unigenes had significant matches to sequences in the non-redundant protein database. In addition to highly expressed genes, such as stearoyl-CoA desaturase, cytochrome c oxidase, adipocyte-type fatty acid-binding protein, adiponectin and thymosin beta-4, we also obtained vascular endothelial growth factor-A and heparin-binding growth-associated molecule, both of which are of great importance for angiogenesis research. There were 244 (21.09%) unigenes with no significant match to any sequence in current protein or nucleotide databases, and these sequences may represent genes with unknown function in Sika deer. Open reading frame analysis of the sequences was performed using the getorf program. In addition, the sequences were functionally classified using the gene ontology hierarchy, clusters of orthologous groups of proteins and Kyoto encyclopedia of genes and genomes databases. Analysis of ESTs described in this paper provides an important resource for the transcriptome exploration of Sika deer, and will also facilitate further studies on functional genomics, gene discovery and genome annotation of Sika deer.
Subjective quality evaluation of low-bit-rate video
NASA Astrophysics Data System (ADS)
Masry, Mark; Hemami, Sheila S.; Osberger, Wilfried M.; Rohaly, Ann M.
2001-06-01
A subjective quality evaluation was performed to qualify viewer responses to visual defects that appear in low bit rate video at full and reduced frame rates. The stimuli were eight sequences compressed by three motion compensated encoders - Sorenson Video, H.263+ and a Wavelet based coder - operating at five bit/frame rate combinations. The stimulus sequences exhibited obvious coding artifacts whose nature differed across the three coders. The subjective evaluation was performed using the Single Stimulus Continuous Quality Evaluation method of ITU-R Rec. BT.500-8. Viewers watched concatenated coded test sequences and continuously registered the perceived quality using a slider device. Data from 19 viewers were collected. An analysis of their responses to the presence of various artifacts across the range of possible coding conditions and content is presented. The effects of blockiness and blurriness on perceived quality are examined. The effects of changes in frame rate on perceived quality are found to be related to the nature of the motion in the sequence.
Arthropod genomic resources for the 21st century
USDA-ARS's Scientific Manuscript database
Genome references are foundational for high quality entomological research today. Species, sub populations and taxonomy are defined by gene flow and genome sequences. Gene content in arthropods is often directly reflective of life history, for example, diet and symbiont related gene loss is observed...
Phanerochaete chrysosporium genomics
Luis F. Larrondo; Rafael Vicuna; Dan Cullen
2005-01-01
A high quality draft genome sequence has been generated for the lignocellulose-degrading basidiomycete Phanerochaete chrysosporium (Martinez et al. 2004). Analysis of the genome in the context of previously established genetics and physiology is presented. Transposable elements and their potential relationship to genes involved in lignin degradation are systematically...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Isanapong, Jantiya; Goodwin, Lynne A.; Bruce, David
Microbial communities in the termite hindgut are essential for degrading plant material. We present the high-quality draft genome sequence of the Opitutaceae bacterium strain TAV1, the first member of the phylum Verrucomicrobia to be isolated from wood-feeding termites. The genomic analysis reveals genes coding for lignocellulosic degradation and nitrogen fixation.
Model-based quality assessment and base-calling for second-generation sequencing data.
Bravo, Héctor Corrada; Irizarry, Rafael A
2010-09-01
Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base-calling allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base-calling performance. © 2009, The International Biometric Society.
QualComp: a new lossy compressor for quality scores based on rate distortion theory
2013-01-01
Background: Next Generation Sequencing technologies have revolutionized many fields in biology by reducing the time and cost required for sequencing. As a result, large amounts of sequencing data are being generated. A typical sequencing data file may occupy tens or even hundreds of gigabytes of disk space, prohibitively large for many users. This data consists of both the nucleotide sequences and per-base quality scores that indicate the level of confidence in the readout of these sequences. Quality scores account for about half of the required disk space in the commonly used FASTQ format (before compression), and therefore the compression of the quality scores can significantly reduce storage requirements and speed up analysis and transmission of sequencing data. Results: In this paper, we present a new scheme for the lossy compression of the quality scores, to address the problem of storage. Our framework allows the user to specify the rate (bits per quality score) prior to compression, independent of the data to be compressed. Our algorithm can work at any rate, unlike other lossy compression algorithms. We envisage our algorithm as being part of a more general compression scheme that works with the entire FASTQ file. Numerical experiments show that we can achieve a better mean squared error (MSE) for small rates (bits per quality score) than other lossy compression schemes. For the organism PhiX, whose assembled genome is known and assumed to be correct, we show that it is possible to achieve a significant reduction in size with little compromise in performance on downstream applications (e.g., alignment). Conclusions: QualComp is an open source software package, written in C and freely available for download at https://sourceforge.net/projects/qualcomp. PMID:23758828
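The trade-off between rate (bits per quality score) and mean squared error described in this abstract can be made concrete with a deliberately simple stand-in. The Python sketch below uses a uniform scalar quantizer, which is not QualComp's rate-distortion-based scheme; the score range (`q_max=41`), the midpoint reconstruction, and the function names are assumptions for illustration only.

```python
def quantize_scores(scores, bits, q_max=41):
    """Lossy-encode quality scores with a uniform quantizer of 2**bits levels.

    Each score in [0, q_max] is mapped to the midpoint of its bin, so storing
    only the bin index costs `bits` bits per score. q_max is an assumed upper
    bound on the Phred scale.
    """
    levels = 2 ** bits
    width = (q_max + 1) / levels
    reconstructed = []
    for q in scores:
        index = min(int(q / width), levels - 1)  # clamp the top edge
        reconstructed.append((index + 0.5) * width)
    return reconstructed

def mse(original, reconstructed):
    """Mean squared error between original and reconstructed scores."""
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)
```

For the scores `[0, 10, 20, 30, 40]`, raising the rate from 1 to 3 bits per score lowers the MSE, which is the qualitative behavior any lossy scheme is expected to show; how well a scheme performs at small rates is what the abstract's comparison is about.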
De Meyer, Sofie E.; Parker, Matthew; Van Berkum, Peter; ...
2015-10-16
Cupriavidus sp. strain AMP6 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from a root nodule of Mimosa asperata collected in Santa Ana National Wildlife Refuge, Texas, in 2005. Mimosa asperata is the only legume described so far to associate exclusively with Cupriavidus symbionts. Furthermore, strain AMP6 represents an early-diverging lineage within the symbiotic Cupriavidus group and has the capacity to develop an effective nitrogen-fixing symbiosis with three other species of Mimosa. Here, we describe the genome of Cupriavidus sp. strain AMP6, which enables comparative analyses of symbiotic trait evolution in this genus; the general features, together with sequence and annotation, are further discussed. Finally, the 7,579,563 bp high-quality permanent draft genome is arranged in 260 scaffolds of 262 contigs, contains 7,033 protein-coding genes and 97 RNA-only encoding genes, and is part of the GEBA-RNB project proposal.
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Meyer, Sofie E.; Parker, Matthew; Van Berkum, Peter
Cupriavidus sp. strain AMP6 is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from a root nodule of Mimosa asperata collected in Santa Ana National Wildlife Refuge, Texas, in 2005. Mimosa asperata is the only legume described so far to associate exclusively with Cupriavidus symbionts. Furthermore, strain AMP6 represents an early-diverging lineage within the symbiotic Cupriavidus group and has the capacity to develop an effective nitrogen-fixing symbiosis with three other species of Mimosa. Here, we describe the genome of Cupriavidus sp. strain AMP6, which enables comparative analyses of symbiotic trait evolution in this genus; the general features, together with sequence and annotation, are further discussed. Finally, the 7,579,563 bp high-quality permanent draft genome is arranged in 260 scaffolds of 262 contigs, contains 7,033 protein-coding genes and 97 RNA-only encoding genes, and is part of the GEBA-RNB project proposal.
Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale
Schmidt, Thomas S. B.; Matias Rodrigues, João F.; von Mering, Christian
2014-01-01
Operational Taxonomic Units (OTUs), usually defined as clusters of similar 16S/18S rRNA sequences, are the most widely used basic diversity units in large-scale characterizations of microbial communities. However, it remains unclear how well the various proposed OTU clustering algorithms approximate ‘true’ microbial taxa. Here, we explore the ecological consistency of OTUs – based on the assumption that, like true microbial taxa, they should show measurable habitat preferences (niche conservatism). In a global and comprehensive survey of available microbial sequence data, we systematically parse sequence annotations to obtain broad ecological descriptions of sampling sites. Based on these, we observe that sequence-based microbial OTUs generally show high levels of ecological consistency. However, different OTU clustering methods result in marked differences in the strength of this signal. Assuming that ecological consistency can serve as an objective external benchmark for cluster quality, we conclude that hierarchical complete linkage clustering, which provided the most ecologically consistent partitions, should be the default choice for OTU clustering. To our knowledge, this is the first approach to assess cluster quality using an external, biologically meaningful parameter as a benchmark, on a global scale. PMID:24763141
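As a rough illustration of the hierarchical complete linkage clustering that this abstract recommends, the following Python sketch merges clusters only while the largest pairwise distance between their members stays within a threshold. It is a naive stand-in, not the authors' pipeline: the proportion-of-mismatches distance on pre-aligned, equal-length sequences, the 3% default threshold, and the function names are all assumptions.

```python
def p_distance(a, b):
    """Fraction of differing positions between two equal-length aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def complete_linkage_otus(seqs, threshold=0.03):
    """Greedy agglomerative clustering with complete linkage.

    Returns a list of clusters, each a list of indices into seqs. Two clusters
    are merged only if the MAXIMUM distance between any pair of their members
    is <= threshold (assumed 3% here, an illustrative convention).
    """
    clusters = [[i] for i in range(len(seqs))]

    def linkage(c1, c2):
        return max(p_distance(seqs[i], seqs[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # no remaining pair of clusters is close enough to merge
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Complete linkage guarantees that every pair of sequences within an OTU is within the threshold, which is the property that distinguishes it from single or average linkage. This quadratic-per-step implementation is only suitable for small examples.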
High resolution T2(*)-weighted Magnetic Resonance Imaging at 3 Tesla using PROPELLER-EPI.
Krämer, Martin; Reichenbach, Jürgen R
2014-05-01
We report the application of PROPELLER-EPI for high resolution T2(*)-weighted imaging with sub-millimeter in-plane resolution on a clinical 3 Tesla scanner. Periodically rotated blades of a long-axis PROPELLER-EPI sequence were acquired with fast gradient echo readout and acquisition matrix of 320 × 50 per blade. Images were reconstructed by using 2D-gridding, phase and geometric distortion correction and compensation of resonance frequency drifts that occurred during extended measurements. To characterize these resonance frequency offsets, short FID calibration measurements were added to the PROPELLER-EPI sequence. Functional PROPELLER-EPI was performed with volunteers using a simple block design of right handed finger tapping. Results indicate that PROPELLER-EPI can be employed for fast, high resolution T2(*)-weighted imaging provided geometric distortions and possible resonance frequency drifts are properly corrected. Even small resonance frequency drifts below 10 Hz as well as non-corrected geometric distortions degraded image quality substantially. In the initial fMRI experiment image quality and signal-to-noise ratio was sufficient for obtaining high resolution functional activation maps. Copyright © 2014. Published by Elsevier GmbH.
Kida, Ikuhiro; Ueguchi, Takashi; Matsuoka, Yuichiro; Zhou, Kun; Stemmer, Alto; Porter, David
2016-07-01
The purpose of the present study was to compare periodically rotated overlapping parallel lines with enhanced reconstruction-type turbo spin echo diffusion-weighted imaging (pTSE-DWI) and readout-segmented echo planar imaging (rsEPI-DWI) with single-shot echo planar imaging (ssEPI-DWI) in a 7 T human MR system. We evaluated the signal-to-noise ratio (SNR), image distortion, and apparent diffusion coefficient values in the human brain. Six healthy volunteers were included in this study. The study protocol was approved by our institutional review board. All measurements were performed at 7 T using pTSE-DWI, rsEPI-DWI, and ssEPI-DWI sequences. The spatial resolution was 1.2 × 1.2 mm in-plane with a 3-mm slice thickness. Signal-to-noise ratio was measured using 2 scans. The ssEPI-DWI sequence showed significant image blurring, whereas pTSE-DWI and rsEPI-DWI sequences demonstrated high image quality with low geometrical distortion compared with reference T2-weighted, turbo spin echo images. Signal loss in ventral regions near the air-filled paranasal sinus/nasal cavity was found in ssEPI-DWI and rsEPI-DWI but not pTSE-DWI. The apparent diffusion coefficient values for ssEPI-DWI were 824 ± 17 × 10⁻⁶ and 749 ± 25 × 10⁻⁶ mm²/s in the gray matter and white matter, respectively; the values obtained for pTSE-DWI were 798 ± 21 × 10⁻⁶ and 865 ± 40 × 10⁻⁶ mm²/s; and the values obtained for rsEPI-DWI were 730 ± 12 × 10⁻⁶ and 722 ± 25 × 10⁻⁶ mm²/s. The pTSE-DWI images showed no additional distortion compared with the T2-weighted images, but had a lower SNR than ssEPI-DWI and rsEPI-DWI. The rsEPI-DWI sequence provided high-quality images with minor distortion and a similar SNR to ssEPI-DWI. Our results suggest that the benefits of the rsEPI-DWI and pTSE-DWI sequences, in terms of SNR, image quality, and image distortion, appear to outweigh those of ssEPI-DWI. Thus, pTSE-DWI and rsEPI-DWI at 7 T have great potential use for clinical diagnoses.
However, it is noteworthy that both sequences are limited by the scan time required. In addition, pTSE-DWI has limitations on the number of slices due to specific absorption rate. Overall, rsEPI-DWI is a favorable imaging sequence, taking into account the SNR and image quality at 7 T.
Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo
2003-01-01
To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979
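The counts quoted in the abstract above can be cross-checked with a few lines of arithmetic. This is only a consistency sketch on the reported figures; the variable names are ours, and no data beyond the numbers in the abstract are used.

```python
# Figures as reported in the SUCEST abstract.
assembled_transcripts = 43_141
full_length_containing = 14_409
reported_unique_genes = 33_620
stated_redundancy = 0.22

# Share of assembled sequences with at least one full-length clone (~33%).
full_length_share = full_length_containing / assembled_transcripts

# Unique genes implied by applying the rounded 22% redundancy directly.
unique_from_rounded = assembled_transcripts * (1 - stated_redundancy)

# Redundancy implied by the reported 33,620 unique genes.
implied_redundancy = 1 - reported_unique_genes / assembled_transcripts

print(f"full-length share:  {full_length_share:.1%}")
print(f"unique genes @22%:  {unique_from_rounded:.0f}")
print(f"implied redundancy: {implied_redundancy:.2%}")
```

The rounded 22% gives roughly 33,650 genes, so the reported 33,620 corresponds to a redundancy slightly above 22%, which is consistent with the stated value after rounding.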
The Saccharomyces Genome Database Variant Viewer
Sheppard, Travis K.; Hitz, Benjamin C.; Engel, Stacia R.; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C.; Dalusag, Kyla S.; Demeter, Janos; Hellerstedt, Sage T.; Karra, Kalpana; Nash, Robert S.; Paskov, Kelley M.; Skrzypek, Marek S.; Weng, Shuai; Wong, Edith D.; Cherry, J. Michael
2016-01-01
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. PMID:26578556
Transcriptome-Based Differentiation of Closely-Related Miscanthus Lines
Chouvarine, Philippe; Cooksey, Amanda M.; McCarthy, Fiona M.; ...
2012-01-10
Distinguishing between individuals is critical to those conducting animal/plant breeding, food safety/quality research, diagnostic and clinical testing, and evolutionary biology studies. Classical genetic identification studies are based on marker polymorphisms, but polymorphism-based techniques are time and labor intensive and often cannot distinguish between closely related individuals. Illumina sequencing technologies provide the detailed sequence data required for rapid and efficient differentiation of related species, lines/cultivars, and individuals in a cost-effective manner. Here we describe the use of Illumina high-throughput exome sequencing, coupled with SNP mapping, as a rapid means of distinguishing between related cultivars of the lignocellulosic bioenergy crop giant miscanthus (Miscanthus × giganteus). We provide the first exome sequence database for Miscanthus species complete with Gene Ontology (GO) functional annotations.
[Study on correlation between ITS sequence of Arctium lappa and quality of Fructus Arctii].
Xu, Liang; Dou, Deqiang; Wang, Bing; Yang, Yanyun; Kang, Tingguo
2011-07-01
This study examined the correlation between the ITS sequence of Arctium lappa and the quality of Fructus Arctii from different origins. Samples of Fructus Arctii were collected from 26 producing areas. Their ITS sequences were determined after polymerase chain reaction (PCR), and quality was evaluated by determining arctiin content by HPLC. Genetic diversity, genotype and correlation were analyzed using ClustalX (1.81), MEGA 4.0 and SPSS 13.0 statistical software. ITS sequences of A. lappa were obtained from the 26 samples and registered in GenBank. The corresponding arctiin content and 1000-grain weight of Fructus Arctii were determined. Statistical analysis showed that A. lappa genotype correlated with Fructus Arctii quality. The research provides a foundation for revealing the molecular mechanism of Fructus Arctii geoherbs.
Dyvorne, Hadrien A.; Galea, Nicola; Nevers, Thomas; Fiel, M. Isabel; Carpenter, David; Wong, Edmund; Orton, Matthew; de Oliveira, Andre; Feiweier, Thorsten; Vachon, Marie-Louise; Babb, James S.
2013-01-01
Purpose: To optimize intravoxel incoherent motion (IVIM) diffusion-weighted (DW) imaging by estimating the effects of diffusion gradient polarity and breathing acquisition scheme on image quality, signal-to-noise ratio (SNR), IVIM parameters, and parameter reproducibility, as well as to investigate the potential of IVIM in the detection of hepatic fibrosis. Materials and Methods: In this institutional review board–approved prospective study, 20 subjects (seven healthy volunteers, 13 patients with hepatitis C virus infection; 14 men, six women; mean age, 46 years) underwent IVIM DW imaging with four sequences: (a) respiratory-triggered (RT) bipolar (BP) sequence, (b) RT monopolar (MP) sequence, (c) free-breathing (FB) BP sequence, and (d) FB MP sequence. Image quality scores were assessed for all sequences. A biexponential analysis with the Bayesian method yielded true diffusion coefficient (D), pseudodiffusion coefficient (D*), and perfusion fraction (PF) in liver parenchyma. Mixed-model analysis of variance was used to compare image quality, SNR, IVIM parameters, and interexamination variability between the four sequences, as well as the ability to differentiate areas of liver fibrosis from normal liver tissue. Results: Image quality with RT sequences was superior to that with FB acquisitions (P = .02) and was not affected by gradient polarity. SNR did not vary significantly between sequences. IVIM parameter reproducibility was moderate to excellent for PF and D, while it was less reproducible for D*. PF and D were both significantly lower in patients with hepatitis C virus than in healthy volunteers with the RT BP sequence (PF = 13.5% ± 5.3 [standard deviation] vs 9.2% ± 2.5, P = .038; D = [1.16 ± 0.07] × 10−3 mm2/sec vs [1.03 ± 0.1] × 10−3 mm2/sec, P = .006). Conclusion: The RT BP DW imaging sequence had the best results in terms of image quality, reproducibility, and ability to discriminate between healthy and fibrotic liver with biexponential fitting. 
© RSNA, 2012 PMID:23220895
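The biexponential IVIM model referred to in the abstract above is commonly written as S(b)/S0 = PF·exp(−b·D*) + (1 − PF)·exp(−b·D). The sketch below evaluates that forward model only; it does not reproduce the Bayesian fitting used in the study. The function name and the D* value in the usage note are assumptions for illustration, while PF and D are taken from the healthy-volunteer figures quoted in the abstract.

```python
import math


def ivim_signal(b, pf, d, d_star, s0=1.0):
    """Biexponential IVIM signal at b-value b (s/mm^2).

    pf     : perfusion fraction (0..1)
    d      : true diffusion coefficient (mm^2/s)
    d_star : pseudodiffusion coefficient (mm^2/s), typically much larger than d
    s0     : signal at b = 0
    """
    perfusion = pf * math.exp(-b * d_star)
    diffusion = (1.0 - pf) * math.exp(-b * d)
    return s0 * (perfusion + diffusion)
```

For example, `ivim_signal(800, pf=0.135, d=1.16e-3, d_star=0.05)` is dominated by the diffusion term, because the perfusion term has decayed to essentially zero at that b-value. This is why PF and D* are mainly determined by the low-b measurements, and it is in line with the lower reproducibility reported for D*.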
NASA Astrophysics Data System (ADS)
Osman, Mutsim; Abdullatif, Osman
2017-04-01
The Permian to Triassic Khuff carbonate reservoirs (and equivalents) in the Middle East are estimated to contain about 38.4% of the world's natural gas reserves. Excellent exposed outcrops in central Saudi Arabia provide good outcrop equivalents to subsurface Khuff reservoirs. This study conducts high resolution outcrop scale investigations on an analog reservoir for the upper Khartam of the Khuff Formation. The main objective is to reconstruct a litho- and chemo-stratigraphic outcrop analog model that may serve to characterize high resolution (interwell) reservoir heterogeneity, continuity and architecture. Subsurface data and tools are limited in capturing interwell reservoir heterogeneity, which increases the value of this study. The methods applied integrate sedimentological, stratigraphic, petrographic and petrophysical data and chemical analyses for major, trace and rare earth elements. In addition, a laser scanning survey (LIDAR) was also utilized in this study. The results of the stratigraphic investigations revealed that the lithofacies range from mudstone and wackestone to packstone and grainstone. These lithofacies represent environments ranging from supratidal, intertidal and subtidal to shoal complex. Several high resolution sequences and composite sequences, at meter scale or less, within 4th and 5th order cycles were also recognized in the outcrop analog. The lithofacies and architectural analysis revealed several vertically and laterally stacked sequences at the outcrop, as shown by the stratigraphic sections and the LIDAR scan. Chemostratigraphy is effective in identifying lithofacies and sequences within the outcrop analog. Moreover, different chemical signatures were also recognized and allowed establishing and correlating high resolution lithofacies, reservoir zones, layers and surfaces bounding reservoir and non-reservoir zones at a scale of meters or less.
The results of this high resolution outcrop analog study might help to understand and evaluate Khuff reservoir heterogeneity, quality and architecture. It might also help to fill the gap in knowledge in reservoir characterization models based on low resolution subsurface data alone.
Novel Primer Sets for Next Generation Sequencing-Based Analyses of Water Quality
Lee, Elvina; Khurana, Maninder S.; Whiteley, Andrew S.; Monis, Paul T.; Bath, Andrew; Gordon, Cameron; Ryan, Una M.; Paparini, Andrea
2017-01-01
Next generation sequencing (NGS) has rapidly become an invaluable tool for the detection, identification and relative quantification of environmental microorganisms. Here, we demonstrate two new 16S rDNA primer sets, which are compatible with NGS approaches and are primarily for use in water quality studies. Compared to 16S rRNA gene based universal primers, in silico and experimental analyses demonstrated that the new primers showed increased specificity for the Cyanobacteria and Proteobacteria phyla, allowing increased sensitivity for the detection, identification and relative quantification of toxic bloom-forming microalgae, microbial water quality bioindicators and common pathogens. Significantly, Cyanobacterial and Proteobacterial sequences accounted for ca. 95% of all sequences obtained within NGS runs (when compared to ca. 50% with standard universal NGS primers), providing higher sensitivity and greater phylogenetic resolution of key water quality microbial groups. The increased selectivity of the new primers allow the parallel sequencing of more samples through reduced sequence retrieval levels required to detect target groups, potentially reducing NGS costs by 50% but still guaranteeing optimal coverage and species discrimination. PMID:28118368
A proteome-scale map of the human interactome network
Rolland, Thomas; Taşan, Murat; Charloteaux, Benoit; Pevzner, Samuel J.; Zhong, Quan; Sahni, Nidhi; Yi, Song; Lemmens, Irma; Fontanillo, Celia; Mosca, Roberto; Kamburov, Atanas; Ghiassian, Susan D.; Yang, Xinping; Ghamsari, Lila; Balcha, Dawit; Begg, Bridget E.; Braun, Pascal; Brehme, Marc; Broly, Martin P.; Carvunis, Anne-Ruxandra; Convery-Zupan, Dan; Corominas, Roser; Coulombe-Huntington, Jasmin; Dann, Elizabeth; Dreze, Matija; Dricot, Amélie; Fan, Changyu; Franzosa, Eric; Gebreab, Fana; Gutierrez, Bryan J.; Hardy, Madeleine F.; Jin, Mike; Kang, Shuli; Kiros, Ruth; Lin, Guan Ning; Luck, Katja; MacWilliams, Andrew; Menche, Jörg; Murray, Ryan R.; Palagi, Alexandre; Poulin, Matthew M.; Rambout, Xavier; Rasla, John; Reichert, Patrick; Romero, Viviana; Ruyssinck, Elien; Sahalie, Julie M.; Scholz, Annemarie; Shah, Akash A.; Sharma, Amitabh; Shen, Yun; Spirohn, Kerstin; Tam, Stanley; Tejeda, Alexander O.; Trigg, Shelly A.; Twizere, Jean-Claude; Vega, Kerwin; Walsh, Jennifer; Cusick, Michael E.; Xia, Yu; Barabási, Albert-László; Iakoucheva, Lilia M.; Aloy, Patrick; De Las Rivas, Javier; Tavernier, Jan; Calderwood, Michael A.; Hill, David E.; Hao, Tong; Roth, Frederick P.; Vidal, Marc
2014-01-01
SUMMARY Just as reference genome sequences revolutionized human genetics, reference maps of interactome networks will be critical to fully understand genotype-phenotype relationships. Here, we describe a systematic map of ~14,000 high-quality human binary protein-protein interactions. At equal quality, this map is ~30% larger than what is available from small-scale studies published in the literature in the last few decades. While currently available information is highly biased and only covers a relatively small portion of the proteome, our systematic map appears strikingly more homogeneous, revealing a “broader” human interactome network than currently appreciated. The map also uncovers significant inter-connectivity between known and candidate cancer gene products, providing unbiased evidence for an expanded functional cancer landscape, while demonstrating how high quality interactome models will help “connect the dots” of the genomic revolution. PMID:25416956
SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws
NASA Technical Reports Server (NTRS)
Cooke, Daniel; Rushton, Nelson
2013-01-01
With the introduction of new parallel architectures like the Cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever.
In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.
Estimating genotype error rates from high-coverage next-generation sequence data.
Wall, Jeffrey D; Tang, Ling Fung; Zerbe, Brandon; Kvale, Mark N; Kwok, Pui-Yan; Schaefer, Catherine; Risch, Neil
2014-11-01
Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found (1) no difference in the error profiles or rates between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences had; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with genotype quality (GQ) score, but in a nonlinear fashion for the Illumina data, likely due to loss of specificity of GQ scores greater than 60; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)-(5), suggest that caution should be taken in interpreting the results of next-generation sequencing-based association studies, and even more so in clinical application of this technology in the absence of validation by other more robust sequencing or genotyping methods. © 2014 Wall et al.; Published by Cold Spring Harbor Laboratory Press.
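For context on finding (4) above: genotype quality (GQ) is conventionally Phred-scaled, so a score Q nominally corresponds to an error probability of 10^(−Q/10). The sketch below shows only that nominal conversion, assuming the usual Phred convention; the function names are ours, and the abstract's point is precisely that observed error rates do not track this nominal value at high GQ.

```python
import math


def phred_to_error_prob(q):
    """Nominal error probability implied by a Phred-scaled score q."""
    return 10.0 ** (-q / 10.0)


def error_prob_to_phred(p):
    """Phred-scaled score implied by an error probability p (0 < p <= 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)
```

Under this convention GQ = 60 would nominally mean one wrong call per million, whereas the empirical nonreference error rates of 0.1% to 0.6% reported in the abstract correspond to Phred-scaled values of roughly 22 to 30.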
Intraindividual comparison of image quality in MR urography at 1.5 and 3 tesla in an animal model.
Regier, M; Nolte-Ernsting, C; Adam, G; Kemper, J
2008-10-01
Experimental evaluation of image quality of the upper urinary tract in MR urography (MRU) at 1.5 and 3 Tesla in a porcine model. In this study four healthy domestic pigs, weighing between 71 and 80 kg (mean 73.6 kg), were examined with a standard T1w 3D-GRE and a high-resolution (HR) T1w 3D-GRE sequence at 1.5 and 3 Tesla. Additionally, at 3 Tesla both sequences were performed with parallel imaging (SENSE factor 2). The MR urographic scans were performed after intravenous injection of gadolinium-DTPA (0.1 mmol/kg body weight (bw)) and low-dose furosemide (0.1 mg/kg bw). Image evaluation was performed by two independent radiologists blinded to sequence parameters and field strength. Image analysis included grading of image quality of the segmented collecting system based on a five-point grading scale regarding anatomical depiction and artifacts observed (1: the majority of the segment (>50%) was not depicted or was obscured by major artifacts; 5: the segment was visualized without artifacts and had sharply defined borders). Signal-to-noise (SNR) and contrast-to-noise (CNR) ratios were determined. Statistical analysis included kappa statistics, the Wilcoxon test and the paired Student's t-test. The mean scores for MR urographies at 1.5 Tesla were 2.83 for the 3D-GRE and 3.48 for the HR3D-GRE sequence. Significantly higher values were determined using the corresponding sequences at 3 Tesla, averaging 3.19 for the 3D-GRE (p = 0.047) and 3.92 for the HR3D-GRE (p = 0.023) sequence. Delineation of the pelvicaliceal system was rated significantly higher at 3 Tesla compared to 1.5 Tesla (3D-GRE: p = 0.015; HR3D-GRE: p = 0.006). At 3 Tesla the mean SNR and CNR were significantly higher (p < 0.05). A kappa of 0.67 indicated good interobserver agreement. In an experimental setup, MR urography at 3 Tesla allowed for significantly higher image quality and SNR compared to 1.5 Tesla, particularly for the visualization of the pelvicaliceal system.
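The interobserver kappa quoted above can be illustrated with a short sketch of unweighted Cohen's kappa for two readers. The abstract does not say whether a weighted or unweighted variant was used, so the unweighted form, the function name, and the toy ratings in the usage note are all assumptions.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters scored identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal frequencies per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)
```

For the made-up ratings `[1, 1, 2, 2]` and `[1, 1, 2, 1]`, observed agreement is 0.75 and chance agreement is 0.5, giving kappa = 0.5. A value such as the 0.67 reported in the abstract therefore means the readers agreed on about two thirds of the ratings that chance alone would not have matched.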
Restructuring of the Aquatic Bacterial Community by Hydric Dynamics Associated with Superstorm Sandy
Ulrich, Nikea; Rosenberger, Abigail; Brislawn, Colin; Wright, Justin; Kessler, Collin; Toole, David; Solomon, Caroline; Strutt, Steven; McClure, Erin
2016-01-01
ABSTRACT Bacterial community composition and longitudinal fluctuations were monitored in a riverine system during and after Superstorm Sandy to better characterize inter- and intracommunity responses associated with the disturbance associated with a 100-year storm event. High-throughput sequencing of the 16S rRNA gene was used to assess microbial community structure within water samples from Muddy Creek Run, a second-order stream in Huntingdon, PA, at 12 different time points during the storm event (29 October to 3 November 2012) and under seasonally matched baseline conditions. High-throughput sequencing of the 16S rRNA gene was used to track changes in bacterial community structure and divergence during and after Superstorm Sandy. Bacterial community dynamics were correlated to measured physicochemical parameters and fecal indicator bacteria (FIB) concentrations. Bioinformatics analyses of 2.1 million 16S rRNA gene sequences revealed a significant increase in bacterial diversity in samples taken during peak discharge of the storm. Beta-diversity analyses revealed longitudinal shifts in the bacterial community structure. Successional changes were observed, in which Betaproteobacteria and Gammaproteobacteria decreased in 16S rRNA gene relative abundance, while the relative abundance of members of the Firmicutes increased. Furthermore, 16S rRNA gene sequences matching pathogenic bacteria, including strains of Legionella, Campylobacter, Arcobacter, and Helicobacter, as well as bacteria of fecal origin (e.g., Bacteroides), exhibited an increase in abundance after peak discharge of the storm. This study revealed a significant restructuring of in-stream bacterial community structure associated with hydric dynamics of a storm event. IMPORTANCE In order to better understand the microbial risks associated with freshwater environments during a storm event, a more comprehensive understanding of the variations in aquatic bacterial diversity is warranted. 
This study investigated the bacterial communities during and after Superstorm Sandy to provide fine time point resolution of dynamic changes in bacterial composition. This study adds to the current literature by revealing the variation in bacterial community structure during the course of a storm. This study employed high-throughput DNA sequencing, which generated a deep analysis of inter- and intracommunity responses during a significant storm event. This study has highlighted the utility of applying high-throughput sequencing for water quality monitoring purposes, as this approach enabled a more comprehensive investigation of the bacterial community structure. Altogether, these data suggest a drastic restructuring of the stream bacterial community during a storm event and highlight the potential of high-throughput sequencing approaches for assessing the microbiological quality of our environment. PMID:27060115
Elucidating and mining the Tulipa and Lilium transcriptomes.
Moreno-Pachon, Natalia M; Leeggangers, Hendrika A C F; Nijveen, Harm; Severing, Edouard; Hilhorst, Henk; Immink, Richard G H
2016-10-01
Genome sequencing remains a challenge for species with large and complex genomes containing extensive repetitive sequences, of which the bulbous and monocotyledonous plants tulip and lily are examples. In such a case, sequencing of only the active part of the genome, represented by the transcriptome, is a good alternative to obtain information about gene content. In this study we aimed to generate a high quality transcriptome of tulip and lily and to make this data available as an open-access resource via a user-friendly web-based interface. The Illumina HiSeq 2000 platform was applied and the transcribed RNA was sequenced from a collection of different lily and tulip tissues, respectively. In order to obtain good transcriptome coverage and to facilitate effective data mining, assembly was done using different filtering parameters for clearing out contamination and noise of the RNAseq datasets. This analysis revealed limitations of commonly applied methods and parameter settings used in de novo transcriptome assembly. The final created transcriptomes are publicly available via a user friendly Transcriptome browser ( http://www.bioinformatics.nl/bulbs/db/species/index ). The usefulness of this resource has been exemplified by a search for all potential transcription factors in lily and tulip, with special focus on the TCP transcription factor family. This analysis and other quality parameters point out the quality of the transcriptomes, which can serve as a basis for further genomics studies in lily, tulip, and bulbous plants in general.
Reference genome sequence of the model plant Setaria
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bennetzen, Jeffrey L; Schmutz, Jeremy; Wang, Hao
We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).
Reference genome sequence of the model plant Setaria.
Bennetzen, Jeffrey L; Schmutz, Jeremy; Wang, Hao; Percifield, Ryan; Hawkins, Jennifer; Pontaroli, Ana C; Estep, Matt; Feng, Liang; Vaughn, Justin N; Grimwood, Jane; Jenkins, Jerry; Barry, Kerrie; Lindquist, Erika; Hellsten, Uffe; Deshpande, Shweta; Wang, Xuewen; Wu, Xiaomei; Mitros, Therese; Triplett, Jimmy; Yang, Xiaohan; Ye, Chu-Yu; Mauro-Herrera, Margarita; Wang, Lin; Li, Pinghua; Sharma, Manoj; Sharma, Rita; Ronald, Pamela C; Panaud, Olivier; Kellogg, Elizabeth A; Brutnell, Thomas P; Doust, Andrew N; Tuskan, Gerald A; Rokhsar, Daniel; Devos, Katrien M
2012-05-13
We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ∼400-Mb assembly covers ∼80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).
Bandelt, Hans-Jürgen; Yao, Yong-Gang; Bravi, Claudio M; Salas, Antonio; Kivisild, Toomas
2009-03-01
Sequence analysis of the mitochondrial genome has become a routine method in the study of mitochondrial diseases. Quite often, the sequencing efforts in the search of pathogenic or disease-associated mutations are affected by technical and interpretive problems, caused by sample mix-up, contamination, biochemical problems, incomplete sequencing, misdocumentation and insufficient reference to previously published data. To assess data quality in case studies of mitochondrial diseases, it is recommended to compare any mtDNA sequence under consideration to their phylogenetically closest lineages available in the Web. The median network method has proven useful for visualizing potential problems with the data. We contrast some early reports of complete mtDNA sequences to more recent total mtDNA sequencing efforts in studies of various mitochondrial diseases. We conclude that the quality of complete mtDNA sequences generated in the medical field in the past few years is somewhat unsatisfactory and may even fall behind that of pioneer manual sequencing in the early nineties. Our study provides a paradigm for an a posteriori evaluation of sequence quality and for detection of potential problems with inferring a pathogenic status of a particular mutation.
Gimonet, Johan; Portmann, Anne-Catherine; Fournier, Coralie; Baert, Leen
2018-06-16
This work shows that an incubation time reduced to 4-5 h to prepare a culture for DNA extraction followed by an automated DNA extraction can shorten the hands-on time, the turnaround time by 30% and increase the throughput while maintaining the WGS quality assessed by high quality Single Nucleotide Polymorphism analysis. Copyright © 2018. Published by Elsevier B.V.
Single-electron random-number generator (RNG) for highly secure ubiquitous computing applications
NASA Astrophysics Data System (ADS)
Uchida, Ken; Tanamoto, Tetsufumi; Fujita, Shinobu
2007-11-01
Since the security of all modern cryptographic techniques relies on unpredictable and irreproducible digital keys generated by random-number generators (RNGs), the realization of high-quality RNG is essential for secure communications. In this report, a new RNG, which utilizes single-electron phenomena, is proposed. A room-temperature operating silicon single-electron transistor (SET) having nearby an electron pocket is used as a high-quality, ultra-small RNG. In the proposed RNG, stochastic single-electron capture/emission processes to/from the electron pocket are detected with high sensitivity by the SET, and result in giant random telegraphic signals (GRTS) on the SET current. It is experimentally demonstrated that the single-electron RNG generates extremely high-quality random digital sequences at room temperature, in spite of its simple configuration. Because of its small-size and low-power properties, the single-electron RNG is promising as a key nanoelectronic device for future ubiquitous computing systems with highly secure mobile communication capabilities.
Cryopreservation of Fish Spermatogonial Cells: The Future of Natural History Collections.
Hagedorn, Mary M; Daly, Jonathan P; Carter, Virginia L; Cole, Kathleen S; Jaafar, Zeehan; Lager, Claire V A; Parenti, Lynne R
2018-04-18
As global biodiversity declines, the value of biological collections increases. Cryopreserved diploid spermatogonial cells meet two goals: to yield high-quality molecular sequence data; and to regenerate new individuals, hence potentially countering species extinction. Cryopreserved spermatogonial cells that allow for such mitigative measures are not currently in natural history museum collections because there are no standard protocols to collect them. Vertebrate specimens, especially fishes, are traditionally formalin-fixed and alcohol-preserved which makes them ideal for morphological studies and as museum vouchers, but inadequate for molecular sequence data. Molecular studies of fishes routinely use tissues preserved in ethanol; yet tissues preserved in this way may yield degraded sequences over time. As an alternative to tissue fixation methods, we assessed and compared previously published cryopreservation methods by gating and counting fish testicular cells with flow cytometry to identify presumptive spermatogonia A-type cells. Here we describe a protocol to cryopreserve tissues that yields a high percentage of viable spermatogonial cells from the testes of Asterropteryx semipunctata, a marine goby. Material cryopreserved using this protocol represents the first frozen and post-thaw viable spermatogonial cells of fishes archived in a natural history museum to provide better quality material for re-derivation of species and DNA preservation and analysis.
ChronQC: a quality control monitoring system for clinical next generation sequencing.
Tawari, Nilesh R; Seow, Justine Jia Wen; Perumal, Dharuman; Ow, Jack L; Ang, Shimin; Devasia, Arun George; Ng, Pauline C
2018-05-15
ChronQC is a quality control (QC) tracking system for clinical implementation of next-generation sequencing (NGS). ChronQC generates time series plots for various QC metrics to allow comparison of current runs to historical runs. ChronQC has multiple features for tracking QC data including Westgard rules for clinical validity, laboratory-defined thresholds and historical observations within a specified time period. Users can record their notes and corrective actions directly onto the plots for long-term recordkeeping. ChronQC facilitates regular monitoring of clinical NGS to enable adherence to high quality clinical standards. ChronQC is freely available on GitHub (https://github.com/nilesh-tawari/ChronQC), Docker (https://hub.docker.com/r/nileshtawari/chronqc/) and the Python Package Index. ChronQC is implemented in Python and runs on all common operating systems (Windows, Linux and Mac OS X). Contact: tawari.nilesh@gmail.com or pauline.c.ng@gmail.com. Supplementary data are available at Bioinformatics online.
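As a rough illustration of the kind of rule-based check mentioned in the abstract above, the sketch below applies two commonly cited Westgard rules (1_3s and 2_2s) to a series of QC values. It is not ChronQC's code or API; the function name `westgard_flags`, the use of a historical mean and standard deviation as the control limits, and the example numbers are all assumptions made for illustration.

```python
from statistics import mean, stdev

def westgard_flags(history, current_runs):
    """Return, for each current run, the list of Westgard rules it violates.

    history: QC metric values from past in-control runs (hypothetical input).
    current_runs: values to check, in chronological order.
    """
    mu = mean(history)
    sd = stdev(history)
    # Express each new value as a z-score against the historical runs
    z = [(x - mu) / sd for x in current_runs]
    flags = []
    for i, zi in enumerate(z):
        rules = []
        # 1_3s: a single value more than 3 SD from the historical mean
        if abs(zi) > 3:
            rules.append("1_3s")
        # 2_2s: two consecutive values beyond 2 SD on the same side of the mean
        if i > 0 and ((zi > 2 and z[i - 1] > 2) or (zi < -2 and z[i - 1] < -2)):
            rules.append("2_2s")
        flags.append(rules)
    return flags

# Hypothetical metric values: seven historical runs, four new runs
history = [10, 12, 8, 10, 12, 8, 10]
print(westgard_flags(history, [10, 14, 14, 16]))
```

An empty list for a run means neither rule fired. A real monitoring setup would typically combine more rules and laboratory-defined thresholds, as the abstract notes.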
Shan, Yan; Zeng, Meng-su; Liu, Kai; Miao, Xi-Yin; Lin, Jiang; Fu, Cai xia; Xu, Peng-ju
2015-01-01
To evaluate the effect of choosing either free-breathing (FB) or navigator-triggered (NT) diffusion-weighted (DW) imaging on image quality and intravoxel incoherent motion (IVIM) parameters of small hepatocellular carcinoma (HCC). Thirty patients with 37 small HCCs underwent IVIM DW imaging using 12 b values (0-800 s/mm²) with 2 sequences: NT and FB. A biexponential analysis with the Bayesian method yielded the true diffusion coefficient (D), pseudodiffusion coefficient (D*), and perfusion fraction (f) in small HCCs and liver parenchyma. The apparent diffusion coefficient (ADC) was also calculated. The acquisition time and image quality scores were assessed for the 2 sequences. An independent sample t test was used to compare image quality, signal intensity ratio, IVIM parameters, and ADC values between the 2 sequences; reproducibility of IVIM parameters and ADC values between the 2 sequences was assessed with the Bland-Altman method (BA-LA). Image quality with the NT sequence was superior to that with FB acquisition (P = 0.02). The mean acquisition time for the FB scheme was shorter than that of the NT sequence (6 minutes 14 seconds vs 10 minutes 21 seconds ± 10 seconds; P < 0.01). The signal intensity ratio of small HCCs did not vary significantly between the 2 sequences. The ADC and IVIM parameters from the 2 sequences showed no significant difference. Reproducibility of the D* and f parameters in small HCC was poor (BA-LA: 95% confidence interval, -180.8% to 189.2% for D* and -133.8% to 174.9% for f). A moderate reproducibility of the D and ADC parameters was observed (BA-LA: 95% confidence interval, -83.5% to 76.8% for D and -74.4% to 88.2% for ADC) between the 2 sequences. The NT DW imaging technique offers no advantage in IVIM parameter measurements of small HCC except better image quality, whereas the FB technique offers greater confidence in fitted diffusion parameters for matched acquisition periods.
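The biexponential analysis mentioned above is usually based on the standard IVIM signal model, S(b)/S0 = f·exp(-b·D*) + (1 - f)·exp(-b·D). The abstract does not spell out the equation or the Bayesian fitting procedure, so the sketch below only evaluates that assumed model and a simple two-point monoexponential ADC. The function names and parameter values are hypothetical.

```python
import math

def ivim_signal(b, f, d_star, d, s0=1.0):
    """Biexponential IVIM signal at b value `b` (s/mm^2).

    f: perfusion fraction, d_star: pseudodiffusion coefficient (mm^2/s),
    d: true diffusion coefficient (mm^2/s). Standard model, assumed here.
    """
    return s0 * (f * math.exp(-b * d_star) + (1.0 - f) * math.exp(-b * d))

def adc_two_point(s_low, s_high, b_low, b_high):
    """Monoexponential ADC from two signal measurements."""
    return math.log(s_low / s_high) / (b_high - b_low)

# Hypothetical tissue parameters, evaluated at a few b values in the 0-800 range
for b in (0, 50, 200, 800):
    print(b, round(ivim_signal(b, f=0.2, d_star=0.05, d=0.001), 4))
```

Because the fast pseudodiffusion term decays within the first few tens of s/mm², D* and f depend heavily on the low b values, which is one plausible reason their reproducibility tends to be poorer than that of D and ADC.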
Wylezinska, Marzena; Pinkstone, Marie; Hay, Norman; Scott, Andrew D; Birch, Malcolm J; Miquel, Marc E
2015-12-01
The aim of this work was to investigate the effects of commonly used orthodontic appliances on the magnetic resonance (MR) image quality of the craniofacial region, with special interest in the soft palate and velopharyngeal wall using real-time speech imaging sequences and anatomical imaging of the temporomandibular joints (TMJ) and pituitaries. Common orthodontic appliances were studied on 1.5 T scanner using standard spin and gradient echo sequences (based on the American Society for Testing and Materials standard test method) and sequences previously applied for high-resolution anatomical and dynamic real-time imaging during speech. Images were evaluated for the presence and size of artefacts. Metallic orthodontic appliances had different effects on image quality. The most extensive individual effects were associated with the presence of stainless steel archwire, particularly if combined with stainless steel brackets and stainless steel molar bands. With those appliances, diagnostic quality of magnetic resonance imaging speech and palate images will be most likely severely degraded, or speech imaging and imaging of pituitaries and TMJ will be not possible. All non-metallic, non-metallic with Ni/Cr reinforcement or Ni/Ti alloys appliances were of little concern. The results in the study are only valid at 1.5 T and for the sequences and devices used and cannot necessarily be extrapolated to all sequences and devices. Furthermore, both geometry and size of some appliances are subject dependent, and consequently, the effects on the image quality can vary between subjects. Therefore, the results presented in this article should be treated as a guide when assessing the risks of image quality degradation rather than an absolute evaluation of possible artefacts. Appliances manufactured from stainless steel cause extensive artefacts, which may render image non-diagnostic. 
The presence and type of orthodontic appliances should be always included in the patient's screening, so the risks of artefacts can be assessed prior to imaging. Although the risks to patients with fixed orthodontic appliances at 1.5 T MR scanners are low, their secure attachment should be confirmed prior to the examination. © The Author 2015. Published by Oxford University Press on behalf of the European Orthodontic Society.
NASA Astrophysics Data System (ADS)
Froehlich, Jan; Grandinetti, Stefan; Eberhardt, Bernd; Walter, Simon; Schilling, Andreas; Brendel, Harald
2014-03-01
High quality video sequences are required for the evaluation of tone mapping operators and high dynamic range (HDR) displays. We provide scenic and documentary scenes with a dynamic range of up to 18 stops. The scenes are staged using professional film lighting, make-up and set design to enable the evaluation of image and material appearance. To address challenges for HDR-displays and temporal tone mapping operators, the sequences include highlights entering and leaving the image, brightness changing over time, high contrast skin tones, specular highlights and bright, saturated colors. HDR-capture is carried out using two cameras mounted on a mirror-rig. To achieve a cinematic depth of field, digital motion picture cameras with Super-35mm size sensors are used. We provide HDR-video sequences to serve as a common ground for the evaluation of temporal tone mapping operators and HDR-displays. They are available to the scientific community for further research.
The effect of texture granularity on texture synthesis quality
NASA Astrophysics Data System (ADS)
Golestaneh, S. Alireza; Subedar, Mahesh M.; Karam, Lina J.
2015-09-01
Natural and artificial textures occur frequently in images and in video sequences. Image/video coding systems based on texture synthesis can make use of a reliable texture synthesis quality assessment method in order to improve the compression performance in terms of perceived quality and bit-rate. Existing objective visual quality assessment methods do not perform satisfactorily when predicting the synthesized texture quality. In our previous work, we showed that texture regularity can be used as an attribute for estimating the quality of synthesized textures. In this paper, we study the effect of another texture attribute, namely texture granularity, on the quality of synthesized textures. For this purpose, subjective studies are conducted to assess the quality of synthesized textures with different levels (low, medium, high) of perceived texture granularity using different types of texture synthesis methods.
Van Geel, Maarten; Busschaert, Pieter; Honnay, Olivier; Lievens, Bart
2014-11-01
In the last few years, 454 pyrosequencing-based analysis of arbuscular mycorrhizal fungal (AMF; Glomeromycota) communities has tremendously increased our knowledge of the distribution and diversity of AMF. Nonetheless, comparing results between different studies is difficult, as different target genes (or regions thereof) and primer combinations, with potentially dissimilar specificities and efficacies, are being utilized. In this study we evaluated six primer pairs that have previously been used in AMF studies (NS31-AM1, AMV4.5NF-AMDGR, AML1-AML2, NS31-AML2, FLR3-LSUmBr and Glo454-NDL22) for their use in 454 pyrosequencing based on both an in silico approach and 454 pyrosequencing of AMF communities from apple tree roots. Primers were evaluated in terms of (i) in silico coverage of Glomeromycota fungi, (ii) the number of high-quality sequences obtained, (iii) selectivity for AMF species, (iv) reproducibility and (v) ability to accurately describe AMF communities. We show that primer pairs AMV4.5NF-AMDGR, AML1-AML2 and NS31-AML2 outperformed the other tested primer pairs in terms of number of Glomeromycota reads (AMF specificity and coverage). Additionally, these primer pairs were found to have no or only few mismatches to AMF sequences and were able to consistently describe AMF communities from apple roots. However, whereas most high-quality AMF sequences were obtained for AMV4.5NF-AMDGR, our results also suggest that this primer pair favored amplification of Glomeraceae sequences at the expense of Ambisporaceae, Claroideoglomeraceae and Paraglomeraceae sequences. Furthermore, we demonstrate the complementary specificity of AMV4.5NF-AMDGR with AML1-AML2, and of AMV4.5NF-AMDGR with NS31-AML2, making these primer combinations highly suitable for tandem use in covering the diversity of AMF communities. Copyright © 2014 Elsevier B.V. All rights reserved.
Genome-wide DNA polymorphisms in two cultivars of mei (Prunus mume Sieb. et Zucc.).
Sun, Lidan; Zhang, Qixiang; Xu, Zongda; Yang, Weiru; Guo, Yu; Lu, Jiuxing; Pan, Huitang; Cheng, Tangren; Cai, Ming
2013-10-06
Mei (Prunus mume Sieb. et Zucc.) is a famous ornamental plant and fruit crop grown in East Asian countries. Limited genetic resources, especially molecular markers, have hindered the progress of mei breeding projects. Here, we performed low-depth whole-genome sequencing of Prunus mume 'Fenban' and Prunus mume 'Kouzi Yudie' to identify high-quality polymorphic markers between the two cultivars on a large scale. A total of 1464.1 Mb and 1422.1 Mb of 'Fenban' and 'Kouzi Yudie' sequencing data, respectively, were uniquely mapped to the mei reference genome with about 6-fold coverage. We detected a large number of putative polymorphic markers from the 196.9 Mb of sequencing data shared by the two cultivars, which together contained 200,627 SNPs, 4,900 InDels, and 7,063 SSRs. Among these markers, 38,773 SNPs, 174 InDels, and 418 SSRs were distributed in the 22.4 Mb CDS region, and 63.0% of these marker-containing CDS sequences were assigned to GO terms. Subsequently, 670 selected SNPs were validated using Agilent's SureSelect solution-phase hybridization assay. A subset of 599 SNPs was used to assess the genetic similarity of a panel of mei germplasm samples and a plum (P. salicina) cultivar, producing a set of informative diversity data. We also analyzed the frequency and distribution of detected InDels and SSRs in the mei genome and validated their usefulness as DNA markers. These markers were successfully amplified in the cultivars and in their segregating progeny. A large set of high-quality polymorphic SNPs, InDels, and SSRs was identified in parallel between 'Fenban' and 'Kouzi Yudie' using low-depth whole-genome sequencing. The study presents extensive data on these polymorphic markers, which can be useful for constructing high-resolution genetic maps, performing genome-wide association studies, and designing genomic selection strategies in mei.
Bonfiglio, Silvia; Vanni, Irene; Rossella, Valeria; Truini, Anna; Lazarevic, Dejan; Dal Bello, Maria Giovanna; Alama, Angela; Mora, Marco; Rijavec, Erika; Genova, Carlo; Cittaro, Davide; Grossi, Francesco; Coco, Simona
2016-08-30
Next Generation Sequencing (NGS) has become a valuable tool for molecular landscape characterization of cancer genomes, leading to a better understanding of tumor onset and progression, and opening new avenues in translational oncology. Formalin-fixed paraffin-embedded (FFPE) tissue is the method of choice for storage of clinical samples; however, the low quality of FFPE genomic DNA (gDNA) can limit its use for downstream applications. To investigate the suitability of FFPE specimens for NGS analysis and to establish the performance of two solution-based exome capture technologies, we compared the whole-exome sequencing (WES) data of gDNA extracted from 5 fresh frozen (FF) and 5 matched FFPE lung adenocarcinoma tissues using SeqCap EZ Human Exome v.3.0 (Roche NimbleGen) and SureSelect XT Human All Exon v.5 (Agilent Technologies). Sequencing metrics on Illumina HiSeq were optimal for both exome systems and comparable among FFPE and FF samples, with a slight increase of PCR duplicates in FFPE, mainly in Roche NimbleGen libraries. Comparison of single nucleotide variants (SNVs) between FFPE-FF pairs reached overlapping values >90 % in both systems. Both WES systems showed high concordance with target re-sequencing data by Ion PGM™ in 22 lung-cancer genes, regardless of the source of samples. Exon coverage of 623 cancer-related genes revealed high coverage efficiency of both kits, supporting WES as a valid alternative to target re-sequencing. High-quality and reliable data can be successfully obtained from WES of FFPE samples starting from a relatively low amount of input gDNA, suggesting the inclusion of NGS-based tests into the clinical context. In conclusion, our analysis suggests that the WES approach could be extended to a translational research context as well as to the clinic (e.g. to study rare malignancies), where the simultaneous analysis of the whole coding region of the genome may help in the detection of cancer-linked variants.
The emerging High Efficiency Video Coding standard (HEVC)
NASA Astrophysics Data System (ADS)
Raja, Gulistan; Khan, Awais
2013-12-01
High definition video (HDV) is becoming increasingly popular. This paper presents a performance analysis of the latest video coding standard, High Efficiency Video Coding (HEVC). HEVC is designed to fulfil the requirements of future high definition video. In this paper, three configurations (intra only, low delay and random access) of HEVC are analyzed using various 480p, 720p and 1080p high definition test video sequences. Simulation results show the superior objective and subjective quality of HEVC.
2016-01-01
Background: Metabarcoding is becoming a common tool used to assess and compare diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process and several taxonomy assignment methods were proposed to accomplish this task. This publication evaluates the quality of reference datasets, alongside with several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on relative placements of OTUs and reference sequences on the cladogram and support that these placements receive. New information: In tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires high quality reference dataset to be used. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences as well as alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have detrimental effect on the resolution of cladograms used in tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topology and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives highest resolution for the particular reference dataset. Completing the above mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach. PMID:27932919
Georges, Arthur; Li, Qiye; Lian, Jinmin; O'Meally, Denis; Deakin, Janine; Wang, Zongji; Zhang, Pei; Fujita, Matthew; Patel, Hardip R; Holleley, Clare E; Zhou, Yang; Zhang, Xiuwen; Matsubara, Kazumi; Waters, Paul; Graves, Jennifer A Marshall; Sarre, Stephen D; Zhang, Guojie
2015-01-01
The lizards of the family Agamidae are one of the most prominent elements of the Australian reptile fauna. Here, we present a genomic resource built on the basis of a wild-caught male ZZ central bearded dragon Pogona vitticeps. The genomic sequence for P. vitticeps, generated on the Illumina HiSeq 2000 platform, comprised 317 Gbp (179X raw read depth) from 13 insert libraries ranging from 250 bp to 40 kbp. After filtering for low-quality and duplicated reads, 146 Gbp of data (83X) was available for assembly. Exceptionally high levels of heterozygosity (0.85 % of single nucleotide polymorphisms plus sequence insertions or deletions) complicated assembly; nevertheless, 96.4 % of reads mapped back to the assembled scaffolds, indicating that the assembly included most of the sequenced genome. Length of the assembly was 1.8 Gbp in 545,310 scaffolds (69,852 longer than 300 bp), the longest being 14.68 Mbp. N50 was 2.29 Mbp. Genes were annotated on the basis of de novo prediction, similarity to the green anole Anolis carolinensis, Gallus gallus and Homo sapiens proteins, and P. vitticeps transcriptome sequence assemblies, to yield 19,406 protein-coding genes in the assembly, 63 % of which had intact open reading frames. Our assembly captured 99 % (246 of 248) of core CEGMA genes, with 93 % (231) being complete. The quality of the P. vitticeps assembly is comparable or superior to that of other published squamate genomes, and the annotated P. vitticeps genome can be accessed through a genome browser available at https://genomics.canberra.edu.au.
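The N50 statistic quoted above (2.29 Mbp) is the scaffold length at which scaffolds of that length or longer contain at least half of the assembled bases. A minimal sketch of that calculation, using a hypothetical function name and toy scaffold lengths rather than the P. vitticeps data:

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")

# Toy example: total length 32, so the two longest scaffolds reach the halfway point
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 weights long scaffolds heavily, an assembly can have hundreds of thousands of short scaffolds, as reported above, and still have a multi-megabase N50.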
Wyllie, David H; Sanderson, Nicholas; Myers, Richard; Peto, Tim; Robinson, Esther; Crook, Derrick W; Smith, E Grace; Walker, A Sarah
2018-06-06
Contact tracing requires reliable identification of closely related bacterial isolates. When we noticed the reporting of artefactual variation between M. tuberculosis isolates during routine next generation sequencing of Mycobacterium spp, we investigated its basis in 2,018 consecutive M. tuberculosis isolates. In the routine process used, clinical samples were decontaminated and inoculated into broth cultures; from positive broth cultures DNA was extracted, sequenced, reads mapped, and consensus sequences determined. We investigated the process of consensus sequence determination, which selects the most common nucleotide at each position. Having determined the high-quality read depth and depth of minor variants across 8,006 M. tuberculosis genomic regions, we quantified the relationship between the minor variant depth and the amount of non-Mycobacterial bacterial DNA, which originates from commensal microbes killed during sample decontamination. In the presence of non-Mycobacterial bacterial DNA, we found significant increases in minor variant frequencies of more than 1.5 fold in 242 regions covering 5.1% of the M. tuberculosis genome. Included within these were four high variation regions strongly influenced by the amount of non-Mycobacterial bacterial DNA. Excluding these four regions from pairwise distance comparisons reduced biologically implausible variation from 5.2% to 0% in an independent validation set derived from 226 individuals. Thus, we have demonstrated an approach identifying critical genomic regions contributing to clinically relevant artefactual variation in bacterial similarity searches. The approach described monitors the outputs of the complex multi-step laboratory and bioinformatics process, allows periodic process adjustments, and will have application to quality control of routine bacterial genomics. Copyright © 2018 Wyllie et al.
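The consensus step described above, which selects the most common nucleotide at each position, and the minor variant frequency that the study tracks can be sketched as follows. This is an illustration only, not the authors' pipeline: the input format (one string of high-quality base calls per position) and the function name are assumptions.

```python
from collections import Counter

def consensus_and_minor(pileups):
    """Call a majority consensus and the minor variant frequency per position.

    pileups: list of strings, one per genome position, each holding the
    high-quality base calls observed at that position (hypothetical input).
    """
    consensus = []
    minor_freqs = []
    for bases in pileups:
        counts = Counter(bases)
        if not counts:
            # No high-quality coverage at this position
            consensus.append("N")
            minor_freqs.append(0.0)
            continue
        top_base, top_count = counts.most_common(1)[0]
        depth = sum(counts.values())
        consensus.append(top_base)
        # Fraction of reads that disagree with the consensus base
        minor_freqs.append((depth - top_count) / depth)
    return "".join(consensus), minor_freqs

print(consensus_and_minor(["AAAA", "CCCT", ""]))
```

Under this scheme, reads from contaminating non-mycobacterial DNA that map to a position raise its minor variant frequency, and can change the consensus base if they become the majority.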
Holovachov, Oleksandr
2016-01-01
Metabarcoding is becoming a common tool used to assess and compare diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process and several taxonomy assignment methods were proposed to accomplish this task. This publication evaluates the quality of reference datasets, alongside with several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on relative placements of OTUs and reference sequences on the cladogram and support that these placements receive. In tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires high quality reference dataset to be used. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences as well as alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have detrimental effect on the resolution of cladograms used in tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topology and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives highest resolution for the particular reference dataset. Completing the above mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach.
Poojari, Sudarsana; Alabi, Olufemi J.; Fofanov, Viacheslav Y.; Naidu, Rayapati A.
2013-01-01
A graft-transmissible disease displaying red veins, red blotches and total reddening of leaves in red-berried wine grape (Vitis vinifera L.) cultivars was observed in commercial vineyards. Next-generation sequencing technology was used to identify etiological agent(s) associated with this emerging disease, designated as grapevine redleaf disease (GRD). High quality RNA extracted from leaves of grape cultivars Merlot and Cabernet Franc with and without GRD symptoms was used to prepare cDNA libraries. Assembly of highly informative sequence reads generated from Illumina sequencing of cDNA libraries, followed by bioinformatic analyses of sequence contigs resulted in specific identification of taxonomically disparate viruses and viroids in samples with and without GRD symptoms. A single-stranded DNA virus, tentatively named Grapevine redleaf-associated virus (GRLaV), and Grapevine fanleaf virus were detected only in grapevines showing GRD symptoms. In contrast, Grapevine rupestris stem pitting-associated virus, Hop stunt viroid, Grapevine yellow speckle viroid 1, Citrus exocortis viroid and Citrus exocortis Yucatan viroid were present in both symptomatic and non-symptomatic grapevines. GRLaV was transmitted by the Virginia creeper leafhopper (Erythroneura ziczac Walsh) from grapevine-to-grapevine under greenhouse conditions. Molecular and phylogenetic analyses indicated that GRLaV, almost identical to recently reported Grapevine Cabernet Franc-associated virus from New York and Grapevine red blotch-associated virus from California, represents an evolutionarily distinct lineage in the family Geminiviridae with genome characteristics distinct from other leafhopper-transmitted geminiviruses. GRD significantly reduced fruit yield and affected berry quality parameters demonstrating negative impacts of the disease. Higher quantities of carbohydrates were present in symptomatic leaves suggesting their possible role in the expression of redleaf symptoms. PMID:23755117
Lossy compression of quality scores in genomic data.
Cánovas, Rodrigo; Moffat, Alistair; Turpin, Andrew
2014-08-01
Next-generation sequencing technologies are revolutionizing medicine. Data from sequencing technologies are typically represented as a string of bases, an associated sequence of per-base quality scores and other metadata, and in aggregate can require a large amount of space. The quality scores show how accurate the bases are with respect to the sequencing process, that is, how confident the sequencer is of having called them correctly, and are the largest component in datasets in which they are retained. Previous research has examined how to store sequences of bases effectively; here we add to that knowledge by examining methods for compressing quality scores. The quality values originate in a continuous domain, and so if a fidelity criterion is introduced, it is possible to introduce flexibility in the way these values are represented, allowing lossy compression over the quality score data. We present existing compression options for quality score data, and then introduce two new lossy techniques. Experiments measuring the trade-off between compression ratio and information loss are reported, including quantifying the effect of lossy representations on a downstream application that carries out single nucleotide polymorphism and insert/deletion detection. The new methods are demonstrably superior to other techniques when assessed against the spectrum of possible trade-offs between storage required and fidelity of representation. An implementation of the methods described here is available at https://github.com/rcanovas/libCSAM. Contact: rcanovas@student.unimelb.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
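To make the idea of a fidelity criterion concrete, the sketch below shows one generic lossy scheme, uniform binning of Phred scores, in which every score is replaced by the midpoint of its bin. This is not one of the two new techniques introduced in the paper and does not use libCSAM; the function names, the bin width and the Phred+33 encoding are assumptions chosen for illustration.

```python
def bin_quality_string(qual, bin_width=10, offset=33):
    """Replace each Phred score with the midpoint of its bin (lossy).

    qual: FASTQ quality string, assumed to use Phred+33 encoding.
    """
    out = []
    for ch in qual:
        q = ord(ch) - offset
        binned = (q // bin_width) * bin_width + bin_width // 2
        # Clamp so the result stays within the printable Phred+33 range
        binned = min(binned, 93)
        out.append(chr(binned + offset))
    return "".join(out)

def max_abs_error(original, binned):
    """Largest per-base difference in Phred score introduced by binning."""
    return max(abs(ord(a) - ord(b)) for a, b in zip(original, binned))

original = "!+5?I"            # Phred scores 0, 10, 20, 30, 40
binned = bin_quality_string(original)
print(binned, max_abs_error(original, binned))
```

Fewer distinct symbols make the quality string more compressible by a general-purpose entropy coder, while the bin width bounds the error per base. That bound plays the role of the fidelity criterion in this toy scheme.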
Combinatorial Pooling Enables Selective Sequencing of the Barley Gene Space
Lonardi, Stefano; Duma, Denisa; Alpert, Matthew; Cordero, Francesca; Beccuti, Marco; Bhat, Prasanna R.; Wu, Yonghui; Ciardo, Gianfranco; Alsaihati, Burair; Ma, Yaqin; Wanamaker, Steve; Resnik, Josh; Bozdag, Serdar; Luo, Ming-Cheng; Close, Timothy J.
2013-01-01
For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and are now in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as, in this case, gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding. PMID:23592960
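The pooling and deconvolution idea can be illustrated with a toy design in which every clone is placed in a distinct set of pools, so that the set of pools in which a read is observed identifies its clone. The abstract does not describe the actual pooling design or the read-comparison algorithm, so everything below (function names, the use of plain pool combinations, exact signature matching) is a simplifying assumption.

```python
from itertools import combinations

def make_signatures(clones, n_pools, pools_per_clone):
    """Give each clone a distinct set of pools (toy design, not the paper's)."""
    combos = combinations(range(n_pools), pools_per_clone)
    signatures = {}
    for clone in clones:
        combo = next(combos, None)
        if combo is None:
            raise ValueError("not enough pool combinations for all clones")
        signatures[clone] = frozenset(combo)
    return signatures

def deconvolve(read_pools, signatures):
    """Return the clone whose pool signature matches where the read was seen.

    Returns None when the observed pools match no clone exactly.
    """
    by_signature = {sig: clone for clone, sig in signatures.items()}
    return by_signature.get(frozenset(read_pools))

sigs = make_signatures(["bacA", "bacB", "bacC"], n_pools=4, pools_per_clone=2)
print(deconvolve([2, 0], sigs))   # read seen in pools 0 and 2
print(deconvolve([1, 2], sigs))   # pools match no assigned clone
```

The appeal over one barcode per clone is combinatorial: the number of available signatures grows as the number of ways to choose pools, so far fewer sequencing libraries than clones are needed. Real data would also require tolerance for sequencing errors and for reads shared by overlapping clones, which this sketch ignores.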
Combinatorial pooling enables selective sequencing of the barley gene space.
Lonardi, Stefano; Duma, Denisa; Alpert, Matthew; Cordero, Francesca; Beccuti, Marco; Bhat, Prasanna R; Wu, Yonghui; Ciardo, Gianfranco; Alsaihati, Burair; Ma, Yaqin; Wanamaker, Steve; Resnik, Josh; Bozdag, Serdar; Luo, Ming-Cheng; Close, Timothy J
2013-04-01
For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and are now in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as, in this case, gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.
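The deconvolution idea in the abstract above can be illustrated with a toy sketch. It assumes a deliberately simplified design in which every clone is placed in a distinct combination of pools and a read is assigned to the clone whose pool signature matches the pools it was observed in. The function names, the use of plain combinations, and the exact-match rule are all hypothetical simplifications for illustration; the published protocol's actual pooling design and decoding algorithm are not reproduced here.

```python
from itertools import combinations


def assign_signatures(clones, n_pools, pools_per_clone):
    """Give every clone a distinct set of pools (its signature).

    Hypothetical toy design: signatures are simply the first
    len(clones) combinations of `pools_per_clone` pools out of `n_pools`.
    """
    signatures = combinations(range(n_pools), pools_per_clone)
    table = {clone: frozenset(sig) for clone, sig in zip(clones, signatures)}
    if len(table) < len(clones):
        raise ValueError("not enough pool combinations for all clones")
    return table


def deconvolve(observed_pools, table):
    """Return the clone whose signature equals the observed pools, or None."""
    observed = frozenset(observed_pools)
    for clone, signature in table.items():
        if signature == observed:
            return clone
    return None


if __name__ == "__main__":
    table = assign_signatures(["BAC_A", "BAC_B", "BAC_C"], n_pools=4, pools_per_clone=2)
    print(deconvolve([2, 0], table))  # a read seen in pools 0 and 2
```

With 4 pools and 2 pools per clone, six signatures are available, so three clones can be sequenced in four pooled libraries instead of three barcoded ones; the saving grows quickly as the number of clones increases. A real decoder would also have to tolerate missing or spurious pool observations, which this sketch ignores.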
An optimized protocol for generation and analysis of Ion Proton sequencing reads for RNA-Seq.
Yuan, Yongxian; Xu, Huaiqian; Leung, Ross Ka-Kit
2016-05-26
Previous studies compared running cost, time and other performance measures of popular sequencing platforms. However, a comprehensive assessment of library construction and analysis protocols for the Proton sequencing platform has been lacking. Unlike Illumina sequencing platforms, Proton reads are heterogeneous in length and quality. When sequencing data from different platforms are combined, the result is a set of reads of varying length. Whether commonly used software handles such data satisfactorily is unknown. Using universal human reference RNA as the starting material, RNaseIII and chemical fragmentation methods in library construction gave similar results in the number of genes and junctions discovered and in the accuracy of expression level estimates. In contrast, sequencing quality, read length and the choice of software affected mapping rate to a much larger extent. The unspliced aligner TMAP attained the highest mapping rate (97.27 % to genome, 86.46 % to transcriptome), though 47.83 % of mapped reads were clipped. Long reads could paradoxically reduce mapping in junctions. With a reference annotation guide, the mapping rate of TopHat2 increased significantly from 75.79 to 92.09 %, especially for long (>150 bp) reads. Sailfish, a k-mer based gene expression quantifier, attained results highly consistent with those of the TaqMan array and the highest sensitivity. We provide, for the first time, reference statistics of library preparation methods, gene detection and quantification, and junction discovery for RNA-Seq on the Ion Proton platform. Chemical fragmentation performed as well as the enzyme-based method. The optimal Ion Proton sequencing options and analysis software have been evaluated.
Chen, Xianfeng; Johnson, Stephen; Jeraldo, Patricio; Wang, Junwen; Chia, Nicholas; Kocher, Jean-Pierre A; Chen, Jun
2018-03-01
Illumina paired-end sequencing has become increasingly popular for 16S rRNA gene-based microbiota profiling. It provides higher phylogenetic resolution than single-end reads due to a longer read length. However, the reverse read (R2) often has significantly lower base quality, and a large proportion of R2s will be discarded after quality control, resulting in a mixture of paired-end and single-end reads. A typical 16S analysis pipeline usually processes either paired-end or single-end reads but not a mixture. Thus, the quantification accuracy and statistical power will be reduced due to the loss of a large number of reads. As a result, rare taxa may not be detectable with the paired-end approach, or low taxonomic resolution will result from a single-end approach. To have both the higher phylogenetic resolution provided by paired-end reads and the higher sequence coverage provided by single-end reads, we propose a novel OTU-picking pipeline, hybrid-denovo, that can process a hybrid of single-end and paired-end reads. Using high-quality paired-end reads as a gold standard, we show that hybrid-denovo achieved the highest correlation with the gold standard and performed better than the approaches based on paired-end or single-end reads in terms of quantifying the microbial diversity and taxonomic abundances. By applying our method to a rheumatoid arthritis (RA) data set, we demonstrated that hybrid-denovo captured more microbial diversity and identified more RA-associated taxa than a paired-end or single-end approach. Hybrid-denovo utilizes both paired-end and single-end 16S sequencing reads and is recommended for 16S rRNA gene targeted paired-end sequencing data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hu, Lingzhi; Traughber, Melanie; Su, Kuan-Hao
Purpose: The ultrashort echo-time (UTE) sequence is a promising MR pulse sequence for imaging cortical bone which is otherwise difficult to image using conventional MR sequences and also poses strong attenuation for photons in radiation therapy and PET imaging. The authors report here a systematic characterization of cortical bone signal decay and a scanning time optimization strategy for the UTE sequence through k-space undersampling, which can result in up to a 75% reduction in acquisition time. Using the undersampled UTE imaging sequence, the authors also attempted to quantitatively investigate the MR properties of cortical bone in healthy volunteers, thus demonstrating the feasibility of using such a technique for generating bone-enhanced images which can be used for radiation therapy planning and attenuation correction with PET/MR. Methods: An angularly undersampled, radially encoded UTE sequence was used for scanning the brains of healthy volunteers. Quantitative MR characterization of tissue properties, including water fraction and R2* = 1/T2*, was performed by analyzing the UTE images acquired at multiple echo times. The impact of different sampling rates was evaluated through systematic comparison of the MR image quality, bone-enhanced image quality, image noise, water fraction, and R2* of cortical bone. Results: A reduced angular sampling rate of the UTE trajectory achieves acquisition durations in proportion to the sampling rate and in as short as 25% of the time required for full sampling using a standard Cartesian acquisition, while preserving unique MR contrast within the skull at the cost of a minimal increase in noise level. The R2* of human skull was measured as 0.2–0.3 ms⁻¹ depending on the specific region, which is more than ten times greater than the R2* of soft tissue. The water fraction in human skull was measured to be 60%–80%, which is significantly less than the >90% water fraction in brain. High-quality, bone-enhanced images can be generated using a reduced sampled UTE sequence with no visible compromise in image quality and they preserved bone-to-air contrast with as low as a 25% sampling rate. Conclusions: This UTE strategy with angular undersampling preserves the image quality and contrast of cortical bone, while reducing the total scanning time by as much as 75%. The quantitative results of R2* and the water fraction of skull based on Dixon analysis of UTE images acquired at multiple echo times provide guidance for the clinical adoption and further parameter optimization of the UTE sequence when used for radiation therapy and MR-based PET attenuation correction.
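The relation between signal decay and the relaxation rate quoted above (the reciprocal of T2*) can be sketched numerically. The example assumes a plain mono-exponential decay model, S(TE) = S0 · exp(−R2* · TE), and estimates the rate from just two echoes; this is a textbook simplification chosen for illustration, not the Dixon-based multi-echo analysis the authors describe, and the function names are hypothetical.

```python
import math


def r2_star(signal_te1, signal_te2, te1_ms, te2_ms):
    """Estimate the decay rate (in 1/ms) from two echoes.

    Assumes S(TE) = S0 * exp(-R * TE), so R = ln(S1 / S2) / (TE2 - TE1).
    """
    if te2_ms <= te1_ms:
        raise ValueError("te2_ms must be later than te1_ms")
    if signal_te1 <= 0 or signal_te2 <= 0:
        raise ValueError("signals must be positive")
    return math.log(signal_te1 / signal_te2) / (te2_ms - te1_ms)


def t2_star(rate_per_ms):
    """Convert a decay rate in 1/ms to the corresponding time constant in ms."""
    return 1.0 / rate_per_ms


if __name__ == "__main__":
    # Synthetic signals generated with a rate of 0.25 per ms (within the
    # 0.2 to 0.3 per ms range reported for skull in the abstract).
    s1 = 100.0 * math.exp(-0.25 * 0.1)
    s2 = 100.0 * math.exp(-0.25 * 2.1)
    rate = r2_star(s1, s2, 0.1, 2.1)
    print(rate, t2_star(rate))
```

A rate of 0.25 per ms corresponds to a time constant of 4 ms, which is why conventional sequences with echo times of several milliseconds capture little signal from cortical bone.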
Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.
Oyola, Samuel O; Otto, Thomas D; Gu, Yong; Maslen, Gareth; Manske, Magnus; Campino, Susana; Turner, Daniel J; Macinnis, Bronwyn; Kwiatkowski, Dominic P; Swerdlow, Harold P; Quail, Michael A
2012-01-03
Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) has rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge to the currently available NGS platforms. The genomes of some important pathogenic organisms like Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content) display extremes of base composition. The standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage, particularly across AT- and GC-rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low-quantity starting material and tolerant to extremely high AT content sequences. We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of sequence data generated, we show that our optimized conditions, which involve a PCR additive (TMAC), produce amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC-neutral templates. We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of either extreme of base composition. This development will greatly benefit sequencing clinical samples that often require amplification due to low mass of DNA starting material.
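Base-composition bias of the kind discussed above is usually assessed window by window along a sequence. The following is a minimal sketch, assuming non-overlapping windows and a user-chosen threshold; the function names and the threshold are hypothetical and are not taken from the study.

```python
def at_fraction(seq):
    """Fraction of called bases (A, C, G, T) that are A or T."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "AT" for base in called) / len(called)


def at_rich_windows(seq, window, threshold):
    """Start offsets of non-overlapping windows with AT fraction >= threshold."""
    return [
        start
        for start in range(0, len(seq) - window + 1, window)
        if at_fraction(seq[start:start + window]) >= threshold
    ]


if __name__ == "__main__":
    print(at_rich_windows("AAAATTTTGGCCGCGC", window=8, threshold=0.8))
```

Comparing read depth inside and outside the flagged windows is one simple way to see whether a library preparation under-represents AT-rich regions.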
Prediction of pork quality parameters by applying fractals and data mining on MRI.
Caballero, Daniel; Pérez-Palacios, Trinidad; Caro, Andrés; Amigo, José Manuel; Dahl, Anders B; Ersbøll, Bjarne K; Antequera, Teresa
2017-09-01
This work investigates, for the first time, the use of MRI, fractal algorithms and data mining techniques to determine pork quality parameters non-destructively. The main objective was to evaluate the capability of fractal algorithms (Classical Fractal Algorithm, CFA; Fractal Texture Algorithm, FTA and One Point Fractal Texture Algorithm, OPFTA) to analyse MRI in order to predict quality parameters of loin. In addition, the effects of the MRI acquisition sequence (Gradient echo, GE; Spin echo, SE and Turbo 3D, T3D) and of the predictive data mining technique (Isotonic regression, IR and Multiple linear regression, MLR) were analysed. Both fractal algorithms, FTA and OPFTA, are appropriate for analysing MRI of loins. The acquisition sequence, the fractal algorithm and the data mining technique seem to influence the prediction results. For most physico-chemical parameters, prediction equations with moderate to excellent correlation coefficients were achieved by using the following combinations of MRI acquisition sequences, fractal algorithms and data mining techniques: SE-FTA-MLR, SE-OPFTA-IR, GE-OPFTA-MLR, SE-OPFTA-MLR, with the last one offering the best prediction results. Thus, SE-OPFTA-MLR could be proposed as an alternative technique to determine physico-chemical traits of fresh and dry-cured loins in a non-destructive way with high accuracy. Copyright © 2017. Published by Elsevier Ltd.
McIlwain, Sean J.; Peris, Davis; Sardi, Maria; ...
2016-04-20
The genome sequences of more than 100 strains of the yeast Saccharomyces cerevisiae have been published. Unfortunately, most of these genome assemblies contain dozens to hundreds of gaps at repetitive sequences, including transposable elements, tRNAs, and subtelomeric regions, which is where novel genes generally reside. Relatively few strains have been chosen for genome sequencing based on their biofuel production potential, leaving an additional knowledge gap. Here, we describe the nearly complete genome sequence of GLBRCY22-3 (Y22-3), a strain of S. cerevisiae derived from the stress-tolerant wild strain NRRL YB-210 and subsequently engineered for xylose metabolism. After benchmarking several genome assembly approaches, we developed a pipeline to integrate Pacific Biosciences (PacBio) and Illumina sequencing data and achieved one of the highest quality genome assemblies for any S. cerevisiae strain. Specifically, the contig N50 is 693 kbp, and the sequences of most chromosomes, the mitochondrial genome, and the 2-micron plasmid are complete. Our annotation predicts 92 genes that are not present in the reference genome of the laboratory strain S288c, over 70% of which were expressed. We predicted functions for 43 of these genes, 28 of which were previously uncharacterized and unnamed. Remarkably, many of these genes are predicted to be involved in stress tolerance and carbon metabolism and are shared with a Brazilian bioethanol production strain, even though the strains differ dramatically at most genetic loci. Lastly, the Y22-3 genome sequence provides an exceptionally high-quality resource for basic and applied research in bioenergy and genetics.
Pilkington, Sarah M; Crowhurst, Ross; Hilario, Elena; Nardozza, Simona; Fraser, Lena; Peng, Yongyan; Gunaseelan, Kularajathevan; Simpson, Robert; Tahir, Jibran; Deroles, Simon C; Templeton, Kerry; Luo, Zhiwei; Davy, Marcus; Cheng, Canhong; McNeilage, Mark; Scaglione, Davide; Liu, Yifei; Zhang, Qiong; Datson, Paul; De Silva, Nihal; Gardiner, Susan E; Bassett, Heather; Chagné, David; McCallum, John; Dzierzon, Helge; Deng, Cecilia; Wang, Yen-Yi; Barron, Lorna; Manako, Kelvina; Bowen, Judith; Foster, Toshi M; Erridge, Zoe A; Tiffin, Heather; Waite, Chethi N; Davies, Kevin M; Grierson, Ella P; Laing, William A; Kirk, Rebecca; Chen, Xiuyin; Wood, Marion; Montefiori, Mirco; Brummell, David A; Schwinn, Kathy E; Catanach, Andrew; Fullerton, Christina; Li, Dawei; Meiyalaghan, Sathiyamoorthy; Nieuwenhuizen, Niels; Read, Nicola; Prakash, Roneel; Hunter, Don; Zhang, Huaibi; McKenzie, Marian; Knäbel, Mareike; Harris, Alastair; Allan, Andrew C; Gleave, Andrew; Chen, Angela; Janssen, Bart J; Plunkett, Blue; Ampomah-Dwamena, Charles; Voogd, Charlotte; Leif, Davin; Lafferty, Declan; Souleyre, Edwige J F; Varkonyi-Gasic, Erika; Gambi, Francesco; Hanley, Jenny; Yao, Jia-Long; Cheung, Joey; David, Karine M; Warren, Ben; Marsh, Ken; Snowden, Kimberley C; Lin-Wang, Kui; Brian, Lara; Martinez-Sanchez, Marcela; Wang, Mindy; Ileperuma, Nadeesha; Macnee, Nikolai; Campin, Robert; McAtee, Peter; Drummond, Revel S M; Espley, Richard V; Ireland, Hilary S; Wu, Rongmei; Atkinson, Ross G; Karunairetnam, Sakuntala; Bulley, Sean; Chunkath, Shayhan; Hanley, Zac; Storey, Roy; Thrimawithana, Amali H; Thomson, Susan; David, Charles; Testolin, Raffaele; Huang, Hongwen; Hellens, Roger P; Schaffer, Robert J
2018-04-16
Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) 'Hongyang' draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second genome of an A. chinensis (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed a considerable number of translocation events have occurred following a whole genome duplication (WGD) event some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these 3114 (9.4%) were identical to a protein within 'Hongyang' The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned 'Hort16A' cDNAs and comparing with the predicted protein models for Red5 and both the original 'Hongyang' assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised 'Hongyang' annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes. 
Our use of the manual annotation tool WebApollo facilitated manual checking and correction of gene models enabling improvement of computational prediction. This utility was especially relevant for certain types of gene families such as the EXPANSIN like genes. Finally, this high quality gene set will supply the kiwifruit and general plant community with a new tool for genomics and other comparative analysis.
Draft genome of the reindeer (Rangifer tarandus).
Li, Zhipeng; Lin, Zeshan; Ba, Hengxing; Chen, Lei; Yang, Yongzhi; Wang, Kun; Qiu, Qiang; Wang, Wen; Li, Guangyu
2017-12-01
The reindeer (Rangifer tarandus) is the only fully domesticated species in the Cervidae family, and it is the only cervid with a circumpolar distribution. Unlike all other cervids, female reindeer, as well as males, regularly grow cranial appendages (antlers, the defining characteristics of cervids). Moreover, reindeer milk contains more protein and less lactose than bovids' milk. A high-quality reference genome of this species will assist efforts to elucidate these and other important features in the reindeer. We obtained 615 Gb (Gigabase) of usable sequences by filtering the low-quality reads of the raw data generated from the Illumina Hiseq 4000 platform, and a 2.64-Gb final assembly, representing 95.7% of the estimated genome (2.76 Gb according to k-mer analysis), including 92.6% of expected genes according to BUSCO analysis. The contig N50 and scaffold N50 sizes were 89.7 kilo base (kb) and 0.94 mega base (Mb), respectively. We annotated 21 555 protein-coding genes and 1.07 Gb of repetitive sequences by de novo and homology-based prediction. Homology-based searches detected 159 rRNA, 547 miRNA, 1339 snRNA, and 863 tRNA sequences in the genome of R. tarandus. The divergence time between R. tarandus and ancestors of Bos taurus and Capra hircus is estimated to be about 29.5 million years ago. Our results provide the first high-quality reference genome for the reindeer and a valuable resource for studying the evolution, domestication, and other unusual characteristics of the reindeer. © The Authors 2017. Published by Oxford University Press.
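Several abstracts in this listing report contig or scaffold N50 values. As a reminder of what the statistic means, here is a minimal sketch using the common definition: the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size. The function name is hypothetical, and individual assemblers may handle ties or gaps slightly differently.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
```

For the toy lengths above the total is 20, and the two longest contigs (6 and 5) already cover more than half of it, so the N50 is 5.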
Danielsson, Frida; Wiking, Mikaela; Mahdessian, Diana; Skogs, Marie; Ait Blal, Hammou; Hjelmare, Martin; Stadler, Charlotte; Uhlén, Mathias; Lundberg, Emma
2013-01-04
One of the major challenges of a chromosome-centric proteome project is to explore in a systematic manner the potential proteins identified from the chromosomal genome sequence, but not yet characterized on a protein level. Here, we describe the use of RNA deep sequencing to screen human cell lines for RNA profiles and to use this information to select cell lines suitable for characterization of the corresponding gene product. In this manner, the subcellular localization of proteins can be analyzed systematically using antibody-based confocal microscopy. We demonstrate the usefulness of selecting cell lines with high expression levels of RNA transcripts to increase the likelihood of high quality immunofluorescence staining and subsequent successful subcellular localization of the corresponding protein. The results show a path to combine transcriptomics with affinity proteomics to characterize the proteins in a gene- or chromosome-centric manner.
Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.
2013-01-01
SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
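The third step described above, finding microsatellites that conform to user-specified parameters, can be sketched with a regular expression. This is an illustrative stand-in only: it finds perfect tandem repeats of a short motif and is not SSR_pipeline's own search algorithm, whose implementation is not described in the abstract. The function name and default parameters are assumptions.

```python
import re


def find_microsatellites(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy group captures the shortest motif that repeats at least
    `min_repeats` times in a row; single-base (homopolymer) motifs are skipped.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # e.g. "AA" repeated is just a homopolymer run
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


if __name__ == "__main__":
    print(find_microsatellites("GGCT" + "AC" * 6 + "TTGA"))
```

Compound microsatellites, imperfect repeats, and motifs up to 25 bp would need additional logic or larger parameter values; the sketch only covers the simplest case.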
ProDeGe: A computational protocol for fully automated decontamination of genomes
Tennessen, Kristin; Andersen, Evan; Clingenpeel, Scott; ...
2015-06-09
Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes, clean and contaminant, using a combination of homology and feature-based methodologies. On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). Lastly, the procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.
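The two performance figures above follow directly from the definitions given in the abstract: sensitivity is the fraction of target sequence retained, and specificity is the fraction of non-target sequence removed. A minimal sketch of that arithmetic follows; the function name, argument names, and the base-pair counts in the example are hypothetical and chosen only to reproduce the 84% figures.

```python
def decontamination_metrics(target_kept_bp, target_total_bp,
                            contaminant_removed_bp, contaminant_total_bp):
    """Return (sensitivity, specificity) as defined in the abstract.

    sensitivity = target sequence retained / all target sequence
    specificity = contaminant sequence removed / all contaminant sequence
    """
    sensitivity = target_kept_bp / target_total_bp
    specificity = contaminant_removed_bp / contaminant_total_bp
    return sensitivity, specificity


if __name__ == "__main__":
    # Made-up counts: 1 Mb of target with 840 kb kept,
    # 100 kb of contaminant with 84 kb removed.
    print(decontamination_metrics(840_000, 1_000_000, 84_000, 100_000))
```

Measured in base pairs rather than in contigs, these ratios weight long sequences more heavily, which matches the abstract's phrasing in terms of percentage of sequence.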
Extracting flat-field images from scene-based image sequences using phase correlation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Caron, James N.; Montes, Marcos J.; Obermark, Jerome L.
Flat-field image processing is an essential step in producing high-quality and radiometrically calibrated images. Flat-fielding corrects for variations in the gain of focal plane array electronics and unequal illumination from the system optics. Typically, a flat-field image is captured by imaging a radiometrically uniform surface. The flat-field image is normalized and removed from the images. There are circumstances, such as with remote sensing, where a flat-field image cannot be acquired in this manner. For these cases, we developed a phase-correlation method that allows the extraction of an effective flat-field image from a sequence of scene-based displaced images. The method uses sub-pixel phase correlation image registration to align the sequence to estimate the static scene. The scene is removed from the sequence, producing a sequence of misaligned flat-field images. An average flat-field image is derived from the realigned flat-field sequence.
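The conventional correction mentioned above, in which the flat-field image is normalized and removed from the images, can be sketched in a few lines. The example assumes that "removed" means pixel-wise division by the mean-normalized flat, which is the usual convention, and it uses plain nested lists to stay dependency-free. It does not attempt the phase-correlation extraction that is the abstract's actual contribution, and the function names are hypothetical.

```python
def normalize_flat(flat):
    """Scale a flat-field image so that its mean pixel value is 1."""
    values = [value for row in flat for value in row]
    mean = sum(values) / len(values)
    return [[value / mean for value in row] for row in flat]


def apply_flat(image, flat):
    """Divide an image pixel-wise by the normalized flat-field (gain) image."""
    gain = normalize_flat(flat)
    return [
        [pixel / g for pixel, g in zip(image_row, gain_row)]
        for image_row, gain_row in zip(image, gain)
    ]


if __name__ == "__main__":
    flat = [[1.0, 2.0], [3.0, 2.0]]        # uneven detector response
    image = [[5.0, 10.0], [15.0, 10.0]]    # uniform scene seen through it
    print(apply_flat(image, flat))
```

In the example the raw image is a uniform scene modulated by the detector response, so dividing by the normalized flat recovers a constant image.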
Open-pNovo: De Novo Peptide Sequencing with Thousands of Protein Modifications.
Yang, Hao; Chi, Hao; Zhou, Wen-Jing; Zeng, Wen-Feng; He, Kun; Liu, Chao; Sun, Rui-Xiang; He, Si-Min
2017-02-03
De novo peptide sequencing has improved remarkably, but sequencing full-length peptides with unexpected modifications is still a challenging problem. Here we present an open de novo sequencing tool, Open-pNovo, for de novo sequencing of peptides with arbitrary types of modifications. Although the search space increases by ∼300 times, Open-pNovo is close to or even ∼10 times faster than the other three proposed algorithms. Furthermore, considering top-1 candidates on three MS/MS data sets, Open-pNovo can recall over 90% of the results obtained by any one traditional algorithm and report 5-87% more peptides, including 14-250% more modified peptides. On a high-quality simulated data set, ∼85% of peptides with arbitrary modifications can be recalled by Open-pNovo, while hardly any results can be recalled by the others. In summary, Open-pNovo is an excellent tool for open de novo sequencing and has great potential for discovering unexpected modifications in real biological applications.
Garcia-Reyero, Natàlia; Griffitt, Robert J.; Liu, Li; Kroll, Kevin J.; Farmerie, William G.; Barber, David S.; Denslow, Nancy D.
2009-01-01
A novel custom microarray for largemouth bass (Micropterus salmoides) was designed with sequences obtained from a normalized cDNA library using the 454 Life Sciences GS-20 pyrosequencer. This approach yielded in excess of 58 million bases of high-quality sequence. The sequence information was combined with 2,616 reads obtained by traditional suppressive subtractive hybridizations to derive a total of 31,391 unique sequences. Annotation and coding sequences were predicted for these transcripts where possible. 16,350 annotated transcripts were selected as target sequences for the design of the custom largemouth bass oligonucleotide microarray. The microarray was validated by examining the transcriptomic response in male largemouth bass exposed to 17β-œstradiol. Transcriptomic responses were assessed in liver and gonad, and indicated gene expression profiles typical of exposure to œstradiol. The results demonstrate the potential to rapidly create the tools necessary to assess large scale transcriptional responses in non-model species, paving the way for expanded impact of toxicogenomics in ecotoxicology. PMID:19936325
Tablet—next generation sequence assembly visualization
Milne, Iain; Bayer, Micha; Cardle, Linda; Shaw, Paul; Stephen, Gordon; Wright, Frank; Marshall, David
2010-01-01
Summary: Tablet is a lightweight, high-performance graphical viewer for next-generation sequence assemblies and alignments. Supporting a range of input assembly formats, Tablet provides high-quality visualizations showing data in packed or stacked views, allowing instant access and navigation to any region of interest, and whole contig overviews and data summaries. Tablet is both multi-core aware and memory efficient, allowing it to handle assemblies containing millions of reads, even on a 32-bit desktop machine. Availability: Tablet is freely available for Microsoft Windows, Apple Mac OS X, Linux and Solaris. Fully bundled installers can be downloaded from http://bioinf.scri.ac.uk/tablet in 32- and 64-bit versions. Contact: tablet@scri.ac.uk PMID:19965881